 

 

Quantitative Content Analysis of Jesus Texts

 

        Here I shall analyze the set of representative texts dating from early Christianity. While each of the chosen passages is a short description of Jesus, they differ in the events and attributes they choose to represent. By quantifying the similarities and differences in content one can better judge the significance of the correspondences I have shown to hold between Luke's Emmaus passage and the Testimonium Flavianum found in Josephus' Antiquities.
 
 

    The methodology I have chosen is as follows. Let us identify all the important themes presented in all the representative texts. We put these in a list, the "content set." Each text can be regarded as containing a subset of the full content set. The overlap of these subsets between pairs of texts can then be quantified. In this way, the Christian texts most similar to the Testimonium Flavianum of Josephus' Antiquities can be identified.
 
    For example, the Testimonium content could be broken out in this way:
 

    [Jesus] [wise man] [surprising][deeds] [teacher] [truth before God] [many people] [he was indicted] [by leaders] [of us] [sentenced to cross] [those who had him] [spending the third day] [he appeared to them] [prophets] [these things] [and numerous other things] [about him]

 
    Compare this with the excerpt from Justin Martyr, First Apology 31 (written some 50 years after the Antiquities):
 

    In the books of the prophets we find it announced beforehand that Jesus our Christ would appear, be born through a virgin, grow up, heal every disease and sickness and raise the dead, and be despised and unrecognized and crucified and die and be raised and ascend to the heavens and be called the Son of God, and that some would be sent by him to every nation, and that the Gentiles would believe.
    This shares some content with the Testimonium: the crucifixion, the prophets (but out of order, at the beginning rather than the end), being "raised," deeds and perhaps disciples. But there are also important items in the Testimonium that are not included by Justin: teaching, explicit accusation, explicit sentencing, the leaders of the community, the third day. Conversely, there are elements in Justin that have no analogue in the Testimonium: virgin birth, healing of illness, lack of recognition, ascent to the heavens, being called the Son of God, and the sending out of apostles to every nation. So there are both positive and negative correspondences that can be identified in comparing the Testimonium and Justin.
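    The comparison just described can be sketched with ordinary set operations. This is only an illustration: the two sets below contain just the items named in the paragraph above (not the full content set of the table that follows), the element labels are my own shorthand, and "disciples" is included even though the text marks it as uncertain.

```python
# Content elements named in the Testimonium/Justin comparison above.
# Labels are shorthand chosen for this sketch, not the table's row names.
testimonium = {
    "crucifixion", "prophets", "raised", "deeds", "disciples",  # shared with Justin
    "teaching", "accusation", "sentencing", "community leaders", "third day",
}
justin = {
    "crucifixion", "prophets", "raised", "deeds", "disciples",  # "disciples" is uncertain
    "virgin birth", "healing of illness", "lack of recognition",
    "ascent to the heavens", "called Son of God", "apostles to every nation",
}

positive = testimonium & justin            # elements present in both texts
negative = justin - testimonium            # in Justin but not in the Testimonium
missing_from_justin = testimonium - justin # in the Testimonium but not in Justin

print(sorted(positive))
print(len(positive), len(negative), len(missing_from_justin))  # 5 6 5
```

    Sets ignore order, so the observation that Justin mentions the prophets at the beginning rather than the end is deliberately not captured here; only presence or absence of an element is counted.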

    One can perform this content analysis on all of the representative texts.  Some subjectivity is necessarily involved in this breakdown, but I believe the following table is a fair representation of the texts. I've ordered the rows so that the Testimonium content elements appear first, in black, followed by other members of the content set in red.
 
 

1 Co 15 
Mark 10 
Acts 2 
Acts 3 
Acts 5 
Acts 10 
Acts 13 
L 
TF 
Ign 
Justin  
Old Rom. Creed 
man  
X 
X 
       
X 
X 
X 
   
more than man          
X 
   
X 
     
deeds    
X 
   
X 
 
X 
X 
     
teacher          
X 
(X) 
(X) 
X 
     
truth                
X 
     
Jewish disciples  
X 
     
X 
 
(X) 
X 
     
Greek disciples              
(X) 
X 
     
Messiah
X 
 
X 
X 
     
X 
X 
 
X 
X 
accusation      
X 
   
X 
X 
X 
     
principal men  
X 
 
X 
   
X 
X 
X 
     
Pilate  
(X) 
 
X 
   
X 
 
X 
X 
 
X 
sentence              
X 
X 
     
cross, tree, killed
(X)  
X 
X 
 
X 
X 
X 
X 
X 
X 
X 
X 
witnesses are first to love          
(X) 
(X) 
(X) 
X 
     
appeared to witnesses
X 
 
X 
X 
(X) 
X 
X 
(X) 
X 
     
third day
X 
X 
     
X 
 
X 
X 
   
X 
prophets
(X) 
 
X 
X 
 
X 
X 
X 
X 
 
X 
 
foretold this
X 
(X) 
X 
X 
   
X 
X 
X 
 
X 
 
foretold other marvels    
X 
X 
 
X 
X 
X 
X 
 
X 
 
Christians                
X 
     
God's plan/work    
X 
X 
 
X 
X 
         
God raised
X 
 
X 
X 
X 
X 
X 
   
X 
X 
 
unwarranted death    
X 
X 
   
X 
         
of David's seed    
X 
     
X 
   
X 
   
Psalms quoted    
X 
     
X 
         
Prophets, etc quoted      
X 
   
X 
         
on right hand of God    
X 
 
X 
           
X 
ascends to, waits in Heaven      
X 
   
(X) 
     
X 
X 
received Holy Spirit    
X 
   
X 
         
(X) 
non-leaders involved  
X 
 
X 
   
X 
         
forgiveness of sins, savior        
X 
X 
X 
(X) 
 
X 
 
X 
John the Baptist          
X 
X 
         
healings, specific deeds          
X 
       
X 
 
apostolic mission          
X 
(X) 
     
X 
 
tomb
X 
 
X 
     
X 
X 
       
virgin birth, Son of God    
X 
             
X 
X 
come to judge, resurrect                      
X 
 

        In this table, an X indicates the presence of the content element; an X in parentheses, (X), indicates that the presence is uncertain. A conservative reading would exclude all these questionable cases, while a liberal reading would include all of them. It turns out the conclusions are not affected by this choice, so in the quantitative analysis I will take the liberal reading.

    One way to use this table is to count the correspondences of each text with the Testimonium Flavianum (TF). These are given in the following table.
 

 

Text                    Positive   Negative   Neg in TF   Corr Coeff
Co                          6          2         14          0.22
M                           6          1         14          0.31
Acts2                       8          9         12         -0.13
Acts3                       8          6         12          0.05
Acts5                       2          3         18         -0.11
Acts10                     10          7         10          0.09
Acts13                     10         12         10         -0.21
L                          16          2          4          0.68
Ig                          3          3         17         -0.04
Jus                         5          5         15         -0.05
ORC                         4          6         16         -0.17
average                     7.1        5.1       12.9        0.06
st. dev.                    4.0        3.4        4.0        0.26
average without Luke        6.2        5.4       13.8       -0.04
st. dev. without Luke       2.8        3.4        2.8        0.17
 

        The "Positive" column shows the number of content elements that appear in both the TF and in the given text. The "Negative" column gives the number of content elements that appear in the given text but not in the TF.  The "Neg in TF" column shows the number of elements that appear in the TF but NOT in the indicated text. (Since the TF is here shown to have 20 elements, the number is just 20 minus the first column.) The sample average and standard deviation are shown in the bottom rows, both with and without Luke.
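        As a check on the arithmetic, the "Neg in TF" column and the summary rows for the "Positive" column can be recomputed from the counts given in the table. The sketch below assumes the standard deviation is the sample standard deviation (dividing by n - 1), which is the choice that reproduces the tabulated values; the variable names are mine.

```python
from statistics import mean, stdev

TF_SIZE = 20  # number of content elements in the TF, as stated in the text

# "Positive" column, copied from the table of correspondences.
positive = {
    "Co": 6, "M": 6, "Acts2": 8, "Acts3": 8, "Acts5": 2, "Acts10": 10,
    "Acts13": 10, "L": 16, "Ig": 3, "Jus": 5, "ORC": 4,
}

# "Neg in TF" is just 20 minus the number of positive correspondences.
neg_in_tf = {name: TF_SIZE - count for name, count in positive.items()}

all_counts = list(positive.values())
ex_luke = [count for name, count in positive.items() if name != "L"]

# stdev() is the sample standard deviation (n - 1 in the denominator).
print(round(mean(all_counts), 1), round(stdev(all_counts), 1))  # 7.1 4.0
print(round(mean(ex_luke), 1), round(stdev(ex_luke), 1))        # 6.2 2.8
```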

        In these counts, the Luke text stands out. Even though the sample is small, we can conclude from these numbers that the Luke text is correlated to the TF to a highly significant degree. The statistical reasoning is as follows. The 10 texts other than Luke are assumed to represent a typical distribution of descriptions about Jesus, as described previously. For this sample without Luke, the mean number of positive correspondences with the TF is X = 6.2 and the standard deviation is S = 2.8. Luke, with 16 elements, is more than 3.5 standard deviations away from the mean X, which, if this were a large enough sample, would be significant at the 99% confidence level. For this small sample one should instead use Student's t-distribution with 9 degrees of freedom. The t-statistic for comparing Luke is (16 - X) / (S*sqrt(11/9)) = 3.2, which is still significant at slightly below the 99% level. So even though the sample is fairly small, the large deviation of Luke remains highly significant.
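        The two numbers quoted in this paragraph can be reproduced directly from the "Positive" counts of the ten texts other than Luke. The following is a minimal sketch that applies the formula exactly as written above, including its sqrt(11/9) factor; the variable names are my own.

```python
from math import sqrt
from statistics import mean, stdev

# "Positive" counts for the 10 texts other than Luke, in table order:
# Co, M, Acts2, Acts3, Acts5, Acts10, Acts13, Ig, Jus, ORC
others = [6, 6, 8, 8, 2, 10, 10, 3, 5, 4]
luke = 16

x_bar = mean(others)   # sample mean, 6.2
s = stdev(others)      # sample standard deviation (n - 1), about 2.78

# Distance of Luke from the mean, in standard deviations.
z = (luke - x_bar) / s

# t-statistic using the scaling factor given in the text.
t = (luke - x_bar) / (s * sqrt(11 / 9))

print(round(z, 2), round(t, 2))  # 3.52 3.19
```

        The factor sqrt(11/9) is taken from the text as given. For comparison, the more common scaling for setting one new observation against a sample of 10, sqrt(1 + 1/10), would give a slightly larger value of about 3.36, so the choice of factor does not change the conclusion.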

     For negative correspondences only, Luke is almost the lowest (next to Mark), at about 1 standard deviation below the mean. This alone is not enough to support a quantitative conclusion.

    The last column shows the correlation coefficient. This combines the positive and negative correspondences into a single number. It is calculated by representing each text by a string of 1's and 0's. All of the content elements are regarded as forming an ordered array, and then for each text a 1 is placed where the text has the content and a 0 where it does not. (In the first table, this amounts to replacing each X with a 1 and each blank with a 0; the column for a text then gives the string assigned to that text.) The correlation of each text with the TF is then calculated by finding the correlation between this string of 1's and 0's and the TF's string (which, because the TF elements are listed first, is of the form 111...1000...0).

        The reader can perform the calculation by copying the first table into an Excel spreadsheet and putting a 1 where an X is and a 0 where a blank is; then use the CORREL function to find the correlation coefficient between any pair of columns.

        The resulting correlation coefficient is shown in the last column of the table. This number can be between -1 (completely opposite element content to the TF) and 1 (total agreement with the TF). If the texts were randomly created, one would expect the average correlation with the TF to be zero.
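        The same calculation can be sketched outside a spreadsheet. The correlation between two strings of 1's and 0's depends only on how many positions hold each combination of values, not on their order, so the strings can be rebuilt from the "Positive" and "Negative" counts alone. The sketch below makes one assumption of mine: that the full content set has 37 elements, which is the number of rows I count in the first table (20 TF elements plus 17 others). The function names are hypothetical.

```python
from math import sqrt

N_ELEMENTS = 37  # assumption: rows counted in the first table (20 TF + 17 others)
TF_SIZE = 20     # TF elements, listed first in the table

# (Positive, Negative) counts copied from the table of correspondences.
counts = {
    "Co": (6, 2), "M": (6, 1), "Acts2": (8, 9), "Acts3": (8, 6),
    "Acts5": (2, 3), "Acts10": (10, 7), "Acts13": (10, 12), "L": (16, 2),
    "Ig": (3, 3), "Jus": (5, 5), "ORC": (4, 6),
}

def pearson(x, y):
    """Pearson correlation coefficient, the quantity Excel's CORREL returns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def strings_from_counts(pos, neg):
    """Build 0/1 strings for the TF and a text with the given counts.

    The order of elements inside the TF block and inside the non-TF block
    is arbitrary here; it does not affect the correlation.
    """
    rest = N_ELEMENTS - TF_SIZE
    tf = [1] * TF_SIZE + [0] * rest
    text = [1] * pos + [0] * (TF_SIZE - pos) + [1] * neg + [0] * (rest - neg)
    return tf, text

corr = {
    name: round(pearson(*strings_from_counts(pos, neg)), 2)
    for name, (pos, neg) in counts.items()
}
for name, value in corr.items():
    print(f"{name:7s} {value:5.2f}")
```

        Under the 37-element assumption, the printed values agree with the "Corr Coeff" column for all eleven texts, for example 0.68 for L and -0.21 for Acts13.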

        We see again that the Luke passage stands out. In fact, its distinctiveness is heightened by combining the positive and negative correspondences. Luke has a correlation of 0.68 with the TF, which is 4 standard deviations away from the mean of the ex-Luke sample, significant at the 99% level when using the small-sample t-statistic. This analysis indicates that the relation between Luke and the TF is very different from that of any other of our representative Jesus accounts.

    This reasoning is not complete, as it looks only at the relations of texts to the TF. In any collection of texts, there will be one that is closer to a given text than any other; we want to know if the degree of closeness for the "closest text" shown by L to TF is unusual when other texts are similarly examined. That is, in any collection of texts, two of them can happen to be close to each other by chance. Perhaps this is all that happened with TF and Luke. What should really be done is to find the correlation between ANY pair of texts, and then judge whether the Luke-TF relation is statistically significant in this set of correlations.

    The next page summarizes this more complete analysis.
