FRDB Archives

Freethought & Rationalism Archive

Old 02-08-2009, 08:20 AM   #121
Veteran Member
 
Join Date: Jan 2007
Location: Mondcivitan Republic
Posts: 2,550
Default

Roger & Andrew,

Now I have not closely scoured the literature that statistically compares the letter to Theodore to the known Clementine corpus, so perhaps I missed it, but one thing that bothers me about the comparisons I have seen to date is that, AFAICT, no one has applied the insights that have come from statistical analysis of individual letters of the Pauline corpus against the corpus as a whole, and from related studies on other ancient and modern collections.

From what I have read of stylostatistical analysis of Pauline letters, and of research on other historical collections, the matter is definitely NOT as simple as a comparison of vocabulary, including a few ratios of hapax legomena, etc.

Usually a statistically significant result, even under controlled conditions that far exceed what we have to work with here, requires analysis of multiple comparators (though not always all of those available, just a selection). Even the Pauline corpus falls short, making statistical analysis of style inconclusive. What is more, under controlled conditions, the selection of comparators that works with one corpus is often quite different from the selection that works with another.

In other words, the science of stylostatistics is not quite advanced enough to be of practical use unless certain very specific conditions apply, and even then there are factors at work we have yet to understand before any sort of universal set of tests can be established.

Has anyone brought up works like Kenneth Neumann's The Authenticity of the Pauline Epistles in the Light of Stylostatistical Analysis (Scholars, 1990) or Anthony Kenny's The Computation of Style (Pergamon, 1982)?

DCH

PS: Yes, I had read through Andrew's earlier study, and have also read some other works on stylostatistical analysis in general (A. Q. Morton, etc.), but I admit to no special knowledge of statistics, although I am an auditor of financial records by trade (not an accountant, though).

Quote:
Originally Posted by Roger Viklund View Post
Thanks for your clarification Andrew. It could of course be that your conclusions are correct. There are after all three subjects to be dealt with.

First of all there is the pure calculation, which is not especially advanced. It says that Clement’s letter deviates by a factor of 3.6 from what is to be expected (5/8 divided by 9/4 = 20/72 = 1/3.6). Either there are 3.6 times too few words previously never used (one would expect 14 or 15 instead of 4), or 3.6 times too many words previously used only once (one would expect 2 or 3 instead of 9), or some combination of the two, i.e. there should perhaps be twice as many words previously never used (i.e. 8) and at the same time barely half as many previously used only once (i.e. 5).
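
A quick check of that arithmetic (a minimal Python sketch; the 5/8 and 9/4 figures are the ones just quoted):

Code:
# Expected split of once-used to never-used words is 5:8; the letter shows 9:4.
expected = 5 / 8            # once-used words per never-used word, expected
observed = 9 / 4            # once-used words per never-used word, observed
print(expected / observed)  # 0.2777... = 1/3.6
print(observed / expected)  # 3.6, the deviation factor quoted above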

Secondly, there is the input. Are all the figures that are put into the calculation correct? I.e. are we really to expect a relation of 5 to 8? Are Stählin’s figures accurate? Has he been able to isolate all the expressions which Clement takes from the Bible or from other sources but does not quote, merely incorporates into his own wording? Have you been able to isolate all non-Clement expressions in the letter (apart from the quotations from SM)? How are we to decide whether a word actually is a word? You and Smith obviously disagreed, and words are not numbers. It is impossible, for instance, to compare how many words there are in different languages. The problem is that words have different forms and are often combined with one another, and it is difficult to isolate them in their primary form.

Thirdly, there is the relevance of the result. How can we really apply these figures to a short letter, even if the figures are correct? As far as I know, we have no other letter of Clement (apart from this one – correct me if I’m wrong). All the rest is just material from books. Can a letter be compared to a book? Further, can a short text be seen as representative, and how much deviation could be seen as normal variation? Everyone should realize that in a text of only 50 words there could be 2 words previously used only once and only one, or perhaps no, word previously never used, without it being at all suspicious. But how are we to deal with a short text that is still only ten times as long? We cannot say much about the 4 words previously never used, or about why no further such words occur. But what about the 9 words previously used only once? Could the context require that these words were used again?

I see many difficulties in using this method to evaluate the authenticity of this letter.

Kindly, Roger
DCHindley is offline  
Old 02-08-2009, 08:52 AM   #122
Junior Member
 
Join Date: Jan 2004
Location: Sweden, Ume
Posts: 39
Default

Quote:
Originally Posted by DCHindley View Post
Roger & Andrew,

Now I have not closely scoured the literature that statistically compares the letter to Theodore to the known Clementine corpus, so perhaps I missed it, but one thing that bothers me about the comparisons I have seen to date is that, AFAICT, no one has applied the insights that have come from statistical analysis of individual letters of the Pauline corpus against the corpus as a whole, and from related studies on other ancient and modern collections.

From what I have read of stylostatistical analysis of Pauline letters, and of research on other historical collections, the matter is definitely NOT as simple as a comparison of vocabulary, including a few ratios of hapax legomena, etc.

Usually a statistically significant result, even under controlled conditions that far exceed what we have to work with here, requires analysis of multiple comparators (though not always all of those available, just a selection). Even the Pauline corpus falls short, making statistical analysis of style inconclusive. What is more, under controlled conditions, the selection of comparators that works with one corpus is often quite different from the selection that works with another.

In other words, the science of stylostatistics is not quite advanced enough to be of practical use unless certain very specific conditions apply, and even then there are factors at work we have yet to understand before any sort of universal set of tests can be established.

Has anyone brought up works like Kenneth Neumann's The Authenticity of the Pauline Epistles in the Light of Stylostatistical Analysis (Scholars, 1990) or Anthony Kenny's The Computation of Style (Pergamon, 1982)?
Hi David! (it is David, isn’t it?)

In a blog post named Statistics and Hapax Legomena in the Mar Saba Letter, Walter M. Shandruk has dealt with Andrew's statistical analysis. In it he compared both the “authentic” Pauline letters and the pseudepigraphical ones with the Mar Saba letter, using the same criteria as Andrew. When it came to the Paulines he found no real difference between the “authentic” and the pseudepigraphical letters. He writes:

Quote:
“For the Pauline corpus as a whole the expected ratio of new words to reused is about 1.8/1, which the authentic letters, taken on average, overshoot, while the pseudo letters match reasonably well. This only made me more suspicious of the ratio’s utility for sorting out issues of authorship.”
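
For concreteness, the kind of count Andrew and Shandruk are working with could be sketched in Python roughly as follows (the word lists below are invented placeholders; a real test on the Greek would need proper lemmatisation and the exclusion of quotations):

Code:
from collections import Counter

def new_vs_reused_once(letter_words, corpus_words):
    """Count how many distinct words in a letter never occur in the reference
    corpus, and how many occur there exactly once."""
    corpus_freq = Counter(corpus_words)
    letter_vocab = set(letter_words)
    new = sum(1 for w in letter_vocab if corpus_freq[w] == 0)
    reused_once = sum(1 for w in letter_vocab if corpus_freq[w] == 1)
    return new, reused_once

# Toy illustration with made-up word lists (not real data from either corpus):
corpus = ["logos"] * 3 + ["gnosis"] + ["pistis"] * 2
letter = ["logos", "gnosis", "mystikon"]
new, once = new_vs_reused_once(letter, corpus)
print(new, once)   # 1 new word, 1 word reused once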
Roger Viklund is offline  
Old 02-08-2009, 04:37 PM   #123
Veteran Member
 
Join Date: Aug 2002
Location: Toronto, Canada
Posts: 1,146
Default The wider significance of Criddle's study

I would like to ask a general question now about Andrew Criddle's study.

(A. H. Criddle, "On the Mar Saba Letter Attributed to Clement of Alexandria," _Journal of Early Christian Studies_ 2,3 (Summer 1995) 215-220.)

Andrew performed all these intricate statistical calculations, and on this basis derived that there's this small chance that Clement's letter is an imitation. Now, assuming that Andrew's data is all correct, and that all his statistical calculations are likewise all valid, how can we quantify his results?

Would it then be accurate to say that Andrew's study demonstrated that there's about 10% chance that Clement's letter isn't really by Clement? Is this about right?

So this question is really meant to establish the wider significance of Andrew's study, i.e. how much weight can really be given to it in determining the authorship of Mar Saba MS.

All the best,

Yuri.
Yuri Kuchinsky is offline  
Old 02-09-2009, 06:52 AM   #124
Junior Member
 
Join Date: Jan 2004
Location: Sweden, Ume
Posts: 39
Default

Quote:
Originally Posted by Yuri Kuchinsky View Post
Would it then be accurate to say that Andrew's study demonstrated that there's about 10% chance that Clement's letter isn't really by Clement? Is this about right?
Hi Yuri!

I would say that this is impossible to calculate. It needs to be tested empirically. If we are tossing a die, then we can calculate probabilities and deviations. But I see no way of calculating how likely it would be to have a 3.6-fold deviation from a hypothetical relation between the usage of words previously never used and those previously used only once, in a letter of fewer than 600 words and fewer than 300 unique words. We can only speculate. The reason is of course that the author did not choose his words by chance, but consciously (or unconsciously, but still through processes beyond our ability to establish) in order to make a point.

Kindly, Roger
Roger Viklund is offline  
Old 02-09-2009, 09:17 AM   #125
Veteran Member
 
Join Date: Aug 2002
Location: Toronto, Canada
Posts: 1,146
Default

Quote:
Originally Posted by Roger Viklund View Post
Hi Yuri!

I would say that this is impossible to calculate. It needs to be tested empirically.
Roger, have you read post #56 in this thread?

http://www.freeratio.org/showthread....68#post5777868

That was a long quote from Scott Brown on Criddle, which in particular mentioned that his statistical methodology had been tested on Shakespeare’s writings and "shown to be unreliable in determining authorship". But Andrew disputed what Brown said.

All the best,

Yuri.
Yuri Kuchinsky is offline  
Old 02-09-2009, 09:50 AM   #126
Junior Member
 
Join Date: Jan 2004
Location: Sweden, Ume
Posts: 39
Default

Quote:
Originally Posted by Yuri Kuchinsky View Post
Roger, have you read post #56 in this thread?

http://www.freeratio.org/showthread....68#post5777868

That was a long quote from Scott Brown on Criddle, which in particular mentioned that his statistical methodology had been tested on Shakespeare’s writings and "shown to be unreliable in determining authorship". But Andrew disputed what Brown said.

All the best,

Yuri.
Yes Yuri, I’ve read your quote, and I had done so previously, since I’ve studied Brown’s book and all but one of his articles thoroughly, as well as most of what Carlson has written and also what Andrew Criddle and you have written on the subject. I have also written on the subject myself, though mostly in Swedish. I find this issue to be very important, and I do believe that Secret Mark actually existed and was written before the canonical GMark. But I’m not totally convinced, and the only other really possible scenario that I can see is that Smith forged the letter. I would be surprised, though, if he had managed to accomplish such a “superhuman” task.

Kindly, Roger
Roger Viklund is offline  
Old 02-09-2009, 11:51 AM   #127
Veteran Member
 
Join Date: Aug 2002
Location: Toronto, Canada
Posts: 1,146
Default

Quote:
Originally Posted by Roger Viklund View Post
Yes Yuri, I’ve read your quote, and I had done so previously, since I’ve studied Brown’s book and all but one of his articles thoroughly, as well as most of what Carlson has written and also what Andrew Criddle and you have written on the subject. I have also written on the subject myself, though mostly in Swedish. I find this issue to be very important, and I do believe that Secret Mark actually existed and was written before the canonical GMark. But I’m not totally convinced, and the only other really possible scenario that I can see is that Smith forged the letter. I would be surprised, though, if he had managed to accomplish such a “superhuman” task.

Kindly, Roger
Hi, Roger,

Well, it's great to know that you're up to speed on this whole thing...

So then Criddle's methodology _was_ tested empirically by using Shakespeare for comparison, and some problems were found. Whether or not these problems are very serious remains to be determined.

But my original question goes beyond this: let us assume that there are no problems with Criddle's methodology or calculations, and on that basis estimate the degree to which he manages to clarify the issue of authenticity (a sort of 'best case scenario').

I assume that the results of any serious scientific study can be quantified, so why not Criddle's results?

So how much weight can we really put on this study?

All the best,

Yuri.
Yuri Kuchinsky is offline  
Old 02-09-2009, 12:32 PM   #128
Junior Member
 
Join Date: Jan 2004
Location: Sweden, Ume
Posts: 39
Default

Quote:
Originally Posted by Yuri Kuchinsky View Post
Hi, Roger,

Well, it's great to know that you're up to speed on this whole thing...

So then Criddle's methodology _was_ tested empirically by using Shakespeare for comparison, and some problems were found. Whether or not these problems are very serious remains to be determined.

But my original question goes beyond this: let us assume that there are no problems with Criddle's methodology or calculations, and on that basis estimate the degree to which he manages to clarify the issue of authenticity (a sort of 'best case scenario').

I assume that the results of any serious scientific study can be quantified, so why not Criddle's results?

So how much weight can we really put on this study?

All the best,

Yuri.
Well, that was what I was trying to answer previously, Yuri. There is of course a natural variation, and this variation will be larger the shorter the text. If you throw a die 6000 times you would expect to get the number 6 about 1000 times. A variation between, let us say, 960 and 1040 would perhaps seem normal, and only very seldom would the result fall outside this span.

If you throw a die 600 times you would expect to get the number 6 about 100 times. A variation between, let us say, 90 and 110 would perhaps seem normal, and only very seldom would the result fall outside this span. In the first case, although the variation of 40 in each direction is much larger in absolute terms than the variation of 10 in the second example, the percentage deviation is nevertheless much smaller (4 % as against 10 %).

Now, if you throw the die just 6 times, what would you say if you got the number 6 four times? Impossible? Of course not! Yet the deviation is 300 %. The smaller the figures, the larger the relative variation one should expect.
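
These informal bands can be checked roughly against the binomial standard deviation (a sketch, assuming a fair die and independent throws; the exact cut-offs differ a little from the figures above, but the pattern is the same):

Code:
import math

# Number of sixes in n throws of a fair die ~ Binomial(n, 1/6).
for n in (6000, 600, 6):
    mean = n / 6
    spread = 2 * math.sqrt(n * (1 / 6) * (5 / 6))   # a rough "normal variation" band
    print(f"{n} throws: expect {mean:.0f} sixes, typically within +/- {spread:.0f} "
          f"({100 * spread / mean:.0f}% of the mean)")

# 6000 throws: ~1000 +/- 58 (about 6%); 600: ~100 +/- 18 (about 18%); 6: ~1 +/- 2 (about 180%)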

The letter of Clement is rather short, especially when all the quotations, SM and other material, are eliminated. What we should ask ourselves is how often one could expect to find deviations like the one in this letter in other short letters by other authors, when compared with their writings in books. It is impossible to say, by pure calculation, how likely a 3.6-fold deviation is in a letter of this size. Yet this is what we have, and as I see it the real question is whether it is so unlikely that it should be seen as such strong evidence that it outweighs the other, IMO very strong, indications that the letter and the gospel are genuine.

I think Andrew needs to show HOW unlikely a split of 4/9, compared to the expected 8/5, really is. Even if it happened only one time in ten, I still could not see this as a strong indicator, because these things happen. And what if the ratio had been exactly 8/5, wouldn’t that seem suspicious too? I mean, even if Clement had written ten letters of the same length, I doubt that any one of them would score exactly 8/5 (even if combined they would show that figure). Too good to be true!

And as I said, I cannot see how it would be possible to calculate the probability of a 3.6-fold deviation in a letter of this size. There are just too many unknowns. The only possible way of doing this would be to use real examples, maybe a hundred letters of the same size, and compare them to what would be expected from their authors’ books. How many times out of a hundred would the deviation be 3.6-fold or more?

One should also note that this is not really comparable to the example with the die, since we are dealing only with words which the author hardly ever used. Any variation in these words will immediately stand out. It’s almost like throwing the die just 6 times.
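
The empirical test proposed two paragraphs above could be outlined like this (a sketch only: the letters and reference texts are left as inputs, since a real run would need lemmatised Greek with quotations removed, and the 8/5 expectation is the figure used for Clement in this thread):

Code:
from collections import Counter

def deviation_factor(letter_words, reference_words, expected_new_per_once=8 / 5):
    """Factor by which a letter's never-used/once-used split falls short of the
    expected split (values above 1 mean once-used words are over-represented,
    as in the letter to Theodore)."""
    freq = Counter(reference_words)
    vocab = set(letter_words)
    new = sum(1 for w in vocab if freq[w] == 0)
    once = sum(1 for w in vocab if freq[w] == 1)
    if once == 0:
        return 0.0                  # no once-used words: no over-representation
    if new == 0:
        return float("inf")         # only once-used words: extreme over-representation
    return expected_new_per_once / (new / once)

def proportion_extreme(letters, reference_words, threshold=3.6):
    """Proportion of a batch of comparable-length letters whose deviation reaches the threshold."""
    hits = sum(1 for letter in letters if deviation_factor(letter, reference_words) >= threshold)
    return hits / len(letters)

# letters: ~100 lemmatised letters of ~600 words each by known authors;
# reference_words: the lemmatised text of those authors' longer works.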

And by the way, did you read my long article on the examination of Carlson’s handwriting analysis? http://www.jesusgranskad.se/theodore.htm

Kindly, Roger
Roger Viklund is offline  
Old 02-09-2009, 02:23 PM   #129
Veteran Member
 
Join Date: Jan 2007
Location: Mondcivitan Republic
Posts: 2,550
Default

Yes, Roger, I am the author known as David.

What Neumann did was test about 617 stylistic indices, by means of discriminant analysis, for their effectiveness in classifying samples from seven, and four, authors:
Those indices previously used [by other researchers] which show large difference between [undisputed] Paul and the disputed letters are among the weakest and most ineffective stylistic criteria, e. g., hapax legomena, common conjunctions (kai, de, alla), sentence length, and dependent genitives: therefore, the conclusions are also not reliable.
I don't know what this means WRT the Theodore matter, as there the issue is whether the letter matches Clement's style too closely, although the indices developed might help quantify or debunk the claim. The problem with such a claim is that normally closeness of style is considered an indication of genuineness, not spuriousness. The matter could be restated as "high correlation in use of vocabulary or matters of style cannot indicate genuineness, because the subject matter should not be admitted to have been written by such an author."
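
For anyone unfamiliar with the technique, a bare-bones sketch of discriminant analysis applied to authorship might look like the following (scikit-learn assumed; the feature matrix is random filler standing in for real stylistic indices, so the printed accuracy means nothing in itself):

Code:
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((40, 6))          # 40 text samples x 6 stand-in stylistic indices
y = np.repeat(np.arange(4), 10)  # 4 known authors, 10 samples each

lda = LinearDiscriminantAnalysis()
scores = cross_val_score(lda, X, y, cv=5)  # how well these indices separate the authors
print(f"cross-validated accuracy: {scores.mean():.2f}")  # near chance (0.25) for random features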

Interestingly enough, Neumann can be criticized for allowing preconceived ideas about the genuineness of certain Pauline letters to govern their selection as the gold standard for all things Pauline.

Before these indices can be used productively, they should be tested on samples of writing selected from a range of genres by a variety of known authors, if such works can be identified. The circularity of it all is hard to deny.

DCH


Quote:
Originally Posted by Roger Viklund View Post
Quote:
Originally Posted by DCHindley View Post
Roger & Andrew,

Now I have not closely scoured the literature that statistically compares the letter to Theodore to the known Clementine corpus, so perhaps I missed it, but one thing that bothers me about the comparisons I have seen to date is that, AFAICT, no one has applied the insights that have come from statistical analysis of individual letters of the Pauline corpus against the corpus as a whole, and from related studies on other ancient and modern collections.

From what I have read of stylostatistical analysis of Pauline letters, and of research on other historical collections, the matter is definitely NOT as simple as a comparison of vocabulary, including a few ratios of hapax legomena, etc.

Usually a statistically significant result, even under controlled conditions that far exceed what we have to work with here, requires analysis of multiple comparators (though not always all of those available, just a selection). Even the Pauline corpus falls short, making statistical analysis of style inconclusive. What is more, under controlled conditions, the selection of comparators that works with one corpus is often quite different from the selection that works with another.

In other words, the science of stylostatistics is not quite advanced enough to be of practical use unless certain very specific conditions apply, and even then there are factors at work we have yet to understand before any sort of universal set of tests can be established.

Has anyone brought up works like Kenneth Neumann's The Authenticity of the Pauline Epistles in the Light of Stylostatistical Analysis (Scholars, 1990) or Anthony Kenny's The Computation of Style (Pergamon, 1982)?
Hi David! (it is David, isn’t it?)

In a blog post named Statistics and Hapax Legomena in the Mar Saba Letter, Walter M. Shandruk has dealt with Andrew's statistical analysis. In it he compared both the “authentic” Pauline letters and the pseudepigraphical ones with the Mar Saba letter, using the same criteria as Andrew. When it came to the Paulines he found no real difference between the “authentic” and the pseudepigraphical letters. He writes:

Quote:
“For the Pauline corpus as a whole the expected ratio of new words to reused is about 1.8/1, which the authentic letters, taken on average, overshoot, while the pseudo letters match reasonably well. This only made me more suspicious of the ratio’s utility for sorting out issues of authorship.”
DCHindley is offline  
Old 02-09-2009, 02:44 PM   #130
Veteran Member
 
Join Date: Sep 2004
Location: Birmingham UK
Posts: 4,876
Default

Quote:
Originally Posted by Yuri Kuchinsky View Post
I would like to ask a general question now about Andrew Criddle's study.

(A. H. Criddle, "On the Mar Saba Letter Attributed to Clement of Alexandria," _Journal of Early Christian Studies_ 2,3 (Summer 1995) 215-220.)

Andrew performed all these intricate statistical calculations, and on this basis derived that there's this small chance that Clement's letter is an imitation. Now, assuming that Andrew's data is all correct, and that all his statistical calculations are likewise all valid, how can we quantify his results?

Would it then be accurate to say that Andrew's study demonstrated that there's about 10% chance that Clement's letter isn't really by Clement? Is this about right?

So this question is really meant to establish the wider significance of Andrew's study, i.e. how much weight can really be given to it in determining the authorship of Mar Saba MS.

All the best,

Yuri.
Hi Yuri

You are raising good points, i.e. several rather different points of interest, and I shall have to try to answer them one by one.

If we regard the number of new words and of words previously used once only in the Mar Saba letter as following a binomial distribution, with expected values 8 and 5 and observed values 4 and 9, then the probability of an equal or greater excess of words previously used once over new words is about one in forty. (This amounts to doing a one-tailed significance test.) Various criticisms could be made of this result, but it follows straightforwardly from the claims and assumptions being made.
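
The one-in-forty figure can be reproduced with a one-tailed binomial calculation (a sketch assuming SciPy; the 8:5 expectation and the 4-versus-9 observation are the figures under discussion):

Code:
from scipy.stats import binom

# 13 relevant words in all; under the expected 8:5 split, the chance that any one
# of them is "new" (never used elsewhere by Clement) is 8/13.
n, p_new = 13, 8 / 13
observed_new = 4                               # versus 9 words previously used once only

p_value = binom.cdf(observed_new, n, p_new)    # P(4 or fewer new words by chance)
print(f"one-tailed probability: {p_value:.3f}")  # about 0.024, i.e. roughly 1 in 40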

However, this is certainly not a claim that the probability that the letter is authentic is only one in forty. In order to make that sort of probability calculation one would need to use something like Bayes' theorem and have good estimates of things like the prior probability of a deliberate imitation of Clement's style compared to the prior probability of the discovery of an authentic but totally unsuspected work of Clement copied into a 17th century book, and the probability that a deliberate imitator of Clement's style would exaggerate Clementine features as much as seems to have happened here. Unfortunately, we do not have any solid basis for assigning these probabilities. I can make guesses, but I doubt if you (or anyone else) would be convinced. We can IMO claim that the vocabulary and other stylistic features of the letter show characteristics which are both unusual enough to deserve notice in themselves and markedly more compatible with a deliberate imitation than with an authentic letter, but I don't think we can really quantify this.
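
To see why the step from a significance test to a probability of forgery needs those priors, here is the bare Bayes arithmetic in odds form, with frankly invented numbers (every figure below is a placeholder, not a value anyone in this thread has defended):

Code:
# Posterior odds = prior odds x likelihood ratio (Bayes' theorem in odds form).
prior_odds_imitation = 1 / 9   # hypothetical: imitation judged 9x less likely a priori
likelihood_ratio = 10          # hypothetical: observed style 10x more probable under imitation

posterior_odds = prior_odds_imitation * likelihood_ratio
posterior_prob = posterior_odds / (1 + posterior_odds)
print(f"P(imitation | evidence) = {posterior_prob:.2f}")   # ~0.53 with these made-up inputs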

What then is the significance of the claims in my paper? IMO the most important result is to make very problematic the use of the Clementine stylistic usages listed by Morton Smith as real evidence for authenticity, given that these Clementine characteristics of the letter seem suspiciously too good, more what one would expect in a deliberate imitation than in an authentic work. A claim that there are good reasons for supporting authenticity, quite apart from the strikingly Clementine linguistic features of the letter, would IMO be much less vulnerable to the arguments in my paper. The problem is that if one ignores the linguistic arguments for authenticity then there are IMO few other grounds in its favour.


Andrew Criddle
andrewcriddle is offline  
 
