Freethought & Rationalism Archive. The archives are read only.
09-05-2005, 11:02 AM | #1
Veteran Member
Join Date: May 2005
Location: Midwest
Posts: 4,787
Inquiry on synoptic statistics.
For any who can assist....
A statistical approach to the synoptic problem has been available online for some time now, and I am wondering what a mathematical mind might make of the process described there. I myself know very little about statistics. Once the discussion arrives at this point (Luke group: 012, 002, 112...), I feel at least somewhat qualified to evaluate the results. But most of it is a blur until then. So my question is for statistics buffs. Is the process sound (mathematically)? Is there any way of dumbing it down for the likes of me? (If not, I understand....)

Thanks.

Ben.
09-05-2005, 11:23 AM | #2
Veteran Member
Join Date: Jul 2001
Location: the reliquary of Ockham's razor
Posts: 4,035
|
David Gentile divides the words in the Synoptics into groups (following an existing concordance/synopsis).

I haven't actually tinkered with his algorithm, but at the root of it, I suppose, is word frequency.

kind thoughts,
Peter Kirby
09-05-2005, 08:19 PM | #3 |
Veteran Member
Join Date: May 2005
Location: Midwest
Posts: 4,787
|
Ah, I must have been imprecise. I understand the 0-1-2 system that he uses to distinguish parts of the synoptic tradition. What I was after was more the math of it (such as what a Poisson gamma test does).
Thanks. Ben. |
09-05-2005, 08:30 PM | #4 |
Veteran Member
Join Date: Jul 2001
Location: the reliquary of Ockham's razor
Posts: 4,035
|
Ha! I think you were precise enough, but that I most fully answered the part of the question for which I had the most readily available information.

Like I said, I have not scrutinized his math. I probably should do so in preparation for the formal discussion of the synoptic problem. In the meantime, have you considered contacting him? (There is also an old thread about this lying around on IIDB somewhere.) I do believe that he bases it somehow on lexical frequency rather than combinations of letters, combinations of parts of speech, or some other stylometric measure.

kind thoughts,
Peter Kirby
09-05-2005, 08:53 PM | #5 |
Contributor
Join Date: Jun 2000
Location: Los Angeles area
Posts: 40,549
|
Prior thread: Statistical Approach to the Synoptic Problem
09-05-2005, 09:40 PM | #6 |
Junior Member
Join Date: Aug 2003
Location: Illinois
Posts: 70
|
The math is pretty much straight out of an econometrics textbook. There is an example in it that uses the maximum likelihood method with a Poisson distribution, and the math is right from that example. As for making it simpler, I tried my best...

One thing I might add is that the process is very similar to regression. But regression assumes a normal distribution, and word frequencies are better described by a Poisson distribution. The procedure is even closer to something known as logistic regression, which is used all the time in credit risk. It's only a very slight change from that. If you have specific questions, I could try to explain more.

Dave Gentile
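To make the pointer to maximum likelihood concrete, here is a minimal sketch of Poisson MLE in Python. The counts and function names are hypothetical illustrations, not Gentile's actual code or data; the one fact it leans on is that the maximum-likelihood estimate of a single Poisson rate is simply the sample mean.

```python
import math

def poisson_log_likelihood(counts, rate):
    """Log-likelihood of observed word counts under a Poisson(rate) model."""
    return sum(c * math.log(rate) - rate - math.lgamma(c + 1) for c in counts)

# Toy data: occurrences of one word across equal-sized blocks of text.
counts = [3, 1, 0, 2, 4, 1]

# For a single Poisson rate, the maximum-likelihood estimate is the mean.
mle_rate = sum(counts) / len(counts)

print(f"MLE rate: {mle_rate:.3f}")
print(f"log-likelihood at MLE: {poisson_log_likelihood(counts, mle_rate):.3f}")
```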
09-06-2005, 11:35 AM | #7
Veteran Member
Join Date: May 2005
Location: Midwest
Posts: 4,787
|
Nice to meet you, Dave! Thanks for dropping in.
I asked if the math could be dumbed down a bit, and the impression I am getting is that no, it is already as simple as it is going to get. Let me ask one very specific question and see if the answer still eludes me.

Suppose a certain Greek word is found 10 times in 020 material. Does your process try to predict, so to speak, that this same word should be found 10 times in 002 material too, and then, failing that, draw the conclusion that 020 and 002 material must not be very closely related (after, of course, 800 other words have been similarly tried)?

Also, if the above is even slightly accurate as a description of your process, does it take into account that the 020 material might be more or less lengthy than the 002 material?

BTW, I have been drawn of late to the 3SH (in some form or other) for reasons that have nothing to do with statistics, word counts, or vocabulary comparisons. Which is why I asked about your site in the first place.

Thanks.

Ben.
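For what it is worth, here is a small arithmetic sketch of the length adjustment Ben is asking about, using hypothetical corpus sizes (whether Gentile's procedure does exactly this is the open question):

```python
# Hypothetical sizes: 10 occurrences in a 5,000-word 020 corpus,
# and a 2,000-word 002 corpus to predict into.
count_020, words_020 = 10, 5000
words_002 = 2000

# Length-adjusted prediction: scale the per-word rate, not the raw count.
rate_per_word = count_020 / words_020      # 0.002 occurrences per word
expected_002 = rate_per_word * words_002   # Poisson mean for the 002 corpus

print(f"Expected occurrences in 002: {expected_002:.1f}")  # 4.0, not 10
```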
09-06-2005, 11:46 AM | #8 |
Veteran Member
Join Date: May 2005
Location: Midwest
Posts: 4,787
|
Is there any way, I wonder, of making distinctions within each category (such as 200)? Streeter once claimed that much of the M material is parasitic on Mark (for example, Peter walking on the water depends on Jesus walking on the water, and the exchange between John the Baptist and Jesus in the Jordan depends on there being a baptismal scene). I wonder what would happen if we took that kind of material and compared it with other parts of M that are not modifying Marcan pericopes (one example would be the speech about fasting, prayer, and alms in the Sermon on the Mount).
Ben. |
09-06-2005, 12:41 PM | #9
Veteran Member
Join Date: Jan 2005
Location: USA
Posts: 1,307
|
As best as I can tell, Dave Gentile investigates, e.g., whether the distribution of words in the 002 material is a better predictor of the distribution of words in the 020 material than the word distribution averaged across the synoptics (think of this as his control). Although I have not studied it in great detail, the particular estimators Gentile used (i.e., a Poisson) appear reasonable for the nature of the problem he is investigating, and they do certainly take into account the relative sizes of the different bodies of material (e.g., 002 and 020).

At any rate, the key things to understand about Gentile's approach are:

1. He is using a prima facie reasonable way to evaluate how well one corpus predicts the distribution of words in another corpus.
2. He is also using a "control" corpus to estimate the distribution of words in that other corpus.
3. He then compares the first prediction with the "control" prediction to assess whether the difference is statistically significant.

Issues of Gentile's work to explore include:

a. Is his estimator appropriate?
b. Is his control appropriate?
c. Is his comparison of the estimator and the control appropriate?
d. Is his interpretation of the statistically significant comparisons appropriate?

I have not studied it in sufficient detail to obtain answers I would be satisfied with, but, on the other hand, nothing blatantly flawed is obvious either. In other words, further investigation along Gentile's approach is not likely to be futile, though the level of competence in statistics required may preclude most people actually interested in the synoptic problem from attempting to follow up on Gentile's work.

Stephen
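Stephen's three-step summary can be sketched in a few lines of Python. Everything here is hypothetical (made-up counts, corpus sizes, and a bare log-likelihood comparison standing in for whatever significance test Gentile actually runs), but it shows the shape of the procedure: score the source-derived prediction and the control-derived prediction against the same target counts.

```python
import math

def poisson_log_lik(observed, expected):
    """Total Poisson log-likelihood of observed counts given expected means."""
    return sum(o * math.log(e) - e - math.lgamma(o + 1)
               for o, e in zip(observed, expected))

def expected_counts(source_counts, source_words, target_words):
    """Scale per-word rates from one corpus to another corpus's length."""
    return [max(c / source_words * target_words, 1e-9) for c in source_counts]

# Hypothetical counts for five words in each body of material.
target  = [4, 0, 2, 7, 1]        # e.g. 020 material (1,000 words)
source  = [9, 1, 3, 15, 2]       # e.g. 002 material (2,000 words)
control = [20, 8, 10, 22, 14]    # pooled synoptic control (6,000 words)

ll_source  = poisson_log_lik(target, expected_counts(source, 2000, 1000))
ll_control = poisson_log_lik(target, expected_counts(control, 6000, 1000))

# The higher log-likelihood predicts the target better; a formal test
# would then ask whether the gap is statistically significant.
print(f"source model:  {ll_source:.2f}")
print(f"control model: {ll_control:.2f}")
```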
09-06-2005, 01:22 PM | #10
Veteran Member
Join Date: May 2005
Location: Midwest
Posts: 4,787
|
A follow-up.... From your description it appears that if both 020 and 002 (just my running example) lacked a particular word that happened to appear somewhat frequently in other parts of the synoptic tradition, the correlation between 020 and 002 would be improved, as it were. Is that correct?

Also, it appears to me that this method really deals only with the broadest level of synoptic relations. For example, it does not appear capable of identifying any given pericope within 002 that may actually more closely resemble 200 than 020. Correct?

Thanks.

Ben.
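On the shared-absence question, a toy calculation under the Poisson setup sketched above (hypothetical numbers again, not Gentile's code) suggests the answer would be yes: a word missing from both bodies of material penalizes the control, which expects to see it, and so improves the relative fit of the 002-based prediction.

```python
import math

def poisson_log_lik(observed, expected):
    """Poisson log-likelihood of a single observed count."""
    return observed * math.log(expected) - expected - math.lgamma(observed + 1)

observed_020 = 0            # the word never appears in 020
from_002     = 1e-9         # ~0 expected: absent from 002 as well
from_control = 3.0          # common elsewhere in the synoptics

print(poisson_log_lik(observed_020, from_002))     # ~0.0  (near-perfect fit)
print(poisson_log_lik(observed_020, from_control)) # -3.0  (penalized)
```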