FRDB Archives

Freethought & Rationalism Archive

The archives are read only.


Biblical Criticism & History

Old 03-28-2005, 01:25 PM   #11
Peter Kirby
Veteran Member
 
Join Date: Jul 2001
Location: the reliquary of Ockham's razor
Posts: 4,035
Default

Quote:
Originally Posted by Vivisector
Have you done any experimentation with algorithms that include the proximity of various words to one another? Seems this could be pretty powerful in terms of detecting similarities in phrases and construction, though the coding would likely be a little more hairy (and you might have to develop your criteria and weights from scratch).
I have thought of doing pairs of words, such as pairing off adjacent words or words separated only by one other word. I won't be doing any coding along those lines soon, however.
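
A minimal sketch of that pair-counting idea in Python might look like the following (the regex tokenizer and the sample line are just placeholders, not the program discussed above):

Code:
# Count adjacent word pairs and "skip-one" pairs (words separated by
# exactly one other word). Tokenization here is deliberately crude.
from collections import Counter
import re

def word_pairs(text):
    words = re.findall(r"[a-z']+", text.lower())
    adjacent = Counter(zip(words, words[1:]))   # w_i with w_{i+1}
    skip_one = Counter(zip(words, words[2:]))   # w_i with w_{i+2}
    return adjacent, skip_one

adj, skip = word_pairs("in the beginning was the word and the word was with god")
print(adj.most_common(3))
print(skip.most_common(3))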

best,
Peter Kirby
Peter Kirby is online now
Old 03-28-2005, 01:43 PM   #12
S.C.Carlson
Veteran Member
 
Join Date: Jan 2005
Location: USA
Posts: 1,307
Default

I blogged about something like this a couple of months ago. See Three-Word Phrases in the Pastoral Epistles.
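
For illustration only (this is not the code from the blog post), tallying three-word phrases and checking which ones two samples share takes only a few lines of Python; the two sample strings are just short English phrases for the demo:

Code:
# Count three-word phrases (trigrams) and report those shared by two texts.
from collections import Counter
import re

def trigrams(text):
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(words, words[1:], words[2:]))

a = trigrams("fight the good fight of the faith")
b = trigrams("i have fought the good fight i have finished the race")
print(set(a) & set(b))   # three-word phrases common to both samples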

Stephen
S.C.Carlson is offline  
Old 03-28-2005, 02:04 PM   #13
BDS
Veteran Member
 
Join Date: Jul 2003
Location: Eugene, OR, USA
Posts: 3,187
Default

Never mind. I found the answer to my question.
BDS is offline  
Old 03-29-2005, 11:03 PM   #14
rlogan
Banned
 
Join Date: Oct 2003
Location: Alaska
Posts: 9,159
Default

Hi Peter.

Very interesting, and I think quite powerful. I meant to think about the math more, but I've been lazy and wanted to pose something before too much time passes. Two things, actually.

There is a technique known as cluster analysis that can help with the decision about how many groups there are. It is a matter of comparing the within-group variation to the between-group variation. (You want low within-group variation relative to between-group variation.)

I'll think more about how to apply it, but that seems to me the ultimate question here: how many authors, or "groups"? At present the model takes that as a given and then works out which sets of data belong to the assumed number of groups. But with cluster analysis you can decide how many groups there are.
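
A rough sketch of that within/between comparison, assuming the data are per-book word-frequency profiles and using the Calinski-Harabasz ratio as the between-to-within measure (the matrix X here is made-up stand-in data):

Code:
# For each candidate number of groups k, cluster the profiles and compare
# between-group to within-group variation (Calinski-Harabasz ratio).
# A higher ratio means tighter, better-separated groups.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(0)
# Stand-in data: 30 "books" x 5 word frequencies, drawn around 3 centers.
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(10, 5)) for m in (0.0, 1.0, 2.0)])

for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(calinski_harabasz_score(X, labels), 1))
# The k with the highest score is the group count the data best support.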


The first matter I was going to raise, though, was the issue of different subject matter appearing in the text; since you are so much more familiar with these books than I am, I'll just ask you to mull this over.

At first pass, it looks like you have large enough samples to avoid the problem posed by different subject matter confounding the word counts. In the extreme, as an example, if we threw in a short letter from Paul to his mistress on where he hid the keys to the motel, it would appear to be by a different author when it is merely different subject matter.

Having long data sets covering basically the same subject matter obviates this problem. But it can make a difference where it is a close call, and there would be nothing wrong with introducing a control for that. Before folding this into a technical approach, I'll instead ask in a general way whether you think any of the books have a significant amount of unique subject matter that might make a difference.
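
One possible control along those lines (my own illustration, not something proposed above) is to restrict the counts to common function words, which track habit more than topic; a toy version:

Code:
# Profile a text by the relative frequency of a fixed set of function words,
# which are largely independent of subject matter. The word list is a tiny
# placeholder; a real control would use a much longer, principled list.
from collections import Counter
import re

FUNCTION_WORDS = {"the", "and", "of", "to", "in", "that", "for", "but", "with", "not"}

def function_word_profile(text):
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in FUNCTION_WORDS)
    total = sum(counts.values()) or 1
    return {w: counts[w] / total for w in sorted(FUNCTION_WORDS)}

print(function_word_profile("But the keys to the motel are not with me, and that is that."))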

Again, very interesting and elegant.

Sincerely, rlogan
rlogan is offline  
 
