FRDB Archives

Freethought & Rationalism Archive

Old 03-26-2003, 04:05 PM   #1
Veteran Member
 
Join Date: Jan 2001
Location: USA
Posts: 1,072
Question SIPF dipeptides too close of a fit to nature to be coincidental?

1999 Rode paper vastly overestimates link between SIPF and primitive proteins

DNAunion: For years now I just could not get over the statistical significance of the correlation between the dipeptides formed by the SIPF [salt-induced peptide formation] reaction and those found in ‘primitive’ proteins as reported by Bernd Michael Rode in his 1999 article “Peptides and the Origin of Life”. If he was correct, then the probability that the SIPF would preferentially produce the same dipeptides found in ‘primitive’ proteins, by chance alone, was 1 in 10^18. Who would argue against those odds by claiming that the SIPF reaction was not involved? I have invested the time needed to look at his claim and find it erroneous. Before explaining in detail why his calculations are wrong, I will present his case from the article.

Quote:
“One of the important features of the SIPF [salt-induced peptide formation] reaction resulting from investigations of a larger number of amino acids as single and binary systems is the preferential formation of certain dipeptides, which implies a preference of certain amino acid sequences. This reaction-inherent preference of sequences is quite essential in connection with the often-heard argument that ‘time of evolution (even age of the earth) was not sufficient in statistical terms’ to produce a certain, biologically relevant peptide or protein by chance from the amino acids, considering the enormous number of possible combinations. If the possible combinations are governed by a reaction leading to a limited number of sequences, this argument loses much of its validity. This consideration encouraged a comparison of the preferred SIPF sequences to those found in ‘primitive’ proteins, i.e., those found in organisms dating back to the beginnings of biological evolution. …

For comparison, the membrane proteins of archaebacteria and prokaryonta have been analyzed with respect to the relative occurrence of peptide linkages between the same amino acids (data from Ref. [70]). In table 8a and b, the four most frequently found A-B and B-A sequences of the proteins are compared with the corresponding sequences produced in the largest yields by the SIPF reaction.

In the case of archaebacteria (Table 8a), mostly two [or] three of the sequences coincide. This does not mean much for a single amino acid, where a single coincidence occurs with a probability of 4/9 in this comparison, a double one with 1/6, and a triple one with 1/21. However, the probability for the cumulative coincidence found for all 18 amino acid pairings A-B and B-A investigated is only 10^-18 by chance. For prokaryonta in general (Table 8b), coincidences are still frequent, the corresponding ‘by chance’-value is merely 10^-16.

These findings provide strong support for the assumption that the SIPF reaction indeed has been responsible for the production of the first peptides on the primitive earth and that, once created, peptide sequences have been conserved to a considerable extent in the course of further chemical evolution into the beginning of life and biologic evolution.” (Bernd Michael Rode, Peptides and the Origin of Life, Peptides 20, 1999, p781-783)
DNAunion: Here’s the basic argument.

1) The SIPF [salt-induced peptide formation] reaction tends to favor production of certain amino acid pairings (dipeptides) over others. By restricting the number of dipeptides that form frequently, the SIPF also limits the number of longer sequences that could form from the joining of those shorter sequences. In a prebiotic context, this could be seen as a blessing in that a search through all long sequences would be, for all practical purposes, impossible. Since a vast many sequences would tend not to form, saturation of a restricted search space might occur, finding all functional proteins that exist within it. The problem is, if the proteins needed to kickstart life are not within that restricted sequence space, then the SIPF would actually hinder the origin of life by leading chemistry away from where it needs to go. That brings us to point 2.

2) The amino-acid pairings (dipeptides) that are preferentially formed by the SIPF reaction match up extremely closely to those in some of the earliest proteins. In fact, by chance alone, the likelihood of the match being as tight as it is is only about one chance in a million trillion.

A key point here is that the smaller the probability of the correspondence between SIPF and primitive proteins arising by chance, the more likely it is that the SIPF was involved in the creation of those primitive proteins. For example, if the probability of the match between the SIPF dipeptides and those in primitive proteins were, say, 1 in 5, then there would be little to no statistical significance to the match: chance alone would be an adequate explanation. But with a probability of only 1 in 10^18, chance alone can hardly be relied upon as being the best explanation: there must be some connection between the two.

Again, crucial here is how small or large the probability is. Unfortunately for the argument, it is nowhere near 1 in 10^18.

Before proceeding, perhaps we should answer the obvious question, “How did Rode arrive at his figure of 1 in 10^18?” Clues can be found in the following repeated material.

Quote:
”… a single coincidence occurs with a probability of 4/9 in this comparison, a double one with 1/6, and a triple one with 1/21. However, the probability for the cumulative coincidence found for all 18 amino acid pairings A-B and B-A investigated is only 10^-18 by chance.” (Bernd Michael Rode, Peptides and the Origin of Life, Peptides 20, 1999, p783)
DNAunion: How is the probability of a single coincidence 4/9? Why not 4/20? First of all, not all 20 biological amino acids are taken into consideration. In fact, only nine were. In addition to clues that can be found in already-quoted material, Rode states on page 782, as the text accompanying his table,…

Quote:
”The four most frequently occurring A-B and B-A linkages of the investigated amino acids in membrane proteins of (a) archaebacteria ‘AB’ and (b) prokaryotic cells ‘PK’, in comparison with the dipeptides most readily formed by these amino acids in the SIPF reaction”
DNAunion: That still may not be too clear. The SIPF reaction was tested with only nine different amino acids, as is indicated in tables 6, 7, and 8. The comparison between SIPF and ‘primitive’ proteins takes only those nine amino acids tested in the SIPF reaction into account.

So to begin with, Rode’s comparison is incomplete. What if other amino acids occur more frequently in the ‘primitive’ dipeptides than the nine he looked at? Apparently they are ignored, and the nine of interest are bumped up, possibly moving from outside of the top four to being within it. If so, they would be counted as hits even though they were actually too far down in line originally to be counted as such. For the rest of the discussion, this potential flaw will be overlooked.

Now, since the comparison is made to “the four most frequently occurring” amino acids joined to a given one, then a match will exist for four out of the nine possible ‘primitive’ amino acids. Thus, the probability for a single coincidence between a ‘primitive’ amino acid and one of the four SIPF ones is 4/9.

This logic can be extended to see how Rode arrived at his other probabilities. For two coincidences, there is a 4/9 chance of a match with the first ‘primitive’ amino acid, and then a 3/8 chance for the second (we have to assume the first one matches before we can calculate the probability for the second match; with one ‘primitive’ amino acid already matched up, that leaves eight remaining, of which three will match to SIPF ones). Thus we have P(2 matches) = 4/9 * 3/8 = 1/6.

For 3 coincidences, we assume the first ‘primitive’ amino acid matches one of the four target SIPF amino acids, and that the second does too. That leaves us with seven remaining ‘primitive’ amino acids of which two will match SIPF ones. Therefore, all together, P(3 matches) = 4/9 * 3/8 * 2/7 = 1/21.

Extending this just one more time gives us P(4 matches) = 4/9 * 3/8 * 2/7 * 1/6 = 1/126.
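The chained fractions above can be checked mechanically. What follows is only a minimal C++ sketch, not anything taken from Rode's paper: the `Fraction` type and the function names are made up for illustration, and the defaults assume nothing beyond what is stated above (nine amino acids, four targets, each pick removed from the pool).

```cpp
#include <cassert>

// A fraction num/den kept in lowest terms (hypothetical helper type).
struct Fraction {
    long long num;
    long long den;
};

// Greatest common divisor by Euclid's algorithm (inputs are non-negative).
long long greatestCommonDivisor(long long a, long long b) {
    while (b != 0) {
        long long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Multiply two fractions and reduce the result to lowest terms.
Fraction multiply(Fraction a, Fraction b) {
    long long n = a.num * b.num;
    long long d = a.den * b.den;
    long long g = greatestCommonDivisor(n, d);
    return Fraction{n / g, d / g};
}

// Probability that k picks in a row all land on one of `targets` amino acids
// out of `total`, with each pick removed from the pool afterwards:
// (targets/total) * ((targets-1)/(total-1)) * ...
Fraction sequentialMatches(int k, int targets = 4, int total = 9) {
    assert(0 <= k && k <= targets && targets <= total);
    Fraction p{1, 1};
    for (int i = 0; i < k; ++i) {
        p = multiply(p, Fraction{targets - i, total - i});
    }
    return p;
}
```

With the defaults, `sequentialMatches(1)` through `sequentialMatches(4)` come out as 4/9, 1/6, 1/21 and 1/126, the same values as in the text.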

To find the overall probability of correspondence between SIPF dipeptides and ‘primitive’ ones, Rode apparently takes these individual probabilities and looks at how many times each occurs, using that value as an exponent. For example, he lists two single matches in table 8a, so the combined probability is 4/9 * 4/9, or simply, (4/9)^2. When all single, double, triple, and quadruple matches are taken into account, the overall probability of correspondence comes to 6.438 in 10^18, which he rounds down to 1 in 10^18.


RODE’S BASIC PROBABILITIES ARE WRONG
So far, I have managed only to confirm Rode’s probability. But there is a problem in his fundamental calculations – that is, his probability for a single match is wrong, as is his probability for a double match, as is his probability for a triple match, etc.

Rode takes into account only enough trials to cover the number of coincidences. For example, for a single coincidence, Rode considers only a single trial. Sure, if you are only going to get one shot at an event with a probability of 4/9, then of course your chance of success is 4/9. But that is not the case here. There are four chances to get a single match. For example, one of his single matches is for the amino acid Ala, in which the archaebacteria have joined to it Ala, Glu, Val, or Leu and the SIPF has joined to it Ala, Pro, Gly, or His. So there were four attempts – Ala, Glu, Val, and Leu – at matching any of the four SIPF amino acids. That changes the probability of a single match dramatically: let’s take a look.

What we will do first is calculate the probability that none of the four ‘primitive’ amino acids match the SIPF ones, then from it calculate the opposite probability (that at least one would match).

‘Primitive’ Amino Acid 1: To start with, there are four SIPF targets and nine possible ‘primitive’ amino acids that could be compared to them. So the probability of ‘primitive’ aa-1 matching one of the four target SIPF amino acids is 4/9. Therefore, the probability of its not matching is 1 – 4/9 = 5/9.

‘Primitive’ Amino Acid 2: Since the probability of this aa is dependent upon the previous one, we have to assume that aa-1 did not match. That leaves eight possible ‘primitive’ amino acids and still four target SIPF ones. So the probability of getting a match here is 4/8, which means the probability of not getting a match is 1 – 4/8 = 4/8 = 1/2.

‘Primitive’ Amino Acid 3: We have to assume that the previous attempt failed to match, leaving seven ‘primitive’ amino acids and still four target SIPF ones. So the probability of matching here is 4/7, meaning that the probability of a non-match is 1 – 4/7 = 3/7.

‘Primitive’ Amino Acid 4: Since the last one failed to match also, we are left with six ‘primitive’ amino acids and still have four target SIPF ones. So the probability of a match on this final step is 4/6, which means the probability of a non-match is 1 – 4/6 = 2/6 = 1/3.

To figure out the overall probability – that is, what is the probability of not getting any matches in four attempts -- we just multiply each of the four individual probabilities for non-matches.

P(no matches) = 5/9 * 1/2 * 3/7 * 1/3 = 5/126

And, looking at the opposite case…

P(at least one match) = 1 – P(no matches) = 1 – 5/126 = 121/126 = 96.03%

So whereas Rode tells us the probability of a single match is a mere 44.44%, we see that it is actually more than twice that: 96.03%.

The same calculations show his other probabilities to be vastly underestimated: two matches are much more likely to occur than Rode leads us to believe, as are triple and quadruple matches.
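The hand calculation can also be done in closed form by counting combinations instead of multiplying step by step. The sketch below is an assumption-laden illustration, not the method used for the figures in this post: it treats each table row as drawing `draws` distinct amino acids from `total`, of which `targets` are the SIPF ones, and sums the hypergeometric terms for "at least `needed` matches". The function names are hypothetical.

```cpp
#include <cassert>

// Binomial coefficient C(n, k), computed with exact integer arithmetic.
long long choose(int n, int k) {
    if (k < 0 || k > n) return 0;
    long long result = 1;
    for (int i = 1; i <= k; ++i) {
        result = result * (n - k + i) / i;  // stays an integer at every step
    }
    return result;
}

// Probability of at least `needed` matches when `draws` distinct tiles are
// pulled from `total` tiles, `targets` of which count as matches.
double atLeastMatches(int needed, int draws = 4, int targets = 4, int total = 9) {
    assert(draws <= total && targets <= total);
    long long favourable = 0;
    for (int j = needed; j <= draws && j <= targets; ++j) {
        // j target tiles and (draws - j) non-target tiles
        favourable += choose(targets, j) * choose(total - targets, draws - j);
    }
    return static_cast<double>(favourable) / choose(total, draws);
}
```

With the defaults, `atLeastMatches(1)` evaluates to 121/126, about 0.9603, in agreement with the figure above. Passing `draws = 5` covers rows in which five ‘primitive’ amino acids are listed.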

In addition, Rode does not take into account that in some of the eighteen instances, five attempts to match are used since there are five ‘primitive’ amino acids listed. This raises the probability that those rows will have a match even higher.



COMPUTER MODEL MORE ACCURATE THAN RODE’S
After spotting the error in Rode’s methodology, I modeled the comparison in silico. Unlike Rode’s simple (and flawed) calculations, my model took into account a couple additional factors (those in addition to how many coincidences each row had) for each entry in the table: how many ‘primitive’ amino acids are listed (how many attempts at making a match are used), and how many target SIPF amino acids are listed. The program (see code at end) performed one million iterations for each table entry to calculate an empirical probability (the law of large numbers indicates the empirical probability should be close to the theoretical probability). Then, the individual values (which were now vastly more accurate than those Rode used) were multiplied together as in the Rode method to arrive at a final overall probability.
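For readers who want the idea of the simulation without reading the full listing at the end of this post, here is a compact C++ sketch of the same kind of urn model. It is not the program that produced the figures quoted here: the function name, the fixed seed and the use of `std::shuffle` are all choices made for this illustration. By symmetry it does not matter which tiles are designated as targets, so the sketch simply uses the lowest-numbered ones.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Estimate the probability of at least `needed` matches when `draws` tiles are
// pulled without replacement from `total` tiles, of which tiles
// 0 .. targets-1 are treated as the target tiles.
double estimateProbability(int needed, int draws, int targets, int total,
                           long iterations, unsigned seed) {
    assert(draws <= total && targets <= total && iterations > 0);
    std::mt19937 rng(seed);
    std::vector<int> urn(total);
    std::iota(urn.begin(), urn.end(), 0);  // tiles 0 .. total-1

    long successes = 0;
    for (long it = 0; it < iterations; ++it) {
        // A fresh random order; the first `draws` entries are the tiles pulled.
        std::shuffle(urn.begin(), urn.end(), rng);
        int matches = 0;
        for (int d = 0; d < draws; ++d) {
            if (urn[d] < targets) ++matches;
        }
        if (matches >= needed) ++successes;
    }
    return static_cast<double>(successes) / iterations;
}
```

A call such as `estimateProbability(1, 4, 4, 9, 200000, 12345u)` should land close to the 96.03% worked out by hand above, with the usual sampling noise.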



!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Was the final value close to Rode’s 1 in 10^18? Nope, not at all. In fact, the probability of correspondence between the ‘primitive’ and SIPF dipeptides was many, many orders of magnitude greater than what Rode stated; that is, billions of times more likely to be due to chance. The computer model produced a probability of 2.916 in 10^7.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!



Computer code (Language = Visual FoxPro 6.0)


*calculate_empirical_probability.prg
******************************************************
* This program models choosing lettered tiles from an urn in order
* to calculate an empirical probability.
******************************************************

CLEAR

lcDiscardOrReplaceTilesOnceChosen = "DISCARD"

lnNumberOfLetteredTiles = 9
lnNumberOfTargetTiles = 4
lnMatchesNeededForSuccess = 1
lnTrialsPerIteration = 4

lnIterations = 1000000
lnSuccessfulIterations = 0

* Initially seed pseudo-random number generator using the system clock
=RAND(-1)

* Column 1 = LETTER (a unique symbol on a tile that gets placed into the urn)
*
* Column 2 = CHOSEN (has the letter/tile already been chosen from the urn?
* if so, it may be discarded and not available any more, or
* may be used again, depending upon the value of the variable
* lcDiscardOrReplaceTilesOnceChosen)
*
* Column 3 = TARGET (is this tile one of the target letters?)
LOCAL ARRAY aUrn[lnNumberOfLetteredTiles , 3]
FOR lnIndex = 1 TO lnNumberOfLetteredTiles
lcLetter = CHR(64 + lnIndex)
aUrn[lnIndex, 1] = lcLetter
aUrn[lnIndex, 2] = "F"
aUrn[lnIndex, 3] = "F"
ENDFOR

* Choose x number of different targets from the urn
FOR lnLooper = 1 TO lnNumberOfTargetTiles
DO WHILE .T.
lnIndex = GetRandomNumber(1, lnNumberOfLetteredTiles)
IF (aUrn[lnIndex, 3] = "F")
aUrn[lnIndex, 3] = "T"
EXIT
ENDIF
ENDDO
ENDFOR

* Should have at least 100 iterations in order to figure out perCENT of
* success. Much larger numbers give more accurate results.
FOR lnIteration = 1 TO lnIterations
IF (lnIteration % 9999 = 0)
WAIT WINDOW NOWAIT "Iteration " + ;
STR(lnIteration) + " of " + STR(lnIterations)
ENDIF

* New iteration - clear all CHOSEN flags
FOR lnIndex = 1 TO lnNumberOfLetteredTiles
aUrn[lnIndex, 2] = "F"
ENDFOR
lnMatches = 0

* Select multiple tiles to try to match a target
FOR lnTrial = 1 TO lnTrialsPerIteration

* First, randomly choose a single lettered tile from the urn
DO WHILE .T.
lnIndex = GetRandomNumber(1, lnNumberOfLetteredTiles)
DO CASE
CASE lcDiscardOrReplaceTilesOnceChosen == "REPLACE"
* Whether tile #x has been chosen before or
* not does not matter it is in the urn now
* available to be chosen
EXIT
CASE lcDiscardOrReplaceTilesOnceChosen == "DISCARD"
* A tile that has been chosen is discarded after use, and so
* cannot be chosen a second time. Need to check the value
* of this tile's CHOSEN column.
DO CASE
CASE aUrn[lnIndex, 2] = "T"
* This tile has already been chosen -
* it can't be used again. Allow the
* program to loop to try choosing
* a different tile
CASE aUrn[lnIndex, 2] = "F"
* This tile has not been chosen before -
* okay to choose it
EXIT
OTHERWISE
WAIT WINDOW "Invalid value of " + ;
aUrn[lnIndex, 2] + " for aUrn[" + ;
ALLTRIM(STR(lnIndex)) + ", 2]"
ENDCASE
OTHERWISE
WAIT WINDOW "Invalid value of " + ;
lcDiscardOrReplaceTilesOnceChosen + ;
" for lcDiscardOrReplaceTilesOnceChosen"
ENDCASE
ENDDO
* This tile has now been chosen - flag it as such
aUrn[lnIndex, 2] = "T"

* Does the chosen tile match one of the targets?
DO CASE
CASE aUrn[lnIndex, 3] == "F"
* Does not match a target
CASE aUrn[lnIndex, 3] == "T"
* Does match one of the targets
lnMatches = lnMatches + 1
* No need to continue pulling tiles if we have
* enough matches already
IF (lnMatches >= lnMatchesNeededForSuccess)
EXIT
ENDIF
OTHERWISE
WAIT WINDOW "Invalid value of " + aUrn[lnIndex, 3] + ;
" for aUrn[" + ALLTRIM(STR(lnIndex)) + ", 3]"
ENDCASE
ENDFOR

* Did we get enough matches?
IF (lnMatches >= lnMatchesNeededForSuccess)
lnSuccessfulIterations = lnSuccessfulIterations + 1
ENDIF
ENDFOR
WAIT CLEAR

? "Chosen tiles discarded or replaced: " + lcDiscardOrReplaceTilesOnceChosen
? "Number of lettered tiles: " + ALLTRIM(STR(lnNumberOfLetteredTiles))
? "Number of target tiles: " + ALLTRIM(STR(lnNumberOfTargetTiles))
? "Number of matches needed: " + ALLTRIM(STR(lnMatchesNeededForSuccess))
? "Trials per iteration: " + ALLTRIM(STR(lnTrialsPerIteration))
? "Total iterations: " + ALLTRIM(STR(lnIterations))
? "Successful iterations: " + ALLTRIM(STR(lnSuccessfulIterations))
? "Empirical probability: " + ALLTRIM(STR((lnSuccessfulIterations / lnIterations) * 100, 10, 4))

*************************
* ********************* *
* * FUNCTIONS * *
* ********************* *
*************************

FUNCTION GetRandomNumber(lnMin, lnMax)
LOCAL lnRandomNumber
* The pseudo-random number generator was already seeded with the system
* clock - all calls after that initialization should not pass any value
DO WHILE .T.
lnRandomNumber = (FLOOR(RAND() * 10000) % lnMax) + 1
IF (lnRandomNumber >= lnMin AND lnRandomNumber <= lnMax)
EXIT
ENDIF
ENDDO
RETURN lnRandomNumber
ENDFUNC
DNAunion is offline  
Old 03-26-2003, 09:33 PM   #2
Veteran Member
 
Join Date: Nov 2001
Location: NCSU
Posts: 5,853
Default

DNAUnion,

On first note, "in silico" is not correct Latin. "In silice" is. (I point this out to everyone I see using it!)

I'm going to have a look at Rode's paper and get back to you on the other points.
RufusAtticus is offline  
Old 03-27-2003, 09:01 AM   #3
Veteran Member
 
Join Date: Apr 2001
Location: St Louis area
Posts: 3,458
Default

Quote:
Originally posted by RufusAtticus

I'm going to have a look at Rode's paper and get back to you on the other points.
I found a copy of a PDF of the paper here.
MortalWombat is offline  
Old 03-27-2003, 11:27 AM   #4
Veteran Member
 
Join Date: Jun 2000
Posts: 1,302
Default

Why don't you submit your critique as a letter to the journal?
pangloss is offline  
Old 03-27-2003, 01:08 PM   #5
Veteran Member
 
Join Date: Mar 2002
Location: anywhere
Posts: 1,976
Default

Quote:
pangloss: Why don't you submit your critique as a letter to the journal?
Quite simply, because DNAunion's "analysis" doesn't add any useful information to the body of scientific knowledge. Did Rode do a sloppy job in the probabilistic analysis? Yes. Did DNAunion catch it? Yes. But in the end that's all, folks. Let me elaborate on why a letter should be rejected:

1) Whether it's 1e-7 or 1e-18, the actual magnitude does not matter so much as the statistical significance of the number. The test here is presumably a rejection of the null hypothesis of a uniform probability. But wait. Where is the statistical test? What's the p-value? Rode had a good reason not to provide one, since he thought his calculation of 1e-18 was small enough to beat any statistical challenge. DNAunion however claims that 1e-7 is not significant enough.
Quote:
Again, crucial here is how small or large the probability is. Unfortunately for the argument, it is nowhere near 1 in 10^18.
At this point, any good scientist would ask: So what? The burden of proof is clearly on DNAunion, since he is the one asserting that
Quote:
I have invested the time needed to look at [Rode's] claim and find it erroneous.
Indeed, at 1e-7 DNAunion easily dismisses the SIPF hypothesis as "erroneous" . . . to the detriment of his credibility. Yet, when the optimality of the standard genetic code was found to be one in a million, some of DNAunion's IDiot buddies are already saying that's significantly different from chance.

2) DNAunion, in his zeal to "correct" Rode's probability analysis, forgets to look at the overall picture. Did the evidence show a bias in dipeptide formation? Yes (or more exactly, this was not challenged). Did the evidence show a bias in dipeptide content of early organisms? Maybe (but once again, this was not challenged). Did the evidence show that SIPF produced a bias in dipeptide formation? Yes (but, this was not challenged). Was there a plausible mechanism for the SIPF bias? Yes (the CuII coordination hypothesis was especially intriguing). In light of the accomplishments published in the paper, is there sufficient reason to doubt the results on the basis of one errant probability analysis? No. Logically speaking, even if the bias seen in SIPF and in nature is not statistically significant, this alone does not rule out SIPF as a possible mechanism of generating prebiotic peptides.

3) As a matter of fact, there exist better probability studies than the one proposed by Rode. That is to say, there exist several flaws that are more significant than the ones that DNAunion picked up on, which of course still remained in DNAunion's analysis. First, certain linkages are counted twice in the analysis -- a fact which completely escaped DNAunion. For instance Ala-Ala is counted in both the A-B and B-A linkages. Why didn't DNAunion notice something this obvious, especially when he used the Ala example?

Second of all, and more importantly, the organization of the data is weak. The top four amino acids that preferentially link to a particular residue may in fact be relatively weak compared to other linkages. Yet, both DNAunion's and Rode's analyses assume that only ranking matters. What is needed is a complete ordering of prevalence from all 81 possible dipeptides in archaebacteria, compared with a complete ordering of yields for SIPF. Then, perform a statistical test for the significance of each ordering against a suitable null hypothesis. In this way, one avoids the faulty conclusion that A-B linkage preferences for one 'A' amino acid are independent of any linkage preferences for any other 'A' amino acid. This is what is tacitly assumed when one simply multiplies probabilities together (as did DNAunion in his "fixed" analysis).

4) A Monte Carlo analysis written in Visual Foxpro (!!) for a combinatorial analysis of 9 sequence elements? I think there's a more scientific way of skinning this cat.

There are other issues that I will bring up when I have time.
Principia is offline  
Old 03-27-2003, 07:21 PM   #6
Veteran Member
 
Join Date: Jan 2001
Location: USA
Posts: 1,072
Default

DNAunion: Principia, since you seem to be so good in math, I have a question for you.

I was tutoring someone tonight in college algebra and one of the problems we had was as follows:

|x^2 - 4| = x - 2

Here's what I did.

1) "Split" it into two equations to eliminate the absolute value sign:

a. x^2 - 4 = x - 2

b. x^2 - 4 = -(x - 2)


2) Recognizing x^2 - 4 as being the difference of two perfect squares, with one of the two factors being the same as the right side of the equation, I factored the left side.

a. (x + 2)(x - 2) = x - 2

b. (x + 2)(x - 2) = -(x - 2)


3) I then divided both sides of the equation by the common (x - 2).


a. [(x + 2)(x - 2)] / (x - 2) = (x - 2) / (x - 2)

b. [(x + 2)(x - 2)] / (x - 2) = -(x - 2) / (x - 2)



4) Reducing leads to:

a. x + 2 = 1

b. x + 2 = -1


5) Subtracting 2 from both sides of both equations to isolate the variable gives:

a. x = -1

b. x = -3


6) Finally, I checked my answers by plugging them back into the original equation (the one that has the absolute value in it). None of the "solutions" worked. This appears to indicate that there is no solution to the problem.


However, x = 2 is a solution.

My question is, how can I follow perfectly valid rules of algebra at every step and yet fail to come up with the solution?



PS: I am not asking how to obtain the solution: I know that. I just don't get how I can do nothing illegal yet fail to get the solution. How does one get x = 2 following the method I used?
DNAunion is offline  
Old 03-27-2003, 10:29 PM   #7
Veteran Member
 
Join Date: Nov 2001
Location: NCSU
Posts: 5,853
Default

Because when you divided you forgot to check if x-2 = 0. In other words your algebra is valid for every value of x except for when x=2.
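To make that concrete, here is a sketch of the same two cases with the common factor moved to one side and factored out rather than divided away, so the x = 2 root is not lost:

```latex
\begin{align*}
\text{(a)}\quad (x+2)(x-2) = x-2
  &\;\Longrightarrow\; (x-2)\bigl[(x+2)-1\bigr] = 0
  \;\Longrightarrow\; x = 2 \ \text{or}\ x = -1,\\
\text{(b)}\quad (x+2)(x-2) = -(x-2)
  &\;\Longrightarrow\; (x-2)\bigl[(x+2)+1\bigr] = 0
  \;\Longrightarrow\; x = 2 \ \text{or}\ x = -3.
\end{align*}
```

Checking the candidates in |x^2 - 4| = x - 2: x = 2 gives 0 = 0, while x = -1 gives 3 = -3 and x = -3 gives 5 = -5, so only x = 2 survives.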
RufusAtticus is offline  
Old 03-29-2003, 09:33 AM   #8
Veteran Member
 
Join Date: Jan 2001
Location: USA
Posts: 1,072
Default

Quote:
RufusAtticus: Because when you divided you forgot to check if x-2 = 0. In other words your algebra is valid for every value of x except for when x=2.
DNAunion: DOH! (as Homer would say)
DNAunion is offline  
Old 03-29-2003, 09:56 AM   #9
Regular Member
 
Join Date: Jun 2000
Location: St. Louis, MO
Posts: 417
Default Don't feel bad

You actually stumbled across one of my favorite math tricks...

Proof that 2 = 1

Let
1) a=b

then multiply both sides by a, giving
2) a^2 = a*b

then subtract b^2 from both sides, giving
3) a^2 - b^2 = a*b - b^2

then factor both sides, giving
4) (a+b)(a-b) = b*(a-b)

then we can cancel (a-b) from both sides, giving
5) (a+b) = b

then substitute a for b (based on #1) giving
6) a+a = a

which simplifies to
7) 2*a = a

dividing both sides by a gives:
8) 2=1


Being very brisk with the phrase "we can cancel (a-b)", I've even left fellow Math majors scratching their heads over this.
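For anyone who wants to see where the argument actually breaks, one option is to plug a number in and evaluate both sides of every step. The C++ sketch below does that; the function name is made up for this illustration, and it is only a numeric check, not a proof.

```cpp
#include <cassert>

// Returns the number (1 to 8) of the first step in the "2 = 1" argument whose
// two sides differ once a concrete value is substituted for a (and b = a).
int firstFailingStep(long long a) {
    assert(a > -1000000 && a < 1000000);  // keeps a*a well inside long long
    const long long b = a;                // step 1: a = b
    const long long lhs[8] = {a, a * a, a * a - b * b, (a + b) * (a - b),
                              a + b, a + a, 2 * a, 2};
    const long long rhs[8] = {b, a * b, a * b - b * b, b * (a - b),
                              b, a, a, 1};
    for (int step = 0; step < 8; ++step) {
        if (lhs[step] != rhs[step]) return step + 1;
    }
    return 0;  // not reached: step 8 (2 = 1) never holds
}
```

For any nonzero value, for example `firstFailingStep(3)`, the first step that fails is step 5, the one right after cancelling (a-b), which is zero. With a = 0 the failure is postponed to step 8, because step 7 then hides a second division by zero.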
Baloo is offline  
Old 03-29-2003, 12:50 PM   #10
Veteran Member
 
Join Date: Jan 2001
Location: USA
Posts: 1,072
Default

Quote:
Principia: 4) A Monte Carlo analysis written in Visual Foxpro (!!) ...
DNAunion: What, you don't consider Visual FoxPro to be a "real" programming language? Fine, give us reasons for your (flawed) personal opinion.

Perhaps it's just that you can't "speak" VFP: maybe you can only decipher a "real" programming language like C++. Well here, I took the time to recode it in that language. Now you can examine the code and point out my errors.


// calcprob.cpp
// This program models choosing lettered tiles from an urn in
// order to calculate an empirical probability

#include <iostream>
#include <cstdlib> // needed for the rand and srand functions
#include <ctime> // needed to get current time to seed rand function
using namespace std;

long GetRandomNumber(int nMin, int nMax);

int main()
{
const int nDiscardTilesOnceChosen = 1;
const int nLetteredTiles = 9;
const int nTargetTiles = 4;
const int nMatchesNeededForSuccess = 1;
const int nTrialsPerIteration = 4;
const long lIterations = 1000000;

int nTrial = 0;
int nMatches = 0;
int nIndex = 0;
int nLooper = 0;
int nFoundOne = 0;
long lIteration = 0;
long lSuccessfulIterations = 0;
char cLetter = ' ';
char cUrn[nLetteredTiles][4]; // columns 1-3 are used; column 0 is unused


// Before doing anything else, initialize the random number generator
// using the system clock
srand((unsigned)time(NULL));

// Initialize array (fill the urn with lettered tiles)
// The columns of the multidimensional array breakdown as follows:
// [1] = LETTER: a unique symbol on a tile that gets placed into the urn.
// As the program currently stands, this value is not used.
// [2] = CHOSEN: has this letter/tile already been chosen from the urn?
// If so, it may have been discarded and so not available
// any more, or it may have been replaced and available to
// be selected again. Which occurs for an already selected
// tile depends upon the value of the const variable
// nDiscardTilesOnceChosen.
// [3] = TARGET: is this letter/tile one of the targets?
for (nIndex = 0; nIndex < nLetteredTiles; nIndex++)
{
cLetter = 'A' + nIndex; // nIndex starts at 0 here, so the first tile is 'A'
cUrn[nIndex][1] = cLetter;
cUrn[nIndex][2] = 'F';
cUrn[nIndex][3] = 'F';
}

// Choose x number of tiles from the Urn to serve as targets
for (nLooper = 1; nLooper <= nTargetTiles; nLooper++)
{
nFoundOne = 0;
while (nFoundOne == 0)
{
nIndex = (int) GetRandomNumber(0, nLetteredTiles - 1);
if (cUrn[nIndex][3] == 'F')
{
cUrn[nIndex][3] = 'T';
nFoundOne = 1;
}
}
}



// Begin selecting tiles from the Urn.
for (lIteration = 1; lIteration <= lIterations; lIteration++)
{
// New iteration: need to clear all CHOSEN flags
for (nIndex = 0; nIndex < nLetteredTiles; nIndex++)
{
cUrn[nIndex][2] = 'F';
}
nMatches = 0;


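// Give the user some output throughout the process (every 9999th
// iteration, as in the FoxPro version, rather than on every pass)
if (lIteration % 9999 == 0)
{
cout << "Iteration " << lIteration << " of " << lIterations << endl;
}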

for (nTrial = 1; nTrial <= nTrialsPerIteration; nTrial++)
{

// Pull a single tile out of the urn
nFoundOne = 0;
while (nFoundOne == 0)
{
nIndex = (int) GetRandomNumber(0, nLetteredTiles - 1);
if (nDiscardTilesOnceChosen == 0)
{
// Doesn't matter if the tile has been
// chosen previously because selected
// tiles are placed back into the urn.
nFoundOne = 1;
}
else if (cUrn[nIndex][2] == 'T')
{
// This tile has already been chosen and
// discarded: it can't be selected again.
nFoundOne = 0;
}
else if (cUrn[nIndex][2] == 'F')
{
// This tile has not been chosen previously.
nFoundOne = 1;
}
}
// A tile has been chosen: flag it as such
cUrn[nIndex][2] = 'T';

// Does the chosen tile match one of the targets?
if (cUrn[nIndex][3] == 'T')
{
nMatches += 1;
}

// No need to continue pulling tiles for this iteration
// if we've obtained enough matches for success
if (nMatches >= nMatchesNeededForSuccess)
{
nTrial = nTrialsPerIteration + 1;
}
}

// Did we get enough matches for this iteration?
if (nMatches >= nMatchesNeededForSuccess)
{
lSuccessfulIterations += 1;
}
}


cout << "Tiles discarded after being chosen? ";
cout << (nDiscardTilesOnceChosen == 1?"Yes":"No") << endl;
cout << "Number of lettered tiles in Urn: ";
cout << nLetteredTiles << endl;
cout << "Number of target tiles: ";
cout << nTargetTiles << endl;
cout << "Number of matches needed: ";
cout << nMatchesNeededForSuccess << endl;
cout << "Trials per iteration: ";
cout << nTrialsPerIteration << endl;
cout << "Total iterations: ";
cout << lIterations << endl;
cout << "Successful iterations: ";
cout << lSuccessfulIterations << endl;
cout << "Empirical probability: ";
cout << ((float)lSuccessfulIterations / lIterations) * 100 << "%" << endl;

return (0);
}


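long GetRandomNumber(int nMin, int nMax)
{
// Map rand() directly onto the closed range [nMin, nMax]. Looping until
// rand() happens to fall inside the range throws away almost every value
// when the range is tiny compared with RAND_MAX. The modulo introduces a
// slight bias, which is negligible for a range this small.
return nMin + (rand() % (nMax - nMin + 1));
}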
DNAunion is offline  
 
