Freethought & Rationalism Archive

DNAunion · 03-30-2003, 10:46 AM

Quote:

Principia: I am still waiting to hear why 1e-7 is such a bad value so as to invalidate Rode's conclusions.

DNAunion: Who are you waiting for? I didn't say a probability of 1 in 10^7 invalidated Rode's conclusions. It does, however, invalidate his calculation: 1 in 10^18.

Furthermore, the probability of matching being due to chance is probably much larger than 10^-7. Remember what I said in my first post?

Quote:

DNAunion: So to begin with, Rode's comparison is incomplete. What if other amino acids occur more frequently in the "primitive" dipeptides than the nine he looked at? Apparently they are ignored, and the nine of interest are bumped up, possibly moving from outside of the top four to being within it. If so, they would be counted as hits even though they were actually too far down in line originally to be counted as such. For the rest of the discussion, this potential flaw will be overlooked.

DNAunion: It was ignored because Rode did not provide the information needed to "unignore" it.

However, since the rule is that the number of amino acids found in biological proteins is 20, and Rode took only 9 into consideration, it is very likely that looking at more than twice the number of amino acids that Rode did would result in many of the amino acids in the top 4 being moved out of the top 4 because some of the other 11 would be inserted. Thus, the number of matches would decrease, and the probability of the degree of combined matches would become larger.

Off the top of my head - if Principia wants to show us the calculation, he is more than welcome to - I will assume (yes, assume) that doubling the number of amino acids would cut in half the number of matches and raise the probability to something like 1 in 10^4.

Principia · 03-30-2003, 10:48 AM

Quote:

Off the top of my head - if Principia wants to show us the calculation, he is more than welcome to - I will assume (yes, assume) that doubling the number of amino acids would cut in half the number of matches and raise the probability to something like 1 in 10^4.

Off the top of his head? Gee, and just earlier he told us that doing calculations without a computer program allowed errors "to sneak in somewhere." Nope, I guess we can't trust DNAunion's judgment here, if he can't trust himself.

Go ahead and assume all you want on the basis of negative evidence. Isn't that what your IDiot buddies do all the time?

DNAunion · 03-30-2003, 10:57 AM

Quote:

Principia: I guess this begs the question of just how useful DNAunion's code really is. Let's start:

1) The code checks on an individual basis each case. That is to say, it only calculates P(n exact matches). Yet, all of the discussion is around P(at least n matches)...

DNAunion: Wrong. My code calculates "at least x matches", not "exactly x matches". Here, let me point out the lines of code that show this.

From Visual FoxPro (the original program)

Code:

* No need to continue pulling tiles if we have
* enough matches already
IF (lnMatches >= lnMatchesNeededForSuccess)
	EXIT
ENDIF

And from the quickly thrown together C++ translation of the VFP program

Code:

// No need to continue pulling tiles for this iteration 
// if we've obtained enough matches for success
if (nMatches >= nMatchesNeededForSuccess)
{
	nTrial = nTrialsPerIteration + 1;
}

What those lines of code do is stop pulling tiles from the urn once enough matches have been found. Therefore, if it were allowed to continue, we could have gotten either matches or misses. Thus, the calculation is for AT LEAST x MATCHES.
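As a rough sketch of this argument (not the posted program; the helper names below are made up for illustration), one can enumerate every possible match/miss sequence and check that "stopped early with enough matches" succeeds in exactly the cases where the full sequence contains at least that many matches:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the posted program): pull tiles in order and
// stop as soon as `needed` matches have been seen, like the EXIT above.
bool succeedsWithEarlyExit(const std::vector<bool>& pulls, int needed) {
    assert(needed >= 0);
    int matches = 0;
    for (bool isMatch : pulls) {
        if (matches >= needed) break;  // enough matches: stop pulling
        if (isMatch) ++matches;
    }
    return matches >= needed;
}

// Counts matches over every pull, with no early exit.
int countAllMatches(const std::vector<bool>& pulls) {
    int matches = 0;
    for (bool isMatch : pulls) {
        if (isMatch) ++matches;
    }
    return matches;
}

// Checks every match/miss sequence of length nPulls: early-exit success
// must agree with "at least `needed` matches in the full sequence".
bool earlyExitMeansAtLeast(int nPulls, int needed) {
    assert(nPulls >= 0 && nPulls < 20);
    for (unsigned mask = 0; mask < (1u << nPulls); ++mask) {
        std::vector<bool> pulls(nPulls);
        for (int i = 0; i < nPulls; ++i) {
            pulls[i] = ((mask >> i) & 1u) != 0;
        }
        bool atLeast = countAllMatches(pulls) >= needed;
        if (succeedsWithEarlyExit(pulls, needed) != atLeast) return false;
    }
    return true;
}
```

A sequence of three matches with two needed is counted as a success by the early-exit version even though it contains three matches, not exactly two, which is the distinction being argued here.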

DNAunion · 03-30-2003, 11:00 AM

Quote:

Principia: ... not to mention, Rode is talking about cumulative probabilities. Gee, not very useful.

DNAunion: No, very useful.

To show me wrong, show us how you would do all the calculations for the entire table in one shot.

Principia · 03-30-2003, 11:05 AM

Quote:

To show me wrong, show us how you would do all the calculations for the entire table in one shot.

Oh, I have shown you to be wrong plenty on this thread. But, let's count the number of trivial requests you have made of me this thread:
1) to solve the high school algebra problem
2) to solve the one case of Rode's problem that turned out to be 5/14
3) to demonstrate the uselessness and inefficiency of your code
I might be missing a couple, but I have replied to all of these requests. It is quite clear, however, that DNAunion is going to keep pestering me to jump through his hoops. I am sorry, but I have better things to do; and I have been more than generous thus far.

Principia · 03-30-2003, 11:11 AM

More to the point, I contend that DNAunion's approach to combinatorial analysis is obviously inefficient. I have challenged him to tell us in detail how to interpret the significant digits of his code on one run alone. But, instead, he is giving us lectures on the differences between C and C++.

I have challenged him to show us how long it takes to run the algorithm for the entire table. But instead he is giving me lectures on how to read his spaghetti code. I think the pattern is clear.

DNAunion · 03-30-2003, 11:19 AM

Quote:

Principia: 3) In order to accomodate running multiple scenarios quickly (i.e. to have efficient code), one needs to minimize bloated code. Let me give just one example of how DNAunion's code is not efficient:

Code:

long GetRandomNumber(int nMin, int nMax)
{
	long lRandomNumber;

	lRandomNumber = rand();
	while (lRandomNumber < nMin || lRandomNumber > nMax)
	{
		lRandomNumber = rand();
	}
	return lRandomNumber;
}

If you are a programmer, open your eyes. You're not dreaming. That's right. DNAunion generates a random number between nMin and nMax by doing a while loop to discard all random numbers outside of these bounds!

DNAunion: Too bad you can't read Visual FoxPro code. If you could, you would see that in the original program - not the quickly thrown together C++ translation I made while also posting at two discussion forums on the web - I performed this task more efficiently.

Code:

FUNCTION GetRandomNumber(lnMin, lnMax)
	LOCAL lnRandomNumber
	* The pseudo-random number generator was already seeded with the system
	* clock - all calls after that initialization should not pass any value
	DO WHILE .T.
		lnRandomNumber = (FLOOR(RAND() * 1000) % lnMax) + 1
		IF (lnRandomNumber >= lnMin AND lnRandomNumber <= lnMax)
			EXIT
		ENDIF
	ENDDO
	RETURN lnRandomNumber
ENDFUNC

The modulus operation guarantees a result between 0 and lnMax - 1. Adding 1 then gives a number that lies between 1 and lnMax, inclusive. This should work on the very first iteration - the DO WHILE...ENDDO loop construct is used only to ensure that, if I overlooked anything, an invalid result would not be returned.
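The mapping can be sketched in C++ as follows. This is an illustration only, not the posted program: mapToRange and countOutcomes are hypothetical names, and the sketch ASSUMES that FLOOR(RAND() * 1000) yields each integer 0 through 999, which depends on RAND() never returning exactly 1.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for (FLOOR(RAND() * 1000) % lnMax) + 1, where `raw`
// plays the role of FLOOR(RAND() * 1000).
int mapToRange(int raw, int maxValue) {
    assert(raw >= 0 && maxValue >= 1);
    return (raw % maxValue) + 1;  // always lands in 1..maxValue
}

// Counts how many of the raw values 0..999 (ASSUMED range) map to each
// result 1..maxValue. Index 0 of the returned vector is unused.
std::vector<int> countOutcomes(int maxValue) {
    std::vector<int> counts(maxValue + 1, 0);
    for (int raw = 0; raw < 1000; ++raw) {
        ++counts[mapToRange(raw, maxValue)];
    }
    return counts;
}
```

For example, countOutcomes(4) gives 250 raw values per result, while countOutcomes(9) gives 112 for result 1 and 111 for each of the others, because 9 does not divide 1000 evenly. Every result is at least 1, so the range check in the loop can only reject a value when lnMin is greater than 1.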

When making the quick translation to C++ (as I said above, during which time I continued to post on the web at two discussion forums) I opted for the quickest method of getting the program up and running. Note also what I said:

Quote:

DNAunion: In addition, once the code is setup and debugged,...

DNAunion: As this suggests, the C++ program is not completed. It performs the function it was designed for, but it is not ready for handing over to others (for example, instead of defining the parameters as constant variables it should allow the user to pass values into the program from a command prompt).

Also, I didn't ask you to critique the style of the program, I asked you to find errors. You haven't.

Finally, I would point out that your comments about my writing a program to do the calculations related directly to the Visual FoxPro program, not the later C++ translation of it.

PS: You might want to learn how to spell accommodate.

Principia · 03-30-2003, 11:24 AM

Quote:

But more importantly, it wouldn't make a difference for Ala-Ala. Even if Ala-Ala values are halved in the two tables (table 6 and table 7), Ala would still be one of the top 4 for A-B and for B-A, so the number of coincidences would not change: no amino acids would drop out of the top 4 and none would be added.

I'd like to revisit this matter briefly. Remember that both Rode and DNAunion multiplied probabilities from each amino acid for both A-B and B-A linkages. Now, mathematically, to permit such a calculation, one assumes independence. In other words: p(A|B) = p(A). In this case A and B represent the event of coincidences for a particular amino acid. To see how this is false, let's merely take a look at the case of Ala-Ala homodipeptides:

p(Ala-Ala in SIPF and Archae for B-A linkages | Ala-Ala in SIPF and Archae for A-B linkages) = 1

since if it happens in A-B linkages for a dimer, it must happen in B-A linkages as well. Remember that Rode does not distinguish (e.g. by labeling) the A or B residues. His table 8 has homodipeptides on one diagonal, not two. So, in any event, there is only one number to work with.
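Spelled out as a worked equation (a sketch in notation introduced here for illustration: $A$ is the coincidence event for the A-B linkages of one amino acid and $B$ is the coincidence event for its B-A linkages):

```latex
\begin{align}
P(A \cap B) &= P(B \mid A)\, P(A)
  && \text{(always true)} \\
P(A \cap B) &= P(B)\, P(A)
  && \text{(only if } P(B \mid A) = P(B)\text{, i.e. independence)} \\
P(A \cap B) &= 1 \cdot P(A) = P(A)
  && \text{(Ala-Ala case, where } P(B \mid A) = 1\text{)}
\end{align}
```

Multiplying the two probabilities for the homodipeptide therefore gives $P(A)\,P(B)$, which is smaller than the true joint probability $P(A)$ whenever $P(B) < 1$.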

I have no idea what DNAunion is talking about with respect to "splitting" A-B and B-A values. I never advocated such a process. If anything, I suggested that double-counting indicates strongly that Rode should have used another statistical model.

DNAunion · 03-30-2003, 11:29 AM

Quote:

Principia: Oh, I have shown you to be wrong plenty on this thread.

DNAunion: And I you. Learn to read computer code yet LOL!!!!!!!

Quote:

Principia: But, let's count the number of trivial requests you have made of me this thread:

1) to solve the high school algebra problem

DNAunion: Which you did in a caustic, acrimonious style - twice - instead of like the other two respondents who acted like mature, friendly adults.

As soon as RufusAtticus pointed out my oversight - which, unlike you, Principia, he did in a manner consistent with the rules of this board - I slapped myself in the head for not having seen it sitting right there in front of me.

Principia · 03-30-2003, 11:33 AM

Quote:

Also, I didn't ask you to critique the style of the program, I asked you to find errors. You haven't.

You want an error in your code? Fine:

Code:

FLOOR(RAND() * 1000)

results in a biased distribution. More to the point, if RAND() generates a random number between 0 and 1, then RAND()*1000 will generate a number between 0 and 1000. FLOORing it will generate a number between 0 and 999 since it truncates. So you have biased your uniform distribution towards 0 and established a different uniform distribution (between 0 and 999 and never 1000) than you had expected. Satisfied?

PS: Let me give a clear demonstration of this. Suppose I just take FLOOR(RAND()) (where I assume RAND() generates floats between 0 and 1), then that will return 0 always. Clearly, not a very uniform distribution between 0 and 1 (where I'd expect 50% of each).
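The ranges in question can be checked with a small C++ sketch. It is an illustration under a stated assumption, not the Visual FoxPro code: scaleAndFloor is a hypothetical name, and the argument u stands in for RAND() and is ASSUMED to satisfy 0 <= u < 1.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for FLOOR(RAND() * scale). `u` plays the role of
// RAND() and is ASSUMED to lie in the half-open interval [0, 1).
int scaleAndFloor(double u, int scale) {
    assert(u >= 0.0 && u < 1.0 && scale >= 1);
    return static_cast<int>(std::floor(u * scale));
}
```

With scale = 1 the result is 0 for every allowed u, which is the FLOOR(RAND()) case above. With scale = 2 the result is 0 for u below 0.5 and 1 otherwise, and with scale = 1000 the possible results are 0 through 999, each covering an interval of u of width 1/1000 under the stated assumption.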
