Freethought & Rationalism Archive

Principia · 03-29-2003, 04:10 PM

You know, on a second reading, maybe what DNAunion is really after is some pat on the back for taking the time to write some Visual FoxPro and C++ code, even though it is based on a flawed model. After all, he has taken great pains in his last post to explain his code to me, as if to suggest that I cannot read code. But assumptions seemed to be the entire basis of that post.

Anyway, by all means, let's not deprive him of that accolade:

<pat pat>

Principia · 03-29-2003, 04:34 PM

Quote:

DNAunion: It was those more involved ones that I wrote the program to handle. For example, what is the probability of getting 3 matches when there is a subset of 4 out of 9 SIPF dipeptides and a subset of 5 out of 9 prokaryotic ones? Can you show me how to do that in one simple step?

Much has been made about my not answering this "other" problem that DNAunion posed for me. So, without further ado:

P(getting at least 3 matches) =
P(getting exactly 3 or 4 matches) =
(5C3 * 4C1 + 5C4 * 4C0)/9C4 = (10 * 4 + 5 * 1)/126 = 45/126 = 5/14

Now the question is whether this demonstration is sufficient to satisfy DNAunion. Or will he just keep peppering me with academic questions because he himself doesn't know a more elegant way of doing combinatorial analysis?

NB: there is ambiguity in DNAunion's language. He asked about "getting 3 matches," whereas I calculated the probability of getting "at least 3 matches." I assume the SIPF set is the one being matched (all in keeping with Rode's paper). Anyway, let's see where this goes.
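The same calculation can be checked mechanically, and doing so also separates the two readings of the question. The following is a minimal C++ sketch, assuming the setup described above: 9 dipeptides in all, the 5 prokaryotic ones treated as the "matching" items, and a 4-member SIPF subset drawn from the 9. The function names are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Binomial coefficient C(n, k) computed with exact integer arithmetic.
std::uint64_t Choose(unsigned n, unsigned k) {
    if (k > n) return 0;
    if (k > n - k) k = n - k;  // use the smaller of k and n - k
    std::uint64_t result = 1;
    for (unsigned i = 1; i <= k; ++i) {
        // After this step result == C(n - k + i, i), so the division is exact.
        result = result * (n - k + i) / i;
    }
    return result;
}

// Hypergeometric probability of exactly `matches` hits when `drawn` items are
// chosen from `total`, of which `marked` count as matching items.
double ExactMatches(unsigned total, unsigned marked, unsigned drawn, unsigned matches) {
    assert(marked <= total && drawn <= total);
    if (matches > drawn || matches > marked || drawn - matches > total - marked) return 0.0;
    return static_cast<double>(Choose(marked, matches) * Choose(total - marked, drawn - matches)) /
           static_cast<double>(Choose(total, drawn));
}

// Probability of at least `minMatches` hits: sum the exact terms upward.
double AtLeastMatches(unsigned total, unsigned marked, unsigned drawn, unsigned minMatches) {
    double sum = 0.0;
    for (unsigned m = minMatches; m <= drawn; ++m) sum += ExactMatches(total, marked, drawn, m);
    return sum;
}
```

For these inputs, ExactMatches(9, 5, 4, 3) is 40/126 (about 0.317), which is the "exactly 3" reading. AtLeastMatches(9, 5, 4, 3) is 45/126 = 5/14 (about 0.357), which is the "at least 3" reading.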

Principia · 03-29-2003, 06:07 PM

Well, DNAunion pretty much asked for a critique of his code, so I shall oblige.

First off, a quick comment on the following:

Quote:

In addition, once the code is setup and debugged, the results are guaranteed to be correct: first time, last time, every time, no matter what combination of parameters one uses. But each time someone does such a calculation by hand, there is a chance that an error will sneak in somewhere and the result will be off. I guess Principia considers guaranteeing "accurate" results to be "unscientific".

Is that so? That's odd, because I would expect to get a different result every time I run your code. After all, you are calling rand() many times. Who cares how "accurate" your computer code is when the analytical result is exact? Of course, the irony is that, by his own statement, DNAunion can guarantee a "correct" result only "once the code is setup and debugged." Yet he demands that a calculation by hand succeed the first time, with no forgiveness for "an error [that] will sneak in somewhere." So whereas DNAunion has the luxury of debugging his computer code, people aren't given the same opportunity to fix theoretical results. Right...
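To make the contrast between a simulated and an exact answer concrete, here is a hedged C++17 sketch that estimates the same "at least 3 matches" probability by simulation. It is not DNAunion's program, which is not reproduced in the thread. It assumes the 9-item setup from the calculation above, with items 0 to 4 standing in for the 5 prokaryotic dipeptides, and it uses std::mt19937 with an explicit seed in place of rand(). With a fixed seed a run is repeatable (so is an unseeded C rand(), which always starts from the same sequence). With different seeds the estimate wanders around the exact value 5/14. The name EstimateAtLeast is hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <iterator>
#include <random>
#include <vector>

// Monte Carlo estimate of P(at least minMatches) for the 9-item example:
// items 0..4 play the role of the 5 matching dipeptides, and each trial
// draws 4 distinct items from the 9.
double EstimateAtLeast(int minMatches, long trials, unsigned seed) {
    assert(trials > 0);
    std::mt19937 gen(seed);
    const std::array<int, 9> items = {0, 1, 2, 3, 4, 5, 6, 7, 8};
    long hits = 0;
    std::vector<int> draw;
    draw.reserve(4);
    for (long t = 0; t < trials; ++t) {
        draw.clear();
        // std::sample selects 4 distinct elements, so a trial never repeats an item.
        std::sample(items.begin(), items.end(), std::back_inserter(draw), 4, gen);
        int matches = static_cast<int>(
            std::count_if(draw.begin(), draw.end(), [](int x) { return x < 5; }));
        if (matches >= minMatches) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(trials);
}
```

Calling EstimateAtLeast(3, 200000, 1) and then EstimateAtLeast(3, 200000, 2) should give two slightly different numbers, each typically within a few thousandths of 5/14 ≈ 0.357. The closed-form result has no such scatter.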

Quote:

I guess Principia considers the ability to empower others (providing them with a useful tool that allows them to perform calculations they otherwise could not) to be "unscientific".

I guess this raises the question of just how useful DNAunion's code really is. Let's start:
1) The code checks each case individually. That is to say, it only calculates P(exactly n matches). Yet all of the discussion is about P(at least n matches)... not to mention that Rode is talking about cumulative probabilities. Gee, not very useful.

2) The code gives no sense of how many significant figures to believe in the final result. In fact, it reports as many digits as the output format allows. So suppose I get the result 38.0542%: what is the actual significant result? 0.4? 0.38? 0.381? 0.3805? The only way to get this information from the program is to run it multiple times, with various values of lIterations, to get a sense of which digits are significant. (Of course, there is a theoretical answer to this matter, but why would DNAunion believe that?) In any case, this brings us to the complexity of the code.

3) In order to accommodate running multiple scenarios quickly (i.e. to have efficient code), one needs to avoid bloated code. Let me give just one example of how DNAunion's code is not efficient:

Code:

long GetRandomNumber(int nMin, int nMax)
{
	long lRandomNumber;

	lRandomNumber = rand();
	while (lRandomNumber < nMin || lRandomNumber > nMax)
	{
		lRandomNumber = rand();
	}
	return lRandomNumber;
}

If you are a programmer, open your eyes. You're not dreaming. That's right: DNAunion generates a random number between nMin and nMax with a while loop that discards every random number outside those bounds! Since each call to rand() lands in range with a probability of only about (nMax - nMin + 1)/(RAND_MAX + 1), right off the bat the code is no longer O(lIterations), but more like O(lIterations * (RAND_MAX + 1)/(nMax - nMin + 1)). I'd love to see running times and platform information.
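For comparison, drawing a number in a range does not need a caller-side rejection loop. The sketch below uses C++11 <random> (std::mt19937 with std::uniform_int_distribution). This is a swapped-in technique and does not come from the original program. It also adds a hypothetical helper that counts how many raw draws the original loop pattern consumes. RAND_MAX is implementation-defined (32767 on Microsoft's C runtime, 2147483647 with glibc), so the 15-bit raw range used below is an assumption that mimics the smaller case.

```cpp
#include <cassert>
#include <random>

// Draw a uniform integer in [nMin, nMax] directly, with no caller-side rejection loop.
long GetRandomNumberDirect(std::mt19937& gen, long nMin, long nMax) {
    assert(nMin <= nMax);
    std::uniform_int_distribution<long> dist(nMin, nMax);
    return dist(gen);
}

// Average number of raw draws the original "discard until in range" loop needs
// per accepted value, assuming raw values are uniform on [0, rawMax]
// (rawMax = 32767 mimics a 15-bit rand(); this is an assumption).
double AverageRejectionCalls(std::mt19937& gen, long nMin, long nMax, long rawMax, int draws) {
    assert(draws > 0 && 0 <= nMin && nMin <= nMax && nMax <= rawMax);
    std::uniform_int_distribution<long> raw(0, rawMax);
    long long calls = 0;
    for (int i = 0; i < draws; ++i) {
        long value;
        do {
            value = raw(gen);
            ++calls;
        } while (value < nMin || value > nMax);
    }
    return static_cast<double>(calls) / draws;
}
```

With nMin = 1, nMax = 9 and rawMax = 32767, AverageRejectionCalls should come out near 32768 / 9 ≈ 3641 raw draws per accepted number. With glibc's much larger RAND_MAX, the same loop would need on the order of 2.4e8 draws per number. GetRandomNumberDirect needs one distribution call per value.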

4) In any case, I'd like to see what other applications this is useful for. Suppose the SIPF sequences also have ties. In other words, suppose an amino acid is not listed among the archaeal dipeptides but has the same SIPF yield as another amino acid that is listed among the target archaeal dipeptides. Can this program handle that scenario? Nope, I don't believe so. Let's check another. I mentioned checking for duplicates between trials. Does the program do that? Nope. (In fact, this error "sneaked in somewhere" into DNAunion's result.) This all goes to show:

5) The model as presented by Rode is only good for a quick back-of-the-envelope calculation; elaborating it further won't produce any more significant results. See my first post for what I mean. The probabilistic analysis, though sloppy, does not by itself discredit Rode's work. Neither, for that matter, do I see how DNAunion's 1e-7 discredits Rode's work.

6) DNAunion speaks as if his code ought to be used by others. He talks about "empowering" people and "reusability" and so forth. Having evaluated the code, I can only be thankful that there are many programmers in the world, as well as mathematicians.
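Point 2 above mentions a theoretical answer to how many digits of a simulated percentage can be trusted. This sketch assumes that answer is the binomial standard error of a proportion, sqrt(p(1 - p)/N), for an estimate p from N independent trials. The function names are hypothetical. The 1,000,000-iteration figure in the usage note is also an assumed value, because the thread gives no iteration count.

```cpp
#include <cassert>
#include <cmath>

// Binomial standard error of a Monte Carlo proportion estimate p from n trials.
double StandardError(double p, double n) {
    assert(p >= 0.0 && p <= 1.0 && n > 0.0);
    return std::sqrt(p * (1.0 - p) / n);
}

// Number of trials needed for the standard error to be at most targetSe (rounded up).
double TrialsForStandardError(double p, double targetSe) {
    assert(targetSe > 0.0);
    return std::ceil(p * (1.0 - p) / (targetSe * targetSe));
}
```

For example, StandardError(0.380542, 1e6) is about 0.00049. A reported 38.0542% is therefore really 0.381 ± 0.001 at roughly two standard errors, so only the first three decimal places carry information. TrialsForStandardError(0.38, 1e-4) shows that bringing the standard error down to 0.0001 would take about 24 million trials.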

Art · 03-30-2003, 07:12 AM

Hi DNAunion (and others),

One minor quibble about the opening post. DNAunion remarked (among other things):

Quote:

So to begin with, Rode's comparison is incomplete. What if other amino acids occur more frequently in the "primitive" dipeptides than the nine he looked at? Apparently they are ignored, and the nine of interest are bumped up,...

I think the nine amino acids were chosen because these were the ones for which salt-induced peptide bond formation data were available. I'd agree with the insinuations that comparing the SIPF results with the occurrence of dipeptides in extant genomes doesn't make much sense, if for no other reason than that the SIPF studies do not deal with the same set of amino acids.

Having scanned the review by Rode but once, I am spurred to wonder just how many mechanisms for generating nonrandom dipeptide frequencies can be thought of (or have been established). This consideration has a significant bearing on the comparison.

Principia · 03-30-2003, 08:41 AM

Quote:

Originally posted by Principia
RA,

I am guessing for the moment that it means approximately equal distributions. What I'd really like to know are the numbers besides the SIPF ones that Rode used (e.g. the archaeal dipeptide distributions).

Let me see if I can track them down.

Well, I found the numbers. They come from the original paper that studied dipeptide distributions:

Quote:

Rode BM, Eder AH, Yongyai Y. Amino acid sequence preferences of the salt-induced peptide formation reaction in comparison to archaic cell protein composition. Inorg Chim Acta 1997;254:309.

A relevant comment from the discussion in the paper:

Quote:

A direct quantitative correlation of the values by statistical means did not appear very promising, at least with linear statistical methods. Too many other factors would have influenced a relation (if existing) between the probability of peptide linkage formation in chemical evolution and the probability to recognize these preferences in the biological proteins. First of all, the question of the relative availability of amino acids formed from simple inorganic matter by atmospheric and surface processes is not solved, but would have had its impact on the actual opportunity to form small peptide units. It can, therefore, only be assumed that the simpler amino acids should have had a greater chance to be formed in such processes. Further, despite some encouraging recent results [20, 21] we do not know many details about possible mechanisms by which chain elongation of small oligopeptides to larger subunits could have taken place, and such a mechanism would certainly also influence the preference of peptide links.

Nevertheless, the prevailing dipeptide units formed under the conditions of the primitive earth should be reflected to some extent in the composition of biomatter, especially if it was mainly a single type of reaction that had created them. Thus, if the salt-induced peptide formation were this particular reaction, it should be possible to find at least some similarities between the reaction-inherent sequence preferences and those found in primitive organisms, where the mechanisms of protein biosynthesis should not have been the determining factor for them. Such a coincidence would certainly also support the assumption that biological protein synthesis via RNA was developed following an already existing, very basic framework of peptide matrices, which was reproduced (with certain modifications) by the new, more efficient synthesis mechanism.

Needless to say, the authors seemed a little more careful in their initial presentation of the data than in the review article cited in the OP. I agree with them to the extent that their data isn't really amenable to simple correlation studies, which explains why they resorted to their kludgy rank-ordered statistic (which turned out to be in error as well). But, as Art points out above, do we really have a competing hypothesis? Absent one, I think there are some interesting coincidences presented by the authors that won't go away regardless of the statistics.

PS: Some interesting prebiotic peptide synthesis articles from other groups:

Quote:

N-carbamoyl amino acid solid-gas nitrosation by NO/NOx: A new route to oligopeptides via alpha-amino acid N-carboxyanhydride. Prebiotic implications
Taillades J, Collet H, Garrel L, Beuzelin I, Boiteau L, Choukroun H, Commeyras A
JOURNAL OF MOLECULAR EVOLUTION 48 (6): 638-645 JUN 1999
Abstract: alpha-N-Carbamoyl amino acid (CAA), whose conditions of formation in a prebiotic hydrosphere have been described previously (Taillades et al. 1998), could have been an important intermediate in prebiotic peptide synthesis through reaction with atmospheric NOx. Nitrosation of solid CAA (glycine or valine derivative) by a 4/1 NO/O2 gaseous mixture (1 atm) yields N-carboxyanhydride (NCA) quantitatively in less than 1 h at room temperature. The crude solid NCA undergoes quantitative oligomerization (from trimer to nonamer under the conditions we used) when treated with a (bi)carbonate aqueous buffer at pH 9. We therefore suggest that part of the prebiotic amino acid activation/polymerization process may have taken place in a dry phase ("drying-lagoon" scenario).

Quote:

Peptides by activation of amino acids with CO on (Ni,Fe)S surfaces: Implications for the origin of life
Huber C, Wachtershauser G
SCIENCE 281 (5377): 670-672 JUL 31 1998
Abstract: In experiments modeling volcanic or hydrothermal settings amino acids were converted into their peptides by use of coprecipitated (Ni,Fe)S and CO in conjunction with H2S (or CH3SH) as a catalyst and condensation agent at 100 degrees C and pH 7 to 10 under anaerobic, aqueous conditions. These results demonstrate that amino acids can be activated under geochemically relevant conditions. They support a thermophilic origin of Life and an early appearance of peptides in the evolution of a primordial metabolism.

Quote:

Prebiotic oligomerization on or inside lipid vesicles in hydrothermal environments
Tsukahara H, Imai EI, Honda H, Hatori K, Matsuno K
ORIGINS OF LIFE AND EVOLUTION OF THE BIOSPHERE 32 (1): 13-21 FEB 2002
Abstract: Oligomerization of amino acids proceeded on or inside lipid vesicles as a model of prebiotic cells in a simulated hydrothermal environment. When the suspension of lipid vesicles taking up monomeric glycine underwent a sudden temperature drop by traversing from a hot (180 °C) to a cold (0 °C) region repeatedly while circulating through a closed reaction circuit, oligopeptides up to heptaglycine were formed even in the absence of condensing agents.

Quote:

Peptide bond formation in gas-phase ion/molecule reactions of amino acids: a novel proposal for the synthesis of prebiotic oligopeptides
Wincel H, Fokkens RH, Nibbering NMM
RAPID COMMUNICATIONS IN MASS SPECTROMETRY 14 (3): 135-140 2000
Abstract: There is a general fascination with regard to the origin of life on Earth. There is an intriguing possibility that prebiotic precursors of life occurred in the interstellar space and were then transported to the early Earth by comets, asteroids and meteorites. It is probable that some part of the prebiotic molecules may have been generated by gas-phase ion/molecule reactions. Here we show experimentally that gaseous ion/molecule reactions of the amino acids, Glu and Met, may promote the synthesis of protonated dipeptides such as (Glu-Glu)H+ and (Glu-Met)H+ and their chemical growth to larger protonated peptides.

DNAunion · 03-30-2003, 10:01 AM

Quote:

Principia: 3) As a matter of fact, there exist better probability studies than the one proposed by Rode. That is to say, there exist several flaws more significant than the ones that DNAunion picked up on, and these of course still remained in DNAunion's analysis. First, certain linkages are counted twice in the analysis -- a fact which completely escaped DNAunion.

DNAunion: That I didn't mention something does not mean it completely escaped me. More on this just below.

Quote:

Principia: For instance, Ala-Ala is counted in both the A-B and B-A linkages. Why didn't DNAunion notice something this obvious, especially when he used the Ala example?

DNAunion: I did notice this and pondered it. I concluded that it is not incorrect for Ala-Ala to be counted in both the A-B and B-A linkages.

The first point is that proteins, and even dipeptides, have an intrinsic directionality: one end is the C-terminus and the other the N-terminus. By convention, the numbering of amino acid residues (and thus the order of the amino acids) starts at the N-terminus and ends at the C-terminus. Thus, A-B and B-A linkages are different. Here's an explanatory example.

Looking at the Archaebacteria entries for Ala we can construct the following, where we are looking at Ala being the third amino acid in a chain of five (with x denoting any amino acid).

A-B
N-x-x-Ala-Ala-x-C
N-x-x-Ala-Asp-x-C
N-x-x-Ala-Glu-x-C
N-x-x-Ala-Gly-x-C
N-x-x-Ala-Leu-x-C

B-A
N-x-Ala-Ala-x-x-C
N-x-Glu-Ala-x-x-C
N-x-Val-Ala-x-x-C
N-x-Leu-Ala-x-x-C

So we can see that the A-B entries are looking at a given amino acid (here Ala) and showing which amino acids are most likely to FOLLOW it in the examined proteins from archaebacteria, whereas the B-A entries show which amino acids are most likely to PRECEDE it.

The fact that Ala is most likely to FOLLOW Ala and that it is most likely to PRECEDE Ala are two separate facts that just happen to coincide.
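The distinction between the two tables can be illustrated with a small C++ sketch. For a chosen residue, it counts which residues follow it (the A-B direction) and which precede it (the B-A direction) in a sequence written from N-terminus to C-terminus. The toy sequence AADAG (A = Ala, D = Asp, G = Gly) is invented for illustration and is not taken from Rode's tables. The names are hypothetical. A single Ala-Ala occurrence increments both tables, which is exactly the double counting that Principia objected to above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Neighbour counts for one target residue in a sequence written N-terminus to
// C-terminus (one-letter codes).
// followers[Y]    = occurrences of target-Y (the "A-B" direction)
// predecessors[Y] = occurrences of Y-target (the "B-A" direction)
struct NeighbourCounts {
    std::map<char, int> followers;
    std::map<char, int> predecessors;
};

NeighbourCounts CountNeighbours(const std::string& seq, char target) {
    NeighbourCounts counts;
    for (std::size_t i = 0; i + 1 < seq.size(); ++i) {
        char left = seq[i];
        char right = seq[i + 1];
        // For a target-target pair both branches fire on the same occurrence.
        if (left == target) ++counts.followers[right];
        if (right == target) ++counts.predecessors[left];
    }
    return counts;
}
```

For AADAG with target 'A', the followers of Ala are {Ala: 1, Asp: 1, Gly: 1} and the predecessors are {Ala: 1, Asp: 1}. Gly follows Ala but never precedes it, while the one Ala-Ala pair appears in both tables.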

To further show that A-B and B-A are different facts, note that the entries for archaebacteria for Ala are not symmetric.

1) The A-B amino acids listed do not include Val, whereas the B-A list does.

2) The B-A amino acids listed do not include Gly, whereas the A-B list does.

3) The B-A amino acids listed do not include Asp, whereas the A-B list does.

So Ala-Ala needed to be included in both the A-B and B-A linkages.

What Principia may mean (his statement is a bit too vague for me to be absolutely sure what he is saying) is that the total number of Ala-Ala occurrences should be "split" between A-B and B-A because they are two separate facts, and that this would throw Ala further down in the rankings. If that is what he means, then he should have stated so... clearly.

But more importantly, it wouldn't make a difference for Ala-Ala. Even if Ala-Ala values are halved in the two tables (tables 6 and 7), Ala would still be one of the top 4 for A-B and for B-A, so the number of coincidences would not change: no amino acids would drop out of the top 4 and none would be added.

DNAunion · 03-30-2003, 10:19 AM

Quote:

Principia: DNAunion however claims that 1e-7 is not significant enough.

***********************************
Again, crucial here is how small or large the probability is. Unfortunately for the argument, it is nowhere near 1 in 10^18.
***********************************

At this point, any good scientist would ask: So what? The burden of proof is clearly on DNAunion, since he is the one asserting that

***********************************
I have invested the time needed to look at [Rode's] claim and find it erroneous.
***********************************

Indeed, at 1e-7 DNAunion easily dismisses the SIPF hypothesis as "erroneous" . . . to the detriment of his credibility.

DNAunion: Quoting out of context and your typical caustic deriding. Good job Principia!

Let's look at my statement IN CONTEXT, shall we?

Quote:

DNAunion: If he was correct, then the probability that the SIPF would preferentially produce the same dipeptides found in "primitive" proteins, by chance alone, was 1 in 10^18. Who would argue against those odds by claiming that the SIPF reaction was not involved? I have invested the time needed to look at his claim and find it erroneous. Before explaining in detail why his calculations are wrong, I will present his case from the article.

DNAunion: Note how in full context my statement "his claim is erroneous" is tied to both "his calculations are wrong" and to "If he was correct, then the probability that the SIPF would preferentially produce the same dipeptides found in 'primitive' proteins, by chance, was 1 in 10^18."

What I found erroneous was his claim that the match between the SIPF dipeptides and the "primitive" proteins had a probability of occurring by chance of only 1 in 10^18. And I was correct: his claim was erroneous, and even Principia noted this:

Quote:

Principia: Did Rode do a sloppy job in the probabilistic analysis? Yes. Did DNAunion catch it? Yes.

DNAunion · 03-30-2003, 10:34 AM

Quote:

Principia: 1) Whether it's 1e-7 or 1e-18, the actual magnitude does not matter so much as the statistical significance of the number. ... DNAunion, however, claims that 1e-7 is not significant enough.

DNAunion: No, that's a misrepresentation of what I said. Take another look at my statements about the statistical significance of the probabilities.

Quote:

DNAunion: A key point here is that the smaller the probability of the correspondence between SIPF and primitive proteins is, the more likely is the possibility that the SIPF was involved in the creation of those primitive proteins. For example, if the match between the SIPF dipeptides and those in primitive proteins was only, say, 1 in 5, then there would be little to no statistical significance to the match: chance alone would be an adequate explanation. But with a probability of only 1 in 10^18, chance alone can hardly be relied upon as being the best explanation: there must be some connection between the two.

DNAunion: I said that a 1 in 5 probability (0.2) would have no (or little) statistical significance. I didn't say a probability of 1 in 10^7 wouldn't.

My only other statement about statistical significance was as follows:

Quote:

DNAunion: For years now I just could not get over the statistical significance of the correlation between the dipeptides formed by the SIPF [salt-induced peptide formation] reaction and those found in "primitive" proteins as reported by Bernd Michael Rode in his 1999 article "Peptides and the Origin of Life". If he was correct, then the probability that the SIPF would preferentially produce the same dipeptides found in "primitive" proteins, by chance alone, was 1 in 10^18. Who would argue against those odds by claiming that the SIPF reaction was not involved?

DNAunion: So concerning the statistical significance of the probability of matches between the SIPF and archaebacterial proteins, I mentioned only a lower and an upper bound: 1 in 5 is too large a probability to reject chance, and 1 in 10^18 is so small that few if any would argue that it was chance alone.

Now, last time I checked, 1 in 10^7 falls somewhere in between 1 in 5 and 1 in 10^18. Therefore, it does not meet or exceed either of the probabilities to which I attached any statistical significance.

PS: As I will point out in my next post, the probability is likely much larger than 1 in 10^7 because of a problem with Rode's calculation that I pointed out but could not eliminate from my calculations due to Rode's incomplete information.

Principia · 03-30-2003, 10:39 AM

Quote:

To further show that A-B and B-A are different facts, note that the entries for archaebacteria for Ala are not symmetric.

1) The A-B amino acids listed do not include Val, whereas the B-A list does.

2) The B-A amino acids listed do not include Gly, whereas the A-B list does.

3) The B-A amino acids listed do not include Asp, whereas the A-B list does.

So Ala-Ala needed to be included in both the A-B and B-A linkages

False. The experimental protocol that Rode followed (had you bothered to research it carefully) did not distinguish between the two residues of a homodimer. Try again.

EDIT: Note, DNAunion is saying here that Ala joining Ala via the N-terminus of the second Ala is a different process than Ala joining Ala via the C-terminus of the second Ala.

Principia · 03-30-2003, 10:43 AM

Quote:

DNAunion: So concerning the statistical significance of the probability of matches between the SIPF and archaebacterial proteins, I mentioned only a lower and an upper bound: 1 in 5 is too large a probability to reject chance, and 1 in 10^18 is so small that few if any would argue that it was chance alone.

Really? What happened to Dembski's UPB of 1e-150? I thought that was absolutely required to eliminate chance hypotheses?

Quote:

Now, last time I checked, 1 in 10^7 falls somewhere in between 1 in 5 and 1 in 10^18. Therefore, it does not meet or exceed either of the probabilities to which I attached any statistical significance.

And what is the threshold that you require to exceed? Better yet, how does one objectively draw that threshold?


The archives are read only.