FRDB Archives

Freethought & Rationalism Archive

The archives are read only.


FRDB Archives > Archives > IIDB ARCHIVE: 200X-2003, PD 2007 > IIDB Philosophical Forums (PRIOR TO JUN-2003)

Old 04-01-2003, 04:31 PM   #81
Veteran Member
 
Join Date: Jan 2001
Location: USA
Posts: 1,072
Default

DNAunion: I’ll try to deal with only one point at a time.

Quote:
DNAunion: ... the ARCHAEBACTERIAL Ala-Ala has to be included in both A-B and B-A sides of the table, and that is correct.

Now, if one is going to compare the archaebacterial pairs with the same pairs from the SIPF reaction to see how matches occur, then that person has to compare archaebacteria A-B Ala-Ala to SIPF A-B Ala-Ala, and archaebacterial B-A Ala-Ala to SIPF B-A Ala-Ala. Since the dipeptide Ala-Ala is the same for A-B and B-A which is NOT the case for the archaebacterial proteins...
Quote:
Principia: And this premise is patently false. Look, there are many ways of saying this, but it seems like I have but tried them all. Perhaps a visual demonstration is in order. Suppose I give DNAunion the following sequence: D-E-Ala-Ala-R-Q in an archaebacterium, how does one tell whether or not the Ala-Ala is an A-B linkage or a B-A linkage a posteriori?
DNAunion: That sequence would count as several things in regards to Ala linkages.

A-B linkages:
...Ala = 1
...R = 1

B-A linkages:
...E = 1
...Ala = 1

In the protein, that is not double counting Ala-Ala: it should be counted both ways, as A-B and B-A. Why?

The simplest explanation is that in the sequence Ala both precedes Ala and follows Ala. So it is both A-B and B-A.

A second way of looking at it.

Using just that one sequence (the underlying logic wouldn't change if we had more), we see that Ala is just as likely to follow Ala as R is. If we drop that Ala-Ala pair, then we’ve lost information: we would then be led to believe that only R follows Ala in that sequence, and that is incorrect. Similarly, the sequence shows us that Ala is just as likely to precede Ala as E is. Again, if we drop that Ala-Ala pair, we would be losing information.
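The tally above can be reproduced mechanically. What follows is a minimal Python sketch, not something posted in the thread; the function name `neighbor_counts` and the list representation of the sequence are assumptions made for illustration.

```python
from collections import Counter

def neighbor_counts(sequence, residue="Ala"):
    """Count which residues follow and which precede `residue` in an ordered sequence."""
    followers = Counter()     # A-B linkages: residue is first, partner is second
    predecessors = Counter()  # B-A linkages: partner is first, residue is second
    for left, right in zip(sequence, sequence[1:]):
        if left == residue:
            followers[right] += 1
        if right == residue:
            predecessors[left] += 1
    return followers, predecessors

# The example sequence from the post, read N-terminus to C-terminus.
seq = ["D", "E", "Ala", "Ala", "R", "Q"]
followers, predecessors = neighbor_counts(seq)
print("A-B (what follows Ala):", dict(followers))
print("B-A (what precedes Ala):", dict(predecessors))
```

In this sketch the single Ala-Ala pair is seen once from each side, so it lands in both tallies, which is the counting described above.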

*********************************

PS: The following is for any readers who are not familiar with the fact that the order of amino acids in a protein is set and cannot be reversed: if someone gives you Ala-Gly, for example, you can’t reverse it and say that it is Gly-Ala. In the sequence Ala-Gly, Gly follows Ala - you can't turn it around to where Gly precedes Ala.

Proteins have an intrinsic directionality due to their having different functional groups at the two ends. The order of amino acids in a protein is determined by which end is the C-terminus and which end is the N-terminus. One starts counting at the N-terminus with that amino acid being numbered 1; the next amino acid is numbered 2; and so on down the line until reaching amino acid number n, at the C-terminus. When a protein’s aa-sequence is listed, it is given, by convention, this way.

Thus, the dipeptides N-Ala-Glu-C and C-Ala-Glu-N are different sequences, even though they both contain only Ala and Glu bonded together and are written in the same order in both dipeptides. The second one is not listed according to the convention; it is actually N-Glu-Ala-C, and that is how sources would list it.
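The convention can be illustrated with a small Python sketch. The helper `to_n_to_c` and its `starts_at` argument are hypothetical names introduced only for this example.

```python
def to_n_to_c(residues, starts_at):
    """Return the residues in conventional N-terminus to C-terminus order.

    `starts_at` says which terminus the given listing begins with: "N" or "C".
    """
    if starts_at == "N":
        return list(residues)
    if starts_at == "C":
        return list(reversed(residues))
    raise ValueError("starts_at must be 'N' or 'C'")

first = to_n_to_c(["Ala", "Glu"], "N")   # N-Ala-Glu-C, already conventional
second = to_n_to_c(["Ala", "Glu"], "C")  # C-Ala-Glu-N, must be turned around
print(first, second, first == second)
```

Once both listings are put in conventional order they no longer look alike, which is the sense in which the two dipeptides are different sequences.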
DNAunion is offline  
Old 04-01-2003, 04:57 PM   #82
Veteran Member
 
Join Date: Jan 2001
Location: USA
Posts: 1,072
Default

Quote:
DNAunion: I didn’t say a probability of 1 in 10^7 invalidated Rode’s conclusions.
Quote:
Principia: Hmm... not a very decisive kill at all, when DNAunion can't even say that 1e-7 invalidated Rode's conclusions.
DNAunion: It is a kill…of exactly what I said I killed.

Quote:
DNAunion: My point was to show that Rode’s calculation was way off and I did so (as Principia pointed out).

So what's the big deal about my following in Rode’s footsteps and using 1? Nothing really. With or without this Ala-Ala problem, I showed that Rode’s calculation is multiple orders of magnitude off. I killed his calculation, and I needed to do so only once, in only one way. And that I did (as Principia pointed out).
DNAunion: You seem to be under the impression that I claimed to have killed Rode’s conclusions, or that I HAVE to kill Rode's conclusions. Neither is true.

I'll address only the former (since I am short on time).

I didn’t say that I killed Rode's conclusions, just as the statement of mine you quoted indicates. In fact, I pointed out that the empirical probability I calculated fell somewhere BETWEEN the upper and lower limits that should cause someone to reject or accept his conclusion: my number fell in a “gray area” that I didn’t address. Thus, I made no claim of having refuted Rode’s conclusion, just his calculation.

As far as what I did imply about his conclusion…

I said that with a probability of only 1 in 10^18 (Rode’s calculated probability) one "shouldn’t" argue with the connection between the SIPF and the ‘primitive’ prokaryotic proteins. But with the actual probability of the match by chance being at least 11 orders of magnitude more likely than what Rode calculated, one is not “forced” to accept his conclusion.

To try to put it another way: it went from "having to accept it" to "not having to accept it". The latter is different from both "having to reject it" and "having to accept that the opposite is true".


PS: I'll be out of state for a while so I won't be able to address everything Principia brought up, nor will I be able to quickly follow up any counters he might raise.
DNAunion is offline  
Old 04-01-2003, 05:42 PM   #83
Veteran Member
 
Join Date: Jan 2001
Location: USA
Posts: 1,072
Default

Quote:
Principia: P(no matches to first 4 residues) = 5C4/9C4 = 5/126, where nCr is shorthand for n!/(n-r)!/r!.

DNAunion: Never mind. Principia is correct, he just wrote it differently than I've seen it.


I've seen it as nCr = n!/[(n-r)!*r!] and at first sight it looks like Principia has a division sign where a multiplication sign should be: (n-r)!/r! instead of (n-r)!*r!. But, since you do the divisions in order from left to right, that second division changes to a multiplication by the reciprocal, which leads back to the same thing.

To make it simpler, I'll use these conventions: a = n!; b = (n-r)!; and c = r!.

I've seen it as nCr = a/(bc), whereas Principia shows it as nCr = a/b/c. See, that last operation looks wrong at first sight - it's a division but it should be a multiplication, right? No, both are correct because they are both the same. The first division is done first: [a/b]/c = [a/b]/[c/1] = a/b * 1/c = a/(bc)
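The equivalence of the two ways of writing nCr can be checked numerically. This is a small Python sketch; the function names are made up for the example, and `math.comb` from the standard library serves as the reference value.

```python
from math import comb, factorial

def ncr_left_to_right(n, r):
    # a / b / c evaluated left to right: (n! / (n-r)!) / r!
    return factorial(n) // factorial(n - r) // factorial(r)

def ncr_grouped(n, r):
    # a / (b * c): n! / [(n-r)! * r!]
    return factorial(n) // (factorial(n - r) * factorial(r))

# Compare both spellings against the library value for every small n and r.
all_agree = all(
    ncr_left_to_right(n, r) == ncr_grouped(n, r) == comb(n, r)
    for n in range(13)
    for r in range(n + 1)
)
print(ncr_left_to_right(9, 4), ncr_grouped(9, 4), all_agree)
```

Integer division is safe here because n!/(n-r)! is a whole number that r! divides exactly.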
DNAunion is offline  
Old 04-01-2003, 06:22 PM   #84
Veteran Member
 
Join Date: Jan 2001
Location: USA
Posts: 1,072
Default

DNAunion: To anyone EXCEPT Principia.

I am trying to figure out how he came up with this.

Quote:
Principia: P(no matches to first 4 residues) = 5C4/9C4 = 5/126, where nCr is shorthand for n!/(n-r)!/r!.
DNAunion: I understand that the general idea is to get an equation that shows, loosely phrased, "the number of ways of getting 4 non-matches, divided by, the total number of ways of drawing 4 items from a set of 9".

I understand that the 9C4 is figuring out how many possible unique (unordered) combinations there are when taking 4 items at a time from a set of 9. I also know how to get 9C4 for "taking 4 amino acids out of a set of 9".

And 5C4 indicates how many combinations there are when taking 4 items at a time from a set of 5. That's kind of where I get stuck. How does one go from, "No matches in first 4 residues", to 5C4?

Could someone please explain that to me, nicely.

PS: I see some "number coincidences" in 5C4, but can't put them together. Is it saying that there are 5 non-matches, and 5C4 combinations for taking 4 at a time from those 5 non-matches? If so, how is it determined that there are 5 non-matches (and 4 matches)?

*****************************

Wait a minute! I think I got it now, and it's simple.

In loose terms, on the bottom there are 4 residues selected from the 9 (the 9C4). For those on the top to NOT match any of those on the bottom -- since we are looking for P(no matches to first 4 residues) -- those on the top have to be from a subset of the 9, with that subset consisting of the other 5 - the five that WEREN'T chosen on the bottom (there's the "number coincidence": now 9 - 4 = 5 has some meaning to me here). Thus, for the top, we need to select 4 items from a set of 5: that gives 5C4.
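The reasoning above can be confirmed by brute force. This is a minimal Python sketch; labelling the 9 residues 0 through 8 and fixing the "bottom" selection as {0, 1, 2, 3} are assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

residues = range(9)      # the 9 candidate partners, labelled 0..8
bottom = set(range(4))   # the 4 already chosen on the bottom (arbitrary labels)

# Count the 4-item selections for the top that share nothing with the bottom.
disjoint = sum(1 for top in combinations(residues, 4) if bottom.isdisjoint(top))

p_no_match = Fraction(disjoint, comb(9, 4))
print(disjoint, comb(5, 4), p_no_match)
```

The enumeration finds the same count as 5C4, because every non-matching selection has to come from the 5 residues left over.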


******************************

PPS: One may wonder why I left this long post as it is instead of replacing it with a simple "Never mind". The reason is that there might be people here besides me who are only at my level of understanding of math. For them, following along as I try to work out every step may help them understand too.
DNAunion is offline  
Old 04-02-2003, 03:24 PM   #85
Veteran Member
 
Join Date: Jan 2001
Location: USA
Posts: 1,072
Default

DNAunion: I was going to overwrite one of my above posts with this one, but I can't (not allowed to after 20 minutes).

This post is divided into two parts. The second part is a question I have that I would like answered. The first part is not for people who already know how to do the calculations; it is for any people who are interested in learning some of the underlying logic by following along as I work through it for myself.

FIRST PART
I will take a single example and work through it (my “definitions” are working definitions; not formal ones). In the more complicated fashion:

What is the probability of getting exactly three matches when comparing the list of the top 4 out of 9 amino acid “partners” for Ala in the SIPF dipeptides with the list of the top 4 out of 9 amino acid “partners” for Ala from the ‘primitive’ archaebacterial proteins?

Our goal is to get to a simple ratio, something like:
P(3 matches) = number of ways to get 3 matches / total number of ways of selecting 4 things from a set of 9

Let’s start with the bottom (“total number of ways of selecting 4 things from a set of 9”). This is where the nCr notation comes in. nCr indicates the number of unique, unordered sets – i.e., combinations (thus the C) - that can be formed by taking r items at a time from a set of n items. For example, how many ways can you select 2 items from a set of 3? The notation would be 3C2. To solve it, we need to know that nCr = n!/[(n-r)!*r!]. Let’s plug the numbers for n and r in.
nCr = n!/[(n-r)!*r!]
3C2 = 3!/[(3-2)!*2!]

The ! indicates factorial. In non-formal terms, x! = x * (x-1) * (x-2) * (x-3) … * 1 (note that 0! is defined as 1). So 3! would be 3 * 2 * 1. That gives us:
3C2 = 3!/[(3-2)!*2!] = (3 * 2 * 1)/[1 * (2 * 1)] = 6/2 = 3
Let’s see if that holds up. Let the full set be {A, B, C}. How many different ways can we pull out two of those? {A, B}, {A, C}, {B, C}. Three ways…it checks.
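The same check can be done in Python with the standard library. This is only a sketch of the {A, B, C} example; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

# List every unordered way of pulling 2 items out of {A, B, C}.
pairs = list(combinations("ABC", 2))
print(pairs)
print(len(pairs) == comb(3, 2))
```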

So for our problem dealing with amino acids, the bottom is “total number of ways of selecting 4 things from a set of 9”. That gives us 9C4, or equivalently, 9!/[(9-4)!*4!] = 126 (I didn’t show the steps involved in simplifying the factorials here, but from what was given just above one should be able to work it out).

So far we have: P(3 matches) = number of ways of getting 3 matches / 9C4. We next have to figure out the top part: “number of ways of getting 3 matches”. This is a little bit trickier (this is the part that was hanging me up for so long). To do this, we have to calculate two combinations (two nCr’s).

Think about it like this. The selection of 4 items from the full set of 9 created two subsets: one that consists of the 4 items that WERE selected and another that consists of the 5 items that were NOT selected. To successfully select 3 items given 4 tries, we have to get 3 from the set of 4 that were already selected AND 1 from the set of 5 that were NOT selected. That’s why we need two combinations: we are “selecting r objects from a set of n” two times. Thus, we need both 4C3 (3 matches from the set of 4 selected previously) and 5C1 (1 non-match: 1 from the set of 5 that weren’t selected previously).

Okay, we need two…but how do we combine them to figure out the number of ways of getting 3 matches? Do we just add them together: “task 1 can be done 5 ways and task 2 can be done 3 ways, so together they can be done 8 ways”? No. If task 1 can be done in x number of ways, and task 2 can be done in y number of ways, then the two tasks together can be done in x * y ways. So the number of ways of getting three matches is 4C3 * 5C1. We now have our final equation.

P(3 matches) = (4C3 * 5C1) / 9C4
(Yes, the parentheses around the first two combinations are superfluous – but they add some degree of clarity)

We already saw that 9C4 = 126, so we have P(3 matches) = (4C3 * 5C1) / 126

4C3 = 4!/[(4-3)!*3!] = 4
and
5C1 = 5!/[(5-1)!*1!] = 5

P(3 matches) = (4C3 * 5C1) / 9C4 = (4 * 5) / 126 = 20 / 126 = 0.158730159

Therefore, you have about a 16% chance of getting three matches.

And that’s how it’s done (at least I am pretty sure).
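For readers who want to rerun the arithmetic, here is a minimal Python sketch of the same ratio. The function name `p_exact_matches` and its parameter names (`pool`, `listed`, `picks`) are assumptions introduced for this example, not anything from the thread.

```python
from fractions import Fraction
from math import comb

def p_exact_matches(k, pool=9, listed=4, picks=4):
    """Probability that exactly k of `picks` items drawn from `pool` are among the `listed` ones."""
    ways_to_match = comb(listed, k)               # matches come from the listed set
    ways_to_miss = comb(pool - listed, picks - k) # the rest come from the others
    return Fraction(ways_to_match * ways_to_miss, comb(pool, picks))

p3 = p_exact_matches(3)
total = sum(p_exact_matches(k) for k in range(5))
print(p3, float(p3), total)
```

Using `Fraction` keeps the result exact, so 20/126 shows up in lowest terms as 10/63, and the five probabilities for 0 through 4 matches add up to exactly 1.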


SECOND PART: THE QUESTION
I used the above method to calculate the probabilities for a slew of different mixes of parameters and was comforted to see that it all seemed to work. For example, I calculated the probabilities of getting exactly 0, exactly 1, exactly 2, exactly 3, and exactly 4 matches when comparing a list of the top 4 SIPF amino acid “partners” with a list of the top 4 archaebacterial amino acid “partners” for the same amino acid – each probability looked legitimate on its own, and more importantly, the sum of the individual probabilities was 100%. So the probability of getting exactly 0, exactly 1, exactly 2, exactly 3, or exactly 4 matches given 4 shots is 1; just as it should be. Other sets of parameters worked also.

However, for one set of parameters I tried something failed. The full “master” set of amino acids was still 9 (just as it has been). The number of amino acid “partners” listed for the archaebacterial protein was set to 5 and the number of amino acid “partners” listed for the SIPF dipeptides was set to 3. So it was looking at having 3 shots to make matches with 5 specified items. I set up the equations as follows:

P(0 matches) = (5C0 * 4C3) / 9C5 = 4/126
P(1 match....) = (5C1 * 4C2) / 9C5 = 30/126
P(2 matches) = (5C2 * 4C1) / 9C5 = 40/126
P(3 matches) = (5C3 * 4C0) / 9C5 = 10/126

Each one seems fairly reasonable on its own, but their sum falls way short of 100%: it comes to only 84/126. I don’t get it (yeah I know, I goofed somewhere) – when selecting 3 items to try to match something, the possibility of getting either exactly 0, exactly 1, exactly 2, or exactly 3 matches is exhaustive. Where’s the missing 42/126?
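A quick check of the four numerators, sketched in Python with illustrative variable names. Each numerator counts ways of choosing the 3 picks (k from the 5 listed, 3 - k from the other 4), so a denominator that counts the same thing would be 9C3 = 84 rather than 9C5 = 126. That observation is offered here as one reading of the shortfall, not as the answer given in the thread.

```python
from fractions import Fraction
from math import comb

# Numerators for exactly 0, 1, 2 and 3 matches, as set up in the post.
numerators = [comb(5, k) * comb(4, 3 - k) for k in range(4)]

total_with_9c5 = Fraction(sum(numerators), comb(9, 5))  # denominator used in the post
total_with_9c3 = Fraction(sum(numerators), comb(9, 3))  # ways of choosing the 3 picks

print(numerators, sum(numerators))
print(total_with_9c5, total_with_9c3)
```

The numerators sum to 84, which is why dividing by 126 leaves exactly 42/126 unaccounted for.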
DNAunion is offline  
Old 04-02-2003, 04:20 PM   #86
Veteran Member
 
Join Date: Jan 2001
Location: USA
Posts: 1,072
Default

DNAunion: I think I got Principia!

Quote:
DNAunion: It was those more involved ones that I wrote the program to handle. For example, what is the probability of getting 3 matches when there is a subset of 4 out of 9 SIPF dipeptides and a subset of 5 out of 9 prokaryotic ones? Can you show me how to do that in one simple step?
Quote:
Principia: Much has been made about my not answering this "other" problem that DNAunion posed for me. So, without further ado:

P(getting at least 3 matches) =
P(getting exactly 3 or 4 matches) =
(5C3 * 4C1 + 5C4 * 4C0)/9C4 = 5/14
DNAunion: Your setup looks wrong to me, as does your answer.

Since you have 9C4 on the bottom, I’m assuming you are, basically speaking, picking the 4 from the SIPF dipeptides first and then looking for matches to those using the archaebacterial pairs (which is backwards from the way I think it should be done, but that’s a different matter). I’ll use that assumption since it seems to me that the 9C4 in the denominator leaves no other interpretation.

Now, if we pick a set of 4 from the 9 for the bottom and then try to match them on the top, then the matches should be in the same nCr as the set of 4 that were selected, and the non-matches should be in the same nCr as the set of 5 that weren’t selected. That gives:

P(3 matches) = (4C3 * 5C1) / 9C4
P(4 matches) = (4C4 * 5C0) / 9C4

P(at least 3 matches) = [(4C3 * 5C1) + (4C4 * 5C0)] / 9C4

I think mine is correct and yours is wrong: you got it backwards.

Look at the first combination (nCr). As the problem stated, we are trying to get 3 matches to the set of 4 that WERE selected. In yours, we are trying to get 3 matches to the set of 5 that WEREN’T selected.

Here's my calculation:

P(at least 3 matches) = [(4C3 * 5C1) + (4C4 * 5C0)] / 9C4
= [(4 * 5) + (1 * 1)] / 126
= (20 + 1) / 126
= 21 / 126
= 7/42
DNAunion is offline  
Old 04-02-2003, 04:29 PM   #87
Veteran Member
 
Join Date: Mar 2002
Location: anywhere
Posts: 1,976
Default

Quote:
Originally posted by DNAunion
DNAunion: I think I got Principia!

DNAunion: Your setup looks wrong to me, as does your answer.

Since you have 9C4 on the bottom, I’m assuming you are, basically speaking, picking the 4 from the SIPF dipeptides first and then looking for matches to those using the archaebacterial pairs (which is backwards from the way I think it should be done, but that’s a different matter). I’ll use that assumption since it seems to me that the 9C4 in the denominator leaves no other interpretation.

Now, if we pick a set of 4 from the 9 for the bottom and then try to match them on the top, then the matches should be in the same nCr as the set of 4 that were selected, and the non-matches should be in the same nCr as the set of 5 that weren’t selected. That gives:

P(3 matches) = (4C3 * 5C1) / 9C4
P(4 matches) = (4C4 * 5C0) / 9C4

P(at least 3 matches) = [(4C3 * 5C1) + (4C4 * 5C0)] / 9C4

I think mine is correct and yours is wrong: you got it backwards.

Look at the first combination (nCr). As the problem stated, we are trying to get 3 matches to the set of 4 that WERE selected. In yours, we are trying to get 3 matches to the set of 5 that WEREN’T selected.

Here's my calculation:

P(at least 3 matches) = [(4C3 * 5C1) + (4C4 * 5C0)] / 9C4
= [(4 * 5) + (1 * 1)] / 126
= (20 + 1) / 126
= 21 / 126
= 7/42
WRONG.

And it is exactly the reason why you can't get the previous question.

Quote:
I used the above method to calculate the probabilities for a slew of different mixes of parameters and was comforted to see that it all seemed to work. For example, I calculated the probabilities of getting exactly 0, exactly 1, exactly 2, exactly 3, and exactly 4 matches when comparing a list of the top 4 SIPF amino acid “partners” with a list of the top 4 archaebacterial amino acid “partners” for the same amino acid – each probability looked legitimate on its own, and more importantly, the sum of the individual probabilities was 100%. So the probability of getting exactly 0, exactly 1, exactly 2, exactly 3, or exactly 4 matches given 4 shots is 1; just as it should be. Other sets of parameters worked also.

However, for one set of parameters I tried something failed. The full “master” set of amino acids was still 9 (just as it has been). The number of amino acid “partners” listed for the archaebacterial protein was set to 5 and the number of amino acid “partners” listed for the SIPF dipeptides was set to 3. So it was looking at having 3 shots to make matches with 5 specified items. I set up the equations as follows:

P(0 matches) = (5C0 * 4C3) / 9C5 = 4/126
P(1 match....) = (5C1 * 4C2) / 9C5 = 30/126
P(2 matches) = (5C2 * 4C1) / 9C5 = 40/126
P(3 matches) = (5C3 * 4C0) / 9C5 = 10/126

Each one seems fairly reasonable on its own, but their sum falls way short of 100%: it comes to only 84/126. I don’t get it (yeah I know, I goofed somewhere) – when selecting 3 items to try to match something, the possibility of getting either exactly 0, exactly 1, exactly 2, or exactly 3 matches is exhaustive. Where’s the missing 42/126?
Gee, declaring victory before you have double checked your answer. Makes a reader rethink all of these mighty claims that you've been making, doesn't it? You're going to have to do better than that to reestablish your credibility, DNAunion.
Principia is offline  
Old 04-02-2003, 07:22 PM   #88
Veteran Member
 
Join Date: Jan 2001
Location: USA
Posts: 1,072
Default

Quote:
DNAunion: I think I got Principia!
Quote:
Principia: WRONG
DNAunion: I thought it seemed too good to be true!

I was calculating for only 4 attempts to get three matches when I should have been calculating for 5 attempts. Taking the extra trial into account changes everything that follows in bold.


P(3 matches) = (4C3 * 5C2) / 9C4
P(4 matches) = (4C4 * 5C1) / 9C4

P(at least 3 matches) = [(4C3 * 5C2) + (4C4 * 5C1)] / 9C4
= [(4 * 10) + (1 * 5)] / 126
= (40 + 5) / 126
= 45 / 126
= 5 / 14

Which, sadly for me, IS the value that Principia came up with the first time.
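Both setups can be compared side by side, along with a brute-force count. This is a Python sketch under the assumption that the 9 residues are labelled 0 through 8 and the 5 archaebacterial partners are {0, 1, 2, 3, 4}; the labels and variable names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Principia's setup: matches drawn from the 5, non-matches from the other 4.
principia = Fraction(comb(5, 3) * comb(4, 1) + comb(5, 4) * comb(4, 0), comb(9, 4))

# The revised setup from this post: 5 attempts against the 4 selected.
revised = Fraction(comb(4, 3) * comb(5, 2) + comb(4, 4) * comb(5, 1), comb(9, 4))

# Brute force: fix the 5 archaebacterial partners, try every 4-item SIPF list.
archae = set(range(5))
hits = sum(
    1 for sipf in combinations(range(9), 4)
    if len(archae.intersection(sipf)) >= 3
)
brute = Fraction(hits, comb(9, 4))

print(principia, revised, brute)
```

All three routes give 45/126, which reduces to 5/14.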
:notworthy
DNAunion is offline  
Old 04-03-2003, 04:17 PM   #89
Veteran Member
 
Join Date: Jan 2001
Location: USA
Posts: 1,072
Default

Quote:
Principia: In other words why do you care about only 1,000,000 iterations? After all, I don't see any evidence at all that 1,000,000 iterations suffice to produce suitably accurate results.
DNAunion: Really? What about this?

Quote:
DNAunion: Using [my code] to determine an empirical probability for what I discussed earlier … turned up a result of 96.0258%, which is virtually identical to the theoretical probability of 96.03% I calculated by hand…
DNAunion: That’s pretty dead on for an EMPIRICAL probability.

Quote:
Principia: Suppose one of those "empowered" users demands 1e18 iterations? What then?
DNAunion: Then I’d worry about that person’s grasp on reality!

Suppose we have some program – written by anyone, in any language - that on a standard computer performs a full one million iterations per second. To perform 10^18 iterations would take more than 30,000 years!
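The estimate is easy to reproduce. This is a small Python sketch using the figures from the paragraph above; the 365.25-day year is an assumption chosen for the conversion.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumed average year length

iterations = 10**18        # requested number of iterations
rate_per_second = 10**6    # one million iterations per second on one machine

seconds = iterations / rate_per_second
years = seconds / SECONDS_PER_YEAR
print(f"{seconds:.0f} seconds is roughly {years:,.0f} years")
```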
DNAunion is offline  
Old 04-03-2003, 05:23 PM   #90
Veteran Member
 
Join Date: Mar 2002
Location: anywhere
Posts: 1,976
Default

Quote:
DNAunion: That’s pretty dead on for an EMPIRICAL probability.
Well, being suspicious of DNAunion's claims, I did what any intellectually honest person would do -- I checked it out for myself. Here are 5 runs:
Code:
1) 95.9953%
2) 96.1158%
3) 96.0772%
4) 96.0179%
5) 96.2149%
And I gave up after that, having spent 10 minutes doing them. Especially after I noticed that the number DNAunion reported was clearly a result of cherry picking data. I rest my case.
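One way to put numbers on the run-to-run variation is sketched below in Python. It summarises the five runs listed above and, separately, computes the textbook standard error sqrt(p(1-p)/n) of an estimated proportion, assuming independent trials, p = 0.9603 and n = 1,000,000. The iteration count behind the five runs is not stated in the post, so the two figures are shown side by side without drawing a conclusion; all names in the code are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# The five empirical results reported above, in percent.
runs_pct = [95.9953, 96.1158, 96.0772, 96.0179, 96.2149]
run_mean = mean(runs_pct)
run_spread = stdev(runs_pct)  # sample standard deviation of the five runs

def standard_error_pct(p, iterations):
    """Standard error, in percentage points, of a proportion estimated from independent trials."""
    return 100 * sqrt(p * (1 - p) / iterations)

se_million = standard_error_pct(0.9603, 1_000_000)
print(f"mean of runs: {run_mean:.4f}%  spread of runs: {run_spread:.4f}")
print(f"expected standard error at 1,000,000 iterations: {se_million:.4f}")
```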

Quote:
DNAunion: Then I’d worry about that person’s grasp on reality!

Suppose we have some program – written by anyone, in any language - that on a standard computer performs a full one million iterations per second. To perform 10^18 iterations would take more than 30,000 years!
No, personally, I'd be more worried about DNAunion's own limited grasp of reality. In the modern age of networked computers and Gigahertz machines, why must only one computer be involved? Well, let's use DNAunion's conservative estimates. Say 1 million computers each running 1e6 iterations/sec (on the order of 1 MHz (!)) were involved. How much time would that take? Why, I did a quick back of the hand calculation and came up with: 12 days! As a reality check, I went to a popular @Home distributed computational problem, namely Seti@Home: Here are some statistics:
Code:
                     Total                    Last 24 Hours
Total CPU time       1396121.492 years        1473.817 years
Floating Point
Operations           2.815692e+21             4.935528e+18 (57.12 TeraFLOPs/sec)
Average CPU time
per work unit        14 hr 45 min 36.0 sec    10 hr 12 min 06.6 sec
In 24 hours, 4.9e18 floating-point operations were executed. More than enough computational power for the 1e18 iterations of DNAunion's bloated code to be executed. But of course, the question is why would people waste their time with it?
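The 12-day figure can be reproduced with a short Python sketch. It assumes perfect parallel scaling across the machines with no coordination overhead, which is a simplification; the variable names are illustrative.

```python
iterations = 10**18          # total iterations requested
machines = 10**6             # one million computers
rate_per_machine = 10**6     # iterations per second on each computer

# Assumes the work divides evenly and machines never wait on each other.
seconds = iterations / (machines * rate_per_machine)
days = seconds / (24 * 60 * 60)
print(f"{seconds:.0f} seconds is about {days:.1f} days")
```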
Principia is offline  
 
