Freethought & Rationalism Archive | The archives are read only. |
04-01-2003, 04:31 PM | #81 | ||
Veteran Member
Join Date: Jan 2001
Location: USA
Posts: 1,072
|
DNAunion: I’ll try to deal with only one point at a time.
Quote:
Quote:
A-B linkages:
...Ala = 1
...R = 1

B-A linkages:
...E = 1
...Ala = 1

In the protein, that is not double counting Ala-Ala: it should be counted both ways, as A-B and B-A. Why? The simplest explanation is that in the sequence Ala both precedes Ala and follows Ala. So it is both A-B and B-A.

A second way of looking at it: using just that one sequence (the underlying logic wouldn't change if we had more), we see that Ala is just as likely to follow Ala as R is. If we drop that Ala-Ala pair, then we’ve lost information: we would then be led to believe that only R follows Ala in that sequence, and that is incorrect. Similarly, the sequence shows us that Ala is just as likely to precede Ala as E is. Again, if we drop that Ala-Ala pair, we would be losing information.

*********************************

PS: The following is for any readers who are not familiar with the fact that the order of amino acids in a protein is set and cannot be reversed: if someone gives you Ala-Gly, for example, you can’t reverse it and say that it is Gly-Ala. In the sequence Ala-Gly, Gly follows Ala - you can't turn it around to where Gly precedes Ala.

Proteins have an intrinsic directionality due to their having different functional groups at the two ends. The order of amino acids in a protein is determined by which end is the C-terminus and which end is the N-terminus. One starts counting at the N-terminus with that amino acid being numbered 1; the next amino acid is numbered 2; and so on down the line until reaching amino acid number n, at the C-terminus. When a protein’s aa-sequence is listed, it is given, by convention, this way.

Thus, the dipeptides N-Ala-Glu-C and C-Ala-Glu-N are different sequences, even though they both contain only Ala and Glu bonded together and are written in the same order in both dipeptides. The second one is not listed according to the convention; it is actually N-Glu-Ala-C, and that is how sources would list it. |
||
04-01-2003, 04:57 PM | #82 | |||
Veteran Member
Join Date: Jan 2001
Location: USA
Posts: 1,072
|
Quote:
Quote:
Quote:
I'll address only the former (since I am short on time). I didn’t say that I killed Rode's conclusions, just as the statement of mine you quoted indicates. In fact, I pointed out that the empirical probability I calculated fell somewhere BETWEEN the upper and lower limits that should cause someone to reject or accept his conclusion: my number fell in a “gray area” that I didn’t address. Thus, I made no claim of having refuted Rode’s conclusion, just his calculation.

As far as what I did imply about his conclusion… I said that with a probability of only 1 in 10^18 (Rode’s calculated probability) one "shouldn’t" argue with the connection between the SIPF and the ‘primitive’ prokaryotic proteins. But with the actual probability of the match by chance being at least 11 orders of magnitude higher than what Rode calculated, one is not “forced” to accept his conclusion. To try to put it another way: it went from "having to accept it" to "not having to accept it". The latter is different from both "having to reject it" and "having to accept that the opposite is true".

PS: I'll be out of state for a while, so I won't be able to address everything Principia brought up, nor will I be able to quickly follow up any counters he might raise. |
|||
04-01-2003, 05:42 PM | #83 | |
Veteran Member
Join Date: Jan 2001
Location: USA
Posts: 1,072
|
Quote:
DNAunion: Never mind. Principia is correct; he just wrote it differently than I've seen it. I've seen it as nCr = n!/[(n-r)!*r!], and at first sight it looks like Principia has a division sign where a multiplication sign should be: (n-r)!/r! instead of (n-r)!*r!. But, since you do the divisions in order from left to right, that second division changes to a multiplication by the reciprocal, which leads back to the same thing.

To make it simpler, I'll use these conventions: a = n!; b = (n-r)!; and c = r!. I've seen it as nCr = a/(bc), whereas Principia shows it as nCr = a/b/c. See, that last operation looks wrong at first sight - it's a division but it should be a multiplication, right? No, both are correct because they are both the same. The first division is done first.

[a/b]/c = [a/b]/[c/1] = a/b * 1/c = a/(bc) |
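A quick way to convince yourself that the two ways of writing nCr agree is to compute both. This is only a sketch; the function names `ncr_chained` and `ncr_grouped` are made up for illustration, and Python's built-in `math.comb` is used as an independent check.

```python
from math import comb, factorial

def ncr_chained(n, r):
    # a/b/c evaluated left to right: (n! / (n-r)!) / r!
    return factorial(n) // factorial(n - r) // factorial(r)

def ncr_grouped(n, r):
    # a/(b*c): n! / [(n-r)! * r!]
    return factorial(n) // (factorial(n - r) * factorial(r))

for n, r in [(3, 2), (5, 4), (9, 4)]:
    # All three values on each line should be identical.
    print(n, r, ncr_chained(n, r), ncr_grouped(n, r), comb(n, r))
```

Integer division is safe here because each intermediate quotient is a whole number.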
|
04-01-2003, 06:22 PM | #84 | |
Veteran Member
Join Date: Jan 2001
Location: USA
Posts: 1,072
|
DNAunion: To anyone EXCEPT Principia.
I am trying to figure out how he came up with this. Quote:
I understand that the 9C4 is figuring out how many possible unique (unordered) combinations there are when taking 4 items at a time from a set of 9. I also know how to get 9C4 for "taking 4 amino acids out of a set of 9". And 5C4 indicates how many combinations there are when taking 4 items at a time from a set of 5.

That's kind of where I get stuck. How does one go from, "No matches in first 4 residues", to 5C4? Could someone please explain that to me, nicely.

PS: I see some "number coincidences" in 5C4, but can't put them together. Is it saying that there are 5 non-matches, and 5C4 combinations for taking 4 at a time from those 5 non-matches? If so, how is it determined that there are 5 non-matches (and 4 matches)?

*****************************

Wait a minute! I think I got it now, and it's simple.

In loose terms, on the bottom there are 4 residues selected from the 9 (the 9C4). For those on the top to NOT match any of those on the bottom -- since we are looking for P(no matches to first 4 residues) -- those on the top have to be from a subset of the 9, with that subset consisting of the other 5 - the five that WEREN'T chosen on the bottom (there's the "number coincidence": now 9 - 4 = 5 has some meaning to me here). Thus, for the top, we need to select 4 items from a set of 5: that gives 5C4.

******************************

PPS: One may wonder why I left this long post as it is instead of replacing it with a simple "Never mind". The reason is that there might be people here besides me who are only at my level of understanding of math. For them, following along as I try to work out every step may help them understand too. |
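The reasoning above (pick 4 of the 5 residues that were NOT chosen, out of all ways of picking 4 from 9) can be checked by brute force. The sketch below is only an illustration: the residues are stood in for by the hypothetical labels 0 to 8, with 0 to 3 playing the role of the 4 already chosen.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

chosen = {0, 1, 2, 3}  # hypothetical labels for the 4 residues already selected

# Count the 4-item picks from 9 that share nothing with the chosen 4.
no_match_picks = sum(
    1 for pick in combinations(range(9), 4) if not chosen & set(pick)
)

p_no_match = Fraction(comb(5, 4), comb(9, 4))
print(no_match_picks, comb(5, 4))  # the brute-force count and 5C4 should agree
print(p_no_match)
```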
|
04-02-2003, 03:24 PM | #85 |
Veteran Member
Join Date: Jan 2001
Location: USA
Posts: 1,072
|
DNAunion: I was going to overwrite one of my above posts with this one, but I can't (not allowed to after 20 minutes).
This post is divided into two parts. The second part is a question I have that I would like answered. The first part is not for people who already know how to do the calculations; it is for any people who are interested in learning some of the underlying logic by following along as I work through it for myself.

FIRST PART

I will take a single example and work through it (my “definitions” are working definitions; not formal ones).

In the more complicated fashion: What is the probability of getting exactly three matches when comparing the list of the top 4 out of 9 amino acid “partners” for Ala in the SIPF dipeptides with the list of the top 4 out of 9 amino acid “partners” for Ala from the ‘primitive’ archaebacterial proteins?

Our goal is to get to a simple ratio, something like:

P(3 matches) = number of ways to get 3 matches / total number of ways of selecting 4 things from a set of 9

Let’s start with the bottom (“total number of ways of selecting 4 things from a set of 9”). This is where the nCr notation comes in. nCr indicates the number of unique, unordered sets – i.e., combinations (thus the C) - that can be formed by taking r items at a time from a set of n items.

For example, how many ways can you select 2 items from a set of 3? The notation would be 3C2. To solve it, we need to know that nCr = n!/[(n-r)!*r!]. Let’s plug the numbers for n and r in.

nCr = n!/[(n-r)!*r!]
3C2 = 3!/[(3-2)!*2!]

The ! indicates factorial. In non-formal terms, x! = x * (x-1) * (x-2) * (x-3) … * 1 (note that 0! is defined as 1). So 3! would be 3 * 2 * 1. That gives us:

3C2 = 3!/[(3-2)!*2!] = (3 * 2 * 1)/[1 * (2 * 1)] = 6/2 = 3

Let’s see if that holds up. Let the full set be {A, B, C}. How many different ways can we pull out two of those? {A, B}, {A, C}, {B, C}. Three ways…it checks.

So for our problem dealing with amino acids, the bottom is “total number of ways of selecting 4 things from a set of 9”. That gives us 9C4, or equivalently, 9!/[(9-4)!*4!]
= 126 (I didn’t show the steps involved in simplifying the factorials here, but from what was given just above one should be able to work it out).

So far we have: P(3 matches) = number of ways of getting 3 matches / 9C4.

We next have to figure out the top part: “number of ways of getting 3 matches”. This is a little bit trickier (this is the part that was hanging me up for so long). To do this, we have to calculate two combinations (two nCr’s).

Think about it like this. The selection of 4 items from the full set of 9 created two subsets: one that consists of the 4 items that WERE selected and another that consists of the 5 items that were NOT selected. To successfully select 3 items given 4 tries, we have to get 3 from the set of 4 that were already selected AND 1 from the set of 5 that were NOT selected. That’s why we need two combinations: we are “selecting r objects from a set of n” two times. Thus, we need both 4C3 (3 matches from the set of 4 selected previously) and 5C1 (1 non-match: 1 from the set of 5 that weren’t selected previously).

Okay, we need two…but how do we combine them to figure out the number of ways of getting 3 matches? Do we just add them together: “task 1 can be done 5 ways and task 2 can be done 3 ways, so together they can be done 8 ways”? No. If task 1 can be done in x number of ways, and task 2 can be done in y number of ways, then the two tasks together can be done in x * y ways. So the number of ways of getting three matches is 4C3 * 5C1.

We now have our final equation.

P(3 matches) = (4C3 * 5C1) / 9C4

(Yes, the parentheses around the first two combinations are superfluous – but they add some degree of clarity.)

We already saw that 9C4 = 126, so we have

P(3 matches) = (4C3 * 5C1) / 126

4C3 = 4!/[(4-3)!*3!] = 4 and 5C1 = 5!/[(5-1)!*1!] = 5

P(3 matches) = (4C3 * 5C1) / 9C4 = (4 * 5) / 126 = 20 / 126 = 0.158730159

Therefore, you have about a 16% chance of getting three matches.

And that’s how it’s done (at least I am pretty sure).
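The worked example above can be written as a small function. This is a sketch, not anything from the original analysis: the name `p_matches` and its parameters (`total`, `listed`, `picks`) are invented here, and the defaults assume the 4-versus-4 comparison out of 9 described in the post.

```python
from fractions import Fraction
from math import comb

def p_matches(k, total=9, listed=4, picks=4):
    """Probability of exactly k matches when `picks` items are drawn
    from `total`, of which `listed` count as matches."""
    # Ways to take k from the listed set times ways to take the rest
    # from the unlisted set, over all ways to make the picks.
    ways = comb(listed, k) * comb(total - listed, picks - k)
    return Fraction(ways, comb(total, picks))

p3 = p_matches(3)
print(p3, float(p3))  # 20/126 in lowest terms, roughly 0.1587

# Exactly 0, 1, 2, 3, or 4 matches should cover every possibility.
total_probability = sum(p_matches(k) for k in range(5))
print(total_probability)
```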
SECOND PART: THE QUESTION

I used the above method to calculate the probabilities for a slew of different mixes of parameters and was comforted to see that it all seemed to work. For example, I calculated the probabilities of getting exactly 0, exactly 1, exactly 2, exactly 3, and exactly 4 matches when comparing a list of the top 4 SIPF amino acid “partners” with a list of the top 4 archaebacterial amino acid “partners” for the same amino acid – each probability looked legitimate on its own, and more importantly, the sum of the individual probabilities was 100%. So the probability of getting exactly 0, exactly 1, exactly 2, exactly 3, or exactly 4 matches given 4 shots is 1; just as it should be. Other sets of parameters worked also.

However, for one set of parameters I tried, something failed. The full “master” set of amino acids was still 9 (just as it has been). The number of amino acid “partners” listed for the archaebacterial protein was set to 5 and the number of amino acid “partners” listed for the SIPF dipeptides was set to 3. So it was looking at having 3 shots to make matches with 5 specified items. I set up the equations as follows:

P(0 matches) = (5C0 * 4C3) / 9C5 = 4/126
P(1 match....) = (5C1 * 4C2) / 9C5 = 30/126
P(2 matches) = (5C2 * 4C1) / 9C5 = 40/126
P(3 matches) = (5C3 * 4C0) / 9C5 = 10/126

Each one seems fairly reasonable on its own, but their sum falls way short of 100%: it comes to only 84/126. I don’t get it (yeah I know, I goofed somewhere) – when selecting 3 items to try to match something, the possibility of getting either exactly 0, exactly 1, exactly 2, or exactly 3 matches is exhaustive. Where’s the missing 42/126? |
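One way to look for the discrepancy is to list every possible set of 3 picks and tally the matches directly. The sketch below assumes hypothetical labels 0 to 8 for the 9 amino acids, with 0 to 4 standing in for the 5 specified partners. The tallies reproduce the four numerators above, and they add up to the number of ways of choosing 3 items from 9, which suggests comparing the denominator used (9C5) against 9C3.

```python
from collections import Counter
from itertools import combinations
from math import comb

specified = set(range(5))  # hypothetical labels for the 5 listed partners

# Tally how many 3-item picks from 9 hit exactly k of the specified items.
counts = Counter()
for pick in combinations(range(9), 3):
    counts[len(specified & set(pick))] += 1

print(sorted(counts.items()))
# Total number of picks, next to 9C3 and 9C5 for comparison.
print(sum(counts.values()), comb(9, 3), comb(9, 5))
```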
04-02-2003, 04:20 PM | #86 | ||
Veteran Member
Join Date: Jan 2001
Location: USA
Posts: 1,072
|
DNAunion: I think I got Principia!
Quote:
Quote:
Since you have 9C4 on the bottom, I’m assuming you are, basically speaking, picking the 4 from the SIPF dipeptides first and then looking for matches to those using the archaebacterial pairs (which is backwards from the way I think it should be done, but that’s a different matter). I’ll use that assumption since it seems to me that the 9C4 in the denominator leaves no other interpretation.

Now, if we pick a set of 4 from the 9 for the bottom and then try to match them on the top, then the matches should be in the same nCr as the set of 4 that were selected, and the non-matches should be in the same nCr as the set of 5 that weren’t selected. That gives:

P(3 matches) = (4C3 * 5C1) / 9C4
P(4 matches) = (4C4 * 5C0) / 9C4
P(at least 3 matches) = [(4C3 * 5C1) + (4C4 * 5C0)] / 9C4

I think mine is correct and yours is wrong: you got it backwards. Look at the first combination (nCr). As the problem stated, we are trying to get 3 matches to the set of 4 that WERE selected. In yours, we are trying to get 3 matches to the set of 5 that WEREN’T selected.

Here's my calculation:

P(at least 3 matches) = [(4C3 * 5C1) + (4C4 * 5C0)] / 9C4 = [(4 * 5) + (1 * 1)] / 126 = (20 + 1) / 126 = 21 / 126 = 7/42 = 1/6 |
||
04-02-2003, 04:29 PM | #87 | ||
Veteran Member
Join Date: Mar 2002
Location: anywhere
Posts: 1,976
|
Quote:
And it is exactly the reason why you can't get the previous question. Quote:
|
||
04-02-2003, 07:22 PM | #88 | ||
Veteran Member
Join Date: Jan 2001
Location: USA
Posts: 1,072
|
Quote:
Quote:
I was calculating for only 4 attempts to get three matches when I should have been calculating for 5 attempts. Taking the extra trial into account changes everything that follows in bold.

P(3 matches) = (4C3 * 5C2) / 9C4
P(4 matches) = (4C4 * 5C1) / 9C4
P(at least 3 matches) = [(4C3 * 5C2) + (4C4 * 5C1)] / 9C4 = [(4 * 10) + (1 * 5)] / 126 = (40 + 5) / 126 = 45 / 126 = 5 / 14

Which, sadly for me, IS the value that Principia came up with the first time. :notworthy |
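The corrected arithmetic can be reproduced with exact fractions. This is only a check of the numbers in the post, with variable names made up for illustration. It also prints 9C4 next to 9C5: the two are equal, so the denominator is 126 whether one counts 4-item or 5-item selections from 9.

```python
from fractions import Fraction
from math import comb

ways_3 = comb(4, 3) * comb(5, 2)  # 3 matches plus 2 non-matches
ways_4 = comb(4, 4) * comb(5, 1)  # 4 matches plus 1 non-match

p_at_least_3 = Fraction(ways_3 + ways_4, comb(9, 4))
print(ways_3, ways_4, p_at_least_3)
print(comb(9, 4), comb(9, 5))  # the same value by symmetry
```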
||
04-03-2003, 04:17 PM | #89 | |||
Veteran Member
Join Date: Jan 2001
Location: USA
Posts: 1,072
|
Quote:
Quote:
Quote:
Suppose we have some program – written by anyone, in any language - that on a standard computer performs a full one million iterations per second. To perform 10^18 iterations would take more than 30,000 years! |
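The time estimate is simple to reproduce. The sketch below assumes a year of 365.25 days; the constant names are invented for illustration.

```python
ITERATIONS = 10**18           # number of iterations to perform
RATE_PER_SECOND = 10**6       # one million iterations per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumes a 365.25-day year

seconds = ITERATIONS / RATE_PER_SECOND
years = seconds / SECONDS_PER_YEAR
print(f"{seconds:.0e} seconds is about {years:,.0f} years")
```

Under these assumptions the result comes out a little under 32,000 years, consistent with "more than 30,000 years".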
|||
04-03-2003, 05:23 PM | #90 | ||
Veteran Member
Join Date: Mar 2002
Location: anywhere
Posts: 1,976
|
Quote:
Code:
1) 95.9953%
2) 96.1158%
3) 96.0772%
4) 96.0179%
5) 96.2149%
Quote:
Code:
                                 Total                    Last 24 Hours
Total CPU time                   1396121.492 years        1473.817 years
Floating Point Operations        2.815692e+21             4.935528e+18 (57.12 TeraFLOPs/sec)
Average CPU time per work unit   14 hr 45 min 36.0 sec    10 hr 12 min 06.6 sec |
||