Freethought & Rationalism Archive

RufusAtticus · 03-29-2003, 01:02 PM

DNAUnion,

Please use the "code" blocks for displaying code. Could you please give a simple statement of what probability you're trying to calculate with your programs? (I've looked at Rode's paper and am not sure exactly what he was trying to do either.)

I'd just like a statement like, "To calculate the probability that four things chosen out of nine things have two things that match." Or whatever is appropriate for your program.

DNAunion · 03-29-2003, 01:06 PM

Quote:

RufusAtticus: DNAUnion, please use the "code" blocks for displaying code.

DNAunion: I've had problems before when using code tags, but that was a while back and maybe these newer discussion forum "languages" have worked out the kinks.

Quote:

Principia: 4) A Monte Carlo analysis written in Visual Foxpro (!!) ...

DNAunion: What, you don't consider Visual FoxPro to be a "real" programming language? Fine, give us reasons for your (flawed) personal opinion.

Perhaps it's just that you can't "speak" VFP: maybe you can only decipher a "real" programming language like C++. Well here, I took the time to recode it in that language. Now you can examine the code and point out my errors.

Code:

// calcprob.cpp
// This program models choosing lettered tiles from an urn in 
// order to calculate an empirical probability

#include <iostream>
#include <cstdlib>	// needed for the rand function
#include <ctime>	// needed to get current time to seed rand function
using namespace std;

long GetRandomNumber(int nMin, int nMax);

int main()
{
	const int nDiscardTilesOnceChosen = 1;
	const int nLetteredTiles = 9;
	const int nTargetTiles = 4;
	const int nMatchesNeededForSuccess = 1;
	const int nTrialsPerIteration = 4;
	const long lIterations = 1000000;

	int nTrial = 0;
	int nMatches = 0;
	int nIndex = 0;
	int nLooper = 0;
	int nFoundOne = 0;
	long lIteration = 0;
	long lSuccessfulIterations = 0;
	char cLetter = ' ';
	char cUrn[nLetteredTiles][4];	// columns [1]..[3] are used; [0] is left unused


	// Before doing anything else, initialize the random number generator
	// using the system clock
	srand((unsigned)time(NULL));

	// Initialize array (fill the urn with lettered tiles)
	// The columns of the multidimensional array break down as follows:
	// [1] = LETTER: a unique symbol on a tile that gets placed into the urn.
	//		 As the program currently stands, this value is not used.
	// [2] = CHOSEN: has this letter/tile already been chosen from the urn?
	//		 If so, it may have been discarded and so not available
	//		 any more, or it may have been replaced and available to
	//		 be selected again.  Which occurs for an already selected
	//		 tile depends upon the value of the const variable
	//		 nDiscardTilesOnceChosen.
	// [3] = TARGET: is this letter/tile one of the targets?
	for (nIndex = 0; nIndex < nLetteredTiles; nIndex++)
	{
		cLetter = (char)('A' + nIndex);	// 'A' for the first tile, 'B' for the second, ...
		cUrn[nIndex][1] = cLetter;
		cUrn[nIndex][2] = 'F';
		cUrn[nIndex][3] = 'F';
	}
	
	// Choose x number of tiles from the Urn to serve as targets
	for (nLooper = 1; nLooper <= nTargetTiles; nLooper++)
	{
		nFoundOne = 0;
		while (nFoundOne == 0)
		{
			nIndex = (int) GetRandomNumber(0, nLetteredTiles - 1);
			if (cUrn[nIndex][3] == 'F')
			{
				cUrn[nIndex][3] = 'T';
				nFoundOne = 1;	
			}
		}
	}



	// Begin selecting tiles from the Urn.
	for (lIteration = 1; lIteration <= lIterations; lIteration++)
	{
		// New iteration: need to clear all CHOSEN flags
		for (nIndex = 0; nIndex < nLetteredTiles; nIndex++)
		{
			cUrn[nIndex][2] = 'F';	
		}
		nMatches = 0;
		
	
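		// Give the user some output throughout the process (every
		// 100,000th iteration, so console output doesn't dominate the run)
		if (lIteration % 100000 == 0)
		{
			cout << "Iteration " << lIteration << " of " << lIterations << endl;
		}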

		for (nTrial = 1; nTrial <= nTrialsPerIteration; nTrial++)
		{

			// Pull a single tile out of the urn
			nFoundOne = 0;
			while (nFoundOne == 0)
			{
				nIndex = (int) GetRandomNumber(0, nLetteredTiles - 1);
				if (nDiscardTilesOnceChosen == 0)
				{
					// Doesn't matter if the tile has been 
					// chosen previously because selected
					// tiles are placed back into the urn.
					nFoundOne = 1;
				}
				else if (cUrn[nIndex][2] == 'T')
				{
					// This tile has already been chosen and
					// discarded: it can't be selected again.
					nFoundOne = 0;
				}
				else if (cUrn[nIndex][2] == 'F')
				{
					// This tile has not been chosen previously.
					nFoundOne = 1;
				}
			}
			// A tile has been chosen: flag it as such
			cUrn[nIndex][2] = 'T';

			// Does the chosen tile match one of the targets?
			if (cUrn[nIndex][3] == 'T')
			{
				nMatches += 1;
			}

			// No need to continue pulling tiles for this iteration
			// if we've obtained enough matches for success
			if (nMatches >= nMatchesNeededForSuccess)
			{
				nTrial = nTrialsPerIteration + 1;
			}
		}

		// Did we get enough matches for this iteration?
		if (nMatches >= nMatchesNeededForSuccess)
		{
			lSuccessfulIterations += 1;
		}
	}
	

	cout << "Tiles discarded after being chosen? ";
	cout << (nDiscardTilesOnceChosen == 1?"Yes":"No") << endl;
	cout << "Number of lettered tiles in Urn: ";
	cout << nLetteredTiles << endl;
	cout << "Number of target tiles: ";
	cout << nTargetTiles << endl;
	cout << "Number of matches needed: ";
	cout << nMatchesNeededForSuccess << endl;
	cout << "Trials per iteration: ";
	cout << nTrialsPerIteration << endl;
	cout << "Total iterations: ";
	cout << lIterations << endl;
	cout << "Successful iterations: ";
	cout << lSuccessfulIterations << endl;
	cout << "Empirical probability: ";
	cout << ((float)lSuccessfulIterations / lIterations) * 100 << "%" << endl;

	return (0);
}


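long GetRandomNumber(int nMin, int nMax)
{
	// Map rand() onto [nMin, nMax].  Raw values at or above the largest
	// multiple of the range size are rejected so the modulo stays unbiased;
	// this needs at most a few calls to rand() instead of retrying until
	// a raw value happens to land inside [nMin, nMax].
	const long lRange = (long)nMax - nMin + 1;
	const long lLimit = ((long)RAND_MAX / lRange) * lRange;
	long lRandomNumber = rand();
	while (lRandomNumber >= lLimit)
	{
		lRandomNumber = rand();
	}
	return nMin + lRandomNumber % lRange;
}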

Principia · 03-29-2003, 01:13 PM

Actually, my point 4 was supposed to show that a Monte Carlo analysis was completely redundant for the problem you are solving... much less having to do it in Visual Foxpro. So, really, that you can program in C neither impresses me, nor does it address the overall theme of my post.

DNAunion · 03-29-2003, 01:15 PM

DNAunion: I don't think Principia will uncover any errors in my code. Using it to determine an empirical probability for what I discussed earlier (see below quote) turned up a result of 96.0258%, which is virtually identical to the theoretical probability of 96.03% I calculated by hand in the below quote.

Quote:

DNAunion: What we will do first is calculate the probability that none of the four "primitive" amino acids match the SIPF ones, then from it calculate the opposite probability (that at least one would match).

"Primitive" Amino Acid 1: To start with, there are four SIPF targets and nine possible "primitive" amino acids that could be compared to them. So the probability of "primitive" aa-1 matching one of the four target SIPF amino acids is 4/9. Therefore, the probability of its not matching is 1 - 4/9 = 5/9.

"Primitive" Amino Acid 2: Since the probability of this aa is dependent upon the previous one, we have to assume that aa-1 did not match. That leaves eight possible "primitive" amino acids and still four target SIPF ones. So the probability of getting a match here is 4/8, which means the probability of not getting a match is 1 - 4/8 = 4/8 = 1/2.

"Primitive" Amino Acid 3: We have to assume that the previous attempt failed to match, leaving seven "primitive" amino acids and still four target SIPF ones. So the probability of matching here is 4/7, meaning that the probability of a non-match is 1 - 4/7 = 3/7.

"Primitive" Amino Acid 4: Since the last one failed to match also, we are left with six "primitive" amino acids and still have four target SIPF ones. So the probability of a match on this final step is 4/6, which means the probability of a non-match is 1 - 4/6 = 2/6 = 1/3.

To figure out the overall probability -- that is, what is the probability of not getting any matches in four attempts -- we just multiply each of the four individual probabilities for non-matches.

P(no matches) = 5/9 * 1/2 * 3/7 * 1/3 = 5/126

And, looking at the opposite case...

P(at least one match) = 1 - P(no matches) = 1 - 5/126 = 121/126 = 96.03%
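
The step-by-step product in the quote above can also be checked with exact fractions rather than by simulation. What follows is only a minimal sketch: the names `Fraction`, `gcdOf`, `probNoMatch` and `probAtLeastOneMatch` are made up for illustration and do not come from either poster's program.

```cpp
#include <cassert>

// Exact fraction, kept in lowest terms by the functions below.
struct Fraction {
    long long num;
    long long den;
};

// Greatest common divisor by Euclid's algorithm.
long long gcdOf(long long a, long long b) {
    while (b != 0) {
        long long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Probability that none of `draws` tiles, pulled without replacement from
// `total` tiles, is one of the `targets` marked tiles.
Fraction probNoMatch(int total, int targets, int draws) {
    assert(total > 0 && targets >= 0 && targets <= total);
    assert(draws >= 0 && draws <= total);
    Fraction p{1, 1};
    for (int i = 0; i < draws; ++i) {
        long long nonTargetsLeft = total - targets - i;  // 5, 4, 3, 2 in the quoted example
        long long tilesLeft = total - i;                 // 9, 8, 7, 6 in the quoted example
        if (nonTargetsLeft <= 0) return Fraction{0, 1};  // a match is forced
        p.num *= nonTargetsLeft;
        p.den *= tilesLeft;
        long long g = gcdOf(p.num, p.den);
        p.num /= g;
        p.den /= g;
    }
    return p;
}

// Complement: probability of at least one match.
Fraction probAtLeastOneMatch(int total, int targets, int draws) {
    Fraction none = probNoMatch(total, targets, draws);
    return Fraction{none.den - none.num, none.den};
}
```

With the numbers from the quote, `probNoMatch(9, 4, 4)` reduces to 5/126 and `probAtLeastOneMatch(9, 4, 4)` to 121/126, matching the hand calculation.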

Principia · 03-29-2003, 01:20 PM

Let me give you an example. I will condense the following verbose and inelegant analysis of yours:

Quote:

Rode takes into account only enough trials to cover the number of coincidences. For example, for a single coincidence, Rode considers only a single trial. Sure, if you are only going to get one shot at an event with a probability of 4/9, then of course your chance of success is 4/9. But that is not the case here. There are four chances to get a single match. For example, one of his single matches is for the amino acid Ala, in which the archaebacteria have joined to it either Ala, Glu, Val, or Leu and the SIPF has joined to it Ala, Pro, Gly, and His. So there were four attempts -- Ala, Glu, Val, and Leu -- at matching any of the four SIPF amino acids. That changes the probability of a single match dramatically: let's take a look.

What we will do first is calculate the probability that none of the four "primitive" amino acids match the SIPF ones, then from it calculate the opposite probability (that at least one would match).

"Primitive" Amino Acid 1: To start with, there are four SIPF targets and nine possible "primitive" amino acids that could be compared to them. So the probability of "primitive" aa-1 matching one of the four target SIPF amino acids is 4/9. Therefore, the probability of its not matching is 1 - 4/9 = 5/9.

"Primitive" Amino Acid 2: Since the probability of this aa is dependent upon the previous one, we have to assume that aa-1 did not match. That leaves eight possible "primitive" amino acids and still four target SIPF ones. So the probability of getting a match here is 4/8, which means the probability of not getting a match is 1 - 4/8 = 4/8 = 1/2.

"Primitive" Amino Acid 3: We have to assume that the previous attempt failed to match, leaving seven "primitive" amino acids and still four target SIPF ones. So the probability of matching here is 4/7, meaning that the probability of a non-match is 1 - 4/7 = 3/7.

"Primitive" Amino Acid 4: Since the last one failed to match also, we are left with six "primitive" amino acids and still have four target SIPF ones. So the probability of a match on this final step is 4/6, which means the probability of a non-match is 1 - 4/6 = 2/6 = 1/3.

To figure out the overall probability -- that is, what is the probability of not getting any matches in four attempts -- we just multiply each of the four individual probabilities for non-matches.

P(no matches) = 5/9 * 1/2 * 3/7 * 1/3 = 5/126

And, looking at the opposite case...

P(at least one match) = 1 - P(no matches) = 1 - 5/126 = 121/126 = 96.03%

down into one paragraph:

P(no matches to first 4 residues) = 5C4/9C4 = 5/126, where nCr is shorthand for n!/(n-r)!/r!.

Therefore, P(at least one match in the first 4 residues) = 1 - P(no matches to first 4 residues) = 1 - 5/126.

And guess what, the analysis is not that much more difficult for the remaining cases.

Anyways, this demonstration is moot beyond supporting what I meant in point 4, since points 1-3 show that your concerns are just mere nitpicking.

DNAunion · 03-29-2003, 01:24 PM

Quote:

Principia: So, really, that you can program in C neither impresses me...

DNAunion: It wasn't C, it was C++. You do know that they are different, right?

Quote:

Principia: ... nor does it address the overall theme of my post.

DNAunion: So? Where did I say it addressed the overall theme of your post? Nowhere.

It did, however, address "derogatory" comments you made in your point #4.

Principia · 03-29-2003, 01:26 PM

Let me put point 4 another way, using another example of DNAunion's:

Quote:

I was tutoring someone tonight in college algebra and one of the problems we had was as follows:

|x^2 - 4| = x - 2

Here's what I did.

1) "Split" it into two equations to eliminate the absolute value sign:

a. x^2 - 4 = x - 2

b. x^2 - 4 = -(x - 2)

2) Recognizing x^2 - 4 as being the difference of two perfect squares, with one of the two factors being the same as the right side of the equation, I factored the left side.

a. (x + 2)(x - 2) = x - 2

b. (x + 2)(x - 2) = -(x - 2)

3) I then divided both sides of the equation by the common (x - 2).

a. [(x + 2)(x - 2)] / (x - 2) = (x - 2) / (x - 2)

b. [(x + 2)(x - 2)] / (x - 2) = -(x - 2) / (x - 2)

4) Reducing leads to:

a. x + 2 = 1

b. x + 2 = -1

5) I wrote a program in GW-BASIC to solve the equations in 4). What I did was run 1 million iterations where a random value between 5 and -5 was plugged into the equations. After all those iterations, for each of equations a. and b. I returned the value that gave the smallest error:
-0.99987
-3.01344

6) Finally, I checked my computed answers by plugging them back into the original equation (the one that has the absolute value in it). None of the "solutions" worked. This appears to indicate that there is no solution to the problem.

However, x = 2 is a solution.

PS: BASIC code will be provided on demand.
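
For comparison, the two branches from step 1 can be solved without dividing by (x - 2), with every candidate root then checked against the original equation. This is just an illustrative sketch; the function names `satisfiesOriginal`, `realRoots` and `solveAbsEquation` are hypothetical and the tolerance of 1e-9 is an arbitrary choice.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// True when x satisfies |x^2 - 4| = x - 2 up to a small tolerance.
bool satisfiesOriginal(double x) {
    return std::fabs(std::fabs(x * x - 4.0) - (x - 2.0)) < 1e-9;
}

// Real roots of a*x^2 + b*x + c = 0 (a must be non-zero).
std::vector<double> realRoots(double a, double b, double c) {
    assert(a != 0.0);
    double disc = b * b - 4.0 * a * c;
    if (disc < 0.0) return {};
    double s = std::sqrt(disc);
    return {(-b + s) / (2.0 * a), (-b - s) / (2.0 * a)};
}

// Solve both branches as quadratics, then keep only the distinct
// candidates that satisfy the original absolute-value equation.
std::vector<double> solveAbsEquation() {
    std::vector<double> solutions;
    // Branch a: x^2 - 4 = x - 2     ->  x^2 - x - 2 = 0
    // Branch b: x^2 - 4 = -(x - 2)  ->  x^2 + x - 6 = 0
    const double branches[2][3] = {{1.0, -1.0, -2.0}, {1.0, 1.0, -6.0}};
    for (const auto& q : branches) {
        for (double x : realRoots(q[0], q[1], q[2])) {
            bool seen = false;
            for (double s : solutions) seen = seen || std::fabs(s - x) < 1e-9;
            if (!seen && satisfiesOriginal(x)) solutions.push_back(x);
        }
    }
    return solutions;
}
```

The candidates are 2 and -1 from branch a and 2 and -3 from branch b. Only x = 2 survives the check, which is the root that the division in step 3 throws away.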

DNAunion · 03-29-2003, 01:28 PM

Quote:

Principia: And guess what, the analysis is not that much more difficult for the remaining cases.

DNAunion: It was those more involved ones that I wrote the program to handle. For example, what is the probability of getting 3 matches when there is a subset of 4 out of 9 SIPF dipeptides and a subset of 5 out of 9 prokaryotic ones? Can you show me how to do that in one simple step?
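
One possible reading of that question is a standard hypergeometric count: 4 of the 9 items are fixed as targets, 5 of the 9 are drawn without replacement, and we ask how many of the drawn items are targets. Under that assumption (which may not be exactly the model either poster had in mind), the nCr shorthand extends directly, as in this sketch. The function names are made up for illustration.

```cpp
#include <cassert>

// Binomial coefficient nCr = n! / ((n - r)! r!), built up multiplicatively
// so that no full factorial is ever formed.
long long nCr(int n, int r) {
    assert(n >= 0);
    if (r < 0 || r > n) return 0;
    if (r > n - r) r = n - r;
    long long result = 1;
    for (int i = 1; i <= r; ++i) {
        result = result * (n - r + i) / i;  // stays an integer at every step
    }
    return result;
}

// Number of ways a draw of `draws` items out of `total` contains exactly
// `k` of the `targets` marked items.
long long waysExactly(int total, int targets, int draws, int k) {
    return nCr(targets, k) * nCr(total - targets, draws - k);
}

// Probability of exactly k matches (hypergeometric distribution).
double probExactly(int total, int targets, int draws, int k) {
    assert(0 <= targets && targets <= total);
    assert(0 <= draws && draws <= total);
    return (double)waysExactly(total, targets, draws, k) / (double)nCr(total, draws);
}

// Probability of k or more matches.
double probAtLeast(int total, int targets, int draws, int k) {
    double p = 0.0;
    for (int j = k; j <= targets && j <= draws; ++j) {
        p += probExactly(total, targets, draws, j);
    }
    return p;
}
```

Under that reading, `probExactly(9, 4, 5, 3)` is 40/126 (about 31.7%) and `probAtLeast(9, 4, 5, 3)` is 45/126 (about 35.7%). As a cross-check, `waysExactly(9, 4, 4, 0)` gives 5 out of `nCr(9, 4)` = 126, the same 5/126 as the no-match case discussed earlier in the thread.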

Principia · 03-29-2003, 01:30 PM

I got so tickled by DNAunion's algebra problem, which he asked me to solve for him, that I decided to construct another analogous problem:

Quote:

Solve x^2-4 = 0:

1) x^2 - 4 = 0

2) x^2 - 4 = 0*(x^2-4)

3) Divide by x^2 - 4 from both sides of the equation

4) 1 = 0

Clearly this can never be the case for any x, therefore my "correct" algebra implies that there are no solutions to the equation x^2 - 4 = 0.

But I know the solutions are 2 and -2.

Principia · 03-29-2003, 01:32 PM

OK, unless there is more substantive discussion on Rode's SIPF hypothesis, I guess I am done with this thread.
