Freethought & Rationalism Archive

Mageth · 07-10-2003, 10:29 AM

Originally posted by DigitalChicken
Well of course. That's why I used non-absolute terms.

I understood that you knew that; my comment was meant to clarify the point to others.

What you will find is that in a great number of cases.. if not a vast majority... those considerations are irrelevant in this day and age.

That's true; most of my bad experiences with optimizing compilers came a few years back.

Yes there is. The social context in which the code is written.

I may program something that I will never touch again and some unknown person may pick it up. In that case, its of prime importance that the code be understandable.

In my comment, I should have said "However, there's often no reason...". And I absolutely agree that the code should be understandable; I just don't think it's necessarily true that code must be made more complex to be more understandable. (I think we agree on this).

That may be the case but its not always the case by far. It also depends on what is meant by "complex." For example, seperating a multi-step process into individual code blocks which are commented well may be less "complex" by one definition than another case where the same multi-step process might be written in a single "elegant" code block.

I agree, I think.

By "simplifying and streamlining" code, I mean removing unnecessary statements and complexity (note that I don't mean squeezing the code down as short as possible), and also clarifying the code (preferably making it easier to understand and more easily modifiable - an important goal which has been implied but not mentioned) by performing such actions as separating a multi-step process into individual code blocks where appropriate.

sophie · 07-10-2003, 12:51 PM

NialScorva, Of course you are right.

When I was professional I used to call it virtual contiguity.

DNAunion · 07-10-2003, 07:53 PM

Quote:

Sophie: The claim elements of an array are stored linearly and contiguously is not necessarily true.

DNAunion: Sorry, but it is true. Anyone can run these two simple tests to see that the memory locations used by a C/C++ array are indeed contiguous and linear.

Code:

// ArrayContig.cpp

#include <iostream>

using std::cout;
using std::endl;

int main()
{
	const int ELEMENTS = 20;
	int lcv, *ptrInt;
	int myInts[ELEMENTS];

	for (lcv = 0; lcv < ELEMENTS; ++lcv)
	{
		myInts[lcv] = lcv;
	}

	ptrInt = myInts;

	for (lcv = 0; lcv < ELEMENTS; ++lcv)
	{
		cout << *(ptrInt + lcv);
		cout << " stored at " << (ptrInt + lcv) << endl;
	}

	cout << endl;
	
	return 0;
}

DNAunion: Or this modified form for two-dimensional arrays.

Code:

// ArrayContig.cpp

#include <iostream>

using std::cout;
using std::endl;

int main()
{
	const int ROWS = 5;
	const int COLS = 3;
	int row, col;
	int myInts[ROWS][COLS];


	for (row = 0; row < ROWS; ++row)
	{
		for (col = 0; col < COLS; ++col)
			myInts[row][col] = row * COLS + col;
	}

	
	for (row = 0; row < ROWS; ++row)
	{
		for (col = 0; col < COLS; ++col)
		{
			cout << myInts[row][col];
			cout << " stored at " << &myInts[row][col] << endl;
		}
	}

	cout << endl;

	return 0;
}

Quote:

Sophie: The appearance of contiguity is achieved through memory mapping.

DNAunion: The mapping of any interest occurs between the logical layout of a multi-dimensional array and the actual layout of memory. For example, for a two-dimensional array, the address of any single element is mapped from its indices to RAM as follows (using �pseudocode�):

thisElementsMemoryLocation = memoryAddressOfBaseOfArray + [((thisElementsRowIndex * COLUMNS) + thisElementsColIndex) * sizeof(oneElement)]

*****************************************

DNAunion: In addition to the above statements by me, here are some authors in the fields saying the same. Let�s start with the simplest, most straightforward quotes.

Quote:

�An array is a group of contiguous memory locations that all have the same name and type.� (C# A Programmer�s Introduction, Deitel & Deitel, Prentice Hall, 2003, p197)

Quote:

�An array is a collection of one or more variables of the same type stored in consecutive memory locations.� (Web Programming Languages Sourcebook, Gordon McComb, John Wiley & Sons Inc, 1997, p417)

Quote:

�An array is a consecutive group of memory locations that all have the same name and the same type. To refer to a particular location or element in the array, we specify the name of the array and the position number of that particular element in the array.� (C++ How To Program: Fourth Edition, Deitel & Deitel, Prentice Hall, 2003, p253)

Quote:

�Arrays
Arrays are sequences of homogeneous elements arranged consecutively in memory.� (Mastering Algorithms with C, Kyle Loudon, O�Reilly & Associates, 1999, p16)

Quote:

�Pointer arithmetic is meaningless unless performed on a pointer that points to an array. We cannot assume that two variables of the same type are stored contiguously in memory unless they are adjacent elements of an array.� (C++ How To Program: Fourth Edition, Deitel & Deitel, Prentice Hall, 2003, p343)

DNAunion: And finish off with some that are a little more detailed/involved.

Quote:

�The following declaration, for example, introduces a 3x3 matrix, each of whose elements is of type double:

double mat[3][3];

Conceptually, the storage for mat forms a two-dimensional structure in which the individual elements are laid out like this:

mat[0][0]�.mat[0][1]�.mat[0][2]
mat[1][0]�.mat[1][1]�.mat[1][2]
mat[2][0]�.mat[2][1]�.mat[2][2]

Internally, C represents the variable mat as an array of three elements, each of which is an array of three floating-point values. The memory allocated to mat consists of nine cell arranged in the following form:

--------------begin of mat[0]
mat[0][0]
mat[0][1]
mat[0][2]
--------------begin of mat[1]
mat[1][0]
mat[1][1]
mat[1][2]
--------------begin of mat[2]
mat[2][0]
mat[2][1]
mat[2][2]

In the two-dimensional diagram, the first index is assumed to indicate the row number. This choice, however, is arbitrary because the two-dimensional geometry of the matrix is entirely conceptual; in memory, these values form a one-dimensional list.� (Programming Abstractions in C: A Second Course in Computer Science, Eric S. Roberts, Addison-Wesley, 1998, p75-76)

Quote:

�For example, if an array or pointer contains the address 0x10000000, at which a sequence of five 4-byte integers is stored, a[3] accesses the integer at address 0x1000000c. This address is obtained by adding (3)(4) = 12[base 10] = c[base 16] to the address 0x10000000. On the other hand, for an array or pointer referencing twenty characters (a string), a[3] accesses the character at address 0x1111113. This address is obtained by adding (3)(1) = 3[base 10] = 3[base 16] to the address 0x10000000.� (Mastering Algorithms with C, Kyle Loudon, O�Reilly & Associates, 1999, p16)

Quote:

�Assume that array int v[5] has been declared and that its first element is at location 3000 in memory. Assume that pointer vPtr has been initialized to point to v[0] (i.e., that the value of vPtr is 3000). Figure 5.18 diagrams this situation for a machine with four-byte integers.

[that diagram shows the following information:
&v[0] = 3000
&v[1] = 3004
&v[2] = 3008
&v[3] = 3012
&v[4] = 3016]

Note that vPtr can be initialized to point to array v with either of the following statements.

vPtr = v;
vPtr = &v[0];

In conventional arithmetic, the addition 3000 + 2 yields the value 3002. This is normally not the case with pointer arithmetic. When an integer is added to, or subtracted from, a pointer, the pointer is not simply incremented or decremented by that integer, but by that integer times the size of the object to which the pointer refers. The number of bytes depends on the object�s data type. For example, the statement

vPtr += 2; [or vPtr = vPtr + 2;]

would produce 3008 (3000 + 2 * 4), assuming that an int is stored in four bytes of memory. In the array v, vPtr would now point to v[2]. If an integer is stored in two bytes of memory, then the preceding calculation would result in memory location 3004 (3000 + 2 * 4).� (C++ How To Program: Fourth Edition, Deitel & Deitel, Prentice Hall, 2003, p341-342)

DNAunion: Well, that should take care of that.

DNAunion · 07-10-2003, 08:05 PM

Quote:

Sophie: I must issue a trashing warning :

DNAunion: And I must reject it as misguided.

Quote:

Sophie:

[portions of DNAunion�s code]
nFoundOne = 0;
while (nFoundOne == 0)
{
nIndex = (int) GetRandomNumber(0, nLetteredTiles - 1);
...
}

The set of solutions which use randomization to access every element in a set once, is prone to what is termed trashing. This means if one randomly picks numbers from 1..n, it is not true that n tries will eventually visit every every element.

DNAunion: And who says that�s what the code is attempting to do? Not the programmer. Not the code. And not the comments.

For people who can�t read code that well, I included comments (perhaps you should have read them). Just above the executable statement prior to the code you extracted, we see the following:

Quote:

DNAunion:

Code:

// Choose x number of tiles from the Urn to serve as targets

DNAunion: Do you see anywhere in that comment, or in the code that follows, that shows me trying to access every element in the array by generating random numbers? Nope.

Trashing charge rejected.

Quote:

Sophie: A true randomiser may never pick one entry and this may entail your program may run into the infinity clause, even though you think it may not.

DNAunion: Nope again. My program won�t �run into the infinity clause�, for one reason, because the code�s not even attempting to do what you think. The code is simply attempting to randomly pick 4 different tiles out of 9 as targets. Unless the code continues to pick from the same set of 1, 2, or 3 tiles, over and over and over and over and over and over and over and over and over and over and over and over and over and over and over and over and � such that it never picks 4 different tiles out of 9, it will succeed �eventually� (more likely than not, in only a few trials) and then be on its merry way.

DNAunion · 07-10-2003, 08:10 PM

Quote:

DNAunion: But the two subscripts are misleading�there is no real element [5, 1] in the array. That is, to access element [5, 1] the computer does not go down 6 rows and then over to the right 1 [sic, should be 2]. No, elements of an array are stored linearly and contiguously. You can demonstrate this for yourself by using pointer arithmetic instead of subscripts.

Quote:

Mageth: Umm, I learned this, what, 22 years ago in my intro programming classes.

DNAunion: Great, so both you and I know it. Now we just have to convince Sophie, and possibly NialScorva too, of the truth we both learned in intro programming classes.

Quote:

Mageth: And like sophie indicated, a programmer should never assume how the data's being stored.

DNAunion: Wrong. If two elements are stored in the same array, then you can be certain where element x + 1 is relative to element x. Don�t take my word for it?

Quote:

� We cannot assume that two variables of the same type are stored contiguously in memory unless they are adjacent elements of an array.� (C++ How To Program: Fourth Edition, Deitel & Deitel, Prentice Hall, 2003, p343)

DNAunion: So I am not only correct to assume that the elements in that array were stored contiguously, I am also correct to confidently state that there were.

Perhaps what you are trying to say is that when writing code I shouldn�t rely upon being able to get by with one-off errors because of the linear and contiguous arrangement of array elements. So? I never claimed I should either. Remember, I FOUND THE OFF-BY-ONE ERROR IN MY OWN CODE. You know, just like millions of other programmers do. The difference here is that I could not modify my code to correct it because these boards have a 120-minute maximum edit window. I mean, come on, it�s not like I intentionally overran the array by 1.

Oh yea, let me point out again that the error � which I myself found - was introduced by hastily translating between languages. There is no off-by-one error in the original.

DNAunion · 07-10-2003, 08:13 PM

Quote:

DNAunion: PPS: That mistake does not have any negative impact on the accuracy of the program's results...as both my and [somebody's] runs demonstrated. I wonder if [somebody] could explain to us why?

Quote:

Mageth: Blind luck?

Quote:

DNAunion: No, because of the way memory is allocated for array variables.

Quote:

Mageth: No, it's blind luck.

DNAunion: No, it�s not. But let me first better explain what I was actually asking, using biology and an analogy for a minute.

As frequently explained, a frameshift mutation occurs when a single nucleotide is inserted into a gene, throwing off the reading frame that processes the (triplet) codons. As an analogy, the codons in a gene could be represented as the individual three-letter words in the following, meaningful sentence:

|THE BIG RED FOX ATE THE BIG FAT HEN|

A frameshift mutation would occur if a single nucleotide (in the analogy, a single letter) were inserted into the sequence, as follows:

|XTH EBI GRE DFO XAT ETH EBI GFA THE| N

In this example, the single off-by-one shift turns every codon into nonsense (well, for one of the 9 it �merely� changes its meaning completely). A perfectly sensible sentence is transformed into complete �garbage�.

My question was aimed at the kind of catastrophe. The error that I myself found in my hastily done translation (note that the error was not present in the actual, original program), basically inserted a single blank element at the beginning of the array (the �gene�) made up of rows each consisting of three elements (triplet �codons�). Why then didn�t a �frameshift mutation� catastrophe occur?

The reason, as I stated, is that the two-dimensional array exists as a two-dimensional array only conceptually. In actuality, the elements are store contiguously in linear memory. Going back to the biology analogy that used a sentence, it is more a matter of:

|THE BIG RED FOX ATE THE BIG FAT HEN|

Becoming

|_THE BIG RED FOX ATE THE BIG FAT HE| N

In addition to an insertion being made, the processing of the �codons� was also shifted by the same amount, keeping the individual triplets together. Unlike in the former frameshift mutation example, were there was no compensating change, here, �96%� of the original meaning is still retained. It�s not a catastrophe.

The only problem that needs to be considered, as I said, is the very last element. (Concerning that last element, originally, I didn�t analyze the possibilities thoroughly enough and someone else spotted something I overlooked. That�s a different issue that I will address in a separate post).

Perhaps now you can see why I replied �No� when you offered just �Blind luck?� as the explanation. There�s an actual, logical, theoretical, explainable, honest-to-goodness, real reason that underlies the non-catastrophe: blind luck isn�t it.

DNAunion · 07-10-2003, 08:43 PM

Quote:

NialScorva: A multidimensional array is actually an array of arrays.

DNAunion: No it�s not...at least not the multidimensional array found in my program. It is, as I said, conceptually a two-dimensional, 9x3 �table� but exists in memory as a linear series of 27 elements stored contiguously.

You are changing the topic. Each element of my array is a single instance of a single primitive data type and the memory for the entire array is allocated at one time, at compile time, in one chunk of � shall I say it? Yes! � contiguous memory locations.

What you in your �they don�t have to be contiguous� section have is the first dimension containing pointers and each of the second set of arrays allocated dynamically, at runtime, using the new operator. That�s like comparing apples and oranges.

Quote:

NialScorva:

char foo[10][10];

is an array of 10 blocks of 10 contiguous bites. *Usually* these are stored contiguously, so that foo[0][9] and foo[1][0] are adjacent, but there is no such requirement in the language spec.

DNAunion: No, the single statement char foo[10][10]; does allocate 100 contiguous bytes of memory for that two-dimensional array, and foo[0][9] is adjacent in memory to foo[1][0].

Quote:

NialScorva: It's perfectly valid for foo to point to an array of pointers to arrays. In fact, you can see this in the similar notation:

*********************
char* x[10];
for(int i=0;i<10;i++)
x[i ]=new char[10];
*********************

You can then access x[0][1] and it should work and look just like a multidimensional array, though only the blocks of 10 in the final subscript are likely to be contigent.

DNAunion: No, all of the char pointers in the 10-element one-dimensional array x are stored contiguously.

And, each of the 10-char arrays that is allocated dynamically with new also consists of 10 contiguous elements.

The only �array things� that are not (or at least may not be) stored contiguously in memory are those 10 individual, completely separate 10-char arrays generated at different times, on the fly.

Quote:

NialScorva: Strictly speaking, the only assumption you can make is that the last subscript will be contiguous (eg pointer math will work within it's bounds) or it's possible that your program will break if you change compilers.

DNAunion: Anyone with an ANSI Standard-compliant C++ compiler can run this code and see that the elements of the first dimension � which stores pointers � are stored contiguously in memory.

Code:

#include <iostream>

using std::cout;
using std::endl;

int main() 
{
	int i;

	// purely fictitious person
	char *record[][2] = 
	{
		"fname", "Arnold", 
		"mi", "G.",
		"lname", "Miller", 
		"occupation", "auto mechanic.",
		"weight", "155 pounds",
		"height", "5' 2\"",
		"address", "96-X Frontage Road",
		"city", "Los Angeles",
		"state", "ME",
		"", ""
	};

	for(i = 0; *record[i][0]; i++) 
	{
		cout << &record[i][0] << "  ";
		cout << record[i][1] << endl;
	}

	return 0;
}

DNAunion · 07-10-2003, 09:17 PM

Quote:

LeftCoast: It is never a good idea to bet on your own irreplacability. No one is so valuable that they cannot be replaced.

DNAunion: Actually, to be honest, I've had to explain to others that I'm not irreplaceable ("You know [DNAunion], they could never get anyone to do what you do"). I am quite aware of how quickly one can be sent packing...I've seen it happen time after time where I work. Above, I qualified my original statements some...perhaps a bit more would have been better...BUT I WAS RUNNING LATE AND HAD TO GET BACK TO WORK! :-)

DNAunion · 07-10-2003, 09:41 PM

Quote:

DNAunion: The only issue that might arise is that the very last element overshoots the array bounds by 1 element�s worth. But, as long as the compiler doesn�t override my sequence of variable declarations to try some optimization, that doesn�t occur. All of the variables are defined at the very beginning of main(), and the last one defined is the array. So the array should be the last variable that memory is reserved for and overshooting the upper bound by one element does not cause a problem.

Quote:

NialScorva: Except that local variables stored on the stack, so the first time you call a subroutine you'll overwrite the last element with one of the registers.

DNAunion: Yes, you've got something there. In my haste to check the array out, I forgot the program even made function calls so completely overlooked the possibility you bring up.

But now you go way off course.

Quote:

NialScorva: If you were to access that last element inside of the subroutine, then it would gleefully overwrite the saved state so you'd get undefined behavior when you returned from the function and restored the registers to a state different than when you went in.

DNAunion: And if the moon crashed into the Earth, you and the rest of us would all be dead.

Of course, neither of our statements deals with the actual facts of the situation. In other words, my "random number" function does NOT access the array so your "point" is imaginary. If anyone is going to offer up negative comments on my code, they at least need to deal with reality, not sensationalism.

Quote:

NialScorva: A bigger problem is if the registers are preserved after the call is made. The block you're overwriting is the return address in that case, meaning that you have no idea where code will be executed when you return from the function.

DNAunion: Which again is not even possible with my program.

You are as much as stuffing code into my mouth. You aren�t offering hypotheticals when you say �you�re overwriting� � that�s stating a fact (which, of course, isn�t a fact); and when you say �you have no idea where code will be executed� � again, stated as fact when it is not even possible for my code to do. And even we accept them as hypotheticals (which we shouldn't considering how many times you state things as "fact"), they don't even pertain to my code and so are irrelevant.

Quote:

NialScorva: Not only that, but you'll corrupt the stack and be unable to see where your program is crashing.

DNAunion: In YOUR imaginary code that you are stuffing into my mouth, yeah, that might happen.

Quote:

NialScorva: If your array is on the heap, you have similar problems.

DNAunion: And if the sun crashed into the Earth, you and the rest of us would be dead.

Quote:

NialScorva: Once your heap is corrupted, all bets are off. Your program will run fine 9 times out of 10, but then crash hard.

DNAunion: No, once YOUR heap is corrupted by YOUR imaginary code, the YOU will have problems.

Quote:

NialScorva: This is *not* easily debugged.

DNAunion: Than I advise you to stop thinking about doing it!

DNAunion · 07-11-2003, 04:39 AM

DNAunion: Good, no one else spotted this yet. Here's another example of my finding my own error but not being able to correct it because of the 120-minute window for editing.

The following code does not do what I suggest. I don't have time to rewrite it now (I was just shaving and thought to myself, "Hey, that code doesn't do what I said it did" so rushed here to point it out before anyone else did.).

Here's the code that should not be criticized.

****************************************
DNAunion: Anyone with an ANSI Standard-compliant C++ compiler can run this code and see that the elements of the first dimension � which stores pointers � are stored contiguously in memory.

Code:

#include <iostream>

using std::cout;
using std::endl;

int main() 
{
	int i;

	// purely fictitious person
	char *record[][2] = 
	{
		"fname", "Arnold", 
		"mi", "G.",
		"lname", "Miller", 
		"occupation", "auto mechanic.",
		"weight", "155 pounds",
		"height", "5' 2\"",
		"address", "96-X Frontage Road",
		"city", "Los Angeles",
		"state", "ME",
		"", ""
	};

	for(i = 0; *record[i][0]; i++) 
	{
		cout << &record[i][0] << "  ";
		cout << record[i][1] << endl;
	}

	return 0;
}

********************************

07-10-2003, 12:51 PM	#52
sophie Veteran Member Join Date: May 2003 Location: On the road to extinction. . . Posts: 1,485	contiguous NialScorva, Of course you are right. When I was professional I used to call it virtual contiguity.

07-11-2003, 04:39 AM	#60
DNAunion Veteran Member Join Date: Jan 2001 Location: USA Posts: 1,072	DNAunion: Good, no one else spotted this yet. Here's another example of my finding my own error but not being able to correct it because of the 120-minute window for editing. The following code does not do what I suggest. I don't have time to rewrite it now (I was just shaving and thought to myself, "Hey, that code doesn't do what I said it did" so rushed here to point it out before anyone else did.). Here's the code that should not be criticized. **************************************** DNAunion: Anyone with an ANSI Standard-compliant C++ compiler can run this code and see that the elements of the first dimension � which stores pointers � are stored contiguously in memory. Code: #include <iostream> using std::cout; using std::endl; int main() { int i; // purely fictitious person char record[][2] = { "fname", "Arnold", "mi", "G.", "lname", "Miller", "occupation", "auto mechanic.", "weight", "155 pounds", "height", "5' 2\"", "address", "96-X Frontage Road", "city", "Los Angeles", "state", "ME", "", "" }; for(i = 0; record[i][0]; i++) { cout << &record[i][0] << " "; cout << record[i][1] << endl; } return 0; } ********************************

Thread Tools	Search this Thread
Show Printable Version	Search this Thread: Advanced Search

07-10-2003, 10:29 AM	#51
Mageth Contributor Join Date: Jul 2001 Location: Deep in the heart of mother-lovin' Texas Posts: 29,689	Originally posted by DigitalChicken Well of course. That's why I used non-absolute terms. I understood that you knew that; my comment was meant to clarify the point to others. What you will find is that in a great number of cases.. if not a vast majority... those considerations are irrelevant in this day and age. That's true; most of my bad experiences with optimizing compilers came a few years back. Yes there is. The social context in which the code is written. I may program something that I will never touch again and some unknown person may pick it up. In that case, its of prime importance that the code be understandable. In my comment, I should have said "However, there's often no reason...". And I absolutely agree that the code should be understandable; I just don't think it's necessarily true that code must be made more complex to be more understandable. (I think we agree on this). That may be the case but its not always the case by far. It also depends on what is meant by "complex." For example, seperating a multi-step process into individual code blocks which are commented well may be less "complex" by one definition than another case where the same multi-step process might be written in a single "elegant" code block. I agree, I think. By "simplifying and streamlining" code, I mean removing unnecessary statements and complexity (note that I don't mean squeezing the code down as short as possible), and also clarifying the code (preferably making it easier to understand and more easily modifiable - an important goal which has been implied but not mentioned) by performing such actions as separating a multi-step process into individual code blocks where appropriate.

Freethought & Rationalism Archive

The archives are read only.