Freethought & Rationalism Archive

Protein Folding Question

KC · 01-22-2003, 08:53 AM

I was hoping some Mo Bio types could help me here. Can a protein's folding be directly predicted from the DNA gene sequence for that protein? Any references would be appreciated.

Cheers,

KC

Godbert · 01-22-2003, 10:17 AM

No, we are nowhere near that. IBM is building a $100,000,000 computer for that purpose right now, but it's also a question of software, and the calculation will probably take as long as it would take to get the structure with conventional methods. You can get an approximate structure if you already have the structure of another protein that has very high sequence identity with your protein. But it is still not correct.
I'd say we'll be stuck with NMR and X-ray crystallography to get protein structures at least until 2010. Those methods are currently being automated by lots of firms, so we might actually already have most important structures before we can calculate protein folding.
Furthermore, if you do it experimentally you can be reasonably sure that you have the actual structure, whereas when you calculate the structure you could just be in a false energy minimum. For instance, there could be a chaperone molecule involved that you don't know about and that steers the protein folding in a completely different direction.

Godbert · 01-22-2003, 10:25 AM

Actually, the biggest hope right now is to make a database of all small folds in the structures solved so far (about 20,000), correlated with sequence, and one day have most naturally occurring folds in that database. You would then search the new sequence for sequence homology with those known folds and piece the structure together from those small subfolds, taking into account known global folding patterns. But we'll still need more experimentally determined structures before that can work.

theyeti · 01-22-2003, 10:29 AM

No, it cannot.

And it's not as if people haven't tried. It's considered one of the Holy Grails of biochemistry to be able to predict a protein's 3-D structure from its primary sequence alone, and they're getting better at it all the time, but so far no dice. There are yearly (or less frequent) competitions put on by several groups in which crystallographers withhold their structural data, and the modelers try their best to predict what the structures will be. The results are inconsistent; sometimes one prediction will come close, but with other proteins the same method by the same group will yield a completely wrong answer. There are methods that seem to work in some cases but completely suck in others, and no one method can consistently come close all of the time. No one knows why. If anyone ever figures out how to solve the protein folding problem, that person is guaranteed a Nobel prize.

Currently, the only way to accurately predict a protein's structure is through homology modeling, where you take an already resolved structure for a closely related protein and use that as a template for modeling your own protein. This works reasonably well most of the time as long as your sequence identity is > 50%. Once you go below about 20-30%, you can't really rely on your structure having a relationship with reality anymore. It's still not a bad idea to make one as a working hypothesis, for instance in order to guide mutagenesis experiments, but the structure could be completely different.
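To make the identity thresholds concrete, here is a minimal Python sketch of how percent identity could be computed from two sequences that have already been aligned. The function names, the gap convention ("-"), the example sequences, and the exact cut-offs are all assumptions for illustration; the cut-offs just mirror the rough figures above and are not a standard.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap column: not counted either way
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def template_confidence(identity: float) -> str:
    """Rough label for a homology template; cut-offs are illustrative only."""
    if identity > 50.0:
        return "reasonable template"
    if identity >= 30.0:
        return "intermediate, treat with care"
    return "working hypothesis only"


# Made-up aligned sequences, with "-" marking gaps.
target = "MKT-AYIAKQR"
template = "MKTLAYLAKQ-"
identity = percent_identity(target, template)
print(f"{identity:.1f}% identity: {template_confidence(identity)}")
```

Real tools differ in how they count gaps and which length they divide by, so the same pair of sequences can get somewhat different identity figures depending on the convention.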

If you want an idea of what protein modeling is like and what programs are used, you can check out a project that I did about a year ago here.

theyeti

KC · 01-22-2003, 10:39 AM

Many, many thanks for the replies! I hadn't heard that it was possible yet. Is it possible in theory? That is, should we be able to derive a protein's folding strictly from the DNA sequence, or is folding determined by additional factors?

Cheers,

KC

Principia · 01-22-2003, 11:36 AM

Some relevant review articles:

Quote:

Curr Opin Struct Biol 2002 Apr;12(2):176-81

Ab initio protein structure prediction.

Hardin C, Pogorelov TV, Luthey-Schulten Z.

Center for Biophysics and Computational Biology, University of Illinois, 600 South Mathews Avenue, Urbana, Illinois 61801, USA.

Steady progress has been made in the field of ab initio protein folding. A variety of methods now allow the prediction of low-resolution structures of small proteins or protein fragments up to approximately 100 amino acid residues in length. Such low-resolution structures may be sufficient for the functional annotation of protein sequences on a genome-wide scale. Although no consistently reliable algorithm is currently available, the essential challenges to developing a general theory or approach to protein structure prediction are better understood. The energy landscapes resulting from the structure prediction algorithms are only partially funneled to the native state of the protein. This review focuses on two areas of recent advances in ab initio structure prediction: improvements in the energy functions and strategies to search the caldera region of the energy landscapes.

Quote:

Annu Rev Biophys Biomol Struct 2001;30:173-89

Ab initio protein structure prediction: progress and prospects.

Bonneau R, Baker D.

Department of Biochemistry, University of Washington, Seattle, Washington, Box 357350, 98195, USA. dabaker@u.washington.edu

Considerable recent progress has been made in the field of ab initio protein structure prediction, as witnessed by the third Critical Assessment of Structure Prediction (CASP3). In spite of this progress, much work remains, for the field has yet to produce consistently reliable ab initio structure prediction protocols. In this work, we review the features of current ab initio protocols in an attempt to highlight the foundations of recent progress in the field and suggest promising directions for future work.

Quote:

J Struct Biol 2001 May-Jun;134(2-3):186-90

Functional inferences from blind ab initio protein structure predictions.

Bonneau R, Tsai J, Ruczinski I, Baker D.

Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA.

Ab initio protein structure prediction methods have improved dramatically in the past several years. Because these methods require only the sequence of the protein of interest, they are potentially applicable to the open reading frames in the many organisms whose sequences have been and will be determined. Ab initio methods cannot currently produce models of high enough resolution for use in rational drug design, but there is an exciting potential for using the methods for functional annotation of protein sequences on a genomic scale. Here we illustrate how functional insights can be obtained from low-resolution predicted structures using examples from blind ab initio structure predictions from the third and fourth critical assessment of structure prediction (CASP3, CASP4) experiments.

Quote:

Nature 2002 Nov 14;420(6912):218-23

The structure of the protein universe and genome evolution.

Koonin EV, Wolf YI, Karev GP.

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA. koonin@ncbi.nih.gov

Despite the practically unlimited number of possible protein sequences, the number of basic shapes in which proteins fold seems not only to be finite, but also to be relatively small, with probably no more than 10,000 folds in existence. Moreover, the distribution of proteins among these folds is highly non-homogeneous -- some folds and superfamilies are extremely abundant, but most are rare. Protein folds and families encoded in diverse genomes show similar size distributions with notable mathematical properties, which also extend to the number of connections between domains in multidomain proteins. All these distributions follow asymptotic power laws, such as have been identified in a wide variety of biological and physical systems, and which are typically associated with scale-free networks. These findings suggest that genome evolution is driven by extremely general mechanisms based on the preferential attachment principle.

A guide to protein structure prediction

Resources for protein modeling from Oak Ridge National Laboratory

That's it for now...

theyeti · 01-22-2003, 11:38 AM

Quote:

Originally posted by KC
Many, many thanks for the replies! I hadn't heard that it was possible yet. Is it possible in theory? That is, should we be able to derive a protein's folding strictly from the DNA sequence, or is folding determined by additional factors?

That's a good question. I don't really know for sure, but I suspect that the answer is "yes", but only if one takes into account the subcellular environment. There are proteins that help other proteins fold properly, like chaperonins and foldases, mostly by preventing them from forming aggregates and by getting them out of local energy minima. Whether or not one needs to take these into account in order to predict a protein's structure is questionable. In general I think that a protein's native structure is whatever its global free energy minimum is. There may be exceptions, but if so I can't think of any off hand. The difficulty in protein folding is reaching that global minimum without getting stuck at a local minimum, in which case the protein must often be degraded by proteolysis. I suspect though that the free energy minimum is very dependent on things like pH and the ionic strength of the solution, and these can be quite difficult to predict for crowded subcellular compartments. This is probably why certain modeling approaches work well with some proteins but not with others.
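The local-versus-global minimum problem can be illustrated with a toy sketch in Python. Everything here is an assumption for illustration: the one-dimensional "energy" function is an arbitrary tilted double well, not a protein force field, and plain gradient descent stands in for whatever search a real folding program uses.

```python
def energy(x: float) -> float:
    """Toy tilted double well: a deeper minimum near x = -1, a shallower one near x = +1."""
    return x**4 - 2.0 * x**2 + 0.3 * x


def gradient(x: float) -> float:
    """Derivative of the toy energy function."""
    return 4.0 * x**3 - 4.0 * x + 0.3


def descend(x: float, step: float = 0.01, iterations: int = 5000) -> float:
    """Plain gradient descent: always walks downhill, so it cannot climb over a barrier."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


trapped = descend(1.5)   # starts on the right-hand slope
native = descend(-1.5)   # starts on the left-hand slope

print(f"start  1.5 -> x = {trapped:.3f}, energy = {energy(trapped):.3f}")
print(f"start -1.5 -> x = {native:.3f}, energy = {energy(native):.3f}")
```

The run that starts on the right settles in the shallower well and stays there, even though a lower-energy state exists on the other side of the barrier. Which minimum is found depends only on where the search starts.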

theyeti

lpetrich · 01-22-2003, 02:06 PM

A quick clarification: proteins and DNA are built out of two different kinds of building blocks: amino acids for proteins and nucleotides for nucleic acids (DNA and a close chemical relative, RNA). Sets of three nucleotides (codons) are translated into each amino acid according to a "genetic code" that is nearly constant across the Earth's biota.

So if you have a gene sequence, you can determine the corresponding protein sequence very easily. But as pointed out earlier, it's the folding that's the really difficult part.
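As a rough illustration of how easy the sequence step is, here is a small Python sketch that translates a DNA coding sequence into a protein sequence using the standard genetic code. It assumes the input is the coding strand, already in the right reading frame, with no introns; the names `CODON_TABLE` and `translate` and the example sequence are made up for this sketch.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Short made-up coding sequence ending in a stop codon.
print(translate("ATGGTGCACCTGACTCCTGAGTAA"))  # prints MVHLTPE
```

A table lookup like this is all the sequence step needs. Nothing in it says anything about the 3-D shape the chain will adopt.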

If you wish to take part in a protein-folding distributed-computing effort, check out folding@home.

Strictly speaking, a gene sequence translates into a "polypeptide chain", a single-strand protein. But many proteins are multi-stranded, composed of multiple subproteins, and many proteins also have non-amino-acid "prosthetic groups" attached to them.

Hemoglobin, for example, has this structure:

a b
b a

where a and b are its alpha and beta subunits. And each of these has a heme group attached, a porphyrin group with an iron atom in the middle.

KC · 01-22-2003, 02:12 PM

Quote:

Originally posted by lpetrich
A quick clarification: proteins and DNA are built out of two different kinds of building blocks: amino acids for proteins and nucleotides for nucleic acids (DNA and a close chemical relative, RNA). Sets of three nucleotides (codons) are translated into each amino acid according to a "genetic code" that is nearly constant across the Earth's biota.

So if you have a gene sequence, you can determine the corresponding protein sequence very easily. But as pointed out earlier, it's the folding that's the really difficult part.

Yes. I had been doing some thinking on just how much specification there is in DNA, and I wondered if it was possible to derive the folding structure of a protein from the amino acid chains specified by the DNA sequence.

Cheers,

KC

Godbert · 01-22-2003, 08:49 PM

Quote:

Originally posted by theyeti
In general I think that a protein's native structure is whatever its global free energy minimum is. There may be exceptions, but if so I can't think of any off hand.

This may be more than you actually wanted to know, but I don't think that the above is the case. A local minimum can easily win over the global minimum if its folding path has lower-energy transition states. Faster folding would then steer the protein so far in the direction of the local minimum that reaching the global minimum would be 'impossible'. Additionally, the protein already starts to fold while it is being translated, and very likely not towards the global minimum. Also, it makes sense functionally for proteins not to be too stable, since their function very often requires structural changes.
So I would say simply calculating the global free energy minimum will not give you the right structure in the majority of cases.

PS: Another interesting example is the famous prion protein of BSE, which has two different folded structures. One is that of the normal form occurring in healthy tissue. But when enough of a differently folded version of that protein is supplied, a self-catalysed process occurs that results in all the proteins stably converting to the other, extremely stable folding pattern. This form is so stable that it is not even digested when consumed, but nevertheless it is not the normal, naturally occurring form. Of course, even that does not mean it is at the global minimum.
