Huntingtin linker sequence determination by computational methods – correspondence with Alex Holehouse

I firmly believe that the more scientists discuss their data with experts both within their field and outside of it, the better! Without peer review, constructive criticism and comprehensive evaluation of the data we all generate, we cannot understand its limitations or its value to others working in the same field.

In this vein, I was delighted to receive an email today from Alex Holehouse, a graduate student at Washington University in St. Louis, who is supervised by Rohit Pappu. Alex specialises in computational biophysics, with a specific interest in intrinsically disordered proteins, which he detects and analyses using a variety of computational methods. In particular, his lab have recently set up the CIDER online server, which allows users to determine these regions of disorder themselves.

Alex sent me his thoughts on the huntingtin protein as well as some data highlighting key regions of intrinsic disorder within the protein sequence:

I was reading through your recent posts on mapping domains in Htt and was intrigued – I’d always got it in my head that Huntingtin is fully disordered – i.e. the entire protein doesn’t adopt a specific tertiary structure but exists in an ensemble of different conformations (this is certainly true for Exon1, which is all we work with). 
However, this definitely doesn’t seem to be the case and it really looks like there are specific folded domains (based on the Phyre predictions, the initial proteolysis results and the sequence). I spend most of my time thinking about sequence-to-conformation relationships, and while determining a 3D folded structure from sequence is hard (slight understatement), one can often a) tell whether a region is likely to be disordered and b) get some general ideas about what a disordered region’s conformational behaviour is likely to be.
I was looking through the full Htt sequence, and while the majority of the protein outside exon 1 looks structured (with the exception of a region from ~400-670), there are a few short, really well-defined regions which would – based on sequence at least – be strong contenders for expanded and unstructured linkers. I’ve put together a figure (attached) which basically gives a map of the sequence with respect to a few different properties of interest.
The local disorder track describes the local sequence disorder propensity using IUPred-long [1]. We’ve worked with a bunch of different predictors over the years and IUPred generally does a reasonable job (on the whole, disorder predictors are actually pretty good). The disordered regions here are consistent with the meta-prediction from D2P2, which combines a whole host of disorder predictors. The takeaway from this track is that beyond ~700 there are very few regions predicted to be disordered except for these well-defined and pretty short stretches (10-30 residues in length) – this pattern is consistent with what you’d expect for folded domains connected together by disordered linkers.
The local charge density track quantifies what fraction of residues in a 20-residue sliding window are charged. Many of the places where we see these disordered regions in Htt show a spike in charge density, consistent with them being enriched in charged residues. Enrichment for charge is a common feature of expanded linkers – by which I mean linkers which are more stretched out than one might expect for a random coil, suggesting that these short stretches are not just disordered, but are actually expanded in terms of their conformational ensemble.
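As a rough sketch of the kind of calculation this track describes (not the code used to produce the figure), the fraction of charged residues can be computed over a sliding window as follows. The function name, the toy sequence and the choice to count only Asp/Glu/Lys/Arg as charged are assumptions made for illustration.

```python
# Residues counted as charged. Assumption: histidine is not included.
CHARGED = set("DEKR")


def local_charge_density(sequence, window=20):
    """Return the fraction of charged residues in each full-length window."""
    seq = sequence.upper()
    if len(seq) < window:
        return []
    flags = [1 if aa in CHARGED else 0 for aa in seq]
    count = sum(flags[:window])
    densities = [count / window]
    # Slide the window one residue at a time, updating the running count.
    for i in range(window, len(seq)):
        count += flags[i] - flags[i - window]
        densities.append(count / window)
    return densities


# Toy sequence (hypothetical, not taken from huntingtin).
toy = "AAAAAAAAAA" + "DEDEDEDEDE" + "AAAAAAAAAA"
profile = local_charge_density(toy, window=10)
```

Each value in `profile` corresponds to the window starting at that position, so a spike marks a stretch enriched in charged residues.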
The local hydrophobicity track uses the Kyte-Doolittle hydrophobicity table [2] to quantify the hydrophobicity within a 20-residue sliding window. While most of the protein shows a fairly high degree of hydrophobicity (consistent with folded regions), the disordered stretches show a substantial drop in hydrophobicity. This isn’t too surprising – if we see an increase in the fraction of charged residues we would expect a change in composition towards less hydrophobic residues, but this just serves as an alternative way to think about the local sequence composition.
Finally, the local net charge track quantifies the net charge by considering the net impact of Asp/Glu/Arg/Lys residues. As well as showing a high charge density (as shown by the local charge density track), the local net charge track shows that most of the linkers have an overabundance of negatively charged residues, leading to a strong net-negative charge. This means they have a polyelectrolytic character – i.e. they’re self-repulsive due to the net negative charge, which further suggests these would be expanded regions (i.e. it’s hard to push a whole bunch of negatively charged residues together).
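A windowed hydrophobicity profile of this kind might be sketched as below. The table holds the standard Kyte-Doolittle hydropathy values; averaging over the window (rather than summing, or rescaling the values) is an assumption, as are the function name and the toy sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def local_hydrophobicity(sequence, window=20):
    """Return the mean Kyte-Doolittle score of each full-length window.

    Raises KeyError for any residue outside the 20 standard amino acids.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Toy sequence (hypothetical): a hydrophobic stretch followed by an acidic one.
hydro_profile = local_hydrophobicity("IIIII" + "DDDDD", window=5)
```

On a real sequence, folded regions would be expected to sit at higher values and charged linkers to show up as dips.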
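The net charge over a window can be sketched in the same way, counting Lys/Arg as +1 and Asp/Glu as -1. Reporting a raw count per window rather than a per-residue value is an assumption, and the names and toy sequence are illustrative only.

```python
POSITIVE = set("KR")  # Lys, Arg
NEGATIVE = set("DE")  # Asp, Glu


def local_net_charge(sequence, window=20):
    """Return (number of K/R) minus (number of D/E) for each full-length window."""
    charges = [
        1 if aa in POSITIVE else -1 if aa in NEGATIVE else 0
        for aa in sequence.upper()
    ]
    return [
        sum(charges[i:i + window])
        for i in range(len(charges) - window + 1)
    ]


# Toy sequence (hypothetical): four Asp followed by two Lys.
net_profile = local_net_charge("DDDDKK", window=3)
```

A window can have a high charge density but a net charge near zero if positive and negative residues balance, which is why the two tracks are complementary.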
Having looked at these more general properties, I went back to examine the actual amino acid sequence of the various disordered regions (shown below). One thing that struck me was that as well as being enriched for charged residues, these short stretches also have a high abundance of serine, proline and glycine residues, the residues most often observed in linkers.
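One simple way to check this kind of enrichment is to compute the fraction of serine, proline and glycine in a candidate linker segment and compare it with the rest of the sequence. The sketch below assumes the segment has already been extracted as a string; the function name and toy segment are hypothetical.

```python
from collections import Counter


def residue_fractions(segment, residues="SPG"):
    """Return the fraction of each requested residue within a segment."""
    seg = segment.upper()
    if not seg:
        return {aa: 0.0 for aa in residues}
    counts = Counter(seg)
    return {aa: counts[aa] / len(seg) for aa in residues}


# Toy segment (hypothetical, not a real Htt linker).
fractions = residue_fractions("SSPGAAAAAA")
combined = sum(fractions.values())  # total S + P + G fraction
```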
Taken together, my hunch is that whatever domain structure Htt has, at least some of these disordered stretches represent inter-domain linkers that may help you define where the folded domains lie along the sequence. Which is not to say there couldn’t be multiple folded domains within the large ordered regions, but I’d be very surprised if these apparently disordered regions were located within a folded domain.
I agree with Alex’s assertions, and his predicted intrinsically disordered regions correlate well with the interdomain regions I have calculated from the limited proteolysis experiments and the 3D structure prediction analyses. I too am concerned that the disordered regions Alex refers to, which I see as proteolytically accessible, could just be longer loop regions connecting a continuous higher-order alpha-helical structure. Hopefully, expression analyses of the putative domains will provide evidence for or against this notion. What we really need is a high-resolution structure of the complete protein, but this remains rather challenging!
