Target selection is the process a lab or individual uses to identify proteins for structure determination studies.
Different laboratories have different reasons for choosing targets for structure determination. Some labs may be trying to solve the structure of individual proteins, while others may wish to target particular protein families. Many structural genomics laboratories have much more flexibility when it comes to selecting targets, and this is generally where 3D-SPECS data will be most applicable. For example, selecting 100 XtalPred class 1 protein regions is statistically likely to result in more structures than selecting 100 XtalPred class 5 protein regions (for how much more likely, see the section 'What do XtalPred Crystallisation Classes mean?').
Target selection is something of a chicken-and-egg problem. How can you select the best targets without first selecting the best constructs/truncations? 3D-SPECS solves this problem by calculating an approximate 'best' construct for each protein, matching human proteins to PDB templates, and then using XtalPred to calculate the likelihood of crystallisation success. By doing this for all human proteins, 3D-SPECS automatically identifies protein candidates with a high predicted crystallisation success. These regions can be optimised by following good construct design principles.
Whether you are trying to solve the structure of one protein or a hundred proteins, the prediction data in 3D-SPECS can be useful for designing constructs that maximise the chance of crystallisation success.
It is important to avoid cutting off any critical secondary structures that are responsible for protein stability or for important protein-protein interactions (e.g. in multimer formation). The tricky part of construct design is knowing which residues are 'critical' and which are not. Generally, the best guide for where to start/stop a protein truncation is to identify a protein in the Protein Data Bank (PDB) that has the same folding arrangement of secondary structures. Then, after accurately aligning the two proteins, begin/end the construct near the start/stop of the PDB template.
The impact of including / excluding certain residues can be estimated by assessing the 3D structure of the PDB template and asking questions like:
If these questions seem a bit daunting, don't worry. By selecting multiple start and stop sites (e.g. 2, 3 or 4 start sites and 2, 3 or 4 stop sites) you can experimentally test between 4 and 16 constructs, depending on your experimental pipeline bandwidth. Around 9 constructs (3x3), with start/stop positions separated by 5-10 residues, makes for a good sampling compromise. There is no magic 'optimum' number of constructs. The more constructs you make, the more chance you have of finding the right region to express/purify/crystallise. At the end of the day, whether a protein region expresses and can be purified is more important than the theoretical construct design, how many constructs were required, or what sampling strategy it took to find the 'expressable' region.
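As a rough illustration of the start/stop sampling described above, the following Python sketch enumerates a grid of constructs around the boundaries of an aligned PDB template. It is not part of 3D-SPECS: the function names, the 1-based residue numbering, the way sites are centred on the template boundary, and the example positions (20, 180, sequence length 200) are all assumptions made for this example.

```python
from itertools import product


def candidate_sites(anchor, n_sites, step, seq_len):
    """Positions spaced `step` residues apart around a template boundary (1-based)."""
    sites = {anchor + step * (i - n_sites // 2) for i in range(n_sites)}
    # Keep only positions that fall inside the query sequence.
    return sorted(s for s in sites if 1 <= s <= seq_len)


def construct_grid(template_start, template_stop, seq_len, n_sites=3, step=5):
    """All (start, stop) pairs sampled around the aligned template start/stop."""
    starts = candidate_sites(template_start, n_sites, step, seq_len)
    stops = candidate_sites(template_stop, n_sites, step, seq_len)
    return [(a, b) for a, b in product(starts, stops) if a < b]


# Hypothetical example: the template aligns to residues 20-180 of a 200-residue query.
constructs = construct_grid(template_start=20, template_stop=180, seq_len=200)
for start, stop in constructs:
    print(f"{start}-{stop} ({stop - start + 1} residues)")
```

With the default of 3 sites at each end this gives the 3x3 = 9 constructs mentioned above; setting `n_sites` to 2 or 4 gives 4 or 16. Sites that would fall outside the sequence are dropped, so the grid can be smaller near the termini.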
3D-SPECS gives the regions with the best alignments to PDB templates on the 'SUMMARY' page in the 'Crystallisable Regions' table. The individual alignment(s) can be viewed under the 'TEMPLATES' tab. The combination of the PDB template information and the secondary structure detail gives the construct designer the opportunity to adjust the start/stop positions to accommodate the unique features of each query sequence.
The 'XTALPRED' tab shows the XtalPred scores for many 'in silico' truncations. This gives the construct designer an indication of whether removal of residues at each terminus will help or hinder crystallisation. However, it is best to combine this information with the PDB template data rather than use the XtalPred scores as a guide on their own. As a general rule, 3D information/knowledge should always take priority over sequence-based predictions (of which XtalPred is an example).
In terms of real-world success, these classes roughly translate into the following percent crystallisation success rates (based on the XtalPred benchmarking in Slabinski et al., Protein Science, 2007, which used structural genomics data from the TargetDB database):
These are the results from testing ~4,000 different proteins. Notice that XtalPred Class 5 has a considerably lower success rate than the other classes. XtalPred will put a protein in class 5 if it has any single bad property; a combination of bad properties is not required. These properties include a long region of disorder (>40 residues), a single TM (transmembrane) helix, or an unusually long or short sequence (longer than 700 or shorter than 70 residues).
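The 'any single bad property' rule can be sketched in Python as below. This is only an illustration of the thresholds quoted above, not the XtalPred implementation: the function name and its inputs are hypothetical, the disorder and TM helix predictions are assumed to come from other tools, 'a single TM helix' is interpreted here as one or more predicted helices, and the real algorithm may check further properties.

```python
def class5_reasons(seq_len, longest_disorder, tm_helices):
    """Return the class 5 triggers described in the text that apply to a protein.

    seq_len          -- sequence length in residues
    longest_disorder -- length of the longest predicted disordered region
    tm_helices       -- number of predicted transmembrane helices
    """
    reasons = []
    if longest_disorder > 40:
        reasons.append("long disordered region")
    if tm_helices >= 1:  # assumption: one or more TM helices counts
        reasons.append("transmembrane helix")
    if seq_len > 700:
        reasons.append("sequence too long")
    if seq_len < 70:
        reasons.append("sequence too short")
    return reasons


# Any non-empty list means at least one property alone is enough for class 5.
print(class5_reasons(seq_len=250, longest_disorder=12, tm_helices=0))
print(class5_reasons(seq_len=250, longest_disorder=55, tm_helices=0))
```

Returning the list of reasons, rather than a plain yes/no, makes it easier to see which terminus or region a truncation would need to remove.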
For more detail, take a look at the local XtalPred page and the references therein, or at the help page of the remote XtalPred website.