Note:
This webpage lists the PDB IDs of all protein sequences used (for training and testing) in the research paper:
L. Lee, J. L. Leopold, R. L. Frank and A. M. Maglia, “Protein Secondary Structure Prediction Using Rule Induction from Coverings”, Proceedings of IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology 2009 (part of IEEE Symposium Series on Computational Intelligence 2009), Nashville, Tennessee, USA.
TABLE I, RESULTS FOR PROTEIN SECONDARY STRUCTURE PREDICTION (Click on links to view PDB IDs)
Training Set

| Folding Type Classes | Total Number of Proteins (SCOP) | Number of Proteins | Number of 5-Residue Segments | Number of Rules (at 90% threshold) |
|---|---|---|---|---|
| All-α | 7,999 | | 47,955 | 203,636 |
| All-β | 12,968 | | 83,187 | 257,911 |
| α/β | 12,199 | | 107,900 | 319,361 |
| α+β | 11,425 | | 137,715 | 346,379 |
Test Set

| Folding Type Classes | Number of Proteins | Number of Residues | Q3 (%) |
|---|---|---|---|
| All-α | | 10,151 | 88.7 |
| All-β | | 17,627 | 80.2 |
| α/β | | 20,810 | 77.0 |
| α+β | | 12,379 | 78.9 |
| Total | 223 | 60,967 | 80.3 |
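
The Total row of the test set can be cross-checked against the per-class rows. The short Python sketch below sums the residue counts and computes a residue-weighted mean of the per-class Q3 values. It assumes that the overall Q3 was obtained by weighting each class by its number of residues, which this page does not state explicitly. The variable names are illustrative only.

```python
# Per-class test-set figures copied from the table: (residues, Q3 in percent).
test_set = {
    "All-α": (10_151, 88.7),
    "All-β": (17_627, 80.2),
    "α/β": (20_810, 77.0),
    "α+β": (12_379, 78.9),
}

# Total number of residues across the four folding type classes.
total_residues = sum(n for n, _ in test_set.values())

# Assumption: the overall Q3 is the residue-weighted mean of the class values.
weighted_q3 = sum(n * q3 for n, q3 in test_set.values()) / total_residues

print(total_residues)
print(round(weighted_q3, 1))
```

Under this assumption, both printed values agree with the Total row (60,967 residues and a Q3 of 80.3%).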