RT-RICO: Protein Secondary Structure Prediction Using Rule Induction from Coverings

Note:

This webpage lists the PDB IDs of all protein sequences used for training and testing in the following research paper:

L. Lee, J. L. Leopold, R. L. Frank and A. M. Maglia, “Protein Secondary Structure Prediction Using Rule Induction from Coverings”, Proceedings of IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology 2009 (part of IEEE Symposium Series on Computational Intelligence 2009), Nashville, Tennessee, USA.


TABLE I, RESULTS FOR PROTEIN SECONDARY STRUCTURE PREDICTION
(Click on links to view PDB IDs)

Training Set

Folding Type Classes   Total Number of Proteins (SCOP)   Number of Proteins   Number of 5-Residue Segments   Number of Rules (at 90% threshold)
All-α                  7,999                             199                  47,955                         203,636
All-β                  12,968                            323                  83,187                         257,911
α/β                    12,199                            304                  107,900                        319,361
α+β                    11,425                            567                  137,715                        346,379

Test Set

Folding Type Classes   Number of Proteins   Number of Residues   Q3 (%)
All-α                  40                   10,151               88.7
All-β                  65                   17,627               80.2
α/β                    61                   20,810               77.0
α+β                    57                   12,379               78.9

Total                  223                  60,967               80.3
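
The Total row of the test set can be cross-checked against the four per-class rows. The page does not say how the overall Q3 was obtained, so the sketch below assumes it is the residue-weighted mean of the per-class Q3 values. Under that assumption the figures in the table reproduce the reported totals. The variable names are illustrative only and do not come from the paper.

```python
# Per-class test-set figures copied from Table I:
# (folding type class, number of proteins, number of residues, Q3 in %).
test_set = [
    ("All-α", 40, 10_151, 88.7),
    ("All-β", 65, 17_627, 80.2),
    ("α/β", 61, 20_810, 77.0),
    ("α+β", 57, 12_379, 78.9),
]

# Column sums for the Total row.
total_proteins = sum(proteins for _, proteins, _, _ in test_set)
total_residues = sum(residues for _, _, residues, _ in test_set)

# Assumption (not stated on the page): the overall Q3 is the mean of the
# per-class Q3 values, weighted by the number of residues in each class.
weighted_q3 = sum(residues * q3 for _, _, residues, q3 in test_set) / total_residues

print(total_proteins, total_residues, round(weighted_q3, 1))
# prints: 223 60967 80.3
```

An unweighted mean of the four Q3 values would give 81.2 instead, so the weighting by residue count matters for matching the reported 80.3.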