This webpage contains the Perl programs and the protein data used (for training and testing) in the research paper:
L. Lee, J. L. Leopold, and R. L. Frank, “Protein secondary structure prediction using BLAST and Relaxed Threshold Rule Induction from Coverings”.
Data Source:
Perl Programs and Data Download (Step by Step):
Step 0, Preliminary Data Preparation: Match data from the CB396 set and the RS126 set against the latest secondary structure data from the PDB database. <Download Program and Data (60MB .rar file)>
Step 1, Online BLAST, PDB Data Match and Data Preparation: BLAST search for each test protein, match PDB secondary structure data, and prepare data files for rule generation. <Download Program and Data (18MB .rar file)>
Step 2, RT-RICO Rule Generation and Prediction: Main rule generation algorithm/program and main prediction algorithm/program. <Download Program and Data (134MB .rar file)>
Step 3, Q3 score calculation. <Download Program and Data (563KB .rar file)>
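For readers unfamiliar with the Q3 measure used in Step 3, the sketch below shows the standard definition: the percentage of residues whose predicted state (helix H, strand E, or coil C) matches the observed state. It is written in Python for illustration and is not the Perl program offered for download. The function names `q3_score` and `pooled_q3` are hypothetical, and pooling residues across all proteins (rather than averaging per-protein scores) is an assumption about how a dataset-level Q3 might be computed.

```python
from typing import Iterable, Tuple


def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of positions where predicted == observed.

    Both arguments are strings over the three-state alphabet H/E/C,
    one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def pooled_q3(pairs: Iterable[Tuple[str, str]]) -> float:
    """Q3 over a whole dataset, pooling residues from every protein.

    Assumption: the dataset-level score is residue-weighted, i.e.
    total correct residues divided by total residues.
    """
    correct = 0
    total = 0
    for predicted, observed in pairs:
        if len(predicted) != len(observed):
            raise ValueError("predicted and observed must have the same length")
        correct += sum(p == o for p, o in zip(predicted, observed))
        total += len(observed)
    if total == 0:
        raise ValueError("no residues to score")
    return 100.0 * correct / total
```

For example, `q3_score("HHHEECCC", "HHHEEECC")` compares eight residues, finds seven matches, and returns 87.5.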
TABLE - Protein secondary structure prediction using BLAST-RT-RICO approach on RS126 and CB396 test datasets
(Same as Table 6 in the research paper)
Training Dataset (For An Individual Protein)

| For Test Dataset | Max. No. of Proteins | Min. No. of Proteins | Ave. No. of Proteins |
|---|---|---|---|
| RS126 | 495 | 1 | 41.29 |
| CB396 | 158 | 1 | 15.91 |

| For Test Dataset | Max. No. of 5-Residue Segments | Min. No. of 5-Residue Segments | Ave. No. of 5-Residue Segments |
|---|---|---|---|
| RS126 | 107,765 | 35 | 8,467 |
| CB396 | 42,938 | 20 | 4,480 |

| For Test Dataset | Max. No. of Rules (at 90% threshold) | Min. No. of Rules (at 90% threshold) | Ave. No. of Rules (at 90% threshold) |
|---|---|---|---|
| RS126 | 89,235 | 668 | 21,242 |
| CB396 | 98,743 | 379 | 18,596 |

Test Dataset (all protein domains)

| Test Dataset | No. of Proteins Using Offline Preprocessing | Total No. of Residues of the Test Dataset | Q3 (%) |
|---|---|---|---|
| RS126 | 1 | 23,416 | 89.93 |
| CB396 | 9 | 62,657 | 87.71 |
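
The table above counts training data in 5-residue segments. This page does not say how those segments are formed. One plausible reading, given here only as an assumption, is that they are overlapping windows of five consecutive residues taken along each sequence. The Python sketch below illustrates that reading. The function name `five_residue_segments` and the example sequence are hypothetical, and the downloadable Perl programs may build segments differently.

```python
from typing import List


def five_residue_segments(sequence: str, window: int = 5) -> List[str]:
    """Return every overlapping window of `window` residues.

    Assumption: segments overlap and advance one residue at a time.
    A sequence shorter than `window` yields an empty list.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]


# A 7-residue sequence gives 7 - 5 + 1 = 3 segments.
segments = five_residue_segments("MKVLAAG")
```

Under this reading, a sequence of n residues (n at least 5) contributes n - 4 segments.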