BLAST-RT-RICO: Protein secondary structure prediction using BLAST and Relaxed Threshold Rule Induction from Coverings

 

This webpage contains the Perl programs and the protein data used (for training and testing) in the research paper:

L. Lee, J. L. Leopold, and R. L. Frank, “Protein secondary structure prediction using BLAST and Relaxed Threshold Rule Induction from Coverings”.

Data Source:

Perl Programs and Data Download (Step by Step):

Step 0, Preliminary Data Preparation: Match data from the CB396 and RS126 sets against the latest secondary structure data from the PDB database. <Download Program and Data (60MB .rar file)>

Step 1, Online BLAST, PDB Data Match and Data Preparation: BLAST search for each test protein, match PDB secondary structure data, and prepare data files for rule generation. <Download Program and Data (18MB .rar file)>

Step 2, RT-RICO Rule Generation and Prediction: Main rule generation algorithm/program and main prediction algorithm/program. <Download Program and Data (134MB .rar file)>
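As a rough illustration of what rule generation over 5-residue segments with a 90% threshold might look like, the sketch below slides a 5-residue window over an aligned sequence/structure pair and keeps only those segment-to-state associations whose confidence reaches the threshold. This is not the RT-RICO algorithm from the paper or the Perl programs in the download; it is a simplified exact-match stand-in. The function names, the one-letter state labels, and the choice of the central residue as the prediction target are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def five_residue_segments(sequence, structure, width=5):
    """Yield (segment, central_state) pairs from aligned strings.

    Assumption: each window is labeled with the secondary-structure
    state of its central residue.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    half = width // 2
    for i in range(len(sequence) - width + 1):
        yield sequence[i:i + width], structure[i + half]


def rules_at_threshold(pairs, threshold=0.9):
    """Keep segment -> state rules whose confidence is >= threshold.

    Simplified stand-in: a "rule" here is an exact 5-residue segment,
    not a covering over a subset of positions as in RICO-style methods.
    """
    counts = defaultdict(Counter)
    for segment, state in pairs:
        counts[segment][state] += 1

    rules = {}
    for segment, states in counts.items():
        state, hits = states.most_common(1)[0]
        confidence = hits / sum(states.values())
        if confidence >= threshold:
            rules[segment] = (state, confidence)
    return rules


if __name__ == "__main__":
    # Toy data, not taken from RS126 or CB396.
    pairs = list(five_residue_segments("ACDEFG", "HHHEEC"))
    print(pairs)
    print(rules_at_threshold(pairs))
```

A threshold below 100% lets a rule survive a few contradicting training examples, which is the general idea behind relaxing the threshold.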

Step 3, Q3 Score Calculation. <Download Program and Data (563KB .rar file)>
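For readers unfamiliar with the metric, Q3 is conventionally the percentage of residues whose predicted three-state class (helix, strand, coil) matches the observed class. The sketch below is a minimal version of that calculation and is not the Perl program in the download. The function names and the H/E/C labels are assumptions, as is the pooled variant, which assumes a dataset-level score is computed over all residues rather than averaged per protein.

```python
def q3_score(predicted, observed):
    """Return the percentage of positions where predicted == observed."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("cannot score an empty structure string")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def pooled_q3(pairs):
    """Q3 over all residues of several proteins (assumed pooling scheme).

    `pairs` is an iterable of (predicted, observed) strings, one per protein.
    """
    correct = 0
    total = 0
    for predicted, observed in pairs:
        if len(predicted) != len(observed):
            raise ValueError("predicted and observed must have equal length")
        correct += sum(p == o for p, o in zip(predicted, observed))
        total += len(observed)
    if total == 0:
        raise ValueError("no residues to score")
    return 100.0 * correct / total


if __name__ == "__main__":
    # Toy strings, not real predictions: 7 of 8 residues agree.
    print(q3_score("HHHEECCC", "HHHEECCH"))
```

Pooling weights long proteins more heavily than a per-protein average does, so the two schemes can give different dataset-level numbers.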

 

TABLE - Protein secondary structure prediction using BLAST-RT-RICO approach on RS126 and CB396 test datasets
(Same as Table 6 in the research paper)

 

Training Dataset (For An Individual Protein)

For Test Dataset   Max. No. of Proteins   Min. No. of Proteins   Ave. No. of Proteins
RS126              495                    1                      41.29
CB396              158                    1                      15.91

For Test Dataset   Max. No. of 5-Residue Segments   Min. No. of 5-Residue Segments   Ave. No. of 5-Residue Segments
RS126              107,765                          35                               8,467
CB396              42,938                           20                               4,480

For Test Dataset   Max. No. of Rules (at 90% threshold)   Min. No. of Rules (at 90% threshold)   Ave. No. of Rules (at 90% threshold)
RS126              89,235                                 668                                    21,242
CB396              98,743                                 379                                    18,596

Test Dataset (all protein domains)

Test Dataset   No. of Proteins Using Offline Preprocessing   Total No. of Residues of the Test Dataset   Q3 (%)
RS126          1                                             23,416                                      89.93
CB396          9                                             62,657                                      87.71