Computer Science Department
Algorithms for Searching a Proteomic Database
Stephen Topper
The field of bioinformatics depends heavily on computers. Because of the size and complexity of biological sequence data, analysis of proteomic data must be done computationally. Even with computerized assistance, however, some analyses, especially the comparison of protein sequences, can still take a long time, which makes minimizing running time a critical factor. This project will implement several algorithms that take a protein sequence as a query and return the ten most closely related sequences from a proteomic database. These algorithms require a simple database to store proteomic information, and a sufficient number of sequences will have to be added to it in order to have something to search. The main focus of the project, however, is the search algorithms themselves. Two of them will be built on widely used substitution matrices for sequence comparison, PAM and BLOSUM. These matrices score a pair of sequences position by position, based on whether the residues are identical and on how likely one residue is to mutate into the other. A basic identity search will also be implemented.
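The search described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names, the in-memory dictionary standing in for the database, and the ungapped position-by-position comparison are all assumptions made for illustration. The small matrix uses invented values, not real PAM or BLOSUM entries.

```python
import heapq

def identity_score(query, candidate):
    """Count aligned positions where both sequences have the same residue."""
    return sum(1 for a, b in zip(query, candidate) if a == b)

def matrix_score(query, candidate, matrix, default=-1):
    """Sum substitution scores over aligned positions (no gaps).

    `matrix` maps residue pairs to scores; a pair is looked up in either
    order, and `default` is used when the pair is missing.
    """
    total = 0
    for a, b in zip(query, candidate):
        total += matrix.get((a, b), matrix.get((b, a), default))
    return total

def top_matches(query, database, score, k=10):
    """Return the k best (score, name) pairs, highest score first."""
    scored = ((score(query, seq), name) for name, seq in database.items())
    return heapq.nlargest(k, scored)

# Hypothetical toy data: invented scores, NOT real PAM/BLOSUM values.
TOY_MATRIX = {("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4, ("L", "I"): 2}
TOY_DATABASE = {"seq1": "ALIA", "seq2": "ALLA", "seq3": "IIII"}

if __name__ == "__main__":
    query = "ALLA"
    print(top_matches(query, TOY_DATABASE, identity_score))
    print(top_matches(
        query, TOY_DATABASE,
        lambda q, c: matrix_score(q, c, TOY_MATRIX),
    ))
```

In this sketch the identity search and the matrix-based search share the same ranking step and differ only in the scoring function, so a PAM or BLOSUM table could be swapped in for the toy matrix without changing the rest. Every database entry is scored once per query, so running time grows with both the number of stored sequences and their length.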