


A Novel Method for Prediction of Protein Domain Using Distance-Based Maximal Entropy

Shu-xue Zou; Yan-xin Huang; Yan Wang; Chun-guang Zhou   

  1. Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College
    of Computer Science and Technology, Jilin University, Changchun 130012, P. R. China
  • Received: 2008-01-18; Revised: 2008-04-18; Online: 2008-09-30; Published: 2008-04-18
  • Contact: Chun-guang Zhou

Abstract: Detecting the boundaries of protein domains is an important and challenging task in both experimental and computational structural biology. This paper presents a promising method for detecting the domain structure of a protein from sequence information alone. The method is based on analyzing multiple sequence alignments derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence, and these measures are then combined into a single predictor using a support vector machine (SVM). More importantly, domain detection is first formulated as an imbalanced data learning problem. A novel undersampling method based on distance-based maximal entropy in the SVM feature space is proposed. The overall precision is about 80%. Simulation results demonstrate that the method can help not only in predicting the complete 3D structure of a protein but also in machine learning on imbalanced datasets in general.

Key words: SVM, protein domain boundary, imbalanced data learning, distance-based maximal entropy
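
The abstract does not spell out the distance-based maximal entropy criterion, so the following is only a rough sketch of the general idea of undersampling a majority class using distances in an SVM kernel feature space. It assumes an RBF kernel, for which the squared feature-space distance is 2 - 2k(x, y), and it substitutes greedy farthest-point selection for the paper's entropy criterion, which is a different and simpler rule. The names `rbf_kernel`, `undersample_majority`, `n_keep`, and `gamma` are hypothetical and do not come from the paper.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (a ** 2).sum(axis=1)[:, None] - 2.0 * a @ b.T + (b ** 2).sum(axis=1)[None, :]
    return np.exp(-gamma * np.maximum(sq, 0.0))


def undersample_majority(x_major, n_keep, gamma=1.0):
    """Pick a spread-out subset of majority-class rows (illustrative only).

    Assumption: greedy farthest-point selection stands in for the paper's
    distance-based maximal entropy criterion, which the abstract does not define.
    Returns the indices of the rows that are kept.
    """
    n = len(x_major)
    n_keep = min(n_keep, n)
    if n_keep <= 0:
        return np.array([], dtype=int)

    # Squared distance in RBF feature space: k(x,x) - 2k(x,y) + k(y,y) = 2 - 2k(x,y).
    dist2 = 2.0 - 2.0 * rbf_kernel(x_major, x_major, gamma)

    chosen = [0]                 # arbitrary but deterministic starting point
    min_d = dist2[0].copy()      # distance from each row to its nearest chosen row
    min_d[0] = -1.0              # mark chosen rows so they are never picked again
    while len(chosen) < n_keep:
        nxt = int(np.argmax(min_d))          # row farthest from the current subset
        chosen.append(nxt)
        min_d = np.minimum(min_d, dist2[nxt])
        min_d[nxt] = -1.0
    return np.array(chosen)
```

In an imbalanced boundary/non-boundary setting, one would presumably apply such a selection to the over-represented class only and then train the SVM on the reduced set together with all minority-class examples; whether the paper proceeds this way is not stated in the abstract.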