Results from applying Biovista’s BioLab Experiment Assistant tool to data mining will be presented at the Second European Workshop on Data Mining and Text Mining for Bioinformatics. The workshop is held in conjunction with the 15th European Conference on Machine Learning (ECML) and the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), which are co-located in Pisa, Italy, September 20-24, 2004.


Abstract

Discovering the interactions between genes and proteins is seen as one of the core tasks in molecular biology. Research results in this area are accumulating so quickly that individual researchers find it very difficult to keep track of them. Because these results appear mainly as scientific articles, efficient methods are needed to process the articles and extract the relevant findings.

Many databases aim to consolidate newly gained knowledge in a format that is easily accessible and searchable. However, their creators normally rely on human readers who manually ‘curate’ the relevant papers. This is an expensive and time-consuming process. Moreover, there may be a significant time lag between the publication of a result and its inclusion in such databases.

In this paper we propose a method for discovering interactions between genes and proteins in the scientific literature, based on a complete syntactic analysis of the corpus, and we report preliminary results.