Current
research is examining machine learning techniques for
proteomic classification and marker selection using sample
fractionation with SELDI-TOF MS. Over the
last couple of years, technologies such as surface-enhanced
laser desorption/ionization (SELDI) time-of-flight mass
spectrometry have dramatically changed the study of
proteomics. Yet, as data is generated in an increasingly
rapid and automated manner, novel and application-specific
computational methods will be needed to handle the resulting
volume of information. This paper explores methods that can be used to
glean informative marker and classification profiles from
proteomic and genomic data. Then, these methods are applied
to clonal hematological disorders in order to arrive at a
diagnostic profile. In doing so, novel proteomic markers,
clustering, and classification profiles for these malignancies
will be presented within the context of SELDI.
A
SELDI-based procedure was developed to analyze serum from 74
patients with disease and 39 control patients from Harvard
Medical School (USA) and University of Dusseldorf (Germany).
The serum was separated into pH5, pH9, organic, and whole
serum fractions and then analyzed by SELDI. As part of this
work, novel methodologies were developed to automate both the
computational analysis and the robotic sample preparation.
Machine learning methods ranging from a Bayesian framework to
support vector machines, k-nearest neighbors, logistic
regression, decision trees, and others were used to find
highly specific and sensitive profiles for prediction of these
disorders and clinical subclasses. Predictors that distinguish
malignant samples from controls are compared with regard to
the orthogonal data they provide over current pre-bone biopsy
information. A high specificity might reduce the frequency of
biopsies. The highest specificity (89%) was obtained using SVM
and logistic regression. Using a pruned decision tree,
accuracy, sensitivity, and specificity were 80%, 73%, and
85%, respectively. This was accomplished by using only three simple
decision rules with five protein markers, which makes the
profile much more clinically feasible than those reported in
the current SELDI literature for other diseases (which
usually use profiles of over one hundred proteins).
Also, testing for proteins from a blood draw is more
clinically feasible than running a large genetic profile.
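
The accuracy, sensitivity, and specificity figures quoted above follow the standard confusion-matrix definitions. As a minimal sketch of how such figures are computed (the function names, the 1 = disease / 0 = control label encoding, and the toy labels are assumptions made for illustration, not the study's data or code):

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Outcomes:
    """Confusion-matrix counts for a two-class (disease vs. control) predictor."""
    tp: int = 0  # diseased samples predicted as diseased
    fn: int = 0  # diseased samples predicted as control
    tn: int = 0  # control samples predicted as control
    fp: int = 0  # control samples predicted as diseased


def count_outcomes(y_true: Sequence[int], y_pred: Sequence[int]) -> Outcomes:
    """Tally outcomes; labels are assumed to be 1 (disease) or 0 (control)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    c = Outcomes()
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            c.tp += 1
        elif truth == 1:
            c.fn += 1
        elif pred == 0:
            c.tn += 1
        else:
            c.fp += 1
    return c


def summarize(c: Outcomes) -> Dict[str, float]:
    """Return accuracy, sensitivity, and specificity as fractions in [0, 1]."""
    diseased = c.tp + c.fn
    controls = c.tn + c.fp
    if diseased == 0 or controls == 0:
        raise ValueError("need at least one diseased and one control sample")
    return {
        "accuracy": (c.tp + c.tn) / (diseased + controls),
        "sensitivity": c.tp / diseased,   # fraction of diseased samples detected
        "specificity": c.tn / controls,   # fraction of controls correctly cleared
    }


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    print(summarize(count_outcomes(y_true, y_pred)))
```

Specificity is the quantity most relevant to the biopsy argument, since it is the fraction of control patients the predictor correctly clears.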
In
summary, protocols involving various novel SELDI sample
preparation and machine learning techniques have been applied
and compared for use in SELDI profile and marker
discovery. Evidence is provided for the first feasible
protein profiles that can be exploited for these malignancies
(and subtypes). In addition, the power of proteomic over
genomic approaches both in terms of feasibility and
performance is discussed.
Authors/contributors:
Gil Alterovitz, Manuel Aivado, Towia Libermann, Marco Ramoni,
Isaac S. Kohane