Machine Learning on noisy genome data. Scikit-learn python
I want to classify data using three dimensions; let's call them A, B, and C.
B and C are almost always positively correlated. B+C and A are usually negatively correlated. However, C is usually an “all or none” statistic: we see it sometimes but not always.
With this in mind, I chose to classify the data using Linear Discriminant Analysis from the scikit-learn Python library: http://scikit-learn.org/stable/modules/generated/sklearn.lda.LDA.html
I’m not entirely married to LDA but my PI would like to keep a linear model.
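For reference, here is a minimal sketch of a plain LDA fit on three features. The data is synthetic and only mimics the correlations described above; the class labels, sample size, and noise levels are all made up. It uses `sklearn.discriminant_analysis.LinearDiscriminantAnalysis`, which is where the `sklearn.lda.LDA` class linked above lives in later scikit-learn releases.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (assumption): columns are A, B, C.
n = 200
y = rng.integers(0, 2, size=n)                 # two made-up classes
B = rng.normal(loc=y, scale=1.0)               # B shifts with the class
C = B + rng.normal(scale=0.3, size=n)          # C tracks B (positive correlation)
A = -(B + C) + rng.normal(scale=0.5, size=n)   # A opposes B + C
X = np.column_stack([A, B, C])

# Plain LDA: every feature is treated the same way, with no per-feature weights.
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
pred = lda.predict(X)
```

This baseline treats C like any other feature, which is exactly the behaviour I would like to relax when C is absent.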
I would like to train the model on the data but apply a per-feature weight, as expressed in this pseudo-code:

lda = LDA()
lda.train(trainX, trainY, weights=('None', 'None', "all_or_none"))
# "all_or_none" indicates that when C is absent, the prediction should NOT be penalized
I’m a little naive about machine learning; maybe there’s another way to do this in scikit-learn?
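One possible direction, sketched here under stated assumptions rather than as an established scikit-learn feature: fit one LDA on (A, B, C) for samples where C is observed and a second LDA on (A, B) only, then route each sample to the matching model at prediction time. The sketch assumes an absent C is encoded as NaN. The class name `PresenceRoutedLDA` is a hypothetical helper, not part of scikit-learn, and the data is synthetic. Each sub-model stays linear, although the combined rule is piecewise linear.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


class PresenceRoutedLDA:
    """Hypothetical helper (not a scikit-learn class): one LDA on (A, B, C)
    for rows where C is observed, another on (A, B) for rows where it is not."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        has_c = ~np.isnan(X[:, 2])  # assumption: absent C is stored as NaN
        self.full_ = LinearDiscriminantAnalysis().fit(X[has_c], y[has_c])
        # The A/B-only model uses every row, since A and B are always present.
        self.ab_ = LinearDiscriminantAnalysis().fit(X[:, :2], y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        has_c = ~np.isnan(X[:, 2])
        out = np.empty(len(X), dtype=self.ab_.classes_.dtype)
        if has_c.any():
            out[has_c] = self.full_.predict(X[has_c])
        if (~has_c).any():
            # Rows without C are scored on A and B only, so a missing C
            # cannot count against the prediction.
            out[~has_c] = self.ab_.predict(X[~has_c, :2])
        return out


# Synthetic data (made up) with the correlations described in the question.
rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, size=n)
B = rng.normal(loc=y, scale=1.0)
C = B + rng.normal(scale=0.3, size=n)
A = -(B + C) + rng.normal(scale=0.5, size=n)
C[rng.random(n) < 0.4] = np.nan  # C is absent in roughly 40% of rows
X = np.column_stack([A, B, C])

model = PresenceRoutedLDA().fit(X, y)
pred = model.predict(X)
```

Whether this is appropriate depends on why C is absent. If absence is itself informative about the class, an alternative would be to keep a single model and add a 0/1 "C present" indicator column.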