Measuring Feature Selection Performance
This module can be used to evaluate feature selection methods via K-fold cross validation.
-
class picturedrocks.performance.FoldTester(adata)
Performs K-fold Cross Validation for Marker Selection
FoldTester can be used to evaluate various marker selection algorithms. It can split the data into K folds, run marker selection algorithms on these folds, and classify data based on testing and training data.
Parameters: adata (anndata.AnnData) – data to slice into folds
-
classify(classifier)
Classify each cell using training data from other folds
For each fold, we project the data onto the markers selected for that fold, which we treat as test data. We also project the complement of the fold and treat that as training data.
Parameters: classifier – a classifier that trains with a training data set and predicts labels of test data. See NearestCentroidClassifier for an example.
Note: The classifier should not attempt to modify data in-place. Any preprocessing should be done on a copy.
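The per-fold procedure described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the FoldTester implementation: the function name classify_folds, the train(X, y)/test(Xtest) interface, and the toy MajorityClassifier are all hypothetical names introduced for the example.

```python
import numpy as np


class MajorityClassifier:
    """Hypothetical toy classifier: always predicts the most common training label."""

    def train(self, X, y):
        labels, counts = np.unique(y, return_counts=True)
        self.label = labels[np.argmax(counts)]

    def test(self, Xtest):
        return np.full(Xtest.shape[0], self.label)


def classify_folds(X, y, folds, markers, make_classifier):
    """Predict a label for every cell using a classifier trained on the other folds.

    folds[i] holds the row indices of fold i, and markers[i] holds the
    column indices (markers) selected for that fold.
    """
    yhat = np.full(y.shape, -1, dtype=y.dtype)
    for fold, cols in zip(folds, markers):
        mask = np.zeros(X.shape[0], dtype=bool)
        mask[fold] = True
        # Boolean and list indexing return copies, so X itself is never modified.
        Xtrain = X[~mask][:, cols]  # complement of the fold, projected onto the markers
        Xtest = X[mask][:, cols]    # the fold itself, projected onto the same markers
        clf = make_classifier()
        clf.train(Xtrain, y[~mask])
        yhat[mask] = clf.test(Xtest)
    return yhat


rng = np.random.default_rng(0)
X = rng.random((6, 4))
y = np.array([0, 0, 0, 0, 1, 1])
folds = [np.array([0, 4]), np.array([1, 5]), np.array([2, 3])]
markers = [[0, 1], [1, 2], [2, 3]]
yhat = classify_folds(X, y, folds, markers, MajorityClassifier)
```

Every cell is predicted exactly once, by a classifier that never saw it during training, which is what makes the resulting predictions usable for an actual-vs-predicted report.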
-
loadfolds(file)
Load folds from a file
The file can be one saved either by FoldTester.savefolds() or FoldTester.savefoldsandmarkers(). In the latter case, it will not load any markers.
-
loadfoldsandmarkers(file)
Load folds and markers
Loads a folds and markers file saved by FoldTester.savefoldsandmarkers().
Parameters: file (str) – filename to load from (typically with a .npz extension)
-
makefolds(k=5, random=False)
Makes folds
-
savefolds(file)
Save folds to a file
Parameters: file (str) – filename to save to (typically with a .npz extension)
-
savefoldsandmarkers(file)
Save folds and markers for each fold
This saves folds, and for each fold, the markers previously found by FoldTester.selectmarkers().
Parameters: file (str) – filename to save to (typically with a .npz extension)
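To give a feel for what a .npz round trip of folds and markers can look like, here is a minimal sketch using NumPy directly. The key names (k, fold0, marker0, and so on) are assumptions made for this example; the layout that FoldTester actually writes is not documented here and may differ.

```python
import os
import tempfile

import numpy as np

# Hypothetical folds (cell indices) and per-fold markers (gene indices).
folds = [np.array([0, 3]), np.array([1, 4]), np.array([2, 5])]
markers = [np.array([7, 2]), np.array([2, 9]), np.array([1, 7])]

# Store each fold and marker list under its own key, since the arrays
# are allowed to differ in length from fold to fold.
arrays = {}
for i, (fold, marks) in enumerate(zip(folds, markers)):
    arrays[f"fold{i}"] = fold
    arrays[f"marker{i}"] = marks

path = os.path.join(tempfile.mkdtemp(), "folds_and_markers.npz")
np.savez(path, k=len(folds), **arrays)

# Load everything back; reading only the fold keys would ignore the markers.
with np.load(path) as data:
    k = int(data["k"])
    loaded_folds = [data[f"fold{i}"] for i in range(k)]
    loaded_markers = [data[f"marker{i}"] for i in range(k)]
```

Saving the folds alongside the markers lets a later session reuse the exact same split, so that different classifiers are compared on identical folds.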
-
selectmarkers(select_function)
Perform a marker selection algorithm on each fold
Parameters: select_function (function) – a function that takes in an AnnData object and outputs a list of gene markers, given by their index
Note: The select_function should not attempt to modify data in-place. Any preprocessing should be done on a copy.
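As an illustration of the expected shape of a select_function, the sketch below picks the genes with the highest variance. The function name select_top_variance and its n_markers argument are hypothetical, the example assumes a dense .X matrix, and a SimpleNamespace stands in for an AnnData object so that the snippet runs without extra dependencies.

```python
from types import SimpleNamespace

import numpy as np


def select_top_variance(adata, n_markers=2):
    """Hypothetical select_function: return the indices of the most variable genes."""
    # np.array(...) makes a copy, so adata.X itself is left untouched.
    X = np.array(adata.X, dtype=float)
    X -= X.mean(axis=0)  # in-place centering, but only on the copy
    variances = (X ** 2).mean(axis=0)
    top = np.argsort(variances)[::-1][:n_markers]
    return [int(j) for j in top]


# Stand-in for an AnnData object: only the .X attribute (cells x genes) is used.
toy = SimpleNamespace(X=np.array([[1.0, 0.0, 5.0],
                                  [1.0, 2.0, 0.0],
                                  [1.0, 4.0, 5.0]]))
before = toy.X.copy()
selected = select_top_variance(toy)
```

The preprocessing step (centering) is applied to a copy, which is the pattern the note above asks for: the data passed in must look the same after the call as before it.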
-
-
class picturedrocks.performance.NearestCentroidClassifier
Nearest Centroid Classifier for Cross Validation
Computes the centroid of each cluster label in the training data, then predicts the label of each test data point by finding the nearest centroid.
-
test(Xtest)
-
train(adata)
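The nearest-centroid idea can be sketched in a few lines of NumPy. This is not the library's NearestCentroidClassifier: the class name CentroidSketch is hypothetical, train takes a matrix and a label vector instead of an AnnData object (how labels are stored in adata is not stated here), and the use of Euclidean distance is an assumption.

```python
import numpy as np


class CentroidSketch:
    """Hypothetical nearest-centroid classifier working on plain arrays."""

    def train(self, X, y):
        self.labels = np.unique(y)
        # One centroid (mean feature vector) per cluster label.
        self.centroids = np.stack([X[y == c].mean(axis=0) for c in self.labels])

    def test(self, Xtest):
        # Squared Euclidean distance from every test point to every centroid.
        diffs = Xtest[:, None, :] - self.centroids[None, :, :]
        dists = (diffs ** 2).sum(axis=2)
        # The predicted label is that of the closest centroid.
        return self.labels[np.argmin(dists, axis=1)]


Xtrain = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
ytrain = np.array([0, 0, 1, 1])
clf = CentroidSketch()
clf.train(Xtrain, ytrain)
pred = clf.test(np.array([[1.0, 1.0], [9.0, 11.0]]))
```

Neither method writes to its inputs, in line with the requirement that classifiers do not modify data in-place.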
-
-
class picturedrocks.performance.PerformanceReport(y, yhat)
Report actual vs predicted statistics
Parameters:
- y (numpy.ndarray) – actual cluster labels, (N, 1)-shaped numpy array
- yhat (numpy.ndarray) – predicted cluster labels, (N, 1)-shaped numpy array
-
confusionmatrixfigure()
Compute and make a confusion matrix figure
Returns: confusion matrix
Return type: plotly figure
-
getconfusionmatrix()
Get the confusion matrix for the latest run
Returns: array of shape (K, K), with the [i, j] entry being the fraction of cells in cluster i that were predicted to be in cluster j
Return type: numpy.ndarray
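A row-normalized confusion matrix of the kind described above can be computed as follows. This is a sketch, not the PerformanceReport code: the function name confusion_fractions is hypothetical, and it assumes cluster labels are the integers 0 to K-1.

```python
import numpy as np


def confusion_fractions(y, yhat, K):
    """Row-normalized confusion matrix for integer labels 0..K-1."""
    y = np.asarray(y).ravel()        # accepts (N, 1)-shaped or flat arrays
    yhat = np.asarray(yhat).ravel()
    counts = np.zeros((K, K))
    # counts[i, j] = number of cells in cluster i that were predicted as cluster j
    np.add.at(counts, (y, yhat), 1)
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows for clusters with no cells stay zero instead of dividing by zero.
    return np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)


y = np.array([[0], [0], [1], [1], [1], [1]])
yhat = np.array([[0], [1], [1], [1], [1], [0]])
cm = confusion_fractions(y, yhat, K=2)
```

Because each row is divided by the size of its true cluster, every non-empty row sums to 1 and the diagonal entries give the per-cluster fraction of correctly classified cells.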
-
printscore()
Print a message with the score
-
show()
Print a full report
This uses iplot, so we assume this will only be run in a Jupyter notebook and that init_notebook_mode has already been run.
-
wrong()
Returns the number of cells misclassified.
-
picturedrocks.performance.kfoldindices(n, k, random=False)
Generate indices for k-fold cross validation
Yields: numpy.ndarray – array of indices in each fold
-
picturedrocks.performance.merge_markers(ft, n_markers)
-
picturedrocks.performance.truncatemarkers(ft, n_markers)