diff grant.txt @ 29:5e2e4732b647
author | bshanks@bshanks.dyndns.org |
---|---|
date | Mon Apr 13 03:43:51 2009 -0700 (16 years ago) |
parents | 01c118d1074b |
children | 6ec3230fe1dc |
--- a/grant.txt Mon Apr 13 03:31:42 2009 -0700
+++ b/grant.txt Mon Apr 13 03:43:51 2009 -0700
@@ -31,7 +31,7 @@

In the machine learning literature, this sort of procedure may be thought of as a __supervised learning task__, defined as a task in which the goal is to learn a mapping from instances to labels, and the training data consists of a set of instances (voxels) for which the labels (subregions) are known.

-Each gene expression level is called a __feature__, and the selection of which genes to include is called __feature selection__. Feature selection is one component of the task of learning a classifier. Some methods for learning classifiers start out with a separate feature selection phase, whereas other methods combine feature selection with other aspects of training.
+Each gene expression level is called a __feature__, and the selection of which genes\footnote{Strictly speaking, the features are gene expression levels, but we'll call them genes.} to include is called __feature selection__. Feature selection is one component of the task of learning a classifier. Some methods for learning classifiers start out with a separate feature selection phase, whereas other methods combine feature selection with other aspects of training.

One class of feature selection methods assigns some sort of score to each candidate gene. The top-ranked genes are then chosen. Some scoring measures can assign a score to a set of selected genes, not just to a single gene; in this case, a dynamic procedure may be used in which features are added and subtracted from the selected set depending on how much they raise the score. Such procedures are called "stepwise" or "greedy".

@@ -41,14 +41,10 @@


\vspace{0.3cm}**Principle 1: Combinatorial gene expression**
-
-Above, we defined an "instance" as the combination of a voxel with the "associated gene expression data". In our case this refers to the expression level of genes within the voxel, but should we include the expression levels of all genes, or only a few of them?
-
-It is too much to hope that every anatomical region of interest will be identified by a single gene. For example, in the cortex, there are some areas which are not clearly delineated by any gene included in the Allen Brain Atlas (ABA) dataset. However, at least some of these areas can be delineated by looking at combinations of genes (an example of an area for which multiple genes are necessary and sufficient is provided in Preliminary Results).
+It is too much to hope that every anatomical region of interest will be identified by a single gene. For example, in the cortex, there are some areas which are not clearly delineated by any gene included in the Allen Brain Atlas (ABA) dataset. However, at least some of these areas can be delineated by looking at combinations of genes (an example of an area for which multiple genes are necessary and sufficient is provided in Preliminary Results). Therefore, each instance should contain multiple features (genes).


\vspace{0.3cm}**Principle 2: Only look at combinations of small numbers of genes**
-
When the classifier classifies a voxel, it is only allowed to look at the expression of the genes which have been selected as features. The more data that is available to a classifier, the better that it can do. For example, perhaps there are weak correlations over many genes that add up to a strong signal. So, why not include every gene as a feature? The reason is that we wish to employ the classifier in situations in which it is not feasible to gather data about every gene. For example, if we want to use the expression of marker genes as a trigger for some regionally-targeted intervention, then our intervention must contain a molecular mechanism to check the expression level of each marker gene before it triggers. It is currently infeasible to design a molecular trigger that checks the level of more than a handful of genes. Similarly, if the goal is to develop a procedure to do ISH on tissue samples in order to label their anatomy, then it is infeasible to label more than a few genes. Therefore, we must select only a few genes as features.


@@ -317,4 +313,4 @@

note:

-do we need to cite: no known markers? impressive results?
+do we need to cite: no known markers, impressive results?
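
Illustrative note (not part of the changeset): the "stepwise" or "greedy" feature selection described in the first hunk's context, together with Principles 1 and 2, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the method used in the grant. The scoring function (training accuracy of a nearest-centroid classifier), the names `nearest_centroid_accuracy` and `greedy_select`, the `max_genes` cap, and the toy expression values are all hypothetical choices made for illustration.

```python
# Hypothetical sketch: greedy forward selection of a small set of genes.
# The score of a gene set is the training accuracy of a nearest-centroid
# classifier restricted to those genes (an assumption, not the grant's measure).

def nearest_centroid_accuracy(voxels, labels, genes):
    """Fraction of voxels whose nearest region centroid (over `genes`) is their own region."""
    sums, counts = {}, {}
    for x, y in zip(voxels, labels):
        acc = sums.setdefault(y, [0.0] * len(genes))
        for i, g in enumerate(genes):
            acc[i] += x[g]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    correct = 0
    for x, y in zip(voxels, labels):
        def sq_dist(label):
            c = centroids[label]
            return sum((x[g] - c[i]) ** 2 for i, g in enumerate(genes))
        # Ties are broken in favour of the alphabetically first region label.
        predicted = min(sorted(centroids), key=sq_dist)
        correct += predicted == y
    return correct / len(voxels)


def greedy_select(voxels, labels, max_genes):
    """Add one gene at a time while the score improves, up to `max_genes` genes."""
    selected = []
    best = nearest_centroid_accuracy(voxels, labels, selected)
    n_genes = len(voxels[0])
    while len(selected) < max_genes:
        candidates = [g for g in range(n_genes) if g not in selected]
        if not candidates:
            break
        scored = [(nearest_centroid_accuracy(voxels, labels, selected + [g]), g)
                  for g in candidates]
        # Highest score wins; equal scores go to the lower gene index.
        score, gene = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best:
            break  # no candidate raises the score
        selected.append(gene)
        best = score
    return selected, best


# Toy data (invented): 6 voxels x 4 genes, three regions A, B, C.
# Gene 0 separates C from {A, B}; gene 1 separates B from {A, C};
# genes 2 and 3 carry no regional information.
voxels = [
    (0.9, 1.0, 0.5, 0.2), (1.0, 0.9, 0.5, 0.8),  # region A
    (1.0, 0.0, 0.5, 0.2), (0.9, 0.1, 0.5, 0.8),  # region B
    (0.0, 0.9, 0.5, 0.2), (0.1, 1.0, 0.5, 0.8),  # region C
]
labels = ["A", "A", "B", "B", "C", "C"]

selected, best = greedy_select(voxels, labels, max_genes=2)
print(selected, best)
```

On this toy data no single gene separates all three regions, but the pair of genes 0 and 1 does, which is the situation Principle 1 describes. The `max_genes` cap stands in for Principle 2. The sketch only adds features; the procedure in the text may also subtract them.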