diff grant.txt @ 63:af5fd52f453f
.
author | bshanks@bshanks.dyndns.org |
---|---|
date | Sun Apr 19 15:23:53 2009 -0700 |
parents | ecf330fcfba3 |
children | 54ac7984b164 |
line diff
1.1 --- a/grant.txt Sun Apr 19 14:50:20 2009 -0700
1.2 +++ b/grant.txt Sun Apr 19 15:23:53 2009 -0700
1.3 @@ -50,6 +50,7 @@
1.4 \vspace{0.3cm}**Principle 2: Only look at combinations of small numbers of genes**
1.5 When the classifier classifies a voxel, it is only allowed to look at the expression of the genes which have been selected as features. The more data that is available to a classifier, the better it can do. For example, perhaps there are weak correlations over many genes that add up to a strong signal. So, why not include every gene as a feature? The reason is that we wish to employ the classifier in situations in which it is not feasible to gather data about every gene. For example, if we want to use the expression of marker genes as a trigger for some regionally-targeted intervention, then our intervention must contain a molecular mechanism to check the expression level of each marker gene before it triggers. It is currently infeasible to design a molecular trigger that checks the level of more than a handful of genes. Similarly, if the goal is to develop a procedure to do ISH on tissue samples in order to label their anatomy, then it is infeasible to label more than a few genes. Therefore, we must select only a few genes as features.
1.6
1.7 +The requirement to find combinations of only a small number of genes prevents us from straightforwardly applying many of the simplest techniques from the field of supervised machine learning. In the parlance of machine learning, our task combines feature selection with supervised learning.
1.8
1.9
1.10 \vspace{0.3cm}**Principle 3: Use geometry in feature selection**
1.11 @@ -289,7 +290,7 @@
1.12
1.13 Now, for each region, we created and ran a forward stepwise procedure which attempted to find pairs of gene expression boolean masks such that the conditional entropy of the target area's boolean mask, conditioned upon the pair of gene expression boolean masks, is minimized.
1.14
1.15 -This finds pairs of genes which are most informative (at least at these discretization thresholds) relative to the question, "Is this surface pixel a member of the target area?".
1.16 +This finds pairs of genes which are most informative (at least at these discretization thresholds) relative to the question, "Is this surface pixel a member of the target area?". Its advantage over linear methods such as logistic regression is that it takes account of arbitrarily nonlinear relationships; for example, if the XOR of two variables predicts the target, conditional entropy would notice, whereas linear methods would not.
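As an illustration of the conditional-entropy criterion described in this hunk, the following Python sketch estimates H(target | gene pair) from boolean masks and shows the XOR case, where the pair is fully informative while a single gene alone is not. The array names and synthetic data are illustrative assumptions, not the grant's actual pipeline.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (bits) of an empirical distribution given by counts."""
    p = counts / counts.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def conditional_entropy(target, gene_a, gene_b):
    """H(target | gene_a, gene_b) for boolean arrays, estimated from counts."""
    # Joint counts over the 2x2x2 table of (gene_a, gene_b, target).
    joint = np.zeros((2, 2, 2))
    for a, b, t in zip(gene_a, gene_b, target):
        joint[int(a), int(b), int(t)] += 1
    # H(T | A, B) = H(A, B, T) - H(A, B)
    return entropy(joint) - entropy(joint.sum(axis=2))

# XOR case: the pair predicts the target perfectly, but either gene alone does not.
rng = np.random.default_rng(0)
gene_a = rng.integers(0, 2, 1000).astype(bool)
gene_b = rng.integers(0, 2, 1000).astype(bool)
target = gene_a ^ gene_b
print(conditional_entropy(target, gene_a, gene_b))  # ~0 bits: the pair fully determines the target
print(conditional_entropy(target, gene_a, gene_a))  # ~1 bit: a single gene is uninformative here
```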
1.17
1.18
1.19 \vspace{0.3cm}**Gradient similarity**
1.20 @@ -356,18 +357,18 @@
1.21 \label{hole}\end{figure}
1.22
1.23
1.24 -
1.25 -=== Specific to Aim 1 (and Aim 3) ===
1.26 +\vspace{0.3cm}**Feature selection integrated with prediction**
1.27 +As noted earlier, in general, any predictive method can be used for feature selection by running it inside a stepwise wrapper. Also, some predictive methods integrate soft constraints on the number of features used. Examples of both will be seen in the section "Locating areas with gene expression".
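The hunk above does not name a particular method with a built-in soft constraint on feature count. As one common example (my assumption, not the grant's stated choice), L1-penalized logistic regression drives most gene coefficients to exactly zero; the sketch below shows that behavior on synthetic data with assumed shapes and names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed layout: X is an (n_pixels, n_genes) expression matrix and y is the
# 0/1 membership mask for one cortical area (synthetic stand-in, not grant data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 200))
y = (X[:, 3] + X[:, 17] > 0).astype(int)

# The L1 penalty acts as a soft constraint on the number of features:
# lowering C pushes more gene coefficients to exactly zero.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
model.fit(X, y)
selected = np.flatnonzero(model.coef_[0])
print(f"{len(selected)} genes kept out of {X.shape[1]}: {selected}")
```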
1.28 +
1.29 +
1.30 +=== Locating areas with gene expression ===
1.31 \vspace{0.3cm}**Forward stepwise logistic regression**
1.32 -todo
1.33 +As a pilot run, for five cortical areas (SS, AUD, RSP, VIS, and MO), we performed forward stepwise logistic regression to find single genes, pairs of genes, and triplets of genes which predict areal identity. This is an example of feature selection integrated with prediction using a stepwise wrapper. Some of the single genes found were shown in previous figures, and Figure \ref{MOcombo} shows a combination of genes which was found.
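A minimal sketch of the forward stepwise wrapper described above, with logistic regression as the inner predictor. The use of 5-fold cross-validated accuracy as the selection score and the synthetic data are assumptions; the grant text does not specify the scoring criterion.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_stepwise(X, y, max_genes=3):
    """Greedily add the gene that most improves CV accuracy, up to max_genes."""
    selected = []
    for _ in range(max_genes):
        best_gene, best_score = None, -np.inf
        for g in range(X.shape[1]):
            if g in selected:
                continue
            cols = selected + [g]
            score = cross_val_score(LogisticRegression(max_iter=1000),
                                    X[:, cols], y, cv=5).mean()
            if score > best_score:
                best_gene, best_score = g, score
        selected.append(best_gene)
        print(f"{len(selected)} gene(s): {selected}, CV accuracy {best_score:.3f}")
    return selected

# Illustrative synthetic data in place of the real expression matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 50))
y = (X[:, 5] - X[:, 12] > 0).astype(int)
forward_stepwise(X, y, max_genes=3)
```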
1.34
1.35
1.36 \vspace{0.3cm}**SVM on all genes at once**
1.37
1.38 -In order to see how well one can do when looking at all genes at once, we ran a support vector machine to classify cortical surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%\footnote{5-fold cross-validation.}. As noted above, however, a classifier that looks at all the genes at once isn't practically useful.
1.39 -
1.40 -The requirement to find combinations of only a small number of genes limits us from straightforwardly applying many of the most simple techniques from the field of supervised machine learning. In the parlance of machine learning, our task combines feature selection with supervised learning.
1.41 -
1.42 +In order to see how well one can do when looking at all genes at once, we ran a support vector machine to classify cortical surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%\footnote{5-fold cross-validation.}. As noted above, however, a classifier that looks at all the genes at once isn't as practically useful as a classifier that uses only a few genes.
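A sketch of the kind of all-genes evaluation described in this hunk: an SVM over the full expression profile of each surface pixel, scored by 5-fold cross-validation. The RBF kernel, feature scaling, and synthetic stand-in data are assumptions; the ~81% figure comes from the grant's own run, not from this code.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed layout: one row per cortical surface pixel, one column per gene,
# y holding the area label of each pixel (synthetic stand-in here).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 300))
y = rng.integers(0, 5, 600)          # e.g. five areas: SS, AUD, RSP, VIS, MO

# SVM on the full gene expression profile, scored by 5-fold cross-validation.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean 5-fold accuracy: {scores.mean():.2f}")
```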
1.43
1.44
1.45 \vspace{0.3cm}**Decision trees**
1.46 @@ -396,7 +397,7 @@
1.47
1.48
1.49
1.50 -=== Specific to Aim 2 (and Aim 3) ===
1.51 +=== Data-driven redrawing of the cortical map ===
1.52
1.53 \vspace{0.3cm}**Raw dimensionality reduction results**
1.54
1.55 @@ -443,6 +444,10 @@
1.56 todo amongst other things:
1.57
1.58
1.59 +layerfinding
1.60 +
1.61 +
1.62 +
1.63
1.64 \vspace{0.3cm}**Develop algorithms that find genetic markers for anatomical regions**
1.65