cg

changeset 63:af5fd52f453f
.
author: bshanks@bshanks.dyndns.org
date: Sun Apr 19 15:23:53 2009 -0700 (16 years ago)
parents: ecf330fcfba3
children: 54ac7984b164
files: grant.doc grant.html grant.odt grant.pdf grant.txt
--- a/grant.html	Sun Apr 19 14:50:20 2009 -0700
+++ b/grant.html	Sun Apr 19 15:23:53 2009 -0700
@@ -70,9 +70,12 @@
+__________________________________
+   1 Strictly speaking, the features are gene expression levels, but we&#8217;ll call them genes.
+The requirement to find combinations of only a small number of genes limits us from straightforwardly applying many
+of the most simple techniques from the field of supervised machine learning. In the parlance of machine learning, our task
+combines feature selection with supervised learning.
-_________________________________________
-   1 Strictly speaking, the features are gene expression levels, but we&#8217;ll call them genes.
@@ -117,17 +120,17 @@
+_________________________________________
+   2 By &#8220;fundamentally spatial&#8221; we mean that there is information from a large number of spatial locations indexed by spatial coordinates; not
+just data which has only a few different locations or which is indexed by anatomical label.
+    3 Actually, many of these projects use quadrilaterals instead of square pixels; but we will refer to them as pixels for simplicity.
+    4 &#8220;Expression energy ratio&#8221;, which captures overexpression.
-___________________________
-   2 By &#8220;fundamentally spatial&#8221; we mean that there is information from a large number of spatial locations indexed by spatial coordinates; not
-just data which has only a few different locations or which is indexed by anatomical label.
-    3 Actually, many of these projects use quadrilaterals instead of square pixels; but we will refer to them as pixels for simplicity.
-    4 &#8220;Expression energy ratio&#8221;, which captures overexpression.
@@ -224,6 +227,14 @@
+_________________________________________
+   5 This would seem to contradict our finding in aim 1 that some cortical areas are combinatorially coded by multiple genes.  However, it is
+possible that the currently accepted cortical maps divide the cortex into regions which are unnatural from the point of view of gene expression;
+perhaps there is some other way to map the cortex for which each region can be identified by single genes. Another possibility is that, although
+the cluster prototype fits an anatomical region, the individual genes are each somewhat different from the prototype.
+    6 We ran &#8220;vanilla&#8221; NNMF, whereas the paper under discussion used a modified method.  Their main modification consisted of adding a soft
+spatial contiguity constraint.  However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was
+needed. The paper under discussion also mentions that they tried a hierarchical variant of NNMF, which we have not yet tried.
@@ -232,14 +243,6 @@
-_________________________________________
-   5 This would seem to contradict our finding in aim 1 that some cortical areas are combinatorially coded by multiple genes.  However, it is
-possible that the currently accepted cortical maps divide the cortex into regions which are unnatural from the point of view of gene expression;
-perhaps there is some other way to map the cortex for which each region can be identified by single genes. Another possibility is that, although
-the cluster prototype fits an anatomical region, the individual genes are each somewhat different from the prototype.
-    6 We ran &#8220;vanilla&#8221; NNMF, whereas the paper under discussion used a modified method.  Their main modification consisted of adding a soft
-spatial contiguity constraint.  However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was
-needed. The paper under discussion also mentions that they tried a hierarchical variant of NNMF, which we have not yet tried.
@@ -281,13 +284,6 @@
-captured by any stain. Therefore, current ideas about cortical anatomy need to incorporate what we can learn from looking
-at the patterns of gene expression.
-While we do not here propose to analyze human gene expression data, it is conceivable that the methods we propose to
-develop could be used to suggest modifications to the human cortical map as well.
-Related work
-[10] describes the application of AGEA to the cortex. The paper describes interesting results on the structure of correlations
-between voxel gene expression profiles within a handful of cortical areas. However, this sort of analysis is not related to either
@@ -298,6 +294,13 @@
+captured by any stain. Therefore, current ideas about cortical anatomy need to incorporate what we can learn from looking
+at the patterns of gene expression.
+While we do not here propose to analyze human gene expression data, it is conceivable that the methods we propose to
+develop could be used to suggest modifications to the human cortical map as well.
+Related work
+[10] describes the application of AGEA to the cortex. The paper describes interesting results on the structure of correlations
+between voxel gene expression profiles within a handful of cortical areas. However, this sort of analysis is not related to either
@@ -364,7 +367,7 @@
-&#8220;Is this surface pixel a member of the target area?&#8221;.
+&#8220;Is this surface pixel a member of the target area?&#8221;. Its advantage over linear methods such as logistic regression is that it
@@ -373,6 +376,8 @@
+takes account of arbitrarily nonlinear relationships; for example, if the XOR of two variables predicts the target, conditional
+entropy would notice, whereas linear methods would not.
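The XOR remark in this hunk can be made concrete with a small sketch. It is not taken from the grant: `conditional_entropy` is an illustrative helper, and `g1`, `g2`, and `in_area` are hypothetical toy data that assume each gene's expression has already been thresholded to 0 or 1 per surface pixel. Each gene alone leaves a full bit of uncertainty about area membership, while the pair leaves none.

```python
from collections import Counter
from math import log2


def conditional_entropy(target, *features):
    """Empirical H(target | features) in bits, from joint counts."""
    n = len(target)
    # Count each (feature tuple, target) pair and each feature tuple.
    joint = Counter(zip(zip(*features), target))
    marginal = Counter(zip(*features))
    h = 0.0
    for (f, _t), count in joint.items():
        h -= (count / n) * log2(count / marginal[f])
    return h


# Hypothetical toy data: two thresholded genes over four surface pixels.
g1 = [0, 0, 1, 1]
g2 = [0, 1, 0, 1]
# Area membership is the XOR of the two genes.
in_area = [a ^ b for a, b in zip(g1, g2)]

h_g1 = conditional_entropy(in_area, g1)        # one gene alone
h_g2 = conditional_entropy(in_area, g2)        # the other gene alone
h_pair = conditional_entropy(in_area, g1, g2)  # both genes together

print(h_g1, h_g2, h_pair)
```

A linear model on the same four pixels has no weight setting that separates the two classes, which is the contrast the text draws.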
@@ -404,9 +409,7 @@
-In Figure 3, we give an example of a cortical area which is not marked by any single gene, but which can be identified
-combinatorially.
-____________________________
+_________________________________________
@@ -418,6 +421,38 @@
+In Figure 3, we give an example of a cortical area which is not marked by any single gene, but which can be identified
+combinatorially.
+Underexpression of a gene can serve as a marker Underexpression of a gene can sometimes serve as a marker.
+See, for example, Figure 4.
+Feature selection integrated with prediction As noted earlier, in general, any predictive method can be used for
+feature selection by running it inside a stepwise wrapper. Also, some predictive methods integrate soft constraints on the number
+of features used. Examples of both of these will be seen in the section &#8220;Locating areas with gene expression&#8221;.
+Locating areas with gene expression
+Forward stepwise logistic regression As a pilot run, for five cortical areas (SS, AUD, RSP, VIS, and MO), we performed
+forward stepwise logistic regression to find single genes, pairs of genes, and triplets of genes which predict areal identity.
+Some of the single genes found were shown in previous figures, and Figure 3 shows a combination of genes which was found.
+SVM on all genes at once
+In order to see how well one can do when looking at all genes at once, we ran a support vector machine to classify cortical
+surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81% (footnote 19). As noted above,
+however, a classifier that looks at all the genes at once isn&#8217;t as practically useful as a classifier that uses only a few genes.
+Decision trees
+todo
+Areas which can be identified by single genes
+Using all of the methods we have tried so far, we have already found single genes which roughly identify some areas and
+groupings of areas. For each of these areas, an example of a gene which roughly identifies it is shown in Figure 5. We have
+not yet cross-verified these genes in other atlases.
+In addition, there are a number of areas which are almost identified by single genes:  COAa+NLOT (anterior part of
+cortical amygdalar area, nucleus of the lateral olfactory tract), ENT (entorhinal), ACAv (ventral anterior cingulate), VIS
+(visual), AUD (auditory).
+Data-driven redrawing of the cortical map
+Raw dimensionality reduction results
+todo
+(might want to include NNMF since it is mentioned above)
+Dimensionality reduction plus K-means or spectral clustering
+_________________________________________
+  19 5-fold cross-validation.
+
@@ -429,7 +464,6 @@
-
@@ -441,33 +475,6 @@
-Underexpression of a gene can serve as a marker Underexpression of a gene can sometimes serve as a marker.
-See, for example, Figure 4.
-Specific to Aim 1 (and Aim 3)
-Forward stepwise logistic regression todo
-SVM on all genes at once
-In order to see how well one can do when looking at all genes at once, we ran a support vector machine to classify cortical
-surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81% (footnote 19). As noted above,
-however, a classifier that looks at all the genes at once isn&#8217;t practically useful.
-The requirement to find combinations of only a small number of genes limits us from straightforwardly applying many
-of the most simple techniques from the field of supervised machine learning. In the parlance of machine learning, our task
-combines feature selection with supervised learning.
-Decision trees
-todo
-Areas which can be identified by single genes
-Using all of the methods we have tried so far, we have already found single genes which roughly identify some areas and
-groupings of areas. For each of these areas, an example of a gene which roughly identifies it is shown in Figure 5. We have
-not yet cross-verified these genes in other atlases.
-In addition, there are a number of areas which are almost identified by single genes:  COAa+NLOT (anterior part of
-cortical amygdalar area, nucleus of the lateral olfactory tract), ENT (entorhinal), ACAv (ventral anterior cingulate), VIS
-(visual), AUD (auditory).
-____________________
-  19 5-fold cross-validation.
-Specific to Aim 2 (and Aim 3)
-Raw dimensionality reduction results
-todo
-(might want to include NNMF since it is mentioned above)
-Dimensionality reduction plus K-means or spectral clustering
@@ -483,6 +490,7 @@
+layerfinding
--- a/grant.txt	Sun Apr 19 14:50:20 2009 -0700
+++ b/grant.txt	Sun Apr 19 15:23:53 2009 -0700
@@ -50,6 +50,7 @@
+The requirement to find combinations of only a small number of genes limits us from straightforwardly applying many of the most simple techniques from the field of supervised machine learning. In the parlance of machine learning, our task combines feature selection with supervised learning.
@@ -289,7 +290,7 @@
-This finds pairs of genes which are most informative (at least at these discretization thresholds) relative to the question, "Is this surface pixel a member of the target area?".
+This finds pairs of genes which are most informative (at least at these discretization thresholds) relative to the question, "Is this surface pixel a member of the target area?". Its advantage over linear methods such as logistic regression is that it takes account of arbitrarily nonlinear relationships; for example, if the XOR of two variables predicts the target, conditional entropy would notice, whereas linear methods would not.
@@ -356,18 +357,18 @@
-
-=== Specific to Aim 1 (and Aim 3) ===
+\vspace{0.3cm}**Feature selection integrated with prediction**
+As noted earlier, in general, any predictive method can be used for feature selection by running it inside a stepwise wrapper. Also, some predictive methods integrate soft constraints on the number of features used. Examples of both of these will be seen in the section "Locating areas with gene expression".
+
+
+=== Locating areas with gene expression ===
-todo
+As a pilot run, for five cortical areas (SS, AUD, RSP, VIS, and MO), we performed forward stepwise logistic regression to find single genes, pairs of genes, and triplets of genes which predict areal identity. This is an example of feature selection integrated with prediction using a stepwise wrapper. Some of the single genes found were shown in previous figures, and Figure \ref{MOcombo} shows a combination of genes which was found.
-In order to see how well one can do when looking at all genes at once, we ran a support vector machine to classify cortical surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%\footnote{5-fold cross-validation.}. As noted above, however, a classifier that looks at all the genes at once isn't practically useful. 
-
-The requirement to find combinations of only a small number of genes limits us from straightforwardly applying many of the most simple techniques from the field of supervised machine learning. In the parlance of machine learning, our task combines feature selection with supervised learning.
-
+In order to see how well one can do when looking at all genes at once, we ran a support vector machine to classify cortical surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%\footnote{5-fold cross-validation.}. As noted above, however, a classifier that looks at all the genes at once isn't as practically useful as a classifier that uses only a few genes. 
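The "stepwise wrapper" idea in this hunk can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used for the pilot run: `fit_logistic`, `log_loss`, and `forward_stepwise` are hypothetical helpers, the logistic regression is a bare gradient-descent fit, candidates are scored on training loss (a real run would presumably score on held-out pixels), and `pixels` and `in_area` are toy data in which area membership depends on genes 0 and 2 only.

```python
import math


def fit_logistic(X, y, steps=500, lr=0.5):
    """Plain batch gradient descent; returns weights with the bias last."""
    d = len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def log_loss(X, y, w):
    """Mean negative log-likelihood of labels y under weights w."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
        p = min(max(1.0 / (1.0 + math.exp(-z)), 1e-12), 1 - 1e-12)
        total -= math.log(p) if yi else math.log(1 - p)
    return total / len(X)


def forward_stepwise(X, y, max_genes):
    """Greedily add the gene whose inclusion gives the lowest loss."""
    selected = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_genes:
        def loss_with(g):
            cols = selected + [g]
            Xs = [[row[c] for c in cols] for row in X]
            return log_loss(Xs, y, fit_logistic(Xs, y))
        best = min(remaining, key=loss_with)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: 8 pixels, 3 binary "genes"; the area is where genes 0 and 2 are both on.
pixels = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
in_area = [a & c for a, b, c in pixels]

chosen = forward_stepwise(pixels, in_area, max_genes=2)
print(sorted(chosen))
```

Capping `max_genes` at 1, 2, or 3 corresponds to searching for single genes, pairs, or triplets; any other predictive method could replace `fit_logistic` inside the same wrapper.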
@@ -396,7 +397,7 @@
-=== Specific to Aim 2 (and Aim 3) ===
+=== Data-driven redrawing of the cortical map ===
@@ -443,6 +444,10 @@
+layerfinding
+
+
+