
changeset 16:796116742ec5
.
author: bshanks@bshanks.dyndns.org
date: Sun Apr 12 03:39:30 2009 -0700 (16 years ago)
parents: 395faa66383e
children: ff9b47f2c7d3
files: grant.doc grant.html grant.odt grant.pdf grant.txt
--- a/grant.html	Sun Apr 12 02:49:55 2009 -0700
+++ b/grant.html	Sun Apr 12 03:39:30 2009 -0700
@@ -24,10 +24,10 @@
-             Machine learning terminology: supervised learning
-            The task of looking for marker genes for anatomical subregions means that one
-            is looking for a set of genes such that, if the expression level of those genes is
-            known, then the locations of the subregions can be inferred.
+            Machine learning terminology: supervised learning
+               The task of looking for marker genes for anatomical subregions means that
+            one is looking for a set of genes such that, if the expression level of those genes
+            is known, then the locations of the subregions can be inferred.
@@ -36,9 +36,9 @@
+            a function. The input to this function is a voxel, along with the gene expression
-            a function. The input to this function is a voxel, along with the gene expression
@@ -79,60 +79,60 @@
-             Principle 1: Combinatorial gene expression
-            Above, we defined an &#8220;instance&#8221; as the combination of a voxel with the &#8220;asso-
-            ciated gene expression data&#8221;.  In our case this refers to the expression level of
+               Principle 1: Combinatorial gene expression
+               Above, we defined an &#8220;instance&#8221; as the combination of a voxel with the
+            &#8220;associated gene expression data&#8221;. In our case this refers to the expression level
+            of genes within the voxel, but should we include the expression levels of all
+            genes, or only a few of them?
+               It is too much to hope that every anatomical region of interest will be iden-
-            genes within the voxel, but should we include the expression levels of all genes,
-            or only a few of them?
-               It is too much to hope that every anatomical region of interest will be iden-
-             Principle 2: Only look at combinations of small numbers of genes
-            When the classifier classifies a voxel, it is only allowed to look at the expression of
-            the genes which have been selected as features. The more data that is available
-            to a classifier, the better that it can do.  For example, perhaps there are weak
-            correlations over many genes that add up to a strong signal. So, why not include
-            every gene as a feature? The reason is that we wish to employ the classifier in
-            situations in which it is not feasible to gather data about every gene.   For
-            example, if we want to use the expression of marker genes as a trigger for some
-            regionally-targeted intervention, then our intervention must contain a molecular
-            mechanism to check the expression level of each marker gene before it triggers.
-            It is currently infeasible to design a molecular trigger that checks the level of
-            more than a handful of genes. Similarly, if the goal is to develop a procedure to
-            do ISH on tissue samples in order to label their anatomy, then it is infeasible
-            to label more than a few genes.  Therefore, we must select only a few genes as
-            features.
-             Principle 3: Use geometry in feature selection
-            When doing feature selection with score-based methods, the simplest thing to do
-            would be to score the performance of each voxel by itself and then combine these
-            scores (pointwise scoring). A more powerful approach is to also use information
-            about the geometric relations between each voxel and its neighbors; this requires
-            non-pointwise, local scoring methods.  See Preliminary Results for evidence of
-            the complementary nature of pointwise and local scoring methods.
-             Principle 4: Work in 2-D whenever possible
-            There are many anatomical structures which are commonly characterized in
+               Principle 2: Only look at combinations of small numbers of genes
+               When the classifier classifies a voxel, it is only allowed to look at the expres-
+            sion of the genes which have been selected as features.  The more data that is
+            available to a classifier, the better that it can do.  For example, perhaps there
+            are weak correlations over many genes that add up to a strong signal. So, why
+            not include every gene as a feature?  The reason is that we wish to employ
+            the classifier in situations in which it is not feasible to gather data about every
+            gene. For example, if we want to use the expression of marker genes as a trigger
+            for some regionally-targeted intervention, then our intervention must contain a
+            molecular mechanism to check the expression level of each marker gene before
+            it triggers.  It is currently infeasible to design a molecular trigger that checks
+            the level of more than a handful of genes. Similarly, if the goal is to develop a
+            procedure to do ISH on tissue samples in order to label their anatomy, then it
+            is infeasible to label more than a few genes.  Therefore, we must select only a
+            few genes as features.
+               Principle 3: Use geometry in feature selection
+               When doing feature selection with score-based methods, the simplest thing
+            to do would be to score the performance of each voxel by itself and then com-
+            bine these scores (pointwise scoring).  A more powerful approach is to also use
+            information about the geometric relations between each voxel and its neighbors;
+            this requires non-pointwise, local scoring methods. See Preliminary Results for
+            evidence of the complementary nature of pointwise and local scoring methods.
+               Principle 4: Work in 2-D whenever possible
+               There are many anatomical structures which are commonly characterized in
-                                            3
-
-             Machine learning terminology: clustering
-            If one is given a dataset consisting merely of instances, with no class labels, then
-            analysis of the dataset is referred to as unsupervised learning in the jargon of
-            machine learning.  One thing that you can do with such a dataset is to group
+            Machine learning terminology: clustering
+               If one is given a dataset consisting merely of instances, with no class labels,
+            then analysis of the dataset is referred to as unsupervised learning in the jargon
+            of machine learning. One thing that you can do with such a dataset is to group
+                                            3
+
@@ -144,15 +144,15 @@
-             Similarity scores
-            todo
-             Spatially contiguous clusters; image segmentation
-            We have shown that aim 2 is a type of clustering task. In fact, it is a special type
-            of clustering task because we have an additional constraint on clusters; voxels
-            grouped together into a cluster must be spatially contiguous.  In Preliminary
-            Results,  we show that one can get reasonable results without enforcing this
-            constraint, however, we plan to compare these results against other methods
-            which guarantee contiguous clusters.
+               Similarity scores
+               todo
+               Spatially contiguous clusters; image segmentation
+               We have shown that aim 2 is a type of clustering task.   In fact,  it is a
+            special type of clustering task because we have an additional constraint on
+            clusters; voxels grouped together into a cluster must be spatially contiguous.
+            In Preliminary Results, we show that one can get reasonable results without
+            enforcing this constraint, however, we plan to compare these results against
+            other methods which guarantee contiguous clusters.
@@ -164,15 +164,13 @@
-                                            4
-
-             Dimensionality reduction
-            Unlike aim 1, there is no externally-imposed need to select only a handful of
-            informative genes for inclusion in the instances.  However, some clustering al-
-            gorithms perform better on small numbers of features.  There are techniques
+               Dimensionality reduction
+               Unlike aim 1, there is no externally-imposed need to select only a handful
+            of informative genes for inclusion in the instances.  However, some clustering
+            algorithms perform better on small numbers of features.  There are techniques
@@ -181,6 +179,8 @@
+                                            4
+
@@ -194,8 +194,8 @@
-             Clustering genes rather than voxels
-            Although the ultimate goal is to cluster the instances (voxels or pixels), one
+               Clustering genes rather than voxels
+               Although the ultimate goal is to cluster the instances (voxels or pixels), one
@@ -206,24 +206,27 @@
-__________________________
-   1This would seem to contradict our finding in aim 1 that some cortical areas are combina-
-torially coded by multiple genes.  However, it is possible that the currently accepted cortical
-                                            5
-
-             Background
-            The cortex is divided into areas and layers.  To a first approximation, the par-
-            cellation of the cortex into areas can be drawn as a 2-D map on the surface
-            of the cortex.  In the third dimension, the boundaries between the areas con-
-            tinue downwards into the cortical depth, perpendicular to the surface. The layer
+            Background
+               The cortex is divided into areas and layers.  To a first approximation, the
+            parcellation of the cortex into areas can be drawn as a 2-D map on the surface of
+            the cortex.  In the third dimension, the boundaries between the areas continue
+            downwards into the cortical depth,  perpendicular to the surface.   The layer
+___
+   1This would seem to contradict our finding in aim 1 that some cortical areas are combina-
+torially coded by multiple genes.  However, it is possible that the currently accepted cortical
+maps divide the cortex into subregions which are unnatural from the point of view of gene
+expression; perhaps there is some other way to map the cortex for which each subregion can
+be identified by single genes.
+                                            5
+
@@ -237,8 +240,8 @@
-             Significance
-            The method developed in aim (1) will be applied to each cortical area to find
+               Significance
+               The method developed in aim (1) will be applied to each cortical area to find
@@ -250,12 +253,6 @@
-______
-maps divide the cortex into subregions which are unnatural from the point of view of gene
-expression; perhaps there is some other way to map the cortex for which each subregion can
-be identified by single genes.
-                                            6
-
@@ -272,17 +269,19 @@
+                                            6
+
-             Using combinations of multiple genes is necessary and sufficient to
+               Using combinations of multiple genes is necessary and sufficient to
-            Here we give an example of a cortical area which is not marked by any single
-            gene, but which can be identified combinatorially.  according to logistic regres-
-            sion, gene wwc12 is the best fit single gene for predicting whether or not a pixel
-            on the cortical surface belongs to the motor area (area MO). The upper-left
+               Here we give an example of a cortical area which is not marked by any
+            single gene, but which can be identified combinatorially.  according to logistic
+            regression, gene wwc12 is the best fit single gene for predicting whether or not a
+            pixel on the cortical surface belongs to the motor area (area MO). The upper-left
@@ -291,9 +290,31 @@
+            in these two figures, we get the lower-left of Figure . This combination captures
+            area MO much better than any single gene.
+               Geometric and pointwise scoring methods provide complementary
+            information
+               To show that local geometry can provide useful information that cannot be
+            detected via pointwise analyses, consider Fig. . The top row of Fig.  displays the
+            3 genes which most match area AUD, according to a pointwise method4.  The
+            bottom row displays the 3 genes which most match AUD according to a method
+            which considers local geometry5 The pointwise method in the top row identifies
+            genes which express more strongly in AUD than outside of it; its weakness is that
+            this includes many areas which don&#8217;t have a salient border matching the areal
+            border. The geometric method identifies genes whose salient expression border
+            seems to partially line up with the border of AUD; its weakness is that this
+            includes genes which don&#8217;t express over the entire area. Genes which have high
+            rankings using both pointwise and border criteria, such as Aph1a in the example,
+    4For each gene, a logistic regression in which the response variable was whether or not a
+surface pixel was within area AUD, and the predictor variable was the value of the expression
+of the gene underneath that pixel. The resulting scores were used to rank the genes in terms
+of how well they predict area AUD.
+    5For each gene the gradient similarity (see section ??) between (a) a map of the expression
+of each gene on the cortical surface and (b) the shape of area AUD, was calculated, and this
+was used to rank the genes.
@@ -306,24 +327,6 @@
-            in these two figures, we get the lower-left of Figure . This combination captures
-            area MO much better than any single gene.
-             Geometric and pointwise scoring methods provide complementary
-            information
-            To show that local geometry can provide useful information that cannot be
-            detected via pointwise analyses, consider Fig.  .  The top row of Fig.   displays
-            the 3 genes which most match area AUD, according to a pointwise method4. The
-            bottom row displays the 3 genes which most match AUD according to a method
-            which considers local geometry5 The pointwise method in the top row identifies
-            genes which express more strongly in AUD than outside of it; its weakness is that
-__________________________
-   4For each gene, a logistic regression in which the response variable was whether or not a
-surface pixel was within area AUD, and the predictor variable was the value of the expression
-of the gene underneath that pixel. The resulting scores were used to rank the genes in terms
-of how well they predict area AUD.
-    5For each gene the gradient similarity (see section ??) between (a) a map of the expression
-of each gene on the cortical surface and (b) the shape of area AUD, was calculated, and this
-was used to rank the genes.
@@ -333,47 +336,43 @@
-            this includes many areas which don&#8217;t have a salient border matching the areal
-            border. The geometric method identifies genes whose salient expression border
-            seems to partially line up with the border of AUD; its weakness is that this
-            includes genes which don&#8217;t express over the entire area. Genes which have high
-            rankings using both pointwise and border criteria, such as Aph1a in the example,
-             Areas which can be identified by single genes
-            todo
+               Areas which can be identified by single genes
+               todo
-             SVM on all genes at once
-            In order to see how well one can do when looking at all genes at once, we ran
-            a support vector machine to classify cortical surface pixels based on their gene
-            expression profiles. We achieved classification accuracy of about 81%6. As noted
-            above, however, a classifier that looks at all the genes at once isn&#8217;t practically
-            useful.
-_____________________
+            SVM on all genes at once
+               In order to see how well one can do when looking at all genes at once, we
+            ran a support vector machine to classify cortical surface pixels based on their
+            gene expression profiles.  We achieved classification accuracy of about 81%6.
+            As noted above, however, a classifier that looks at all the genes at once isn&#8217;t
+            practically useful.
+               The requirement to find combinations of only a small number of genes limits
+            us from straightforwardly applying many of the most simple techniques from
+            the field of supervised machine learning.  In the parlance of machine learning,
+            our task combines feature selection with supervised learning.
+               Decision trees
+               todo
+____________________
-               The requirement to find combinations of only a small number of genes limits
-            us from straightforwardly applying many of the most simple techniques from
-            the field of supervised machine learning.  In the parlance of machine learning,
-            our task combines feature selection with supervised learning.
-             Decision trees
-            todo
-             Many areas are captured by clusters of genes
-            todo
+            Many areas are captured by clusters of genes
+               todo
-            Develop algorithms that find genetic markers for anatomical regions
+               Develop algorithms that find genetic markers for anatomical re-
+            gions
@@ -391,15 +390,15 @@
-                                            10
-
-             Apply these algorithms to the cortex
+               Apply these algorithms to the cortex
+                                            10
+
@@ -408,7 +407,7 @@
-            Develop algorithms to suggest a division of a structure into anatom-
+               Develop algorithms to suggest a division of a structure into anatom-
@@ -424,24 +423,22 @@
+    Principle 4: Work in 2-D whenever possible
+    In anatomy, the manifold of interest is usually either defined by a combina-
+tion of two relevant anatomical axes (todo), or by the surface of the structure
+(as is the case with the cortex).  In the former case, the manifold of interest is
+a plane, but in the latter case it is curved. If the manifold is curved, there are
+various methods for mapping the manifold into a plane.
+    The method that we will develop will begin by mapping the data into a
+2-D plane.  Although the manifold that characterized cortical areas is known
+to be the cortical surface, it remains to be seen which method of mapping the
+manifold into a plane is optimal for this application. We will compare mappings
+which attempt to preserve size (such as the one used by Caret??) with mappings
+which preserve angle (conformal maps).
+    Although there is much 2-D organization in anatomy, there are also struc-
+tures whose shape is fundamentally 3-dimensional.  If possible, we would like
+the method we develop to include a statistical test that warns the user if the
+assumption of 2-D structure seems to be wrong.
-             Principle 4: Work in 2-D whenever possible
-            In anatomy, the manifold of interest is usually either defined by a combination
-            of two relevant anatomical axes (todo), or by the surface of the structure (as is
-            the case with the cortex). In the former case, the manifold of interest is a plane,
-            but in the latter case it is curved.  If the manifold is curved, there are various
-            methods for mapping the manifold into a plane.
-               The method that we will develop will begin by mapping the data into a
-            2-D plane.  Although the manifold that characterized cortical areas is known
-            to be the cortical surface, it remains to be seen which method of mapping the
-            manifold into a plane is optimal for this application. We will compare mappings
-            which attempt to preserve size (such as the one used by Caret??) with mappings
-            which preserve angle (conformal maps).
-               Although there is much 2-D organization in anatomy, there are also struc-
-            tures whose shape is fundamentally 3-dimensional.  If possible, we would like
-            the method we develop to include a statistical test that warns the user if the
-            assumption of 2-D structure seems to be wrong.
-                                            12
-
-
+
--- a/grant.txt	Sun Apr 12 02:49:55 2009 -0700
+++ b/grant.txt	Sun Apr 12 03:39:30 2009 -0700
@@ -15,7 +15,9 @@
-==== Machine learning terminology: supervised learning ====
+
+**Machine learning terminology: supervised learning**
+
@@ -34,20 +36,28 @@
-==== Principle 1: Combinatorial gene expression ====
+
+**Principle 1: Combinatorial gene expression**
+
-==== Principle 2: Only look at combinations of small numbers of genes ====
+
+**Principle 2: Only look at combinations of small numbers of genes**
+
-==== Principle 3: Use geometry in feature selection ====
+
+**Principle 3: Use geometry in feature selection**
+
-==== Principle 4: Work in 2-D whenever possible ====
+
+**Principle 4: Work in 2-D whenever possible**
+
@@ -55,30 +65,40 @@
-==== Machine learning terminology: clustering ====
+
+**Machine learning terminology: clustering**
+
-==== Similarity scores ====
-
-todo
-
-==== Spatially contiguous clusters; image segmentation ====
+
+**Similarity scores**
+
+
+todo
+
+
+**Spatially contiguous clusters; image segmentation**
+
-==== Dimensionality reduction ====
+
+**Dimensionality reduction**
+
-==== Clustering genes rather than voxels ====
+
+**Clustering genes rather than voxels**
+
@@ -91,7 +111,9 @@
-==== Background ====
+
+**Background**
+
@@ -100,7 +122,9 @@
-==== Significance ====
+
+**Significance**
+
@@ -125,7 +149,9 @@
-==== Using combinations of multiple genes is necessary and sufficient to delineate some cortical areas ====
+
+**Using combinations of multiple genes is necessary and sufficient to delineate some cortical areas**
+
@@ -139,7 +165,9 @@
-==== Geometric and pointwise scoring methods provide complementary information ====
+
+**Geometric and pointwise scoring methods provide complementary information**
+
@@ -156,20 +184,26 @@
-==== Areas which can be identified by single genes ====
+
+**Areas which can be identified by single genes**
+
-==== SVM on all genes at once ====
+
+**SVM on all genes at once**
+
-==== Decision trees ====
+
+**Decision trees**
+
@@ -181,7 +215,9 @@
-==== Many areas are captured by clusters of genes ====
+
+**Many areas are captured by clusters of genes**
+
@@ -202,20 +238,26 @@
-==== Develop algorithms that find genetic markers for anatomical regions ====
+
+**Develop algorithms that find genetic markers for anatomical regions**
+
-==== Apply these algorithms to the cortex ====
+
+**Apply these algorithms to the cortex**
+
-==== Develop algorithms to suggest a division of a structure into anatomical parts ====
+
+**Develop algorithms to suggest a division of a structure into anatomical parts**
+
@@ -231,7 +273,9 @@
-==== Principle 4: Work in 2-D whenever possible ====
+
+**Principle 4: Work in 2-D whenever possible**
+
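Footnote 4 in the grant.html hunk describes the pointwise ranking used for area AUD:

- For each gene, fit a logistic regression. The response is whether a surface pixel lies inside the area, and the predictor is the gene's expression under that pixel.
- Use the resulting scores to rank the genes.

The footnote does not say which score was used. The Python sketch below is therefore only an illustration under stated assumptions:

- The score is the maximized log-likelihood of each one-predictor fit.
- The model is fitted by plain batch gradient ascent.
- The function names and toy data are hypothetical and do not come from the commit.

Because log-likelihood rewards fit quality regardless of sign, the fitted slope is returned too. That lets genes expressed more weakly inside the area be filtered out if only positive markers are wanted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(x: Sequence[float], y: Sequence[int],
                    lr: float = 0.5, steps: int = 2000) -> Tuple[float, float]:
    """Fit P(in area | x) = sigmoid(a + b*x) by batch gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, yi in zip(x, y):
            residual = yi - sigmoid(a + b * xi)
            grad_a += residual
            grad_b += residual * xi
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def log_likelihood(x: Sequence[float], y: Sequence[int], a: float, b: float) -> float:
    """Log-likelihood of the labels under the fitted model; probabilities clamped to avoid log(0)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = min(max(sigmoid(a + b * xi), 1e-12), 1.0 - 1e-12)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def rank_genes(expression: Dict[str, Sequence[float]],
               in_area: Sequence[int]) -> List[Tuple[str, float, float]]:
    """Rank genes by how well a one-gene logistic regression predicts area membership.

    Returns (gene, score, slope) tuples, best score first.
    ASSUMPTION: the score is the fitted log-likelihood; the source does not specify it.
    """
    results = []
    for gene, values in expression.items():
        a, b = fit_logistic_1d(values, in_area)
        results.append((gene, log_likelihood(values, in_area, a, b), b))
    results.sort(key=lambda r: r[1], reverse=True)
    return results


if __name__ == "__main__":
    # Hypothetical toy data: expression of two genes at six surface pixels,
    # the first three of which lie inside the area (label 1).
    labels = [1, 1, 1, 0, 0, 0]
    genes = {
        "marker": [0.9, 0.8, 0.95, 0.1, 0.2, 0.05],
        "noise": [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],
    }
    # 'marker' ranks above 'noise'
    for gene, score, slope in rank_genes(genes, labels):
        print(f"{gene}: log-likelihood={score:.3f}, slope={slope:.2f}")
```

The design choices are deliberate. Gradient ascent on a single predictor keeps the sketch dependency-free. A real analysis over thousands of genes and surface pixels would more likely use a library logistic-regression solver and a held-out accuracy or likelihood score.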