# HG changeset patch
# User bshanks@bshanks.dyndns.org
# Date 1239618121 25200
# Node ID 9d0cc9c66ecd7c601d96b8943a7943b5576a4661
# Parent  8ff9b7b5c242e199c20553dcf430b309f77bdc44
.

--- a/grant.html	Mon Apr 13 03:21:04 2009 -0700
+++ b/grant.html	Mon Apr 13 03:22:01 2009 -0700
@@ -22,6 +22,8 @@
                All algorithms that we develop will be implemented in an open-source soft-
             ware toolkit.  The toolkit, as well as the machine-readable datasets developed
             in aim (3), will be published and freely available for others to use.
+                                            1
+
              Background and significance
              Aim 1
             Machine learning terminology: supervised learning
@@ -35,8 +37,6 @@
             this a classification task, because each voxel is being assigned to a class (namely,
             its subregion).
                Therefore, an understanding of the relationship between the combination of
-                                            1
-
             their expression levels and the locations of the subregions may be expressed as
             a function. The input to this function is a voxel, along with the gene expression
             levels within that voxel;  the output is the subregional identity of the target
@@ -68,6 +68,8 @@
             procedures are called &#8220;stepwise&#8221; or &#8220;greedy&#8221;.
                Although the classifier itself may only look at the gene expression data within
             each voxel before classifying that voxel, the learning algorithm which constructs
+                                            2
+
             the classifier may look over the entire dataset.  We can categorize score-based
             feature selection methods depending on how the score of calculated.   Often
             the score calculation consists of assigning a sub-score to each voxel, and then
@@ -83,8 +85,6 @@
                Above, we defined an &#8220;instance&#8221; as the combination of a voxel with the
             &#8220;associated gene expression data&#8221;. In our case this refers to the expression level
             of genes within the voxel, but should we include the expression levels of all
-                                            2
-
             genes, or only a few of them?
                It is too much to hope that every anatomical region of interest will be iden-
             tified by a single gene. For example, in the cortex, there are some areas which
@@ -116,6 +116,8 @@
             evidence of the complementary nature of pointwise and local scoring methods.
                Principle 4: Work in 2-D whenever possible
                There are many anatomical structures which are commonly characterized in
+                                            3
+
             terms of a two-dimensional manifold. When it is known that the structure that
             one is looking for is two-dimensional, the results may be improved by allowing
             the analysis algorithm to take advantage of this prior knowledge.  In addition,
@@ -128,8 +130,6 @@
             of machine learning. One thing that you can do with such a dataset is to group
             instances together. A set of similar instances is called a cluster, and the activity
             of finding grouping the data into clusters is called clustering or cluster analysis.
-                                            3
-
                The task of deciding how to carve up a structure into anatomical subregions
             can be put into these terms.  The instances are once again voxels (or pixels)
             along with their associated gene expression profiles.  We make the assumption
@@ -162,6 +162,8 @@
             image into clusters, usually contiguous clusters.  Aim 2 is similar to an image
             segmentation task. There are two main differences; in our task, there are thou-
             sands of color channels (one for each gene), rather than just three.  There are
+                                            4
+
             imaging tasks which use more than three colors, however, for example multispec-
             tral imaging and hyperspectral imaging, which are often used to process satellite
             imagery. A more crucial difference is that there are various cues which are ap-
@@ -176,8 +178,6 @@
             algorithms perform better on small numbers of features.  There are techniques
             which &#8220;summarize&#8221; a larger number of features using a smaller number of fea-
             tures; these techniques go by the name of feature extraction or dimensionality
-                                            4
-
             reduction.  The small set of features that such a technique yields is called the
             reduced feature set. After the reduced feature set is created, the instances may
             be replaced by reduced instances, which have as their features the reduced fea-
@@ -208,6 +208,8 @@
             This is because many genes have an expression pattern which seems to pick
             out a single, spatially continguous subregion. Therefore, it seems likely that an
             anatomically interesting subregion will have multiple genes which each individ-
+                                            5
+
             ually pick it out1. This suggests the following procedure: cluster together genes
             which pick out similar subregions, and then to use the more popular common
             subregions as the final clusters. In the Preliminary Data we show that a num-
@@ -216,14 +218,6 @@
             this fashion.
              Aim 3
             Background
-_______________
-   1This would seem to contradict our finding in aim 1 that some cortical areas are combina-
-torially coded by multiple genes.  However, it is possible that the currently accepted cortical
-maps divide the cortex into subregions which are unnatural from the point of view of gene
-expression; perhaps there is some other way to map the cortex for which each subregion can
-be identified by single genes.
-                                            5
-
                The cortex is divided into areas and layers.  To a first approximation, the
             parcellation of the cortex into areas can be drawn as a 2-D map on the surface of
             the cortex.  In the third dimension, the boundaries between the areas continue
@@ -254,6 +248,14 @@
             finding markers for each individual cortical areas, we will find a small panel
             of genes that can find many of the areal boundaries at once.  This panel of
             marker genes will allow the development of an ISH protocol that will allow
+__________________________
+   1This would seem to contradict our finding in aim 1 that some cortical areas are combina-
+torially coded by multiple genes.  However, it is possible that the currently accepted cortical
+maps divide the cortex into subregions which are unnatural from the point of view of gene
+expression; perhaps there is some other way to map the cortex for which each subregion can
+be identified by single genes.
+                                            6
+
             experimenters to more easily identify which anatomical areas are present in
             small samples of cortex.
                The method developed in aim (3) will provide a genoarchitectonic viewpoint
@@ -269,8 +271,6 @@
                While we do not here propose to analyze human gene expression data, it is
             conceivable that the methods we propose to develop could be used to suggest
             modifications to the human cortical map as well.
-                                            6
-
              Related work
             There does not appear to be much work on the automated analysis of spatial
             gene expression data.
@@ -297,23 +297,26 @@
             yielded impressive results, proving the usefulness of such research. We have run
             NNMF on the cortical dataset and while the results are promising (see Prelim-
             inary Data), we think that it will be possible to find a better method2 (we also
+__________________________
+   2We ran &#8220;vanilla&#8221; NNMF, whereas the paper under discussion used a modified method.
+Their main modification consisted of adding a soft spatial contiguity constraint.  However,
+on our dataset,  NNMF naturally produced spatially contiguous clusters,  so no additional
+                                            7
+
             think that more automation of the parts that this paper&#8217;s authors did manually
             will be possible).
                and [?] describes AGEA. todo
+__________________________
+constraint was needed. The paper under discussion mentions that they also tried a hierarchical
+variant of NNMF, but since they didn&#8217;t report its results, we assume that those results were
+not any more impressive than the results of the non-hierarchical variant.
+                                            8
+
              Preliminary work
              Format conversion between SEV, MATLAB, NIFTI
             todo
              Flatmap of cortex
             todo
-_______________________
-   2We ran &#8220;vanilla&#8221; NNMF, whereas the paper under discussion used a modified method.
-Their main modification consisted of adding a soft spatial contiguity constraint.  However,
-on our dataset,  NNMF naturally produced spatially contiguous clusters,  so no additional
-constraint was needed. The paper under discussion mentions that they also tried a hierarchial
-variant of NNMF, but since they didn&#8217;t report its results, we assume that those result were
-not any more impressive than the results of the non-hierarchial variant.
-                                            7
-
                Using combinations of multiple genes is necessary and sufficient to
             delineate some cortical areas
                Here we give an example of a cortical area which is not marked by any
@@ -343,15 +346,7 @@
             genes which express more strongly in AUD than outside of it; its weakness is that
             this includes many areas which don&#8217;t have a salient border matching the areal
             border. The geometric method identifies genes whose salient expression border
-            seems to partially line up with the border of AUD; its weakness is that this
-            includes genes which don&#8217;t express over the entire area. Genes which have high
-            rankings using both pointwise and border criteria, such as Aph1a in the example,
-            may be particularly good markers.   None of these genes are,  individually,  a
-            perfect marker for AUD; we deliberately chose a &#8220;difficult&#8221; area in order to
-            better contrast pointwise with geometric methods.
-               Areas which can be identified by single genes
-               todo
-____________________
+__________________________
    3&#8220;WW, C2 and coiled-coil domain containing 1&#8221;; EntrezGene ID 211652
     4&#8220;mitochondrial translational initiation factor 2&#8221;; EntrezGene ID 76784
     5For each gene, a logistic regression in which the response variable was whether or not a
@@ -361,7 +356,7 @@
     6For each gene the gradient similarity (see section ??) between (a) a map of the expression
 of each gene on the cortical surface and (b) the shape of area AUD, was calculated, and this
 was used to rank the genes.
-                                            8
+                                            9
 
                                         
             
@@ -373,6 +368,8 @@
             the boundary of region MO. Pixels are colored approximately according to the
             density of expressing cells underneath each pixel, with red meaning a lot of
             expression and blue meaning little.
+                                            10
+
                                                         
                                                         
             Figure 2: The top row shows the three genes which (individually) best predict
@@ -380,8 +377,14 @@
             genes which (individually) best match area AUD, according to gradient similar-
             ity. From left to right and top to bottom, the genes are Ssr1, Efcbp1, Aph1a,
             Ptk7, Aph1a again, and Lepr
-                                            9
-
+            seems to partially line up with the border of AUD; its weakness is that this
+            includes genes which don&#8217;t express over the entire area. Genes which have high
+            rankings using both pointwise and border criteria, such as Aph1a in the example,
+            may be particularly good markers.   None of these genes are,  individually,  a
+            perfect marker for AUD; we deliberately chose a &#8220;difficult&#8221; area in order to
+            better contrast pointwise with geometric methods.
+               Areas which can be identified by single genes
+               todo
              Specific to Aim 1 (and Aim 3)
             Forward stepwise logistic regression todo
                SVM on all genes at once
@@ -396,6 +399,10 @@
             our task combines feature selection with supervised learning.
                Decision trees
                todo
+____________________
+   75-fold cross-validation.
+                                            11
+
              Specific to Aim 2 (and Aim 3)
             Raw dimensionality reduction results
                todo
@@ -404,6 +411,8 @@
                Many areas are captured by clusters of genes
                todo
                todo
+                                            12
+
              Research plan
             todo amongst other things:
                Develop algorithms that find genetic markers for anatomical re-
@@ -419,10 +428,6 @@
                  with a handful of genes. We will consider both (a) algorithms that incre-
                  mentally/greedily combine single gene markers into sets, such as forward
                  stepwise regression and decision trees, and also (b) supervised learning
-__________________________
-   75-fold cross-validation.
-                                            10
-
                  techniques which use soft constraints to minimize the number of features,
                  such as sparse support vector machines.
               4. Extend the procedure to handle difficult areas by combining or redrawing
@@ -446,6 +451,8 @@
                  at once.
                Develop algorithms to suggest a division of a structure into anatom-
             ical parts
+                                            13
+
               1. Explore dimensionality reduction algorithms applied to pixels:  including
                  TODO
               2. Explore dimensionality reduction algorithms applied to genes:  including
@@ -457,9 +464,8 @@
                  clustering to create anatomical maps
               6. Run this algorithm on the cortex: present a hierarchial, genoarchitectonic
                  map of the cortex
-                                            11
-
-            _______________________________________________________________________________________________________ stuff i dunno where to put yet (there is more scattered through grant-
+______________________________________________
+    stuff  i  dunno  where  to  put  yet  (there  is  more  scattered  through  grant-
 oldtext):
     Principle 4: Work in 2-D whenever possible
     In anatomy, the manifold of interest is usually either defined by a combina-
@@ -484,6 +490,6 @@
 app2 has examples of genetic targeting to specific anatomical regions
     &#8212;
     note:
-                                            12
-
-
+                                            14
+
+
Binary file grant.odt has changed
Binary file grant.pdf has changed
--- a/grant.txt	Mon Apr 13 03:21:04 2009 -0700
+++ b/grant.txt	Mon Apr 13 03:22:01 2009 -0700
@@ -13,6 +13,7 @@
 All algorithms that we develop will be implemented in an open-source software toolkit. The toolkit, as well as the machine-readable datasets developed in aim (3), will be published and freely available for others to use. 
 
 
+\newpage
 
 == Background and significance ==
 
@@ -151,6 +152,8 @@
 
 
 
+\newpage
+
 == Preliminary work ==
 
 === Format conversion between SEV, MATLAB, NIFTI ===
@@ -254,6 +257,9 @@
 
 todo
 
+
+
+\newpage
 == Research plan ==
 
 todo amongst other things: