cg

changeset 89:79f51f8c878b
.
author: bshanks@bshanks.dyndns.org
date: Tue Apr 21 05:50:39 2009 -0700 (16 years ago)
parents: ae1e1da359d2
children: 9e85d264837c
files: grant.doc grant.html grant.odt grant.pdf grant.txt
--- a/grant.html	Tue Apr 21 05:38:52 2009 -0700
+++ b/grant.html	Tue Apr 21 05:50:39 2009 -0700
@@ -363,6 +363,9 @@
+_________________________________________
+  16SEV is a sparse format for spatial data. It is the format in which the ABA data is made available.
+
@@ -377,8 +380,8 @@
-                  In the Research Plan, we describe how we will automatically locate the layer depths.  For
-validation, we have manually demarcated the depth of the outer boundary of cortical layer 5 throughout the cortex.
+In the Research Plan, we describe how we will automatically locate the layer depths. For validation, we have manually
+demarcated the depth of the outer boundary of cortical layer 5 throughout the cortex.
@@ -387,8 +390,6 @@
-_________________________________________
-  16SEV is a sparse format for spatial data. It is the format in which the ABA data is made available.
@@ -427,47 +428,73 @@
+One might say that gradient similarity attempts to measure how much the border of the area of gene expression and
+the border of the target region overlap.  However, since gene expression falls off continuously rather than jumping from its
+maximum value to zero, the spatial pattern of a gene&#8217;s expression often does not have a discrete border. Therefore, instead
+of looking for a discrete border, we look for large gradients.  Gradient similarity is a symmetric function over two images
+(i.e. two scalar fields). It is high to the extent that matching pixels which have large values and large gradients also have
+gradients which are oriented in a similar direction. The formula is:
+$$\sum_{\mathrm{pixel} \in \mathrm{pixels}} \cos\bigl(\lvert \angle\nabla_1 - \angle\nabla_2 \rvert\bigr) \cdot \frac{\lvert\nabla_1\rvert + \lvert\nabla_2\rvert}{2} \cdot \frac{\mathrm{pixel\_value}_1 + \mathrm{pixel\_value}_2}{2}$$
-the corresponding pixels in the upper row).     One might say that gradient similarity attempts to measure how much the
-                               border of the area of gene expression and the border of the target region over-
-                               lap. However, since gene expression falls off continuously rather than jumping
-                               from its maximum value to zero, the spatial pattern of a gene&#8217;s expression often
-                               does not have a discrete border.  Therefore, instead of looking for a discrete
-                               border, we look for large gradients. Gradient similarity is a symmetric function
-                               over two images (i.e. two scalar fields). It is high to the extent that matching
-                               pixels which have large values and large gradients also have gradients which
-                               are oriented in a similar direction. The formula is:
-                               $$\sum_{\mathrm{pixel} \in \mathrm{pixels}} \cos\bigl(\lvert \angle\nabla_1 - \angle\nabla_2 \rvert\bigr) \cdot \frac{\lvert\nabla_1\rvert + \lvert\nabla_2\rvert}{2} \cdot \frac{\mathrm{pixel\_value}_1 + \mathrm{pixel\_value}_2}{2}$$
-                                  where &#x2207;1 and &#x2207;2 are the gradient vectors of the two images at the current
+the corresponding pixels in the upper row).     where &#x2207;1 and &#x2207;2 are the gradient vectors of the two images at the current
-oriented in a similar direction (because the borders are similar).
-Most of the genes in Figure 5 were identified via gradient similarity.
-Gradient similarity provides information complementary to correlation
-To show that gradient similarity can provide useful information that cannot be detected via pointwise analyses, consider
-Fig. 3. The top row of Fig.  3 displays the 3 genes which most match area AUD, according to a pointwise method17.  The
+                               oriented in a similar direction (because the borders are similar).
+                                  Most of the genes in Figure 5 were identified via gradient similarity.
+                                  Gradient similarity provides information complementary to cor-
+                               relation
+                                  To show that gradient similarity can provide useful information that cannot
+                               be detected via pointwise analyses, consider Fig.  3.  The top row of Fig.  3
+                               displays the 3 genes which most match area AUD, according to a pointwise
+                               method17.  The bottom row displays the 3 genes which most match AUD
+                               according to a method which considers local geometry18.  The pointwise method
+                               in the top row identifies genes which express more strongly in AUD than out-
+                               side of it; its weakness is that this includes many areas which don&#8217;t have a
+                               salient border matching the areal border.  The geometric method identifies
+genes whose salient expression border seems to partially line up with the border of AUD; its weakness is that this includes
+genes which don&#8217;t express over the entire area.  Genes which have high rankings using both pointwise and border criteria,
+such as Aph1a in the example, may be particularly good markers.  None of these genes are, individually, a perfect marker
+for AUD; we deliberately chose a &#8220;difficult&#8221; area in order to better contrast pointwise with geometric methods.
+Areas which can be identified by single genes Using gradient similarity, we have already found single genes which
+roughly identify some areas and groupings of areas. For each of these areas, an example of a gene which roughly identifies
+it is shown in Figure 5. We have not yet cross-verified these genes in other atlases.
+In addition, there are a number of areas which are almost identified by single genes:  COAa+NLOT (anterior part of
+cortical amygdalar area, nucleus of the lateral olfactory tract), ENT (entorhinal), ACAv (ventral anterior cingulate), VIS
+(visual), AUD (auditory).
+These results validate our expectation that the ABA dataset can be exploited to find marker genes for many cortical
+areas, while also validating the relevancy of our new scoring method, gradient similarity.
+Combinations of multiple genes are useful and necessary for some areas
+In Figure 4, we give an example of a cortical area which is not marked by any single gene, but which can be identified
+combinatorially.  According to logistic regression, gene wwc1 is the best fit single gene for predicting whether or not a
+pixel on the cortical surface belongs to the motor area (area MO). The upper-left picture in Figure 4 shows wwc1&#8217;s spatial
+expression pattern over the cortex.  The lower-right boundary of MO is represented reasonably well by this gene, but the
+gene overshoots the upper-left boundary.  This flattened 2-D representation does not show it, but the area corresponding
+to the overshoot is the medial surface of the cortex.  MO is only found on the dorsal surface.  Gene mtif2 is shown in the
+upper-right. Mtif2 captures MO&#8217;s upper-left boundary, but not its lower-right boundary. Mtif2 does not express very much
+on the medial surface.  By adding together the values at each pixel in these two figures, we get the lower-left image.  This
+combination captures area MO much better than any single gene.
+This shows that our proposal to develop a method to find combinations of marker genes is both possible and necessary.
+Feature selection integrated with prediction As noted earlier, in general, any predictive method can be used for
+feature selection by running it inside a stepwise wrapper. Also, some predictive methods integrate soft constraints on number
+of features used. Examples of both of these will be seen in the section &#8220;Multivariate Predictive methods&#8221;.
-bottom row displays the 3 genes which most match AUD according to a method which considers local geometry18.  The
-pointwise method in the top row identifies genes which express more strongly in AUD than outside of it; its weakness is
-that this includes many areas which don&#8217;t have a salient border matching the areal border. The geometric method identifies
-genes whose salient expression border seems to partially line up with the border of AUD; its weakness is that this includes
-genes which don&#8217;t express over the entire area.  Genes which have high rankings using both pointwise and border criteria,
-such as Aph1a in the example, may be particularly good markers.  None of these genes are, individually, a perfect marker
-for AUD; we deliberately chose a &#8220;difficult&#8221; area in order to better contrast pointwise with geometric methods.
+   18For each gene the gradient similarity between (a) a map of the expression of each gene on the cortical surface and (b) the shape of area AUD,
+was calculated, and this was used to rank the genes.
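
The gradient similarity formula given earlier in this hunk can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the grant: the central-difference gradient, the restriction to interior pixels, and the choice to skip pixels where either gradient is zero (where the angle is undefined) are all assumptions, and the function names are hypothetical.

```python
import math

def gradient(img, r, c):
    """Central-difference gradient (d/drow, d/dcol) at interior pixel (r, c)."""
    return ((img[r + 1][c] - img[r - 1][c]) / 2.0,
            (img[r][c + 1] - img[r][c - 1]) / 2.0)

def gradient_similarity(img1, img2):
    """Sum over interior pixels of
    cos(|angle1 - angle2|) * mean gradient magnitude * mean pixel value."""
    rows, cols = len(img1), len(img1[0])
    total = 0.0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            g1, g2 = gradient(img1, r, c), gradient(img2, r, c)
            m1, m2 = math.hypot(*g1), math.hypot(*g2)
            if m1 == 0.0 or m2 == 0.0:
                continue  # assumption: an undefined angle contributes nothing
            angle1 = math.atan2(g1[0], g1[1])
            angle2 = math.atan2(g2[0], g2[1])
            mean_magnitude = (m1 + m2) / 2.0
            mean_value = (img1[r][c] + img2[r][c]) / 2.0
            total += math.cos(abs(angle1 - angle2)) * mean_magnitude * mean_value
    return total

# Two toy 5x5 "images": a left-to-right ramp and its mirror image.
ramp = [[float(c) for c in range(5)] for _ in range(5)]
flipped = [[float(4 - c) for c in range(5)] for _ in range(5)]
print(gradient_similarity(ramp, ramp))     # 18.0: gradients aligned everywhere
print(gradient_similarity(ramp, flipped))  # -18.0: gradients point in opposite directions
```

Ranking genes as in footnote 18 would then amount to calling `gradient_similarity(expression_map, area_mask)` once per gene and sorting by the result, where both arguments are hypothetical 2-D arrays on the flattened cortical surface.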
+
@@ -491,56 +518,7 @@
-Ets1.                                  Areas which can be identified by single genes Using gradient simi-
-                               larity, we have already found single genes which roughly identify some areas
-                               and groupings of areas.  For each of these areas, an example of a gene which
-                               roughly identifies it is shown in Figure 5. We have not yet cross-verified these
-                               genes in other atlases.
-                                  In addition, there are a number of areas which are almost identified by single
-                               genes: COAa+NLOT (anterior part of cortical amygdalar area, nucleus of the
-                               lateral olfactory tract), ENT (entorhinal), ACAv (ventral anterior cingulate),
-                               VIS (visual), AUD (auditory).
-                                  These results validate our expectation that the ABA dataset can be ex-
-                               ploited to find marker genes for many cortical areas, while also validating the
-                               relevancy of our new scoring method, gradient similarity.
-                                  Combinations of multiple genes are useful and necessary for some
-                               areas
-                                  In Figure 4, we give an example of a cortical area which is not marked by
-                               any single gene, but which can be identified combinatorially.  According to
-                               logistic regression, gene wwc1 is the best fit single gene for predicting whether
-                               or not a pixel on the cortical surface belongs to the motor area (area MO).
-                               The upper-left picture in Figure 4 shows wwc1&#8217;s spatial expression pattern over
-                               the cortex. The lower-right boundary of MO is represented reasonably well by
-                               this gene, but the gene overshoots the upper-left boundary. This flattened 2-D
-                               representation does not show it, but the area corresponding to the overshoot is
-                               the medial surface of the cortex. MO is only found on the dorsal surface. Gene
-                               mtif2 is shown in the upper-right.  Mtif2 captures MO&#8217;s upper-left boundary,
-                               but not its lower-right boundary.  Mtif2 does not express very much on the
-                               medial surface. By adding together the values at each pixel in these two figures,
-                               we get the lower-left image. This combination captures area MO much better
-                               than any single gene.
-                                  This shows that our proposal to develop a method to find combinations of
-                               marker genes is both possible and necessary.
-                                  Feature selection integrated with prediction As noted earlier, in gen-
-                               eral, any predictive method can be used for feature selection by running it
-                               inside a stepwise wrapper.  Also, some predictive methods integrate soft con-
-                               straints on number of features used. Examples of both of these will be seen in
-                               the section &#8220;Multivariate Predictive methods&#8221;.
-                                Multivariate Predictive methods
-                               Forward  stepwise  logistic  regression  Logistic  regression  is  a  popular
-                               method for predictive modeling of categorical data.  As a pilot run, for five
-                               cortical areas (SS, AUD, RSP, VIS, and MO), we performed forward stepwise
-                               logistic regression to find single genes, pairs of genes, and triplets of genes
-                               which predict areal identity. This is an example of feature selection integrated
-                               with prediction using a stepwise wrapper.   Some of the single genes found
-                               were shown in various figures throughout this document, and Figure 4 shows
-                               a combination of genes which was found.
-                                  We felt that,  for single genes,  gradient similarity did a better job than
-                               logistic regression at capturing our subjective impression of a &#8220;good gene&#8221;.
-_________________________________________
-  18For each gene the gradient similarity between (a) a map of the expression of each gene on the cortical surface and (b) the shape of area AUD,
-was calculated, and this was used to rank the genes.
-
+Ets1.                                Multivariate Predictive methods
@@ -553,7 +531,24 @@
-were used; for landmark Isomap, 7 dimensions were used.                  SVM on all genes at once
+were used; for landmark Isomap, 7 dimensions were used.                Forward  stepwise  logistic  regression
+                                                         Logistic regression is a popular method for
+                                                         predictive modeling of categorical data.  As a
+                                                         pilot run, for five cortical areas (SS, AUD, RSP,
+                                                         VIS, and MO), we performed forward stepwise
+                                                         logistic regression to find single genes, pairs of
+                                                         genes, and triplets of genes which predict areal
+                                                         identity.  This is an example of feature
+                                                         selection integrated with prediction using a stepwise
+                                                         wrapper.  Some of the single genes found were
+                                                         shown in various figures throughout this doc-
+                                                         ument,  and Figure 4 shows a combination of
+                                                         genes which was found.
+                                                           We felt that, for single genes, gradient simi-
+                                                         larity did a better job than logistic regression at
+                                                         capturing our subjective impression of a &#8220;good
+                                                         gene&#8221;.
+                                                           SVM on all genes at once
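
The forward stepwise logistic regression pilot described in this hunk can be sketched as a greedy wrapper around a plain logistic regression fit. Everything below is an illustrative assumption rather than the pilot's actual code: the batch gradient-ascent optimizer, the use of training log-likelihood as the selection score, the function names, and the toy data, which imitates the Figure 4 situation of two genes that each overshoot the target area on a different side.

```python
import math

def predict(w, x):
    """Logistic prediction; w[0] is the bias. z is clamped to avoid overflow."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, steps=2000, lr=0.5):
    """Fit weights by batch gradient ascent on the mean log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def log_likelihood(w, X, y):
    """Training log-likelihood of binary labels y under weights w."""
    return sum(math.log(predict(w, xi)) if yi == 1
               else math.log(1.0 - predict(w, xi))
               for xi, yi in zip(X, y))

def forward_stepwise(X, y, max_genes):
    """Greedily add the gene that most improves the training log-likelihood."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < max_genes:
        best = None
        for g in remaining:
            cols = selected + [g]
            sub = [[row[j] for j in cols] for row in X]
            ll = log_likelihood(fit_logistic(sub, y), sub, y)
            if best is None or ll > best[0]:
                best = (ll, g)
        selected.append(best[1])
        remaining.remove(best[1])
    return selected

# Toy pixels with columns (gene A, gene B, gene C). A and B each cover the
# target area plus an overshoot on one side; C is unrelated to the area.
pixels = ([[1, 0, c] for c in (0, 1, 0)]        # overshoot of gene A
          + [[1, 1, c] for c in (0, 1, 0, 1)]   # target area
          + [[0, 1, c] for c in (1, 0, 1)]      # overshoot of gene B
          + [[0, 0, c] for c in (0, 1, 0, 1)])  # elsewhere
in_area = [0] * 3 + [1] * 4 + [0] * 3 + [0] * 4
print(sorted(forward_stepwise(pixels, in_area, max_genes=2)))  # [0, 1]
```

On this toy data neither gene A nor gene B alone separates the area, but the pair does, so the wrapper selects both and ignores gene C.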
@@ -567,46 +562,35 @@
-                                                         We  have  applied  the  following  dimensional-
-                                                         ity reduction algorithms to reduce the dimen-
-                                                         sionality of the gene expression profile associ-
-                                                         ated  with  each  voxel:  Principal  Components
-                                                         Analysis (PCA), Simple PCA (SPCA), Multi-
-                                                         Dimensional  Scaling  (MDS),  Isomap,  Land-
-                                                         mark Isomap, Laplacian eigenmaps, Local Tan-
-                                                         gent Space Alignment (LTSA), Hessian locally
-                                                         linear  embedding,  Diffusion  maps,  Stochastic
-                                                         Neighbor Embedding (SNE), Stochastic Prox-
-                                                         imity Embedding (SPE), Fast Maximum Vari-
-                                                         ance Unfolding (FastMVU), Non-negative Ma-
-                                                         trix Factorization (NNMF). Space constraints
-                                                         prevent us from showing many of the results,
-                                                         but as a sample, PCA, NNMF, and landmark
-                                                         Isomap are shown in the first, second, and third
-rows of Figure 6.
-region that most matches each prototype are overlayed.          After applying the dimensionality reduction, we ran clus-
+region that most matches each prototype are overlayed.       We have applied the following dimensionality reduction al-
+                                              gorithms to reduce the dimensionality of the gene expression
+                                              profile associated with each voxel:  Principal Components
+                                              Analysis (PCA), Simple PCA (SPCA), Multi-Dimensional
+                                              Scaling (MDS), Isomap, Landmark Isomap, Laplacian eigen-
+                                              maps, Local Tangent Space Alignment (LTSA), Hessian lo-
+                                              cally linear embedding, Diffusion maps, Stochastic Neigh-
+                                              bor  Embedding  (SNE),  Stochastic  Proximity  Embedding
+                                              (SPE),  Fast  Maximum  Variance  Unfolding  (FastMVU),
+                                              Non-negative  Matrix  Factorization  (NNMF).  Space  con-
+                                              straints prevent us from showing many of the results, but as
+                                              a sample, PCA, NNMF, and landmark Isomap are shown in
+                                              the first, second, and third rows of Figure 6.
+                                                 After applying the dimensionality reduction, we ran clus-
-                                              k-means and spectral clustering. The results of k-means af-
-                                              ter PCA, NNMF, and landmark Isomap are shown in the
-                                              last row of Figure 6.  To compare, the leftmost picture on
-                                              the bottom row of Figure 6 shows some of the major sub-
-                                              divisions of cortex.  These results clearly show that differ-
-                                              ent dimensionality reduction techniques capture different as-
-                                              pects of the data and lead to different clusterings, indicating
-                                              the utility of our proposal to produce a detailed comparison
-                                              of these techniques as applied to the domain of genomic
-                                              anatomy.
-                                                 Many areas are captured by clusters of genes We
-                                              also clustered the genes using gradient similarity to see if
-                                              the spatial regions defined by any clusters matched known
-anatomical regions.  Figure 7 shows, for ten sample gene clusters, each cluster&#8217;s average expression pattern, compared to
-a known anatomical boundary.  This suggests that it is worth attempting to cluster genes, and then to use the results to
-cluster voxels.
-_____________________________
+k-means and spectral clustering.  The results of k-means after PCA, NNMF, and landmark Isomap are shown in the last
+row of Figure 6. To compare, the leftmost picture on the bottom row of Figure 6 shows some of the major subdivisions of
+cortex. These results clearly show that different dimensionality reduction techniques capture different aspects of the data
+and lead to different clusterings, indicating the utility of our proposal to produce a detailed comparison of these techniques
+as applied to the domain of genomic anatomy.
+Many areas are captured by clusters of genes We also clustered the genes using gradient similarity to see if the
+_________________________________________
+spatial regions defined by any clusters matched known anatomical regions. Figure 7 shows, for ten sample gene clusters, each
+cluster&#8217;s average expression pattern, compared to a known anatomical boundary. This suggests that it is worth attempting
+to cluster genes, and then to use the results to cluster voxels.
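
The clustering step described in this hunk (k-means run on voxels after dimensionality reduction) can be sketched as follows. This is a minimal sketch under stated assumptions: the voxels are taken to be already reduced to short coordinate vectors by some method such as PCA, the centers are initialized deterministically from the first k points, and the function name and toy data are hypothetical.

```python
def kmeans(points, k, iters=50):
    """Plain k-means on low-dimensional points; returns (labels, centers)."""
    # Assumption: deterministic initialization from the first k points.
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy "voxels", each already reduced to two dimensions.
reduced_voxels = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels, centers = kmeans(reduced_voxels, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Because different dimensionality reduction methods produce different coordinates for the same voxels, feeding each method's output through the same `kmeans` call is one simple way to compare the resulting clusterings.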
@@ -647,47 +631,49 @@
+__________
+  20Already, for each cortical area, we have used the C4.5 algorithm to find a decision tree for that area. We achieved good classification accuracy
+on our training set, but the number of genes that appeared in each tree was too large.  We plan to implement a pruning procedure to generate
+trees that use fewer genes
-___
-  20Already, for each cortical area, we have used the C4.5 algorithm to find a decision tree for that area. We achieved good classification accuracy
-on our training set, but the number of genes that appeared in each tree was too large.  We plan to implement a pruning procedure to generate
-trees that use fewer genes
-&#x2219;October-November 2009: develop an automated mechanism for segmenting the cortical voxels into layers
-&#x2219;November 2009 (milestone): a preliminary automated mechanism for segmenting the cortical voxels into layers
-&#x2219;October 2009-April 2010: develop scoring methods and test them in various supervised learning frameworks. Also
+&#x2219;September-November 2009: Develop an automated mechanism for segmenting the cortical voxels into layers
+&#x2219;November 2009 (milestone): Have completed construction of a flatmapped, cortical dataset with information for each
+layer
+&#x2219;October 2009-April 2010: Develop scoring methods and test them in various supervised learning frameworks. Also
-&#x2219;January 2010 (milestone): submit a publication on single marker genes for cortical areas
+&#x2219;January 2010 (milestone): Submit a publication on single marker genes for cortical areas
-&#x2219;June 2010 (milestone): submit a paper describing a method fulfilling Aim 1. Release toolbox.
-&#x2219;July 2010 (milestone):  submit a paper describing combinations of marker genes for each cortical area, and a small
+&#x2219;June 2010 (milestone): Submit a paper describing a method fulfilling Aim 1. Release toolbox.
+&#x2219;July 2010 (milestone):  Submit a paper describing combinations of marker genes for each cortical area, and a small
-&#x2219;April-September 2010:  Explore dimensionality reduction algorithms for Aim 2.  Explore standard hierarchical
-clustering algorithms, used in combination with dimensionality reduction, for Aim 2.  Explore co-clustering algorithms.
-Think about how radial profile information can be used for Aim 2.  Adapt clustering algorithms to use radial profile
-information.
-&#x2219;January-March 2011:  Quantitatively compare the performance of different dimensionality reduction and clustering
-techniques. Quantitatively compare the value of different flatmapping methods and ways of representing radial profiles.
-&#x2219;March 2011 (milestone): submit a paper describing a method fulfilling Aim 2. Release toolbox.
-&#x2219;February-May 2011:  Using the methods developed for Aim 2, explore the genomic anatomy of the cortex.  Read
-the literature and talk to people to learn about research related to unexpected and interesting discoveries.  Create
-documentation and unit tests for software toolbox for Aim 2. Respond to user bug reports for Aim 1 software toolbox.
-&#x2219;May 2011 (milestone): submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
-&#x2219;May-August 2011: revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1.
+&#x2219;April 2010-March 2011:  Explore dimensionality reduction algorithms for Aim 2.  Explore standard hierarchical clustering
+algorithms, used in combination with dimensionality reduction, for Aim 2.  Explore co-clustering algorithms.  Think
+about how radial profile information can be used for Aim 2.  Adapt clustering algorithms to use radial profile in-
+formation.  Quantitatively compare the performance of different dimensionality reduction and clustering techniques.
+Quantitatively compare the value of different flatmapping methods and ways of representing radial profiles.
+&#x2219;March 2011 (milestone): Submit a paper describing a method fulfilling Aim 2. Release toolbox.
+&#x2219;February-May 2011: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If new ways
+of organizing the cortex into areas are discovered, read the literature and talk to people to learn about research related
+to interpreting our results. Create documentation and unit tests for software toolbox for Aim 2. Respond to user bug
+reports for Aim 1 software toolbox.
+&#x2219;May 2011 (milestone): Submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
+&#x2219;May-August 2011: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1. Follow
+up on responses to our papers. Possibly submit another paper.
--- a/grant.txt	Tue Apr 21 05:38:52 2009 -0700
+++ b/grant.txt	Tue Apr 21 05:50:39 2009 -0700
@@ -236,7 +236,6 @@
-\newpage
@@ -451,7 +450,6 @@
-\newpage
@@ -535,20 +533,20 @@
-* October-November 2009: develop an automated mechanism for segmenting the cortical voxels into layers
-* November 2009 (milestone): a preliminary automated mechanism for segmenting the cortical voxels into layers
-* October 2009-April 2010: develop scoring methods and test them in various supervised learning frameworks. Also test out various dimensionality reduction schemes in combination with supervised learning. Create or extend supervised learning frameworks which use multivariate versions of the best scoring methods.
-* January 2010 (milestone): submit a publication on single marker genes for cortical areas
+* September-November 2009: Develop an automated mechanism for segmenting the cortical voxels into layers
+* November 2009 (milestone): Have completed construction of a flatmapped, cortical dataset with information for each layer
+* October 2009-April 2010: Develop scoring methods and test them in various supervised learning frameworks. Also test out various dimensionality reduction schemes in combination with supervised learning. Create or extend supervised learning frameworks which use multivariate versions of the best scoring methods.
+* January 2010 (milestone): Submit a publication on single marker genes for cortical areas
-* June 2010 (milestone): submit a paper describing a method fulfilling Aim 1. Release toolbox.
-* July 2010 (milestone): submit a paper describing combinations of marker genes for each cortical area, and a small number of marker genes that can, in combination, define most of the areas at once
+* June 2010 (milestone): Submit a paper describing a method fulfilling Aim 1. Release toolbox.
+* July 2010 (milestone): Submit a paper describing combinations of marker genes for each cortical area, and a small number of marker genes that can, in combination, define most of the areas at once
-* March 2011 (milestone): submit a paper describing a method fulfilling Aim 2. Release toolbox.
-* February-May 2011: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. Read the literature and talk to people to learn about research related to unexpected and interesting discoveries. Create documentation and unit tests for software toolbox for Aim 2. Respond to user bug reports for Aim 1 software toolbox.
-* May 2011 (milestone): submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
-* May-August 2011: revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1. 
+* March 2011 (milestone): Submit a paper describing a method fulfilling Aim 2. Release toolbox.
+* February-May 2011: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If new ways of organizing the cortex into areas are discovered, read the literature and talk to people to learn about research related to interpreting our results. Create documentation and unit tests for software toolbox for Aim 2. Respond to user bug reports for Aim 2 software toolbox.
+* May 2011 (milestone): Submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
+* May-August 2011: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1. Follow up on responses to our papers. Possibly submit another paper.