cg

changeset 87:f04ea2784509
.
author: bshanks@bshanks.dyndns.org
date: Tue Apr 21 05:34:25 2009 -0700 (16 years ago)
parents: aafe6f8c3593
children: ae1e1da359d2
files: grant.doc grant.html grant.odt grant.pdf grant.txt
--- a/grant.html	Tue Apr 21 04:05:54 2009 -0700
+++ b/grant.html	Tue Apr 21 05:34:25 2009 -0700
@@ -22,7 +22,17 @@
-Background and significance
+The challenge topic
+This proposal addresses challenge topic 06-HG-101. Massive new datasets obtained with techniques such as in situ hybridiza-
+tion (ISH), immunohistochemistry, in situ transgenic reporter, microarray voxelation, and others, allow the expression levels
+of many genes at many locations to be compared. Our goal is to develop automated methods to relate spatial variation in
+gene expression to anatomy. We want to find marker genes for specific anatomical regions, and also to draw new anatomical
+maps based on gene expression patterns.
+The Challenge and Potential impact
+Now we will discuss each of our three aims in turn.  For each aim, we will develop a conceptual framework for thinking
+about the task, and we will present our strategy for solving it. Next we will discuss related work. At the conclusion of each
+section, we will summarize why our strategy is different from what has been done before. At the end of this section, we will
+describe the potential impact.
@@ -62,6 +72,8 @@
+_______
+   1Strictly speaking, the features are gene expression levels, but we&#8217;ll call them genes.
@@ -75,8 +87,6 @@
-_________________________________________
-   1Strictly speaking, the features are gene expression levels, but we&#8217;ll call them genes.
@@ -115,6 +125,12 @@
+_________________________________________
+   2By &#8220;fundamentally spatial&#8221; we mean that there is information from a large number of spatial locations indexed by spatial coordinates; not
+just data which have only a few different locations or which is indexed by anatomical label.
+    3Actually, many of these projects use quadrilaterals instead of square pixels; but we will refer to them as pixels for simplicity.
+    4the number of true pixels in the intersection of the two images, divided by the number of pixels in their union.
+    5&#8220;Expression energy ratio&#8221;, which captures overexpression.
@@ -127,12 +143,6 @@
-_
-   2By &#8220;fundamentally spatial&#8221; we mean that there is information from a large number of spatial locations indexed by spatial coordinates; not
-just data which have only a few different locations or which is indexed by anatomical label.
-    3Actually, many of these projects use quadrilaterals instead of square pixels; but we will refer to them as pixels for simplicity.
-    4the number of true pixels in the intersection of the two images, divided by the number of pixels in their union.
-    5&#8220;Expression energy ratio&#8221;, which captures overexpression.
@@ -170,23 +180,19 @@
+__
+   6There are imaging tasks which use more than three colors, for example multispectral imaging and hyperspectral imaging, which are often
+used to process satellite imagery.
+    7First, because the number of features in the reduced dataset is less than in the original dataset, the running time of clustering algorithms
+may be much less. Second, it is thought that some clustering algorithms may give better results on reduced data.
-pattern which seems to pick out a single, spatially continguous region.  Therefore, it seems likely that an anatomically
+patternwhich seems to pick out a single, spatially continguous region.  Therefore, it seems likely that an anatomically
-________________________________
-   6There are imaging tasks which use more than three colors, for example multispectral imaging and hyperspectral imaging, which are often
-used to process satellite imagery.
-    7First, because the number of features in the reduced dataset is less than in the original dataset, the running time of clustering algorithms
-may be much less. Second, it is thought that some clustering algorithms may give better results on reduced data.
-    8This would seem to contradict our finding in aim 1 that some cortical areas are combinatorially coded by multiple genes.  However, it is
-possible that the currently accepted cortical maps divide the cortex into regions which are unnatural from the point of view of gene expression;
-perhaps there is some other way to map the cortex for which each region can be identified by single genes. Another possibility is that, although
-the cluster prototype fits an anatomical region, the individual genes are each somewhat different from the prototype.
@@ -226,8 +232,17 @@
+_________________________________________
+   8This would seem to contradict our finding in aim 1 that some cortical areas are combinatorially coded by multiple genes.  However, it is
+possible that the currently accepted cortical maps divide the cortex into regions which are unnatural from the point of view of gene expression;
+perhaps there is some other way to map the cortex for which each region can be identified by single genes. Another possibility is that, although
+the cluster prototype fits an anatomical region, the individual genes are each somewhat different from the prototype.
+    9A radial profile is a profile along a line perpendicular to the cortical surface.
+   10We ran &#8220;vanilla&#8221; NNMF, whereas the paper under discussion used a modified method.  Their main modification consisted of adding a soft
+spatial contiguity constraint.  However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was
+needed. The paper under discussion also mentions that they tried a hierarchical variant of NNMF, which we have not yet tried.
-surface. One can picture an area of the cortex as a slice of a six-layered cake11.
+surface.One can picture an area of the cortex as a slice of a six-layered cake11.
@@ -238,12 +253,6 @@
-__
-   9A radial profile is a profile along a line perpendicular to the cortical surface.
-   10We ran &#8220;vanilla&#8221; NNMF, whereas the paper under discussion used a modified method.  Their main modification consisted of adding a soft
-spatial contiguity constraint.  However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was
-needed. The paper under discussion also mentions that they tried a hierarchial variant of NNMF, which we have not yet tried.
-   11Outside of isocortex, the number of layers varies.
@@ -259,23 +268,6 @@
-Significance
-The method developed in aim (1) will be applied to each cortical area to find a set of marker genes such that the
-combinatorial expression pattern of those genes uniquely picks out the target area. Finding marker genes will be useful for
-drug discovery as well as for experimentation because marker genes can be used to design interventions which selectively
-target individual cortical areas.
-The application of the marker gene finding algorithm to the cortex will also support the development of new neuroanatom-
-ical methods. In addition to finding markers for each individual cortical areas, we will find a small panel of genes that can
-find many of the areal boundaries at once. This panel of marker genes will allow the development of an ISH protocol that
-will allow experimenters to more easily identify which anatomical areas are present in small samples of cortex.
-The method developed in aim (2) will provide a genoarchitectonic viewpoint that will contribute to the creation of a
-better map. The development of present-day cortical maps was driven by the application of histological stains. If a different
-set of stains had been available which identified a different set of features, then today&#8217;s cortical maps may have come out
-differently. It is likely that there are many repeated, salient spatial patterns in the gene expression which have not yet been
-captured by any stain. Therefore, cortical anatomy needs to incorporate what we can learn from looking at the patterns of
-gene expression.
-While we do not here propose to analyze human gene expression data, it is conceivable that the methods we propose to
-develop could be used to suggest modifications to the human cortical map as well.
@@ -289,7 +281,8 @@
-  12The sagittal data do not cover the entire cortex, and also have greater registration error[13].  Genes were selected by the Allen Institute for
+  11Outside of isocortex, the number of layers varies.
+   12The sagittal data do not cover the entire cortex, and also have greater registration error[13].  Genes were selected by the Allen Institute for
@@ -305,7 +298,30 @@
-Preliminary Studies
+Significance
+The method developed in aim (1) will be applied to each cortical area to find a set of marker genes such that the combinatorial
+expression pattern of those genes uniquely picks out the target area. Finding marker genes will be useful for drug discovery
+as well as for experimentation because marker genes can be used to design interventions which selectively target individual
+cortical areas.
+The application of the marker gene finding algorithm to the cortex will also support the development of new neuroanatomical
+methods. In addition to finding markers for each individual cortical area, we will find a small panel of genes that can
+find many of the areal boundaries at once. This panel of marker genes will allow the development of an ISH protocol that
+will allow experimenters to more easily identify which anatomical areas are present in small samples of cortex.
+The method developed in aim (2) will provide a genoarchitectonic viewpoint that will contribute to the creation of a
+better map. The development of present-day cortical maps was driven by the application of histological stains. If a different
+set of stains had been available which identified a different set of features, then today&#8217;s cortical maps may have come out
+differently. It is likely that there are many repeated, salient spatial patterns in the gene expression which have not yet been
+captured by any stain. Therefore, cortical anatomy needs to incorporate what we can learn from looking at the patterns of
+gene expression.
+While we do not here propose to analyze human gene expression data, it is conceivable that the methods we propose
+to develop could be used to suggest modifications to the human cortical map as well. In fact, the methods we will develop
+will be applicable to other datasets beyond the brain.  We will provide an open-source toolbox to allow other researchers
+to easily use our methods.  With these methods, researchers with gene expression for any area of the body will be able to
+efficiently find marker genes for anatomical regions, or to use gene expression to discover new anatomical patterning.  As
+described above, marker genes have a variety of uses in the development of drugs and experimental manipulations, and in
+the anatomical characterization of tissue samples. The discovery of new ways to carve up anatomical structures into regions
+will widely impact all areas of biology.
+The approach: Preliminary Studies
@@ -591,8 +607,8 @@
-Research Design and Methods
-Flatmapping and segmentation of cortical layers**
+The approach: what we plan to do
+Flatmap and segment cortical layers
@@ -633,15 +649,48 @@
-Apply these algorithms to the cortex Using the methods developed in Aim 1, we will present, for each cortical area,
-a short list of markers to identify that area; and we will also present lists of &#8220;panels&#8221; of genes that can be used to delineate
-_________________________________________
+Apply these algorithms to the cortex
+___
-many areas at once. Using the methods developed in Aim 2, we will present one or more hierarchial cortical maps. We will
-identifyand explain how the statistical structure in the gene expression data led to any unexpected or interesting features
-of thesemaps.
+Using the methods developed in Aim 1, we will present, for each cortical area, a short list of markers to identify that
+area; and we will also present lists of &#8220;panels&#8221; of genes that can be used to delineate many areas at once. Using the methods
+developed in Aim 2, we will present one or more hierarchical cortical maps. We will identify and explain how the statistical
+structure in the gene expression data led to any unexpected or interesting features of these maps.
+Timeline and milestones
+Aim 1
+&#x2219;Oct-Nov 2009: develop an automated mechanism for segmenting the cortical voxels into layers
+&#x2219;Nov 2009 (milestone): a preliminary automated mechanism for segmenting the cortical voxels into layers
+&#x2219;Oct 2009-Feb 2010:  develop scoring methods and test them in various supervised learning frameworks.  Also test
+out various dimensionality reduction schemes in combination with supervised learning.
+&#x2219;Dec 2009-April 2010:  create or extend supervised learning frameworks which use multivariate versions of the best
+scoring methods
+&#x2219;January 2010 (milestone): submit a publication on single marker genes for cortical areas
+&#x2219;February-June 2010: explore the best way to integrate radial profiles with supervised learning. Explore the best way
+to make supervised learning techniques robust against incorrect labels (i.e. when the areas drawn on the input cortical
+map are slightly off). Quantitatively compare the performance of different supervised learning techniques.
+&#x2219;May-July 2010: Validate marker genes found in the ABA dataset by checking against other gene expression datasets
+&#x2219;June 2010: submit a paper describing a method fulfilling Aim 1
+&#x2219;July 2010:  submit a paper describing combinations of marker genes for each cortical area, and a small number of
+marker genes that can, in combination, define most of the areas at once
+&#x2219;April-July 2010: create documentation and unit tests for software toolbox for Aim 1.
+&#x2219;August 2010-: respond to user bug reports for Aim 1 software toolbox.
+Aim 2
+&#x2219;April-September 2010: explore dimensionality reduction algorithms for Aim 2
+&#x2219;June-November 2010:  explore standard hierarchical clustering algorithms, used in combination with dimensionality
+reduction, for Aim 2
+&#x2219;July-December 2010:  explore co-clustering algorithms.  Think about how radial profile information can be used for
+Aim 2. Adapt clustering algorithms to use radial profile information.
+&#x2219;January-March 2011:  Quantitatively compare the performance of different dimensionality reduction and clustering
+techniques. Quantitatively compare the value of different flatmapping methods and ways of representing radial profiles.
+&#x2219;January-June 2011:  using the methods developed for Aim 2, explore the genomic anatomy of the cortex.  Read the
+literature and talk to people to learn about research related to unexpected and interesting discoveries.
+&#x2219;February-May 2011: create documentation and unit tests for software toolbox for Aim 2.
+&#x2219;June 2011-: respond to user bug reports for Aim 2 software toolbox.
+&#x2219;March 2011: submit a paper describing a method fulfilling Aim 2
+&#x2219;May 2011: submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
+&#x2219;May-August 2011: revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1.
--- a/grant.txt	Tue Apr 21 04:05:54 2009 -0700
+++ b/grant.txt	Tue Apr 21 05:34:25 2009 -0700
@@ -23,7 +23,13 @@
-== Background and significance ==
+== The challenge topic ==
+
+This proposal addresses challenge topic 06-HG-101. Massive new datasets obtained with techniques such as in situ hybridization (ISH), immunohistochemistry, in situ transgenic reporter, microarray voxelation, and others, allow the expression levels of many genes at many locations to be compared. Our goal is to develop automated methods to relate spatial variation in gene expression to anatomy. We want to find marker genes for specific anatomical regions, and also to draw new anatomical maps based on gene expression patterns.
+
+== The Challenge and Potential impact ==
+
+Now we will discuss each of our three aims in turn. For each aim, we will develop a conceptual framework for thinking about the task, and we will present our strategy for solving it. Next we will discuss related work. At the conclusion of each section, we will summarize why our strategy is different from what has been done before. At the end of this section, we will describe the potential impact.
@@ -201,7 +207,20 @@
-\vspace{0.3cm}**Significance**
+
+=== Related work ===
+
+\cite{ng_anatomic_2009} describes the application of AGEA to the cortex. The paper describes interesting results on the structure of correlations between voxel gene expression profiles within a handful of cortical areas. However, this sort of analysis is not related to either of our aims, as it neither finds marker genes nor suggests a cortical map based on gene expression data. Neither of the other components of AGEA can be applied to cortical areas; AGEA's Gene Finder cannot be used to find marker genes for the cortical areas; and AGEA's hierarchical clustering does not produce clusters corresponding to the cortical areas\footnote{In both cases, the cause is that pairwise correlations between the gene expression of voxels in different areas but the same layer are often stronger than pairwise correlations between the gene expression of voxels in different layers but the same area. Therefore, a pairwise voxel correlation clustering algorithm will tend to create clusters representing cortical layers, not areas (there may be clusters which presumably correspond to the intersection of a layer and an area, but since one area will have many layer-area intersection clusters, further work is needed to make sense of these). The reason that Gene Finder cannot find marker genes for cortical areas is that, although the user chooses a seed voxel, Gene Finder chooses the ROI for which genes will be found, and it creates that ROI by (pairwise voxel correlation) clustering around the seed.}.
+
+
+%% Most of the projects which have been discussed have been done by the same groups that develop the public datasets. Although these projects make their algorithms available for use on their own website, none of them have released an open-source software toolkit; instead, users are restricted to using the provided algorithms only on their own dataset.
+
+In summary, for all three aims, (a) only one of the previous projects explores combinations of marker genes, (b) there has been almost no comparison of different algorithms or scoring methods, and (c) there has been no work on computationally finding marker genes for cortical areas, or on finding a hierarchical clustering that will yield a map of cortical areas de novo from gene expression data.
+
+Our project is guided by a concrete application with a well-specified criterion of success (how well we can find marker genes for \begin{latex}/\end{latex} reproduce the layout of cortical areas), which will provide a solid basis for comparing different methods.
+
+
+== Significance ==
@@ -209,28 +228,16 @@
+
-
-While we do not here propose to analyze human gene expression data, it is conceivable that the methods we propose to develop could be used to suggest modifications to the human cortical map as well.  
-
-
-=== Related work ===
-
-\cite{ng_anatomic_2009} describes the application of AGEA to the cortex. The paper describes interesting results on the structure of correlations between voxel gene expression profiles within a handful of cortical areas. However, this sort of analysis is not related to either of our aims, as it neither finds marker genes, nor does it suggest a cortical map based on gene expression data. Neither of the other components of AGEA can be applied to cortical areas; AGEA's Gene Finder cannot be used to find marker genes for the cortical areas; and AGEA's hierarchial clustering does not produce clusters corresponding to the cortical areas\footnote{In both cases, the cause is that pairwise correlations between the gene expression of voxels in different areas but the same layer are often stronger than pairwise correlations between the gene expression of voxels in different layers but the same area. Therefore, a pairwise voxel correlation clustering algorithm will tend to create clusters representing cortical layers, not areas (there may be clusters which presumably correspond to the intersection of a layer and an area, but since one area will have many layer-area intersection clusters, further work is needed to make sense of these). The reason that Gene Finder cannot the find marker genes for cortical areas is that, although the user chooses a seed voxel, Gene Finder chooses the ROI for which genes will be found, and it creates that ROI by (pairwise voxel correlation) clustering around the seed.}.
-
-
-%% Most of the projects which have been discussed have been done by the same groups that develop the public datasets. Although these projects make their algorithms available for use on their own website, none of them have released an open-source software toolkit; instead, users are restricted to using the provided algorithms only on their own dataset.
-
-In summary, for all three aims, (a) only one of the previous projects explores combinations of marker genes, (b) there has been almost no comparison of different algorithms or scoring methods, and (c) there has been no work on computationally finding marker genes for cortical areas, or on finding a hierarchial clustering that will yield a map of cortical areas de novo from gene expression data.
-
-Our project is guided by a concrete application with a well-specified criterion of success (how well we can find marker genes for \begin{latex}/\end{latex} reproduce the layout of cortical areas), which will provide a solid basis for comparing different methods.
+While we do not here propose to analyze human gene expression data, it is conceivable that the methods we propose to develop could be used to suggest modifications to the human cortical map as well. In fact, the methods we will develop will be applicable to other datasets beyond the brain. We will provide an open-source toolbox to allow other researchers to easily use our methods. With these methods, researchers with gene expression for any area of the body will be able to efficiently find marker genes for anatomical regions, or to use gene expression to discover new anatomical patterning. As described above, marker genes have a variety of uses in the development of drugs and experimental manipulations, and in the anatomical characterization of tissue samples. The discovery of new ways to carve up anatomical structures into regions will widely impact all areas of biology.
+
-
-== Preliminary Studies ==
+== The approach: Preliminary Studies ==
@@ -445,10 +452,10 @@
-== Research Design and Methods ==
-
-
-\vspace{0.3cm}**Flatmapping and segmentation of cortical layers**
+== The approach: what we plan to do ==
+
+
+\vspace{0.3cm}**Flatmap and segment cortical layers**
@@ -514,6 +521,7 @@
+
@@ -523,6 +531,34 @@
+== Timeline and milestones ==
+
+=== Aim 1 ===
+
+* Oct-Nov 2009: develop an automated mechanism for segmenting the cortical voxels into layers
+* Nov 2009 (milestone): a preliminary automated mechanism for segmenting the cortical voxels into layers
+* Oct 2009-Feb 2010: develop scoring methods and test them in various supervised learning frameworks. Also test out various dimensionality reduction schemes in combination with supervised learning.
+* Dec 2009-April 2010: create or extend supervised learning frameworks which use multivariate versions of the best scoring methods
+* January 2010 (milestone): submit a publication on single marker genes for cortical areas
+* February-June 2010: explore the best way to integrate radial profiles with supervised learning. Explore the best way to make supervised learning techniques robust against incorrect labels (i.e. when the areas drawn on the input cortical map are slightly off). Quantitatively compare the performance of different supervised learning techniques.
+* May-July 2010: Validate marker genes found in the ABA dataset by checking against other gene expression datasets 
+* June 2010: submit a paper describing a method fulfilling Aim 1
+* July 2010: submit a paper describing combinations of marker genes for each cortical area, and a small number of marker genes that can, in combination, define most of the areas at once
+* April-July 2010: create documentation and unit tests for software toolbox for Aim 1.
+* August 2010-: respond to user bug reports for Aim 1 software toolbox.
+
+=== Aim 2 ===
+* April-September 2010: explore dimensionality reduction algorithms for Aim 2
+* June-November 2010: explore standard hierarchical clustering algorithms, used in combination with dimensionality reduction, for Aim 2
+* July-December 2010: explore co-clustering algorithms. Think about how radial profile information can be used for Aim 2. Adapt clustering algorithms to use radial profile information.
+* January-March 2011:  Quantitatively compare the performance of different dimensionality reduction and clustering techniques. Quantitatively compare the value of different flatmapping methods and ways of representing radial profiles.
+* January-June 2011: using the methods developed for Aim 2, explore the genomic anatomy of the cortex. Read the literature and talk to people to learn about research related to unexpected and interesting discoveries.
+* February-May 2011:  create documentation and unit tests for software toolbox for Aim 2.
+* June 2011-: respond to user bug reports for Aim 2 software toolbox.
+* March 2011: submit a paper describing a method fulfilling Aim 2
+* May 2011: submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
+* May-August 2011: revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1. 
+