cg

changeset 42:282ba15dcfbe
.
author: bshanks@bshanks.dyndns.org
date: Tue Apr 14 23:33:43 2009 -0700
parents: 34e681823d3a
children: 8cce366da1e5
files: grant.doc grant.html grant.odt grant.pdf grant.txt
--- a/grant.html	Tue Apr 14 02:53:00 2009 -0700
+++ b/grant.html	Tue Apr 14 23:33:43 2009 -0700
@@ -1,12 +1,12 @@
-Massive new datasets obtained with techniques such as in situ hybridization (ISH) and BAC-transgenics allow the expres-
-sion levels of many genes at many locations to be compared.  Our goal is to develop automated methods to relate spatial
-variation in gene expression to anatomy.  We want to find marker genes for specific anatomical regions, and also to draw
-new anatomical maps based on gene expression patterns. We have three specific aims:
+Massive new datasets obtained with techniques such as in situ hybridization (ISH), immunohistochemistry, or in situ
+transgenic reporter allow the expression levels of many genes at many locations to be compared. Our goal is to develop automated
+methods to relate spatial variation in gene expression to anatomy.  We want to find marker genes for specific anatomical
+regions, and also to draw new anatomical maps based on gene expression patterns. We have three specific aims:
-(2) develop an algorithm to suggest new ways of carving up a structure into anatomical subregions, based on spatial
-patterns in gene expression
+(2) develop an algorithm to suggest new ways of carving up a structure into anatomical regions, based on spatial patterns
+in gene expression
@@ -19,17 +19,17 @@
-The task of looking for marker genes for anatomical subregions means that one is looking for a set of genes such that, if
-the expression level of those genes is known, then the locations of the subregions can be inferred.
-If we define the subregions so that they cover the entire anatomical structure to be divided, then instead of saying that we
-are using gene expression to find the locations of the subregions, we may say that we are using gene expression to determine
-to which subregion each voxel within the structure belongs.  We call this a classification task, because each voxel is being
-assigned to a class (namely, its subregion).
+The task of looking for marker genes for anatomical regions means that one is looking for a set of genes such that, if the
+expression level of those genes is known, then the locations of the regions can be inferred.
+If we define the regions so that they cover the entire anatomical structure to be divided, then instead of saying that we
+are using gene expression to find the locations of the regions, we may say that we are using gene expression to determine to
+which region each voxel within the structure belongs. We call this a classification task, because each voxel is being assigned
+to a class (namely, its region).
-the subregions may be expressed as a function. The input to this function is a voxel, along with the gene expression levels
-within that voxel; the output is the subregional identity of the target voxel, that is, the subregion to which the target voxel
-belongs. We call this function a classifier. In general, the input to a classifier is called an instance, and the output is called
-a label (or a class label).
+the regions may be expressed as a function.  The input to this function is a voxel, along with the gene expression levels
+within that voxel; the output is the regional identity of the target voxel, that is, the region to which the target voxel belongs.
+We call this function a classifier.  In general, the input to a classifier is called an instance, and the output is called a label
+(or a class label).
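
Reviewer sketch (not part of the changeset): the classifier described in this hunk can be illustrated with a minimal nearest-centroid example. This is only an illustration under stated assumptions, not the method proposed in the grant; the function names, the toy expression values, and the region names "A" and "B" are all hypothetical.

```python
from math import dist

def train_centroids(instances, labels):
    """Average the expression profiles of the training voxels in each region."""
    sums, counts = {}, {}
    for profile, region in zip(instances, labels):
        acc = sums.setdefault(region, [0.0] * len(profile))
        for i, level in enumerate(profile):
            acc[i] += level
        counts[region] = counts.get(region, 0) + 1
    return {region: [s / counts[region] for s in acc] for region, acc in sums.items()}

def classify(profile, centroids):
    """The classifier: map an instance (expression profile) to a label (region)."""
    return min(centroids, key=lambda region: dist(profile, centroids[region]))

# Hypothetical training data: each instance holds the expression of two genes in one voxel.
train_profiles = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
train_regions = ["A", "A", "B", "B"]

centroids = train_centroids(train_profiles, train_regions)
predicted = classify([0.85, 0.15], centroids)
```

Here the training voxels with known regions play the role of the labeled instances, and `classify` is the learned mapping from instances to labels.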
@@ -37,7 +37,7 @@
-(voxels) for which the labels (subregions) are known.
+(voxels) for which the labels (regions) are known.
@@ -87,16 +87,16 @@
-The task of deciding how to carve up a structure into anatomical subregions can be put into these terms. The instances
-are once again voxels (or pixels) along with their associated gene expression profiles. We make the assumption that voxels
-from the same subregion have similar gene expression profiles, at least compared to the other subregions. This means that
-clustering voxels is the same as finding potential subregions; we seek a partitioning of the voxels into subregions, that is,
-into clusters of voxels with similar gene expression.
-It is desirable to determine not just one set of subregions, but also how these subregions relate to each other, if at all;
-perhaps some of the subregions are more similar to each other than to the rest, suggesting that, although at a fine spatial
-scale they could be considered separate, on a coarser spatial scale they could be grouped together into one large subregion.
-This suggests the outcome of clustering may be a hierarchial tree of clusters, rather than a single set of clusters which
-partition the voxels. This is called hierarchial clustering.
+The task of deciding how to carve up a structure into anatomical regions can be put into these terms. The instances are
+once again voxels (or pixels) along with their associated gene expression profiles. We make the assumption that voxels from
+the same region have similar gene expression profiles, at least compared to the other regions.  This means that clustering
+voxels is the same as finding potential regions; we seek a partitioning of the voxels into regions, that is, into clusters of voxels
+with similar gene expression.
+It is desirable to determine not just one set of regions, but also how these regions relate to each other, if at all; perhaps
+some of the regions are more similar to each other than to the rest, suggesting that, although at a fine spatial scale they
+could be considered separate, on a coarser spatial scale they could be grouped together into one large region. This suggests
+the outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels.
+This is called hierarchical clustering.
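
Reviewer sketch (not part of the changeset): one way to obtain the tree of clusters described in this hunk is agglomerative clustering with average linkage. The grant text does not commit to a linkage rule or a distance measure, so both are assumptions here, as are the function names and the toy profiles.

```python
from itertools import combinations
from math import dist

def average_linkage(a, b, profiles):
    """Mean pairwise Euclidean distance between the voxels of two clusters."""
    return sum(dist(profiles[i], profiles[j]) for i in a for j in b) / (len(a) * len(b))

def hierarchical_clustering(profiles):
    """Repeatedly merge the two closest clusters; return a nested-tuple tree of voxel indices."""
    # Each cluster is (member indices, subtree); start with one singleton per voxel.
    clusters = [((i,), i) for i in range(len(profiles))]
    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]][0], clusters[p[1]][0], profiles),
        )
        merged = (clusters[x][0] + clusters[y][0], (clusters[x][1], clusters[y][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters[0][1]

# Hypothetical expression profiles for four voxels (two genes each).
profiles = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [12.0, 10.0]]
tree = hierarchical_clustering(profiles)
```

Cutting the tree near the root gives a coarse partition into a few large regions; cutting it nearer the leaves gives the finer regions.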
@@ -124,25 +124,24 @@
-Another use for dimensionality reduction is to visualize the relationships between subregions.  For example, one might
-want tomake a 2-D plot upon which each subregion is represented by a single point, and with the property that subregions
-with similar gene expression profiles should be nearby on the plot (that is, the property that distance between pairs of points
-in the plot should be proportional to some measure of dissimilarity in gene expression). It is likely that no arrangement of
-the points on a 2-D plan will exactly satisfy this property &#8211; however, dimensionality reduction techniques allow one to find
-arrangements of points that approximately satisfy that property.  Note that in this application, dimensionality reduction
-is being applied after clustering; whereas in the previous paragraph, we were talking about using dimensionality reduction
-before clustering.
+Another use for dimensionality reduction is to visualize the relationships between regions. For example, one might want
+to make a 2-D plot upon which each region is represented by a single point, and with the property that regions with similar
+gene expression profiles should be nearby on the plot (that is, the property that distance between pairs of points in the plot
+should be proportional to some measure of dissimilarity in gene expression). It is likely that no arrangement of the points on
+a 2-D plane will exactly satisfy this property &#8211; however, dimensionality reduction techniques allow one to find arrangements
+of points that approximately satisfy that property. Note that in this application, dimensionality reduction is being applied
+after clustering; whereas in the previous paragraph, we were talking about using dimensionality reduction before clustering.
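
Reviewer sketch (not part of the changeset): the 2-D plot described above, in which distance on the plot tracks dissimilarity in gene expression, can be approximated with classical multidimensional scaling. The grant text does not name a specific technique, so classical MDS is an assumption here, and the region profiles are made-up values.

```python
import numpy as np

def classical_mds(dissimilarity, n_dims=2):
    """Place one point per region so that plot distances approximate the dissimilarities."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering   # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]         # indices of the largest eigenvalues
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# Hypothetical mean expression profiles for three regions (three genes each).
region_profiles = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]])
dissimilarity = np.linalg.norm(
    region_profiles[:, None, :] - region_profiles[None, :, :], axis=-1
)
coords = classical_mds(dissimilarity)  # one (x, y) point per region
```

With three regions the 2-D arrangement reproduces the dissimilarities exactly; with many regions it is in general only approximate, as the paragraph above notes.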
-pattern which seems to pick out a single, spatially continguous subregion.  Therefore, it seems likely that an anatomically
-interesting subregion will have multiple genes which each individually pick it out2.  This suggests the following procedure:
-cluster together genes which pick out similar subregions, and then to use the more popular common subregions as the
-final clusters. In the Preliminary Data we show that a number of anatomically recognized cortical regions, as well as some
-&#8220;superregions&#8221; formed by lumping together a few regions, are associated with gene clusters in this fashion.
+pattern which seems to pick out a single, spatially contiguous region.  Therefore, it seems likely that an anatomically
+interesting region will have multiple genes which each individually pick it out2.  This suggests the following procedure:
+cluster together genes which pick out similar regions, and then use the more popular common regions as the final clusters.
+In the Preliminary Data we show that a number of anatomically recognized cortical regions, as well as some &#8220;superregions&#8221;
+formed by lumping together a few regions, are associated with gene clusters in this fashion.
@@ -172,13 +171,19 @@
-Significance
-The method developed in aim (1) will be applied to each cortical area to find a set of marker genes such that the
-combinatorial expression pattern of those genes uniquely picks out the target area. Finding marker genes will be useful for
+The ABA is not the only large public spatial gene expression dataset.   Other such resources include GENSAT[?],
+GenePaint[?], its sister project GeneAtlas[?], BGEM[?], EMAGE[?], EurExpress (http://www.eurexpress.org/ee/; Eur-
+Express data is also entered into EMAGE), todo. With the exception of the ABA, GenePaint, and EMAGE, most of these
+resources have not (yet) extracted the expression intensity from the ISH images and registered the results into a single 3-D
+space, and only ABA and EMAGE make this form of data available for public download from the website.  Many of these
+resources focus on developmental gene expression.
-possible that the currently accepted cortical maps divide the cortex into subregions which are unnatural from the point of view of gene expression;
-perhaps there is some other way to map the cortex for which each subregion can be identified by single genes.
+possible that the currently accepted cortical maps divide the cortex into regions which are unnatural from the point of view of gene expression;
+perhaps there is some other way to map the cortex for which each region can be identified by single genes.
+Significance
+The method developed in aim (1) will be applied to each cortical area to find a set of marker genes such that the
+combinatorial expression pattern of those genes uniquely picks out the target area. Finding marker genes will be useful for
@@ -195,7 +200,6 @@
-There does not appear to be much work on the automated analysis of spatial gene expression data.
@@ -205,48 +209,63 @@
-We are aware of two existing efforts to relate spatial gene expression data to anatomy through computational methods.
+We are aware of four existing efforts to relate spatial gene expression data to anatomy through computational methods.
+[? ] refers to GeneAtlas.  GeneAtlas allows the user to construct a search query by freely demarcating one or two 2-D
+regions on sagittal slices, and then to specify either the strength of expression or the name of another gene whose expression
+pattern is to be matched. GeneAtlas differs from our Aim 1 in at least two ways. First, GeneAtlas finds only single genes,
+whereas we will also look for combinations of genes3. Second, at least for the custom spatial search, GeneAtlas appears to
+use a simple pointwise scoring method (strength of expression), whereas we will also use geometric metrics such as gradient
+similarity.
+[? ] todo
-usefulness of such research. We have run NNMF on the cortical dataset3 and while the results are promising (see Preliminary
+usefulness of such research. We have run NNMF on the cortical dataset4 and while the results are promising (see Preliminary
-yields a list of genes which are overexpressed in that cluster.
+yields a list of genes which are overexpressed in that cluster.  (note:  the ABA website also contains pre-prepared lists of
+overexpressed genes for selected structures)
-instead preferring cortical layers4.  Therefore, Gene Finder cannot be used to find marker genes for cortical areas.  Second,
-Gene Finder finds only single genes, whereas we will also look for combinations of genes5.  Third, gene finder can only use
+_________________________________________
+   3See Preliminary Data for an example of an area which cannot be marked by any single gene in the dataset, but which can be marked by a
+combination.
+    4We ran &#8220;vanilla&#8221; NNMF, whereas the paper under discussion used a modified method.  Their main modification consisted of adding a soft
+spatial contiguity constraint.  However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was
+needed. The paper under discussion also mentions that they tried a hierarchical variant of NNMF, which we have not yet tried.
+instead preferring cortical layers5.  Therefore, Gene Finder cannot be used to find marker genes for cortical areas.  Second,
+Gene Finder finds only single genes, whereas we will also look for combinations of genes6.  Third, Gene Finder can only use
-Finder uses a simple pointwise score6, whereas we will also use geometric metrics such as gradient similarity.
+Finder uses a simple pointwise score7, whereas we will also use geometric metrics such as gradient similarity.
+The hierarchical clustering is different from our Aim 2 in at least three ways.  First, the clustering finds clusters
+corresponding to layers, but no clusters corresponding to cortical areas8 9.  Our Aim 2 will not be accomplished until a clustering
+is produced which yields areas.  Second, AGEA uses perhaps the simplest possible similarity score (correlation), and does
+no dimensionality reduction before calculating similarity.  While it is possible that a more complex system will not do any
+better than this, we believe further exploration of alternative methods of scoring and dimensionality reduction is warranted.
+Third, AGEA did not look at clusters of genes; in Preliminary Data we have shown that clusters of genes may identify
+interesting spatial regions such as cortical areas.
+Finally, with the exception of [5], none of the publications discussed above compare the results obtained by using different
+algorithms or scoring methods.  [5] reports that both mNNMF and hierarchical mNNMF clustering were useful, and that
+hierarchical recursive bifurcation gave similar results.
+To summarize, in comparison to our Aim 1, none of the previous projects explores combinations of marker genes, and
+with respect to both aims, there has been almost no experimentation with or comparison of different algorithms or scoring methods.
+todo
-   3We ran &#8220;vanilla&#8221; NNMF, whereas the paper under discussion used a modified method.  Their main modification consisted of adding a soft
-spatial contiguity constraint.  However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was
-needed.  The paper under discussion mentions that they also tried a hierarchial variant of NNMF, but since they didn&#8217;t report its results, we
-assume that those result were not any more impressive than the results of the non-hierarchial variant.
-    4Because of the way in which Gene Finder chooses a cluster, layers will always be preferred to areas if pairwise correlations between the gene
+   5Because of the way in which Gene Finder chooses a cluster, layers will always be preferred to areas if pairwise correlations between the gene
-    5See Preliminary Data for an example of an area which cannot be marked by any single gene in the dataset, but which can be marked by a
+    6See Preliminary Data for an example of an area which cannot be marked by any single gene in the dataset, but which can be marked by a
-    6&#8220;Expression energy ratio&#8221;, which captures overexpression.
-The hierarchial clustering is different from our Aim 2 in at least three ways.  First, the clustering finds clusters cor-
-responding to layers, but no clusters corresponding to areas7  8  Our Aim 2 will not be accomplished until a clustering is
-produced which yields areas.  Second, AGEA uses perhaps the simplest possible similarity score (correlation), and does no
-dimensionality reduction before calculating similarity. While it is possible that a more complex system will not do any better
-than this, we believe further exploration of alternative methods of scoring and dimensionality reduction is warranted. Third,
-AGEA did not look at clusters of genes; in Preliminary Data we have shown that clusters of genes may identify intersting
-spatial subregions such as cortical areas.
-_______
-   7This is for the same reason as in footnote 4.
-    8There are clusters which presumably correspond to the intersection of a layer and an area, but since one area will have many layer-area
+    7&#8220;Expression energy ratio&#8221;, which captures overexpression.
+    8This is for the same reason as in footnote 5.
+    9There are clusters which presumably correspond to the intersection of a layer and an area, but since one area will have many layer-area
@@ -255,14 +274,14 @@
-Using Caret[1], we created a mesh representation of the surface of the selected region.  For each gene, for each node of
-the mesh, we used Caret to calculate an average of the gene expression of the voxels &#8220;underneath&#8221; that mesh node.  We
-then used Caret to flatten the cortex, creating a two-dimensional mesh.
+Using Caret[1], we created a mesh representation of the surface of the selected voxels.  For each gene, for each node of
+the mesh, we calculated an average of the gene expression of the voxels &#8220;underneath&#8221; that mesh node.  We then flattened
+the cortex, creating a two-dimensional mesh.
-these manual traces into Caret-format regional boundary data on the mesh surface. Using Caret, we projected the regions
-onto the 2-d mesh, and then onto the grid, and then we converted the region data into MATLAB format.
+these manual traces into Caret-format regional boundary data on the mesh surface. We projected the regions onto the 2-d
+mesh, and then onto the grid, and then we converted the region data into MATLAB format.
@@ -328,8 +347,8 @@
-Fig. . The top row of Fig.   displays the 3 genes which most match area AUD, according to a pointwise method9.  The
-bottom row displays the 3 genes which most match AUD according to a method which considers local geometry10  The
+Fig. . The top row of Fig.   displays the 3 genes which most match area AUD, according to a pointwise method10.  The
+bottom row displays the 3 genes which most match AUD according to a method which considers local geometry11  The
@@ -338,14 +357,14 @@
-natorially.  according to logistic regression, gene wwc111  is the best fit single gene for predicting whether or not a pixel on
+natorially.  according to logistic regression, gene wwc112  is the best fit single gene for predicting whether or not a pixel on
-   9For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor
+  10For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor
-   10For each gene the gradient similarity (see section ??) between (a) a map of the expression of each gene on the cortical surface and (b) the
+   11For each gene the gradient similarity (see section ??) between (a) a map of the expression of each gene on the cortical surface and (b) the
-   11&#8220;WW, C2 and coiled-coil domain containing 1&#8221;; EntrezGene ID 211652
+   12&#8220;WW, C2 and coiled-coil domain containing 1&#8221;; EntrezGene ID 211652
@@ -358,7 +377,7 @@
-Gene mtif212 is shown in figure the upper-right of Fig. . Mtif2 captures MO&#8217;s upper-left boundary, but not its lower-right
+Gene mtif213 is shown in the upper-right of Fig. . Mtif2 captures MO&#8217;s upper-left boundary, but not its lower-right
@@ -369,7 +388,7 @@
-surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%13. As noted above,
+surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%14. As noted above,
@@ -381,13 +400,23 @@
-  12&#8220;mitochondrial translational initiation factor 2&#8221;; EntrezGene ID 76784
-   135-fold cross-validation.
+  13&#8220;mitochondrial translational initiation factor 2&#8221;; EntrezGene ID 76784
+   145-fold cross-validation.
+Further work on flatmapping
+In anatomy, the manifold of interest is usually either defined by a combination of two relevant anatomical axes (todo),
+or by the surface of the structure (as is the case with the cortex). In the former case, the manifold of interest is a plane, but
+in the latter case it is curved. If the manifold is curved, there are various methods for mapping the manifold into a plane.
+In the case of the cerebral cortex, it remains to be seen which method of mapping the manifold into a plane is optimal
+for this application.  We will compare mappings which attempt to preserve size (such as the one used by Caret[1]) with
+mappings which preserve angle (conformal maps).
+Although there is much 2-D organization in anatomy, there are also structures whose shape is fundamentally 3-dimensional.
+If possible, we would like the method we develop to include a statistical test that warns the user if the assumption of 2-D
+structure seems to be wrong.
@@ -469,19 +498,10 @@
-    In anatomy, the manifold of interest is usually either defined by a combination of two relevant anatomical axes (todo),
-or by the surface of the structure (as is the case with the cortex). In the former case, the manifold of interest is a plane, but
-in the latter case it is curved. If the manifold is curved, there are various methods for mapping the manifold into a plane.
-    The method that we will develop will begin by mapping the data into a 2-D plane.   Although the manifold that
-characterized cortical areas is known to be the cortical surface, it remains to be seen which method of mapping the manifold
-into a plane is optimal for this application. We will compare mappings which attempt to preserve size (such as the one used
-by Caret[1]) with mappings which preserve angle (conformal maps).
-    Although there is much 2-D organization in anatomy, there are also structures whose shape is fundamentally 3-dimensional.
-If possible, we would like the method we develop to include a statistical test that warns the user if the assumption of 2-D
-structure seems to be wrong.
+    &#8220;genomic anatomy&#8221; is a name found in the titles of one of the cited papers which seems good
--- a/grant.txt	Tue Apr 14 02:53:00 2009 -0700
+++ b/grant.txt	Tue Apr 14 23:33:43 2009 -0700
@@ -3,11 +3,11 @@
-Massive new datasets obtained with techniques such as in situ hybridization (ISH) and BAC-transgenics allow the expression levels of many genes at many locations to be compared. Our goal is to develop automated methods to relate spatial variation in gene expression to anatomy. We want to find marker genes for specific anatomical regions, and also to draw new anatomical maps based on gene expression patterns. We have three specific aims:\\
+Massive new datasets obtained with techniques such as in situ hybridization (ISH), immunohistochemistry, or in situ transgenic reporter allow the expression levels of many genes at many locations to be compared. Our goal is to develop automated methods to relate spatial variation in gene expression to anatomy. We want to find marker genes for specific anatomical regions, and also to draw new anatomical maps based on gene expression patterns. We have three specific aims:\\
-(2) develop an algorithm to suggest new ways of carving up a structure into anatomical subregions, based on spatial patterns in gene expression\\
+(2) develop an algorithm to suggest new ways of carving up a structure into anatomical regions, based on spatial patterns in gene expression\\
@@ -24,15 +24,15 @@
-The task of looking for marker genes for anatomical subregions means that one is looking for a set of genes such that, if the expression level of those genes is known, then the locations of the subregions can be inferred. 
-
-If we define the subregions so that they cover the entire anatomical structure to be divided, then instead of saying that we are using gene expression to find the locations of the subregions, we may say that we are using gene expression to determine to which subregion each voxel within the structure belongs. We call this a __classification task__, because each voxel is being assigned to a class (namely, its subregion).
-
-Therefore, an understanding of the relationship between the combination of their expression levels and the locations of the subregions may be expressed as a function. The input to this function is a voxel, along with the gene expression levels within that voxel; the output is the subregional identity of the target voxel, that is, the subregion to which the target voxel belongs. We call this function a __classifier__. In general, the input to a classifier is called an __instance__, and the output is called a __label__ (or a __class label__).
+The task of looking for marker genes for anatomical regions means that one is looking for a set of genes such that, if the expression level of those genes is known, then the locations of the regions can be inferred. 
+
+If we define the regions so that they cover the entire anatomical structure to be divided, then instead of saying that we are using gene expression to find the locations of the regions, we may say that we are using gene expression to determine to which region each voxel within the structure belongs. We call this a __classification task__, because each voxel is being assigned to a class (namely, its region).
+
+Therefore, an understanding of the relationship between the combination of their expression levels and the locations of the regions may be expressed as a function. The input to this function is a voxel, along with the gene expression levels within that voxel; the output is the regional identity of the target voxel, that is, the region to which the target voxel belongs. We call this function a __classifier__. In general, the input to a classifier is called an __instance__, and the output is called a __label__ (or a __class label__).
-In the machine learning literature, this sort of procedure may be thought of as a __supervised learning task__, defined as a task in which the goal is to learn a mapping from instances to labels, and the training data consists of a set of instances (voxels) for which the labels (subregions) are known. 
+In the machine learning literature, this sort of procedure may be thought of as a __supervised learning task__, defined as a task in which the goal is to learn a mapping from instances to labels, and the training data consists of a set of instances (voxels) for which the labels (regions) are known. 
@@ -72,9 +72,9 @@
-The task of deciding how to carve up a structure into anatomical subregions can be put into these terms. The instances are once again voxels (or pixels) along with their associated gene expression profiles. We make the assumption that voxels from the same subregion have similar gene expression profiles, at least compared to the other subregions. This means that clustering voxels is the same as finding potential subregions; we seek a partitioning of the voxels into subregions, that is, into clusters of voxels with similar gene expression.
-
-It is desirable to determine not just one set of subregions, but also how these subregions relate to each other, if at all; perhaps some of the subregions are more similar to each other than to the rest, suggesting that, although at a fine spatial scale they could be considered separate, on a coarser spatial scale they could be grouped together into one large subregion. This suggests the outcome of clustering may be a hierarchial tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchial clustering.
+The task of deciding how to carve up a structure into anatomical regions can be put into these terms. The instances are once again voxels (or pixels) along with their associated gene expression profiles. We make the assumption that voxels from the same region have similar gene expression profiles, at least compared to the other regions. This means that clustering voxels is the same as finding potential regions; we seek a partitioning of the voxels into regions, that is, into clusters of voxels with similar gene expression.
+
+It is desirable to determine not just one set of regions, but also how these regions relate to each other, if at all; perhaps some of the regions are more similar to each other than to the rest, suggesting that, although at a fine spatial scale they could be considered separate, on a coarser spatial scale they could be grouped together into one large region. This suggests the outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
@@ -95,7 +95,7 @@
-Another use for dimensionality reduction is to visualize the relationships between subregions. For example, one might want to make a 2-D plot upon which each subregion is represented by a single point, and with the property that subregions with similar gene expression profiles should be nearby on the plot (that is, the property that distance between pairs of points in the plot should be proportional to some measure of dissimilarity in gene expression). It is likely that no arrangement of the points on a 2-D plan will exactly satisfy this property -- however, dimensionality reduction techniques allow one to find arrangements of points that approximately satisfy that property. Note that in this application, dimensionality reduction is being applied after clustering; whereas in the previous paragraph, we were talking about using dimensionality reduction before clustering.
+Another use for dimensionality reduction is to visualize the relationships between regions. For example, one might want to make a 2-D plot upon which each region is represented by a single point, and with the property that regions with similar gene expression profiles should be nearby on the plot (that is, the property that distance between pairs of points in the plot should be proportional to some measure of dissimilarity in gene expression). It is likely that no arrangement of the points on a 2-D plane will exactly satisfy this property -- however, dimensionality reduction techniques allow one to find arrangements of points that approximately satisfy that property. Note that in this application, dimensionality reduction is being applied after clustering; whereas in the previous paragraph, we were talking about using dimensionality reduction before clustering.
@@ -105,7 +105,7 @@
-Gene clusters could also be used to directly yield a clustering on instances. This is because many genes have an expression pattern which seems to pick out a single, spatially continguous subregion. Therefore, it seems likely that an anatomically interesting subregion will have multiple genes which each individually pick it out\footnote{This would seem to contradict our finding in aim 1 that some cortical areas are combinatorially coded by multiple genes. However, it is possible that the currently accepted cortical maps divide the cortex into subregions which are unnatural from the point of view of gene expression; perhaps there is some other way to map the cortex for which each subregion can be identified by single genes.}. This suggests the following procedure: cluster together genes which pick out similar subregions, and then to use the more popular common subregions as the final clusters. In the Preliminary Data we show that a number of anatomically recognized cortical regions, as well as some "superregions" formed by lumping together a few regions, are associated with gene clusters in this fashion.
+Gene clusters could also be used to directly yield a clustering on instances. This is because many genes have an expression pattern which seems to pick out a single, spatially contiguous region. Therefore, it seems likely that an anatomically interesting region will have multiple genes which each individually pick it out\footnote{This would seem to contradict our finding in aim 1 that some cortical areas are combinatorially coded by multiple genes. However, it is possible that the currently accepted cortical maps divide the cortex into regions which are unnatural from the point of view of gene expression; perhaps there is some other way to map the cortex for which each region can be identified by single genes.}. This suggests the following procedure: cluster together genes which pick out similar regions, and then use the most commonly picked-out regions as the final clusters. In the Preliminary Data we show that a number of anatomically recognized cortical regions, as well as some "superregions" formed by lumping together a few regions, are associated with gene clusters in this fashion.
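The procedure suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: each gene has already been thresholded into a set of voxel indices that it "picks out", genes are grouped greedily by Jaccard overlap, and a cluster's region is taken to be the voxels chosen by a majority of its genes. The names `jaccard`, `cluster_genes_by_region`, `min_similarity`, and the gene labels are hypothetical and do not come from the proposal.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of voxel indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_genes_by_region(gene_masks, min_similarity=0.6):
    """Greedily group genes whose picked-out voxel sets overlap strongly.

    gene_masks maps a gene name to the set of voxel indices in which the
    gene is considered expressed (thresholding is assumed done upstream).
    Returns a list of (gene_names, consensus_voxels), largest cluster first.
    """
    clusters = []  # each entry: [list of gene names, list of voxel sets]
    for gene, mask in gene_masks.items():
        for names, masks in clusters:
            # The first member of a cluster serves as its representative.
            if jaccard(mask, masks[0]) >= min_similarity:
                names.append(gene)
                masks.append(mask)
                break
        else:
            clusters.append([[gene], [mask]])

    results = []
    for names, masks in clusters:
        # Consensus region: voxels picked out by more than half of the genes.
        counts = {}
        for mask in masks:
            for voxel in mask:
                counts[voxel] = counts.get(voxel, 0) + 1
        consensus = {v for v, c in counts.items() if c > len(masks) / 2}
        results.append((names, consensus))

    # Regions supported by more genes come first.
    results.sort(key=lambda r: len(r[0]), reverse=True)
    return results

# Hypothetical toy input: three genes mark nearly the same voxels.
gene_masks = {
    "geneA": {0, 1, 2, 3},
    "geneB": {0, 1, 2},
    "geneC": {1, 2, 3},
    "geneD": {7, 8, 9},
}
ranked = cluster_genes_by_region(gene_masks)
```

The greedy grouping depends on the order in which genes are visited; a real analysis would more likely use a standard clustering method on a full gene-by-gene similarity matrix.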
@@ -129,6 +129,7 @@
+The ABA is not the only large public spatial gene expression dataset. Other such resources include GENSAT\cite{gong_gene_2003}, GenePaint\cite{visel_genepaint_2004}, its sister project GeneAtlas\cite{carson_data_2005}, BGEM\cite{magdaleno_bgem_2006}, EMAGE\cite{?}, EurExpress (http://www.eurexpress.org/ee/; EurExpress data is also entered into EMAGE), todo. With the exception of the ABA, GenePaint, and EMAGE, most of these resources have not (yet) extracted the expression intensity from the ISH images and registered the results into a single 3-D space, and only ABA and EMAGE make this form of data available for public download from the website. Many of these resources focus on developmental gene expression.
@@ -144,18 +145,20 @@
-There does not appear to be much work on the automated analysis of spatial gene expression data. 
-
-We are aware of two existing efforts to relate spatial gene expression data to anatomy through computational methods.
+We are aware of four existing efforts to relate spatial gene expression data to anatomy through computational methods.
+
+\cite{carson_data_2005} refers to GeneAtlas. GeneAtlas allows the user to construct a search query by freely demarcating one or two 2-D regions on sagittal slices, and then to specify either the strength of expression or the name of another gene whose expression pattern is to be matched. GeneAtlas differs from our Aim 1 in at least two ways. First, GeneAtlas finds only single genes, whereas we will also look for combinations of genes\footnote{See Preliminary Data for an example of an area which cannot be marked by any single gene in the dataset, but which can be marked by a combination.}. Second, at least for the custom spatial search, GeneAtlas appears to use a simple pointwise scoring method (strength of expression), whereas we will also use geometric metrics such as gradient similarity.
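To make the contrast between pointwise and geometric scoring concrete, here is a small sketch. The definitions are assumptions chosen for illustration: this passage does not define "gradient similarity", and the scoring used by GeneAtlas is not reproduced here. `pointwise_score` compares mean expression inside and outside a region, while `gradient_similarity` compares the directions of the spatial gradients of the expression image and of the region mask. Both function names and the 6x6 toy images are hypothetical.

```python
import numpy as np

def pointwise_score(expression, region_mask):
    """Mean expression inside the region minus mean expression outside it."""
    return float(expression[region_mask].mean() - expression[~region_mask].mean())

def gradient_similarity(expression, region_mask):
    """Mean cosine similarity between the gradient fields of the two images,
    taken over pixels where both gradients are nonzero (assumed definition)."""
    gy_e, gx_e = np.gradient(expression.astype(float))
    gy_m, gx_m = np.gradient(region_mask.astype(float))
    dot = gx_e * gx_m + gy_e * gy_m
    norms = np.hypot(gx_e, gy_e) * np.hypot(gx_m, gy_m)
    valid = norms > 0
    if not valid.any():
        return 0.0
    return float((dot[valid] / norms[valid]).mean())

# Hypothetical 6x6 slice: the target region is the left half.
region = np.zeros((6, 6), dtype=bool)
region[:, :3] = True

# One gene expressed exactly in the region, another in the top half instead.
gene_matching = region.astype(float)
gene_top_half = np.zeros((6, 6))
gene_top_half[:3, :] = 1.0

matching_scores = (pointwise_score(gene_matching, region),
                   gradient_similarity(gene_matching, region))
top_half_scores = (pointwise_score(gene_top_half, region),
                   gradient_similarity(gene_top_half, region))
```

In this toy case the gene whose boundary runs perpendicular to the region boundary scores zero on both measures; the two measures would diverge for patterns such as a smooth ramp across the region, where the boundary direction agrees but the pointwise contrast is weak.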
+
+\cite{venkataraman_emage_2008} todo
-Factorization (NNMF), and a hierarchial recursive bifurcation clustering scheme based on correlation as the similarity score. The paper yielded impressive results, proving the usefulness of such research. We have run NNMF on the cortical dataset\footnote{We ran "vanilla" NNMF, whereas the paper under discussion used a modified method. Their main modification consisted of adding a soft spatial contiguity constraint. However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was needed. The paper under discussion mentions that they also tried a hierarchial variant of NNMF, but since they didn't report its results, we assume that those result were not any more impressive than the results of the non-hierarchial variant.} and while the results are promising (see Preliminary Data), we think that it will be possible to find a better method (we also think that more automation of the parts that this paper's authors did manually will be possible).
+Factorization (NNMF), and a hierarchical recursive bifurcation clustering scheme based on correlation as the similarity score. The paper yielded impressive results, demonstrating the usefulness of such research. We have run NNMF on the cortical dataset\footnote{We ran "vanilla" NNMF, whereas the paper under discussion used a modified method. Their main modification consisted of adding a soft spatial contiguity constraint. However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was needed. The paper under discussion also mentions that they tried a hierarchical variant of NNMF, which we have not yet tried.} and while the results are promising (see Preliminary Data), we think that it will be possible to find a better method (we also think that more automation of the parts that this paper's authors did manually will be possible).
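For readers unfamiliar with NNMF, the sketch below shows one common "vanilla" formulation: multiplicative updates that reduce the squared reconstruction error. It is an assumption that this matches the solver used for the results mentioned above, and it does not include the spatial contiguity modification of the paper under discussion. The function name `nnmf`, its parameters, and the toy voxel-by-gene matrix are hypothetical.

```python
import numpy as np

def nnmf(v, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor a nonnegative matrix v (voxels x genes) as w @ h, with w, h >= 0,
    using multiplicative updates for the squared-error objective."""
    rng = np.random.default_rng(seed)
    n_voxels, n_genes = v.shape
    w = rng.random((n_voxels, rank)) + eps
    h = rng.random((rank, n_genes)) + eps
    for _ in range(n_iter):
        # eps in the denominators avoids division by zero.
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h

# Hypothetical data: voxels 0-2 express genes 0-1, voxels 3-5 express genes 2-3.
v = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
w, h = nnmf(v, rank=2)

# One simple way to read clusters off the factorization: assign each voxel
# to the component with the largest weight in its row of w.
voxel_labels = w.argmax(axis=1)
reconstruction_error = np.linalg.norm(v - w @ h)
```

Each row of `h` can be read as a gene signature and each column of `w` as a spatial map, which is why the factors are often interpretable as candidate regions.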
@@ -164,7 +167,7 @@
-which are overexpressed in that cluster. 
+which are overexpressed in that cluster. (note: the ABA website also contains pre-prepared lists of overexpressed genes for selected structures)
@@ -174,9 +177,11 @@
-The hierarchial clustering is different from our Aim 2 in at least three ways. First, the clustering finds clusters corresponding to layers, but no clusters corresponding to areas\footnote{This is for the same reason as in footnote \ref{layersNotAreas}.} \footnote{There are clusters which presumably correspond to the intersection of a layer and an area, but since one area will have many layer-area intersection clusters, further work is needed to make sense of these.} Our Aim 2 will not be accomplished until a clustering is produced which yields areas. Second, AGEA uses perhaps the simplest possible similarity score (correlation), and does no dimensionality reduction before calculating similarity. While it is possible that a more complex system will not do any better than this, we believe further exploration of alternative methods of scoring and dimensionality reduction is warranted. Third, AGEA did not look at clusters of genes; in Preliminary Data we have shown that clusters of genes may identify intersting spatial subregions such as cortical areas.
-
-
+The hierarchical clustering is different from our Aim 2 in at least three ways. First, the clustering finds clusters corresponding to layers, but no clusters corresponding to cortical areas\footnote{This is for the same reason as in footnote \ref{layersNotAreas}.} \footnote{There are clusters which presumably correspond to the intersection of a layer and an area, but since one area will have many layer-area intersection clusters, further work is needed to make sense of these.}. Our Aim 2 will not be accomplished until a clustering is produced which yields areas. Second, AGEA uses perhaps the simplest possible similarity score (correlation), and does no dimensionality reduction before calculating similarity. While it is possible that a more complex system will not do any better than this, we believe further exploration of alternative methods of scoring and dimensionality reduction is warranted. Third, AGEA did not look at clusters of genes; in Preliminary Data we have shown that clusters of genes may identify interesting spatial regions such as cortical areas.
+
+Finally, with the exception of \cite{thompson_genomic_2008}, none of the publications discussed above compare the results obtained by using different algorithms or scoring methods. \cite{thompson_genomic_2008} reports that both mNNMF and hierarchical mNNMF clustering were useful, and that hierarchical recursive bifurcation gave similar results.
+
+To summarize, in comparison to our Aim 1, none of the previous projects explores combinations of marker genes, and with respect to both aims, there has been almost no experimentation with or comparison of different algorithms or scoring methods. todo
@@ -189,11 +194,11 @@
-Using Caret\cite{van_essen_integrated_2001}, we created a mesh representation of the surface of the selected region. For each gene, for each node of the mesh, we used Caret to calculate an average of the gene expression of the voxels "underneath" that mesh node. We then used Caret to flatten the cortex, creating a two-dimensional mesh. 
+Using Caret\cite{van_essen_integrated_2001}, we created a mesh representation of the surface of the selected voxels. For each gene, for each node of the mesh, we calculated an average of the gene expression of the voxels "underneath" that mesh node. We then flattened the cortex, creating a two-dimensional mesh. 
-We manually traced the boundaries of each cortical area from the ABA coronal reference atlas slides. We then converted these manual traces into Caret-format regional boundary data on the mesh surface. Using Caret, we projected the regions onto the 2-d mesh, and then onto the grid, and then we converted the region data into MATLAB format.
+We manually traced the boundaries of each cortical area from the ABA coronal reference atlas slides. We then converted these manual traces into Caret-format regional boundary data on the mesh surface. We projected the regions onto the 2-d mesh, and then onto the grid, and then we converted the region data into MATLAB format.
@@ -344,9 +349,21 @@
+
+\vspace{0.3cm}**Further work on flatmapping**
+
+
+In anatomy, the manifold of interest is usually either defined by a combination of two relevant anatomical axes (todo), or by the surface of the structure (as is the case with the cortex). In the former case, the manifold of interest is a plane, but in the latter case it is curved. If the manifold is curved, there are various methods for mapping the manifold into a plane.
+
+In the case of the cerebral cortex, it remains to be seen which method of mapping the manifold into a plane is optimal for this application. We will compare mappings which attempt to preserve size (such as the one used by Caret\cite{van_essen_integrated_2001}) with mappings which preserve angle (conformal maps). 
+
+Although there is much 2-D organization in anatomy, there are also structures whose shape is fundamentally 3-dimensional. If possible, we would like the method we develop to include a statistical test that warns the user if the assumption of 2-D structure seems to be wrong.
+
+
+
@@ -390,12 +407,6 @@
-In anatomy, the manifold of interest is usually either defined by a combination of two relevant anatomical axes (todo), or by the surface of the structure (as is the case with the cortex). In the former case, the manifold of interest is a plane, but in the latter case it is curved. If the manifold is curved, there are various methods for mapping the manifold into a plane.
-
-The method that we will develop will begin by mapping the data into a 2-D plane. Although the manifold that characterized cortical areas is known to be the cortical surface, it remains to be seen which method of mapping the manifold into a plane is optimal for this application. We will compare mappings which attempt to preserve size (such as the one used by Caret\cite{van_essen_integrated_2001}) with mappings which preserve angle (conformal maps). 
-
-Although there is much 2-D organization in anatomy, there are also structures whose shape is fundamentally 3-dimensional. If possible, we would like the method we develop to include a statistical test that warns the user if the assumption of 2-D structure seems to be wrong.
-
@@ -409,3 +420,7 @@
+
+
+"genomic anatomy" is a term found in the title of one of the cited papers, and it seems like a good name
+