cg
changeset 101:89815d210b5c
.
author | bshanks@bshanks.dyndns.org |
---|---|
date | Wed Apr 22 07:06:46 2009 -0700 (16 years ago) |
parents | fa7c0a924e7a |
children | 4cca7c7d91d1 |
files | grant.doc grant.html grant.odt grant.pdf grant.txt nih-blank.cls |
line diff
1.1 Binary file grant.doc has changed
2.1 --- a/grant.html Wed Apr 22 06:45:17 2009 -0700
2.2 +++ b/grant.html Wed Apr 22 07:06:46 2009 -0700
2.3 @@ -30,53 +30,53 @@
2.4 allow the expression levels of many genes at many locations to be compared. Our goal is to develop automated
2.5 methods to relate spatial variation in gene expression to anatomy. We want to find marker genes for specific
2.6 anatomical regions, and also to draw new anatomical maps based on gene expression patterns.
2.7 -The Challenge and Potential impact
2.8 +______________
2.9 + The Challenge and Potential impact
2.10 Each of our three aims will be discussed in turn. For each aim, we will develop a conceptual framework for
2.11 thinking about the task, and we will present our strategy for solving it. Next we will discuss related work. At the
2.12 conclusion of each section, we will summarize why our strategy is different from what has been done before. At
2.13 the end of this section, we will describe the potential impact.
2.14 -Aim 1: Given a map of regions, find genes that mark the regions
2.15 -
2.16 + Aim 1: Given a map of regions, find genes that mark the regions
2.17 +
2.18 Figure 1: Gene Pitx2
2.19 is selectively underex-
2.20 -pressed in area SS. Machine learning terminology: classifiers The task of looking for marker genes for
2.21 - known anatomical regions means that one is looking for a set of genes such that, if
2.22 - the expression level of those genes is known, then the locations of the regions can be
2.23 - inferred.
2.24 - If we define the regions so that they cover the entire anatomical structure to be
2.25 - subdivided, we may say that we are using gene expression in each voxel to assign
2.26 - that voxel to the proper area. We call this a classification task, because each voxel
2.27 - is being assigned to a class (namely, its region). An understanding of the relationship
2.28 - between the combination of their expression levels and the locations of the regions may
2.29 - be expressed as a function. The input to this function is a voxel, along with the gene
2.30 - expression levels within that voxel; the output is the regional identity of the target voxel,
2.31 +pressed in area SS. Machine learning terminology: classifiers The task of looking for marker genes for
2.32 + known anatomical regions means that one is looking for a set of genes such that, if
2.33 + the expression level of those genes is known, then the locations of the regions can be
2.34 + inferred.
2.35 + If we define the regions so that they cover the entire anatomical structure to be
2.36 + subdivided, we may say that we are using gene expression in each voxel to assign
2.37 + that voxel to the proper area. We call this a classification task, because each voxel
2.38 + is being assigned to a class (namely, its region). An understanding of the relationship
2.39 + between the combination of their expression levels and the locations of the regions may
2.40 + be expressed as a function. The input to this function is a voxel, along with the gene
2.41 + expression levels within that voxel; the output is the regional identity of the target voxel,
2.42 that is, the region to which the target voxel belongs. We call this function a classifier. In general, the input to a
2.43 classifier is called an instance, and the output is called a label (or a class label).
2.44 -The object of aim 1 is not to produce a single classifier, but rather to develop an automated method for
2.45 + The object of aim 1 is not to produce a single classifier, but rather to develop an automated method for
2.46 determining a classifier for any known anatomical structure. Therefore, we seek a procedure by which a gene
2.47 expression dataset may be analyzed in concert with an anatomical atlas in order to produce a classifier. The
2.48 initial gene expression dataset used in the construction of the classifier is called training data. In the machine
2.49 learning literature, this sort of procedure may be thought of as a supervised learning task, defined as a task in
2.50 which the goal is to learn a mapping from instances to labels, and the training data consists of a set of instances
2.51 (voxels) for which the labels (regions) are known.
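As a minimal illustration of this supervised-learning setup, consider the following Python sketch; the array names (expr, region) and the choice of logistic regression as the classifier are purely illustrative, not part of the proposal.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Instances: one row per voxel, one column per gene expression level.
# Labels: the region to which each voxel belongs.
rng = np.random.default_rng(0)
expr = rng.random((1000, 50))           # toy training data: 1000 voxels x 50 genes
region = rng.integers(0, 4, size=1000)  # toy labels: 4 regions

X_train, X_test, y_train, y_test = train_test_split(expr, region, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # learn the classifier
print("held-out accuracy:", clf.score(X_test, y_test))         # predict regions of unseen voxels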
2.52 -Each gene expression level is called a feature, and the selection of which genes1 to include is called feature
2.53 + Each gene expression level is called a feature, and the selection of which genes1 to include is called feature
2.54 selection. Feature selection is one component of the task of learning a classifier. Some methods for learning
2.55 classifiers start out with a separate feature selection phase, whereas other methods combine feature selection
2.56 with other aspects of training.
2.57 -One class of feature selection methods assigns some sort of score to each candidate gene. The top-ranked
2.58 + One class of feature selection methods assigns some sort of score to each candidate gene. The top-ranked
2.59 genes are then chosen. Some scoring measures can assign a score to a set of selected genes, not just to a
2.60 single gene; in this case, a dynamic procedure may be used in which features are added and subtracted from the
2.61 selected set depending on how much they raise the score. Such procedures are called “stepwise” or “greedy”.
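A greedy forward selection loop of this kind might look as follows (Python sketch; set_score is a stand-in for whichever set-level scoring measure is chosen, and the data are synthetic).

import numpy as np

def set_score(expr, region, genes):
    # Placeholder set-level score: how well the mean expression of the selected
    # genes separates the target region from the rest (higher is better).
    if not genes:
        return 0.0
    combined = expr[:, list(genes)].mean(axis=1)
    return abs(combined[region == 1].mean() - combined[region == 0].mean())

def greedy_select(expr, region, n_features):
    selected, candidates = [], set(range(expr.shape[1]))
    for _ in range(n_features):
        # add the single gene that raises the score of the selected set the most
        best = max(candidates, key=lambda g: set_score(expr, region, selected + [g]))
        selected.append(best)
        candidates.remove(best)
    return selected

rng = np.random.default_rng(0)
expr = rng.random((500, 30))                  # toy data: 500 voxels x 30 genes
region = (rng.random(500) > 0.5).astype(int)  # 1 inside the target region, 0 outside
print(greedy_select(expr, region, 3))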
2.62 -Although the classifier itself may only look at the gene expression data within each voxel before classifying
2.63 + Although the classifier itself may only look at the gene expression data within each voxel before classifying
2.64 that voxel, the algorithm which constructs the classifier may look over the entire dataset. We can categorize
2.65 score-based feature selection methods depending on how the score is calculated. Often the score calculation
2.66 consists of assigning a sub-score to each voxel, and then aggregating these sub-scores into a final score (the
2.67 aggregation is often a sum or a sum of squares or average). If only information from nearby voxels is used to
2.68 -_________________________________________
2.69 - 1Strictly speaking, the features are gene expression levels, but we’ll call them genes.
2.70 calculate a voxel’s sub-score, then we say it is a local scoring method. If only information from the voxel itself is
2.71 used to calculate a voxel’s sub-score, then we say it is a pointwise scoring method.
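A pointwise scoring method, in these terms, assigns each voxel a sub-score that depends only on that voxel and then aggregates; a Python sketch (the particular sub-score is only an example).

import numpy as np

def pointwise_score(gene_expr, region_mask):
    # Sub-score per voxel depends only on that voxel: agreement between its
    # expression level and its region membership.
    sub_scores = -(gene_expr - region_mask.astype(float)) ** 2
    return sub_scores.mean()     # aggregation step (here, an average)

rng = np.random.default_rng(0)
gene_expr = rng.random(1000)           # one gene's expression in 1000 voxels (0..1)
region_mask = rng.random(1000) > 0.7   # True for voxels inside the target region
print(pointwise_score(gene_expr, region_mask))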
2.72 -Both gene expression data and anatomical atlases have errors, due to a variety of factors. Individual subjects
2.73 + Both gene expression data and anatomical atlases have errors, due to a variety of factors. Individual subjects
2.74 + 1Strictly speaking, the features are gene expression levels, but we’ll call them genes.
2.75 have idiosyncratic anatomy. Subjects may be improperly registered to the atlas. The method used to measure
2.76 gene expression may be noisy. The atlas may have errors. It is even possible that some areas in the anatomical
2.77 atlas are “wrong” in that they do not have the same shape as the natural domains of gene expression to which
2.78 @@ -169,94 +169,102 @@
2.79 example, we believe that domain-specific scoring measures (such as
2.80 gradient similarity, which is discussed in Preliminary Studies) may be
2.81 necessary in order to achieve the best results in this application.
2.82 - We are aware of six existing efforts to find marker genes using spa-
2.83 - tial gene expression data using automated methods.
2.84 - [13] mentions the possibility of constructing a spatial region for each
2.85 - gene, and then, for each anatomical structure of interest, computing
2.86 - what proportion of this structure is covered by the gene’s spatial region.
2.87 + We now turn to efforts to find marker genes from spatial gene ex-
2.88 + pression data using automated methods.
2.89 GeneAtlas[5] and EMAGE [26] allow the user to construct a search
2.90 query by demarcating regions and then specifying either the strength of
2.91 expression or the name of another gene or dataset whose expression
2.92 pattern is to be matched. Neither GeneAtlas nor EMAGE allow one to
2.93 -search for combinations of genes that define a region in concert but not separately.
2.94 -[15 ] describes AGEA, ”Anatomic Gene Expression Atlas”. AGEA has three components. Gene Finder: The
2.95 -user selects a seed voxel and the system (1) chooses a cluster which includes the seed voxel, (2) yields a list of
2.96 -genes which are overexpressed in that cluster. Correlation: The user selects a seed voxel and the system then
2.97 -shows the user how much correlation there is between the gene expression profile of the seed voxel and every
2.98 -other voxel. Clusters: will be described later
2.99 -[6 ] looks at the mean expression level of genes within anatomical regions, and applies a Student’s t-test with
2.100 -Bonferroni correction to determine whether the mean expression level of a gene is significantly higher in the
2.101 -target region.
2.102 -[15 ] and [6] differ from our Aim 1 in at least three ways. First, [15] and [6] find only single genes, whereas
2.103 -we will also look for combinations of genes. Second, [15] and [6] can only use overexpression as a marker,
2.104 -whereas we will also search for underexpression. Third, [15] and [6] use scores based on pointwise expression
2.105 -levels, whereas we will also use geometric scores such as gradient similarity (described in Preliminary Studies).
2.106 -Figures 4, 1, and 3 in the Preliminary Studies section contain evidence that each of our three choices is the right
2.107 -one.
2.108 + search for combinations of genes that define a region in concert but not
2.109 + separately.
2.110 + [15] describes AGEA, “Anatomic Gene Expression Atlas”. AGEA
2.111 +has three components. Gene Finder: The user selects a seed voxel and the system (1) chooses a cluster which
2.112 +includes the seed voxel, (2) yields a list of genes which are overexpressed in that cluster. Correlation: The
2.113 +user selects a seed voxel and the system then shows the user how much correlation there is between the gene
2.114 +expression profile of the seed voxel and every other voxel. Clusters: will be described later. [6] looks at the mean
2.115 +expression level of genes within anatomical regions, and applies a Student’s t-test with Bonferroni correction to
2.116 +determine whether the mean expression level of a gene is significantly higher in the target region. [15] and [6]
2.117 +differ from our Aim 1 in at least three ways. First, [15] and [6] find only single genes, whereas we will also look
2.118 +for combinations of genes. Second, [15] and [6] can only use overexpression as a marker, whereas we will also
2.119 +search for underexpression. Third, [15] and [6] use scores based on pointwise expression levels, whereas we
2.120 +will also use geometric scores such as gradient similarity (described in Preliminary Studies). Figures 4, 1, and 3
2.121 +in the Preliminary Studies section contain evidence that each of our three choices is the right one.
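For concreteness, the kind of per-gene test attributed to [6] could be sketched as follows in Python (synthetic data; the significance threshold is illustrative).

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
expr = rng.random((2000, 100))       # toy data: 2000 voxels x 100 genes
in_region = rng.random(2000) > 0.8   # membership mask for the target region

n_genes = expr.shape[1]
markers = []
for g in range(n_genes):
    # Student's t-test on expression inside vs. outside the region
    t, p = ttest_ind(expr[in_region, g], expr[~in_region, g])
    if t > 0 and p * n_genes < 0.05:   # overexpressed and Bonferroni-significant
        markers.append(g)
print("candidate overexpressed markers:", markers)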
2.122 [10 ] describes a technique to find combinations of marker genes to pick out an anatomical region. They use
2.123 an evolutionary algorithm to evolve logical operators which combine boolean (thresholded) images in order to
2.124 match a target image.
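A stripped-down sketch of the ingredients of this approach, with an exhaustive search over gene pairs standing in for the evolutionary algorithm and Jaccard similarity as the match score (Python; data are synthetic).

import numpy as np
from itertools import combinations

def jaccard(a, b):
    # match score between two boolean images
    return np.logical_and(a, b).sum() / max(np.logical_or(a, b).sum(), 1)

rng = np.random.default_rng(0)
images = rng.random((20, 64, 64)) > 0.5   # 20 thresholded (boolean) gene images
target = rng.random((64, 64)) > 0.7       # boolean mask of the target region

best_score, best_combo = -1.0, None
for i, j in combinations(range(len(images)), 2):
    for name, op in (("and", np.logical_and), ("or", np.logical_or)):
        score = jaccard(op(images[i], images[j]), target)
        if score > best_score:
            best_score, best_combo = score, (name, i, j)
print("best combination:", best_combo, "Jaccard:", round(best_score, 3))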
2.125 -_____________________
2.126 - 2By “fundamentally spatial” we mean that there is information from a large number of spatial locations indexed by spatial coordinates;
2.127 -not just data which have only a few different locations or which is indexed by anatomical label.
2.128 In summary, there has been fruitful work on finding marker genes, but only one of the previous projects
2.129 explores combinations of marker genes, and none of these publications compare the results obtained by using
2.130 different algorithms or scoring methods.
2.131 Aim 2: From gene expression data, discover a map of regions
2.132 +Machine learning terminology: clustering
2.133 +If one is given a dataset consisting merely of instances, with no class labels, then analysis of the dataset is
2.134 +referred to as unsupervised learning in the jargon of machine learning. One thing that you can do with such a
2.135 +dataset is to group instances together. A set of similar instances is called a cluster, and the activity of
2.136 +grouping the data into clusters is called clustering or cluster analysis.
2.137 +_________________________________________
2.138 + 2By “fundamentally spatial” we mean that there is information from a large number of spatial locations indexed by spatial coordinates;
2.139 +not just data which have only a few different locations or which is indexed by anatomical label.
2.140 +The task of deciding how to carve up a structure into anatomical regions can be put into these terms. The
2.141 +instances are once again voxels (or pixels) along with their associated gene expression profiles. We make
2.142 +the assumption that voxels from the same anatomical region have similar gene expression profiles, at least
2.143 +compared to the other regions. This means that clustering voxels is the same as finding potential regions; we
2.144 +seek a partitioning of the voxels into regions, that is, into clusters of voxels with similar gene expression.
2.145 +It is desirable to determine not just one set of regions, but also how these regions relate to each other. The
2.146 +outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition
2.147 +the voxels. This is called hierarchical clustering.
2.148 +Similarity scores A crucial choice when designing a clustering method is how to measure similarity, across
2.149 +either pairs of instances, or clusters, or both. There is much overlap between scoring methods for feature
2.150 +selection (discussed above under Aim 1) and scoring methods for similarity.
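For example, a hierarchical clustering of voxels by their expression profiles might be sketched as follows in Python (correlation distance and average linkage are just one possible choice of similarity measure and merge rule).

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
profiles = rng.random((300, 40))      # toy data: 300 voxels x 40 genes

# build a hierarchical tree of clusters, then cut it into 8 candidate regions
tree = linkage(profiles, method="average", metric="correlation")
regions = fcluster(tree, t=8, criterion="maxclust")
print("voxels per candidate region:", np.bincount(regions)[1:])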
2.151
2.152
2.153 Figure 4: Upper left: wwc1. Upper
2.154 right: mtif2. Lower left: wwc1 + mtif2
2.155 (each pixel’s value on the lower left is
2.156 the sum of the corresponding pixels in
2.157 -the upper row). Machine learning terminology: clustering
2.158 - If one is given a dataset consisting merely of instances, with no
2.159 - class labels, then analysis of the dataset is referred to as unsupervised
2.160 - learning in the jargon of machine learning. One thing that you can do
2.161 - with such a dataset is to group instances together. A set of similar
2.162 - instances is called a cluster, and the activity of finding grouping the
2.163 - data into clusters is called clustering or cluster analysis.
2.164 - The task of deciding how to carve up a structure into anatomical
2.165 - regions can be put into these terms. The instances are once again
2.166 - voxels (or pixels) along with their associated gene expression profiles.
2.167 - We make the assumption that voxels from the same anatomical region
2.168 - have similar gene expression profiles, at least compared to the other
2.169 - regions. This means that clustering voxels is the same as finding po-
2.170 - tential regions; we seek a partitioning of the voxels into regions, that is,
2.171 - into clusters of voxels with similar gene expression.
2.172 - It is desirable to determine not just one set of regions, but also how
2.173 - these regions relate to each other. The outcome of clustering may be
2.174 - a hierarchical tree of clusters, rather than a single set of clusters which
2.175 -partition the voxels. This is called hierarchical clustering.
2.176 -Similarity scores A crucial choice when designing a clustering method is how to measure similarity, across
2.177 -either pairs of instances, or clusters, or both. There is much overlap between scoring methods for feature
2.178 -selection (discussed above under Aim 1) and scoring methods for similarity.
2.179 -Spatially contiguous clusters; image segmentation We have shown that aim 2 is a type of clustering
2.180 -task. In fact, it is a special type of clustering task because we have an additional constraint on clusters; voxels
2.181 -grouped together into a cluster must be spatially contiguous. In Preliminary Studies, we show that one can get
2.182 -reasonable results without enforcing this constraint; however, we plan to compare these results against other
2.183 -methods which guarantee contiguous clusters.
2.184 -Image segmentation is the task of partitioning the pixels in a digital image into clusters, usually contiguous
2.185 -clusters. Aim 2 is similar to an image segmentation task. There are two main differences; in our task, there are
2.186 -thousands of color channels (one for each gene), rather than just three3. A more crucial difference is that there
2.187 -are various cues which are appropriate for detecting sharp object boundaries in a visual scene but which are not
2.188 -appropriate for segmenting abstract spatial data such as gene expression. Although many image segmentation
2.189 -algorithms can be expected to work well for segmenting other sorts of spatially arranged data, some of these
2.190 -algorithms are specialized for visual images.
2.191 +the upper row). Spatially contiguous clusters; image segmentation We have
2.192 + shown that aim 2 is a type of clustering task. In fact, it is a special
2.193 + type of clustering task because we have an additional constraint on
2.194 + clusters; voxels grouped together into a cluster must be spatially con-
2.195 + tiguous. In Preliminary Studies, we show that one can get reasonable
2.196 + results without enforcing this constraint; however, we plan to compare
2.197 + these results against other methods which guarantee contiguous clus-
2.198 + ters.
2.199 + Image segmentation is the task of partitioning the pixels in a digital
2.200 + image into clusters, usually contiguous clusters. Aim 2 is similar to an
2.201 + image segmentation task. There are two main differences; in our task,
2.202 + there are thousands of color channels (one for each gene), rather than
2.203 + just three3. A more crucial difference is that there are various cues
2.204 + which are appropriate for detecting sharp object boundaries in a visual
2.205 + scene but which are not appropriate for segmenting abstract spatial
2.206 + data such as gene expression. Although many image segmentation
2.207 + algorithms can be expected to work well for segmenting other sorts of
2.208 + spatially arranged data, some of these algorithms are specialized for
2.209 +visual images.
2.210 Dimensionality reduction In this section, we discuss reducing the length of the per-pixel gene expression
2.211 feature vector. By “dimension”, we mean the dimension of this vector, not the spatial dimension of the underlying
2.212 data.
2.213 Unlike aim 1, there is no externally-imposed need to select only a handful of informative genes for inclusion
2.214 in the instances. However, some clustering algorithms perform better on small numbers of features4. There are
2.215 techniques which “summarize” a larger number of features using a smaller number of features; these techniques
2.216 +go by the name of feature extraction or dimensionality reduction. The small set of features that such a technique
2.217 +yields is called the reduced feature set. Note that the features in the reduced feature set do not necessarily
2.218 +correspond to genes; each feature in the reduced set may be any function of the set of gene expression levels.
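As one example of such a technique, a PCA-based reduction of per-pixel expression vectors could look like this (Python sketch on synthetic data; each reduced feature is a linear combination of all gene expression levels).

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
expr = rng.random((2000, 500))   # toy data: 2000 pixels x 500 genes

reduced = PCA(n_components=50).fit_transform(expr)   # reduced feature set
print(reduced.shape)                                 # (2000, 50)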
2.219 +Clustering genes rather than voxels Although the ultimate goal is to cluster the instances (voxels or pixels),
2.220 +one strategy to achieve this goal is to first cluster the features (genes). There are two ways that clusters of genes
2.221 +could be used.
2.222 +Gene clusters could be used as part of dimensionality reduction: rather than have one feature for each gene,
2.223 +we could have one reduced feature for each gene cluster.
2.224 +Gene clusters could also be used to directly yield a clustering on instances. This is because many genes have
2.225 +an expression pattern which seems to pick out a single, spatially contiguous region. This suggests the following
2.226 _________________________________________
2.227 3There are imaging tasks which use more than three colors, for example multispectral imaging and hyperspectral imaging, which are
2.228 often used to process satellite imagery.
2.229 4First, because the number of features in the reduced dataset is less than in the original dataset, the running time of clustering
2.230 algorithms may be much less. Second, it is thought that some clustering algorithms may give better results on reduced data.
2.231 -go by the name of feature extraction or dimensionality reduction. The small set of features that such a technique
2.232 -yields is called the reduced feature set. Note that the features in the reduced feature set do not necessarily
2.233 -correspond to genes; each feature in the reduced set may be any function of the set of gene expression levels.
2.234 +procedure: cluster together genes which pick out similar regions, and then use the more popular common
2.235 +regions as the final clusters. In Preliminary Studies, Figure 7, we show that a number of anatomically recognized
2.236 +cortical regions, as well as some “superregions” formed by lumping together a few regions, are associated with
2.237 +gene clusters in this fashion.
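A Python sketch of this strategy: cluster genes by the similarity of their spatial patterns, then form one averaged pattern per gene cluster, usable either as a reduced feature or, after thresholding, as a candidate region. All names and thresholds are illustrative.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
expr = rng.random((2500, 200))   # toy data: 2500 pixels x 200 genes

# cluster the genes (columns) by the similarity of their spatial patterns
gene_tree = linkage(expr.T, method="average", metric="correlation")
gene_cluster = fcluster(gene_tree, t=20, criterion="maxclust")

# one reduced feature per gene cluster: the mean pattern of its member genes
reduced = np.column_stack([expr[:, gene_cluster == c].mean(axis=1)
                           for c in range(1, gene_cluster.max() + 1)])
candidate_regions = reduced > reduced.mean(axis=0)   # crude per-cluster threshold
print(reduced.shape, candidate_regions.shape)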
2.238
2.239
2.240
2.241 @@ -281,23 +289,7 @@
2.242 tinguished from its neighbors, but not
2.243 from the entire rest of the cortex). The
2.244 genes are Pitx2, Aldh1a2, Ppfibp1,
2.245 -Slco1a5, Tshz2, Trhr, Col12a1, Ets1. Clustering genes rather than voxels Although the ultimate goal is
2.246 - to cluster the instances (voxels or pixels), one strategy to achieve this
2.247 - goal is to first cluster the features (genes). There are two ways that
2.248 - clusters of genes could be used.
2.249 - Gene clusters could be used as part of dimensionality reduction:
2.250 - rather than have one feature for each gene, we could have one reduced
2.251 - feature for each gene cluster.
2.252 - Gene clusters could also be used to directly yield a clustering on
2.253 - instances. This is because many genes have an expression pattern
2.254 - which seems to pick out a single, spatially contiguous region. This
2.255 - suggests the following procedure: cluster together genes which pick
2.256 - out similar regions, and then to use the more popular common regions
2.257 - as the final clusters. In Preliminary Studies, Figure 7, we show that a
2.258 - number of anatomically recognized cortical regions, as well as some
2.259 - “superregions” formed by lumping together a few regions, are associ-
2.260 - ated with gene clusters in this fashion.
2.261 - Related work
2.262 +Slco1a5, Tshz2, Trhr, Col12a1, Ets1. Related work
2.263 Some researchers have attempted to parcellate cortex on the basis of
2.264 non-gene expression data. For example, [18], [2], [19], and [1] asso-
2.265 ciate spots on the cortex with the radial profile5 of response to some
2.266 @@ -319,18 +311,35 @@
2.267 [6] clusters genes. For each cluster, prototypical spatial expression
2.268 patterns were created by averaging the genes in the cluster. The pro-
2.269 totypes were analyzed manually, without clustering voxels.
2.270 - [10] applies their technique for finding combinations of marker
2.271 - genes for the purpose of clustering genes around a “seed gene”.
2.272 + [10] applies their technique for finding combinations of marker genes
2.273 + for the purpose of clustering genes around a “seed gene”.
2.274 In summary, although these projects obtained clusterings, there has
2.275 not been much comparison between different algorithms or scoring
2.276 methods, so it is likely that the best clustering method for this appli-
2.277 - cation has not yet been found. The projects using gene expression on
2.278 -cortex did not attempt to make use of the radial profile of gene expression. Also, none of these projects did a
2.279 + cation has not yet been found. The projects using gene expression
2.280 + on cortex did not attempt to make use of the radial profile of gene ex-
2.281 + pression. Also, none of these projects did a separate dimensionality
2.282 + reduction step before clustering pixels, none tried to cluster genes first
2.283 + in order to guide automated clustering of pixels into spatial regions, and
2.284 + none used co-clustering algorithms.
2.285 + Aim 3: apply the methods developed to the cerebral cortex
2.286 + Background
2.287 + The cortex is divided into areas and layers. Because of the cortical
2.288 + columnar organization, the parcellation of the cortex into areas can be
2.289 + drawn as a 2-D map on the surface of the cortex. In the third dimension,
2.290 + the boundaries between the areas continue downwards into the cortical
2.291 + depth, perpendicular to the surface. The layer boundaries run parallel
2.292 + to the surface. One can picture an area of the cortex as a slice of a
2.293 + six-layered cake6.
2.294 + It is known that different cortical areas have distinct roles in both
2.295 + normal functioning and in disease processes, yet there are no known
2.296 _________________________________________
2.297 5A radial profile is a profile along a line perpendicular to the cortical surface.
2.298 -separate dimensionality reduction step before clustering pixels, none tried to cluster genes first in order to guide
2.299 -automated clustering of pixels into spatial regions, and none used co-clustering algorithms.
2.300 -Aim 3: apply the methods developed to the cerebral cortex
2.301 + 6Outside of isocortex, the number of layers varies.
2.302 + marker genes for most cortical areas. When it is necessary to divide a
2.303 + tissue sample into cortical areas, this is a manual process that requires
2.304 + a skilled human to combine multiple visual cues and interpret them in
2.305 + the context of their approximate location upon the cortical surface.
2.306
2.307
2.308
2.309 @@ -344,52 +353,40 @@
2.310 Isomap. Additional details: In the third and fourth rows, 7 dimen-
2.311 sions were found, but only 6 displayed. In the last row: for PCA,
2.312 50 dimensions were used; for NNMF, 6 dimensions were used; for
2.313 -landmark Isomap, 7 dimensions were used. Background
2.314 - The cortex is divided into areas and lay-
2.315 - ers. Because of the cortical columnar or-
2.316 - ganization, the parcellation of the cortex
2.317 - into areas can be drawn as a 2-D map on
2.318 - the surface of the cortex. In the third di-
2.319 - mension, the boundaries between the ar-
2.320 - eas continue downwards into the cortical
2.321 - depth, perpendicular to the surface. The
2.322 - layer boundaries run parallel to the sur-
2.323 - face. One can picture an area of the cortex
2.324 - as a slice of a six-layered cake6.
2.325 - It is known that different cortical areas
2.326 - have distinct roles in both normal function-
2.327 - ing and in disease processes, yet there are
2.328 - no known marker genes for most cortical
2.329 - areas. When it is necessary to divide a
2.330 - tissue sample into cortical areas, this is a
2.331 - manual process that requires a skilled hu-
2.332 - man to combine multiple visual cues and
2.333 - interpret them in the context of their ap-
2.334 - proximate location upon the cortical sur-
2.335 - face.
2.336 - Even the questions of how many ar-
2.337 +landmark Isomap, 7 dimensions were used. Even the questions of how many ar-
2.338 eas should be recognized in cortex, and
2.339 what their arrangement is, are still not com-
2.340 pletely settled. A proposed division of the
2.341 cortex into areas is called a cortical map.
2.342 In the rodent, the lack of a single agreed-
2.343 -upon map can be seen by contrasting the recent maps given by Swanson[22] on the one hand, and Paxinos
2.344 -and Franklin[17] on the other. While the maps are certainly very similar in their general arrangement, significant
2.345 -differences remain.
2.346 -The Allen Mouse Brain Atlas dataset
2.347 -The Allen Mouse Brain Atlas (ABA) data were produced by doing in-situ hybridization on slices of male,
2.348 -56-day-old C57BL/6J mouse brains. Pictures were taken of the processed slice, and these pictures were semi-
2.349 -automatically analyzed to create a digital measurement of gene expression levels at each location in each slice.
2.350 -Per slice, cellular spatial resolution is achieved. Using this method, a single physical slice can only be used
2.351 -to measure one single gene; many different mouse brains were needed in order to measure the expression of
2.352 -many genes.
2.353 -An automated nonlinear alignment procedure located the 2D data from the various slices in a single 3D
2.354 -coordinate system. In the final 3D coordinate system, voxels are cubes with 200 microns on a side. There are
2.355 -67x41x58 = 159,326 voxels in the 3D coordinate system, of which 51,533 are in the brain[15].
2.356 + upon map can be seen by contrasting the
2.357 + recent maps given by Swanson[22] on the
2.358 + one hand, and Paxinos and Franklin[17] on
2.359 + the other. While the maps are certainly
2.360 + very similar in their general arrangement,
2.361 + significant differences remain.
2.362 + The Allen Mouse Brain Atlas dataset
2.363 + The Allen Mouse Brain Atlas (ABA)
2.364 + data were produced by doing in-situ hy-
2.365 + bridization on slices of male, 56-day-old
2.366 + C57BL/6J mouse brains. Pictures were
2.367 + taken of the processed slice, and these pic-
2.368 + tures were semi-automatically analyzed to
2.369 + create a digital measurement of gene ex-
2.370 + pression levels at each location in each
2.371 + slice. Per slice, cellular spatial resolution
2.372 + is achieved. Using this method, a single
2.373 + physical slice can only be used to measure
2.374 + one single gene; many different mouse
2.375 + brains were needed in order to measure
2.376 + the expression of many genes.
2.377 + An automated nonlinear alignment pro-
2.378 + cedure located the 2D data from the var-
2.379 +ious slices in a single 3D coordinate system. In the final 3D coordinate system, voxels are cubes with 200
2.380 +microns on a side. There are 67x41x58 = 159,326 voxels in the 3D coordinate system, of which 51,533 are in
2.381 +the brain[15].
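In array terms, a toy Python sketch of this voxel grid (only the grid dimensions come from the text above; the mask here is empty, whereas in the real data 51,533 voxels are inside the brain).

import numpy as np

grid = np.zeros((67, 41, 58))                  # one value per voxel for a single gene
print(grid.size)                               # 159326 voxels in the bounding box
brain_mask = np.zeros(grid.shape, dtype=bool)  # in the real data, 51533 entries are True
print(grid[brain_mask].shape)                  # expression restricted to brain voxels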
2.382 Mus musculus is thought to contain about 22,000 protein-coding genes[28]. The ABA contains data on about
2.383 20,000 genes in sagittal sections, out of which over 4,000 genes are also measured in coronal sections. Our
2.384 -_________________________________________
2.385 - 6Outside of isocortex, the number of layers varies.
2.386 dataset is derived from only the coronal subset of the ABA7.
2.387 The ABA is not the only large public spatial gene expression dataset. However, with the exception of the ABA,
2.388 GenePaint, and EMAGE, most of the other resources have not (yet) extracted the expression intensity from the
2.389 @@ -400,6 +397,10 @@
2.390 of analysis is not related to either of our aims, as it neither finds marker genes, nor does it suggest a cortical
2.391 map based on gene expression data. Neither of the other components of AGEA can be applied to cortical
2.392 areas; AGEA’s Gene Finder cannot be used to find marker genes for the cortical areas; and AGEA’s hierarchical
2.393 +_________________________________________
2.394 + 7The sagittal data do not cover the entire cortex, and also have greater registration error[15]. Genes were selected by the Allen
2.395 +Institute for coronal sectioning based on, “classes of known neuroscientific interest... or through post hoc identification of a marked
2.396 +non-ubiquitous expression pattern”[15].
2.397 clustering does not produce clusters corresponding to the cortical areas8.
2.398 In summary, for all three aims, (a) only one of the previous projects explores combinations of marker genes,
2.399 (b) there has been almost no comparison of different algorithms or scoring methods, and (c) there has been no
2.400 @@ -435,28 +436,26 @@
2.401 cortical maps may have come out differently. It is likely that there are many repeated, salient spatial patterns
2.402 in the gene expression which have not yet been captured by any stain. Therefore, cortical anatomy needs to
2.403 incorporate what we can learn from looking at the patterns of gene expression.
2.404 -_________________________________________
2.405 - 7The sagittal data do not cover the entire cortex, and also have greater registration error[15]. Genes were selected by the Allen
2.406 -Institute for coronal sectioning based on, “classes of known neuroscientific interest... or through post hoc identification of a marked
2.407 -non-ubiquitous expression pattern”[15].
2.408 - 8In both cases, the cause is that pairwise correlations between the gene expression of voxels in different areas but the same layer
2.409 -are often stronger than pairwise correlations between the gene expression of voxels in different layers but the same area. Therefore, a
2.410 -pairwise voxel correlation clustering algorithm will tend to create clusters representing cortical layers, not areas.
2.411 While we do not here propose to analyze human gene expression data, it is conceivable that the methods
2.412 we propose to develop could be used to suggest modifications to the human cortical map as well. In fact, the
2.413 methods we will develop will be applicable to other datasets beyond the brain.
2.414 -The approach: Preliminary Studies
2.415 -Format conversion between SEV, MATLAB, NIFTI
2.416 +_______________________________
2.417 + The approach: Preliminary Studies
2.418 + Format conversion between SEV, MATLAB, NIFTI
2.419 We have created software to (politely) download all of the SEV files9 from the Allen Institute website. We have
2.420 also created software to convert between the SEV, MATLAB, and NIFTI file formats, as well as some of Caret’s
2.421 file formats.
2.422 -Flatmap of cortex
2.423 + Flatmap of cortex
2.424 We downloaded the ABA data and applied a mask to select only those voxels which belong to cerebral cortex.
2.425 We divided the cortex into hemispheres. Using Caret[7], we created a mesh representation of the surface of the
2.426 selected voxels. For each gene, and for each node of the mesh, we calculated an average of the gene expression
2.427 of the voxels “underneath” that mesh node. We then flattened the cortex, creating a two-dimensional mesh. We
2.428 sampled the nodes of the irregular, flat mesh in order to create a regular grid of pixel values. We converted this
2.429 grid into a MATLAB matrix. We manually traced the boundaries of each of 49 cortical areas from the ABA coronal
2.430 + 8In both cases, the cause is that pairwise correlations between the gene expression of voxels in different areas but the same layer
2.431 +are often stronger than pairwise correlations between the gene expression of voxels in different layers but the same area. Therefore, a
2.432 +pairwise voxel correlation clustering algorithm will tend to create clusters representing cortical layers, not areas.
2.433 + 9SEV is a sparse format for spatial data. It is the format in which the ABA data is made available.
2.434 reference atlas slides. We then converted these manual traces into Caret-format regional boundary data on the
2.435 mesh surface. We projected the regions onto the 2-d mesh, and then onto the grid, and then we converted the
2.436 region data into MATLAB format.
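One step of this pipeline, resampling per-node values from the irregular flattened mesh onto a regular pixel grid, might be sketched in Python as follows (mesh coordinates and values are synthetic placeholders).

import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
node_xy = rng.random((4000, 2))   # 2-D coordinates of the flattened mesh nodes
node_val = rng.random(4000)       # per-node average expression of one gene

# resample the irregular mesh onto a regular 128 x 128 pixel grid
xs, ys = np.meshgrid(np.linspace(0, 1, 128), np.linspace(0, 1, 128))
pixels = griddata(node_xy, node_val, (xs, ys), method="linear")
print(pixels.shape)               # one (128, 128) image per gene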
2.437 @@ -483,8 +482,6 @@
2.438 We calculated the correlation between each gene and each cortical area. The top row of Figure 2 shows the
2.439 three genes most correlated with area SS.
2.440 Conditional entropy
2.441 -__________________
2.442 - 9SEV is a sparse format for spatial data. It is the format in which the ABA data is made available.
2.443 For each region, we created and ran a forward stepwise procedure which attempted to find pairs of gene
2.444 expression boolean masks such that the conditional entropy of the target area’s boolean mask, conditioned
2.445 upon the pair of gene expression boolean masks, is minimized.
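The quantity being minimized can be written out directly; here is a Python sketch of the conditional entropy of the area mask given a pair of boolean gene masks (synthetic data).

import numpy as np

def conditional_entropy(area, g1, g2):
    # H(area | g1, g2), with all three arguments boolean masks over the voxels
    h, n = 0.0, area.size
    for a in (False, True):
        for b in (False, True):
            cell = (g1 == a) & (g2 == b)
            p_cell = cell.sum() / n
            if p_cell == 0:
                continue
            p_area = area[cell].mean()        # P(in area | this combination)
            for p in (p_area, 1 - p_area):
                if p > 0:
                    h -= p_cell * p * np.log2(p)
    return h

rng = np.random.default_rng(0)
area = rng.random(10000) > 0.8    # boolean mask of the target area
g1 = rng.random(10000) > 0.5      # thresholded expression of gene 1
g2 = rng.random(10000) > 0.5      # thresholded expression of gene 2
print(conditional_entropy(area, g1, g2))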
2.446 @@ -555,6 +552,8 @@
2.447 Embedding, Fast Maximum Variance Unfolding, Non-negative Matrix Factorization (NNMF). Space constraints
2.448 prevent us from showing many of the results, but as a sample, PCA, NNMF, and landmark Isomap are shown in
2.449 the first, second, and third rows of Figure 6.
2.450 +_
2.451 + 105-fold cross-validation.
2.452 After applying the dimensionality reduction, we ran clustering algorithms on the reduced data. To date we
2.453 have tried k-means and spectral clustering. The results of k-means after PCA, NNMF, and landmark Isomap are
2.454 shown in the last row of Figure 6. To compare, the leftmost picture on the bottom row of Figure 6 shows some
2.455 @@ -575,8 +574,6 @@
2.456 like the cortex another strategy is to group together voxels in the same cortical layer; each surface pixel would
2.457 then be associated with one expression level per gene per layer. We will develop a segmentation algorithm to
2.458 automatically identify the layer boundaries.
2.459 -__
2.460 - 105-fold cross-validation.
2.461 Develop algorithms that find genetic markers for anatomical regions
2.462 Scoring measures and feature selection We will develop scoring methods for evaluating how good individual
2.463 genes are at marking areas. We will compare pointwise, geometric, and information-theoretic measures. We
2.464 @@ -622,13 +619,6 @@
2.465 sifier can be combined with a stepwise wrapper for use as a feature selection method. We will explore logistic
2.466 regression (including spatial models[16]), decision trees12, sparse SVMs, generative mixture models (including
2.467 naive bayes), kernel density estimation, instance-based learning methods (such as k-nearest neighbor), genetic
2.468 -_________________________________________
2.469 - 11Not just any redrawing is acceptable, only those which appear to be justified as a natural spatial domain of gene expression by
2.470 -multiple sources of evidence. Interestingly, the need to detect “natural spatial domains of gene expression” in a data-driven fashion
2.471 -means that the methods of Aim 2 might be useful in achieving Aim 1, as well – particularly discriminative dimensionality reduction.
2.472 - 12Actually, we have already begun to explore decision trees. For each cortical area, we have used the C4.5 algorithm to find a decision
2.473 -tree for that area. We achieved good classification accuracy on our training set, but the number of genes that appeared in each tree was
2.474 -too large. We plan to implement a pruning procedure to generate trees that use fewer genes.
2.475 algorithms, and artificial neural networks.
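As an example of one entry in this list, a per-area decision tree could be sketched as follows (Python; scikit-learn's CART-style tree stands in for the C4.5 algorithm mentioned in the footnote, and the depth limit is one crude way to keep the number of genes small).

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
expr = rng.random((3000, 400))      # toy data: 3000 pixels x 400 genes
in_area = rng.random(3000) > 0.9    # membership in one cortical area

# shallow tree: a crude way of limiting how many genes appear in the tree
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(expr, in_area)
genes_used = np.flatnonzero(tree.feature_importances_ > 0)
print("genes appearing in the tree:", genes_used)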
2.476 Develop algorithms to suggest a division of a structure into anatomical parts
2.477 Dimensionality reduction on gene expression profiles We have already described the application of ten
2.478 @@ -657,6 +647,13 @@
2.479 help or hurt the ultimate goal of identifying interesting spatial regions.
2.480 Co-clustering There are some algorithms which simultaneously incorporate clustering on instances and on
2.481 features (in our case, genes and pixels), for example, IRM[11]. These are called co-clustering or biclustering
2.482 +_________________________________________
2.483 + 11Not just any redrawing is acceptable, only those which appear to be justified as a natural spatial domain of gene expression by
2.484 +multiple sources of evidence. Interestingly, the need to detect “natural spatial domains of gene expression” in a data-driven fashion
2.485 +means that the methods of Aim 2 might be useful in achieving Aim 1, as well – particularly discriminative dimensionality reduction.
2.486 + 12Actually, we have already begun to explore decision trees. For each cortical area, we have used the C4.5 algorithm to find a decision
2.487 +tree for that area. We achieved good classification accuracy on our training set, but the number of genes that appeared in each tree was
2.488 +too large. We plan to implement a pruning procedure to generate trees that use fewer genes.
2.489 algorithms.
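A Python sketch of co-clustering pixels and genes; spectral co-clustering is used here only because it is readily available, not because it is the algorithm (such as IRM) that would ultimately be chosen.

import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)
expr = rng.random((1000, 200)) + 0.01   # toy data: pixels x genes, strictly positive

model = SpectralCoclustering(n_clusters=6, random_state=0).fit(expr)
pixel_cluster = model.row_labels_       # one cluster label per pixel
gene_cluster = model.column_labels_     # one cluster label per gene
print(np.bincount(pixel_cluster), np.bincount(gene_cluster))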
2.490 Radial profiles We will explore the use of the radial profile of gene expression under each pixel.
2.491 Compare different methods In order to tell which method is best for genomic anatomy, for each experimental
2.492 @@ -682,13 +679,13 @@
2.493 and explain how the statistical structure in the gene expression data led to any unexpected or interesting features
2.494 of these maps, and we will provide biological hypotheses to interpret any new cortical areas, or groupings of
2.495 areas, which are discovered.
2.496 -Timeline and milestones
2.497 +____________________________________________________________________________
2.498 + Timeline and milestones
2.499 Finding marker genes
2.500 September-November 2009: Develop an automated mechanism for segmenting the cortical voxels into layers
2.501 November 2009 (milestone): Have completed construction of a flatmapped, cortical dataset with information
2.502 for each layer
2.503 -October 2009-April 2010: Develop scoring methods, dimensionality reduction, and supervised learning meth-
2.504 -ods.
2.505 +October 2009-April 2010: Develop scoring and supervised learning methods.
2.506 January 2010 (milestone): Submit a publication on single marker genes for cortical areas
2.507 February-July 2010: Continue to develop scoring methods and supervised learning frameworks. Extend tech-
2.508 niques for robustness. Compare the performance of techniques. Validate marker genes. Prepare software
2.509 @@ -696,13 +693,12 @@
2.510 June 2010 (milestone): Submit a paper describing a method fulfilling Aim 1. Release toolbox.
2.511 July 2010 (milestone): Submit a paper describing combinations of marker genes for each cortical area, and a
2.512 small number of marker genes that can, in combination, define most of the areas at once
2.513 -Revealing new ways to parcellate a structure into regions
2.514 -June 2010-March 2011: Explore dimensionality reduction algorithms for Aim 2. Explore clustering algorithms.
2.515 -Adapt clustering algorithms to use radial profile information. Compare the performance of techniques.
2.516 + Revealing new ways to parcellate a structure into regions
2.517 +June 2010-March 2011: Explore dimensionality reduction algorithms. Explore clustering algorithms. Adapt
2.518 +clustering algorithms to use radial profile information. Compare the performance of techniques.
2.519 March 2011 (milestone): Submit a paper describing a method fulfilling Aim 2. Release toolbox.
2.520 -February-May 2011: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If
2.521 -new ways of organizing the cortex into areas are discovered, interpret the results. Prepare software toolbox for
2.522 -Aim 2.
2.523 +February-May 2011: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex,
2.524 +and interpret the results. Prepare software toolbox for Aim 2.
2.525 May 2011 (milestone): Submit a paper on the genomic anatomy of the cortex, using the methods developed in
2.526 Aim 2
2.527 May-August 2011: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1.
3.1 Binary file grant.odt has changed
4.1 Binary file grant.pdf has changed
5.1 --- a/grant.txt Wed Apr 22 06:45:17 2009 -0700
5.2 +++ b/grant.txt Wed Apr 22 07:06:46 2009 -0700
5.3 @@ -1,5 +1,6 @@
5.4 \documentclass[11pt]{nih-blank}
5.5
5.6 +\usepackage[small,compact]{titlesec}
5.7
5.8 %%\piname{Stevens, Charles F.}
5.9
5.10 @@ -48,6 +49,7 @@
5.11
5.12 This proposal addresses challenge topic 06-HG-101. Massive new datasets obtained with techniques such as in situ hybridization (ISH), immunohistochemistry, in situ transgenic reporter, microarray voxelation, and others, allow the expression levels of many genes at many locations to be compared. Our goal is to develop automated methods to relate spatial variation in gene expression to anatomy. We want to find marker genes for specific anatomical regions, and also to draw new anatomical maps based on gene expression patterns.
5.13
5.14 +\vspace{0.3cm}\hrule
5.15 == The Challenge and Potential impact ==
5.16
5.17 Each of our three aims will be discussed in turn. For each aim, we will develop a conceptual framework for thinking about the task, and we will present our strategy for solving it. Next we will discuss related work. At the conclusion of each section, we will summarize why our strategy is different from what has been done before. At the end of this section, we will describe the potential impact.
5.18 @@ -140,15 +142,14 @@
5.19
5.20 As noted above, there has been much work on supervised learning, and there are many available algorithms. However, the algorithms require the scientist to provide a framework for representing the problem domain, and the way that this framework is set up has a large impact on performance. Creating a good framework can require creatively reconceptualizing the problem domain, and is not merely a mechanical "fine-tuning" of numerical parameters. For example, we believe that domain-specific scoring measures (such as gradient similarity, which is discussed in Preliminary Studies) may be necessary in order to achieve the best results in this application.
5.21
5.22 -We are aware of six existing efforts to find marker genes using spatial gene expression data using automated methods.
5.23 +We now turn to efforts to find marker genes from spatial gene expression data using automated methods.
5.24
5.25 %%GeneAtlas\cite{carson_digital_2005} allows the user to construct a search query by freely demarcating one or two 2-D regions on sagittal slices, and then to specify either the strength of expression or the name of another gene whose expression pattern is to be matched.
5.26
5.27 -\cite{lee_high-resolution_2007} mentions the possibility of constructing a spatial region for each gene, and then, for each anatomical structure of interest, computing what proportion of this structure is covered by the gene's spatial region.
5.28 -
5.29 -GeneAtlas\cite{carson_digital_2005} and EMAGE \cite{venkataraman_emage_2008} allow the user to construct a search query by demarcating regions and then specifying either the strength of expression or the name of another gene or dataset whose expression pattern is to be matched. Neither GeneAtlas nor EMAGE allow one to search for combinations of genes that define a region in concert but not separately.
5.30 -
5.31 %% \footnote{For the similiarity score (match score) between two images (in this case, the query and the gene expression images), GeneAtlas uses the sum of a weighted L1-norm distance between vectors whose components represent the number of cells within a pixel (actually, many of these projects use quadrilaterals instead of square pixels; but we will refer to them as pixels for simplicity) whose expression is within four discretization levels. EMAGE uses Jaccard similarity (the number of true pixels in the intersection of the two images, divided by the number of pixels in their union).}
5.32 +%% \cite{lee_high-resolution_2007} mentions the possibility of constructing a spatial region for each gene, and then, for each anatomical structure of interest, computing what proportion of this structure is covered by the gene's spatial region.
5.33 +
5.34 +GeneAtlas\cite{carson_digital_2005} and EMAGE \cite{venkataraman_emage_2008} allow the user to construct a search query by demarcating regions and then specifying either the strength of expression or the name of another gene or dataset whose expression pattern is to be matched. Neither GeneAtlas nor EMAGE allow one to search for combinations of genes that define a region in concert but not separately.
5.35
5.36 \cite{ng_anatomic_2009} describes AGEA, "Anatomic Gene Expression
5.37 Atlas". AGEA has three
5.38 @@ -156,16 +157,11 @@
5.39 cluster which includes the seed voxel, (2) yields a list of genes
5.40 which are overexpressed in that cluster. **Correlation**: The user selects a seed voxel and the system
5.41 then shows the user how much correlation there is between the gene
5.42 -expression profile of the seed voxel and every other voxel. **Clusters**: will be described later
5.43 -
5.44 -\cite{chin_genome-scale_2007} looks at the mean expression level of genes within anatomical regions, and applies a Student's t-test with Bonferroni correction to determine whether the mean expression level of a gene is significantly higher in the target region.
5.45 -
5.46 -\cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} differ from our Aim 1 in at least three ways. First, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} find only single genes, whereas we will also look for combinations of genes. Second, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} can only use overexpression as a marker, whereas we will also search for underexpression. Third, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} use scores based on pointwise expression levels, whereas we will also use geometric scores such as gradient similarity (described in Preliminary Studies). Figures \ref{MOcombo}, \ref{hole}, and \ref{AUDgeometry} in the Preliminary Studies section contain evidence that each of our three choices is the right one.
5.47 -
5.48 -
5.49 +expression profile of the seed voxel and every other voxel. **Clusters**: will be described later. \cite{chin_genome-scale_2007} looks at the mean expression level of genes within anatomical regions, and applies a Student's t-test with Bonferroni correction to determine whether the mean expression level of a gene is significantly higher in the target region. \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} differ from our Aim 1 in at least three ways. First, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} find only single genes, whereas we will also look for combinations of genes. Second, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} can only use overexpression as a marker, whereas we will also search for underexpression. Third, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} use scores based on pointwise expression levels, whereas we will also use geometric scores such as gradient similarity (described in Preliminary Studies). Figures \ref{MOcombo}, \ref{hole}, and \ref{AUDgeometry} in the Preliminary Studies section contain evidence that each of our three choices is the right one.
5.50
5.51 \cite{hemert_matching_2008} describes a technique to find combinations of marker genes to pick out an anatomical region. They use an evolutionary algorithm to evolve logical operators which combine boolean (thresholded) images in order to match a target image. %%Their match score is Jaccard similarity.
5.52
5.53 +
5.54 In summary, there has been fruitful work on finding marker genes, but only one of the previous projects explores combinations of marker genes, and none of these publications compare the results obtained by using different algorithms or scoring methods.
5.55
5.56
5.57 @@ -173,6 +169,23 @@
5.58
5.59 === Aim 2: From gene expression data, discover a map of regions ===
5.60
5.61 +
5.62 +
5.63 +\vspace{0.3cm}**Machine learning terminology: clustering**
5.64 +
5.65 +If one is given a dataset consisting merely of instances, with no class labels, then analysis of the dataset is referred to as __unsupervised learning__ in the jargon of machine learning. One thing that you can do with such a dataset is to group instances together. A set of similar instances is called a __cluster__, and the activity of grouping the data into clusters is called __clustering__ or __cluster analysis__.
5.66 +
5.67 +The task of deciding how to carve up a structure into anatomical regions can be put into these terms. The instances are once again voxels (or pixels) along with their associated gene expression profiles. We make the assumption that voxels from the same anatomical region have similar gene expression profiles, at least compared to the other regions. This means that clustering voxels is the same as finding potential regions; we seek a partitioning of the voxels into regions, that is, into clusters of voxels with similar gene expression.
5.68 +
5.69 +%%It is desirable to determine not just one set of regions, but also how these regions relate to each other, if at all; perhaps some of the regions are more similar to each other than to the rest, suggesting that, although at a fine spatial scale they could be considered separate, on a coarser spatial scale they could be grouped together into one large region. This suggests the outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
5.70 +
5.71 +It is desirable to determine not just one set of regions, but also how these regions relate to each other. The outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
5.72 +
5.73 +
5.74 +\vspace{0.3cm}**Similarity scores**
5.75 +A crucial choice when designing a clustering method is how to measure similarity, across either pairs of instances, or clusters, or both. There is much overlap between scoring methods for feature selection (discussed above under Aim 1) and scoring methods for similarity.
5.76 +
5.77 +
5.78 \begin{wrapfigure}{L}{0.35\textwidth}\centering
5.79 \includegraphics[scale=.27]{MO_vs_Wwc1_jet.eps}\includegraphics[scale=.27]{MO_vs_Mtif2_jet.eps}
5.80
5.81 @@ -180,23 +193,6 @@
5.82 \caption{Upper left: $wwc1$. Upper right: $mtif2$. Lower left: wwc1 + mtif2 (each pixel's value on the lower left is the sum of the corresponding pixels in the upper row).}
5.83 \label{MOcombo}\end{wrapfigure}
5.84
5.85 -
5.86 -
5.87 -\vspace{0.3cm}**Machine learning terminology: clustering**
5.88 -
5.89 -If one is given a dataset consisting merely of instances, with no class labels, then analysis of the dataset is referred to as __unsupervised learning__ in the jargon of machine learning. One thing that you can do with such a dataset is to group instances together. A set of similar instances is called a __cluster__, and the activity of finding grouping the data into clusters is called __clustering__ or __cluster analysis__.
5.90 -
5.91 -The task of deciding how to carve up a structure into anatomical regions can be put into these terms. The instances are once again voxels (or pixels) along with their associated gene expression profiles. We make the assumption that voxels from the same anatomical region have similar gene expression profiles, at least compared to the other regions. This means that clustering voxels is the same as finding potential regions; we seek a partitioning of the voxels into regions, that is, into clusters of voxels with similar gene expression.
5.92 -
5.93 -%%It is desirable to determine not just one set of regions, but also how these regions relate to each other, if at all; perhaps some of the regions are more similar to each other than to the rest, suggesting that, although at a fine spatial scale they could be considered separate, on a coarser spatial scale they could be grouped together into one large region. This suggests the outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
5.94 -
5.95 -It is desirable to determine not just one set of regions, but also how these regions relate to each other. The outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
5.96 -
5.97 -
5.98 -\vspace{0.3cm}**Similarity scores**
5.99 -A crucial choice when designing a clustering method is how to measure similarity, across either pairs of instances, or clusters, or both. There is much overlap between scoring methods for feature selection (discussed above under Aim 1) and scoring methods for similarity.
5.100 -
5.101 -
5.102 \vspace{0.3cm}**Spatially contiguous clusters; image segmentation**
5.103 We have shown that aim 2 is a type of clustering task. In fact, it is a special type of clustering task because we have an additional constraint on clusters; voxels grouped together into a cluster must be spatially contiguous. In Preliminary Studies, we show that one can get reasonable results without enforcing this constraint; however, we plan to compare these results against other methods which guarantee contiguous clusters.
5.104
5.105 @@ -347,6 +343,7 @@
5.106
5.107
5.108
5.109 +\vspace{0.3cm}\hrule
5.110
5.111 == The approach: Preliminary Studies ==
5.112
5.113 @@ -596,22 +593,23 @@
5.114 %%\vspace{0.3cm}**Extension to probabalistic maps**
5.115 %%Presently, we do not have a probabalistic atlas which is registered to the ABA space. However, in anticipation of the availability of such maps, we would like to explore extensions to our Aim 1 techniques which can handle probabalistic maps.
5.116
5.117 +\vspace{0.3cm}\hrule
5.118
5.119 == Timeline and milestones ==
5.120
5.121 \vspace{0.3cm}**Finding marker genes**
5.122 \\ **September-November 2009**: Develop an automated mechanism for segmenting the cortical voxels into layers
5.123 \\ **November 2009 (milestone)**: Have completed construction of a flatmapped, cortical dataset with information for each layer
5.124 -\\ **October 2009-April 2010**: Develop scoring methods, dimensionality reduction, and supervised learning methods.
5.125 +\\ **October 2009-April 2010**: Develop scoring and supervised learning methods.
5.126 \\ **January 2010 (milestone)**: Submit a publication on single marker genes for cortical areas
5.127 \\ **February-July 2010**: Continue to develop scoring methods and supervised learning frameworks. Extend techniques for robustness. Compare the performance of techniques. Validate marker genes. Prepare software toolbox for Aim 1.
5.128 \\ **June 2010 (milestone)**: Submit a paper describing a method fulfilling Aim 1. Release toolbox.
5.129 \\ **July 2010 (milestone)**: Submit a paper describing combinations of marker genes for each cortical area, and a small number of marker genes that can, in combination, define most of the areas at once
5.130
5.131 \vspace{0.3cm}**Revealing new ways to parcellate a structure into regions**
5.132 -\\ **June 2010-March 2011**: Explore dimensionality reduction algorithms for Aim 2. Explore clustering algorithms. Adapt clustering algorithms to use radial profile information. Compare the performance of techniques.
5.133 +\\ **June 2010-March 2011**: Explore dimensionality reduction algorithms. Explore clustering algorithms. Adapt clustering algorithms to use radial profile information. Compare the performance of techniques.
5.134 \\ **March 2011 (milestone)**: Submit a paper describing a method fulfilling Aim 2. Release toolbox.
5.135 -\\ **February-May 2011**: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If new ways of organizing the cortex into areas are discovered, interpret the results. Prepare software toolbox for Aim 2.
5.136 +\\ **February-May 2011**: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex, and interpret the results. Prepare software toolbox for Aim 2.
5.137 \\ **May 2011 (milestone)**: Submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
5.138 \\ **May-August 2011**: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1. Possibly submit another paper.
5.139
6.1 --- a/nih-blank.cls Wed Apr 22 06:45:17 2009 -0700
6.2 +++ b/nih-blank.cls Wed Apr 22 07:06:46 2009 -0700
6.3 @@ -48,7 +48,8 @@
6.4 %% changed by bayle shanks: use .5 inch, not .49
6.5
6.6 % 0.5 inch top
6.7 -\RequirePackage[letterpaper,left=0.5in,top=0.5in,bottom=0.575in,right=0.55in,nohead,nofoot]{geometry}
6.8 +%\RequirePackage[letterpaper,left=0.5in,top=0.5in,bottom=0.575in,right=0.55in,nohead,nofoot]{geometry}
6.9 +\RequirePackage[letterpaper,left=0.5in,top=0.5in,bottom=0.52in,right=0.55in,nohead,nofoot]{geometry}
6.10
6.11 % 0.49 inch top
6.12 %\RequirePackage[letterpaper,left=0.5in,top=0.49in,bottom=0.575in,right=0.55in,nohead,nofoot]{geometry}
6.13 @@ -72,8 +73,12 @@
6.14 %%%% More code
6.15 % preamble stuff
6.16
6.17 +
6.18 +%% changed by bayle shanks
6.19 +
6.20 \renewcommand{\headrulewidth}{0pt}
6.21 -\renewcommand{\footrulewidth}{0.75pt}
6.22 +%\renewcommand{\footrulewidth}{0.75pt}
6.23 +\renewcommand{\footrulewidth}{0pt}
6.24
6.25 %%%% Changed by M A Lewis, Ph.D. (mal11 at alumni.cwru.edu)
6.26 %%%% Simplify page layout by using geometry package above.
6.27 @@ -90,7 +95,8 @@
6.28 %\renewcommand{\baselinestretch}{.9}
6.29 %\headwidth=\textwidth
6.30
6.31 -\addtolength{\headheight}{2.5pt}
6.32 +%\addtolength{\headheight}{2.5pt}
6.33 +\addtolength{\headheight}{0.5pt}
6.34
6.35 % rename the bibliography section
6.36 %\AtBeginDocument{\renewcommand{\refname}{Literature~Cited}}