diff grant.txt @ 102:4cca7c7d91d1
| author   | bshanks@bshanks.dyndns.org     |
|----------|--------------------------------|
| date     | Wed Apr 22 07:09:37 2009 -0700 |
| parents  | fa7c0a924e7a                   |
| children | 6ea7e2e5e6c3                   |
line diff
1.1 --- a/grant.txt Wed Apr 22 06:45:17 2009 -0700
1.2 +++ b/grant.txt Wed Apr 22 07:09:37 2009 -0700
1.3 @@ -1,5 +1,6 @@
1.4 \documentclass[11pt]{nih-blank}
1.5
1.6 +\usepackage[small,compact]{titlesec}
1.7
1.8 %%\piname{Stevens, Charles F.}
1.9
1.10 @@ -48,6 +49,7 @@
1.11
1.12 This proposal addresses challenge topic 06-HG-101. Massive new datasets obtained with techniques such as in situ hybridization (ISH), immunohistochemistry, in situ transgenic reporter, microarray voxelation, and others, allow the expression levels of many genes at many locations to be compared. Our goal is to develop automated methods to relate spatial variation in gene expression to anatomy. We want to find marker genes for specific anatomical regions, and also to draw new anatomical maps based on gene expression patterns.
1.13
1.14 +\vspace{0.3cm}\hrule
1.15 == The Challenge and Potential Impact ==
1.16
1.17 Each of our three aims will be discussed in turn. For each aim, we will develop a conceptual framework for thinking about the task, and we will present our strategy for solving it. Next we will discuss related work. At the conclusion of each section, we will summarize why our strategy is different from what has been done before. At the end of this section, we will describe the potential impact.
1.18 @@ -140,15 +142,14 @@
1.19
1.20 As noted above, there has been much work on supervised learning, and many algorithms are available. However, the algorithms require the scientist to provide a framework for representing the problem domain, and the way that this framework is set up has a large impact on performance. Creating a good framework can require creatively reconceptualizing the problem domain, and is not merely a mechanical "fine-tuning" of numerical parameters. For example, we believe that domain-specific scoring measures (such as gradient similarity, which is discussed in Preliminary Studies) may be necessary in order to achieve the best results in this application.
1.21
1.22 -We are aware of six existing efforts to find marker genes using spatial gene expression data using automated methods.
1.23 +We now turn to automated efforts to find marker genes from spatial gene expression data.
1.24
1.25 %%GeneAtlas\cite{carson_digital_2005} allows the user to construct a search query by freely demarcating one or two 2-D regions on sagittal slices, and then to specify either the strength of expression or the name of another gene whose expression pattern is to be matched.
1.26
1.27 -\cite{lee_high-resolution_2007} mentions the possibility of constructing a spatial region for each gene, and then, for each anatomical structure of interest, computing what proportion of this structure is covered by the gene's spatial region.
1.28 -
1.29 -GeneAtlas\cite{carson_digital_2005} and EMAGE \cite{venkataraman_emage_2008} allow the user to construct a search query by demarcating regions and then specifying either the strength of expression or the name of another gene or dataset whose expression pattern is to be matched. Neither GeneAtlas nor EMAGE allow one to search for combinations of genes that define a region in concert but not separately.
1.30 -
1.31 %% \footnote{For the similarity score (match score) between two images (in this case, the query and the gene expression images), GeneAtlas uses the sum of a weighted L1-norm distance between vectors whose components represent the number of cells within a pixel (actually, many of these projects use quadrilaterals instead of square pixels; but we will refer to them as pixels for simplicity) whose expression is within four discretization levels. EMAGE uses Jaccard similarity (the number of true pixels in the intersection of the two images, divided by the number of pixels in their union).}
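For concreteness, EMAGE's Jaccard match score on boolean (thresholded) images can be sketched in a few lines. This is a minimal illustration with toy 3x3 masks, not code from EMAGE itself:

```python
import numpy as np

def jaccard(a, b):
    """Jaccard similarity of two boolean images: |A & B| / |A | B|."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both images empty; treat as perfectly similar
    return np.logical_and(a, b).sum() / union

# two toy "thresholded expression" masks
a = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]], dtype=bool)
b = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]], dtype=bool)
score = jaccard(a, b)  # intersection = 2 pixels, union = 4 pixels -> 0.5
```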
1.32 +%% \cite{lee_high-resolution_2007} mentions the possibility of constructing a spatial region for each gene, and then, for each anatomical structure of interest, computing what proportion of this structure is covered by the gene's spatial region.
1.33 +
1.34 +GeneAtlas\cite{carson_digital_2005} and EMAGE \cite{venkataraman_emage_2008} allow the user to construct a search query by demarcating regions and then specifying either the strength of expression or the name of another gene or dataset whose expression pattern is to be matched. Neither GeneAtlas nor EMAGE allow one to search for combinations of genes that define a region in concert but not separately.
1.35
1.36 \cite{ng_anatomic_2009} describes AGEA, "Anatomic Gene Expression
1.37 Atlas". AGEA has three
1.38 @@ -156,16 +157,11 @@
1.39 cluster which includes the seed voxel, (2) yields a list of genes
1.40 which are overexpressed in that cluster. **Correlation**: The user selects a seed voxel and the system
1.41 then shows the user how much correlation there is between the gene
1.42 -expression profile of the seed voxel and every other voxel. **Clusters**: will be described later
1.43 -
1.44 -\cite{chin_genome-scale_2007} looks at the mean expression level of genes within anatomical regions, and applies a Student's t-test with Bonferroni correction to determine whether the mean expression level of a gene is significantly higher in the target region.
1.45 -
1.46 -\cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} differ from our Aim 1 in at least three ways. First, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} find only single genes, whereas we will also look for combinations of genes. Second, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} can only use overexpression as a marker, whereas we will also search for underexpression. Third, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} use scores based on pointwise expression levels, whereas we will also use geometric scores such as gradient similarity (described in Preliminary Studies). Figures \ref{MOcombo}, \ref{hole}, and \ref{AUDgeometry} in the Preliminary Studies section contain evidence that each of our three choices is the right one.
1.47 -
1.48 -
1.49 +expression profile of the seed voxel and every other voxel. **Clusters**: this will be described later. \cite{chin_genome-scale_2007} looks at the mean expression level of genes within anatomical regions, and applies a Student's t-test with Bonferroni correction to determine whether the mean expression level of a gene is significantly higher in the target region. \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} differ from our Aim 1 in at least three ways. First, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} find only single genes, whereas we will also look for combinations of genes. Second, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} can only use overexpression as a marker, whereas we will also search for underexpression. Third, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} use scores based on pointwise expression levels, whereas we will also use geometric scores such as gradient similarity (described in Preliminary Studies). Figures \ref{MOcombo}, \ref{hole}, and \ref{AUDgeometry} in the Preliminary Studies section contain evidence that each of our three choices is the right one.
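The t-test-with-Bonferroni procedure described above can be sketched as follows. The expression matrix, region mask, and the helper `region_markers` are hypothetical toy constructions for illustration, not code or data from \cite{chin_genome-scale_2007}:

```python
import numpy as np
from scipy import stats

def region_markers(expr, in_region, alpha=0.05):
    """expr: (n_genes, n_voxels) expression matrix; in_region: boolean voxel mask.
    Return indices of genes whose mean expression inside the region is
    significantly higher than outside, Bonferroni-corrected across genes."""
    n_genes = expr.shape[0]
    hits = []
    for g in range(n_genes):
        inside, outside = expr[g, in_region], expr[g, ~in_region]
        t, p = stats.ttest_ind(inside, outside)  # Student's t-test
        # one-sided test for overexpression: require t > 0, halve the p-value,
        # and divide alpha by the number of genes (Bonferroni)
        if t > 0 and (p / 2) < alpha / n_genes:
            hits.append(g)
    return hits

# toy data: 2 genes x 20 voxels; the first 10 voxels lie inside the region
in_region = np.zeros(20, dtype=bool)
in_region[:10] = True
gene0 = np.concatenate([np.array([5.0, 6.0] * 5), np.array([0.0, 1.0] * 5)])  # elevated inside
gene1 = np.concatenate([np.arange(10.0), np.arange(10.0)])                    # no difference
expr = np.vstack([gene0, gene1])
markers = region_markers(expr, in_region)  # -> [0]
```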
1.50
1.51 \cite{hemert_matching_2008} describes a technique to find combinations of marker genes to pick out an anatomical region. They use an evolutionary algorithm to evolve logical operators which combine boolean (thresholded) images in order to match a target image. %%Their match score is Jaccard similarity.
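A minimal illustration of why combinations matter (cf. the limitation noted above that GeneAtlas and EMAGE cannot search for genes that define a region in concert but not separately): two hypothetical thresholded gene masks that each match a target region poorly can match it exactly in conjunction. The masks and the Jaccard scorer are toy examples, not data or code from \cite{hemert_matching_2008}:

```python
import numpy as np

def jaccard(a, b):
    """Jaccard similarity of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

# a hypothetical 1-D row of 6 voxels
target = np.array([0, 0, 1, 1, 0, 0], dtype=bool)  # region to match
geneA  = np.array([0, 0, 1, 1, 1, 1], dtype=bool)  # overshoots to the right
geneB  = np.array([1, 1, 1, 1, 0, 0], dtype=bool)  # overshoots to the left
combo  = np.logical_and(geneA, geneB)              # logical AND of the two masks

scores = {"A": jaccard(geneA, target),        # 0.5
          "B": jaccard(geneB, target),        # 0.5
          "A AND B": jaccard(combo, target)}  # 1.0: perfect only in combination
```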
1.52
1.53 +
1.54 In summary, there has been fruitful work on finding marker genes, but only one of the previous projects explores combinations of marker genes, and none of these publications compare the results obtained by using different algorithms or scoring methods.
1.55
1.56
1.57 @@ -173,6 +169,23 @@
1.58
1.59 === Aim 2: From gene expression data, discover a map of regions ===
1.60
1.61 +
1.62 +
1.63 +\vspace{0.3cm}**Machine learning terminology: clustering**
1.64 +
1.65 +If one is given a dataset consisting merely of instances, with no class labels, then analysis of the dataset is referred to as __unsupervised learning__ in the jargon of machine learning. One thing that you can do with such a dataset is to group instances together. A set of similar instances is called a __cluster__, and the activity of grouping the data into clusters is called __clustering__ or __cluster analysis__.
1.66 +
1.67 +The task of deciding how to carve up a structure into anatomical regions can be put into these terms. The instances are once again voxels (or pixels) along with their associated gene expression profiles. We make the assumption that voxels from the same anatomical region have similar gene expression profiles, at least compared to the other regions. This means that clustering voxels is the same as finding potential regions; we seek a partitioning of the voxels into regions, that is, into clusters of voxels with similar gene expression.
1.68 +
1.69 +%%It is desirable to determine not just one set of regions, but also how these regions relate to each other, if at all; perhaps some of the regions are more similar to each other than to the rest, suggesting that, although at a fine spatial scale they could be considered separate, on a coarser spatial scale they could be grouped together into one large region. This suggests the outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
1.70 +
1.71 +It is desirable to determine not just one set of regions, but also how these regions relate to each other. The outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
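A hierarchical clustering of voxels by expression profile can be sketched with SciPy. The 6x3 voxel-by-gene matrix below is made-up data with two obvious "regions", and average linkage with Euclidean distance is an illustrative choice, not the proposal's method:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# hypothetical data: 6 voxels x 3 genes, two underlying regions
voxels = np.array([[1.0, 0.1, 0.0],
                   [0.9, 0.2, 0.1],
                   [1.1, 0.0, 0.1],
                   [0.0, 1.0, 0.9],
                   [0.1, 0.9, 1.0],
                   [0.0, 1.1, 1.1]])

# agglomerative clustering: build the full tree, then cut it into 2 clusters
tree = linkage(voxels, method="average", metric="euclidean")
labels = fcluster(tree, t=2, criterion="maxclust")
# voxels 0-2 share one label and voxels 3-5 share another
```

Cutting the same tree at different heights yields coarser or finer parcellations, which is exactly the fine-scale/coarse-scale trade-off described above.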
1.72 +
1.73 +
1.74 +\vspace{0.3cm}**Similarity scores**
1.75 +A crucial choice when designing a clustering method is how to measure similarity, whether between pairs of instances, between clusters, or both. There is much overlap between scoring methods for feature selection (discussed above under Aim 1) and scoring methods for similarity.
1.76 +
1.77 +
1.78 \begin{wrapfigure}{L}{0.35\textwidth}\centering
1.79 \includegraphics[scale=.27]{MO_vs_Wwc1_jet.eps}\includegraphics[scale=.27]{MO_vs_Mtif2_jet.eps}
1.80
1.81 @@ -180,23 +193,6 @@
1.82 \caption{Upper left: $wwc1$. Upper right: $mtif2$. Lower left: wwc1 + mtif2 (each pixel's value on the lower left is the sum of the corresponding pixels in the upper row).}
1.83 \label{MOcombo}\end{wrapfigure}
1.84
1.85 -
1.86 -
1.87 -\vspace{0.3cm}**Machine learning terminology: clustering**
1.88 -
1.89 -If one is given a dataset consisting merely of instances, with no class labels, then analysis of the dataset is referred to as __unsupervised learning__ in the jargon of machine learning. One thing that you can do with such a dataset is to group instances together. A set of similar instances is called a __cluster__, and the activity of finding grouping the data into clusters is called __clustering__ or __cluster analysis__.
1.90 -
1.91 -The task of deciding how to carve up a structure into anatomical regions can be put into these terms. The instances are once again voxels (or pixels) along with their associated gene expression profiles. We make the assumption that voxels from the same anatomical region have similar gene expression profiles, at least compared to the other regions. This means that clustering voxels is the same as finding potential regions; we seek a partitioning of the voxels into regions, that is, into clusters of voxels with similar gene expression.
1.92 -
1.93 -%%It is desirable to determine not just one set of regions, but also how these regions relate to each other, if at all; perhaps some of the regions are more similar to each other than to the rest, suggesting that, although at a fine spatial scale they could be considered separate, on a coarser spatial scale they could be grouped together into one large region. This suggests the outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
1.94 -
1.95 -It is desirable to determine not just one set of regions, but also how these regions relate to each other. The outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
1.96 -
1.97 -
1.98 -\vspace{0.3cm}**Similarity scores**
1.99 -A crucial choice when designing a clustering method is how to measure similarity, across either pairs of instances, or clusters, or both. There is much overlap between scoring methods for feature selection (discussed above under Aim 1) and scoring methods for similarity.
1.100 -
1.101 -
1.102 \vspace{0.3cm}**Spatially contiguous clusters; image segmentation**
1.103 We have shown that Aim 2 is a type of clustering task. In fact, it is a special type of clustering task because we have an additional constraint on clusters: voxels grouped together into a cluster must be spatially contiguous. In Preliminary Studies, we show that one can get reasonable results without enforcing this constraint; however, we plan to compare these results against other methods which guarantee contiguous clusters.
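One simple way to recover contiguity after the fact is to split each cluster into its spatially connected components. This sketch uses `scipy.ndimage.label` on a toy 2-D mask and is only one of several possible approaches, not the method proposed here:

```python
import numpy as np
from scipy import ndimage

def split_into_contiguous(mask):
    """Split a boolean voxel mask into spatially connected components,
    returning one boolean mask per component."""
    labeled, n = ndimage.label(mask)  # default 4-connectivity in 2-D
    return [labeled == i for i in range(1, n + 1)]

# a "cluster" whose voxels form two disconnected islands on a 2-D grid
cluster = np.array([[1, 1, 0, 0],
                    [0, 0, 0, 0],
                    [0, 0, 1, 1]], dtype=bool)
parts = split_into_contiguous(cluster)  # two contiguous sub-clusters
```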
1.104
1.105 @@ -347,6 +343,7 @@
1.106
1.107
1.108
1.109 +\vspace{0.3cm}\hrule
1.110
1.111 == The approach: Preliminary Studies ==
1.112
1.113 @@ -596,22 +593,23 @@
%%\vspace{0.3cm}**Extension to probabilistic maps**
1.115 %%Presently, we do not have a probabilistic atlas which is registered to the ABA space. However, in anticipation of the availability of such maps, we would like to explore extensions to our Aim 1 techniques which can handle probabilistic maps.
1.116
1.117 +\vspace{0.3cm}\hrule
1.118
1.119 == Timeline and milestones ==
1.120
1.121 \vspace{0.3cm}**Finding marker genes**
1.122 \\ **September-November 2009**: Develop an automated mechanism for segmenting the cortical voxels into layers
1.123 \\ **November 2009 (milestone)**: Have completed construction of a flatmapped, cortical dataset with information for each layer
1.124 -\\ **October 2009-April 2010**: Develop scoring methods, dimensionality reduction, and supervised learning methods.
1.125 +\\ **October 2009-April 2010**: Develop scoring and supervised learning methods.
1.126 \\ **January 2010 (milestone)**: Submit a publication on single marker genes for cortical areas
1.127 \\ **February-July 2010**: Continue to develop scoring methods and supervised learning frameworks. Extend techniques for robustness. Compare the performance of techniques. Validate marker genes. Prepare software toolbox for Aim 1.
1.128 \\ **June 2010 (milestone)**: Submit a paper describing a method fulfilling Aim 1. Release toolbox.
1.129 \\ **July 2010 (milestone)**: Submit a paper describing combinations of marker genes for each cortical area, and a small number of marker genes that can, in combination, define most of the areas at once
1.130
1.131 \vspace{0.3cm}**Revealing new ways to parcellate a structure into regions**
1.132 -\\ **June 2010-March 2011**: Explore dimensionality reduction algorithms for Aim 2. Explore clustering algorithms. Adapt clustering algorithms to use radial profile information. Compare the performance of techniques.
1.133 +\\ **June 2010-March 2011**: Explore dimensionality reduction algorithms. Explore clustering algorithms. Adapt clustering algorithms to use radial profile information. Compare the performance of techniques.
1.134 \\ **March 2011 (milestone)**: Submit a paper describing a method fulfilling Aim 2. Release toolbox.
1.135 -\\ **February-May 2011**: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If new ways of organizing the cortex into areas are discovered, interpret the results. Prepare software toolbox for Aim 2.
1.136 +\\ **February-May 2011**: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex and interpret the results. Prepare software toolbox for Aim 2.
1.137 \\ **May 2011 (milestone)**: Submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
1.138 \\ **May-August 2011**: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1. Possibly submit another paper.
1.139