cg

diff grant.txt @ 92:b4b79f107b2a
.
author: bshanks@bshanks.dyndns.org
date: Tue Apr 21 14:28:12 2009 -0700
parents: 7c5d98f0cd5a
children: 9f36acf8d9a8
--- a/grant.txt	Tue Apr 21 06:11:15 2009 -0700
+++ b/grant.txt	Tue Apr 21 14:28:12 2009 -0700
@@ -231,7 +231,27 @@
-While we do not here propose to analyze human gene expression data, it is conceivable that the methods we propose to develop could be used to suggest modifications to the human cortical map as well. In fact, the methods we will develop will be applicable to other datasets beyond the brain. We will provide an open-source toolbox to allow other researchers to easily use our methods. With these methods, researchers with gene expression for any area of the body will be able to efficiently find marker genes for anatomical regions, or to use gene expression to discover new anatomical patterning. As described above, marker genes have a variety of uses in the development of drugs and experimental manipulations, and in the anatomical characterization of tissue samples. The discovery of new ways to carve up anatomical structures into regions will widely impact all areas of biology.
+While we do not here propose to analyze human gene expression data, it
+is conceivable that the methods we propose to develop could be used to
+suggest modifications to the human cortical map as well. In fact, the
+methods we will develop will be applicable to other datasets beyond
+the brain. We will provide an open-source toolbox to allow other
+researchers to easily use our methods. With these methods, researchers
+with gene expression for any area of the body will be able to
+efficiently find marker genes for anatomical regions, or to use gene
+expression to discover new anatomical patterning. As described above,
+marker genes have a variety of uses in the development of drugs and
+experimental manipulations, and in the anatomical characterization of
+tissue samples. The discovery of new ways to carve up anatomical
+structures into regions may lead to the discovery of
+new anatomical subregions in various structures, which will widely
+impact all areas of biology.
+
+Although our particular application involves the 3D spatial
+distribution of gene expression, we anticipate that the methods
+developed in aims (1) and (2) will not be limited to gene expression
+data, but rather will generalize to any sort of
+high-dimensional data over points located in a low-dimensional space.
@@ -431,19 +451,19 @@
+
+
+
+We have applied the following dimensionality reduction algorithms to reduce the dimensionality of the gene expression profile associated with each voxel: Principal Components Analysis (PCA), Simple PCA (SPCA), Multi-Dimensional Scaling (MDS), Isomap, Landmark Isomap, Laplacian eigenmaps, Local Tangent Space Alignment (LTSA), Hessian locally linear embedding, Diffusion maps, Stochastic Neighbor Embedding (SNE), Stochastic Proximity Embedding (SPE), Fast Maximum Variance Unfolding (FastMVU), Non-negative Matrix Factorization (NNMF). Space constraints prevent us from showing many of the results, but as a sample, PCA, NNMF, and landmark Isomap are shown in the first, second, and third rows of Figure \ref{dimReduc}.
+
+After applying the dimensionality reduction, we ran clustering algorithms on the reduced data. To date we have tried k-means and spectral clustering. The results of k-means after PCA, NNMF, and landmark Isomap are shown in the last row of Figure \ref{dimReduc}. To compare, the leftmost picture on the bottom row of Figure \ref{dimReduc} shows some of the major subdivisions of cortex. These results clearly show that different dimensionality reduction techniques capture different aspects of the data and lead to different clusterings, indicating the utility of our proposal to produce a detailed comparison of these techniques as applied to the domain of genomic anatomy.
+
+
-
-
-We have applied the following dimensionality reduction algorithms to reduce the dimensionality of the gene expression profile associated with each voxel: Principal Components Analysis (PCA), Simple PCA (SPCA), Multi-Dimensional Scaling (MDS), Isomap, Landmark Isomap, Laplacian eigenmaps, Local Tangent Space Alignment (LTSA), Hessian locally linear embedding, Diffusion maps, Stochastic Neighbor Embedding (SNE), Stochastic Proximity Embedding (SPE), Fast Maximum Variance Unfolding (FastMVU), Non-negative Matrix Factorization (NNMF). Space constraints prevent us from showing many of the results, but as a sample, PCA, NNMF, and landmark Isomap are shown in the first, second, and third rows of Figure \ref{dimReduc}.
-
-After applying the dimensionality reduction, we ran clustering algorithms on the reduced data. To date we have tried k-means and spectral clustering. The results of k-means after PCA, NNMF, and landmark Isomap are shown in the last row of Figure \ref{dimReduc}. To compare, the leftmost picture on the bottom row of Figure \ref{dimReduc} shows some of the major subdivisions of cortex. These results clearly show that different dimensionality reduction techniques capture different aspects of the data and lead to different clusterings, indicating the utility of our proposal to produce a detailed comparion of these techniques as applied to the domain of genomic anatomy.
-
-
-
@@ -453,7 +473,9 @@
-\vspace{0.3cm}**Flatmap and segment cortical layers**
+%%\vspace{0.3cm}**Flatmap cortex and segment cortical layers**
+
+=== Flatmap cortex and segment cortical layers ===
@@ -466,20 +488,35 @@
-\vspace{0.3cm}**Develop algorithms that find genetic markers for anatomical regions**
-
-
-
-
-
-
-
-# Develop scoring measures for evaluating how good individual genes are at marking areas: we will compare pointwise, geometric, and information-theoretic measures.
-# Develop a procedure to find single marker genes for anatomical regions: for each cortical area, by using or combining the scoring measures developed, we will rank the genes by their ability to delineate each area. 
-# Extend the procedure to handle difficult areas by using combinatorial coding: for areas that cannot be identified by any single gene, identify them with a handful of genes. We will consider both (a) algorithms that incrementally/greedily combine single gene markers into sets, such as forward stepwise regression and decision trees, and also (b) supervised learning techniques which use soft constraints to minimize the number of features, such as sparse support vector machines.
+%%\vspace{0.3cm}**Develop algorithms that find genetic markers for anatomical regions**
+%%\vspace{0.3cm}****
+
+
+=== Develop algorithms that find genetic markers for anatomical regions ===
+
+%%\vspace{0.3cm}**Scoring measures and feature selection** 
+
+%%We will develop scoring methods for evaluating how good individual genes are at marking areas. We will compare pointwise, geometric, and information-theoretic measures. We already developed one entirely new scoring method (gradient similarity), but we may develop more. Scoring measures that we will explore will include the L1 norm, correlation, expression energy ratio, conditional entropy, gradient similarity, Jaccard similarity, Dice similarity, Hough transform, and statistical tests such as Hotelling's T-square test (a multivariate generalization of Student's t-test), ANOVA, and a multivariate version of the Mann-Whitney U test (a non-parametric test). 
+
+We will develop scoring methods for evaluating how good individual genes are at marking areas. We will compare pointwise, geometric, and information-theoretic measures. We already developed one entirely new scoring method (gradient similarity), but we may develop more. Scoring measures that we will explore will include the L1 norm, correlation, expression energy ratio, conditional entropy, gradient similarity, Jaccard similarity, Dice similarity, Hough transform, and statistical tests such as Student's t-test, and the Mann-Whitney U test (a non-parametric test). In addition, any predictive procedure induces a scoring measure on genes by taking the prediction error when using that gene to predict the target. 
+
+Using some combination of these measures, we will develop a procedure to find single marker genes for anatomical regions: for each cortical area, we will rank the genes by their ability to delineate each area.
+
+Some cortical areas have no single marker genes but can be identified by combinatorial coding. This requires multivariate scoring measures and feature selection procedures. Many of the measures, such as expression energy, gradient similarity, Jaccard, Dice, Hough, Student's t, and Mann-Whitney U are univariate. We will extend these scoring measures for use in multivariate feature selection, that is, for scoring how well combinations of genes, rather than individual genes, can distinguish a target area. There are existing multivariate forms of some of the univariate scoring measures, for example, Hotelling's T-square is a multivariate analog of Student's t. 
+
+We will develop a feature selection procedure for choosing the best small set of marker genes for a given anatomical area. In addition to using the scoring measures that we develop, we will also explore (a) feature selection using a stepwise wrapper over "vanilla" predictive methods such as logistic regression, (b) predictive methods such as decision trees which incrementally/greedily combine single gene markers into sets, and (c) predictive methods which use soft constraints to minimize number of features used, such as sparse support vector machines. 
+
+todo
+
+Some of these methods, such as the Hough transform, are designed to be resistant to registration error and error in the anatomical map. 
+
+We will also consider extensions to scoring measures that may improve their robustness to registration error and to error in the anatomical map; for example, a wrapper that runs a scoring method on small displacements and distortions of the data adds robustness to registration error at the expense of computation time. It is possible that some areas in the anatomical map do not correspond to natural domains of gene expression.
+
-# Linear discriminant analysis
+
+A future publication on the method that we develop in Aim 1 will review the scoring measures and quantitatively compare their performance in order to provide a foundation for future research of methods of marker gene finding. We will measure the robustness of the scoring measures as well as their absolute performance on our dataset.
+
@@ -493,6 +530,7 @@
+
@@ -508,6 +546,7 @@
+# Linear discriminant analysis
@@ -520,7 +559,7 @@
-Using the methods developed in Aim 1, we will present, for each cortical area, a short list of markers to identify that area; and we will also present lists of "panels" of genes that can be used to delineate many areas at once. Using the methods developed in Aim 2, we will present one or more hierarchial cortical maps. We will identify and explain how the statistical structure in the gene expression data led to any unexpected or interesting features of these maps.
+Using the methods developed in Aim 1, we will present, for each cortical area, a short list of markers to identify that area; and we will also present lists of "panels" of genes that can be used to delineate many areas at once. Using the methods developed in Aim 2, we will present one or more hierarchical cortical maps. We will identify and explain how the statistical structure in the gene expression data led to any unexpected or interesting features of these maps, and we will provide biological hypotheses to interpret any new cortical areas, or groupings of areas, which are discovered.
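
A minimal sketch of two of the scoring measures named in the changeset above (Jaccard and Dice similarity), and of ranking genes by how well each one delineates a single area. It assumes that each gene's expression is a flat list of per-voxel levels and that a gene "marks" a voxel when its level exceeds a threshold. The function names, the threshold value, and the toy data are hypothetical and do not come from grant.txt.

```python
from typing import Dict, List, Sequence, Tuple


def jaccard(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Jaccard similarity |A and B| / |A or B| of two equal-length voxel masks."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def dice(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Dice similarity 2 * |A and B| / (|A| + |B|) of two equal-length voxel masks."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    return 2 * inter / total if total else 0.0


def rank_genes(
    expression: Dict[str, Sequence[float]],
    region: Sequence[bool],
    threshold: float = 0.5,  # assumed cutoff for "expressed"; not from the source
) -> List[Tuple[str, float]]:
    """Rank genes by the Dice overlap between thresholded expression and the region."""
    scored = []
    for gene, levels in expression.items():
        mask = [level > threshold for level in levels]
        scored.append((gene, dice(mask, region)))
    # Best marker first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: four voxels, the target area covers the first two.
region = [True, True, False, False]
expression = {
    "geneA": [0.9, 0.8, 0.1, 0.0],  # expressed exactly inside the area
    "geneB": [0.9, 0.1, 0.7, 0.0],  # half inside, half outside
    "geneC": [0.0, 0.0, 0.9, 0.9],  # expressed only outside the area
}
ranking = rank_genes(expression, region)
```

Any of the other univariate measures listed in the changeset could be substituted for `dice` inside `rank_genes` without changing the ranking step.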