diff grant.txt @ 97:1849a5bd1ce9
author: bshanks@bshanks.dyndns.org
date: Wed Apr 22 05:27:25 2009 -0700 (16 years ago)
parents: a25a60a4bf43
children: a75c226cbdd6
--- a/grant.txt	Tue Apr 21 18:53:40 2009 -0700
+++ b/grant.txt	Wed Apr 22 05:27:25 2009 -0700
@@ -1,9 +1,30 @@
-\documentclass{nih-blank}
+\documentclass[11pt]{nih-blank}
+
+
+%%\renewcommand{\rmdefault}{phv} %% Arial
+%%\renewcommand{\sfdefault}{phv} %% Arial
+
+%%\usepackage[T1]{fontenc}
+%%\usepackage[scaled]{uarial}
+
+%%  \fontencoding{T1}
+%%  \fontfamily{garamond}
+
+%%  \fontseries{m}
+%%  \fontshape{it}
+
+%%  \fontfamily{arial}
+%%  \fontsize{11}{15}
+%%  \selectfont
+
+\begin{document}
+
+
@@ -463,7 +484,7 @@
-We have applied the following dimensionality reduction algorithms to reduce the dimensionality of the gene expression profile associated with each voxel: Principal Components Analysis (PCA), Simple PCA (SPCA), Multi-Dimensional Scaling (MDS), Isomap, Landmark Isomap, Laplacian eigenmaps, Local Tangent Space Alignment (LTSA), Hessian locally linear embedding, Diffusion maps, Stochastic Neighbor Embedding (SNE), Stochastic Proximity Embedding (SPE), Fast Maximum Variance Unfolding (FastMVU), Non-negative Matrix Factorization (NNMF). Space constraints prevent us from showing many of the results, but as a sample, PCA, NNMF, and landmark Isomap are shown in the first, second, and third rows of Figure \ref{dimReduc}.
+We have applied the following dimensionality reduction algorithms to the gene expression profile associated with each pixel: Principal Components Analysis (PCA), Simple PCA (SPCA), Multi-Dimensional Scaling (MDS), Isomap, Landmark Isomap, Laplacian eigenmaps, Local Tangent Space Alignment (LTSA), Stochastic Proximity Embedding (SPE), Fast Maximum Variance Unfolding (FastMVU), Non-negative Matrix Factorization (NNMF). Space constraints prevent us from showing many of the results, but as a sample, PCA, NNMF, and Landmark Isomap are shown in the first, second, and third rows of Figure \ref{dimReduc}.
@@ -475,7 +496,7 @@
-We also clustered the genes using gradient similarity to see if the spatial regions defined by any clusters matched known anatomical regions. Figure \ref{geneClusters} shows, for ten sample gene clusters, each cluster's average expression pattern, compared to a known anatomical boundary. This suggests that it is worth attempting to cluster genes, and then to use the results to cluster voxels.
+We also clustered the genes using gradient similarity to see if the spatial regions defined by any clusters matched known anatomical regions. Figure \ref{geneClusters} shows, for ten sample gene clusters, each cluster's average expression pattern, compared to a known anatomical boundary. This suggests that it is worth attempting to cluster genes, and then to use the results to cluster pixels.
@@ -504,10 +525,8 @@
-%%\vspace{0.3cm}**Scoring measures and feature selection** 
-
+\vspace{0.3cm}**Scoring measures and feature selection** 
-
@@ -519,50 +538,55 @@
-An area may be difficult to identify because the boundaries are misdrawn in the atlas, or because the shape of the natural domain of gene expression corresponding to the area is different from the shape of the area as recognized by anatomists. We will extend our procedure to handle difficult areas by combining areas or redrawing their boundaries. We will develop extensions to our procedure which (a) detect when a difficult area could be fit if its boundary were redrawn slightly, and (b) detect when a difficult area could be combined with adjacent areas to create a larger area which can be fit.
+An area may be difficult to identify because the boundaries are misdrawn in the atlas, or because the shape of the natural domain of gene expression corresponding to the area is different from the shape of the area as recognized by anatomists. We will extend our procedure to handle difficult areas by combining areas or redrawing their boundaries. We will develop extensions to our procedure which (a) detect when a difficult area could be fit if its boundary were redrawn slightly\footnote{Not just any redrawing is acceptable, only those which appear to be justified as a natural spatial domain of gene expression by multiple sources of evidence. Interestingly, the need to detect "natural spatial domains of gene expression" in a data-driven fashion means that the methods of Aim 2 might be useful in achieving Aim 1, as well -- particularly discriminative dimensionality reduction.}, and (b) detect when a difficult area could be combined with adjacent areas to create a larger area which can be fit.
-
-\vspace{0.3cm}**Application to cortical areas**
-
-
-
-# confirm with EMAGE, GeneAtlas, GENSAT, etc, to fight overfitting, two hemis
-
-
-\vspace{0.3cm}**Develop algorithms to suggest a division of a structure into anatomical parts**
-
-# Explore dimensionality reduction algorithms applied to pixels: including TODO
-# Explore dimensionality reduction algorithms applied to genes: including TODO
-# Explore clustering algorithms applied to pixels: including TODO
-# Explore clustering algorithms applied to genes: including gene shaving\cite{hastie_gene_2000}, TODO
-# Develop an algorithm to use dimensionality reduction and/or hierarchial clustering to create anatomical maps
-# Run this algorithm on the cortex: present a hierarchial, genoarchitectonic map of the cortex
-
-# Linear discriminant analysis
-
-# jbt, coclustering
-
-# self-organizing map
-
-# Linear discriminant analysis
-
-
-# compare using clustering scores
-
-# multivariate gradient similarity
-
-# deep belief nets
-
-
-
-\vspace{0.3cm}**Apply these algorithms to the cortex**
-
-Using the methods developed in Aim 1, we will present, for each cortical area, a short list of markers to identify that area; and we will also present lists of "panels" of genes that can be used to delineate many areas at once. Using the methods developed in Aim 2, we will present one or more hierarchial cortical maps. We will identify and explain how the statistical structure in the gene expression data led to any unexpected or interesting features of these maps, and we will provide biological hypotheses to interpret any new cortical areas, or groupings of areas, which are discovered.
+
+
+=== Develop algorithms to suggest a division of a structure into anatomical parts ===
+
+\vspace{0.3cm}**Explore dimensionality reduction on gene expression profiles**
+We have already described the application of ten dimensionality reduction algorithms for the purpose of replacing the gene expression profiles, which are vectors of about 4000 gene expression levels, with a smaller number of features. We plan to further explore and interpret these results, as well as to apply other unsupervised learning algorithms, including independent components analysis, self-organizing maps, and generative models such as deep Boltzmann machines. We will explore ways to quantitatively compare the relevance of the different dimensionality reduction methods for identifying cortical areal boundaries.
+
+\vspace{0.3cm}**Explore dimensionality reduction on pixels**
+Instead of applying dimensionality reduction to the gene expression profiles, the same techniques can be applied to the pixels\footnote{Consider a matrix whose rows represent pixel locations, and whose columns represent genes. An entry in this matrix represents the gene expression level at a given pixel. One can look at this matrix as a collection of pixels, each corresponding to a vector of many gene expression levels; or one can look at it as a collection of genes, each corresponding to a vector giving that gene's expression at each pixel. Similarly, dimensionality reduction can be used to replace a large number of genes with a small number of features, or it can be used to replace a large number of pixels with a small number of features.}. It is possible that the features generated in this way by some dimensionality reduction techniques will directly correspond to interesting spatial regions.
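The two views of the expression matrix described in the footnote can be sketched as follows. This is only an illustration on made-up numbers; the names `expression`, `pixel_profiles`, and `gene_patterns` are hypothetical and do not come from the proposal's software.

```python
# Hypothetical toy data: 3 pixels (rows) x 2 genes (columns).
# Each entry is the expression level of one gene at one pixel.
expression = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.7, 0.3],
]


def pixel_profiles(matrix):
    """Row view: one vector per pixel, giving every gene's level there."""
    return [list(row) for row in matrix]


def gene_patterns(matrix):
    """Column view: one vector per gene, giving its level at every pixel."""
    return [list(column) for column in zip(*matrix)]


if __name__ == "__main__":
    print(pixel_profiles(expression))  # 3 vectors of length 2
    print(gene_patterns(expression))   # 2 vectors of length 3
```

Any dimensionality reduction routine that accepts a list of vectors could be handed either view; which view it receives decides whether genes or pixels are the quantity being summarized.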
+
+
+\vspace{0.3cm}**Explore clustering and segmentation algorithms on pixels**
+We will explore clustering and segmentation algorithms in order to segment the pixels into regions. We will explore k-means, spectral clustering, gene shaving\cite{hastie_gene_2000}, recursive division clustering, multivariate generalizations of edge detectors, multivariate generalizations of watershed transformations, region growing, active contours, graph partitioning methods, and recursive agglomerative clustering with various linkage functions. These methods can be combined with dimensionality reduction.
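As one concrete instance of the clustering methods listed above, the following is a minimal sketch of k-means (Lloyd's algorithm) applied to pixel feature vectors. It assumes each pixel is already represented by a short numeric vector, and it seeds the centroids with the first `k` points purely to keep the example deterministic; the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Cluster points into k groups with Lloyd's algorithm.

    Assumption for this sketch: centroids start at the first k points.
    Random or k-means++ seeding is more usual in practice.
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical pixels described by two reduced features each.
    pixels = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
    labels, centroids = kmeans(pixels, k=2)
    print(labels)
```

The resulting label per pixel can be painted back onto the image to show the proposed regions. Note that plain k-means ignores spatial adjacency, which is one reason segmentation methods such as region growing are listed alongside it.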
+
+\vspace{0.3cm}**Explore clustering on genes**
+We have already shown that the procedure of clustering genes according to gradient similarity, and then creating an averaged prototype of each cluster's expression pattern, yields some spatial patterns which match cortical areas. We will further explore the clustering of genes.
+
+In addition to using the cluster expression prototypes directly to identify spatial regions, gene clustering might be useful as a component of dimensionality reduction. For example, one could imagine clustering similar genes and then replacing their expression levels with a single average expression level, thereby removing some redundancy from the gene expression profiles. One could then perform clustering on pixels (possibly after a second dimensionality reduction step) in order to identify spatial regions. It remains to be seen whether removal of redundancy would help or hurt the ultimate goal of identifying interesting spatial regions.
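The redundancy-removal step described above (replacing each cluster of similar genes with a single average expression level) can be sketched as below. The sketch assumes the gene clusters have already been computed by some other method; `average_by_gene_cluster` and its arguments are hypothetical names.

```python
def average_by_gene_cluster(matrix, gene_clusters):
    """Collapse gene columns into one averaged column per gene cluster.

    matrix: rows are pixels, columns are genes.
    gene_clusters: one cluster id per gene column (assumed precomputed).
    Returns a matrix with the same rows and one column per cluster,
    ordered by sorted cluster id.
    """
    cluster_ids = sorted(set(gene_clusters))
    reduced = []
    for row in matrix:
        new_row = []
        for cid in cluster_ids:
            values = [v for v, g in zip(row, gene_clusters) if g == cid]
            new_row.append(sum(values) / len(values))
        reduced.append(new_row)
    return reduced


if __name__ == "__main__":
    # Hypothetical data: 2 pixels x 3 genes; genes 0 and 1 share a cluster.
    expression = [[1.0, 3.0, 10.0], [2.0, 4.0, 20.0]]
    print(average_by_gene_cluster(expression, [0, 0, 1]))
```

The reduced matrix has the same rows (pixels) and can be passed on to pixel clustering, optionally after a second dimensionality reduction step.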
+
+\vspace{0.3cm}**Explore co-clustering**
+There are some algorithms which simultaneously incorporate clustering on instances and on features (in our case, genes and pixels), for example, IRM\cite{kemp_learning_2006}. These are called co-clustering or biclustering algorithms.
+
+
+
+
+\vspace{0.3cm}**Quantitatively compare different methods**
+In order to tell which method is best for genomic anatomy, for each experimental method we will compare the cortical map found by unsupervised learning to a cortical map derived from the Allen Reference Atlas. In order to compare the experimental clustering with the reference clustering, we will explore various quantitative metrics that purport to measure how similar two clusterings are, such as Jaccard, Rand index, Fowlkes-Mallows, variation of information, Larsen, Van Dongen, and others.
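Two of the metrics named above, the Rand index and the Jaccard index, are both computed from counts of pixel pairs. The following is a minimal sketch assuming each clustering is given as a list with one label per pixel; the function names are hypothetical, and the quadratic loop over pairs is only practical for small inputs.

```python
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Count pixel pairs by whether each clustering puts them together.

    Returns (both_same, only_a_same, only_b_same, both_different).
    """
    both_same = only_a = only_b = both_diff = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both_same += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            both_diff += 1
    return both_same, only_a, only_b, both_diff


def rand_index(labels_a, labels_b):
    """Fraction of pairs on which the two clusterings agree (needs >= 2 items)."""
    n11, n10, n01, n00 = pair_counts(labels_a, labels_b)
    return (n11 + n00) / (n11 + n10 + n01 + n00)


def jaccard_index(labels_a, labels_b):
    """Pairs joined in both clusterings over pairs joined in at least one."""
    n11, n10, n01, _ = pair_counts(labels_a, labels_b)
    joined = n11 + n10 + n01
    # Convention chosen for this sketch: if neither clustering joins any
    # pair, the two clusterings are treated as identical.
    return n11 / joined if joined else 1.0


if __name__ == "__main__":
    experimental = [0, 0, 1, 2]   # hypothetical labels from unsupervised learning
    reference = [0, 0, 1, 1]      # hypothetical labels from a reference atlas
    print(rand_index(reference, experimental))
    print(jaccard_index(reference, experimental))
```

Both scores depend only on which pixels are grouped together, not on the label values themselves, so the experimental and reference maps do not need matching region names.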
+
+
+\vspace{0.3cm}**Discriminative dimensionality reduction**
+In addition to using a purely data-driven approach to identify spatial regions, it might be useful to see how well the known regions can be reconstructed from a small number of features, even if those features are chosen by using knowledge of the regions. For example, linear discriminant analysis could be used as a dimensionality reduction technique in order to identify a few features which are the best linear summary of gene expression profiles for the purpose of discriminating between regions. This reduced feature set could then be used to cluster pixels into regions. Perhaps the resulting clusters will be similar to the reference atlas, yet more faithful to natural spatial domains of gene expression than the reference atlas is.
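To illustrate linear discriminant analysis used as a dimensionality reduction step, the following sketch computes Fisher's discriminant direction for the simplest case: two regions and two features per pixel, so that the within-class scatter matrix is 2x2 and can be inverted in closed form. This restriction, the toy data, and all function names are assumptions made for brevity; real gene expression profiles would need a general linear solver and probably regularization.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter_2x2(rows, mean):
    """Within-class scatter of 2-feature rows, returned as (sxx, sxy, syy)."""
    sxx = sxy = syy = 0.0
    for x, y in rows:
        dx, dy = x - mean[0], y - mean[1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    return sxx, sxy, syy


def fisher_direction(class_a, class_b):
    """Fisher's discriminant direction w = Sw^{-1} (mean_b - mean_a).

    Sketch restricted to two classes and two features per pixel.
    """
    mean_a, mean_b = mean_vector(class_a), mean_vector(class_b)
    axx, axy, ayy = scatter_2x2(class_a, mean_a)
    bxx, bxy, byy = scatter_2x2(class_b, mean_b)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det.
    return [(syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det]


def project(row, direction):
    """Reduce a pixel's feature vector to a single discriminant feature."""
    return sum(v * w for v, w in zip(row, direction))


if __name__ == "__main__":
    # Hypothetical pixels from two known regions, two features each.
    region_a = [[0, 0], [1, 1], [0, 1], [1, 0]]
    region_b = [[4, 0], [5, 1], [4, 1], [5, 0]]
    w = fisher_direction(region_a, region_b)
    print(w, [project(p, w) for p in region_a + region_b])
```

The projected values form the reduced feature set; clustering pixels on them, rather than on the region labels, is what would allow the resulting regions to differ from the reference atlas.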
+
+
+=== Apply the new methods to the cortex ===
+Using the methods developed in Aim 1, we will present, for each cortical area, a short list of markers to identify that area; and we will also present lists of "panels" of genes that can be used to delineate many areas at once. 
+
+Because in most cases the ABA coronal dataset only contains one ISH per gene, it is possible for an unrelated combination of genes to seem to identify an area when in fact it is only coincidence. There are two ways we will validate our marker genes to guard against this. First, we will confirm that putative combinations of marker genes express the same pattern in both hemispheres. Second, we will manually validate our final results on other gene expression datasets such as EMAGE, GeneAtlas, and GENSAT.
+
+Using the methods developed in Aim 2, we will present one or more hierarchical cortical maps. We will identify and explain how the statistical structure in the gene expression data led to any unexpected or interesting features of these maps, and we will provide biological hypotheses to interpret any new cortical areas, or groupings of areas, which are discovered.
+
+
+
@@ -573,22 +597,21 @@
-=== Finding marker genes ===
-
-* September-November 2009: Develop an automated mechanism for segmenting the cortical voxels into layers
-* November 2009 (milestone): Have completed construction of a flatmapped, cortical dataset with information for each layer
-* October 2009-April 2010: Develop scoring methods and to test them in various supervised learning frameworks. Also test out various dimensionality reduction schemes in combination with supervised learning. create or extend supervised learning frameworks which use multivariate versions of the best scoring methods.
-* January 2010 (milestone): Submit a publication on single marker genes for cortical areas
-* February-July 2010: Continue to develop scoring methods and supervised learning frameworks. Explore the best way to integrate radial profiles with supervised learning. Explore the best way to make supervised learning techniques robust against incorrect labels (i.e. when the areas drawn on the input cortical map are slightly off). Quantitatively compare the performance of different supervised learning techniques. Validate marker genes found in the ABA dataset by checking against other gene expression datasets. Create documentation and unit tests for software toolbox for Aim 1. Respond to user bug reports for Aim 1 software toolbox.
-* June 2010 (milestone): Submit a paper describing a method fulfilling Aim 1. Release toolbox.
-* July 2010 (milestone): Submit a paper describing combinations of marker genes for each cortical area, and a small number of marker genes that can, in combination, define most of the areas at once
-
-=== Revealing new ways to parcellate a structure into regions ===
-* June 2010-March 2011: Explore dimensionality reduction algorithms for Aim 2. Explore standard hierarchial clustering algorithms, used in combination with dimensionality reduction, for Aim 2. Explore co-clustering algorithms. Think about how radial profile information can be used for Aim 2. Adapt clustering algorithms to use radial profile information. Quantitatively compare the performance of different dimensionality reduction and clustering techniques. Quantitatively compare the value of different flatmapping methods and ways of representing radial profiles.
-* March 2011 (milestone): Submit a paper describing a method fulfilling Aim 2. Release toolbox.
-* February-May 2011: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If new ways of organizing the cortex into areas are discovered, read the literature and talk to people to learn about research related to interpreting our results. Create documentation and unit tests for software toolbox for Aim 2. Respond to user bug reports for Aim 2 software toolbox.
-* May 2011 (milestone): Submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
-* May-August 2011: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1. Follow up on responses to our papers. Possibly submit another paper.
+\vspace{0.3cm}**Finding marker genes**
+\\ **September-November 2009**: Develop an automated mechanism for segmenting the cortical voxels into layers
+\\ **November 2009 (milestone)**: Have completed construction of a flatmapped, cortical dataset with information for each layer
+\\ **October 2009-April 2010**: Develop scoring methods and test them in various supervised learning frameworks. Also test out various dimensionality reduction schemes in combination with supervised learning. Create or extend supervised learning frameworks which use multivariate versions of the best scoring methods.
+\\ **January 2010 (milestone)**: Submit a publication on single marker genes for cortical areas
+\\ **February-July 2010**: Continue to develop scoring methods and supervised learning frameworks. Explore the best way to integrate radial profiles with supervised learning. Explore the best way to make supervised learning techniques robust against incorrect labels (i.e. when the areas drawn on the input cortical map are slightly off). Quantitatively compare the performance of different supervised learning techniques. Validate marker genes found in the ABA dataset by checking against other gene expression datasets. Create documentation and unit tests for software toolbox for Aim 1. Respond to user bug reports for Aim 1 software toolbox.
+\\ **June 2010 (milestone)**: Submit a paper describing a method fulfilling Aim 1. Release toolbox.
+\\ **July 2010 (milestone)**: Submit a paper describing combinations of marker genes for each cortical area, and a small number of marker genes that can, in combination, define most of the areas at once
+
+\vspace{0.3cm}**Revealing new ways to parcellate a structure into regions**
+\\ **June 2010-March 2011**: Explore dimensionality reduction algorithms for Aim 2. Explore standard hierarchical clustering algorithms, used in combination with dimensionality reduction, for Aim 2. Explore co-clustering algorithms. Think about how radial profile information can be used for Aim 2. Adapt clustering algorithms to use radial profile information. Quantitatively compare the performance of different dimensionality reduction and clustering techniques. Quantitatively compare the value of different flatmapping methods and ways of representing radial profiles.
+\\ **March 2011 (milestone)**: Submit a paper describing a method fulfilling Aim 2. Release toolbox.
+\\ **February-May 2011**: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If new ways of organizing the cortex into areas are discovered, read the literature and talk to people to learn about research related to interpreting our results. Create documentation and unit tests for software toolbox for Aim 2. Respond to user bug reports for Aim 2 software toolbox.
+\\ **May 2011 (milestone)**: Submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
+\\ **May-August 2011**: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1. Follow up on responses to our papers. Possibly submit another paper.
@@ -603,3 +626,4 @@
+\end{document}