cg
changeset 99:a48955c639d4
.
author | bshanks@bshanks.dyndns.org |
---|---|
date | Wed Apr 22 06:43:51 2009 -0700 |
parents | a75c226cbdd6 |
children | fa7c0a924e7a |
files | grant.doc grant.html grant.odt grant.pdf grant.txt |
line diff
1.1 Binary file grant.doc has changed
2.1 --- a/grant.html Wed Apr 22 06:23:09 2009 -0700
2.2 +++ b/grant.html Wed Apr 22 06:43:51 2009 -0700
2.3 @@ -77,7 +77,7 @@
2.4 calculate a voxel’s sub-score, then we say it is a local scoring method. If only information from the voxel itself is
2.5 used to calculate a voxel’s sub-score, then we say it is a pointwise scoring method.
2.6 Both gene expression data and anatomical atlases have errors, due to a variety of factors. Individual subjects
2.7 -have idiosyncratic anatomy. Subjects may be improperly registred to the atlas. The method used to measure
2.8 +have idiosyncratic anatomy. Subjects may be improperly registered to the atlas. The method used to measure
2.9 gene expression may be noisy. The atlas may have errors. It is even possible that some areas in the anatomical
2.10 atlas are “wrong” in that they do not have the same shape as the natural domains of gene expression to which
2.11 they correspond. These sources of error can affect the displacement and the shape of both the gene expression
2.12 @@ -175,7 +175,7 @@
2.13 gene, and then, for each anatomical structure of interest, computing
2.14 what proportion of this structure is covered by the gene’s spatial region.
2.15 GeneAtlas[5] and EMAGE [26] allow the user to construct a search
2.16 - query by demarcating regions and then specifing either the strength of
2.17 + query by demarcating regions and then specifying either the strength of
2.18 expression or the name of another gene or dataset whose expression
2.19 pattern is to be matched. Neither GeneAtlas nor EMAGE allow one to
2.20 search for combinations of genes that define a region in concert but not separately.
2.21 @@ -195,8 +195,8 @@
2.22 one.
2.23 [10 ] describes a technique to find combinations of marker genes to pick out an anatomical region. They use
2.24 an evolutionary algorithm to evolve logical operators which combine boolean (thresholded) images in order to
2.25 -match a target image. Their match score is Jaccard similarity.
2.26 -_________________________________________
2.27 +match a target image.
2.28 +_____________________
2.29 2By “fundamentally spatial” we mean that there is information from a large number of spatial locations indexed by spatial coordinates;
2.30 not just data which have only a few different locations or which is indexed by anatomical label.
2.31 In summary, there has been fruitful work on finding marker genes, but only one of the previous projects
2.32 @@ -226,8 +226,8 @@
2.33 into clusters of voxels with similar gene expression.
2.34 It is desirable to determine not just one set of regions, but also how
2.35 these regions relate to each other. The outcome of clustering may be
2.36 - a hierarchial tree of clusters, rather than a single set of clusters which
2.37 -partition the voxels. This is called hierarchial clustering.
2.38 + a hierarchical tree of clusters, rather than a single set of clusters which
2.39 +partition the voxels. This is called hierarchical clustering.
2.40 Similarity scores A crucial choice when designing a clustering method is how to measure similarity, across
2.41 either pairs of instances, or clusters, or both. There is much overlap between scoring methods for feature
2.42 selection (discussed above under Aim 1) and scoring methods for similarity.
2.43 @@ -290,7 +290,7 @@
2.44 feature for each gene cluster.
2.45 Gene clusters could also be used to directly yield a clustering on
2.46 instances. This is because many genes have an expression pattern
2.47 - which seems to pick out a single, spatially continguous region. This
2.48 + which seems to pick out a single, spatially contiguous region. This
2.49 suggests the following procedure: cluster together genes which pick
2.50 out similar regions, and then to use the more popular common regions
2.51 as the final clusters. In Preliminary Studies, Figure 7, we show that a
2.52 @@ -306,31 +306,30 @@
2.53 [23] describes an analysis of the anatomy of the hippocampus us-
2.54 ing the ABA dataset. In addition to manual analysis, two clustering
2.55 methods were employed, a modified Non-negative Matrix Factoriza-
2.56 - tion (NNMF), and a hierarchial recursive bifurcation clustering scheme
2.57 + tion (NNMF), and a hierarchical recursive bifurcation clustering scheme
2.58 based on correlation as the similarity score. The paper yielded impres-
2.59 sive results, proving the usefulness of computational genomic anatomy.
2.60 We have run NNMF on the cortical dataset
2.61 - AGEA[15] includes a preset hierarchial clustering of voxels based
2.62 + AGEA[15] includes a preset hierarchical clustering of voxels based
2.63 on a recursive bifurcation algorithm with correlation as the similarity
2.64 metric. EMAGE[26] allows the user to select a dataset from among a
2.65 large number of alternatives, or by running a search query, and then to
2.66 - cluster the genes within that dataset. EMAGE clusters via hierarchial
2.67 - complete linkage clustering with un-centred correlation as the similarity
2.68 - score.
2.69 - [6] clustered genes. For each cluster, prototypical spatial expres-
2.70 - sion patterns were created by averaging the genes in the cluster. The
2.71 - prototypes were analyzed manually, without clustering voxels.
2.72 + cluster the genes within that dataset. EMAGE clusters via hierarchical
2.73 + complete linkage clustering.
2.74 + [6] clusters genes. For each cluster, prototypical spatial expression
2.75 + patterns were created by averaging the genes in the cluster. The pro-
2.76 + totypes were analyzed manually, without clustering voxels.
2.77 [10] applies their technique for finding combinations of marker
2.78 genes for the purpose of clustering genes around a “seed gene”.
2.79 In summary, although these projects obtained clusterings, there has
2.80 not been much comparison between different algorithms or scoring
2.81 methods, so it is likely that the best clustering method for this appli-
2.82 -cation has not yet been found. The projects using gene expression on cortex did not attempt to make use of
2.83 + cation has not yet been found. The projects using gene expression on
2.84 +cortex did not attempt to make use of the radial profile of gene expression. Also, none of these projects did a
2.85 _________________________________________
2.86 5A radial profile is a profile along a line perpendicular to the cortical surface.
2.87 -the radial profile of gene expression. Also, none of these projects did a separate dimensionality reduction step
2.88 -before clustering pixels, none tried to cluster genes first in order to guide automated clustering of pixels into
2.89 -spatial regions, and none used co-clustering algorithms.
2.90 +separate dimensionality reduction step before clustering pixels, none tried to cluster genes first in order to guide
2.91 +automated clustering of pixels into spatial regions, and none used co-clustering algorithms.
2.92 Aim 3: apply the methods developed to the cerebral cortex
2.93
2.94
2.95 @@ -400,11 +399,11 @@
2.96 of correlations between voxel gene expression profiles within a handful of cortical areas. However, this sort
2.97 of analysis is not related to either of our aims, as it neither finds marker genes, nor does it suggest a cortical
2.98 map based on gene expression data. Neither of the other components of AGEA can be applied to cortical
2.99 -areas; AGEA’s Gene Finder cannot be used to find marker genes for the cortical areas; and AGEA’s hierarchial
2.100 +areas; AGEA’s Gene Finder cannot be used to find marker genes for the cortical areas; and AGEA’s hierarchical
2.101 clustering does not produce clusters corresponding to the cortical areas8.
2.102 In summary, for all three aims, (a) only one of the previous projects explores combinations of marker genes,
2.103 (b) there has been almost no comparison of different algorithms or scoring methods, and (c) there has been no
2.104 -work on computationally finding marker genes for cortical areas, or on finding a hierarchial clustering that will
2.105 +work on computationally finding marker genes for cortical areas, or on finding a hierarchical clustering that will
2.106 yield a map of cortical areas de novo from gene expression data.
2.107 Our project is guided by a concrete application with a well-specified criterion of success (how well we can
2.108 find marker genes for / reproduce the layout of cortical areas), which will provide a solid basis for comparing
2.109 @@ -414,7 +413,7 @@
2.110 Figure 7: Prototypes corresponding to sample gene
2.111 clusters, clustered by gradient similarity. Region bound-
2.112 aries for the region that most matches each prototype
2.113 -are overlayed. The method developed in aim (1) will be applied to
2.114 +are overlaid. The method developed in aim (1) will be applied to
2.115 each cortical area to find a set of marker genes such
2.116 that the combinatorial expression pattern of those
2.117 genes uniquely picks out the target area. Finding
2.118 @@ -529,7 +528,7 @@
2.119 cortical areas, while also validating the relevancy of our new scoring method, gradient similarity.
2.120 Combinations of multiple genes are useful and necessary for some areas
2.121 In Figure 4, we give an example of a cortical area which is not marked by any single gene, but which
2.122 -can be identified combinatorially. Acccording to logistic regression, gene wwc1 is the best fit single gene for
2.123 +can be identified combinatorially. According to logistic regression, gene wwc1 is the best fit single gene for
2.124 predicting whether or not a pixel on the cortical surface belongs to the motor area (area MO). The upper-left
2.125 picture in Figure 4 shows wwc1’s spatial expression pattern over the cortex. The lower-right boundary of MO is
2.126 represented reasonably well by this gene, but the gene overshoots the upper-left boundary. This flattened 2-D
2.127 @@ -542,7 +541,7 @@
2.128 necessary.
2.129 Multivariate supervised learning
2.130 Forward stepwise logistic regression Logistic regression is a popular method for predictive modeling of cate-
2.131 -gorial data. As a pilot run, for five cortical areas (SS, AUD, RSP, VIS, and MO), we performed forward stepwise
2.132 +gorical data. As a pilot run, for five cortical areas (SS, AUD, RSP, VIS, and MO), we performed forward stepwise
2.133 logistic regression to find single genes, pairs of genes, and triplets of genes which predict areal identify. This is
2.134 an example of feature selection integrated with prediction using a stepwise wrapper. Some of the single genes
2.135 found were shown in various figures throughout this document, and Figure 4 shows a combination of genes
2.136 @@ -565,7 +564,7 @@
2.137 shown in the last row of Figure 6. To compare, the leftmost picture on the bottom row of Figure 6 shows some
2.138 of the major subdivisions of cortex. These results clearly show that different dimensionality reduction techniques
2.139 capture different aspects of the data and lead to different clusterings, indicating the utility of our proposal to
2.140 -produce a detailed comparion of these techniques as applied to the domain of genomic anatomy.
2.141 +produce a detailed comparison of these techniques as applied to the domain of genomic anatomy.
2.142 Many areas are captured by clusters of genes We also clustered the genes using gradient similarity to
2.143 see if the spatial regions defined by any clusters matched known anatomical regions. Figure 7 shows, for ten
2.144 sample gene clusters, each cluster’s average expression pattern, compared to a known anatomical boundary.
2.145 @@ -605,7 +604,7 @@
2.146 selection using a stepwise wrapper over “vanilla” classifiers such as logistic regression, (b) supervised learning
2.147 methods such as decision trees which incrementally/greedily combine single gene markers into sets, and (c)
2.148 supervised learning methods which use soft constraints to minimize number of features used, such as sparse
2.149 -support vector machines.
2.150 +support vector machines (SVMs).
2.151 Since errors of displacement and of shape may cause genes and target areas to match less than they should,
2.152 we will consider the robustness of feature selection methods in the presence of error. Some of these methods,
2.153 such as the Hough transform, are designed to be resistant in the presence of error, but many are not. We will
2.154 @@ -644,7 +643,7 @@
2.155 to quantitatively compare the relevance of the different dimensionality reduction methods for identifying cortical
2.156 areal boundaries.
2.157 Dimensionality reduction on pixels Instead of applying dimensionality reduction to the gene expression
2.158 -profiles, the same techniques can be applied instead to the pixels13. It is possible that the features generated in
2.159 +profiles, the same techniques can be applied instead to the pixels. It is possible that the features generated in
2.160 this way by some dimensionality reduction techniques will directly correspond to interesting spatial regions.
2.161 Clustering and segmentation on pixels We will explore clustering and segmentation algorithms in order to
2.162 segment the pixels into regions. We will explore k-means, spectral clustering, gene shaving[9], recursive division
2.163 @@ -660,7 +659,7 @@
2.164 the gene expression profiles. One could then perform clustering on pixels (possibly after a second dimensionality
2.165 reduction step) in order to identify spatial regions. It remains to be seen whether removal of redundancy would
2.166 help or hurt the ultimate goal of identifying interesting spatial regions.
2.167 -Co-clustering There are some algorithms which simultaineously incorporate clustering on instances and on
2.168 +Co-clustering There are some algorithms which simultaneously incorporate clustering on instances and on
2.169 features (in our case, genes and pixels), for example, IRM[11]. These are called co-clustering or biclustering
2.170 algorithms.
2.171 Radial profiles We wil explore the use of the radial profile of gene expression under each pixel.
2.172 @@ -675,12 +674,6 @@
2.173 best linear summary of gene expression profiles for the purpose of discriminating between regions. This reduced
2.174 feature set could then be used to cluster pixels into regions. Perhaps the resulting clusters will be similar to the
2.175 reference atlas, yet more faithful to natural spatial domains of gene expression than the reference atlas is.
2.176 -_________________________________________
2.177 - 13Consider a matrix whose rows represent pixel locations, and whose columns represent genes. An entry in this matrix represents the
2.178 -gene expression level at a given pixel. One can look at this matrix as a collection of pixels, each corresponding to a vector of many gene
2.179 -expression levels; or one can look at it as a collection of genes, each corresponding to a vector giving that gene’s expression at each
2.180 -pixel. Similarly, dimensionality reduction can be used to replace a large number of genes with a small number of features, or it can be
2.181 -used to replace a large number of pixels with a small number of features.
2.182 Apply the new methods to the cortex
2.183 Using the methods developed in Aim 1, we will present, for each cortical area, a short list of markers to identify
2.184 that area; and we will also present lists of “panels” of genes that can be used to delineate many areas at once.
2.185 @@ -689,7 +682,7 @@
2.186 validate our marker genes to guard against this. First, we will confirm that putative combinations of marker genes
2.187 express the same pattern in both hemispheres. Second, we will manually validate our final results on other gene
2.188 expression datasets such as EMAGE, GeneAtlas, and GENSAT[8].
2.189 -Using the methods developed in Aim 2, we will present one or more hierarchial cortical maps. We will identify
2.190 +Using the methods developed in Aim 2, we will present one or more hierarchical cortical maps. We will identify
2.191 and explain how the statistical structure in the gene expression data led to any unexpected or interesting features
2.192 of these maps, and we will provide biological hypotheses to interpret any new cortical areas, or groupings of
2.193 areas, which are discovered.
2.194 @@ -698,35 +691,26 @@
2.195 September-November 2009: Develop an automated mechanism for segmenting the cortical voxels into layers
2.196 November 2009 (milestone): Have completed construction of a flatmapped, cortical dataset with information
2.197 for each layer
2.198 -October 2009-April 2010: Develop scoring methods and to test them in various supervised learning frameworks.
2.199 -Also test out various dimensionality reduction schemes in combination with supervised learning. create or extend
2.200 -supervised learning frameworks which use multivariate versions of the best scoring methods.
2.201 +October 2009-April 2010: Develop scoring methods, dimensionality reduction, and supervised learning meth-
2.202 +ods.
2.203 January 2010 (milestone): Submit a publication on single marker genes for cortical areas
2.204 -February-July 2010: Continue to develop scoring methods and supervised learning frameworks. Explore the
2.205 -best way to integrate radial profiles with supervised learning. Explore the best way to make supervised learning
2.206 -techniques robust against incorrect labels (i.e. when the areas drawn on the input cortical map are slightly
2.207 -off). Quantitatively compare the performance of different supervised learning techniques. Validate marker genes
2.208 -found in the ABA dataset by checking against other gene expression datasets. Create documentation and unit
2.209 -tests for software toolbox for Aim 1. Respond to user bug reports for Aim 1 software toolbox.
2.210 +February-July 2010: Continue to develop scoring methods and supervised learning frameworks. Extend tech-
2.211 +niques for robustness. Compare the performance of techniques. Validate marker genes. Prepare software
2.212 +toolbox for Aim 1.
2.213 June 2010 (milestone): Submit a paper describing a method fulfilling Aim 1. Release toolbox.
2.214 July 2010 (milestone): Submit a paper describing combinations of marker genes for each cortical area, and a
2.215 small number of marker genes that can, in combination, define most of the areas at once
2.216 Revealing new ways to parcellate a structure into regions
2.217 -June 2010-March 2011: Explore dimensionality reduction algorithms for Aim 2. Explore standard hierarchial
2.218 -clustering algorithms, used in combination with dimensionality reduction, for Aim 2. Explore co-clustering algo-
2.219 -rithms. Think about how radial profile information can be used for Aim 2. Adapt clustering algorithms to use radial
2.220 -profile information. Quantitatively compare the performance of different dimensionality reduction and clustering
2.221 -techniques. Quantitatively compare the value of different flatmapping methods and ways of representing radial
2.222 -profiles.
2.223 +June 2010-March 2011: Explore dimensionality reduction algorithms for Aim 2. Explore clustering algorithms.
2.224 +Adapt clustering algorithms to use radial profile information. Compare the performance of techniques.
2.225 March 2011 (milestone): Submit a paper describing a method fulfilling Aim 2. Release toolbox.
2.226 February-May 2011: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If
2.227 -new ways of organizing the cortex into areas are discovered, read the literature and talk to people to learn about
2.228 -research related to interpreting our results. Create documentation and unit tests for software toolbox for Aim 2.
2.229 -Respond to user bug reports for Aim 2 software toolbox.
2.230 +new ways of organizing the cortex into areas are discovered, interpret the results. Prepare software toolbox for
2.231 +Aim 2.
2.232 May 2011 (milestone): Submit a paper on the genomic anatomy of the cortex, using the methods developed in
2.233 Aim 2
2.234 May-August 2011: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1.
2.235 -Follow up on responses to our papers. Possibly submit another paper.
2.236 +Possibly submit another paper.
2.237 Bibliography & References Cited
2.238 [1]Chris Adamson, Leigh Johnston, Terrie Inder, Sandra Rees, Iven Mareels, and Gary Egan. A Tracking
2.239 Approach to Parcellation of the Cerebral Cortex, volume Volume 3749/2005 of Lecture Notes in Computer
3.1 Binary file grant.odt has changed
4.1 Binary file grant.pdf has changed
5.1 --- a/grant.txt Wed Apr 22 06:23:09 2009 -0700
5.2 +++ b/grant.txt Wed Apr 22 06:43:51 2009 -0700
5.3 @@ -79,7 +79,7 @@
5.4
5.5 Although the classifier itself may only look at the gene expression data within each voxel before classifying that voxel, the algorithm which constructs the classifier may look over the entire dataset. We can categorize score-based feature selection methods depending on how the score of calculated. Often the score calculation consists of assigning a sub-score to each voxel, and then aggregating these sub-scores into a final score (the aggregation is often a sum or a sum of squares or average). If only information from nearby voxels is used to calculate a voxel's sub-score, then we say it is a __local scoring method__. If only information from the voxel itself is used to calculate a voxel's sub-score, then we say it is a __pointwise scoring method__.
5.6
5.7 -Both gene expression data and anatomical atlases have errors, due to a variety of factors. Individual subjects have idiosyncratic anatomy. Subjects may be improperly registred to the atlas. The method used to measure gene expression may be noisy. The atlas may have errors. It is even possible that some areas in the anatomical atlas are "wrong" in that they do not have the same shape as the natural domains of gene expression to which they correspond. These sources of error can affect the displacement and the shape of both the gene expression data and the anatomical target areas. Therefore, it is important to use feature selection methods which are robust to these kinds of errors.
5.8 +Both gene expression data and anatomical atlases have errors, due to a variety of factors. Individual subjects have idiosyncratic anatomy. Subjects may be improperly registered to the atlas. The method used to measure gene expression may be noisy. The atlas may have errors. It is even possible that some areas in the anatomical atlas are "wrong" in that they do not have the same shape as the natural domains of gene expression to which they correspond. These sources of error can affect the displacement and the shape of both the gene expression data and the anatomical target areas. Therefore, it is important to use feature selection methods which are robust to these kinds of errors.
5.9
5.10
5.11 === Our strategy for Aim 1 ===
5.12 @@ -146,7 +146,7 @@
5.13
5.14 \cite{lee_high-resolution_2007} mentions the possibility of constructing a spatial region for each gene, and then, for each anatomical structure of interest, computing what proportion of this structure is covered by the gene's spatial region.
5.15
5.16 -GeneAtlas\cite{carson_digital_2005} and EMAGE \cite{venkataraman_emage_2008} allow the user to construct a search query by demarcating regions and then specifing either the strength of expression or the name of another gene or dataset whose expression pattern is to be matched. Neither GeneAtlas nor EMAGE allow one to search for combinations of genes that define a region in concert but not separately.
5.17 +GeneAtlas\cite{carson_digital_2005} and EMAGE \cite{venkataraman_emage_2008} allow the user to construct a search query by demarcating regions and then specifying either the strength of expression or the name of another gene or dataset whose expression pattern is to be matched. Neither GeneAtlas nor EMAGE allow one to search for combinations of genes that define a region in concert but not separately.
5.18
5.19 %% \footnote{For the similiarity score (match score) between two images (in this case, the query and the gene expression images), GeneAtlas uses the sum of a weighted L1-norm distance between vectors whose components represent the number of cells within a pixel (actually, many of these projects use quadrilaterals instead of square pixels; but we will refer to them as pixels for simplicity) whose expression is within four discretization levels. EMAGE uses Jaccard similarity (the number of true pixels in the intersection of the two images, divided by the number of pixels in their union).}
5.20
5.21 @@ -164,7 +164,7 @@
5.22
5.23
5.24
5.25 -\cite{hemert_matching_2008} describes a technique to find combinations of marker genes to pick out an anatomical region. They use an evolutionary algorithm to evolve logical operators which combine boolean (thresholded) images in order to match a target image. Their match score is Jaccard similarity.
5.26 +\cite{hemert_matching_2008} describes a technique to find combinations of marker genes to pick out an anatomical region. They use an evolutionary algorithm to evolve logical operators which combine boolean (thresholded) images in order to match a target image. %%Their match score is Jaccard similarity.
5.27
5.28 In summary, there has been fruitful work on finding marker genes, but only one of the previous projects explores combinations of marker genes, and none of these publications compare the results obtained by using different algorithms or scoring methods.
5.29
5.30 @@ -188,9 +188,9 @@
5.31
5.32 The task of deciding how to carve up a structure into anatomical regions can be put into these terms. The instances are once again voxels (or pixels) along with their associated gene expression profiles. We make the assumption that voxels from the same anatomical region have similar gene expression profiles, at least compared to the other regions. This means that clustering voxels is the same as finding potential regions; we seek a partitioning of the voxels into regions, that is, into clusters of voxels with similar gene expression.
5.33
5.34 -%%It is desirable to determine not just one set of regions, but also how these regions relate to each other, if at all; perhaps some of the regions are more similar to each other than to the rest, suggesting that, although at a fine spatial scale they could be considered separate, on a coarser spatial scale they could be grouped together into one large region. This suggests the outcome of clustering may be a hierarchial tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchial clustering.
5.35 -
5.36 -It is desirable to determine not just one set of regions, but also how these regions relate to each other. The outcome of clustering may be a hierarchial tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchial clustering.
5.37 +%%It is desirable to determine not just one set of regions, but also how these regions relate to each other, if at all; perhaps some of the regions are more similar to each other than to the rest, suggesting that, although at a fine spatial scale they could be considered separate, on a coarser spatial scale they could be grouped together into one large region. This suggests the outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
5.38 +
5.39 +It is desirable to determine not just one set of regions, but also how these regions relate to each other. The outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
5.40
5.41
5.42 \vspace{0.3cm}**Similarity scores**
5.43 @@ -231,7 +231,7 @@
5.44
5.45 Gene clusters could be used as part of dimensionality reduction: rather than have one feature for each gene, we could have one reduced feature for each gene cluster.
5.46
5.47 -Gene clusters could also be used to directly yield a clustering on instances. This is because many genes have an expression pattern which seems to pick out a single, spatially continguous region. This suggests the following procedure: cluster together genes which pick out similar regions, and then to use the more popular common regions as the final clusters. In Preliminary Studies, Figure \ref{geneClusters}, we show that a number of anatomically recognized cortical regions, as well as some "superregions" formed by lumping together a few regions, are associated with gene clusters in this fashion.
5.48 +Gene clusters could also be used to directly yield a clustering on instances. This is because many genes have an expression pattern which seems to pick out a single, spatially contiguous region. This suggests the following procedure: cluster together genes which pick out similar regions, and then to use the more popular common regions as the final clusters. In Preliminary Studies, Figure \ref{geneClusters}, we show that a number of anatomically recognized cortical regions, as well as some "superregions" formed by lumping together a few regions, are associated with gene clusters in this fashion.
5.49
5.50 %% Therefore, it seems likely that an anatomically interesting region will have multiple genes which each individually pick it out\footnote{This would seem to contradict our finding in aim 1 that some cortical areas are combinatorially coded by multiple genes. However, it is possible that the currently accepted cortical maps divide the cortex into regions which are unnatural from the point of view of gene expression; perhaps there is some other way to map the cortex for which each region can be identified by single genes. Another possibility is that, although the cluster prototype fits an anatomical region, the individual genes are each somewhat different from the prototype.}.
5.51
5.52 @@ -249,20 +249,20 @@
5.53 \cite{thompson_genomic_2008} describes an analysis of the anatomy of
5.54 the hippocampus using the ABA dataset. In addition to manual analysis,
5.55 two clustering methods were employed, a modified Non-negative Matrix
5.56 -Factorization (NNMF), and a hierarchial recursive bifurcation clustering scheme based on correlation as the similarity score. The paper yielded impressive results, proving the usefulness of computational genomic anatomy. We have run NNMF on the cortical dataset
5.57 -
5.58 -%% \footnote{We ran "vanilla" NNMF, whereas the paper under discussion used a modified method. Their main modification consisted of adding a soft spatial contiguity constraint. However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was needed. The paper under discussion also mentions that they tried a hierarchial variant of NNMF, which we have not yet tried.} and while the results are promising, they also demonstrate that NNMF is not necessarily the best dimensionality reduction method for this application (see Preliminary Studies, Figure \ref{dimReduc}).
5.59 +Factorization (NNMF), and a hierarchical recursive bifurcation clustering scheme based on correlation as the similarity score. The paper yielded impressive results, proving the usefulness of computational genomic anatomy. We have run NNMF on the cortical dataset
5.60 +
5.61 +%% \footnote{We ran "vanilla" NNMF, whereas the paper under discussion used a modified method. Their main modification consisted of adding a soft spatial contiguity constraint. However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was needed. The paper under discussion also mentions that they tried a hierarchical variant of NNMF, which we have not yet tried.} and while the results are promising, they also demonstrate that NNMF is not necessarily the best dimensionality reduction method for this application (see Preliminary Studies, Figure \ref{dimReduc}).
5.62
5.63 %% In addition, this paper described a visual screening of the data, specifically, a visual analysis of 6000 genes with the primary purpose of observing how the spatial pattern of their expression coincided with the regions that had been identified by NNMF. We propose to do this sort of screening automatically, which would yield an objective, quantifiable result, rather than qualitative observations.
5.64
5.65 -%% \cite{thompson_genomic_2008} reports that both mNNMF and hierarchial mNNMF clustering were useful, and that hierarchial recursive bifurcation gave similar results.
5.66 -
5.67 -
5.68 -AGEA\cite{ng_anatomic_2009} includes a preset hierarchial clustering of voxels based on a recursive bifurcation algorithm with correlation as the similarity metric. EMAGE\cite{venkataraman_emage_2008} allows the user to select a dataset from among a large number of alternatives, or by running a search query, and then to cluster the genes within that dataset. EMAGE clusters via hierarchial complete linkage clustering with un-centred correlation as the similarity score.
5.69 +%% \cite{thompson_genomic_2008} reports that both mNNMF and hierarchical mNNMF clustering were useful, and that hierarchical recursive bifurcation gave similar results.
5.70 +
5.71 +
5.72 +AGEA\cite{ng_anatomic_2009} includes a preset hierarchical clustering of voxels based on a recursive bifurcation algorithm with correlation as the similarity metric. EMAGE\cite{venkataraman_emage_2008} allows the user to select a dataset from among a large number of alternatives, or by running a search query, and then to cluster the genes within that dataset. EMAGE clusters via hierarchical complete linkage clustering. %% with un-centered correlation as the similarity score.
5.73
5.74 %%\cite{chin_genome-scale_2007} clustered genes, starting out by selecting 135 genes out of 20,000 which had high variance over voxels and which were highly correlated with many other genes. They computed the matrix of (rank) correlations between pairs of these genes, and ordered the rows of this matrix as follows: "the first row of the matrix was chosen to show the strongest contrast between the highest and lowest correlation coefficient for that row. The remaining rows were then arranged in order of decreasing similarity using a least squares metric". The resulting matrix showed four clusters. For each cluster, prototypical spatial expression patterns were created by averaging the genes in the cluster. The prototypes were analyzed manually, without clustering voxels.
5.75
5.76 -\cite{chin_genome-scale_2007} clustered genes. For each cluster, prototypical spatial expression patterns were created by averaging the genes in the cluster. The prototypes were analyzed manually, without clustering voxels.
5.77 +\cite{chin_genome-scale_2007} clusters genes. For each cluster, prototypical spatial expression patterns were created by averaging the genes in the cluster. The prototypes were analyzed manually, without clustering voxels.
5.78
5.79 \cite{hemert_matching_2008} applies their technique for finding combinations of marker genes for the purpose of clustering genes around a "seed gene". %%They do this by using the pattern of expression of the seed gene as the target image, and then searching for other genes which can be combined to reproduce this pattern. Other genes which are found are considered to be related to the seed. The same team also describes a method\cite{van_hemert_mining_2007} for finding "association rules" such as, "if this voxel is expressed in by any gene, then that voxel is probably also expressed in by the same gene". This could be useful as part of a procedure for clustering voxels.
5.80
5.81 @@ -309,12 +309,12 @@
5.82
5.83 === Related work ===
5.84
5.85 -\cite{ng_anatomic_2009} describes the application of AGEA to the cortex. The paper describes interesting results on the structure of correlations between voxel gene expression profiles within a handful of cortical areas. However, this sort of analysis is not related to either of our aims, as it neither finds marker genes, nor does it suggest a cortical map based on gene expression data. Neither of the other components of AGEA can be applied to cortical areas; AGEA's Gene Finder cannot be used to find marker genes for the cortical areas; and AGEA's hierarchial clustering does not produce clusters corresponding to the cortical areas\footnote{In both cases, the cause is that pairwise correlations between the gene expression of voxels in different areas but the same layer are often stronger than pairwise correlations between the gene expression of voxels in different layers but the same area. Therefore, a pairwise voxel correlation clustering algorithm will tend to create clusters representing cortical layers, not areas (there may be clusters which presumably correspond to the intersection of a layer and an area, but since one area will have many layer-area intersection clusters, further work is needed to make sense of these). The reason that Gene Finder cannot the find marker genes for cortical areas is that, although the user chooses a seed voxel, Gene Finder chooses the ROI for which genes will be found, and it creates that ROI by (pairwise voxel correlation) clustering around the seed.}.
5.86 +\cite{ng_anatomic_2009} describes the application of AGEA to the cortex. The paper describes interesting results on the structure of correlations between voxel gene expression profiles within a handful of cortical areas. However, this sort of analysis is not related to either of our aims, as it neither finds marker genes, nor does it suggest a cortical map based on gene expression data. Neither of the other components of AGEA can be applied to cortical areas; AGEA's Gene Finder cannot be used to find marker genes for the cortical areas; and AGEA's hierarchical clustering does not produce clusters corresponding to the cortical areas\footnote{In both cases, the cause is that pairwise correlations between the gene expression of voxels in different areas but the same layer are often stronger than pairwise correlations between the gene expression of voxels in different layers but the same area. Therefore, a pairwise voxel correlation clustering algorithm will tend to create clusters representing cortical layers, not areas (there may be clusters which presumably correspond to the intersection of a layer and an area, but since one area will have many layer-area intersection clusters, further work is needed to make sense of these). The reason that Gene Finder cannot find marker genes for cortical areas is that, although the user chooses a seed voxel, Gene Finder chooses the ROI for which genes will be found, and it creates that ROI by (pairwise voxel correlation) clustering around the seed.}.
5.87
5.88
5.89 %% Most of the projects which have been discussed have been done by the same groups that develop the public datasets. Although these projects make their algorithms available for use on their own website, none of them have released an open-source software toolkit; instead, users are restricted to using the provided algorithms only on their own dataset.
5.90
5.91 -In summary, for all three aims, (a) only one of the previous projects explores combinations of marker genes, (b) there has been almost no comparison of different algorithms or scoring methods, and (c) there has been no work on computationally finding marker genes for cortical areas, or on finding a hierarchial clustering that will yield a map of cortical areas de novo from gene expression data.
5.92 +In summary, for all three aims, (a) only one of the previous projects explores combinations of marker genes, (b) there has been almost no comparison of different algorithms or scoring methods, and (c) there has been no work on computationally finding marker genes for cortical areas, or on finding a hierarchical clustering that will yield a map of cortical areas de novo from gene expression data.
5.93
5.94 Our project is guided by a concrete application with a well-specified criterion of success (how well we can find marker genes for \begin{latex}/\end{latex} reproduce the layout of cortical areas), which will provide a solid basis for comparing different methods.
5.95
5.96 @@ -322,7 +322,7 @@
5.97 == Significance ==
5.98 \begin{wrapfigure}{L}{0.5\textwidth}\centering
5.99 \includegraphics[scale=.2]{cosine_similarity1_rearrange_colorize.eps}
5.100 -\caption{Prototypes corresponding to sample gene clusters, clustered by gradient similarity. Region boundaries for the region that most matches each prototype are overlayed.}
5.101 +\caption{Prototypes corresponding to sample gene clusters, clustered by gradient similarity. Region boundaries for the region that most matches each prototype are overlaid.}
5.102 \label{geneClusters}\end{wrapfigure}
5.103
5.104
5.105 @@ -443,14 +443,14 @@
5.106
5.107 \vspace{0.3cm}**Combinations of multiple genes are useful and necessary for some areas**
5.108
5.109 -In Figure \ref{MOcombo}, we give an example of a cortical area which is not marked by any single gene, but which can be identified combinatorially. Acccording to logistic regression, gene wwc1 is the best fit single gene for predicting whether or not a pixel on the cortical surface belongs to the motor area (area MO). The upper-left picture in Figure \ref{MOcombo} shows wwc1's spatial expression pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene, but the gene overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding to the overshoot is the medial surface of the cortex. MO is only found on the dorsal surface. Gene mtif2 is shown in the upper-right. Mtif2 captures MO's upper-left boundary, but not its lower-right boundary. Mtif2 does not express very much on the medial surface. By adding together the values at each pixel in these two figures, we get the lower-left image. This combination captures area MO much better than any single gene.
5.110 +In Figure \ref{MOcombo}, we give an example of a cortical area which is not marked by any single gene, but which can be identified combinatorially. According to logistic regression, gene wwc1 is the best fit single gene for predicting whether or not a pixel on the cortical surface belongs to the motor area (area MO). The upper-left picture in Figure \ref{MOcombo} shows wwc1's spatial expression pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene, but the gene overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding to the overshoot is the medial surface of the cortex. MO is only found on the dorsal surface. Gene mtif2 is shown in the upper-right. Mtif2 captures MO's upper-left boundary, but not its lower-right boundary. Mtif2 does not express very much on the medial surface. By adding together the values at each pixel in these two figures, we get the lower-left image. This combination captures area MO much better than any single gene.
5.111
5.112 This shows that our proposal to develop a method to find combinations of marker genes is both possible and necessary.
5.113
5.114 %% wwc1\footnote{"WW, C2 and coiled-coil domain containing 1"; EntrezGene ID 211652}
5.115 %% mtif2\footnote{"mitochondrial translational initiation factor 2"; EntrezGene ID 76784}
5.116
5.117 -%%Acccording to logistic regression, gene wwc1\footnote{"WW, C2 and coiled-coil domain containing 1"; EntrezGene ID 211652} is the best fit single gene for predicting whether or not a pixel on the cortical surface belongs to the motor area (area MO). The upper-left picture in Figure \ref{MOcombo} shows wwc1's spatial expression pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene, but the gene overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding to the overshoot is the medial surface of the cortex. MO is only found on the lateral surface.
5.118 +%%According to logistic regression, gene wwc1\footnote{"WW, C2 and coiled-coil domain containing 1"; EntrezGene ID 211652} is the best fit single gene for predicting whether or not a pixel on the cortical surface belongs to the motor area (area MO). The upper-left picture in Figure \ref{MOcombo} shows wwc1's spatial expression pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene, but the gene overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding to the overshoot is the medial surface of the cortex. MO is only found on the lateral surface.
5.119
5.120 %%Gene mtif2\footnote{"mitochondrial translational initiation factor 2"; EntrezGene ID 76784} is shown in the upper-right of Figure \ref{MOcombo}. Mtif2 captures MO's upper-left boundary, but not its lower-right boundary. Mtif2 does not express very much on the medial surface. By adding together the values at each pixel in these two figures, we get the lower-left of Figure \ref{MOcombo}. This combination captures area MO much better than any single gene.
5.121
5.122 @@ -466,7 +466,7 @@
5.123
5.124
5.125 \vspace{0.3cm}**Forward stepwise logistic regression**
5.126 -Logistic regression is a popular method for predictive modeling of categorial data. As a pilot run, for five cortical areas (SS, AUD, RSP, VIS, and MO), we performed forward stepwise logistic regression to find single genes, pairs of genes, and triplets of genes which predict areal identify. This is an example of feature selection integrated with prediction using a stepwise wrapper. Some of the single genes found were shown in various figures throughout this document, and Figure \ref{MOcombo} shows a combination of genes which was found.
5.127 +Logistic regression is a popular method for predictive modeling of categorical data. As a pilot run, for five cortical areas (SS, AUD, RSP, VIS, and MO), we performed forward stepwise logistic regression to find single genes, pairs of genes, and triplets of genes which predict areal identity. This is an example of feature selection integrated with prediction using a stepwise wrapper. Some of the single genes found were shown in various figures throughout this document, and Figure \ref{MOcombo} shows a combination of genes which was found.
5.128
5.129 %%We felt that, for single genes, gradient similarity did a better job than logistic regression at capturing our subjective impression of a "good gene".
5.130
5.131 @@ -489,7 +489,7 @@
5.132
5.133
5.134
5.135 -After applying the dimensionality reduction, we ran clustering algorithms on the reduced data. To date we have tried k-means and spectral clustering. The results of k-means after PCA, NNMF, and landmark Isomap are shown in the last row of Figure \ref{dimReduc}. To compare, the leftmost picture on the bottom row of Figure \ref{dimReduc} shows some of the major subdivisions of cortex. These results clearly show that different dimensionality reduction techniques capture different aspects of the data and lead to different clusterings, indicating the utility of our proposal to produce a detailed comparion of these techniques as applied to the domain of genomic anatomy.
5.136 +After applying the dimensionality reduction, we ran clustering algorithms on the reduced data. To date we have tried k-means and spectral clustering. The results of k-means after PCA, NNMF, and landmark Isomap are shown in the last row of Figure \ref{dimReduc}. To compare, the leftmost picture on the bottom row of Figure \ref{dimReduc} shows some of the major subdivisions of cortex. These results clearly show that different dimensionality reduction techniques capture different aspects of the data and lead to different clusterings, indicating the utility of our proposal to produce a detailed comparison of these techniques as applied to the domain of genomic anatomy.
5.137
5.138
5.139
5.140 @@ -533,7 +533,7 @@
5.141
5.142 Some cortical areas have no single marker genes but can be identified by combinatorial coding. This requires multivariate scoring measures and feature selection procedures. Many of the measures, such as expression energy, gradient similarity, Jaccard, Dice, Hough, Student's t, and Mann-Whitney U are univariate. We will extend these scoring measures for use in multivariate feature selection, that is, for scoring how well combinations of genes, rather than individual genes, can distinguish a target area. There are existing multivariate forms of some of the univariate scoring measures, for example, Hotelling's T-square is a multivariate analog of Student's t.
5.143
5.144 -We will develop a feature selection procedure for choosing the best small set of marker genes for a given anatomical area. In addition to using the scoring measures that we develop, we will also explore (a) feature selection using a stepwise wrapper over "vanilla" classifiers such as logistic regression, (b) supervised learning methods such as decision trees which incrementally/greedily combine single gene markers into sets, and (c) supervised learning methods which use soft constraints to minimize number of features used, such as sparse support vector machines.
5.145 +We will develop a feature selection procedure for choosing the best small set of marker genes for a given anatomical area. In addition to using the scoring measures that we develop, we will also explore (a) feature selection using a stepwise wrapper over "vanilla" classifiers such as logistic regression, (b) supervised learning methods such as decision trees which incrementally/greedily combine single gene markers into sets, and (c) supervised learning methods which use soft constraints to minimize the number of features used, such as sparse support vector machines (SVMs).
5.146
5.147 Since errors of displacement and of shape may cause genes and target areas to match less than they should, we will consider the robustness of feature selection methods in the presence of error. Some of these methods, such as the Hough transform, are designed to be resistant in the presence of error, but many are not. We will consider extensions to scoring measures that may improve their robustness; for example, a wrapper that runs a scoring method on small displacements and distortions of the data adds robustness to registration error at the expense of computation time.
5.148
5.149 @@ -552,8 +552,9 @@
5.150 We have already described the application of ten dimensionality reduction algorithms for the purpose of replacing the gene expression profiles, which are vectors of about 4000 gene expression levels, with a smaller number of features. We plan to further explore and interpret these results, as well as to apply other unsupervised learning algorithms, including independent components analysis, self-organizing maps, and generative models such as deep Boltzmann machines. We will explore ways to quantitatively compare the relevance of the different dimensionality reduction methods for identifying cortical areal boundaries.
5.151
5.152 \vspace{0.3cm}**Dimensionality reduction on pixels**
5.153 -Instead of applying dimensionality reduction to the gene expression profiles, the same techniques can be applied instead to the pixels\footnote{Consider a matrix whose rows represent pixel locations, and whose columns represent genes. An entry in this matrix represents the gene expression level at a given pixel. One can look at this matrix as a collection of pixels, each corresponding to a vector of many gene expression levels; or one can look at it as a collection of genes, each corresponding to a vector giving that gene's expression at each pixel. Similarly, dimensionality reduction can be used to replace a large number of genes with a small number of features, or it can be used to replace a large number of pixels with a small number of features.}. It is possible that the features generated in this way by some dimensionality reduction techniques will directly correspond to interesting spatial regions.
5.154 -
5.155 +Instead of applying dimensionality reduction to the gene expression profiles, the same techniques can be applied instead to the pixels. It is possible that the features generated in this way by some dimensionality reduction techniques will directly correspond to interesting spatial regions.
5.156 +
5.157 +%% \footnote{Consider a matrix whose rows represent pixel locations, and whose columns represent genes. An entry in this matrix represents the gene expression level at a given pixel. One can look at this matrix as a collection of pixels, each corresponding to a vector of many gene expression levels; or one can look at it as a collection of genes, each corresponding to a vector giving that gene's expression at each pixel. Similarly, dimensionality reduction can be used to replace a large number of genes with a small number of features, or it can be used to replace a large number of pixels with a small number of features.}
5.158
5.159 \vspace{0.3cm}**Clustering and segmentation on pixels**
5.160 We will explore clustering and segmentation algorithms in order to segment the pixels into regions. We will explore k-means, spectral clustering, gene shaving\cite{hastie_gene_2000}, recursive division clustering, multivariate generalizations of edge detectors, multivariate generalizations of watershed transformations, region growing, active contours, graph partitioning methods, and recursive agglomerative clustering with various linkage functions. These methods can be combined with dimensionality reduction.
5.161 @@ -564,7 +565,7 @@
5.162 In addition to using the cluster expression prototypes directly to identify spatial regions, this might be useful as a component of dimensionality reduction. For example, one could imagine clustering similar genes and then replacing their expression levels with a single average expression level, thereby removing some redundancy from the gene expression profiles. One could then perform clustering on pixels (possibly after a second dimensionality reduction step) in order to identify spatial regions. It remains to be seen whether removal of redundancy would help or hurt the ultimate goal of identifying interesting spatial regions.
5.163
5.164 \vspace{0.3cm}**Co-clustering**
5.165 -There are some algorithms which simultaineously incorporate clustering on instances and on features (in our case, genes and pixels), for example, IRM\cite{kemp_learning_2006}. These are called co-clustering or biclustering algorithms.
5.166 +There are some algorithms which simultaneously incorporate clustering on instances and on features (in our case, genes and pixels), for example, IRM\cite{kemp_learning_2006}. These are called co-clustering or biclustering algorithms.
5.167
5.168 \vspace{0.3cm}**Radial profiles**
5.169 We will explore the use of the radial profile of gene expression under each pixel.
5.170 @@ -583,7 +584,7 @@
5.171
5.172 Because in most cases the ABA coronal dataset only contains one ISH per gene, it is possible for an unrelated combination of genes to seem to identify an area when in fact it is only coincidence. There are two ways we will validate our marker genes to guard against this. First, we will confirm that putative combinations of marker genes express the same pattern in both hemispheres. Second, we will manually validate our final results on other gene expression datasets such as EMAGE, GeneAtlas, and GENSAT\cite{gong_gene_2003}.
5.173
5.174 -Using the methods developed in Aim 2, we will present one or more hierarchial cortical maps. We will identify and explain how the statistical structure in the gene expression data led to any unexpected or interesting features of these maps, and we will provide biological hypotheses to interpret any new cortical areas, or groupings of areas, which are discovered.
5.175 +Using the methods developed in Aim 2, we will present one or more hierarchical cortical maps. We will identify and explain how the statistical structure in the gene expression data led to any unexpected or interesting features of these maps, and we will provide biological hypotheses to interpret any new cortical areas, or groupings of areas, which are discovered.
5.176
5.177
5.178
5.179 @@ -600,18 +601,18 @@
5.180 \vspace{0.3cm}**Finding marker genes**
5.181 \\ **September-November 2009**: Develop an automated mechanism for segmenting the cortical voxels into layers
5.182 \\ **November 2009 (milestone)**: Have completed construction of a flatmapped, cortical dataset with information for each layer
5.183 -\\ **October 2009-April 2010**: Develop scoring methods and to test them in various supervised learning frameworks. Also test out various dimensionality reduction schemes in combination with supervised learning. create or extend supervised learning frameworks which use multivariate versions of the best scoring methods.
5.184 +\\ **October 2009-April 2010**: Develop scoring methods, dimensionality reduction, and supervised learning methods.
5.185 \\ **January 2010 (milestone)**: Submit a publication on single marker genes for cortical areas
5.186 -\\ **February-July 2010**: Continue to develop scoring methods and supervised learning frameworks. Explore the best way to integrate radial profiles with supervised learning. Explore the best way to make supervised learning techniques robust against incorrect labels (i.e. when the areas drawn on the input cortical map are slightly off). Quantitatively compare the performance of different supervised learning techniques. Validate marker genes found in the ABA dataset by checking against other gene expression datasets. Create documentation and unit tests for software toolbox for Aim 1. Respond to user bug reports for Aim 1 software toolbox.
5.187 +\\ **February-July 2010**: Continue to develop scoring methods and supervised learning frameworks. Extend techniques for robustness. Compare the performance of techniques. Validate marker genes. Prepare software toolbox for Aim 1.
5.188 \\ **June 2010 (milestone)**: Submit a paper describing a method fulfilling Aim 1. Release toolbox.
5.189 \\ **July 2010 (milestone)**: Submit a paper describing combinations of marker genes for each cortical area, and a small number of marker genes that can, in combination, define most of the areas at once
5.190
5.191 \vspace{0.3cm}**Revealing new ways to parcellate a structure into regions**
5.192 -\\ **June 2010-March 2011**: Explore dimensionality reduction algorithms for Aim 2. Explore standard hierarchial clustering algorithms, used in combination with dimensionality reduction, for Aim 2. Explore co-clustering algorithms. Think about how radial profile information can be used for Aim 2. Adapt clustering algorithms to use radial profile information. Quantitatively compare the performance of different dimensionality reduction and clustering techniques. Quantitatively compare the value of different flatmapping methods and ways of representing radial profiles.
5.193 +\\ **June 2010-March 2011**: Explore dimensionality reduction algorithms for Aim 2. Explore clustering algorithms. Adapt clustering algorithms to use radial profile information. Compare the performance of techniques.
5.194 \\ **March 2011 (milestone)**: Submit a paper describing a method fulfilling Aim 2. Release toolbox.
5.195 -\\ **February-May 2011**: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If new ways of organizing the cortex into areas are discovered, read the literature and talk to people to learn about research related to interpreting our results. Create documentation and unit tests for software toolbox for Aim 2. Respond to user bug reports for Aim 2 software toolbox.
5.196 +\\ **February-May 2011**: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If new ways of organizing the cortex into areas are discovered, interpret the results. Prepare software toolbox for Aim 2.
5.197 \\ **May 2011 (milestone)**: Submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
5.198 -\\ **May-August 2011**: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1. Follow up on responses to our papers. Possibly submit another paper.
5.199 +\\ **May-August 2011**: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1. Possibly submit another paper.
5.200
5.201 \newpage
5.202