cg
changeset 89:79f51f8c878b
.
author | bshanks@bshanks.dyndns.org |
---|---|
date | Tue Apr 21 05:50:39 2009 -0700 (16 years ago) |
parents | ae1e1da359d2 |
children | 9e85d264837c |
files | grant.doc grant.html grant.odt grant.pdf grant.txt |
line diff
1.1 Binary file grant.doc has changed
2.1 --- a/grant.html Tue Apr 21 05:38:52 2009 -0700
2.2 +++ b/grant.html Tue Apr 21 05:50:39 2009 -0700
2.3 @@ -363,6 +363,9 @@
2.4 surface pixel
2.5 ∙ For each gene, a 2-D matrix whose entries represent the average expression
2.6 level underneath each surface pixel
2.7 +_________________________________________
2.8 + 16SEV is a sparse format for spatial data. It is the format in which the ABA data is made available.
2.9 +
2.10
2.11 Figure 2: Gene Pitx2
2.12 is selectively underex-
2.13 @@ -377,8 +380,8 @@
2.14 Cortical layers are found at different depths in different parts of the cortex. In preparation for
2.15 extracting the layer-specific datasets, we have extended Caret with routines that allow the depth
2.16 of the ROI for volume-to-surface projection to vary.
2.17 - In the Research Plan, we describe how we will automatically locate the layer depths. For
2.18 -validation, we have manually demarcated the depth of the outer boundary of cortical layer 5 throughout the cortex.
2.19 +In the Research Plan, we describe how we will automatically locate the layer depths. For validation, we have manually
2.20 +demarcated the depth of the outer boundary of cortical layer 5 throughout the cortex.
2.21 Feature selection and scoring methods
2.22 Underexpression of a gene can serve as a marker Underexpression of a gene can sometimes serve as a marker. See,
2.23 for example, Figure 2.
2.24 @@ -387,8 +390,6 @@
2.25 surface pixels.
2.26 One class of feature selection scoring methods contains methods which calculate some sort of “match” between each gene
2.27 image and the target image. Those genes which match the best are good candidates for features.
2.28 -_________________________________________
2.29 - 16SEV is a sparse format for spatial data. It is the format in which the ABA data is made available.
2.30 One of the simplest methods in this class is to use correlation as the match score. We calculated the correlation between
2.31 each gene and each cortical area. The top row of Figure 1 shows the three genes most correlated with area SS.
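As a rough illustration of correlation as a match score, the sketch below ranks genes by the Pearson correlation between each gene's per-pixel expression and a binary area mask. It is not the code used for these studies: the function names (`pearson`, `rank_genes_by_correlation`), the flattened-list image representation, and the toy data are all assumptions made for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant image as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def rank_genes_by_correlation(gene_images, area_mask):
    """Return (gene, score) pairs sorted from best to worst match."""
    scores = {name: pearson(img, area_mask) for name, img in gene_images.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: six surface pixels, area occupies pixels 2 and 3.
area = [0, 0, 1, 1, 0, 0]
genes = {
    "gene_a": [0.1, 0.2, 0.9, 0.8, 0.1, 0.0],  # expressed mostly inside the area
    "gene_b": [0.9, 0.8, 0.1, 0.2, 0.7, 0.9],  # underexpressed inside the area
    "gene_c": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],  # uniform expression
}
ranking = rank_genes_by_correlation(genes, area)
```

In this toy case `gene_a` ranks first and the underexpressed `gene_b` ranks last with a strongly negative score. Ranking by absolute value instead would surface underexpression markers as well.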
2.32
2.33 @@ -427,47 +428,73 @@
2.34 this reason we designed a non-pointwise local scoring method to detect when a gene had a pattern of expression which
2.35 looked like it had a boundary whose shape is similar to the shape of the target region. We call this scoring method “gradient
2.36 similarity”.
2.37 +One might say that gradient similarity attempts to measure how much the border of the area of gene expression and
2.38 +the border of the target region overlap. However, since gene expression falls off continuously rather than jumping from its
2.39 +maximum value to zero, the spatial pattern of a gene’s expression often does not have a discrete border. Therefore, instead
2.40 +of looking for a discrete border, we look for large gradients. Gradient similarity is a symmetric function over two images
2.41 +(i.e. two scalar fields). It is high to the extent that matching pixels which have large values and large gradients also have
2.42 +gradients which are oriented in a similar direction. The formula is:
2.43 + $$\sum_{\text{pixel} \in \text{pixels}} \cos\left(\left|\angle\nabla_1 - \angle\nabla_2\right|\right) \cdot \frac{|\nabla_1| + |\nabla_2|}{2} \cdot \frac{\text{pixel\_value}_1 + \text{pixel\_value}_2}{2}$$
2.47
2.48
2.49 Figure 4: Upper left: wwc1. Upper right:
2.50 mtif2. Lower left: wwc1 + mtif2 (each
2.51 pixel’s value on the lower left is the sum of
2.52 -the corresponding pixels in the upper row). One might say that gradient similarity attempts to measure how much the
2.53 - border of the area of gene expression and the border of the target region over-
2.54 - lap. However, since gene expression falls off continuously rather than jumping
2.55 - from its maximum value to zero, the spatial pattern of a gene’s expression often
2.56 - does not have a discrete border. Therefore, instead of looking for a discrete
2.57 - border, we look for large gradients. Gradient similarity is a symmetric function
2.58 - over two images (i.e. two scalar fields). It is high to the extent that matching
2.59 - pixels which have large values and large gradients also have gradients which
2.60 - are oriented in a similar direction. The formula is:
2.61 - $$\sum_{\text{pixel} \in \text{pixels}} \cos\left(\left|\angle\nabla_1 - \angle\nabla_2\right|\right) \cdot \frac{|\nabla_1| + |\nabla_2|}{2} \cdot \frac{\text{pixel\_value}_1 + \text{pixel\_value}_2}{2}$$
2.65 - where ∇1 and ∇2 are the gradient vectors of the two images at the current
2.66 +the corresponding pixels in the upper row). where ∇1 and ∇2 are the gradient vectors of the two images at the current
2.67 pixel; ∠∇i is the angle of the gradient of image i at the current pixel; |∇i| is
2.68 the magnitude of the gradient of image i at the current pixel; and pixel_valuei
2.69 is the value of the current pixel in image i.
2.70 The intuition is that we want to see if the borders of the pattern in the
2.71 two images are similar; if the borders are similar, then both images will have
2.72 corresponding pixels with large gradients (because this is a border) which are
2.73 -oriented in a similar direction (because the borders are similar).
2.74 -Most of the genes in Figure 5 were identified via gradient similarity.
2.75 -Gradient similarity provides information complementary to correlation
2.76 -To show that gradient similarity can provide useful information that cannot be detected via pointwise analyses, consider
2.77 -Fig. 3. The top row of Fig. 3 displays the 3 genes which most match area AUD, according to a pointwise method17. The
2.78 + oriented in a similar direction (because the borders are similar).
2.79 + Most of the genes in Figure 5 were identified via gradient similarity.
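The gradient similarity formula can be sketched in a few lines of Python. This is only an illustration of the formula as stated, not the implementation used for the results reported here. Several details are assumptions: images are lists of rows, gradients are estimated with simple finite differences (central in the interior, one-sided at the edges), and pixels where either gradient is exactly zero are skipped because the gradient angle is undefined there. The names `gradient` and `gradient_similarity` are hypothetical.

```python
import math

def gradient(img, r, c):
    """Finite-difference gradient (dx, dy) of a 2-D list-of-rows image at (r, c)."""
    rows, cols = len(img), len(img[0])
    r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
    c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
    dy = (img[r1][c] - img[r0][c]) / (r1 - r0) if r1 > r0 else 0.0
    dx = (img[r][c1] - img[r][c0]) / (c1 - c0) if c1 > c0 else 0.0
    return dx, dy

def gradient_similarity(img1, img2):
    """Sum over pixels of cos(|angle1 - angle2|) * mean gradient magnitude * mean pixel value."""
    total = 0.0
    for r in range(len(img1)):
        for c in range(len(img1[0])):
            dx1, dy1 = gradient(img1, r, c)
            dx2, dy2 = gradient(img2, r, c)
            mag1 = math.hypot(dx1, dy1)
            mag2 = math.hypot(dx2, dy2)
            if mag1 == 0.0 or mag2 == 0.0:
                continue  # assumption: skip pixels with an undefined gradient angle
            angle_diff = abs(math.atan2(dy1, dx1) - math.atan2(dy2, dx2))
            total += math.cos(angle_diff) * (mag1 + mag2) / 2 * (img1[r][c] + img2[r][c]) / 2
    return total

# Hypothetical toy images: a vertical step edge and its mirror image.
step = [[0, 0, 1, 1] for _ in range(3)]
flipped = [[1, 1, 0, 0] for _ in range(3)]

same_score = gradient_similarity(step, step)         # borders aligned, same orientation
opposite_score = gradient_similarity(step, flipped)  # borders aligned, opposite orientation
```

With these toy images the score is positive when the two borders point the same way and negative when they point in opposite directions, which matches the intuition that similar borders should produce large, similarly oriented gradients at corresponding pixels.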
2.80 + Gradient similarity provides information complementary to cor-
2.81 + relation
2.82 + To show that gradient similarity can provide useful information that cannot
2.83 + be detected via pointwise analyses, consider Fig. 3. The top row of Fig. 3
2.84 + displays the 3 genes which most match area AUD, according to a pointwise
2.85 + method17. The bottom row displays the 3 genes which most match AUD
2.86 + according to a method which considers local geometry18. The pointwise method
2.87 + in the top row identifies genes which express more strongly in AUD than out-
2.88 + side of it; its weakness is that this includes many areas which don’t have a
2.89 + salient border matching the areal border. The geometric method identifies
2.90 +genes whose salient expression border seems to partially line up with the border of AUD; its weakness is that this includes
2.91 +genes which don’t express over the entire area. Genes which have high rankings using both pointwise and border criteria,
2.92 +such as Aph1a in the example, may be particularly good markers. None of these genes are, individually, a perfect marker
2.93 +for AUD; we deliberately chose a “difficult” area in order to better contrast pointwise with geometric methods.
2.94 +Areas which can be identified by single genes Using gradient similarity, we have already found single genes which
2.95 +roughly identify some areas and groupings of areas. For each of these areas, an example of a gene which roughly identifies
2.96 +it is shown in Figure 5. We have not yet cross-verified these genes in other atlases.
2.97 +In addition, there are a number of areas which are almost identified by single genes: COAa+NLOT (anterior part of
2.98 +cortical amygdalar area, nucleus of the lateral olfactory tract), ENT (entorhinal), ACAv (ventral anterior cingulate), VIS
2.99 +(visual), AUD (auditory).
2.100 +These results validate our expectation that the ABA dataset can be exploited to find marker genes for many cortical
2.101 +areas, while also validating the relevancy of our new scoring method, gradient similarity.
2.102 +Combinations of multiple genes are useful and necessary for some areas
2.103 +In Figure 4, we give an example of a cortical area which is not marked by any single gene, but which can be identified
2.104 +combinatorially. According to logistic regression, gene wwc1 is the best fit single gene for predicting whether or not a
2.105 +pixel on the cortical surface belongs to the motor area (area MO). The upper-left picture in Figure 4 shows wwc1’s spatial
2.106 +expression pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene, but the
2.107 +gene overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding
2.108 +to the overshoot is the medial surface of the cortex. MO is only found on the dorsal surface. Gene mtif2 is shown in the
2.109 +upper-right. Mtif2 captures MO’s upper-left boundary, but not its lower-right boundary. Mtif2 does not express very much
2.110 +on the medial surface. By adding together the values at each pixel in these two figures, we get the lower-left image. This
2.111 +combination captures area MO much better than any single gene.
2.112 +This shows that our proposal to develop a method to find combinations of marker genes is both possible and necessary.
2.113 +Feature selection integrated with prediction As noted earlier, in general, any predictive method can be used for
2.114 +feature selection by running it inside a stepwise wrapper. Also, some predictive methods integrate soft constraints on number
2.115 +of features used. Examples of both of these will be seen in the section “Multivariate Predictive methods”.
2.116 _________________________________________
2.117 17For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor
2.118 variable was the value of the expression of the gene underneath that pixel. The resulting scores were used to rank the genes in terms of how well
2.119 they predict area AUD.
2.120 -bottom row displays the 3 genes which most match AUD according to a method which considers local geometry18. The
2.121 -pointwise method in the top row identifies genes which express more strongly in AUD than outside of it; its weakness is
2.122 -that this includes many areas which don’t have a salient border matching the areal border. The geometric method identifies
2.123 -genes whose salient expression border seems to partially line up with the border of AUD; its weakness is that this includes
2.124 -genes which don’t express over the entire area. Genes which have high rankings using both pointwise and border criteria,
2.125 -such as Aph1a in the example, may be particularly good markers. None of these genes are, individually, a perfect marker
2.126 -for AUD; we deliberately chose a “difficult” area in order to better contrast pointwise with geometric methods.
2.127 + 18For each gene the gradient similarity between (a) a map of the expression of each gene on the cortical surface and (b) the shape of area AUD,
2.128 +was calculated, and this was used to rank the genes.
2.129 +
2.130
2.131
2.132
2.133 @@ -491,56 +518,7 @@
2.134 bors, but not from the entire rest of the
2.135 cortex). The genes are Pitx2, Aldh1a2,
2.136 Ppfibp1, Slco1a5, Tshz2, Trhr, Col12a1,
2.137 -Ets1. Areas which can be identified by single genes Using gradient simi-
2.138 - larity, we have already found single genes which roughly identify some areas
2.139 - and groupings of areas. For each of these areas, an example of a gene which
2.140 - roughly identifies it is shown in Figure 5. We have not yet cross-verified these
2.141 - genes in other atlases.
2.142 - In addition, there are a number of areas which are almost identified by single
2.143 - genes: COAa+NLOT (anterior part of cortical amygdalar area, nucleus of the
2.144 - lateral olfactory tract), ENT (entorhinal), ACAv (ventral anterior cingulate),
2.145 - VIS (visual), AUD (auditory).
2.146 - These results validate our expectation that the ABA dataset can be ex-
2.147 - ploited to find marker genes for many cortical areas, while also validating the
2.148 - relevancy of our new scoring method, gradient similarity.
2.149 - Combinations of multiple genes are useful and necessary for some
2.150 - areas
2.151 - In Figure 4, we give an example of a cortical area which is not marked by
2.152 - any single gene, but which can be identified combinatorially. According to
2.153 - logistic regression, gene wwc1 is the best fit single gene for predicting whether
2.154 - or not a pixel on the cortical surface belongs to the motor area (area MO).
2.155 - The upper-left picture in Figure 4 shows wwc1’s spatial expression pattern over
2.156 - the cortex. The lower-right boundary of MO is represented reasonably well by
2.157 - this gene, but the gene overshoots the upper-left boundary. This flattened 2-D
2.158 - representation does not show it, but the area corresponding to the overshoot is
2.159 - the medial surface of the cortex. MO is only found on the dorsal surface. Gene
2.160 - mtif2 is shown in the upper-right. Mtif2 captures MO’s upper-left boundary,
2.161 - but not its lower-right boundary. Mtif2 does not express very much on the
2.162 - medial surface. By adding together the values at each pixel in these two figures,
2.163 - we get the lower-left image. This combination captures area MO much better
2.164 - than any single gene.
2.165 - This shows that our proposal to develop a method to find combinations of
2.166 - marker genes is both possible and necessary.
2.167 - Feature selection integrated with prediction As noted earlier, in gen-
2.168 - eral, any predictive method can be used for feature selection by running it
2.169 - inside a stepwise wrapper. Also, some predictive methods integrate soft con-
2.170 - straints on number of features used. Examples of both of these will be seen in
2.171 - the section “Multivariate Predictive methods”.
2.172 - Multivariate Predictive methods
2.173 - Forward stepwise logistic regression Logistic regression is a popular
2.174 - method for predictive modeling of categorical data. As a pilot run, for five
2.175 - cortical areas (SS, AUD, RSP, VIS, and MO), we performed forward stepwise
2.176 - logistic regression to find single genes, pairs of genes, and triplets of genes
2.177 - which predict areal identity. This is an example of feature selection integrated
2.178 - with prediction using a stepwise wrapper. Some of the single genes found
2.179 - were shown in various figures throughout this document, and Figure 4 shows
2.180 - a combination of genes which was found.
2.181 - We felt that, for single genes, gradient similarity did a better job than
2.182 - logistic regression at capturing our subjective impression of a “good gene”.
2.183 -_________________________________________
2.184 - 18For each gene the gradient similarity between (a) a map of the expression of each gene on the cortical surface and (b) the shape of area AUD,
2.185 -was calculated, and this was used to rank the genes.
2.186 -
2.187 +Ets1. Multivariate Predictive methods
2.188
2.189
2.190
2.191 @@ -553,7 +531,24 @@
2.192 from left: NNMF. Right: Landmark Isomap. Additional details: In the
2.193 third and fourth rows, 7 dimensions were found, but only 6 displayed. In
2.194 the last row: for PCA, 50 dimensions were used; for NNMF, 6 dimensions
2.195 -were used; for landmark Isomap, 7 dimensions were used. SVM on all genes at once
2.196 +were used; for landmark Isomap, 7 dimensions were used. Forward stepwise logistic regression
2.197 + Logistic regression is a popular method for
2.198 + predictive modeling of categorical data. As a
2.199 + pilot run, for five cortical areas (SS, AUD, RSP,
2.200 + VIS, and MO), we performed forward stepwise
2.201 + logistic regression to find single genes, pairs of
2.202 + genes, and triplets of genes which predict areal
2.203 + identity. This is an example of feature selection
2.204 + integrated with prediction using a stepwise
2.205 + wrapper. Some of the single genes found were
2.206 + shown in various figures throughout this
2.207 + document, and Figure 4 shows a combination of
2.208 + genes which was found.
2.209 + We felt that, for single genes, gradient
2.210 + similarity did a better job than logistic regression at
2.211 + capturing our subjective impression of a “good
2.212 + gene”.
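The stepwise wrapper idea can be sketched as a greedy loop that adds one gene at a time while the score keeps improving. Note the substitution: to stay short and dependency-free, this sketch does not fit a logistic regression. It uses a stand-in score, the Pearson correlation between the pixelwise sum of the selected genes and the area mask. All names (`pearson`, `sum_score`, `forward_stepwise`) and the toy data are hypothetical; any predictive model's goodness-of-fit could be plugged in as `score`.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def sum_score(selected, gene_images, target):
    """Stand-in score (not logistic regression): correlate summed expression with the target."""
    combined = [sum(vals) for vals in zip(*(gene_images[g] for g in selected))]
    return pearson(combined, target)

def forward_stepwise(gene_images, target, score, max_features=3):
    """Greedily add the gene that most improves the score; stop when nothing helps."""
    selected, best_score = [], float("-inf")
    while len(selected) < max_features:
        candidates = [g for g in gene_images if g not in selected]
        if not candidates:
            break
        top_score, top_gene = max(
            (score(selected + [g], gene_images, target), g) for g in candidates
        )
        if top_score <= best_score:
            break  # no candidate improves on the current feature set
        selected.append(top_gene)
        best_score = top_score
    return selected, best_score

# Hypothetical toy data: each of two genes captures one boundary of the area
# and overshoots the other, loosely mimicking the two-gene example in Figure 4.
target = [0, 0, 1, 1, 0, 0]
genes = {
    "gene_x": [0, 0, 1, 1, 1, 1],
    "gene_y": [1, 1, 1, 1, 0, 0],
    "gene_z": [1, 0, 0, 1, 0, 1],
}
selected, best = forward_stepwise(genes, target, sum_score)
```

On this toy input the wrapper selects the two complementary genes and stops before adding the third, since adding it would lower the score.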
2.213 + SVM on all genes at once
2.214 In order to see how well one can do when
2.215 looking at all genes at once, we ran a support
2.216 vector machine to classify cortical surface pix-
2.217 @@ -567,46 +562,35 @@
2.218 genes.
2.219 Data-driven redrawing of the cor-
2.220 tical map
2.221 - We have applied the following dimensional-
2.222 - ity reduction algorithms to reduce the dimen-
2.223 - sionality of the gene expression profile associ-
2.224 - ated with each voxel: Principal Components
2.225 - Analysis (PCA), Simple PCA (SPCA), Multi-
2.226 - Dimensional Scaling (MDS), Isomap, Land-
2.227 - mark Isomap, Laplacian eigenmaps, Local Tan-
2.228 - gent Space Alignment (LTSA), Hessian locally
2.229 - linear embedding, Diffusion maps, Stochastic
2.230 - Neighbor Embedding (SNE), Stochastic Prox-
2.231 - imity Embedding (SPE), Fast Maximum Vari-
2.232 - ance Unfolding (FastMVU), Non-negative Ma-
2.233 - trix Factorization (NNMF). Space constraints
2.234 - prevent us from showing many of the results,
2.235 - but as a sample, PCA, NNMF, and landmark
2.236 - Isomap are shown in the first, second, and third
2.237 -rows of Figure 6.
2.238
2.239 Figure 7: Prototypes corresponding to sample gene clusters,
2.240 clustered by gradient similarity. Region boundaries for the
2.241 -region that most matches each prototype are overlayed. After applying the dimensionality reduction, we ran clus-
2.242 +region that most matches each prototype are overlayed. We have applied the following dimensionality reduction al-
2.243 + gorithms to reduce the dimensionality of the gene expression
2.244 + profile associated with each voxel: Principal Components
2.245 + Analysis (PCA), Simple PCA (SPCA), Multi-Dimensional
2.246 + Scaling (MDS), Isomap, Landmark Isomap, Laplacian eigen-
2.247 + maps, Local Tangent Space Alignment (LTSA), Hessian lo-
2.248 + cally linear embedding, Diffusion maps, Stochastic Neigh-
2.249 + bor Embedding (SNE), Stochastic Proximity Embedding
2.250 + (SPE), Fast Maximum Variance Unfolding (FastMVU),
2.251 + Non-negative Matrix Factorization (NNMF). Space con-
2.252 + straints prevent us from showing many of the results, but as
2.253 + a sample, PCA, NNMF, and landmark Isomap are shown in
2.254 + the first, second, and third rows of Figure 6.
2.255 + After applying the dimensionality reduction, we ran clus-
2.256 tering algorithms on the reduced data. To date we have tried
2.257 - k-means and spectral clustering. The results of k-means af-
2.258 - ter PCA, NNMF, and landmark Isomap are shown in the
2.259 - last row of Figure 6. To compare, the leftmost picture on
2.260 - the bottom row of Figure 6 shows some of the major sub-
2.261 - divisions of cortex. These results clearly show that differ-
2.262 - ent dimensionality reduction techniques capture different as-
2.263 - pects of the data and lead to different clusterings, indicating
2.264 - the utility of our proposal to produce a detailed comparison
2.265 - of these techniques as applied to the domain of genomic
2.266 - anatomy.
2.267 - Many areas are captured by clusters of genes We
2.268 - also clustered the genes using gradient similarity to see if
2.269 - the spatial regions defined by any clusters matched known
2.270 -anatomical regions. Figure 7 shows, for ten sample gene clusters, each cluster’s average expression pattern, compared to
2.271 -a known anatomical boundary. This suggests that it is worth attempting to cluster genes, and then to use the results to
2.272 -cluster voxels.
2.273 -_____________________________
2.274 +k-means and spectral clustering. The results of k-means after PCA, NNMF, and landmark Isomap are shown in the last
2.275 +row of Figure 6. To compare, the leftmost picture on the bottom row of Figure 6 shows some of the major subdivisions of
2.276 +cortex. These results clearly show that different dimensionality reduction techniques capture different aspects of the data
2.277 +and lead to different clusterings, indicating the utility of our proposal to produce a detailed comparison of these techniques
2.278 +as applied to the domain of genomic anatomy.
2.279 +Many areas are captured by clusters of genes We also clustered the genes using gradient similarity to see if the
2.280 +_________________________________________
2.281 19 5-fold cross-validation.
2.282 +spatial regions defined by any clusters matched known anatomical regions. Figure 7 shows, for ten sample gene clusters, each
2.283 +cluster’s average expression pattern, compared to a known anatomical boundary. This suggests that it is worth attempting
2.284 +to cluster genes, and then to use the results to cluster voxels.
2.285 The approach: what we plan to do
2.286 Flatmap and segment cortical layers
2.287 There are multiple ways to flatten 3-D data into 2-D. We will compare mappings from manifolds to planes which attempt
2.288 @@ -647,47 +631,49 @@
2.289 # jbt, coclustering
2.290 # self-organizing map
2.291 # compare using clustering scores
2.292 +__________
2.293 + 20Already, for each cortical area, we have used the C4.5 algorithm to find a decision tree for that area. We achieved good classification accuracy
2.294 +on our training set, but the number of genes that appeared in each tree was too large. We plan to implement a pruning procedure to generate
2.295 +trees that use fewer genes
2.296 # multivariate gradient similarity
2.297 # deep belief nets
2.298 Apply these algorithms to the cortex
2.299 -___
2.300 - 20Already, for each cortical area, we have used the C4.5 algorithm to find a decision tree for that area. We achieved good classification accuracy
2.301 -on our training set, but the number of genes that appeared in each tree was too large. We plan to implement a pruning procedure to generate
2.302 -trees that use fewer genes
2.303 Using the methods developed in Aim 1, we will present, for each cortical area, a short list of markers to identify that
2.304 area; and we will also present lists of “panels” of genes that can be used to delineate many areas at once. Using the methods
2.305 developed in Aim 2, we will present one or more hierarchical cortical maps. We will identify and explain how the statistical
2.306 structure in the gene expression data led to any unexpected or interesting features of these maps.
2.307 Timeline and milestones
2.308 Aim 1
2.309 -∙October-November 2009: develop an automated mechanism for segmenting the cortical voxels into layers
2.310 -∙November 2009 (milestone): a preliminary automated mechanism for segmenting the cortical voxels into layers
2.311 -∙October 2009-April 2010: develop scoring methods and test them in various supervised learning frameworks. Also
2.312 +∙September-November 2009: Develop an automated mechanism for segmenting the cortical voxels into layers
2.313 +∙November 2009 (milestone): Have completed construction of a flatmapped, cortical dataset with information for each
2.314 +layer
2.315 +∙October 2009-April 2010: Develop scoring methods and test them in various supervised learning frameworks. Also
2.316 test out various dimensionality reduction schemes in combination with supervised learning. Create or extend supervised
2.317 learning frameworks which use multivariate versions of the best scoring methods.
2.318 -∙January 2010 (milestone): submit a publication on single marker genes for cortical areas
2.319 +∙January 2010 (milestone): Submit a publication on single marker genes for cortical areas
2.320 ∙February-July 2010: Continue to develop scoring methods and supervised learning frameworks. Explore the best way
2.321 to integrate radial profiles with supervised learning. Explore the best way to make supervised learning techniques
2.322 robust against incorrect labels (i.e. when the areas drawn on the input cortical map are slightly off). Quantitatively
2.323 compare the performance of different supervised learning techniques. Validate marker genes found in the ABA dataset
2.324 by checking against other gene expression datasets. Create documentation and unit tests for software toolbox for Aim
2.325 1. Respond to user bug reports for Aim 1 software toolbox.
2.326 -∙June 2010 (milestone): submit a paper describing a method fulfilling Aim 1. Release toolbox.
2.327 -∙July 2010 (milestone): submit a paper describing combinations of marker genes for each cortical area, and a small
2.328 +∙June 2010 (milestone): Submit a paper describing a method fulfilling Aim 1. Release toolbox.
2.329 +∙July 2010 (milestone): Submit a paper describing combinations of marker genes for each cortical area, and a small
2.330 number of marker genes that can, in combination, define most of the areas at once
2.331 Aim 2
2.332 -∙April-September 2010: Explore dimensionality reduction algorithms for Aim 2. Explore standard hierarchical
2.333 -clustering algorithms, used in combination with dimensionality reduction, for Aim 2. Explore co-clustering algorithms.
2.334 -Think about how radial profile information can be used for Aim 2. Adapt clustering algorithms to use radial profile
2.335 -information.
2.336 -∙January-March 2011: Quantitatively compare the performance of different dimensionality reduction and clustering
2.337 -techniques. Quantitatively compare the value of different flatmapping methods and ways of representing radial profiles.
2.338 -∙March 2011 (milestone): submit a paper describing a method fulfilling Aim 2. Release toolbox.
2.339 -∙February-May 2011: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. Read
2.340 -the literature and talk to people to learn about research related to unexpected and interesting discoveries. Create
2.341 -documentation and unit tests for software toolbox for Aim 2. Respond to user bug reports for Aim 1 software toolbox.
2.342 -∙May 2011 (milestone): submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
2.343 -∙May-August 2011: revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1.
2.344 +∙April 2010-March 2011: Explore dimensionality reduction algorithms for Aim 2. Explore standard hierarchical clustering
2.345 +algorithms, used in combination with dimensionality reduction, for Aim 2. Explore co-clustering algorithms. Think
2.346 +about how radial profile information can be used for Aim 2. Adapt clustering algorithms to use radial profile in-
2.347 +formation. Quantitatively compare the performance of different dimensionality reduction and clustering techniques.
2.348 +Quantitatively compare the value of different flatmapping methods and ways of representing radial profiles.
2.349 +∙March 2011 (milestone): Submit a paper describing a method fulfilling Aim 2. Release toolbox.
2.350 +∙February-May 2011: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If new ways
2.351 +of organizing the cortex into areas are discovered, read the literature and talk to people to learn about research related
2.352 +to interpreting our results. Create documentation and unit tests for software toolbox for Aim 2. Respond to user bug
2.353 +reports for Aim 1 software toolbox.
2.354 +∙May 2011 (milestone): Submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
2.355 +∙May-August 2011: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1. Follow
2.356 +up on responses to our papers. Possibly submit another paper.
2.357 Bibliography & References Cited
2.358 [1]Chris Adamson, Leigh Johnston, Terrie Inder, Sandra Rees, Iven Mareels, and Gary Egan. A Tracking Approach to
2.359 Parcellation of the Cerebral Cortex, volume Volume 3749/2005 of Lecture Notes in Computer Science, pages 294–301.
3.1 Binary file grant.odt has changed
4.1 Binary file grant.pdf has changed
5.1 --- a/grant.txt Tue Apr 21 05:38:52 2009 -0700
5.2 +++ b/grant.txt Tue Apr 21 05:50:39 2009 -0700
5.3 @@ -236,7 +236,6 @@
5.4
5.5
5.6
5.7 -\newpage
5.8 == The approach: Preliminary Studies ==
5.9 \begin{wrapfigure}{L}{0.35\textwidth}\centering
5.10 %%\includegraphics[scale=.27]{singlegene_SS_corr_top_1_2365_jet.eps}\includegraphics[scale=.27]{singlegene_SS_corr_top_2_242_jet.eps}\includegraphics[scale=.27]{singlegene_SS_corr_top_3_654_jet.eps}
5.11 @@ -451,7 +450,6 @@
5.12
5.13
5.14
5.15 -\newpage
5.16 == The approach: what we plan to do ==
5.17
5.18
5.19 @@ -535,20 +533,20 @@
5.20
5.21 === Aim 1 ===
5.22
5.23 -* October-November 2009: develop an automated mechanism for segmenting the cortical voxels into layers
5.24 -* November 2009 (milestone): a preliminary automated mechanism for segmenting the cortical voxels into layers
5.25 -* October 2009-April 2010: develop scoring methods and test them in various supervised learning frameworks. Also test out various dimensionality reduction schemes in combination with supervised learning. Create or extend supervised learning frameworks which use multivariate versions of the best scoring methods.
5.26 -* January 2010 (milestone): submit a publication on single marker genes for cortical areas
5.27 +* September-November 2009: Develop an automated mechanism for segmenting the cortical voxels into layers
5.28 +* November 2009 (milestone): Have completed construction of a flatmapped, cortical dataset with information for each layer
5.29 +* October 2009-April 2010: Develop scoring methods and test them in various supervised learning frameworks. Also test out various dimensionality reduction schemes in combination with supervised learning. Create or extend supervised learning frameworks which use multivariate versions of the best scoring methods.
5.30 +* January 2010 (milestone): Submit a publication on single marker genes for cortical areas
5.31 * February-July 2010: Continue to develop scoring methods and supervised learning frameworks. Explore the best way to integrate radial profiles with supervised learning. Explore the best way to make supervised learning techniques robust against incorrect labels (i.e. when the areas drawn on the input cortical map are slightly off). Quantitatively compare the performance of different supervised learning techniques. Validate marker genes found in the ABA dataset by checking against other gene expression datasets. Create documentation and unit tests for software toolbox for Aim 1. Respond to user bug reports for Aim 1 software toolbox.
5.32 -* June 2010 (milestone): submit a paper describing a method fulfilling Aim 1. Release toolbox.
5.33 -* July 2010 (milestone): submit a paper describing combinations of marker genes for each cortical area, and a small number of marker genes that can, in combination, define most of the areas at once
5.34 +* June 2010 (milestone): Submit a paper describing a method fulfilling Aim 1. Release toolbox.
5.35 +* July 2010 (milestone): Submit a paper describing combinations of marker genes for each cortical area, and a small number of marker genes that can, in combination, define most of the areas at once
5.36
5.37 === Aim 2 ===
5.38 * April 2010-March 2011: Explore dimensionality reduction algorithms for Aim 2. Explore standard hierarchical clustering algorithms, used in combination with dimensionality reduction, for Aim 2. Explore co-clustering algorithms. Think about how radial profile information can be used for Aim 2. Adapt clustering algorithms to use radial profile information. Quantitatively compare the performance of different dimensionality reduction and clustering techniques. Quantitatively compare the value of different flatmapping methods and ways of representing radial profiles.
5.39 -* March 2011 (milestone): submit a paper describing a method fulfilling Aim 2. Release toolbox.
5.40 -* February-May 2011: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. Read the literature and talk to people to learn about research related to unexpected and interesting discoveries. Create documentation and unit tests for software toolbox for Aim 2. Respond to user bug reports for Aim 1 software toolbox.
5.41 -* May 2011 (milestone): submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
5.42 -* May-August 2011: revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1.
5.43 +* March 2011 (milestone): Submit a paper describing a method fulfilling Aim 2. Release toolbox.
5.44 +* February-May 2011: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If new ways of organizing the cortex into areas are discovered, read the literature and talk to people to learn about research related to interpreting our results. Create documentation and unit tests for software toolbox for Aim 2. Respond to user bug reports for Aim 2 software toolbox.
5.45 +* May 2011 (milestone): Submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
5.46 +* May-August 2011: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1. Follow up on responses to our papers. Possibly submit another paper.
5.47
5.48 \newpage
5.49