cg
changeset 101:89815d210b5c
.
author | bshanks@bshanks.dyndns.org |
---|---|
date | Wed Apr 22 07:06:46 2009 -0700 (16 years ago) |
parents | fa7c0a924e7a |
children | 4cca7c7d91d1 |
files | grant.doc grant.html grant.odt grant.pdf grant.txt nih-blank.cls |
line diff
1.1 Binary file grant.doc has changed
2.1 --- a/grant.html Wed Apr 22 06:45:17 2009 -0700
2.2 +++ b/grant.html Wed Apr 22 07:06:46 2009 -0700
2.3 @@ -30,53 +30,53 @@
2.4 allow the expression levels of many genes at many locations to be compared. Our goal is to develop automated
2.5 methods to relate spatial variation in gene expression to anatomy. We want to find marker genes for specific
2.6 anatomical regions, and also to draw new anatomical maps based on gene expression patterns.
2.7 -The Challenge and Potential impact
2.8 +______________
2.9 + The Challenge and Potential impact
2.10 Each of our three aims will be discussed in turn. For each aim, we will develop a conceptual framework for
2.11 thinking about the task, and we will present our strategy for solving it. Next we will discuss related work. At the
2.12 conclusion of each section, we will summarize why our strategy is different from what has been done before. At
2.13 the end of this section, we will describe the potential impact.
2.14 -Aim 1: Given a map of regions, find genes that mark the regions
2.15 -
2.16 + Aim 1: Given a map of regions, find genes that mark the regions
2.17 +
2.18 Figure 1: Gene Pitx2
2.19 is selectively underex-
2.20 -pressed in area SS. Machine learning terminology: classifiers The task of looking for marker genes for
2.21 - known anatomical regions means that one is looking for a set of genes such that, if
2.22 - the expression level of those genes is known, then the locations of the regions can be
2.23 - inferred.
2.24 - If we define the regions so that they cover the entire anatomical structure to be
2.25 - subdivided, we may say that we are using gene expression in each voxel to assign
2.26 - that voxel to the proper area. We call this a classification task, because each voxel
2.27 - is being assigned to a class (namely, its region). An understanding of the relationship
2.28 - between the combination of their expression levels and the locations of the regions may
2.29 - be expressed as a function. The input to this function is a voxel, along with the gene
2.30 - expression levels within that voxel; the output is the regional identity of the target voxel,
2.31 +pressed in area SS. Machine learning terminology: classifiers The task of looking for marker genes for
2.32 + known anatomical regions means that one is looking for a set of genes such that, if
2.33 + the expression level of those genes is known, then the locations of the regions can be
2.34 + inferred.
2.35 + If we define the regions so that they cover the entire anatomical structure to be
2.36 + subdivided, we may say that we are using gene expression in each voxel to assign
2.37 + that voxel to the proper area. We call this a classification task, because each voxel
2.38 + is being assigned to a class (namely, its region). An understanding of the relationship
2.39 + between the combination of their expression levels and the locations of the regions may
2.40 + be expressed as a function. The input to this function is a voxel, along with the gene
2.41 + expression levels within that voxel; the output is the regional identity of the target voxel,
2.42 that is, the region to which the target voxel belongs. We call this function a classifier. In general, the input to a
2.43 classifier is called an instance, and the output is called a label (or a class label).
2.44 -The object of aim 1 is not to produce a single classifier, but rather to develop an automated method for
2.45 + The object of aim 1 is not to produce a single classifier, but rather to develop an automated method for
2.46 determining a classifier for any known anatomical structure. Therefore, we seek a procedure by which a gene
2.47 expression dataset may be analyzed in concert with an anatomical atlas in order to produce a classifier. The
2.48 initial gene expression dataset used in the construction of the classifier is called training data. In the machine
2.49 learning literature, this sort of procedure may be thought of as a supervised learning task, defined as a task in
2.50 which the goal is to learn a mapping from instances to labels, and the training data consists of a set of instances
2.51 (voxels) for which the labels (regions) are known.
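As a minimal illustration of this supervised-learning setup, consider the following Python sketch; the array names (expr, region) and the choice of logistic regression as the classifier are purely illustrative, not part of the proposal.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Instances: one row per voxel, one column per gene expression level.
# Labels: the region to which each voxel belongs.
rng = np.random.default_rng(0)
expr = rng.random((1000, 50))           # toy training data: 1000 voxels x 50 genes
region = rng.integers(0, 4, size=1000)  # toy labels: 4 regions

X_train, X_test, y_train, y_test = train_test_split(expr, region, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # learn the classifier
print("held-out accuracy:", clf.score(X_test, y_test))         # predict regions of unseen voxels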
2.52 -Each gene expression level is called a feature, and the selection of which genes1 to include is called feature
2.53 + Each gene expression level is called a feature, and the selection of which genes1 to include is called feature
2.54 selection. Feature selection is one component of the task of learning a classifier. Some methods for learning
2.55 classifiers start out with a separate feature selection phase, whereas other methods combine feature selection
2.56 with other aspects of training.
2.57 -One class of feature selection methods assigns some sort of score to each candidate gene. The top-ranked
2.58 + One class of feature selection methods assigns some sort of score to each candidate gene. The top-ranked
2.59 genes are then chosen. Some scoring measures can assign a score to a set of selected genes, not just to a
2.60 single gene; in this case, a dynamic procedure may be used in which features are added and subtracted from the
2.61 selected set depending on how much they raise the score. Such procedures are called “stepwise” or “greedy”.
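A greedy forward selection loop of this kind might look as follows (Python sketch; set_score is a stand-in for whichever set-level scoring measure is chosen, and the data are synthetic).

import numpy as np

def set_score(expr, region, genes):
    # Placeholder set-level score: how well the mean expression of the selected
    # genes separates the target region from the rest (higher is better).
    if not genes:
        return 0.0
    combined = expr[:, list(genes)].mean(axis=1)
    return abs(combined[region == 1].mean() - combined[region == 0].mean())

def greedy_select(expr, region, n_features):
    selected, candidates = [], set(range(expr.shape[1]))
    for _ in range(n_features):
        # add the single gene that raises the score of the selected set the most
        best = max(candidates, key=lambda g: set_score(expr, region, selected + [g]))
        selected.append(best)
        candidates.remove(best)
    return selected

rng = np.random.default_rng(0)
expr = rng.random((500, 30))                  # toy data: 500 voxels x 30 genes
region = (rng.random(500) > 0.5).astype(int)  # 1 inside the target region, 0 outside
print(greedy_select(expr, region, 3))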
2.62 -Although the classifier itself may only look at the gene expression data within each voxel before classifying
2.63 + Although the classifier itself may only look at the gene expression data within each voxel before classifying
2.64 that voxel, the algorithm which constructs the classifier may look over the entire dataset. We can categorize
2.65 score-based feature selection methods depending on how the score is calculated. Often the score calculation
2.66 consists of assigning a sub-score to each voxel, and then aggregating these sub-scores into a final score (the
2.67 aggregation is often a sum or a sum of squares or average). If only information from nearby voxels is used to
2.68 -_________________________________________
2.69 - 1Strictly speaking, the features are gene expression levels, but we’ll call them genes.
2.70 calculate a voxel’s sub-score, then we say it is a local scoring method. If only information from the voxel itself is
2.71 used to calculate a voxel’s sub-score, then we say it is a pointwise scoring method.
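A pointwise scoring method, in these terms, assigns each voxel a sub-score that depends only on that voxel and then aggregates; a Python sketch (the particular sub-score is only an example).

import numpy as np

def pointwise_score(gene_expr, region_mask):
    # Sub-score per voxel depends only on that voxel: agreement between its
    # expression level and its region membership.
    sub_scores = -(gene_expr - region_mask.astype(float)) ** 2
    return sub_scores.mean()     # aggregation step (here, an average)

rng = np.random.default_rng(0)
gene_expr = rng.random(1000)           # one gene's expression in 1000 voxels (0..1)
region_mask = rng.random(1000) > 0.7   # True for voxels inside the target region
print(pointwise_score(gene_expr, region_mask))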
2.72 -Both gene expression data and anatomical atlases have errors, due to a variety of factors. Individual subjects
2.73 + Both gene expression data and anatomical atlases have errors, due to a variety of factors. Individual subjects
2.74 + 1Strictly speaking, the features are gene expression levels, but we’ll call them genes.
2.75 have idiosyncratic anatomy. Subjects may be improperly registered to the atlas. The method used to measure
2.76 gene expression may be noisy. The atlas may have errors. It is even possible that some areas in the anatomical
2.77 atlas are “wrong” in that they do not have the same shape as the natural domains of gene expression to which
2.78 @@ -169,94 +169,102 @@
2.79 example, we believe that domain-specific scoring measures (such as
2.80 gradient similarity, which is discussed in Preliminary Studies) may be
2.81 necessary in order to achieve the best results in this application.
2.82 - We are aware of six existing efforts to find marker genes using spa-
2.83 - tial gene expression data using automated methods.
2.84 - [13] mentions the possibility of constructing a spatial region for each
2.85 - gene, and then, for each anatomical structure of interest, computing
2.86 - what proportion of this structure is covered by the gene’s spatial region.
2.87 + We now turn to efforts to find marker genes from spatial gene ex-
2.88 + pression data using automated methods.
2.89 GeneAtlas[5] and EMAGE [26] allow the user to construct a search
2.90 query by demarcating regions and then specifying either the strength of
2.91 expression or the name of another gene or dataset whose expression
2.92 pattern is to be matched. Neither GeneAtlas nor EMAGE allow one to
2.93 -search for combinations of genes that define a region in concert but not separately.
2.94 -[15 ] describes AGEA, ”Anatomic Gene Expression Atlas”. AGEA has three components. Gene Finder: The
2.95 -user selects a seed voxel and the system (1) chooses a cluster which includes the seed voxel, (2) yields a list of
2.96 -genes which are overexpressed in that cluster. Correlation: The user selects a seed voxel and the system then
2.97 -shows the user how much correlation there is between the gene expression profile of the seed voxel and every
2.98 -other voxel. Clusters: will be described later
2.99 -[6 ] looks at the mean expression level of genes within anatomical regions, and applies a Student’s t-test with
2.100 -Bonferroni correction to determine whether the mean expression level of a gene is significantly higher in the
2.101 -target region.
2.102 -[15 ] and [6] differ from our Aim 1 in at least three ways. First, [15] and [6] find only single genes, whereas
2.103 -we will also look for combinations of genes. Second, [15] and [6] can only use overexpression as a marker,
2.104 -whereas we will also search for underexpression. Third, [15] and [6] use scores based on pointwise expression
2.105 -levels, whereas we will also use geometric scores such as gradient similarity (described in Preliminary Studies).
2.106 -Figures 4, 1, and 3 in the Preliminary Studies section contain evidence that each of our three choices is the right
2.107 -one.
2.108 + search for combinations of genes that define a region in concert but not
2.109 + separately.
2.110 + [15] describes AGEA, “Anatomic Gene Expression Atlas”. AGEA
2.111 +has three components. Gene Finder: The user selects a seed voxel and the system (1) chooses a cluster which
2.112 +includes the seed voxel, (2) yields a list of genes which are overexpressed in that cluster. Correlation: The
2.113 +user selects a seed voxel and the system then shows the user how much correlation there is between the gene
2.114 +expression profile of the seed voxel and every other voxel. Clusters: will be described later. [6] looks at the mean
2.115 +expression level of genes within anatomical regions, and applies a Student’s t-test with Bonferroni correction to
2.116 +determine whether the mean expression level of a gene is significantly higher in the target region. [15] and [6]
2.117 +differ from our Aim 1 in at least three ways. First, [15] and [6] find only single genes, whereas we will also look
2.118 +for combinations of genes. Second, [15] and [6] can only use overexpression as a marker, whereas we will also
2.119 +search for underexpression. Third, [15] and [6] use scores based on pointwise expression levels, whereas we
2.120 +will also use geometric scores such as gradient similarity (described in Preliminary Studies). Figures 4, 1, and 3
2.121 +in the Preliminary Studies section contain evidence that each of our three choices is the right one.
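For concreteness, the kind of per-gene test attributed to [6] could be sketched as follows in Python (synthetic data; the significance threshold is illustrative).

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
expr = rng.random((2000, 100))       # toy data: 2000 voxels x 100 genes
in_region = rng.random(2000) > 0.8   # membership mask for the target region

n_genes = expr.shape[1]
markers = []
for g in range(n_genes):
    # Student's t-test on expression inside vs. outside the region
    t, p = ttest_ind(expr[in_region, g], expr[~in_region, g])
    if t > 0 and p * n_genes < 0.05:   # overexpressed and Bonferroni-significant
        markers.append(g)
print("candidate overexpressed markers:", markers)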
2.122 [10 ] describes a technique to find combinations of marker genes to pick out an anatomical region. They use
2.123 an evolutionary algorithm to evolve logical operators which combine boolean (thresholded) images in order to
2.124 match a target image.
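A stripped-down sketch of the ingredients of this approach, with an exhaustive search over gene pairs standing in for the evolutionary algorithm and Jaccard similarity as the match score (Python; data are synthetic).

import numpy as np
from itertools import combinations

def jaccard(a, b):
    # match score between two boolean images
    return np.logical_and(a, b).sum() / max(np.logical_or(a, b).sum(), 1)

rng = np.random.default_rng(0)
images = rng.random((20, 64, 64)) > 0.5   # 20 thresholded (boolean) gene images
target = rng.random((64, 64)) > 0.7       # boolean mask of the target region

best_score, best_combo = -1.0, None
for i, j in combinations(range(len(images)), 2):
    for name, op in (("and", np.logical_and), ("or", np.logical_or)):
        score = jaccard(op(images[i], images[j]), target)
        if score > best_score:
            best_score, best_combo = score, (name, i, j)
print("best combination:", best_combo, "Jaccard:", round(best_score, 3))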
2.125 -_____________________
2.126 - 2By “fundamentally spatial” we mean that there is information from a large number of spatial locations indexed by spatial coordinates;
2.127 -not just data which have only a few different locations or which is indexed by anatomical label.
2.128 In summary, there has been fruitful work on finding marker genes, but only one of the previous projects
2.129 explores combinations of marker genes, and none of these publications compare the results obtained by using
2.130 different algorithms or scoring methods.
2.131 Aim 2: From gene expression data, discover a map of regions
2.132 +Machine learning terminology: clustering
2.133 +If one is given a dataset consisting merely of instances, with no class labels, then analysis of the dataset is
2.134 +referred to as unsupervised learning in the jargon of machine learning. One thing that you can do with such a
2.135 +dataset is to group instances together. A set of similar instances is called a cluster, and the activity of
2.136 +grouping the data into clusters is called clustering or cluster analysis.
2.137 +_________________________________________
2.138 + 2By “fundamentally spatial” we mean that there is information from a large number of spatial locations indexed by spatial coordinates;
2.139 +not just data which have only a few different locations or which is indexed by anatomical label.
2.140 +The task of deciding how to carve up a structure into anatomical regions can be put into these terms. The
2.141 +instances are once again voxels (or pixels) along with their associated gene expression profiles. We make
2.142 +the assumption that voxels from the same anatomical region have similar gene expression profiles, at least
2.143 +compared to the other regions. This means that clustering voxels is the same as finding potential regions; we
2.144 +seek a partitioning of the voxels into regions, that is, into clusters of voxels with similar gene expression.
2.145 +It is desirable to determine not just one set of regions, but also how these regions relate to each other. The
2.146 +outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition
2.147 +the voxels. This is called hierarchical clustering.
2.148 +Similarity scores A crucial choice when designing a clustering method is how to measure similarity, across
2.149 +either pairs of instances, or clusters, or both. There is much overlap between scoring methods for feature
2.150 +selection (discussed above under Aim 1) and scoring methods for similarity.
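For example, a hierarchical clustering of voxels by their expression profiles might be sketched as follows in Python (correlation distance and average linkage are just one possible choice of similarity measure and merge rule).

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
profiles = rng.random((300, 40))      # toy data: 300 voxels x 40 genes

# build a hierarchical tree of clusters, then cut it into 8 candidate regions
tree = linkage(profiles, method="average", metric="correlation")
regions = fcluster(tree, t=8, criterion="maxclust")
print("voxels per candidate region:", np.bincount(regions)[1:])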
2.151
2.152
2.153 Figure 4: Upper left: wwc1. Upper
2.154 right: mtif2. Lower left: wwc1 + mtif2
2.155 (each pixel’s value on the lower left is
2.156 the sum of the corresponding pixels in
2.157 -the upper row). Machine learning terminology: clustering
2.158 - If one is given a dataset consisting merely of instances, with no
2.159 - class labels, then analysis of the dataset is referred to as unsupervised
2.160 - learning in the jargon of machine learning. One thing that you can do
2.161 - with such a dataset is to group instances together. A set of similar
2.162 - instances is called a cluster, and the activity of finding grouping the
2.163 - data into clusters is called clustering or cluster analysis.
2.164 - The task of deciding how to carve up a structure into anatomical
2.165 - regions can be put into these terms. The instances are once again
2.166 - voxels (or pixels) along with their associated gene expression profiles.
2.167 - We make the assumption that voxels from the same anatomical region
2.168 - have similar gene expression profiles, at least compared to the other
2.169 - regions. This means that clustering voxels is the same as finding po-
2.170 - tential regions; we seek a partitioning of the voxels into regions, that is,
2.171 - into clusters of voxels with similar gene expression.
2.172 - It is desirable to determine not just one set of regions, but also how
2.173 - these regions relate to each other. The outcome of clustering may be
2.174 - a hierarchical tree of clusters, rather than a single set of clusters which
2.175 -partition the voxels. This is called hierarchical clustering.
2.176 -Similarity scores A crucial choice when designing a clustering method is how to measure similarity, across
2.177 -either pairs of instances, or clusters, or both. There is much overlap between scoring methods for feature
2.178 -selection (discussed above under Aim 1) and scoring methods for similarity.
2.179 -Spatially contiguous clusters; image segmentation We have shown that aim 2 is a type of clustering
2.180 -task. In fact, it is a special type of clustering task because we have an additional constraint on clusters; voxels
2.181 -grouped together into a cluster must be spatially contiguous. In Preliminary Studies, we show that one can get
2.182 -reasonable results without enforcing this constraint; however, we plan to compare these results against other
2.183 -methods which guarantee contiguous clusters.
2.184 -Image segmentation is the task of partitioning the pixels in a digital image into clusters, usually contiguous
2.185 -clusters. Aim 2 is similar to an image segmentation task. There are two main differences; in our task, there are
2.186 -thousands of color channels (one for each gene), rather than just three3. A more crucial difference is that there
2.187 -are various cues which are appropriate for detecting sharp object boundaries in a visual scene but which are not
2.188 -appropriate for segmenting abstract spatial data such as gene expression. Although many image segmentation
2.189 -algorithms can be expected to work well for segmenting other sorts of spatially arranged data, some of these
2.190 -algorithms are specialized for visual images.
2.191 +the upper row). Spatially contiguous clusters; image segmentation We have
2.192 + shown that aim 2 is a type of clustering task. In fact, it is a special
2.193 + type of clustering task because we have an additional constraint on
2.194 + clusters; voxels grouped together into a cluster must be spatially con-
2.195 + tiguous. In Preliminary Studies, we show that one can get reasonable
2.196 + results without enforcing this constraint; however, we plan to compare
2.197 + these results against other methods which guarantee contiguous clus-
2.198 + ters.
2.199 + Image segmentation is the task of partitioning the pixels in a digital
2.200 + image into clusters, usually contiguous clusters. Aim 2 is similar to an
2.201 + image segmentation task. There are two main differences; in our task,
2.202 + there are thousands of color channels (one for each gene), rather than
2.203 + just three3. A more crucial difference is that there are various cues
2.204 + which are appropriate for detecting sharp object boundaries in a visual
2.205 + scene but which are not appropriate for segmenting abstract spatial
2.206 + data such as gene expression. Although many image segmentation
2.207 + algorithms can be expected to work well for segmenting other sorts of
2.208 + spatially arranged data, some of these algorithms are specialized for
2.209 +visual images.
2.210 Dimensionality reduction In this section, we discuss reducing the length of the per-pixel gene expression
2.211 feature vector. By “dimension”, we mean the dimension of this vector, not the spatial dimension of the underlying
2.212 data.
2.213 Unlike aim 1, there is no externally-imposed need to select only a handful of informative genes for inclusion
2.214 in the instances. However, some clustering algorithms perform better on small numbers of features4. There are
2.215 techniques which “summarize” a larger number of features using a smaller number of features; these techniques
2.216 +go by the name of feature extraction or dimensionality reduction. The small set of features that such a technique
2.217 +yields is called the reduced feature set. Note that the features in the reduced feature set do not necessarily
2.218 +correspond to genes; each feature in the reduced set may be any function of the set of gene expression levels.
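As one example of such a technique, a PCA-based reduction of per-pixel expression vectors could look like this (Python sketch on synthetic data; each reduced feature is a linear combination of all gene expression levels).

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
expr = rng.random((2000, 500))   # toy data: 2000 pixels x 500 genes

reduced = PCA(n_components=50).fit_transform(expr)   # reduced feature set
print(reduced.shape)                                 # (2000, 50)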
2.219 +Clustering genes rather than voxels Although the ultimate goal is to cluster the instances (voxels or pixels),
2.220 +one strategy to achieve this goal is to first cluster the features (genes). There are two ways that clusters of genes
2.221 +could be used.
2.222 +Gene clusters could be used as part of dimensionality reduction: rather than have one feature for each gene,
2.223 +we could have one reduced feature for each gene cluster.
2.224 +Gene clusters could also be used to directly yield a clustering on instances. This is because many genes have
2.225 +an expression pattern which seems to pick out a single, spatially contiguous region. This suggests the following
2.226 _________________________________________
2.227 3There are imaging tasks which use more than three colors, for example multispectral imaging and hyperspectral imaging, which are
2.228 often used to process satellite imagery.
2.229 4First, because the number of features in the reduced dataset is less than in the original dataset, the running time of clustering
2.230 algorithms may be much less. Second, it is thought that some clustering algorithms may give better results on reduced data.
2.231 -go by the name of feature extraction or dimensionality reduction. The small set of features that such a technique
2.232 -yields is called the reduced feature set. Note that the features in the reduced feature set do not necessarily
2.233 -correspond to genes; each feature in the reduced set may be any function of the set of gene expression levels.
2.234 +procedure: cluster together genes which pick out similar regions, and then use the more popular common
2.235 +regions as the final clusters. In Preliminary Studies, Figure 7, we show that a number of anatomically recognized
2.236 +cortical regions, as well as some “superregions” formed by lumping together a few regions, are associated with
2.237 +gene clusters in this fashion.
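A Python sketch of this strategy: cluster genes by the similarity of their spatial patterns, then form one averaged pattern per gene cluster, usable either as a reduced feature or, after thresholding, as a candidate region. All names and thresholds are illustrative.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
expr = rng.random((2500, 200))   # toy data: 2500 pixels x 200 genes

# cluster the genes (columns) by the similarity of their spatial patterns
gene_tree = linkage(expr.T, method="average", metric="correlation")
gene_cluster = fcluster(gene_tree, t=20, criterion="maxclust")

# one reduced feature per gene cluster: the mean pattern of its member genes
reduced = np.column_stack([expr[:, gene_cluster == c].mean(axis=1)
                           for c in range(1, gene_cluster.max() + 1)])
candidate_regions = reduced > reduced.mean(axis=0)   # crude per-cluster threshold
print(reduced.shape, candidate_regions.shape)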
2.238
2.239
2.240
2.241 @@ -281,23 +289,7 @@
2.242 tinguished from its neighbors, but not
2.243 from the entire rest of the cortex). The
2.244 genes are Pitx2, Aldh1a2, Ppfibp1,
2.245 -Slco1a5, Tshz2, Trhr, Col12a1, Ets1. Clustering genes rather than voxels Although the ultimate goal is
2.246 - to cluster the instances (voxels or pixels), one strategy to achieve this
2.247 - goal is to first cluster the features (genes). There are two ways that
2.248 - clusters of genes could be used.
2.249 - Gene clusters could be used as part of dimensionality reduction:
2.250 - rather than have one feature for each gene, we could have one reduced
2.251 - feature for each gene cluster.
2.252 - Gene clusters could also be used to directly yield a clustering on
2.253 - instances. This is because many genes have an expression pattern
2.254 - which seems to pick out a single, spatially contiguous region. This
2.255 - suggests the following procedure: cluster together genes which pick
2.256 - out similar regions, and then to use the more popular common regions
2.257 - as the final clusters. In Preliminary Studies, Figure 7, we show that a
2.258 - number of anatomically recognized cortical regions, as well as some
2.259 - “superregions” formed by lumping together a few regions, are associ-
2.260 - ated with gene clusters in this fashion.
2.261 - Related work
2.262 +Slco1a5, Tshz2, Trhr, Col12a1, Ets1. Related work
2.263 Some researchers have attempted to parcellate cortex on the basis of
2.264 non-gene expression data. For example, [18], [2], [19], and [1] asso-
2.265 ciate spots on the cortex with the radial profile5 of response to some
2.266 @@ -319,18 +311,35 @@
2.267 [6] clusters genes. For each cluster, prototypical spatial expression
2.268 patterns were created by averaging the genes in the cluster. The pro-
2.269 totypes were analyzed manually, without clustering voxels.
2.270 - [10] applies their technique for finding combinations of marker
2.271 - genes for the purpose of clustering genes around a “seed gene”.
2.272 + [10] applies their technique for finding combinations of marker genes
2.273 + for the purpose of clustering genes around a “seed gene”.
2.274 In summary, although these projects obtained clusterings, there has
2.275 not been much comparison between different algorithms or scoring
2.276 methods, so it is likely that the best clustering method for this appli-
2.277 - cation has not yet been found. The projects using gene expression on
2.278 -cortex did not attempt to make use of the radial profile of gene expression. Also, none of these projects did a
2.279 + cation has not yet been found. The projects using gene expression
2.280 + on cortex did not attempt to make use of the radial profile of gene ex-
2.281 + pression. Also, none of these projects did a separate dimensionality
2.282 + reduction step before clustering pixels, none tried to cluster genes first
2.283 + in order to guide automated clustering of pixels into spatial regions, and
2.284 + none used co-clustering algorithms.
2.285 + Aim 3: apply the methods developed to the cerebral cortex
2.286 + Background
2.287 + The cortex is divided into areas and layers. Because of the cortical
2.288 + columnar organization, the parcellation of the cortex into areas can be
2.289 + drawn as a 2-D map on the surface of the cortex. In the third dimension,
2.290 + the boundaries between the areas continue downwards into the cortical
2.291 + depth, perpendicular to the surface. The layer boundaries run parallel
2.292 + to the surface. One can picture an area of the cortex as a slice of a
2.293 + six-layered cake6.
2.294 + It is known that different cortical areas have distinct roles in both
2.295 + normal functioning and in disease processes, yet there are no known
2.296 _________________________________________
2.297 5A radial profile is a profile along a line perpendicular to the cortical surface.
2.298 -separate dimensionality reduction step before clustering pixels, none tried to cluster genes first in order to guide
2.299 -automated clustering of pixels into spatial regions, and none used co-clustering algorithms.
2.300 -Aim 3: apply the methods developed to the cerebral cortex
2.301 + 6Outside of isocortex, the number of layers varies.
2.302 + marker genes for most cortical areas. When it is necessary to divide a
2.303 + tissue sample into cortical areas, this is a manual process that requires
2.304 + a skilled human to combine multiple visual cues and interpret them in
2.305 + the context of their approximate location upon the cortical surface.
2.306
2.307
2.308
2.309 @@ -344,52 +353,40 @@
2.310 Isomap. Additional details: In the third and fourth rows, 7 dimen-
2.311 sions were found, but only 6 displayed. In the last row: for PCA,
2.312 50 dimensions were used; for NNMF, 6 dimensions were used; for
2.313 -landmark Isomap, 7 dimensions were used. Background
2.314 - The cortex is divided into areas and lay-
2.315 - ers. Because of the cortical columnar or-
2.316 - ganization, the parcellation of the cortex
2.317 - into areas can be drawn as a 2-D map on
2.318 - the surface of the cortex. In the third di-
2.319 - mension, the boundaries between the ar-
2.320 - eas continue downwards into the cortical
2.321 - depth, perpendicular to the surface. The
2.322 - layer boundaries run parallel to the sur-
2.323 - face. One can picture an area of the cortex
2.324 - as a slice of a six-layered cake6.
2.325 - It is known that different cortical areas
2.326 - have distinct roles in both normal function-
2.327 - ing and in disease processes, yet there are
2.328 - no known marker genes for most cortical
2.329 - areas. When it is necessary to divide a
2.330 - tissue sample into cortical areas, this is a
2.331 - manual process that requires a skilled hu-
2.332 - man to combine multiple visual cues and
2.333 - interpret them in the context of their ap-
2.334 - proximate location upon the cortical sur-
2.335 - face.
2.336 - Even the questions of how many ar-
2.337 +landmark Isomap, 7 dimensions were used. Even the questions of how many ar-
2.338 eas should be recognized in cortex, and
2.339 what their arrangement is, are still not com-
2.340 pletely settled. A proposed division of the
2.341 cortex into areas is called a cortical map.
2.342 In the rodent, the lack of a single agreed-
2.343 -upon map can be seen by contrasting the recent maps given by Swanson[22] on the one hand, and Paxinos
2.344 -and Franklin[17] on the other. While the maps are certainly very similar in their general arrangement, significant
2.345 -differences remain.
2.346 -The Allen Mouse Brain Atlas dataset
2.347 -The Allen Mouse Brain Atlas (ABA) data were produced by doing in-situ hybridization on slices of male,
2.348 -56-day-old C57BL/6J mouse brains. Pictures were taken of the processed slice, and these pictures were semi-
2.349 -automatically analyzed to create a digital measurement of gene expression levels at each location in each slice.
2.350 -Per slice, cellular spatial resolution is achieved. Using this method, a single physical slice can only be used
2.351 -to measure one single gene; many different mouse brains were needed in order to measure the expression of
2.352 -many genes.
2.353 -An automated nonlinear alignment procedure located the 2D data from the various slices in a single 3D
2.354 -coordinate system. In the final 3D coordinate system, voxels are cubes with 200 microns on a side. There are
2.355 -67x41x58 = 159,326 voxels in the 3D coordinate system, of which 51,533 are in the brain[15].
2.356 + upon map can be seen by contrasting the
2.357 + recent maps given by Swanson[22] on the
2.358 + one hand, and Paxinos and Franklin[17] on
2.359 + the other. While the maps are certainly
2.360 + very similar in their general arrangement,
2.361 + significant differences remain.
2.362 + The Allen Mouse Brain Atlas dataset
2.363 + The Allen Mouse Brain Atlas (ABA)
2.364 + data were produced by doing in-situ hy-
2.365 + bridization on slices of male, 56-day-old
2.366 + C57BL/6J mouse brains. Pictures were
2.367 + taken of the processed slice, and these pic-
2.368 + tures were semi-automatically analyzed to
2.369 + create a digital measurement of gene ex-
2.370 + pression levels at each location in each
2.371 + slice. Per slice, cellular spatial resolution
2.372 + is achieved. Using this method, a single
2.373 + physical slice can only be used to measure
2.374 + one single gene; many different mouse
2.375 + brains were needed in order to measure
2.376 + the expression of many genes.
2.377 + An automated nonlinear alignment pro-
2.378 + cedure located the 2D data from the var-
2.379 +ious slices in a single 3D coordinate system. In the final 3D coordinate system, voxels are cubes with 200
2.380 +microns on a side. There are 67x41x58 = 159,326 voxels in the 3D coordinate system, of which 51,533 are in
2.381 +the brain[15].
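In array terms, a toy Python sketch of this voxel grid (only the grid dimensions come from the text above; the mask here is empty, whereas in the real data 51,533 voxels are inside the brain).

import numpy as np

grid = np.zeros((67, 41, 58))                  # one value per voxel for a single gene
print(grid.size)                               # 159326 voxels in the bounding box
brain_mask = np.zeros(grid.shape, dtype=bool)  # in the real data, 51533 entries are True
print(grid[brain_mask].shape)                  # expression restricted to brain voxels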
2.382 Mus musculus is thought to contain about 22,000 protein-coding genes[28]. The ABA contains data on about
2.383 20,000 genes in sagittal sections, out of which over 4,000 genes are also measured in coronal sections. Our
2.384 -_________________________________________
2.385 - 6Outside of isocortex, the number of layers varies.
2.386 dataset is derived from only the coronal subset of the ABA7.
2.387 The ABA is not the only large public spatial gene expression dataset. However, with the exception of the ABA,
2.388 GenePaint, and EMAGE, most of the other resources have not (yet) extracted the expression intensity from the
2.389 @@ -400,6 +397,10 @@
2.390 of analysis is not related to either of our aims, as it neither finds marker genes, nor does it suggest a cortical
2.391 map based on gene expression data. Neither of the other components of AGEA can be applied to cortical
2.392 areas; AGEA’s Gene Finder cannot be used to find marker genes for the cortical areas; and AGEA’s hierarchical
2.393 +_________________________________________
2.394 + 7The sagittal data do not cover the entire cortex, and also have greater registration error[15]. Genes were selected by the Allen
2.395 +Institute for coronal sectioning based on, “classes of known neuroscientific interest... or through post hoc identification of a marked
2.396 +non-ubiquitous expression pattern”[15].
2.397 clustering does not produce clusters corresponding to the cortical areas8.
2.398 In summary, for all three aims, (a) only one of the previous projects explores combinations of marker genes,
2.399 (b) there has been almost no comparison of different algorithms or scoring methods, and (c) there has been no
2.400 @@ -435,28 +436,26 @@
2.401 cortical maps may have come out differently. It is likely that there are many repeated, salient spatial patterns
2.402 in the gene expression which have not yet been captured by any stain. Therefore, cortical anatomy needs to
2.403 incorporate what we can learn from looking at the patterns of gene expression.
2.404 -_________________________________________
2.405 - 7The sagittal data do not cover the entire cortex, and also have greater registration error[15]. Genes were selected by the Allen
2.406 -Institute for coronal sectioning based on, “classes of known neuroscientific interest... or through post hoc identification of a marked
2.407 -non-ubiquitous expression pattern”[15].
2.408 - 8In both cases, the cause is that pairwise correlations between the gene expression of voxels in different areas but the same layer
2.409 -are often stronger than pairwise correlations between the gene expression of voxels in different layers but the same area. Therefore, a
2.410 -pairwise voxel correlation clustering algorithm will tend to create clusters representing cortical layers, not areas.
2.411 While we do not here propose to analyze human gene expression data, it is conceivable that the methods
2.412 we propose to develop could be used to suggest modifications to the human cortical map as well. In fact, the
2.413 methods we will develop will be applicable to other datasets beyond the brain.
2.414 -The approach: Preliminary Studies
2.415 -Format conversion between SEV, MATLAB, NIFTI
2.416 +_______________________________
2.417 + The approach: Preliminary Studies
2.418 + Format conversion between SEV, MATLAB, NIFTI
2.419 We have created software to (politely) download all of the SEV files9 from the Allen Institute website. We have
2.420 also created software to convert between the SEV, MATLAB, and NIFTI file formats, as well as some of Caret’s
2.421 file formats.
2.422 -Flatmap of cortex
2.423 + Flatmap of cortex
2.424 We downloaded the ABA data and applied a mask to select only those voxels which belong to cerebral cortex.
2.425 We divided the cortex into hemispheres. Using Caret[7], we created a mesh representation of the surface of the
2.426 selected voxels. For each gene, and for each node of the mesh, we calculated an average of the gene expression
2.427 of the voxels “underneath” that mesh node. We then flattened the cortex, creating a two-dimensional mesh. We
2.428 sampled the nodes of the irregular, flat mesh in order to create a regular grid of pixel values. We converted this
2.429 grid into a MATLAB matrix. We manually traced the boundaries of each of 49 cortical areas from the ABA coronal
2.430 + 8In both cases, the cause is that pairwise correlations between the gene expression of voxels in different areas but the same layer
2.431 +are often stronger than pairwise correlations between the gene expression of voxels in different layers but the same area. Therefore, a
2.432 +pairwise voxel correlation clustering algorithm will tend to create clusters representing cortical layers, not areas.
2.433 + 9SEV is a sparse format for spatial data. It is the format in which the ABA data is made available.
2.434 reference atlas slides. We then converted these manual traces into Caret-format regional boundary data on the
2.435 mesh surface. We projected the regions onto the 2-d mesh, and then onto the grid, and then we converted the
2.436 region data into MATLAB format.
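One step of this pipeline, resampling per-node values from the irregular flattened mesh onto a regular pixel grid, might be sketched in Python as follows (mesh coordinates and values are synthetic placeholders).

import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
node_xy = rng.random((4000, 2))   # 2-D coordinates of the flattened mesh nodes
node_val = rng.random(4000)       # per-node average expression of one gene

# resample the irregular mesh onto a regular 128 x 128 pixel grid
xs, ys = np.meshgrid(np.linspace(0, 1, 128), np.linspace(0, 1, 128))
pixels = griddata(node_xy, node_val, (xs, ys), method="linear")
print(pixels.shape)               # one (128, 128) image per gene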
2.437 @@ -483,8 +482,6 @@
2.438 We calculated the correlation between each gene and each cortical area. The top row of Figure 2 shows the
2.439 three genes most correlated with area SS.
2.440 Conditional entropy
2.441 -__________________
2.442 - 9SEV is a sparse format for spatial data. It is the format in which the ABA data is made available.
2.443 For each region, we created and ran a forward stepwise procedure which attempted to find pairs of gene
2.444 expression boolean masks such that the conditional entropy of the target area’s boolean mask, conditioned
2.445 upon the pair of gene expression boolean masks, is minimized.
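The quantity being minimized can be written out directly; here is a Python sketch of the conditional entropy of the area mask given a pair of boolean gene masks (synthetic data).

import numpy as np

def conditional_entropy(area, g1, g2):
    # H(area | g1, g2), with all three arguments boolean masks over the voxels
    h, n = 0.0, area.size
    for a in (False, True):
        for b in (False, True):
            cell = (g1 == a) & (g2 == b)
            p_cell = cell.sum() / n
            if p_cell == 0:
                continue
            p_area = area[cell].mean()        # P(in area | this combination)
            for p in (p_area, 1 - p_area):
                if p > 0:
                    h -= p_cell * p * np.log2(p)
    return h

rng = np.random.default_rng(0)
area = rng.random(10000) > 0.8    # boolean mask of the target area
g1 = rng.random(10000) > 0.5      # thresholded expression of gene 1
g2 = rng.random(10000) > 0.5      # thresholded expression of gene 2
print(conditional_entropy(area, g1, g2))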
2.446 @@ -555,6 +552,8 @@
2.447 Embedding, Fast Maximum Variance Unfolding, Non-negative Matrix Factorization (NNMF). Space constraints
2.448 prevent us from showing many of the results, but as a sample, PCA, NNMF, and landmark Isomap are shown in
2.449 the first, second, and third rows of Figure 6.
2.450 +_
2.451 + 105-fold cross-validation.
2.452 After applying the dimensionality reduction, we ran clustering algorithms on the reduced data. To date we
2.453 have tried k-means and spectral clustering. The results of k-means after PCA, NNMF, and landmark Isomap are
2.454 shown in the last row of Figure 6. To compare, the leftmost picture on the bottom row of Figure 6 shows some
2.455 @@ -575,8 +574,6 @@
2.456 like the cortex another strategy is to group together voxels in the same cortical layer; each surface pixel would
2.457 then be associated with one expression level per gene per layer. We will develop a segmentation algorithm to
2.458 automatically identify the layer boundaries.
2.459 -__
2.460 - 105-fold cross-validation.
2.461 Develop algorithms that find genetic markers for anatomical regions
2.462 Scoring measures and feature selection We will develop scoring methods for evaluating how good individual
2.463 genes are at marking areas. We will compare pointwise, geometric, and information-theoretic measures. We
2.464 @@ -622,13 +619,6 @@
2.465 sifier can be combined with a stepwise wrapper for use as a feature selection method. We will explore logistic
2.466 regression (including spatial models[16]), decision trees12, sparse SVMs, generative mixture models (including
2.467 naive bayes), kernel density estimation, instance-based learning methods (such as k-nearest neighbor), genetic
2.468 -_________________________________________
2.469 - 11Not just any redrawing is acceptable, only those which appear to be justified as a natural spatial domain of gene expression by
2.470 -multiple sources of evidence. Interestingly, the need to detect “natural spatial domains of gene expression” in a data-driven fashion
2.471 -means that the methods of Aim 2 might be useful in achieving Aim 1, as well – particularly discriminative dimensionality reduction.
2.472 - 12Actually, we have already begun to explore decision trees. For each cortical area, we have used the C4.5 algorithm to find a decision
2.473 -tree for that area. We achieved good classification accuracy on our training set, but the number of genes that appeared in each tree was
2.474 -too large. We plan to implement a pruning procedure to generate trees that use fewer genes.
2.475 algorithms, and artificial neural networks.
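As an example of one entry in this list, a per-area decision tree could be sketched as follows (Python; scikit-learn's CART-style tree stands in for the C4.5 algorithm mentioned in the footnote, and the depth limit is one crude way to keep the number of genes small).

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
expr = rng.random((3000, 400))      # toy data: 3000 pixels x 400 genes
in_area = rng.random(3000) > 0.9    # membership in one cortical area

# shallow tree: a crude way of limiting how many genes appear in the tree
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(expr, in_area)
genes_used = np.flatnonzero(tree.feature_importances_ > 0)
print("genes appearing in the tree:", genes_used)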
2.476 Develop algorithms to suggest a division of a structure into anatomical parts
2.477 Dimensionality reduction on gene expression profiles We have already described the application of ten
2.478 @@ -657,6 +647,13 @@
2.479 help or hurt the ultimate goal of identifying interesting spatial regions.
2.480 Co-clustering There are some algorithms which simultaneously incorporate clustering on instances and on
2.481 features (in our case, genes and pixels), for example, IRM[11]. These are called co-clustering or biclustering
2.482 +_________________________________________
2.483 + 11Not just any redrawing is acceptable, only those which appear to be justified as a natural spatial domain of gene expression by
2.484 +multiple sources of evidence. Interestingly, the need to detect “natural spatial domains of gene expression” in a data-driven fashion
2.485 +means that the methods of Aim 2 might be useful in achieving Aim 1, as well – particularly discriminative dimensionality reduction.
2.486 + 12Actually, we have already begun to explore decision trees. For each cortical area, we have used the C4.5 algorithm to find a decision
2.487 +tree for that area. We achieved good classification accuracy on our training set, but the number of genes that appeared in each tree was
2.488 +too large. We plan to implement a pruning procedure to generate trees that use fewer genes.
2.489 algorithms.
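A Python sketch of co-clustering pixels and genes; spectral co-clustering is used here only because it is readily available, not because it is the algorithm (such as IRM) that would ultimately be chosen.

import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)
expr = rng.random((1000, 200)) + 0.01   # toy data: pixels x genes, strictly positive

model = SpectralCoclustering(n_clusters=6, random_state=0).fit(expr)
pixel_cluster = model.row_labels_       # one cluster label per pixel
gene_cluster = model.column_labels_     # one cluster label per gene
print(np.bincount(pixel_cluster), np.bincount(gene_cluster))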
2.490 Radial profiles We will explore the use of the radial profile of gene expression under each pixel.
2.491 Compare different methods In order to tell which method is best for genomic anatomy, for each experimental
2.492 @@ -682,13 +679,13 @@
2.493 and explain how the statistical structure in the gene expression data led to any unexpected or interesting features
2.494 of these maps, and we will provide biological hypotheses to interpret any new cortical areas, or groupings of
2.495 areas, which are discovered.
2.496 -Timeline and milestones
2.497 +____________________________________________________________________________
2.498 + Timeline and milestones
2.499 Finding marker genes
2.500 September-November 2009: Develop an automated mechanism for segmenting the cortical voxels into layers
2.501 November 2009 (milestone): Have completed construction of a flatmapped, cortical dataset with information
2.502 for each layer
2.503 -October 2009-April 2010: Develop scoring methods, dimensionality reduction, and supervised learning meth-
2.504 -ods.
2.505 +October 2009-April 2010: Develop scoring and supervised learning methods.
2.506 January 2010 (milestone): Submit a publication on single marker genes for cortical areas
2.507 February-July 2010: Continue to develop scoring methods and supervised learning frameworks. Extend tech-
2.508 niques for robustness. Compare the performance of techniques. Validate marker genes. Prepare software
2.509 @@ -696,13 +693,12 @@
2.510 June 2010 (milestone): Submit a paper describing a method fulfilling Aim 1. Release toolbox.
2.511 July 2010 (milestone): Submit a paper describing combinations of marker genes for each cortical area, and a
2.512 small number of marker genes that can, in combination, define most of the areas at once
2.513 -Revealing new ways to parcellate a structure into regions
2.514 -June 2010-March 2011: Explore dimensionality reduction algorithms for Aim 2. Explore clustering algorithms.
2.515 -Adapt clustering algorithms to use radial profile information. Compare the performance of techniques.
2.516 + Revealing new ways to parcellate a structure into regions
2.517 +June 2010-March 2011: Explore dimensionality reduction algorithms. Explore clustering algorithms. Adapt
2.518 +clustering algorithms to use radial profile information. Compare the performance of techniques.
2.519 March 2011 (milestone): Submit a paper describing a method fulfilling Aim 2. Release toolbox.
2.520 -February-May 2011: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If
2.521 -new ways of organizing the cortex into areas are discovered, interpret the results. Prepare software toolbox for
2.522 -Aim 2.
2.523 +February-May 2011: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex,
2.524 +and interpret the results. Prepare software toolbox for Aim 2.
2.525 May 2011 (milestone): Submit a paper on the genomic anatomy of the cortex, using the methods developed in
2.526 Aim 2
2.527 May-August 2011: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1.
3.1 Binary file grant.odt has changed
4.1 Binary file grant.pdf has changed
5.1 --- a/grant.txt Wed Apr 22 06:45:17 2009 -0700
5.2 +++ b/grant.txt Wed Apr 22 07:06:46 2009 -0700
5.3 @@ -1,5 +1,6 @@
5.4 \documentclass[11pt]{nih-blank}
5.5
5.6 +\usepackage[small,compact]{titlesec}
5.7
5.8 %%\piname{Stevens, Charles F.}
5.9
5.10 @@ -48,6 +49,7 @@
5.11
5.12 This proposal addresses challenge topic 06-HG-101. Massive new datasets obtained with techniques such as in situ hybridization (ISH), immunohistochemistry, in situ transgenic reporter, microarray voxelation, and others, allow the expression levels of many genes at many locations to be compared. Our goal is to develop automated methods to relate spatial variation in gene expression to anatomy. We want to find marker genes for specific anatomical regions, and also to draw new anatomical maps based on gene expression patterns.
5.13
5.14 +\vspace{0.3cm}\hrule
5.15 == The Challenge and Potential impact ==
5.16
5.17 Each of our three aims will be discussed in turn. For each aim, we will develop a conceptual framework for thinking about the task, and we will present our strategy for solving it. Next we will discuss related work. At the conclusion of each section, we will summarize why our strategy is different from what has been done before. At the end of this section, we will describe the potential impact.
5.18 @@ -140,15 +142,14 @@
5.19
5.20 As noted above, there has been much work on supervised learning, and there are many available algorithms. However, the algorithms require the scientist to provide a framework for representing the problem domain, and the way that this framework is set up has a large impact on performance. Creating a good framework can require creatively reconceptualizing the problem domain, and is not merely a mechanical "fine-tuning" of numerical parameters. For example, we believe that domain-specific scoring measures (such as gradient similarity, which is discussed in Preliminary Studies) may be necessary in order to achieve the best results in this application.
5.21
5.22 -We are aware of six existing efforts to find marker genes using spatial gene expression data using automated methods.
5.23 +We now turn to efforts to find marker genes from spatial gene expression data using automated methods.
5.24
5.25 %%GeneAtlas\cite{carson_digital_2005} allows the user to construct a search query by freely demarcating one or two 2-D regions on sagittal slices, and then to specify either the strength of expression or the name of another gene whose expression pattern is to be matched.
5.26
5.27 -\cite{lee_high-resolution_2007} mentions the possibility of constructing a spatial region for each gene, and then, for each anatomical structure of interest, computing what proportion of this structure is covered by the gene's spatial region.
5.28 -
5.29 -GeneAtlas\cite{carson_digital_2005} and EMAGE \cite{venkataraman_emage_2008} allow the user to construct a search query by demarcating regions and then specifying either the strength of expression or the name of another gene or dataset whose expression pattern is to be matched. Neither GeneAtlas nor EMAGE allow one to search for combinations of genes that define a region in concert but not separately.
5.30 -
5.31 %% \footnote{For the similiarity score (match score) between two images (in this case, the query and the gene expression images), GeneAtlas uses the sum of a weighted L1-norm distance between vectors whose components represent the number of cells within a pixel (actually, many of these projects use quadrilaterals instead of square pixels; but we will refer to them as pixels for simplicity) whose expression is within four discretization levels. EMAGE uses Jaccard similarity (the number of true pixels in the intersection of the two images, divided by the number of pixels in their union).}
5.32 +%% \cite{lee_high-resolution_2007} mentions the possibility of constructing a spatial region for each gene, and then, for each anatomical structure of interest, computing what proportion of this structure is covered by the gene's spatial region.
5.33 +
5.34 +GeneAtlas\cite{carson_digital_2005} and EMAGE \cite{venkataraman_emage_2008} allow the user to construct a search query by demarcating regions and then specifying either the strength of expression or the name of another gene or dataset whose expression pattern is to be matched. Neither GeneAtlas nor EMAGE allow one to search for combinations of genes that define a region in concert but not separately.
5.35
5.36 \cite{ng_anatomic_2009} describes AGEA, "Anatomic Gene Expression
5.37 Atlas". AGEA has three
5.38 @@ -156,16 +157,11 @@
5.39 cluster which includes the seed voxel, (2) yields a list of genes
5.40 which are overexpressed in that cluster. **Correlation**: The user selects a seed voxel and the system
5.41 then shows the user how much correlation there is between the gene
5.42 -expression profile of the seed voxel and every other voxel. **Clusters**: will be described later
5.43 -
5.44 -\cite{chin_genome-scale_2007} looks at the mean expression level of genes within anatomical regions, and applies a Student's t-test with Bonferroni correction to determine whether the mean expression level of a gene is significantly higher in the target region.
5.45 -
5.46 -\cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} differ from our Aim 1 in at least three ways. First, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} find only single genes, whereas we will also look for combinations of genes. Second, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} can only use overexpression as a marker, whereas we will also search for underexpression. Third, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} use scores based on pointwise expression levels, whereas we will also use geometric scores such as gradient similarity (described in Preliminary Studies). Figures \ref{MOcombo}, \ref{hole}, and \ref{AUDgeometry} in the Preliminary Studies section contain evidence that each of our three choices is the right one.
5.47 -
5.48 -
5.49 +expression profile of the seed voxel and every other voxel. **Clusters**: will be described later. \cite{chin_genome-scale_2007} looks at the mean expression level of genes within anatomical regions, and applies a Student's t-test with Bonferroni correction to determine whether the mean expression level of a gene is significantly higher in the target region. \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} differ from our Aim 1 in at least three ways. First, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} find only single genes, whereas we will also look for combinations of genes. Second, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} can only use overexpression as a marker, whereas we will also search for underexpression. Third, \cite{ng_anatomic_2009} and \cite{chin_genome-scale_2007} use scores based on pointwise expression levels, whereas we will also use geometric scores such as gradient similarity (described in Preliminary Studies). Figures \ref{MOcombo}, \ref{hole}, and \ref{AUDgeometry} in the Preliminary Studies section contain evidence that each of our three choices is the right one.
5.50
5.51 \cite{hemert_matching_2008} describes a technique to find combinations of marker genes to pick out an anatomical region. They use an evolutionary algorithm to evolve logical operators which combine boolean (thresholded) images in order to match a target image. %%Their match score is Jaccard similarity.
5.52
5.53 +
5.54 In summary, there has been fruitful work on finding marker genes, but only one of the previous projects explores combinations of marker genes, and none of these publications compare the results obtained by using different algorithms or scoring methods.
5.55
5.56
5.57 @@ -173,6 +169,23 @@
5.58
5.59 === Aim 2: From gene expression data, discover a map of regions ===
5.60
5.61 +
5.62 +
5.63 +\vspace{0.3cm}**Machine learning terminology: clustering**
5.64 +
5.65 +If one is given a dataset consisting merely of instances, with no class labels, then analysis of the dataset is referred to as __unsupervised learning__ in the jargon of machine learning. One thing that you can do with such a dataset is to group instances together. A set of similar instances is called a __cluster__, and the activity of grouping the data into clusters is called __clustering__ or __cluster analysis__.
5.66 +
5.67 +The task of deciding how to carve up a structure into anatomical regions can be put into these terms. The instances are once again voxels (or pixels) along with their associated gene expression profiles. We make the assumption that voxels from the same anatomical region have similar gene expression profiles, at least compared to the other regions. This means that clustering voxels is the same as finding potential regions; we seek a partitioning of the voxels into regions, that is, into clusters of voxels with similar gene expression.
5.68 +
5.69 +%%It is desirable to determine not just one set of regions, but also how these regions relate to each other, if at all; perhaps some of the regions are more similar to each other than to the rest, suggesting that, although at a fine spatial scale they could be considered separate, on a coarser spatial scale they could be grouped together into one large region. This suggests the outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
5.70 +
5.71 +It is desirable to determine not just one set of regions, but also how these regions relate to each other. The outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
5.72 +
5.73 +
5.74 +\vspace{0.3cm}**Similarity scores**
5.75 +A crucial choice when designing a clustering method is how to measure similarity, across either pairs of instances, or clusters, or both. There is much overlap between scoring methods for feature selection (discussed above under Aim 1) and scoring methods for similarity.
5.76 +
5.77 +
5.78 \begin{wrapfigure}{L}{0.35\textwidth}\centering
5.79 \includegraphics[scale=.27]{MO_vs_Wwc1_jet.eps}\includegraphics[scale=.27]{MO_vs_Mtif2_jet.eps}
5.80
5.81 @@ -180,23 +193,6 @@
5.82 \caption{Upper left: $wwc1$. Upper right: $mtif2$. Lower left: wwc1 + mtif2 (each pixel's value on the lower left is the sum of the corresponding pixels in the upper row).}
5.83 \label{MOcombo}\end{wrapfigure}
5.84
5.85 -
5.86 -
5.87 -\vspace{0.3cm}**Machine learning terminology: clustering**
5.88 -
5.89 -If one is given a dataset consisting merely of instances, with no class labels, then analysis of the dataset is referred to as __unsupervised learning__ in the jargon of machine learning. One thing that you can do with such a dataset is to group instances together. A set of similar instances is called a __cluster__, and the activity of finding grouping the data into clusters is called __clustering__ or __cluster analysis__.
5.90 -
5.91 -The task of deciding how to carve up a structure into anatomical regions can be put into these terms. The instances are once again voxels (or pixels) along with their associated gene expression profiles. We make the assumption that voxels from the same anatomical region have similar gene expression profiles, at least compared to the other regions. This means that clustering voxels is the same as finding potential regions; we seek a partitioning of the voxels into regions, that is, into clusters of voxels with similar gene expression.
5.92 -
5.93 -%%It is desirable to determine not just one set of regions, but also how these regions relate to each other, if at all; perhaps some of the regions are more similar to each other than to the rest, suggesting that, although at a fine spatial scale they could be considered separate, on a coarser spatial scale they could be grouped together into one large region. This suggests the outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
5.94 -
5.95 -It is desirable to determine not just one set of regions, but also how these regions relate to each other. The outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
5.96 -
5.97 -
5.98 -\vspace{0.3cm}**Similarity scores**
5.99 -A crucial choice when designing a clustering method is how to measure similarity, across either pairs of instances, or clusters, or both. There is much overlap between scoring methods for feature selection (discussed above under Aim 1) and scoring methods for similarity.
5.100 -
5.101 -
5.102 \vspace{0.3cm}**Spatially contiguous clusters; image segmentation**
5.103 We have shown that aim 2 is a type of clustering task. In fact, it is a special type of clustering task because we have an additional constraint on clusters; voxels grouped together into a cluster must be spatially contiguous. In Preliminary Studies, we show that one can get reasonable results without enforcing this constraint; however, we plan to compare these results against other methods which guarantee contiguous clusters.
5.104
5.105 @@ -347,6 +343,7 @@
5.106
5.107
5.108
5.109 +\vspace{0.3cm}\hrule
5.110
5.111 == The approach: Preliminary Studies ==
5.112
5.113 @@ -596,22 +593,23 @@
5.114 %%\vspace{0.3cm}**Extension to probabalistic maps**
5.115 %%Presently, we do not have a probabalistic atlas which is registered to the ABA space. However, in anticipation of the availability of such maps, we would like to explore extensions to our Aim 1 techniques which can handle probabalistic maps.
5.116
5.117 +\vspace{0.3cm}\hrule
5.118
5.119 == Timeline and milestones ==
5.120
5.121 \vspace{0.3cm}**Finding marker genes**
5.122 \\ **September-November 2009**: Develop an automated mechanism for segmenting the cortical voxels into layers
5.123 \\ **November 2009 (milestone)**: Have completed construction of a flatmapped, cortical dataset with information for each layer
5.124 -\\ **October 2009-April 2010**: Develop scoring methods, dimensionality reduction, and supervised learning methods.
5.125 +\\ **October 2009-April 2010**: Develop scoring and supervised learning methods.
5.126 \\ **January 2010 (milestone)**: Submit a publication on single marker genes for cortical areas
5.127 \\ **February-July 2010**: Continue to develop scoring methods and supervised learning frameworks. Extend techniques for robustness. Compare the performance of techniques. Validate marker genes. Prepare software toolbox for Aim 1.
5.128 \\ **June 2010 (milestone)**: Submit a paper describing a method fulfilling Aim 1. Release toolbox.
5.129 \\ **July 2010 (milestone)**: Submit a paper describing combinations of marker genes for each cortical area, and a small number of marker genes that can, in combination, define most of the areas at once
5.130
5.131 \vspace{0.3cm}**Revealing new ways to parcellate a structure into regions**
5.132 -\\ **June 2010-March 2011**: Explore dimensionality reduction algorithms for Aim 2. Explore clustering algorithms. Adapt clustering algorithms to use radial profile information. Compare the performance of techniques.
5.133 +\\ **June 2010-March 2011**: Explore dimensionality reduction algorithms. Explore clustering algorithms. Adapt clustering algorithms to use radial profile information. Compare the performance of techniques.
5.134 \\ **March 2011 (milestone)**: Submit a paper describing a method fulfilling Aim 2. Release toolbox.
5.135 -\\ **February-May 2011**: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If new ways of organizing the cortex into areas are discovered, interpret the results. Prepare software toolbox for Aim 2.
5.136 +\\ **February-May 2011**: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex, and interpret the results. Prepare software toolbox for Aim 2.
5.137 \\ **May 2011 (milestone)**: Submit a paper on the genomic anatomy of the cortex, using the methods developed in Aim 2
5.138 \\ **May-August 2011**: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1. Possibly submit another paper.
5.139
6.1 --- a/nih-blank.cls Wed Apr 22 06:45:17 2009 -0700
6.2 +++ b/nih-blank.cls Wed Apr 22 07:06:46 2009 -0700
6.3 @@ -48,7 +48,8 @@
6.4 %% changed by bayle shanks: use .5 inch, not .49
6.5
6.6 % 0.5 inch top
6.7 -\RequirePackage[letterpaper,left=0.5in,top=0.5in,bottom=0.575in,right=0.55in,nohead,nofoot]{geometry}
6.8 +%\RequirePackage[letterpaper,left=0.5in,top=0.5in,bottom=0.575in,right=0.55in,nohead,nofoot]{geometry}
6.9 +\RequirePackage[letterpaper,left=0.5in,top=0.5in,bottom=0.52in,right=0.55in,nohead,nofoot]{geometry}
6.10
6.11 % 0.49 inch top
6.12 %\RequirePackage[letterpaper,left=0.5in,top=0.49in,bottom=0.575in,right=0.55in,nohead,nofoot]{geometry}
6.13 @@ -72,8 +73,12 @@
6.14 %%%% More code
6.15 % preamble stuff
6.16
6.17 +
6.18 +%% changed by bayle shanks
6.19 +
6.20 \renewcommand{\headrulewidth}{0pt}
6.21 -\renewcommand{\footrulewidth}{0.75pt}
6.22 +%\renewcommand{\footrulewidth}{0.75pt}
6.23 +\renewcommand{\footrulewidth}{0pt}
6.24
6.25 %%%% Changed by M A Lewis, Ph.D. (mal11 at alumni.cwru.edu)
6.26 %%%% Simplify page layout by using geometry package above.
6.27 @@ -90,7 +95,8 @@
6.28 %\renewcommand{\baselinestretch}{.9}
6.29 %\headwidth=\textwidth
6.30
6.31 -\addtolength{\headheight}{2.5pt}
6.32 +%\addtolength{\headheight}{2.5pt}
6.33 +\addtolength{\headheight}{0.5pt}
6.34
6.35 % rename the bibliography section
6.36 %\AtBeginDocument{\renewcommand{\refname}{Literature~Cited}}