cg
changeset 36:c1152241ab12
author | bshanks@bshanks.dyndns.org |
---|---|
date | Mon Apr 13 23:11:04 2009 -0700 (16 years ago) |
parents | 99e5d268bab0 |
children | af3389b432e9 |
files | grant.doc grant.html grant.odt grant.pdf grant.txt |
line diff
1.1 Binary file grant.doc has changed
2.1 --- a/grant.html Mon Apr 13 20:27:32 2009 -0700
2.2 +++ b/grant.html Mon Apr 13 23:11:04 2009 -0700
2.3 @@ -155,12 +155,30 @@
2.4 their approximate location upon the cortical surface.
2.5 Even the questions of how many areas should be recognized in cortex, and what their arrangement is, are still not
2.6 completely settled. A proposed division of the cortex into areas is called a cortical map. In the rodent, the lack of a
2.7 -single agreed-upon map can be seen by contrasting the recent maps given by Swanson[?] on the one hand, and Paxinos
2.8 -and Franklin[?] on the other. While the maps are certainly very similar in their general arrangement, significant differences
2.9 +single agreed-upon map can be seen by contrasting the recent maps given by Swanson[4] on the one hand, and Paxinos
2.10 +and Franklin[3] on the other. While the maps are certainly very similar in their general arrangement, significant differences
2.11 remain in the details.
2.12 +The Allen Mouse Brain Atlas dataset
2.13 +The Allen Mouse Brain Atlas (ABA) data was produced by doing in-situ hybridization on slices of male, 56-day-old
2.14 +C57BL/6J mouse brains. Pictures were taken of the processed slice, and these pictures were semi-automatically analyzed
2.15 +in order to create a digital measurement of gene expression levels at each location in each slice. Per slice, cellular spatial
2.16 +resolution is achieved. Using this method, a single physical slice can only be used to measure one single gene; many different
2.17 +mouse brains were needed in order to measure the expression of many genes.
2.18 +Next, an automated nonlinear alignment procedure located the 2D data from the various slices in a single 3D coordinate
2.19 +system. In the final 3D coordinate system, voxels are cubes with 200 microns on a side. There are 67x41x58 = 159,326
2.20 +voxels in the 3D coordinate system, of which 51,533 are in the brain[2].
2.21 +Mus musculus, the common house mouse, is thought to contain about 22,000 protein-coding genes[6]. The ABA contains
2.22 +data on about 20,000 genes in sagittal sections, out of which over 4,000 genes are also measured in coronal sections. Our
2.23 +dataset is derived from only the coronal subset of the ABA, because the sagittal data does not cover the entire cortex,
2.24 +and has greater registration error[2]. Genes were selected by the Allen Institute for coronal sectioning based on, “classes of
2.25 +known neuroscientific interest... or through post hoc identification of a marked non-ubiquitous expression pattern”[2].
2.26 Significance
2.27 The method developed in aim (1) will be applied to each cortical area to find a set of marker genes such that the
2.28 combinatorial expression pattern of those genes uniquely picks out the target area. Finding marker genes will be useful for
2.29 +_________________________________________
2.30 + 2This would seem to contradict our finding in aim 1 that some cortical areas are combinatorially coded by multiple genes. However, it is
2.31 +possible that the currently accepted cortical maps divide the cortex into subregions which are unnatural from the point of view of gene expression;
2.32 +perhaps there is some other way to map the cortex for which each subregion can be identified by single genes.
2.33 drug discovery as well as for experimentation because marker genes can be used to design interventions which selectively
2.34 target individual cortical areas.
2.35 The application of the marker gene finding algorithm to the cortex will also support the development of new neuroanatom-
2.36 @@ -176,10 +194,6 @@
2.37 at the patterns of gene expression.
2.38 While we do not here propose to analyze human gene expression data, it is conceivable that the methods we propose to
2.39 develop could be used to suggest modifications to the human cortical map as well.
2.40 -_________________________________________
2.41 - 2This would seem to contradict our finding in aim 1 that some cortical areas are combinatorially coded by multiple genes. However, it is
2.42 -possible that the currently accepted cortical maps divide the cortex into subregions which are unnatural from the point of view of gene expression;
2.43 -perhaps there is some other way to map the cortex for which each subregion can be identified by single genes.
2.44 Related work
2.45 There does not appear to be much work on the automated analysis of spatial gene expression data.
2.46 There is a substantial body of work on the analysis of gene expression data, however, most of this concerns gene expression
2.47 @@ -192,7 +206,7 @@
2.48 “fine-tuning” of numerical parameters. For example, we believe that domain-specific scoring measures (such as gradient
2.49 similarity, which is discussed in Preliminary Work) may be necessary in order to achieve the best results in this application.
2.50 We are aware of two existing efforts to relate spatial gene expression data to anatomy through computational methods.
2.51 -[3 ] describes an analysis of the anatomy of the hippocampus using the ABA dataset. In addition to manual analysis,
2.52 +[5 ] describes an analysis of the anatomy of the hippocampus using the ABA dataset. In addition to manual analysis,
2.53 two clustering methods were employed, a modified Non-negative Matrix Factorization (NNMF), and a hierarchical recursive
2.54 bifurcation clustering scheme based on correlation as the similarity score. The paper yielded impressive results, proving the
2.55 usefulness of such research. We have run NNMF on the cortical dataset3 and while the results are promising (see Preliminary
2.56 @@ -212,14 +226,7 @@
2.57 Gene Finder finds only single genes, whereas we will also look for combinations of genes5. Third, gene finder can only use
2.58 overexpression as a marker, whereas in the Preliminary Data we show that underexpression can also be used. Fourth, Gene
2.59 Finder uses a simple pointwise score6, whereas we will also use geometric metrics such as gradient similarity.
2.60 -The hierarchical clustering is different from our Aim 2 in at least three ways. First, the clustering finds clusters cor-
2.61 -responding to layers, but no clusters corresponding to areas7 8. Our Aim 2 will not be accomplished until a clustering is
2.62 -produced which yields areas. Second, AGEA uses perhaps the simplest possible similarity score (correlation), and does no
2.63 -dimensionality reduction before calculating similarity. While it is possible that a more complex system will not do any better
2.64 -than this, we believe further exploration of alternative methods of scoring and dimensionality reduction is warranted. Third,
2.65 -AGEA did not look at clusters of genes; in Preliminary Data we have shown that clusters of genes may identify interesting
2.66 -spatial subregions such as cortical areas.
2.67 -_______
2.68 +_________________________________________
2.69 3We ran “vanilla” NNMF, whereas the paper under discussion used a modified method. Their main modification consisted of adding a soft
2.70 spatial contiguity constraint. However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was
2.71 needed. The paper under discussion mentions that they also tried a hierarchical variant of NNMF, but since they didn’t report its results, we
2.72 @@ -230,30 +237,42 @@
2.73 5See Preliminary Data for an example of an area which cannot be marked by any single gene in the dataset, but which can be marked by a
2.74 combination.
2.75 6“Expression energy ratio”, which captures overexpression.
2.76 - 7This is for the same reason as in footnote 4.
2.77 +The hierarchical clustering is different from our Aim 2 in at least three ways. First, the clustering finds clusters cor-
2.78 +responding to layers, but no clusters corresponding to areas7 8. Our Aim 2 will not be accomplished until a clustering is
2.79 +produced which yields areas. Second, AGEA uses perhaps the simplest possible similarity score (correlation), and does no
2.80 +dimensionality reduction before calculating similarity. While it is possible that a more complex system will not do any better
2.81 +than this, we believe further exploration of alternative methods of scoring and dimensionality reduction is warranted. Third,
2.82 +AGEA did not look at clusters of genes; in Preliminary Data we have shown that clusters of genes may identify interesting
2.83 +spatial subregions such as cortical areas.
2.84 +_______
2.85 + 7This is for the same reason as in footnote 4.
2.86 8There are clusters which presumably correspond to the intersection of a layer and an area, but since one area will have many layer-area
2.87 intersection clusters, further work is needed to make sense of these.
2.88 -
2.89 -
2.90 -
2.91 -Figure 1: Upper left: wwc1. Upper right: mtif2. Lower left: wwc1 + mtif2 (each pixel’s value on the lower left is the sum
2.92 -of the corresponding pixels in the upper row). Within each picture, the vertical axis roughly corresponds to anterior at the
2.93 -top and posterior at the bottom, and the horizontal axis roughly corresponds to medial at the left and lateral at the right.
2.94 -The red outline is the boundary of region MO. Pixels are colored approximately according to the density of expressing cells
2.95 -underneath each pixel, with red meaning a lot of expression and blue meaning little.
2.96 Preliminary work
2.97 Format conversion between SEV, MATLAB, NIFTI
2.98 We have created software to (politely) download all of the SEV files from the Allen Institute website. We have also created
2.99 software to convert between the SEV, MATLAB, and NIFTI file formats, as well as some of Caret’s formats.
2.100 Flatmap of cortex
2.101 -We created a mask which selects only those voxels within the ABA atlas space which belong to cerebral cortex.
2.102 -todo
2.103 -Using Caret, [1]
2.104 -We manually entered the boundaries of each cortical area into Caret.
2.105 -Cortical layers are found at different depths in different parts of the cortex. We have manually demarcated the depth of
2.106 -the outer boundary of cortical layer 5 throughout the cortex.
2.107 -In preparation for extracting the layer-specific datasets, we have extended Caret with routines that allow the depth of
2.108 -the ROI for volume-to-surface projection to vary.
2.109 +We downloaded the ABA data and applied a mask to select only those voxels which belong to cerebral cortex. We divided
2.110 +the cortex into hemispheres.
2.111 +Using Caret[1], we created a mesh representation of the surface of the selected region. For each gene, for each node of
2.112 +the mesh, we calculated an average of the gene expression of the voxels “underneath” that mesh node. Using Caret, we then
2.113 +flattened the cortex, creating a two-dimensional mesh.
2.114 +We sampled the nodes of the irregular, flat mesh in order to create a regular grid of pixel values. We converted this grid
2.115 +into a MATLAB matrix.
2.116 +We manually traced the boundaries of each cortical area from the ABA coronal reference atlas slides. We then converted
2.117 +these manual traces into Caret-format regional boundary data on the mesh surface. Using Caret, we projected the regions
2.118 +onto the 2-d mesh, and then onto the grid, and then we converted the region data into MATLAB format.
2.119 +At this point, the data is in the form of a number of 2-D matrices, each registered to each other, with the matrix entries
2.120 +representing a grid of points (pixels) over the cortical surface:
2.121 +∙A 2-D matrix whose entries represent the regional label associated with each surface pixel
2.122 +∙For each gene, a 2-D matrix whose entries represent the average expression level underneath each surface pixel
2.123 +Rather than a single average expression level for each surface pixel, we plan to create a separate matrix for each cortical
2.124 +layer to represent the average expression level within that layer. Cortical layers are found at different depths in different
2.125 +parts of the cortex. In preparation for extracting the layer-specific datasets, we have extended Caret with routines that
2.126 +allow the depth of the ROI for volume-to-surface projection to vary.
2.127 +In the Research Plan, we describe how we will automatically locate the layer depths. For validation, we have manually
2.128 +demarcated the depth of the outer boundary of cortical layer 5 throughout the cortex.
2.129 Using combinations of multiple genes is necessary and sufficient to delineate some cortical areas
2.130 Here we give an example of a cortical area which is not marked by any single gene, but which can be identified combi-
2.131 natorially. According to logistic regression, gene wwc19 is the best-fit single gene for predicting whether or not a pixel on
2.132 @@ -268,20 +287,32 @@
2.133 Conditional entropy todo
2.134 Gradient similarity todo
2.135 Geometric and pointwise scoring methods provide complementary information
2.136 +To show that local geometry can provide useful information that cannot be detected via pointwise analyses, consider Fig.
2.137 +. The top row of Fig. displays the 3 genes which most match area AUD, according to a pointwise method11. The bottom
2.138 +row displays the 3 genes which most match AUD according to a method which considers local geometry12 The pointwise
2.139 +method in the top row identifies genes which express more strongly in AUD than outside of it; its weakness is that this
2.140 +includes many areas which don’t have a salient border matching the areal border. The geometric method identifies genes
2.141 _________________________________________
2.142 9“WW, C2 and coiled-coil domain containing 1”; EntrezGene ID 211652
2.143 10“mitochondrial translational initiation factor 2”; EntrezGene ID 76784
2.144 + 11For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor
2.145 +variable was the value of the expression of the gene underneath that pixel. The resulting scores were used to rank the genes in terms of how well
2.146 +they predict area AUD.
2.147 + 12For each gene the gradient similarity (see section ??) between (a) a map of the expression of each gene on the cortical surface and (b) the
2.148 +shape of area AUD, was calculated, and this was used to rank the genes.
2.149
2.150 +
2.151 +
2.152 +Figure 1: Upper left: wwc1. Upper right: mtif2. Lower left: wwc1 + mtif2 (each pixel’s value on the lower left is the sum
2.153 +of the corresponding pixels in the upper row). Within each picture, the vertical axis roughly corresponds to anterior at the
2.154 +top and posterior at the bottom, and the horizontal axis roughly corresponds to medial at the left and lateral at the right.
2.155 +The red outline is the boundary of region MO. Pixels are colored approximately according to the density of expressing cells
2.156 +underneath each pixel, with red meaning a lot of expression and blue meaning little.
2.157
2.158
2.159 Figure 2: The top row shows the three genes which (individually) best predict area AUD, according to logistic regression.
2.160 The bottom row shows the three genes which (individually) best match area AUD, according to gradient similarity. From
2.161 left to right and top to bottom, the genes are Ssr1, Efcbp1, Aph1a, Ptk7, Aph1a again, and Lepr
2.162 -To show that local geometry can provide useful information that cannot be detected via pointwise analyses, consider Fig.
2.163 -. The top row of Fig. displays the 3 genes which most match area AUD, according to a pointwise method11. The bottom
2.164 -row displays the 3 genes which most match AUD according to a method which considers local geometry12 The pointwise
2.165 -method in the top row identifies genes which express more strongly in AUD than outside of it; its weakness is that this
2.166 -includes many areas which don’t have a salient border matching the areal border. The geometric method identifies genes
2.167 whose salient expression border seems to partially line up with the border of AUD; its weakness is that this includes genes
2.168 which don’t express over the entire area. Genes which have high rankings using both pointwise and border criteria, such as
2.169 Aph1a in the example, may be particularly good markers. None of these genes are, individually, a perfect marker for AUD;
2.170 @@ -304,18 +335,13 @@
2.171 Specific to Aim 2 (and Aim 3)
2.172 Raw dimensionality reduction results
2.173 todo
2.174 -_________________________________________
2.175 - 11For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor
2.176 -variable was the value of the expression of the gene underneath that pixel. The resulting scores were used to rank the genes in terms of how well
2.177 -they predict area AUD.
2.178 - 12For each gene the gradient similarity (see section ??) between (a) a map of the expression of each gene on the cortical surface and (b) the
2.179 -shape of area AUD, was calculated, and this was used to rank the genes.
2.180 - 135-fold cross-validation.
2.181 (might want to incld nnMF since mentioned above)
2.182 Dimensionality reduction plus K-means or spectral clustering
2.183 Many areas are captured by clusters of genes
2.184 todo
2.185 todo
2.186 +_________________________________________
2.187 + 135-fold cross-validation.
2.188 Research plan
2.189 todo amongst other things:
2.190 Develop algorithms that find genetic markers for anatomical regions
2.191 @@ -355,10 +381,45 @@
2.192 Chinh Dang, Jason W Bohland, Hemant Bokil, Partha P Mitra, Luis Puelles, John Hohmann, David J Anderson, Ed S
2.193 Lein, Allan R Jones, and Michael Hawrylycz. An anatomic gene expression atlas of the adult mouse brain. Nat Neurosci,
2.194 12(3):356–362, March 2009.
2.195 -[3]Carol L. Thompson, Sayan D. Pathak, Andreas Jeromin, Lydia L. Ng, Cameron R. MacPherson, Marty T. Mortrud,
2.196 +[3]George Paxinos and Keith B.J. Franklin. The Mouse Brain in Stereotaxic Coordinates. Academic Press, 2 edition, July
2.197 +2001.
2.198 +[4]Larry Swanson. Brain Maps: Structure of the Rat Brain. Academic Press, 3 edition, November 2003.
2.199 +[5]Carol L. Thompson, Sayan D. Pathak, Andreas Jeromin, Lydia L. Ng, Cameron R. MacPherson, Marty T. Mortrud,
2.200 Allison Cusick, Zackery L. Riley, Susan M. Sunkin, Amy Bernard, Ralph B. Puchalski, Fred H. Gage, Allan R. Jones,
2.201 Vladimir B. Bajic, Michael J. Hawrylycz, and Ed S. Lein. Genomic anatomy of the hippocampus. Neuron, 60(6):1010–
2.202 1021, December 2008.
2.203 +[6]Robert H Waterston, Kerstin Lindblad-Toh, Ewan Birney, Jane Rogers, Josep F Abril, Pankaj Agarwal, Richa Agarwala,
2.204 +Rachel Ainscough, Marina Alexandersson, Peter An, Stylianos E Antonarakis, John Attwood, Robert Baertsch, Jonathon
2.205 +Bailey, Karen Barlow, Stephan Beck, Eric Berry, Bruce Birren, Toby Bloom, Peer Bork, Marc Botcherby, Nicolas Bray,
2.206 +Michael R Brent, Daniel G Brown, Stephen D Brown, Carol Bult, John Burton, Jonathan Butler, Robert D Campbell,
2.207 +Piero Carninci, Simon Cawley, Francesca Chiaromonte, Asif T Chinwalla, Deanna M Church, Michele Clamp, Christopher
2.208 +Clee, Francis S Collins, Lisa L Cook, Richard R Copley, Alan Coulson, Olivier Couronne, James Cuff, Val Curwen, Tim
2.209 +Cutts, Mark Daly, Robert David, Joy Davies, Kimberly D Delehaunty, Justin Deri, Emmanouil T Dermitzakis, Colin
2.210 +Dewey, Nicholas J Dickens, Mark Diekhans, Sheila Dodge, Inna Dubchak, Diane M Dunn, Sean R Eddy, Laura Elnitski,
2.211 +Richard D Emes, Pallavi Eswara, Eduardo Eyras, Adam Felsenfeld, Ginger A Fewell, Paul Flicek, Karen Foley, Wayne N
2.212 +Frankel, Lucinda A Fulton, Robert S Fulton, Terrence S Furey, Diane Gage, Richard A Gibbs, Gustavo Glusman, Sante
2.213 +Gnerre, Nick Goldman, Leo Goodstadt, Darren Grafham, Tina A Graves, Eric D Green, Simon Gregory, Roderic Guigó,
2.214 +Mark Guyer, Ross C Hardison, David Haussler, Yoshihide Hayashizaki, LaDeana W Hillier, Angela Hinrichs, Wratko
2.215 +Hlavina, Timothy Holzer, Fan Hsu, Axin Hua, Tim Hubbard, Adrienne Hunt, Ian Jackson, David B Jaffe, L Steven
2.216 +Johnson, Matthew Jones, Thomas A Jones, Ann Joy, Michael Kamal, Elinor K Karlsson, Donna Karolchik, Arkadiusz
2.217 +Kasprzyk, Jun Kawai, Evan Keibler, Cristyn Kells, W James Kent, Andrew Kirby, Diana L Kolbe, Ian Korf, Raju S
2.218 +Kucherlapati, Edward J Kulbokas, David Kulp, Tom Landers, J P Leger, Steven Leonard, Ivica Letunic, Rosie Levine, Jia
2.219 +Li, Ming Li, Christine Lloyd, Susan Lucas, Bin Ma, Donna R Maglott, Elaine R Mardis, Lucy Matthews, Evan Mauceli,
2.220 +John H Mayer, Megan McCarthy, W Richard McCombie, Stuart McLaren, Kirsten McLay, John D McPherson, Jim
2.221 +Meldrim, Beverley Meredith, Jill P Mesirov, Webb Miller, Tracie L Miner, Emmanuel Mongin, Kate T Montgomery,
2.222 +Michael Morgan, Richard Mott, James C Mullikin, Donna M Muzny, William E Nash, Joanne O Nelson, Michael N
2.223 +Nhan, Robert Nicol, Zemin Ning, Chad Nusbaum, Michael J O’Connor, Yasushi Okazaki, Karen Oliver, Emma Overton-
2.224 +Larty, Lior Pachter, Genís Parra, Kymberlie H Pepin, Jane Peterson, Pavel Pevzner, Robert Plumb, Craig S Pohl, Alex
2.225 +Poliakov, Tracy C Ponce, Chris P Ponting, Simon Potter, Michael Quail, Alexandre Reymond, Bruce A Roe, Krishna M
2.226 +Roskin, Edward M Rubin, Alistair G Rust, Ralph Santos, Victor Sapojnikov, Brian Schultz, Jörg Schultz, Matthias S
2.227 +Schwartz, Scott Schwartz, Carol Scott, Steven Seaman, Steve Searle, Ted Sharpe, Andrew Sheridan, Ratna Shownkeen,
2.228 +Sarah Sims, Jonathan B Singer, Guy Slater, Arian Smit, Douglas R Smith, Brian Spencer, Arne Stabenau, Nicole Stange-
2.229 +Thomann, Charles Sugnet, Mikita Suyama, Glenn Tesler, Johanna Thompson, David Torrents, Evanne Trevaskis, John
2.230 +Tromp, Catherine Ucla, Abel Ureta-Vidal, Jade P Vinson, Andrew C Von Niederhausern, Claire M Wade, Melanie Wall,
2.231 +Ryan J Weber, Robert B Weiss, Michael C Wendl, Anthony P West, Kris Wetterstrand, Raymond Wheeler, Simon
2.232 +Whelan, Jamey Wierzbowski, David Willey, Sophie Williams, Richard K Wilson, Eitan Winter, Kim C Worley, Dudley
2.233 +Wyman, Shan Yang, Shiaw-Pyng Yang, Evgeny M Zdobnov, Michael C Zody, and Eric S Lander. Initial sequencing and
2.234 +comparative analysis of the mouse genome. Nature, 420(6915):520–62, December 2002. PMID: 12466850.
2.235
2.236 _______________________________________________________________________________________________________
2.237 stuff i dunno where to put yet (there is more scattered through grant-oldtext):
2.238 @@ -376,5 +437,6 @@
2.239 —
2.240 note:
2.241 do we need to cite: no known markers, impressive results?
2.242 + two hemis
2.243
2.244
3.1 Binary file grant.odt has changed
4.1 Binary file grant.pdf has changed
5.1 --- a/grant.txt Mon Apr 13 20:27:32 2009 -0700
5.2 +++ b/grant.txt Mon Apr 13 23:11:04 2009 -0700
5.3 @@ -119,7 +119,15 @@
5.4
5.5 Although it is known that different cortical areas have distinct roles in both normal functioning and in disease processes, there are no known marker genes for many cortical areas. When it is necessary to divide a tissue sample into cortical areas, this is a manual process that requires a skilled human to combine multiple visual cues and interpret them in the context of their approximate location upon the cortical surface.
5.6
5.7 -Even the questions of how many areas should be recognized in cortex, and what their arrangement is, are still not completely settled. A proposed division of the cortex into areas is called a cortical map. In the rodent, the lack of a single agreed-upon map can be seen by contrasting the recent maps given by Swanson\cite{brain_swanson_2003} on the one hand, and Paxinos and Franklin\cite{mouse_paxinos_2001} on the other. While the maps are certainly very similar in their general arrangement, significant differences remain in the details.
5.8 +Even the questions of how many areas should be recognized in cortex, and what their arrangement is, are still not completely settled. A proposed division of the cortex into areas is called a cortical map. In the rodent, the lack of a single agreed-upon map can be seen by contrasting the recent maps given by Swanson\cite{swanson_brain_2003} on the one hand, and Paxinos and Franklin\cite{paxinos_mouse_2001} on the other. While the maps are certainly very similar in their general arrangement, significant differences remain in the details.
5.9 +
5.10 +\vspace{0.3cm}**The Allen Mouse Brain Atlas dataset**
5.11 +
5.12 +The Allen Mouse Brain Atlas (ABA) data was produced by doing in-situ hybridization on slices of male, 56-day-old C57BL/6J mouse brains. Pictures were taken of the processed slice, and these pictures were semi-automatically analyzed in order to create a digital measurement of gene expression levels at each location in each slice. Per slice, cellular spatial resolution is achieved. Using this method, a single physical slice can only be used to measure one single gene; many different mouse brains were needed in order to measure the expression of many genes.
5.13 +
5.14 +Next, an automated nonlinear alignment procedure located the 2D data from the various slices in a single 3D coordinate system. In the final 3D coordinate system, voxels are cubes with 200 microns on a side. There are 67x41x58 \= 159,326 voxels in the 3D coordinate system, of which 51,533 are in the brain\cite{ng_anatomic_2009}.
5.15 +
5.16 +Mus musculus, the common house mouse, is thought to contain about 22,000 protein-coding genes\cite{waterston_initial_2002}. The ABA contains data on about 20,000 genes in sagittal sections, out of which over 4,000 genes are also measured in coronal sections. Our dataset is derived from only the coronal subset of the ABA, because the sagittal data does not cover the entire cortex, and has greater registration error\cite{ng_anatomic_2009}. Genes were selected by the Allen Institute for coronal sectioning based on, "classes of known neuroscientific interest... or through post hoc identification of a marked non-ubiquitous expression pattern"\cite{ng_anatomic_2009}.
5.17
5.18
5.19
5.20 @@ -179,17 +187,24 @@
5.21
5.22
5.23 === Flatmap of cortex ===
5.24 -We created a mask which selects only those voxels within the ABA atlas space which belong to cerebral cortex.
5.25 -
5.26 -todo
5.27 -
5.28 -Using Caret, \cite{van_essen_integrated_2001}
5.29 -
5.30 -We manually entered the boundaries of each cortical area into Caret.
5.31 -
5.32 -Cortical layers are found at different depths in different parts of the cortex. We have manually demarcated the depth of the outer boundary of cortical layer 5 throughout the cortex.
5.33 -
5.34 -In preparation for extracting the layer-specific datasets, we have extended Caret with routines that allow the depth of the ROI for volume-to-surface projection to vary.
5.35 +We downloaded the ABA data and applied a mask to select only those voxels which belong to cerebral cortex. We divided the cortex into hemispheres.
5.36 +
5.37 +Using Caret\cite{van_essen_integrated_2001}, we created a mesh representation of the surface of the selected region. For each gene, for each node of the mesh, we calculated an average of the gene expression of the voxels "underneath" that mesh node. Using Caret, we then flattened the cortex, creating a two-dimensional mesh.
5.38 +
5.39 +We sampled the nodes of the irregular, flat mesh in order to create a regular grid of pixel values. We converted this grid into a MATLAB matrix.
5.40 +
5.41 +We manually traced the boundaries of each cortical area from the ABA coronal reference atlas slides. We then converted these manual traces into Caret-format regional boundary data on the mesh surface. Using Caret, we projected the regions onto the 2-d mesh, and then onto the grid, and then we converted the region data into MATLAB format.
5.42 +
5.43 +At this point, the data is in the form of a number of 2-D matrices, each registered to each other, with the matrix entries representing a grid of points (pixels) over the cortical surface:
5.44 +
5.45 +* A 2-D matrix whose entries represent the regional label associated with each surface pixel
5.46 +* For each gene, a 2-D matrix whose entries represent the average expression level underneath each surface pixel
5.47 +
5.48 +Rather than a single average expression level for each surface pixel, we plan to create a separate matrix for each cortical layer to represent the average expression level within that layer. Cortical layers are found at different depths in different parts of the cortex. In preparation for extracting the layer-specific datasets, we have extended Caret with routines that allow the depth of the ROI for volume-to-surface projection to vary.
5.49 +
5.50 +In the Research Plan, we describe how we will automatically locate the layer depths. For validation, we have manually demarcated the depth of the outer boundary of cortical layer 5 throughout the cortex.
5.51 +
5.52 +
5.53
5.54
5.55
5.56 @@ -363,3 +378,4 @@
5.57
5.58
5.59
5.60 +two hemis
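
Note on the pointwise ranking described in footnote 11 of the grant.html diff: for each gene, a logistic regression is fit with "surface pixel is within area AUD" as the response and the gene's expression under that pixel as the single predictor, and the resulting scores are used to rank the genes. The following is a minimal Python sketch of that idea, not the code used for the grant. The function names (`fit_logistic_1d`, `rank_genes`), the synthetic expression values, the gradient-ascent fitting procedure, and the choice of mean log-likelihood as the score are all illustrative assumptions; the changeset does not say which score was used.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _log_sigmoid(s):
    # Numerically stable log(sigmoid(s)).
    if s >= 0:
        return -math.log1p(math.exp(-s))
    return s - math.log1p(math.exp(s))


def fit_logistic_1d(x, y, steps=500, lr=0.5):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by plain gradient ascent.

    Returns the mean log-likelihood of the fitted model, which this sketch
    uses as the per-gene score (an assumption, see the note above).
    """
    n = len(x)
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for xi, yi in zip(x, y):
            p = _sigmoid(w * xi + b)
            grad_w += (yi - p) * xi
            grad_b += yi - p
        w += lr * grad_w / n
        b += lr * grad_b / n

    total = 0.0
    for xi, yi in zip(x, y):
        z = w * xi + b
        # log p(y_i | x_i) is log(sigmoid(z)) for y=1 and log(sigmoid(-z)) for y=0.
        total += _log_sigmoid(z if yi == 1 else -z)
    return total / n


def rank_genes(expression, in_area):
    """Rank genes by how well each one alone predicts area membership.

    expression: dict mapping gene name -> list of per-pixel expression values
    in_area:    list of 0/1 labels, one per surface pixel
    Returns a list of (gene, score) pairs, best score first.
    """
    scores = {g: fit_logistic_1d(v, in_area) for g, v in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 8 surface pixels, the first 4 inside the target area.
in_aud = [1, 1, 1, 1, 0, 0, 0, 0]
expression = {
    "geneA": [0.90, 0.80, 0.85, 0.95, 0.10, 0.20, 0.15, 0.05],  # high inside
    "geneB": [0.50, 0.10, 0.90, 0.30, 0.50, 0.10, 0.90, 0.30],  # uninformative
}
ranking = rank_genes(expression, in_aud)
for gene, score in ranking:
    print(gene, round(score, 3))
```

In this toy data, geneB has the same values inside and outside the area, so its fitted model cannot improve on a constant 0.5 probability, while geneB's counterpart geneA separates the two groups and scores higher. A pointwise score of this kind says nothing about whether a gene's expression border lines up with the areal border, which is the gap the gradient-similarity ranking in footnote 12 is meant to cover.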