Specific aims

Massive new datasets obtained with techniques such as in situ hybridization (ISH) and BAC-transgenics allow the expression levels of many genes at many locations to be compared. Our goal is to develop automated methods to relate spatial variation in gene expression to anatomy. We want to find marker genes for specific anatomical regions, and also to draw new anatomical maps based on gene expression patterns. We have three specific aims:

(1) develop an algorithm to screen spatial gene expression data for combinations of marker genes which selectively target anatomical regions
(2) develop an algorithm to suggest new ways of carving up a structure into anatomical subregions, based on spatial patterns in gene expression
(3) create a 2-D “flat map” dataset of the mouse cerebral cortex that contains a flattened version of the Allen Mouse Brain Atlas ISH data, as well as the boundaries of cortical anatomical areas. Use this dataset to validate the methods developed in (1) and (2).

In addition to validating the usefulness of the algorithms, the application of these methods to cerebral cortex will produce immediate benefits, because there are currently no known genetic markers for many cortical areas. The results of the project will support the development of new ways to selectively target cortical areas, and it will support the development of a method for identifying the cortical areal boundaries present in small tissue samples.

All algorithms that we develop will be implemented in an open-source software toolkit. The toolkit, as well as the machine-readable datasets developed in aim (3), will be published and freely available for others to use.

Background and significance

Aim 1

Machine learning terminology

The task of looking for marker genes for anatomical subregions means that one is looking for a set of genes such that, if the expression level of those genes is known, then the locations of the subregions can be inferred.

If we define the subregions so that they cover the entire anatomical structure to be divided, then instead of saying that we are using gene expression to find the locations of the subregions, we may say that we are using gene expression to determine to which subregion each voxel within the structure belongs. We call this a classification task, because each voxel is being assigned to a class (namely, its subregion).

Therefore, an understanding of the relationship between the combination of gene expression levels and the locations of the subregions may be expressed as a function. The input to this function is a voxel, along with the gene expression levels within that voxel; the output is the subregional identity of the target voxel, that is, the subregion to which the target voxel belongs. We call this function a classifier. In general, the input to a classifier is called an instance, and the output is called a label.

The object of aim 1 is not to produce a single classifier, but rather to develop an automated method for determining a classifier for any known anatomical structure. Therefore, we seek a procedure by which a gene expression dataset may be analyzed in concert with an anatomical atlas in order to produce a classifier. Such a procedure is a type of machine learning procedure.
The construction of the classifier is called training (also learning), and the initial gene expression dataset used in the construction of the classifier is called training data.

In the machine learning literature, this sort of procedure may be thought of as a supervised learning task, defined as a task in which the goal is to learn a mapping from instances to labels, and the training data consists of a set of instances (voxels) for which the labels (subregions) are known.

Each gene expression level is called a feature, and the selection of which genes to include is called feature selection. Feature selection is one component of the task of learning a classifier. Some methods for learning classifiers start out with a separate feature selection phase, whereas other methods combine feature selection with other aspects of training.

One class of feature selection methods assigns some sort of score to each candidate gene. The top-ranked genes are then chosen. Some scoring measures can assign a score to a set of selected genes, not just to a single gene; in this case, a dynamic procedure may be used in which features are added to and removed from the selected set depending on how much they raise the score. Such procedures are called “stepwise” or “greedy”.

Although the classifier itself may only look at the gene expression data within each voxel before classifying that voxel, the learning algorithm which constructs the classifier may look over the entire dataset. We can categorize score-based feature selection methods depending on how the score is calculated.
Often the score calculation consists of assigning a sub-score to each voxel, and then aggregating these sub-scores into a final score (the aggregation is often a sum or a sum of squares). If only information from nearby voxels is used to calculate a voxel’s sub-score, then we say it is a local scoring method. If only information from the voxel itself is used to calculate a voxel’s sub-score, then we say it is a pointwise scoring method.

Key questions when choosing a learning method are: What are the instances? What are the features? How are the features chosen? Here are four principles that outline our answers to these questions.

Principle 1: Combinatorial gene expression

Above, we defined an “instance” as the combination of a voxel with the “associated gene expression data”. In our case this refers to the expression level of genes within the voxel, but should we include the expression levels of all genes, or only a few of them?

It is too much to hope that every anatomical region of interest will be identified by a single gene. For example, in the cortex, there are some areas which are not clearly delineated by any gene included in the Allen Brain Atlas (ABA) dataset. However, at least some of these areas can be delineated by looking at combinations of genes (an example of an area for which multiple genes are necessary and sufficient is provided in Preliminary Results).

Principle 2: Only look at combinations of small numbers of genes

When the classifier classifies a voxel, it is only allowed to look at the expression of the genes which have been selected as features. The more data that is available to a classifier, the better it can do.
For example, perhaps there are weak correlations over many genes that add up to a strong signal. So, why not include every gene as a feature? The reason is that we wish to employ the classifier in situations in which it is not feasible to gather data about every gene. For example, if we want to use the expression of marker genes as a trigger for some regionally-targeted intervention, then our intervention must contain a molecular mechanism to check the expression level of each marker gene before it triggers. It is currently infeasible to design a molecular trigger that checks the level of more than a handful of genes. Similarly, if the goal is to develop a procedure to do ISH on tissue samples in order to label their anatomy, then it is infeasible to label more than a few genes. Therefore, we must select only a few genes as features.

Principle 3: Use geometry in feature selection

When doing feature selection with score-based methods, the simplest thing to do would be to score the performance of each voxel by itself and then combine these scores; this is pointwise scoring. A more powerful approach is to also use information about the geometric relations between each voxel and its neighbors; this requires non-pointwise, local scoring methods. See Preliminary Results for evidence of the complementary nature of pointwise and local scoring methods.

Principle 4: Work in 2-D whenever possible

There are many anatomical structures which are commonly characterized in terms of a two-dimensional manifold. When it is known that the structure that one is looking for is two-dimensional, the results may be improved by allowing the analysis algorithm to take advantage of this prior knowledge.
In addition, it is easier for humans to visualize and work with 2-D data.

Therefore, when possible, the instances should represent pixels, not voxels.

Aim 3

Background

The cortex is divided into areas and layers. To a first approximation, the parcellation of the cortex into areas can be drawn as a 2-D map on the surface of the cortex. In the third dimension, the boundaries between the areas continue downwards into the cortical depth, perpendicular to the surface. The layer boundaries run parallel to the surface. One can picture an area of the cortex as a slice of many-layered cake.

Although it is known that different cortical areas have distinct roles in both normal functioning and in disease processes, there are no known marker genes for many cortical areas. When it is necessary to divide a tissue sample into cortical areas, this is a manual process that requires a skilled human to combine multiple visual cues and interpret them in the context of their approximate location upon the cortical surface.

Even the questions of how many areas should be recognized in cortex, and what their arrangement is, are still not completely settled. A proposed division of the cortex into areas is called a cortical map. In the rodent, the lack of a single agreed-upon map can be seen by contrasting the recent maps given by Swanson?? on the one hand, and Paxinos and Franklin?? on the other. While the maps are certainly very similar in their general arrangement, significant differences remain in the details.

Significance

The method developed in aim (1) will be applied to each cortical area to find a set of marker genes such that the combinatorial expression pattern of those genes uniquely picks out the target area. Finding marker genes will be useful for drug discovery as well as for experimentation because marker genes can be used to design interventions which selectively target individual cortical areas.

The application of the marker gene finding algorithm to the cortex will also support the development of new neuroanatomical methods. In addition to finding markers for each individual cortical area, we will find a small panel of genes that can find many of the areal boundaries at once. This panel of marker genes will enable the development of an ISH protocol that allows experimenters to more easily identify which anatomical areas are present in small samples of cortex.

The method developed in aim (3) will provide a genoarchitectonic viewpoint that will contribute to the creation of a better map. The development of present-day cortical maps was driven by the application of histological stains. It is conceivable that if a different set of stains had been available which identified a different set of features, then today’s cortical maps would have come out differently. Since the number of classes of stains is small compared to the number of genes, it is likely that there are many repeated, salient spatial patterns in the gene expression which have not yet been captured by any stain. Therefore, current ideas about cortical anatomy need to incorporate what we can learn from looking at the patterns of gene expression.

While we do not here propose to analyze human gene expression data, it is conceivable that the methods we propose to develop could be used to suggest modifications to the human cortical map as well.

Related work

Preliminary work

Justification of principles 1 through 3

Principle 1: Combinatorial gene expression

Here we give an example of a cortical area which is not marked by any single gene, but which can be identified combinatorially. According to logistic regression, gene wwc1 (footnote 1) is the best-fit single gene for predicting whether or not a pixel on the cortical surface belongs to the motor area (area MO). The upper-left picture in Figure 1 shows wwc1’s spatial expression pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene; however, the gene overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding to the overshoot is the medial surface of the cortex. MO is only found on the lateral surface (todo).

Gene mtif2 (footnote 2) is shown in the upper-right of Figure 1. Mtif2 captures MO’s upper-left boundary, but not its lower-right boundary. Mtif2 does not express very much on the medial surface. By adding together the values at each pixel in these two pictures, we get the lower-left of Figure 1. This combination captures area MO much better than any single gene.

Principle 2: Only look at combinations of small numbers of genes

In order to see how well one can do when looking at all genes at once, we ran a support vector machine to classify cortical surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81% (footnote 3).
As noted above, however, a classifier that looks at all the genes at once isn’t practically useful.

The requirement to find combinations of only a small number of genes prevents us from straightforwardly applying many of the simplest techniques from the field of supervised machine learning. In the parlance of machine learning, our task combines feature selection with supervised learning.

__________________________
Footnote 1: “WW, C2 and coiled-coil domain containing 1”; EntrezGene ID 211652
Footnote 2: “mitochondrial translational initiation factor 2”; EntrezGene ID 76784
Footnote 3: Using the Shogun SVM package (todo:cite), with parameters type=GMNPSVM (multi-class b-SVM), kernel = gaussian with sigma = 0.1, c = 10, epsilon = 1e-1, and 5-fold cross-validation. These are the first parameters we tried, so presumably performance would improve with different choices of parameters.

Figure 1: Upper left: wwc1. Upper right: mtif2. Lower left: wwc1 + mtif2 (each pixel’s value on the lower left is the sum of the corresponding pixels in the upper row). Within each picture, the vertical axis roughly corresponds to anterior at the top and posterior at the bottom, and the horizontal axis roughly corresponds to medial at the left and lateral at the right. The red outline is the boundary of region MO. Pixels are colored approximately according to the density of expressing cells underneath each pixel, with red meaning a lot of expression and blue meaning little.

Figure 2: The top row shows the three genes which (individually) best predict area AUD, according to logistic regression.
The bottom row shows the three genes which (individually) best match area AUD, according to gradient similarity. From left to right and top to bottom, the genes are Ssr1, Efcbp1, Aph1a, Ptk7, Aph1a again, and Lepr.

Principle 3: Use geometry

To show that local geometry can provide useful information that cannot be detected via pointwise analyses, consider Figure 2. The top row of Figure 2 displays the 3 genes which most match area AUD, according to a pointwise method (footnote 4). The bottom row displays the 3 genes which most match AUD according to a method which considers local geometry (footnote 5). The pointwise method in the top row identifies genes which express more strongly in AUD than outside of it; its weakness is that this includes many genes which don’t have a salient border matching the areal border. The geometric method identifies genes whose salient expression border seems to partially line up with the border of AUD; its weakness is that this includes genes which don’t express over the entire area. Genes which have high rankings using both pointwise and border criteria, such as Aph1a in the example, may be particularly good markers. None of these genes are, individually, a perfect marker for AUD; we deliberately chose a “difficult” area in order to better contrast pointwise with geometric methods.

__________________________
Footnote 4: For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor variable was the value of the expression of the gene underneath that pixel. The resulting scores were used to rank the genes in terms of how well they predict area AUD.
Footnote 5: For each gene, the gradient similarity (see section ??)
between (a) a map of the expression of each gene on the cortical surface and (b) the shape of area AUD, was calculated, and this was used to rank the genes.

Principle 4: Work in 2-D whenever possible

In anatomy, the manifold of interest is usually either defined by a combination of two relevant anatomical axes (todo), or by the surface of the structure (as is the case with the cortex). In the former case, the manifold of interest is a plane, but in the latter case it is curved. If the manifold is curved, there are various methods for mapping the manifold into a plane.

The method that we will develop will begin by mapping the data into a 2-D plane. Although the manifold that characterizes cortical areas is known to be the cortical surface, it remains to be seen which method of mapping the manifold into a plane is optimal for this application. We will compare mappings which attempt to preserve size (such as the one used by Caret??) with mappings which preserve angle (conformal maps).

Although there is much 2-D organization in anatomy, there are also structures whose shape is fundamentally 3-dimensional. If possible, we would like the method we develop to include a statistical test that warns the user if the assumption of 2-D structure seems to be wrong.

----

Massive new datasets obtained with techniques such as in situ hybridization (ISH) and BAC-transgenics allow the expression levels of many genes at many locations to be compared. This can be used to find marker genes for specific anatomical structures, as well as to draw new anatomical maps. Our goal is to develop automated methods to relate spatial variation in gene expression to anatomy.
We have five specific aims:

(1) develop an algorithm to screen spatial gene expression data for combinations of marker genes which selectively target individual anatomical structures
(2) develop an algorithm to screen spatial gene expression data for combinations of marker genes which can be used to delineate most of the boundaries between a number of anatomical structures at once
(3) develop an algorithm to suggest new ways of dividing a structure up into anatomical subregions, based on spatial patterns in gene expression
(4) create a flat (2-D) map of the mouse cerebral cortex that contains a flattened version of the Allen Mouse Brain Atlas ISH dataset, as well as the boundaries of anatomical areas within the cortex. For each cortical layer, a layer-specific flat dataset will be created. A single combined flat dataset will be created which averages information from all of the layers. These datasets will be made available in both MATLAB and Caret formats.
(5) validate the methods developed in (1), (2) and (3) by applying them to the cerebral cortex datasets created in (4)

All algorithms that we develop will be implemented in an open-source software toolkit. The toolkit, as well as the machine-readable datasets developed in aim (4) and any other intermediate dataset we produce, will be published and freely available for others to use.

In addition to developing generally useful methods, the application of these methods to cerebral cortex will produce immediate benefits that are only one step removed from clinical application, while also supporting the development of new neuroanatomical techniques.
The method developed in aim (1) will be applied to each cortical area to find a set of marker genes. Currently, despite the distinct roles of different cortical areas in both normal functioning and disease processes, there are no known marker genes for many cortical areas. Finding marker genes will be immediately useful for drug discovery as well as for experimentation because once marker genes for an area are known, interventions can be designed which selectively target that area.

The method developed in aim (2) will be used to find a small panel of genes that can find most of the boundaries between areas in the cortex. Today, finding cortical areal boundaries in a tissue sample is a manual process that requires a skilled human to combine multiple visual cues over a large area of the cortical surface. A panel of marker genes will enable the development of an ISH protocol that allows experimenters to more easily identify which anatomical areas are present in small samples of cortex.

For each cortical layer, a layer-specific flat dataset will be created. A single combined flat dataset will be created which averages information from all of the layers. These datasets will be made available in both MATLAB and Caret formats.

----

New techniques allow the expression levels of many genes at many locations to be compared. It is thought that even neighboring anatomical structures have different gene expression profiles. We propose to develop automated methods to relate the spatial variation in gene expression to anatomy.
We will develop two kinds of techniques:

(a) techniques to screen for combinations of marker genes which selectively target anatomical structures
(b) techniques to suggest new ways of dividing a structure up into anatomical subregions, based on the shapes of contours in the gene expression

The first kind of technique will be helpful for finding marker genes associated with known anatomical features. The second kind of technique will be helpful in creating new anatomical maps, maps which reflect differences in gene expression the same way that existing maps reflect differences in histology.

We intend to develop our techniques using the adult mouse cerebral cortex as a testbed. The Allen Brain Atlas has collected a dataset containing the expression level of about 4000 genes* over a set of over 150000 voxels, with a spatial resolution of approximately 200 microns[?].

We expect to discover sets of marker genes that pick out specific cortical areas. This will allow the development of drugs and other interventions that selectively target individual cortical areas. Therefore our research will lead to application in drug discovery, in the development of other targeted clinical interventions, and in the development of new experimental techniques.

The best way to divide up rodent cortex into areas has not been completely determined, as can be seen by the differences in the recent maps given by Swanson on the one hand, and Paxinos and Franklin on the other. It is likely that our study, by showing which areal divisions naturally follow from gene expression data, as opposed to traditional histological data, will contribute to the creation of a better map.
While we do not here propose to analyze human gene expression data, it is conceivable that the methods we propose to develop could be used to suggest modifications to the human cortical map as well.

In the following, we will only be talking about coronal data.

The Allen Brain Atlas provides “Smoothed Energy Volumes”, which are

One type of artifact in the Allen Brain Atlas data is what we call a “slice artifact”. We have noticed two types of slice artifacts in the dataset. The first type, a “missing slice artifact”, occurs when the ISH procedure on a slice did not come out well. In this case, the Allen Brain investigators excluded the slice at issue from the dataset. This means that no gene expression information is available for that gene for the region of space covered by that slice. This results in an expression level of zero being assigned to voxels covered by the slice. This is partially but not completely ameliorated by the smoothing that is applied to create the Smoothed Energy Volumes. The usual end result is that a region of space which is shaped and oriented like a coronal slice is marked as having less gene expression than surrounding regions.

The second type of slice artifact is caused by the fact that all of the slices have a consistent orientation. There may be artifacts (such as how well the ISH worked) which are constant within each slice but which vary between different slices. As a result, ceteris paribus, when one compares the genetic data of a voxel to another voxel within the same coronal plane, one would expect to find more similarity than if one compared a voxel to another voxel displaced along the rostrocaudal axis.
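
As an illustration of the “missing slice artifact” described above, the following is a minimal sketch of how one might flag coronal planes that are markedly dimmer than their neighbors. It is not part of the proposed toolkit. The names `coronal_slice_means` and `flag_suspect_slices`, the `volume[slice][row][col]` layout (first index assumed to run along the rostrocaudal axis), and the `drop_fraction` threshold are all hypothetical choices made for this example.

```python
from statistics import mean


def coronal_slice_means(volume):
    """Mean expression of each coronal plane.

    `volume` is a nested list indexed as volume[slice][row][col]; the first
    index is assumed (for this sketch) to run along the rostrocaudal axis.
    """
    return [mean(v for row in plane for v in row) for plane in volume]


def flag_suspect_slices(volume, drop_fraction=0.5):
    """Return indices of interior planes much dimmer than both neighbors.

    A plane is flagged when its mean falls below `drop_fraction` times the
    smaller of its two neighbors' means. The threshold is an illustrative
    assumption, not a value taken from the text.
    """
    means = coronal_slice_means(volume)
    flagged = []
    for i in range(1, len(means) - 1):
        neighbor_floor = min(means[i - 1], means[i + 1])
        if neighbor_floor > 0 and means[i] < drop_fraction * neighbor_floor:
            flagged.append(i)
    return flagged


# Toy volume: 5 coronal planes of 2x2 voxels. Plane 2 mimics a missing slice
# that smoothing has only partly filled in.
toy_volume = [
    [[4.0, 4.0], [4.0, 4.0]],
    [[4.0, 5.0], [5.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[5.0, 4.0], [4.0, 5.0]],
    [[4.0, 4.0], [4.0, 4.0]],
]
suspect = flag_suspect_slices(toy_volume)
```

A check of this kind only addresses the first type of slice artifact. The second type (greater similarity within a coronal plane than across planes) would need a comparison of within-plane and between-plane voxel similarity instead.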

We are enthusiastic about the sharing of methods, data, and results, and at the conclusion of the project, we will make all of our data and computer source code publicly available. Our goal is that replicating our results, or applying the methods we develop to other targets, will be quick and easy for other investigators. In order to aid in understanding and replicating our results, we intend to include a software program which, when run, will take as input the Allen Brain Atlas raw data, and produce as output all numbers and charts found in publications resulting from the project.

To aid in the replication of our results, we will include a script which takes as input the dataset in aim (3) and provides as output all of the tables and figures in our publications.

We also expect to weigh in on the debate about how to best partition rodent cortex

be useful for drug discovery as well

* Another 16000 genes are available, but they do not cover the entire cerebral cortex with high spatial resolution.

User-definable ROIs
Combinatorial gene expression
Negative as well as positive signal
Use geometry
Search for local boundaries if necessary
Flatmapped

Specific aims

Develop algorithms that find genetic markers for anatomical regions

1. Develop scoring measures for evaluating how good individual genes are at marking areas: we will compare pointwise, geometric, and information-theoretic measures.
2. Develop a procedure to find single marker genes for anatomical regions: for each cortical area, by using or combining the scoring measures developed, we will rank the genes by their ability to delineate each area.
3.
Extend the procedure to handle difficult areas by using combinatorial coding: for areas that cannot be identified by any single gene, identify them with a handful of genes. We will consider both (a) algorithms that incrementally/greedily combine single gene markers into sets, such as forward stepwise regression and decision trees, and also (b) supervised learning techniques which use soft constraints to minimize the number of features, such as sparse support vector machines.
4. Extend the procedure to handle difficult areas by combining or redrawing the boundaries: An area may be difficult to identify because the boundaries are misdrawn, or because it does not “really” exist as a single area, at least on the genetic level. We will develop extensions to our procedure which (a) detect when a difficult area could be fit if its boundary were redrawn slightly, and (b) detect when a difficult area could be combined with adjacent areas to create a larger area which can be fit.

Apply these algorithms to the cortex

1. Create open source format conversion tools: we will create tools to bulk download the ABA dataset and to convert between SEV, NIFTI and MATLAB formats.
2. Flatmap the ABA cortex data: map the ABA data onto a plane and draw the cortical area boundaries onto it.
3. Find layer boundaries: cluster similar voxels together in order to automatically find the cortical layer boundaries.
4. Run the procedures that we developed on the cortex: we will present, for each area, a short list of markers to identify that area; and we will also present lists of “panels” of genes that can be used to delineate many areas at once.
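
Item 3 under “Develop algorithms that find genetic markers for anatomical regions” mentions algorithms that greedily combine single-gene markers into sets. The following is a minimal sketch of that general idea, not the procedure we will ultimately adopt. It assumes a pointwise score (the Pearson correlation between the per-pixel sum of the selected genes and a 0/1 area mask) in place of forward stepwise regression; the function names, the `max_genes` parameter, and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def score_gene_set(genes, expression, mask):
    """Pointwise score: correlate summed expression with the area mask."""
    combined = [sum(expression[g][i] for g in genes) for i in range(len(mask))]
    return pearson(combined, mask)


def forward_select(expression, mask, max_genes=3):
    """Greedily add the gene that most improves the score; stop when none does."""
    selected, best = [], 0.0
    while len(selected) < max_genes:
        candidates = [g for g in expression if g not in selected]
        if not candidates:
            break
        top_score, top_gene = max(
            (score_gene_set(selected + [g], expression, mask), g)
            for g in candidates
        )
        if top_score <= best + 1e-12:  # no meaningful improvement
            break
        selected.append(top_gene)
        best = top_score
    return selected, best


# Toy data: six pixels, target area = pixels 2 and 3. Each of gene_a and
# gene_b overshoots the area on one side; their sum peaks inside the area.
area_mask = [0, 0, 1, 1, 0, 0]
toy_expression = {
    "gene_a": [1, 1, 1, 1, 0, 0],
    "gene_b": [0, 0, 1, 1, 1, 1],
    "gene_c": [1, 0, 0, 1, 0, 1],
}
chosen, chosen_score = forward_select(toy_expression, area_mask)
```

In this toy case neither gene alone matches the area well, but the pair does, and the search stops before adding the third gene because it would lower the score. A geometric or information-theoretic score could be substituted for `score_gene_set` without changing the search loop.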
Develop algorithms to suggest a division of a structure into anatomical parts
1. Explore dimensionality reduction algorithms applied to pixels: including TODO
2. Explore dimensionality reduction algorithms applied to genes: including TODO
3. Explore clustering algorithms applied to pixels: including TODO
4. Explore clustering algorithms applied to genes: including gene shaving, TODO
5. Develop an algorithm to use dimensionality reduction and/or hierarchical clustering to create anatomical maps
6. Run this algorithm on the cortex: present a hierarchical, genoarchitectonic map of the cortex
Gradient similarity is calculated as:
\[ \sum_{\text{pixels}} \cos\left(\left|\angle\nabla_1 - \angle\nabla_2\right|\right) \cdot \frac{|\nabla_1| + |\nabla_2|}{2} \cdot \frac{\text{pixel\_value}_1 + \text{pixel\_value}_2}{2} \]
(todo) Technically, we say that an anatomical structure has a fundamentally 2-D organization when there exists a commonly used, generic, anatomical structure-preserving map from 3-D space to a 2-D manifold.
Related work:
The Allen Brain Institute has developed an interactive web interface called AGEA which allows an investigator to (1) calculate lists of genes which are selectively overexpressed in certain anatomical regions (ABA calls this the “Gene Finder” function), (2) visualize the correlation between the genetic profiles of voxels in the dataset, and (3) visualize a hierarchical clustering of voxels in the dataset [?]. AGEA is an impressive and useful tool; however, it does not solve the same problems that we propose to solve with this project.
First we describe AGEA’s “Gene Finder”, and then compare it to our proposed method for finding marker genes. AGEA’s Gene Finder first asks the investigator to select a single “seed voxel” of interest. It then uses a clustering method, combined with built-in knowledge of major anatomical structures, to select two sets of voxels: an “ROI” and a “comparator region”*. The seed voxel is always contained within the ROI, and the ROI is always contained within the comparator region. The comparator region is similar but not identical to the set of voxels making up the major anatomical region containing the ROI. Gene Finder then looks for genes which can distinguish the ROI from the comparator region. Specifically, it finds genes for which the ratio (expression energy in the ROI) / (expression energy in the comparator region) is high.
Informally, the Gene Finder first infers an ROI based on clustering the seed voxel with other voxels. Then, the Gene Finder finds genes which are overexpressed in the ROI as compared to other voxels in the major anatomical region.
There are three major differences between our approach and Gene Finder.
First, Gene Finder focuses on individual genes and individual ROIs in isolation. This is great for regions which can be picked out from all other regions by a single gene, but not all of them can (todo). There are at least two ways this can miss out on useful genes. First, a gene might be expressed in only part of a region, while another gene is expressed in the rest of the region**. Second, a gene might be expressed in a region and in none of its neighbors, yet also be expressed in other, non-neighboring regions. To take advantage of these types of genes, we propose to find combinations of genes which, together, can identify the boundaries of all subregions within the containing region.
Second, Gene Finder uses a pointwise metric, namely expression energy ratio, to decide whether a gene is good for picking out a region. We have found better results by using metrics which take into account not just single voxels, but also the local geometry of neighboring voxels, such as the local gradient (todo). In addition, we have found that often the absence of gene expression can be used as a marker, which will not be caught by Gene Finder’s expression energy ratio (todo).
Third, Gene Finder chooses the ROI based only on the seed voxel. This often does not permit users to query the ROI that they are interested in. For example, in all of our tests of Gene Finder in cortex, the ROIs chosen tend to be cortical layers, rather than cortical areas.
In summary, when Gene Finder picks the ROI that the user wants, and when this ROI can be easily picked out from neighboring regions by single genes which are selectively overexpressed in the ROI compared to the entire major anatomical region, Gene Finder will work. However, Gene Finder will not pick cortical areas as ROIs, and even if it could, many cortical areas cannot be uniquely picked out by the overexpression of any single gene. By contrast, we will target cortical areas, we will explore a variety of metrics which can compensate for the shortcomings of expression energy ratio, and we will use the combinatorial expression of genes to pick out cortical areas even when no individual gene will do.
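To make the gradient-based metric mentioned above concrete, here is a minimal sketch of the gradient similarity score given earlier in this section, computed between two 2-D images. Several details are assumptions, since the text does not specify them: gradients are estimated with central differences, only interior pixels are summed, and the names `central_gradient` and `gradient_similarity` are hypothetical.

```python
import math


def central_gradient(img, x, y):
    """Central-difference gradient (gx, gy) at an interior pixel.
    The choice of gradient estimator is an assumption of this sketch."""
    gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
    gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy


def gradient_similarity(img1, img2):
    """Sum over interior pixels of
    cos(|angle1 - angle2|) * mean gradient magnitude * mean pixel value."""
    rows, cols = len(img1), len(img1[0])
    total = 0.0
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx1, gy1 = central_gradient(img1, x, y)
            gx2, gy2 = central_gradient(img2, x, y)
            angle1 = math.atan2(gy1, gx1)
            angle2 = math.atan2(gy2, gx2)
            mean_magnitude = (math.hypot(gx1, gy1) + math.hypot(gx2, gy2)) / 2.0
            mean_value = (img1[y][x] + img2[y][x]) / 2.0
            total += math.cos(abs(angle1 - angle2)) * mean_magnitude * mean_value
    return total


# Toy images (made up): a left-to-right ramp and its mirror image.
ramp = [[float(x) for x in range(4)] for _ in range(4)]
mirrored = [[3.0 - x for x in range(4)] for _ in range(4)]
same_direction = gradient_similarity(ramp, ramp)
opposite_direction = gradient_similarity(ramp, mirrored)
```

Two images whose gradients point the same way score positively, while images whose gradients point in opposite directions score negatively, which a purely pointwise comparison of pixel values would not distinguish.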
* The terms “ROI” and “comparator region” are our own; the ABI calls them the “local region” and the “larger anatomical context”. The ABI uses the term “specificity comparator” to mean the major anatomic region containing the ROI, which is not exactly identical to the comparator region.
** In this case, the union of the area of expression of the two genes would suffice; one could also imagine that there could be situations in which the intersection of multiple genes would be needed, or a combination of unions and intersections.
Now we describe AGEA’s hierarchical clustering, and compare it to our proposal. The goal of AGEA’s hierarchical clustering is to generate a binary tree of clusters, where a cluster is a collection of voxels. AGEA begins by computing the Pearson correlation between each pair of voxels. It then employs a recursive divisive (top-down) hierarchical clustering procedure on the voxels, which means that it starts with all of the voxels, divides them into clusters, and then divides each cluster into smaller clusters, and so on***. At each step, the collection of voxels is partitioned into two smaller clusters in a way that maximizes the following quantity: average correlation between all possible pairs of voxels containing one voxel from each cluster.
There are three major differences between our approach and AGEA’s hierarchical clustering. First, AGEA’s clustering method separates cortical layers before it separates cortical areas.
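The quantity named in the description of the clustering step above can be computed as in the sketch below, for one candidate split of the voxels into two clusters. This only evaluates the quantity as worded in the text; it does not reproduce AGEA’s search over partitions or its subsampling, and the names `pearson` and `mean_cross_correlation` and the toy profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length.
    Profiles with zero variance are not handled in this sketch."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mean_cross_correlation(profiles, cluster_a, cluster_b):
    """Average correlation over all voxel pairs that take one voxel
    from each of the two clusters (given as lists of voxel indices)."""
    total = 0.0
    count = 0
    for i in cluster_a:
        for j in cluster_b:
            total += pearson(profiles[i], profiles[j])
            count += 1
    return total / count


# Toy data (made up): three voxels, each with a three-gene profile.
toy_profiles = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
split_score = mean_cross_correlation(toy_profiles, [0, 1], [2])
```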
following procedure is used for the purpose of dividing a collection of voxels into smaller clusters: partition the voxels into two sets, such that the following quantity is maximized:
*** depending on which level of the tree is being created, the voxels are subsampled in order to save time
does not allow the user to input anything other than a seed voxel; this means that for each seed voxel, there is only one
The role of the “local region” is to serve as a region of interest for which marker genes are desired; the role of the “larger anatomical context” is to be the structure
There are two kinds of differences between AGEA and our project: differences that relate to the treatment of the cortex, and differences in the type of generalizable methods being developed. As relates
indicate an ROI
---
The Allen Brain Institute has developed interactive tools called AGEA which allow an investigator to explore simple correlation-based relationships between voxels, genes, and clusters of voxels. There have not yet been any studies which describe the results of applying AGEA to the cerebral cortex; however, we suspect that the AGEA metrics are not optimal for the task of relating genes to cortical areas. A voxel’s gene expression profile depends upon both its cortical area and its cortical layer; however, AGEA has no mechanism to distinguish these two. As a result, voxels in the same layer but different areas are often clustered together by AGEA. As part of the project, we will compare the performance of our techniques against AGEA’s.
Another difference between our techniques and AGEA’s is that AGEA allows the user to enter only a voxel location, and then to either explore the rest of the brain’s relationship to that particular voxel, or explore a partitioning of the brain based on pairwise voxel correlation. If the user is interested not in a single voxel, but rather in an entire anatomical structure, AGEA will only succeed to the extent that the selected voxel is a typical representative of the structure. As discussed in the previous paragraph, this poses problems for structures like cortical areas, which (because of their division into cortical layers) do not have a single “typical representative”.
By contrast, in our system, the user will start by selecting not a single voxel, but rather an anatomical superstructure to be divided into pieces (for example, the cerebral cortex). We expect that our methods will take into account not just pairwise statistics between voxels, but also large-scale geometric features (for example, the rapidity of change in gene expression as regional boundaries are crossed) which optimize the discriminability of regions within the selected superstructure.
-----
screen for combinations of marker genes which selectively target anatomical structures pick delineate the boundaries between neighboring anatomical structures. (b) techniques to screen for marker genes which pick out anatomical structures of interest
, techniques which: (a) screen for marker genes , and (b) suggest new anatomical maps based on
whose expression partitions the region of interest into its anatomical substructures, and (b) use the natural contours of gene expression to suggest new ways of dividing an organ into
The Allen Brain Atlas
--
to: brooksl@mail.nih.gov
Hi, I’m writing to confirm the applicability of a potential research project to the challenge grant topic “New computational and statistical methods for the analysis of large data sets from next-generation sequencing technologies”.
We want to develop methods for the analysis of gene expression datasets that can be used to uncover the relationships between gene expression and anatomical regions. Specifically, we want to develop techniques to (a) given a set of known anatomical areas, identify genetic markers for each of these areas, and (b) given an anatomical structure whose substructure is unknown, suggest a map, that is, a division of the space into anatomical sub-structures, that represents the boundaries inherent in the gene expression data.
We propose to develop our techniques on the Allen Brain Atlas mouse brain gene expression dataset by finding genetic markers for anatomical areas within the cerebral cortex. The Allen Brain Atlas contains a registered 3-D map of gene expression data with 200-micron voxel resolution which was created from in situ hybridization data. The dataset contains about 4000 genes which are available at this resolution across the entire cerebral cortex.
Despite the distinct roles of different cortical areas in both normal functioning and disease processes, there are no known marker genes for many cortical areas. This project will be immediately useful for both drug discovery and clinical research because once the markers are known, interventions can be designed which selectively target specific cortical areas.
The techniques we develop will be useful because they will be applicable to the analysis of other anatomical areas, both in terms of finding marker genes for known areas, and in terms of suggesting new anatomical subdivisions that are based upon the gene expression data.
----
It is likely that our study, by showing which areal divisions naturally follow from gene expression data, as opposed to traditional histological data, will contribute to the creation of
there are clear genetic or chemical markers known for only a few cortical areas. This makes it difficult to target drugs to specific
As part of aims (1) and (5), we will discover sets of marker genes that pick out specific cortical areas. This will allow the development of drugs and other interventions that selectively target individual cortical areas. As part of aims (2) and (5), we will also discover small panels of marker genes that can be used to delineate most of the cortical areal map.
With aims (2) and (4), we
There are five principles
In addition to validating the usefulness of the algorithms, the application of these methods to cerebral cortex will produce immediate benefits that are only one step removed from clinical application.
todo: remember to check gensat, etc for validation (mention bias/variance)
Why it is useful to apply these methods to cortex
There is still room for debate as to exactly how the cortex should be parcellated into areas.
The best way to divide up rodent cortex into areas has not been completely determined,
not yet been accounted for in
that the expression of some genes will contain novel spatial patterns which are not account
that a genoarchitectonic map
This principle is only applicable to aim 1 (marker genes). For aim 2 (partition a structure into anatomical subregions), we plan to work with many genes at once.
todo: aim 2 b+s?
Principle 5: Interoperate with existing tools
In order for our software to be as useful as possible for our users, it will be able to import and export data to standard formats so that users can use our software in tandem with other software tools created by other teams. We will support the following formats: NIFTI (Neuroimaging Informatics Technology Initiative), SEV (Allen Brain Institute Smoothed Energy Volume), and MATLAB. This ensures that our users will not have to rely exclusively on our tools when analyzing data. For example, users will be able to use the data visualization and analysis capabilities of MATLAB and Caret alongside our software. To our knowledge, there is no currently available software to convert between these formats, so we will also provide a format conversion tool. This may be useful even for groups that don’t use any of our other software.
todo: is “marker gene” even a phrase that we should use at all?
note for aim 1 apps: combo of genes is for voxel, not within any single cell
, as when genetic markers allow the development of selective interventions; the reason that one can be confident that the intervention is selective is that it is only turned on when a certain combination of genes is turned on and off. The result procedure is what assures us that when that combination is present, the local tissue is probably part of a certain subregion.
The basic idea is that we want to find a procedure by
The task of finding genes that mark anatomical areas can be phrased in terms of what the field of machine learning calls a “supervised learning” task. The goal of this task is to learn a function (the “classifier”) which
If a person knows a combination of genes that mark an area, that implies that the person can be told how strongly those genes are expressed in any voxel, and the person can use this information to determine how
finding how to infer the areal identity of a voxel if given the gene expression profile of that voxel.
For each voxel in the cortex, we want to start with data about the gene expression
There are various ways to look for marker genes. We will define some terms, and along the way we will describe a few design choices encountered in the process of creating a marker gene finding method, and then we will present four principles that describe which options we have chosen.
In developing a procedure for finding marker genes, we are developing a procedure that takes a dataset of experimental observations and produces a result. One can think of the result as merely a list of genes, but really the result is an understanding of a predictive relationship between, on the one hand, the expression levels of genes, and, on the other hand, anatomical subregions.
One way to more formally define this understanding is to look at it as a procedure. In this view, the result of the learning procedure is itself a procedure. The result procedure provides a way to use the gene expression profiles of voxels in a tissue sample in order to determine where the subregions are.
This result procedure can be used directly, as when an experimenter has a tissue sample and needs to know what subregions are present in it, and, if multiple subregions are present, where each of them is. Or it can be used indirectly; imagine that the result procedure tells us that whenever a certain combination of genes is expressed, the local tissue is probably part of a certain subregion. This means that we can then confidently develop an intervention which is triggered only when that combination of genes is expressed; and to the extent that the result procedure is reliable, we know that the intervention will only be triggered in the target subregion.
We said that the result procedure provides “a way to use the gene expression profiles of voxels in a tissue sample” in order to “determine where the subregions are”.
Does the result procedure get as input all of the gene expression profiles of each voxel in the entire tissue sample, and produce as output all of the subregional boundaries all at once?
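The idea that a learning procedure returns another procedure can be sketched in code. The example below takes the one-voxel-at-a-time reading, in line with the classifier definition in the Background section: a learning step consumes labeled voxels and yields a rule that maps any single voxel’s expression profile to a subregion. The nearest-centroid rule, the function names `learn_centroids` and `classify_voxel`, the area labels, and the toy data are all illustrative assumptions, not the method this project proposes.

```python
def learn_centroids(profiles, labels):
    """Learning procedure: average the expression profiles of the voxels
    in each subregion. The returned centroids define the result procedure."""
    sums = {}
    counts = {}
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
            counts[label] = 0
        for k, value in enumerate(profile):
            sums[label][k] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify_voxel(profile, centroids):
    """Result procedure (classifier): assign one voxel to the subregion
    whose centroid is closest in squared Euclidean distance."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(sorted(centroids),
               key=lambda label: squared_distance(profile, centroids[label]))


# Toy data (made up): two genes measured in four labeled voxels.
training_profiles = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.8]]
training_labels = ["area_A", "area_A", "area_B", "area_B"]
centroids = learn_centroids(training_profiles, training_labels)
predicted = classify_voxel([0.8, 0.2], centroids)
```

A classifier of this per-voxel kind ignores neighboring voxels entirely, so it is one end of the design space raised by the question above.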
it is helpful for the classifier to look at the global “shape” of gene expression patterns over the whole structure, rather than just nearby voxels.
there is some small bit of additional information that can be gleaned from knowing the
Design choices for a supervised learning procedure
After all,
there is a small correlation between the gene expression levels from distant voxels and
Depending on how we intend to use the classifier, we may want to design it so that
It is possible for many things to
The choice of which data is made part of an instance
what we seek is a procedure
partition the tissue sample into subregions.
each part of the anatomical structure
must be One way to rephrase this task is to say that, instead of searching for the location of the subregions, we are looking to partition the tissue sample into subregions.
Or are we given one voxel at a time,
In the jargon of the field of machine learning, the result procedure is called a classifier.
single voxels, but rather groups of voxels, such that the groups can be placed in some 2-D space. We will call such instances “pixels”.
We have been speaking as if instances necessarily correspond to single voxels. But it is possible for instances to be groupings of many voxels, in which case each grouping must be assigned the same label (that is, each voxel grouping must stay inside a single anatomical subregion).
In some but not all cases, the groups are either rows or columns of voxels. This is the case with the cerebral cortex, in which one may assume that columns of voxels which run perpendicular to the cortical surface all share the same areal identity. In the cortex, we call such an instance a “surface pixel”, because such an instance represents the data associated with all voxels underneath a specific patch of the cortical surface.
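The grouping of voxel columns into surface pixels can be sketched as follows for a single gene. This sketch assumes that the columns have already been extracted, so that `volume[i][j]` lists the expression values along cortical depth beneath surface patch (i, j). Summarizing each column by its mean is an assumption made here for brevity, and the names `to_surface_pixels` and `surface_pixel_label` are hypothetical.

```python
def to_surface_pixels(volume):
    """Collapse each depth column of voxels into one surface pixel.
    Here a column is summarized by its mean expression (assumed choice)."""
    return [[sum(column) / len(column) for column in row] for row in volume]


def surface_pixel_label(labels_along_depth):
    """Return the single areal label shared by a column of voxels, and
    raise an error if the column crosses an areal boundary."""
    unique = set(labels_along_depth)
    if len(unique) != 1:
        raise ValueError("column spans more than one area: %r" % sorted(unique))
    return unique.pop()


# Toy data (made up): a 2 x 2 patch of cortex, three voxels deep.
toy_volume = [
    [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]],
    [[0.0, 0.0, 6.0], [2.0, 4.0, 6.0]],
]
flat = to_surface_pixels(toy_volume)
label = surface_pixel_label(["area_A", "area_A", "area_A"])
```

The label check enforces the constraint stated above, that every voxel in a grouping must lie inside a single anatomical subregion.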