changeset 17:ff9b47f2c7d3
field | value |
---|---|
author | bshanks@bshanks.dyndns.org |
date | Sun Apr 12 04:01:58 2009 -0700 |
parents | 796116742ec5 |
children | 5d6dfc57654a |
files | grant.doc grant.html grant.odt grant.pdf grant.txt |
1.1 Binary file grant.doc has changed
2.1 --- a/grant.html Sun Apr 12 03:39:30 2009 -0700
2.2 +++ b/grant.html Sun Apr 12 04:01:58 2009 -0700
2.3 @@ -5,14 +5,14 @@
2.4 spatial variation in gene expression to anatomy. We want to find marker genes
2.5 for specific anatomical regions, and also to draw new anatomical maps based on
2.6 gene expression patterns. We have three specific aims:
2.7 - (1) develop an algorithm to screen spatial gene expression data for combina-
2.8 - tions of marker genes which selectively target anatomical regions
2.9 - (2) develop an algorithm to suggest new ways of carving up a structure into
2.10 - anatomical subregions, based on spatial patterns in gene expression
2.11 - (3) create a 2-D “flat map” dataset of the mouse cerebral cortex that contains
2.12 - a flattened version of the Allen Mouse Brain Atlas ISH data, as well as
2.13 - the boundaries of cortical anatomical areas. Use this dataset to validate
2.14 - the methods developed in (1) and (2).
2.15 + (1) develop an algorithm to screen spatial gene expression data for combi-
2.16 + nations of marker genes which selectively target anatomical regions
2.17 + (2) develop an algorithm to suggest new ways of carving up a structure into
2.18 + anatomical subregions, based on spatial patterns in gene expression
2.19 + (3) create a 2-D “flat map” dataset of the mouse cerebral cortex that con-
2.20 + tains a flattened version of the Allen Mouse Brain Atlas ISH data, as well as
2.21 + the boundaries of cortical anatomical areas. Use this dataset to validate the
2.22 + methods developed in (1) and (2).
2.23 In addition to validating the usefulness of the algorithms, the application of
2.24 these methods to cerebral cortex will produce immediate benefits, because there
2.25 are currently no known genetic markers for many cortical areas. The results
2.26 @@ -35,10 +35,10 @@
2.27 this a classification task, because each voxel is being assigned to a class (namely,
2.28 its subregion).
2.29 Therefore, an understanding of the relationship between the combination of
2.30 + 1
2.31 +
2.32 their expression levels and the locations of the subregions may be expressed as
2.33 a function. The input to this function is a voxel, along with the gene expression
2.34 - 1
2.35 -
2.36 levels within that voxel; the output is the subregional identity of the target
2.37 voxel, that is, the subregion to which the target voxel belongs. We call this
2.38 function a classifier. In general, the input to a classifier is called an instance,
2.39 @@ -83,10 +83,10 @@
2.40 Above, we defined an “instance” as the combination of a voxel with the
2.41 “associated gene expression data”. In our case this refers to the expression level
2.42 of genes within the voxel, but should we include the expression levels of all
2.43 + 2
2.44 +
2.45 genes, or only a few of them?
2.46 It is too much to hope that every anatomical region of interest will be iden-
2.47 - 2
2.48 -
2.49 tified by a single gene. For example, in the cortex, there are some areas which
2.50 are not clearly delineated by any gene included in the Allen Brain Atlas (ABA)
2.51 dataset. However, at least some of these areas can be delineated by looking
2.52 @@ -128,11 +128,11 @@
2.53 of machine learning. One thing that you can do with such a dataset is to group
2.54 instances together. A set of similar instances is called a cluster, and the activity
2.55 of grouping the data into clusters is called clustering or cluster analysis.
2.56 + 3
2.57 +
2.58 The task of deciding how to carve up a structure into anatomical subregions
2.59 can be put into these terms. The instances are once again voxels (or pixels)
2.60 along with their associated gene expression profiles. We make the assumption
2.61 - 3
2.62 -
2.63 that voxels from the same subregion have similar gene expression profiles, at
2.64 least compared to the other subregions. This means that clustering voxels is
2.65 the same as finding potential subregions; we seek a partitioning of the voxels
2.66 @@ -176,11 +176,11 @@
2.67 reduction. The small set of features that such a technique yields is called the
2.68 reduced feature set. After the reduced feature set is created, the instances may
2.69 be replaced by reduced instances, which have as their features the reduced fea-
2.70 + 4
2.71 +
2.72 ture set rather than the original feature set of all gene expression levels. Note
2.73 that the features in the reduced feature set do not necessarily correspond to
2.74 genes; each feature in the reduced set may be any function of the set of gene
2.75 - 4
2.76 -
2.77 expression levels.
2.78 Another use for dimensionality reduction is to visualize the relationships
2.79 between subregions. For example, one might want to make a 2-D plot upon
2.80 @@ -217,9 +217,7 @@
2.81 parcellation of the cortex into areas can be drawn as a 2-D map on the surface of
2.82 the cortex. In the third dimension, the boundaries between the areas continue
2.83 downwards into the cortical depth, perpendicular to the surface. The layer
2.84 - boundaries run parallel to the surface. One can picture an area of the cortex as
2.85 - a slice of many-layered cake.
2.86 -___
2.87 +__________________________
2.88 1This would seem to contradict our finding in aim 1 that some cortical areas are combina-
2.89 torially coded by multiple genes. However, it is possible that the currently accepted cortical
2.90 maps divide the cortex into subregions which are unnatural from the point of view of gene
2.91 @@ -227,6 +225,8 @@
2.92 be identified by single genes.
2.93 5
2.94
2.95 + boundaries run parallel to the surface. One can picture an area of the cortex as
2.96 + a slice of many-layered cake.
2.97 Although it is known that different cortical areas have distinct roles in both
2.98 normal functioning and in disease processes, there are no known marker genes
2.99 for many cortical areas. When it is necessary to divide a tissue sample into
2.100 @@ -292,6 +292,9 @@
2.101 very much on the medial surface. By adding together the values at each pixel
2.102 in these two figures, we get the lower-left of Figure . This combination captures
2.103 area MO much better than any single gene.
2.104 + Correlation todo
2.105 + Conditional entropy todo
2.106 + Gradient similarity todo
2.107 Geometric and pointwise scoring methods provide complementary
2.108 information
2.109 To show that local geometry can provide useful information that cannot be
2.110 @@ -302,9 +305,6 @@
2.111 genes which express more strongly in AUD than outside of it; its weakness is that
2.112 this includes many areas which don’t have a salient border matching the areal
2.113 border. The geometric method identifies genes whose salient expression border
2.114 - seems to partially line up with the border of AUD; its weakness is that this
2.115 - includes genes which don’t express over the entire area. Genes which have high
2.116 - rankings using both pointwise and border criteria, such as Aph1a in the example,
2.117 __________________________
2.118 2“WW, C2 and coiled-coil domain containing 1”; EntrezGene ID 211652
2.119 3“mitochondrial translational initiation factor 2”; EntrezGene ID 76784
2.120 @@ -336,13 +336,17 @@
2.121 genes which (individually) best match area AUD, according to gradient similar-
2.122 ity. From left to right and top to bottom, the genes are Ssr1, Efcbp1, Aph1a,
2.123 Ptk7, Aph1a again, and Lepr
2.124 + seems to partially line up with the border of AUD; its weakness is that this
2.125 + includes genes which don’t express over the entire area. Genes which have high
2.126 + rankings using both pointwise and border criteria, such as Aph1a in the example,
2.127 may be particularly good markers. None of these genes are, individually, a
2.128 perfect marker for AUD; we deliberately chose a “difficult” area in order to
2.129 better contrast pointwise with geometric methods.
2.130 Areas which can be identified by single genes
2.131 todo
2.132 Aim 1 (and Aim 3)
2.133 - SVM on all genes at once
2.134 + Forward stepwise logistic regression todo
2.135 + SVM on all genes at once
2.136 In order to see how well one can do when looking at all genes at once, we
2.137 ran a support vector machine to classify cortical surface pixels based on their
2.138 gene expression profiles. We achieved classification accuracy of about 81%6.
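Footnote 6 in this hunk describes the evaluation as 5-fold cross-validation of a Gaussian-kernel multiclass SVM run in Shogun. This Python sketch shows only the cross-validation loop and a Gaussian kernel, under stated assumptions: the classifier is a simple kernel-mean (Parzen-window) rule used as a stand-in and is not an SVM; the kernel is written as exp(-||x - y||^2 / (2 sigma^2)), although Shogun and other packages may parameterize the width differently; and the function names and toy data are hypothetical.

```python
import math
import random
from collections import defaultdict


def gaussian_kernel(x, y, sigma):
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).

    Assumption: packages parameterize the width differently, so this sigma
    is not guaranteed to mean the same thing as Shogun's "sigma = 0.1".
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_mean_predict(train_X, train_y, x, sigma):
    """Stand-in classifier (NOT an SVM): choose the class whose training
    pixels have the largest mean kernel similarity to x."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for xi, yi in zip(train_X, train_y):
        totals[yi] += gaussian_kernel(x, xi, sigma)
        counts[yi] += 1
    return max(totals, key=lambda c: totals[c] / counts[c])


def k_fold_accuracy(X, y, k=5, sigma=0.1, seed=0):
    """Pooled k-fold accuracy: every pixel is held out exactly once and is
    classified using only the pixels in the other k - 1 folds."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            if kernel_mean_predict(train_X, train_y, X[i], sigma) == y[i]:
                correct += 1
    return correct / len(X)


# Hypothetical toy data: two-gene "expression profiles" for pixels in two areas.
X = [(0.02 * i, 0.01 * i) for i in range(10)] + [(1.0 + 0.02 * i, 1.0 - 0.01 * i) for i in range(10)]
y = ["A"] * 10 + ["B"] * 10
print(k_fold_accuracy(X, y, k=5, sigma=0.1))  # prints 1.0 on this well-separated toy data
```

Replacing `kernel_mean_predict` with an actual SVM would leave the fold logic unchanged; the fixed `seed` keeps the fold assignment reproducible.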
2.139 @@ -350,17 +354,17 @@
2.140 practically useful.
2.141 The requirement to find combinations of only a small number of genes limits
2.142 us from straightforwardly applying many of the most simple techniques from
2.143 - the field of supervised machine learning. In the parlance of machine learning,
2.144 - our task combines feature selection with supervised learning.
2.145 - Decision trees
2.146 - todo
2.147 -____________________
2.148 +__________________________
2.149 6Using the Shogun SVM package (todo:cite), with parameters type=GMNPSVM (multi-
2.150 class b-SVM), kernel = gaussian with sigma = 0.1, c = 10, epsilon = 1e-1 – these are the
2.151 first parameters we tried, so presumably performance would improve with different choices of
2.152 parameters. 5-fold cross-validation.
2.153 9
2.154
2.155 + the field of supervised machine learning. In the parlance of machine learning,
2.156 + our task combines feature selection with supervised learning.
2.157 + Decision trees
2.158 + todo
2.159 Aim 2 (and Aim 3)
2.160 Raw dimensionality reduction results
2.161 Dimensionality reduction plus K-means or spectral clus-
2.162 @@ -393,12 +397,12 @@
2.163 which (a) detect when a difficult area could be fit if its boundary were
2.164 redrawn slightly, and (b) detect when a difficult area could be combined
2.165 with adjacent areas to create a larger area which can be fit.
2.166 + 10
2.167 +
2.168 Apply these algorithms to the cortex
2.169 1. Create open source format conversion tools: we will create tools to bulk
2.170 download the ABA dataset and to convert between SEV, NIFTI and MAT-
2.171 LAB formats.
2.172 - 10
2.173 -
2.174 2. Flatmap the ABA cortex data: map the ABA data onto a plane and draw
2.175 the cortical area boundaries onto it.
2.176 3. Find layer boundaries: cluster similar voxels together in order to auto-
2.177 @@ -432,13 +436,16 @@
2.178 The method that we will develop will begin by mapping the data into a
2.179 2-D plane. Although the manifold that characterizes cortical areas is known
2.180 to be the cortical surface, it remains to be seen which method of mapping the
2.181 -manifold into a plane is optimal for this application. We will compare mappings
2.182 -which attempt to preserve size (such as the one used by Caret??) with mappings
2.183 -which preserve angle (conformal maps).
2.184 - Although there is much 2-D organization in anatomy, there are also struc-
2.185 -tures whose shape is fundamentally 3-dimensional. If possible, we would like
2.186 -the method we develop to include a statistical test that warns the user if the
2.187 -assumption of 2-D structure seems to be wrong.
2.188 11
2.189
2.190 -
2.191 + manifold into a plane is optimal for this application. We will compare mappings
2.192 + which attempt to preserve size (such as the one used by Caret??) with mappings
2.193 + which preserve angle (conformal maps).
2.194 + Although there is much 2-D organization in anatomy, there are also struc-
2.195 + tures whose shape is fundamentally 3-dimensional. If possible, we would like
2.196 + the method we develop to include a statistical test that warns the user if the
2.197 + assumption of 2-D structure seems to be wrong.
2.198 + todo: replace aim # bullet pts with #s
2.199 + 12
2.200 +
2.201 +
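The grant.html hunks above describe forming a combined map by adding the wwc1 and mtif2 values at each pixel, and this changeset adds a "Correlation todo" heading among the scoring methods. A minimal Python sketch, assuming each flat-map image is a list of rows of numbers and the target area (for example MO) is a 0/1 mask of the same shape: it sums two maps pixel by pixel and scores a map by its Pearson correlation with the mask. This correlation score is an illustrative assumption, since the grant's own correlation method is still marked todo, and the helper names and 2x3 toy images are hypothetical.

```python
import math


def pixelwise_sum(a, b):
    """Add two equally sized flat-map images pixel by pixel (as in the wwc1 + mtif2 panel)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def pearson_with_mask(image, mask):
    """Pearson correlation between an image's pixel values and a 0/1 area mask.

    Hypothetical pointwise score; the grant's own correlation method is still a todo.
    """
    xs = [float(v) for row in image for v in row]
    ms = [float(v) for row in mask for v in row]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_m = sum(ms) / n
    cov = sum((x - mean_x) * (m - mean_m) for x, m in zip(xs, ms))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_m = sum((m - mean_m) ** 2 for m in ms)
    if var_x == 0 or var_m == 0:
        return 0.0  # correlation is undefined for a constant image or mask; treat as uninformative
    return cov / math.sqrt(var_x * var_m)


# Hypothetical 2x3 toy images: each gene covers only part of the area; their sum covers all of it.
area_mask = [[1, 1, 0], [1, 1, 0]]
gene_a = [[1, 0, 0], [1, 0, 0]]
gene_b = [[0, 1, 0], [0, 1, 0]]
print(pearson_with_mask(gene_a, area_mask))                         # about 0.5
print(pearson_with_mask(pixelwise_sum(gene_a, gene_b), area_mask))  # about 1.0
```

On these toy images each gene alone scores about 0.5 while their sum matches the mask exactly, which mirrors, on made-up data, the point that a combination can capture an area better than any single gene.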
3.1 Binary file grant.odt has changed
4.1 Binary file grant.pdf has changed
5.1 --- a/grant.txt Sun Apr 12 03:39:30 2009 -0700
5.2 +++ b/grant.txt Sun Apr 12 04:01:58 2009 -0700
5.3 @@ -1,10 +1,12 @@
5.4 == Specific aims ==
5.5
5.6 -Massive new datasets obtained with techniques such as in situ hybridization (ISH) and BAC-transgenics allow the expression levels of many genes at many locations to be compared. Our goal is to develop automated methods to relate spatial variation in gene expression to anatomy. We want to find marker genes for specific anatomical regions, and also to draw new anatomical maps based on gene expression patterns. We have three specific aims:
5.7 -
5.8 -(1) develop an algorithm to screen spatial gene expression data for combinations of marker genes which selectively target anatomical regions
5.9 -(2) develop an algorithm to suggest new ways of carving up a structure into anatomical subregions, based on spatial patterns in gene expression
5.10 -(3) create a 2-D "flat map" dataset of the mouse cerebral cortex that contains a flattened version of the Allen Mouse Brain Atlas ISH data, as well as the boundaries of cortical anatomical areas. Use this dataset to validate the methods developed in (1) and (2).
5.11 +Massive new datasets obtained with techniques such as in situ hybridization (ISH) and BAC-transgenics allow the expression levels of many genes at many locations to be compared. Our goal is to develop automated methods to relate spatial variation in gene expression to anatomy. We want to find marker genes for specific anatomical regions, and also to draw new anatomical maps based on gene expression patterns. We have three specific aims:\\
5.12 +
5.13 +(1) develop an algorithm to screen spatial gene expression data for combinations of marker genes which selectively target anatomical regions\\
5.14 +
5.15 +(2) develop an algorithm to suggest new ways of carving up a structure into anatomical subregions, based on spatial patterns in gene expression\\
5.16 +
5.17 +(3) create a 2-D "flat map" dataset of the mouse cerebral cortex that contains a flattened version of the Allen Mouse Brain Atlas ISH data, as well as the boundaries of cortical anatomical areas. Use this dataset to validate the methods developed in (1) and (2).\\
5.18
5.19 In addition to validating the usefulness of the algorithms, the application of these methods to cerebral cortex will produce immediate benefits, because there are currently no known genetic markers for many cortical areas. The results of the project will support the development of new ways to selectively target cortical areas, and it will support the development of a method for identifying the cortical areal boundaries present in small tissue samples.
5.20
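Aim (2) is cast in the grant.html text as clustering: voxels are instances with gene expression profiles, and a partition of the voxels into clusters is a candidate set of subregions. K-means is one of the clustering methods the grant names. This Python sketch is a plain Lloyd's K-means under stated assumptions: the `kmeans` function, the seeding with the first k voxels, and the toy voxel data are hypothetical choices made to keep the example short and deterministic.

```python
def kmeans(voxels, k, max_iter=100):
    """Lloyd's K-means on voxel expression profiles.

    Returns (labels, centers). Assumption: the first k voxels seed the centers,
    which keeps the sketch deterministic but makes it sensitive to input order.
    """
    centers = [list(v) for v in voxels[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each voxel joins its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            for v in voxels
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean profile of its voxels.
        for c in range(k):
            members = [v for v, lab in zip(voxels, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical voxels with two-gene expression profiles from two putative subregions.
voxels = [(0.0, 0.1), (5.0, 5.1), (0.1, 0.0), (5.1, 5.0), (0.05, 0.05), (5.05, 5.05)]
labels, centers = kmeans(voxels, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Seeding with the first k voxels makes the result depend on input order; a practical version would use k-means++ seeding or several random restarts, and might run on dimensionality-reduced profiles, as the grant's "Dimensionality reduction plus K-means" heading suggests.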
5.21 @@ -164,7 +166,14 @@
5.22 \caption{Upper left: $wwc1$. Upper right: $mtif2$. Lower left: wwc1 + mtif2 (each pixel's value on the lower left is the sum of the corresponding pixels in the upper row). Within each picture, the vertical axis roughly corresponds to anterior at the top and posterior at the bottom, and the horizontal axis roughly corresponds to medial at the left and lateral at the right. The red outline is the boundary of region MO. Pixels are colored approximately according to the density of expressing cells underneath each pixel, with red meaning a lot of expression and blue meaning little.}
5.23 \end{figure}
5.24
5.25 -
5.26 +**Correlation**
5.27 +todo
5.28 +
5.29 +**Conditional entropy**
5.30 +todo
5.31 +
5.32 +**Gradient similarity**
5.33 +todo
5.34
5.35 **Geometric and pointwise scoring methods provide complementary information**
5.36
5.37 @@ -191,7 +200,8 @@
5.38
5.39
5.40 === Aim 1 (and Aim 3) ===
5.41 -
5.42 +**Forward stepwise logistic regression**
5.43 +todo
5.44
5.45
5.46 **SVM on all genes at once**
5.47 @@ -283,3 +293,6 @@
5.48
5.49 Although there is much 2-D organization in anatomy, there are also structures whose shape is fundamentally 3-dimensional. If possible, we would like the method we develop to include a statistical test that warns the user if the assumption of 2-D structure seems to be wrong.
5.50
5.51 +
5.52 +
5.53 +todo: replace aim # bullet pts with #s
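This changeset adds a "Forward stepwise logistic regression" heading under Aim 1, with its content still marked todo. One common form of that technique starts with no genes and repeatedly adds the single gene whose inclusion most improves a logistic regression separating pixels inside a target area from pixels outside it. This Python sketch is a hedged illustration under assumptions: the gradient-ascent fitter, the use of training log-likelihood as the selection score, the fixed gene budget `max_genes`, and the three-gene toy data are all hypothetical choices, not the grant's method.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient ascent on the mean log-likelihood."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


def log_likelihood(X, y, w, b):
    """Training log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


def forward_stepwise(X, y, max_genes):
    """Greedily add the gene (column of X) whose inclusion gives the best logistic fit."""
    selected = []
    for _ in range(max_genes):
        best_gene, best_ll = None, -math.inf
        for g in range(len(X[0])):
            if g in selected:
                continue
            cols = selected + [g]
            Xs = [[row[c] for c in cols] for row in X]
            w, b = fit_logistic(Xs, y)
            ll = log_likelihood(Xs, y, w, b)
            if ll > best_ll:
                best_gene, best_ll = g, ll
        if best_gene is None:
            break  # no genes left to add
        selected.append(best_gene)
    return selected


# Hypothetical toy data: rows are pixels, columns are three genes; y = 1 inside the area.
X = [[0.3, 1, 1], [0.7, 1, 0], [0.5, 1, 1], [0.6, 0, 0], [0.4, 0, 1], [0.5, 0, 0]]
y = [1, 1, 1, 0, 0, 0]
print(forward_stepwise(X, y, max_genes=2))
```

Because training log-likelihood essentially never decreases when a gene is added, a real screen would need a stopping rule, for example scoring candidates on held-out pixels or with a penalty such as AIC or BIC.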