cg
changeset 48:a872ffae2d48
.
author | bshanks@bshanks.dyndns.org |
---|---|
date | Wed Apr 15 15:12:52 2009 -0700 |
parents | 33c10c13f9a3 |
children | 3de5b85c50f1 |
files | grant.doc grant.html grant.odt grant.pdf |
line diff
1.1 Binary file grant.doc has changed
2.1 --- a/grant.html Wed Apr 15 15:11:48 2009 -0700
2.2 +++ b/grant.html Wed Apr 15 15:12:52 2009 -0700
2.3 @@ -121,8 +121,8 @@
2.4 If one is given a dataset consisting merely of instances, with no class labels, then analysis of the dataset is referred to as
2.5 unsupervised learning in the jargon of machine learning. One thing that you can do with such a dataset is to group instances
2.6 _________________________________________
2.7 - 2By “fundamentally spatial” we mean that there is information from a large number of spatial locations; not just data which has only a few
2.8 -different locations.
2.9 + 2By “fundamentally spatial” we mean that there is information from a large number of spatial locations indexed by spatial coordinates; not
2.10 +just data which has only a few different locations or which is indexed by anatomical label.
2.11 3See Preliminary Data for an example of an area which cannot be marked by any single gene in the dataset, but which can be marked by a
2.12 combination.
2.13 4“Expression energy ratio”, which captures overexpression.
2.14 @@ -243,9 +243,10 @@
2.15 also has greater registration error[6]. Genes were selected by the Allen Institute for coronal sectioning based on, “classes of
2.16 known neuroscientific interest... or through post hoc identification of a marked non-ubiquitous expression pattern”[6].
2.17 TheABA is not the only large public spatial gene expression dataset. Other such resources include GENSAT[3],
2.18 -GenePaint[12], its sister project GeneAtlas[1], BGEM[5], EMAGE[11], EurExpress7, todo. With the exception of the ABA,
2.19 -GenePaint, and EMAGE, most of these resources, have not (yet) extracted the expression intensity from the ISH images
2.20 -and registered the results into a single 3-D space, and only ABA and EMAGE make this form of data available for public
2.21 +GenePaint[12], its sister project GeneAtlas[1], BGEM[5], EMAGE[11], EurExpress7, EADHB8, MAMEP9, Xenbase10,
2.22 +ZFIN[? ], Aniseed11, VisiGene12, GEISHA[?], Fruitfly.org[?], COMPARE[?] todo. With the exception of the ABA, Gene-
2.23 +Paint, and EMAGE, most of these resources, have not (yet) extracted the expression intensity from the ISH images and
2.24 +registered the results into a single 3-D space, and only ABA and EMAGE make this form of data available for public
2.25 download from the website. Many of these resources focus on developmental gene expression.
2.26 Significance
2.27 The method developed in aim (1) will be applied to each cortical area to find a set of marker genes such that the
2.28 @@ -270,7 +271,7 @@
2.29 between voxel gene expression profiles within a handful of cortical areas. However, this sort of analysis is not related to either
2.30 of our aims, as it neither finds marker genes, nor does it suggest a cortical map based on gene expression data. Neither of
2.31 the other components of AGEA can be applied to cortical areas; AGEA’s Gene Finder cannot be used to find marker genes
2.32 -for the cortical areas; and AGEA’s hierarchial clustering does not produce clusters corresponding to the cortical areas8.
2.33 +for the cortical areas; and AGEA’s hierarchial clustering does not produce clusters corresponding to the cortical areas13.
2.34 In summary, for all three aims, (a) only one of the previous projects explores combinations of marker genes, (b) there has
2.35 been almost no comparison of different algorithms or scoring methods, and (c) there has been no work on computationally
2.36 finding marker genes for cortical areas, or on finding a hierarchial clustering that will yield a map of cortical areas de novo
2.37 @@ -279,7 +280,12 @@
2.38 genes for / reproduce the layout of cortical areas), which will provide a solid basis for comparing different methods.
2.39 _________________________________________
2.40 7http://www.eurexpress.org/ee/; EurExpress data is also entered into EMAGE
2.41 - 8In both cases, the root cause is that pairwise correlations between the gene expression of voxels in different areas but the same layer are
2.42 + 8http://www.ncl.ac.uk/ihg/EADHB/database/EADHB_database.html
2.43 + 9http://mamep.molgen.mpg.de/index.php
2.44 + 10http://xenbase.org/
2.45 + 11http://aniseed-ibdm.univ-mrs.fr/
2.46 + 12http://genome.ucsc.edu/cgi-bin/hgVisiGene ; includes data from some the other listed data sources
2.47 + 13In both cases, the root cause is that pairwise correlations between the gene expression of voxels in different areas but the same layer are
2.48 often stronger than pairwise correlations between the gene expression of voxels in different layers but the same area. Therefore, a pairwise voxel
2.49 correlation clustering algorithm will tend to create clusters representing cortical layers, not areas. This is why the hierarchial clustering does not
2.50 find most cortical areas (there are clusters which presumably correspond to the intersection of a layer and an area, but since one area will have
2.51 @@ -366,8 +372,8 @@
2.52 similar direction (because the borders are similar).
2.53 Gradient similarity provides information complementary to correlation
2.54 To show that gradient similarity can provide useful information that cannot be detected via pointwise analyses, consider
2.55 -Fig. . The top row of Fig. displays the 3 genes which most match area AUD, according to a pointwise method9. The
2.56 -bottom row displays the 3 genes which most match AUD according to a method which considers local geometry10 The
2.57 +Fig. . The top row of Fig. displays the 3 genes which most match area AUD, according to a pointwise method14. The
2.58 +bottom row displays the 3 genes which most match AUD according to a method which considers local geometry15 The
2.59 pointwise method in the top row identifies genes which express more strongly in AUD than outside of it; its weakness is
2.60 that this includes many areas which don’t have a salient border matching the areal border. The geometric method identifies
2.61 genes whose salient expression border seems to partially line up with the border of AUD; its weakness is that this includes
2.62 @@ -376,14 +382,14 @@
2.63 for AUD; we deliberately chose a “difficult” area in order to better contrast pointwise with geometric methods.
2.64 Combinations of multiple genes are useful
2.65 Here we give an example of a cortical area which is not marked by any single gene, but which can be identified combi-
2.66 -natorially. according to logistic regression, gene wwc111 is the best fit single gene for predicting whether or not a pixel on
2.67 -_________________________________________
2.68 - 9For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor
2.69 +natorially. according to logistic regression, gene wwc116 is the best fit single gene for predicting whether or not a pixel on
2.70 +_________________________________________
2.71 + 14For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor
2.72 variable was the value of the expression of the gene underneath that pixel. The resulting scores were used to rank the genes in terms of how well
2.73 they predict area AUD.
2.74 - 10For each gene the gradient similarity (see section ??) between (a) a map of the expression of each gene on the cortical surface and (b) the
2.75 + 15For each gene the gradient similarity (see section ??) between (a) a map of the expression of each gene on the cortical surface and (b) the
2.76 shape of area AUD, was calculated, and this was used to rank the genes.
2.77 - 11“WW, C2 and coiled-coil domain containing 1”; EntrezGene ID 211652
2.78 + 16“WW, C2 and coiled-coil domain containing 1”; EntrezGene ID 211652
2.79
2.80
2.81
2.82 @@ -396,7 +402,7 @@
2.83 pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene, however the gene
2.84 overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding to the
2.85 overshoot is the medial surface of the cortex. MO is only found on the lateral surface (todo).
2.86 -Gene mtif212 is shown in figure the upper-right of Fig. . Mtif2 captures MO’s upper-left boundary, but not its lower-right
2.87 +Gene mtif217 is shown in figure the upper-right of Fig. . Mtif2 captures MO’s upper-left boundary, but not its lower-right
2.88 boundary. Mtif2 does not express very much on the medial surface. By adding together the values at each pixel in these
2.89 two figures, we get the lower-left of Figure . This combination captures area MO much better than any single gene.
2.90 Areas which can be identified by single genes
2.91 @@ -407,7 +413,7 @@
2.92 Forward stepwise logistic regression todo
2.93 SVM on all genes at once
2.94 In order to see how well one can do when looking at all genes at once, we ran a support vector machine to classify cortical
2.95 -surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%13. As noted above,
2.96 +surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%18. As noted above,
2.97 however, a classifier that looks at all the genes at once isn’t practically useful.
2.98 The requirement to find combinations of only a small number of genes limits us from straightforwardly applying many
2.99 of the most simple techniques from the field of supervised machine learning. In the parlance of machine learning, our task
2.100 @@ -419,8 +425,8 @@
2.101 todo
2.102 (might want to incld nnMF since mentioned above)
2.103 _________________________________________
2.104 - 12“mitochondrial translational initiation factor 2”; EntrezGene ID 76784
2.105 - 135-fold cross-validation.
2.106 + 17“mitochondrial translational initiation factor 2”; EntrezGene ID 76784
2.107 + 185-fold cross-validation.
2.108 Dimensionality reduction plus K-means or spectral clustering
2.109 Many areas are captured by clusters of genes
2.110 todo
3.1 Binary file grant.odt has changed
4.1 Binary file grant.pdf has changed
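
Note on footnote 14 (diff lines 2.71 to 2.73): the footnote describes fitting, for each gene, a logistic regression that predicts whether a surface pixel lies inside area AUD from that gene's expression under the pixel, and then ranking genes by the resulting scores. The following is a minimal sketch of that idea, not the code used for the grant. The function names, the synthetic data, and the choice of mean log-likelihood as the ranking score are all assumptions made for illustration; the footnote does not say which score was used.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(x, y, steps=500, lr=0.5):
    """Fit P(y=1) = sigmoid(w*x + b) by batch gradient descent; return (w, b)."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(w * xi + b) - yi
            grad_w += err * xi
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def mean_log_likelihood(x, y, w, b):
    """Average per-pixel log-likelihood of the fitted model (higher is better)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(w * xi + b)
        total += yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(x)


def rank_genes(expression, in_area):
    """Rank genes, best first, by how well each one alone predicts area membership.

    expression: dict mapping gene name -> list of per-pixel expression values
    in_area:    list of 0/1 labels, 1 if the pixel lies inside the target area
    """
    scores = {}
    for gene, values in expression.items():
        w, b = fit_logistic_1d(values, in_area)
        scores[gene] = mean_log_likelihood(values, in_area, w, b)
    return sorted(scores, key=scores.get, reverse=True)


# Synthetic example (hypothetical data): 200 pixels, the first 60 inside the area.
random.seed(0)
in_area = [1] * 60 + [0] * 140
expression = {
    # expressed more strongly inside the area than outside it
    "marker": [random.gauss(0.9 if lab else 0.2, 0.1) for lab in in_area],
    # expression unrelated to the area
    "flat": [random.uniform(0.0, 1.0) for _ in in_area],
}
ranking = rank_genes(expression, in_area)
print(ranking)
```

A score of this kind rewards genes that express more strongly inside the area than outside it, which is the strength and the weakness that the surrounding text attributes to the pointwise method.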