cg
changeset 52:074e2be60b38
author | bshanks@bshanks.dyndns.org |
---|---|
date | Fri Apr 17 12:48:50 2009 -0700 |
parents | 3ebb8f4ea921 |
children | 304d07e0ac94 |
files | grant.doc grant.html grant.odt |
line diff
1.1 Binary file grant.doc has changed
2.1 --- a/grant.html Fri Apr 17 12:47:51 2009 -0700
2.2 +++ b/grant.html Fri Apr 17 12:48:50 2009 -0700
2.3 @@ -250,10 +250,10 @@
2.4 known neuroscientific interest... or through post hoc identification of a marked non-ubiquitous expression pattern”[6].
2.5 The ABA is not the only large public spatial gene expression dataset. Other such resources include GENSAT[3],
2.6 GenePaint[12], its sister project GeneAtlas[1], BGEM[5], EMAGE[11], EurExpress6, EADHB7, MAMEP8, Xenbase9, ZFIN[?],
2.7 -Aniseed10, VisiGene11, GEISHA[?], Fruitfly.org[?], COMPARE[?] todo. With the exception of the ABA, GenePaint, and
2.8 +Aniseed10, VisiGene11, GEISHA[?], Fruitfly.org[?], COMPARE12 todo. With the exception of the ABA, GenePaint, and
2.9 EMAGE, most of these resources have not (yet) extracted the expression intensity from the ISH images and registered the
2.10 results into a single 3-D space, and only ABA and EMAGE make this form of data available for public download from the
2.11 -website12. Many of these resources focus on developmental gene expression.
2.12 +website13. Many of these resources focus on developmental gene expression.
2.13 Significance
2.14 The method developed in aim (1) will be applied to each cortical area to find a set of marker genes such that the
2.15 combinatorial expression pattern of those genes uniquely picks out the target area. Finding marker genes will be useful for
2.16 @@ -277,7 +277,7 @@
2.17 between voxel gene expression profiles within a handful of cortical areas. However, this sort of analysis is not related to either
2.18 of our aims, as it neither finds marker genes, nor does it suggest a cortical map based on gene expression data. Neither of
2.19 the other components of AGEA can be applied to cortical areas; AGEA’s Gene Finder cannot be used to find marker genes
2.20 -for the cortical areas; and AGEA’s hierarchial clustering does not produce clusters corresponding to the cortical areas13.
2.21 +for the cortical areas; and AGEA’s hierarchial clustering does not produce clusters corresponding to the cortical areas14.
2.22 In summary, for all three aims, (a) only one of the previous projects explores combinations of marker genes, (b) there has
2.23 been almost no comparison of different algorithms or scoring methods, and (c) there has been no work on computationally
2.24 finding marker genes for cortical areas, or on finding a hierarchial clustering that will yield a map of cortical areas de novo
2.25 @@ -289,8 +289,9 @@
2.26 9http://xenbase.org/
2.27 10http://aniseed-ibdm.univ-mrs.fr/
2.28 11http://genome.ucsc.edu/cgi-bin/hgVisiGene ; includes data from some the other listed data sources
2.29 - 12without prior offline registration
2.30 - 13In both cases, the root cause is that pairwise correlations between the gene expression of voxels in different areas but the same layer are
2.31 + 12http://compare.ibdml.univ-mrs.fr/
2.32 + 13without prior offline registration
2.33 + 14In both cases, the root cause is that pairwise correlations between the gene expression of voxels in different areas but the same layer are
2.34 often stronger than pairwise correlations between the gene expression of voxels in different layers but the same area. Therefore, a pairwise voxel
2.35 correlation clustering algorithm will tend to create clusters representing cortical layers, not areas. This is why the hierarchial clustering does not
2.36 find most cortical areas (there are clusters which presumably correspond to the intersection of a layer and an area, but since one area will have
2.37 @@ -379,8 +380,8 @@
2.38 similar direction (because the borders are similar).
2.39 Gradient similarity provides information complementary to correlation
2.40 To show that gradient similarity can provide useful information that cannot be detected via pointwise analyses, consider
2.41 -Fig. . The top row of Fig. displays the 3 genes which most match area AUD, according to a pointwise method14. The
2.42 -bottom row displays the 3 genes which most match AUD according to a method which considers local geometry15 The
2.43 +Fig. . The top row of Fig. displays the 3 genes which most match area AUD, according to a pointwise method15. The
2.44 +bottom row displays the 3 genes which most match AUD according to a method which considers local geometry16 The
2.45 pointwise method in the top row identifies genes which express more strongly in AUD than outside of it; its weakness is
2.46 that this includes many areas which don’t have a salient border matching the areal border. The geometric method identifies
2.47 genes whose salient expression border seems to partially line up with the border of AUD; its weakness is that this includes
2.48 @@ -389,14 +390,14 @@
2.49 for AUD; we deliberately chose a “difficult” area in order to better contrast pointwise with geometric methods.
2.50 Combinations of multiple genes are useful
2.51 Here we give an example of a cortical area which is not marked by any single gene, but which can be identified combi-
2.52 -natorially. according to logistic regression, gene wwc116 is the best fit single gene for predicting whether or not a pixel on
2.53 +natorially. according to logistic regression, gene wwc117 is the best fit single gene for predicting whether or not a pixel on
2.54 _________________________________________
2.55 - 14For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor
2.56 + 15For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor
2.57 variable was the value of the expression of the gene underneath that pixel. The resulting scores were used to rank the genes in terms of how well
2.58 they predict area AUD.
2.59 - 15For each gene the gradient similarity (see section ??) between (a) a map of the expression of each gene on the cortical surface and (b) the
2.60 + 16For each gene the gradient similarity (see section ??) between (a) a map of the expression of each gene on the cortical surface and (b) the
2.61 shape of area AUD, was calculated, and this was used to rank the genes.
2.62 - 16“WW, C2 and coiled-coil domain containing 1”; EntrezGene ID 211652
2.63 + 17“WW, C2 and coiled-coil domain containing 1”; EntrezGene ID 211652
2.64
2.65
2.66
2.67 @@ -409,7 +410,7 @@
2.68 pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene, however the gene
2.69 overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding to the
2.70 overshoot is the medial surface of the cortex. MO is only found on the lateral surface (todo).
2.71 -Gene mtif217 is shown in figure the upper-right of Fig. . Mtif2 captures MO’s upper-left boundary, but not its lower-right
2.72 +Gene mtif218 is shown in figure the upper-right of Fig. . Mtif2 captures MO’s upper-left boundary, but not its lower-right
2.73 boundary. Mtif2 does not express very much on the medial surface. By adding together the values at each pixel in these
2.74 two figures, we get the lower-left of Figure . This combination captures area MO much better than any single gene.
2.75 Areas which can be identified by single genes
2.76 @@ -420,7 +421,7 @@
2.77 Forward stepwise logistic regression todo
2.78 SVM on all genes at once
2.79 In order to see how well one can do when looking at all genes at once, we ran a support vector machine to classify cortical
2.80 -surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%18. As noted above,
2.81 +surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%19. As noted above,
2.82 however, a classifier that looks at all the genes at once isn’t practically useful.
2.83 The requirement to find combinations of only a small number of genes limits us from straightforwardly applying many
2.84 of the most simple techniques from the field of supervised machine learning. In the parlance of machine learning, our task
2.85 @@ -432,8 +433,8 @@
2.86 todo
2.87 (might want to incld nnMF since mentioned above)
2.88 _________________________________________
2.89 - 17“mitochondrial translational initiation factor 2”; EntrezGene ID 76784
2.90 - 185-fold cross-validation.
2.91 + 18“mitochondrial translational initiation factor 2”; EntrezGene ID 76784
2.92 + 195-fold cross-validation.
2.93 Dimensionality reduction plus K-means or spectral clustering
2.94 Many areas are captured by clusters of genes
2.95 todo
3.1 Binary file grant.odt has changed
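
Illustrative note on the pointwise ranking described in the diff's footnote 15 (one logistic regression per gene, with "is this surface pixel inside the area" as the response and the gene's expression under that pixel as the predictor). The following is only a minimal sketch of that idea, not the grant's actual code. The footnote does not say which score was used to rank the genes, so ranking by mean log-likelihood is an assumption here, as are the fitting procedure (plain gradient ascent), the function names, the gene names, and the toy data.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(x, y, steps=2000, lr=0.1):
    """Fit p(y=1 | x) = sigmoid(a*x + b) by gradient ascent on the mean log-likelihood."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, yi in zip(x, y):
            residual = yi - sigmoid(a * xi + b)
            grad_a += residual * xi
            grad_b += residual
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def mean_log_likelihood(x, y, a, b):
    # Assumed ranking score: higher means the gene predicts the area better.
    total = 0.0
    for xi, yi in zip(x, y):
        p = min(max(sigmoid(a * xi + b), 1e-12), 1.0 - 1e-12)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(x)


def rank_genes(expression, in_area):
    """Return gene names sorted from best to worst single-gene predictor of the area."""
    scores = {}
    for gene, values in expression.items():
        a, b = fit_logistic_1d(values, in_area)
        scores[gene] = mean_log_likelihood(values, in_area, a, b)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (made up): six surface pixels, the first three inside the area.
in_area = [1, 1, 1, 0, 0, 0]
expression = {
    "geneA": [0.9, 0.8, 0.7, 0.2, 0.1, 0.3],  # separates the area cleanly
    "geneB": [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],  # same distribution inside and outside
    "geneC": [0.6, 0.9, 0.3, 0.4, 0.2, 0.5],  # weakly informative
}
ranking = rank_genes(expression, in_area)
print(ranking)
```

In this sketch each gene is scored independently, which is what makes the method pointwise: it ignores where the pixels sit relative to one another, so it cannot tell whether a gene's expression border lines up with the areal border.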