cg
changeset 34:c435e5da5211
field | value |
---|---|
author | bshanks@bshanks.dyndns.org |
date | Mon Apr 13 19:38:30 2009 -0700 |
parents | 6d023f15572e |
children | 99e5d268bab0 |
files | grant.doc grant.html grant.odt grant.pdf grant.txt |
line diff
1.1 Binary file grant.doc has changed
2.1 --- a/grant.html Mon Apr 13 14:53:12 2009 -0700
2.2 +++ b/grant.html Mon Apr 13 19:38:30 2009 -0700
2.3 @@ -195,8 +195,8 @@
2.4 [3 ] describes an analysis of the anatomy of the hippocampus using the ABA dataset. In addition to manual analysis,
2.5 two clustering methods were employed, a modified Non-negative Matrix Factorization (NNMF), and a hierarchial recursive
2.6 bifurcation clustering scheme based on correlation as the similarity score. The paper yielded impressive results, proving the
2.7 -usefulness of such research. We have run NNMF on the cortical dataset and while the results are promising (see Preliminary
2.8 -Data), we think that it will be possible to find a better method3 (we also think that more automation of the parts that this
2.9 +usefulness of such research. We have run NNMF on the cortical dataset3 and while the results are promising (see Preliminary
2.10 +Data), we think that it will be possible to find a better method (we also think that more automation of the parts that this
2.11 paper’s authors did manually will be possible).
2.12 [2 ] describes AGEA, ”Anatomic Gene Expression Atlas”. AGEA is an analysis tool for the ABA dataset. AGEA has
2.13 three components:
2.14 @@ -206,19 +206,33 @@
2.15 expression profile of the seed voxel and every other voxel.
2.16 * Clusters: AGEA includes a precomputed hierarchial clustering of voxels based on a recursive bifurcation algorithm
2.17 with correlation as the similarity metric.
2.18 -At first glance AGEA seems similar to this proposal, but in fact it is different.
2.19 -Gene Finder is different from our Aim 1 in at least four ways. First, although the user chooses a seed voxel, Gene Finder,
2.20 -not the user, chooses the cluster for which genes will be found, and in our experience it never chooses cortical areas, instead
2.21 -preferring cortical layers. Therefore, Gene Finder cannot be used to find marker genes for cortical areas. Second, Gene Finder
2.22 -finds only single genes, whereas we will also look for combinations of genes. Third, gene finder can only use overexpression
2.23 -as a marker, whereas we will also look for underexpression. Fourth, Gene Finder uses a simple pointwise metric (“expression
2.24 -energy ratio”, which captures overexpression), whereas we will also use geometric metrics such as gradient similarity.
2.25 -The hierarchial clustering is different from our Aim 2 in at least two ways. todo
2.26 -_________________________________________
2.27 +Gene Finder is different from our Aim 1 in at least four ways. First, although the user chooses a seed voxel, Gene
2.28 +Finder, not the user, chooses the cluster for which genes will be found, and in our experience it never chooses cortical areas,
2.29 +instead preferring cortical layers4. Therefore, Gene Finder cannot be used to find marker genes for cortical areas. Second,
2.30 +Gene Finder finds only single genes, whereas we will also look for combinations of genes5. Third, gene finder can only use
2.31 +overexpression as a marker, whereas in the Preliminary Data we show that underexpression can also be used. Fourth, Gene
2.32 +Finder uses a simple pointwise score6, whereas we will also use geometric metrics such as gradient similarity.
2.33 +The hierarchial clustering is different from our Aim 2 in at least three ways. First, the clustering finds clusters cor-
2.34 +responding to layers, but no clusters corresponding to areas7 8 Our Aim 2 will not be accomplished until a clustering is
2.35 +produced which yields areas. Second, AGEA uses perhaps the simplest possible similarity score (correlation), and does no
2.36 +dimensionality reduction before calculating similarity. While it is possible that a more complex system will not do any better
2.37 +than this, we believe further exploration of alternative methods of scoring and dimensionality reduction is warranted. Third,
2.38 +AGEA did not look at clusters of genes; in Preliminary Data we have shown that clusters of genes may identify intersting
2.39 +spatial subregions such as cortical areas.
2.40 +_______
2.41 3We ran “vanilla” NNMF, whereas the paper under discussion used a modified method. Their main modification consisted of adding a soft
2.42 spatial contiguity constraint. However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was
2.43 needed. The paper under discussion mentions that they also tried a hierarchial variant of NNMF, but since they didn’t report its results, we
2.44 assume that those result were not any more impressive than the results of the non-hierarchial variant.
2.45 + 4Because of the way in which Gene Finder chooses a cluster, layers will always be preferred to areas if pairwise correlations between the gene
2.46 +expression of voxels in different areas but the same layer are stronger than pairwise correlatios between the gene expression of voxels in different
2.47 +layers but the same area. This appears to be the case.
2.48 + 5See Preliminary Data for an example of an area which cannot be marked by any single gene in the dataset, but which can be marked by a
2.49 +combination.
2.50 + 6“Expression energy ratio”, which captures overexpression.
2.51 + 7This is for the same reason as in footnote 4.
2.52 + 8There are clusters which presumably correspond to the intersection of a layer and an area, but since one area will have many layer-area
2.53 +intersection clusters, further work is needed to make sense of these.
2.54
2.55
2.56
2.57 @@ -234,12 +248,12 @@
2.58 todo
2.59 Using combinations of multiple genes is necessary and sufficient to delineate some cortical areas
2.60 Here we give an example of a cortical area which is not marked by any single gene, but which can be identified combi-
2.61 -natorially. according to logistic regression, gene wwc14 is the best fit single gene for predicting whether or not a pixel on
2.62 +natorially. according to logistic regression, gene wwc19 is the best fit single gene for predicting whether or not a pixel on
2.63 the cortical surface belongs to the motor area (area MO). The upper-left picture in Figure shows wwc1’s spatial expression
2.64 pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene, however the gene
2.65 overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding to the
2.66 overshoot is the medial surface of the cortex. MO is only found on the lateral surface (todo).
2.67 -Gnee mtif25 is shown in figure the upper-right of Fig. . Mtif2 captures MO’s upper-left boundary, but not its lower-right
2.68 +Gnee mtif210 is shown in figure the upper-right of Fig. . Mtif2 captures MO’s upper-left boundary, but not its lower-right
2.69 boundary. Mtif2 does not express very much on the medial surface. By adding together the values at each pixel in these
2.70 two figures, we get the lower-left of Figure . This combination captures area MO much better than any single gene.
2.71 Correlation todo
2.72 @@ -247,16 +261,16 @@
2.73 Gradient similarity todo
2.74 Geometric and pointwise scoring methods provide complementary information
2.75 To show that local geometry can provide useful information that cannot be detected via pointwise analyses, consider Fig.
2.76 -. The top row of Fig. displays the 3 genes which most match area AUD, according to a pointwise method6. The bottom
2.77 -row displays the 3 genes which most match AUD according to a method which considers local geometry7 The pointwise
2.78 +. The top row of Fig. displays the 3 genes which most match area AUD, according to a pointwise method11. The bottom
2.79 +row displays the 3 genes which most match AUD according to a method which considers local geometry12 The pointwise
2.80 method in the top row identifies genes which express more strongly in AUD than outside of it; its weakness is that this
2.81 _________________________________________
2.82 - 4“WW, C2 and coiled-coil domain containing 1”; EntrezGene ID 211652
2.83 - 5“mitochondrial translational initiation factor 2”; EntrezGene ID 76784
2.84 - 6For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor
2.85 + 9“WW, C2 and coiled-coil domain containing 1”; EntrezGene ID 211652
2.86 + 10“mitochondrial translational initiation factor 2”; EntrezGene ID 76784
2.87 + 11For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor
2.88 variable was the value of the expression of the gene underneath that pixel. The resulting scores were used to rank the genes in terms of how well
2.89 they predict area AUD.
2.90 - 7For each gene the gradient similarity (see section ??) between (a) a map of the expression of each gene on the cortical surface and (b) the
2.91 + 12For each gene the gradient similarity (see section ??) between (a) a map of the expression of each gene on the cortical surface and (b) the
2.92 shape of area AUD, was calculated, and this was used to rank the genes.
2.93
2.94
2.95 @@ -275,7 +289,7 @@
2.96 Forward stepwise logistic regression todo
2.97 SVM on all genes at once
2.98 In order to see how well one can do when looking at all genes at once, we ran a support vector machine to classify cortical
2.99 -surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%8. As noted above,
2.100 +surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%13. As noted above,
2.101 however, a classifier that looks at all the genes at once isn’t practically useful.
2.102 The requirement to find combinations of only a small number of genes limits us from straightforwardly applying many
2.103 of the most simple techniques from the field of supervised machine learning. In the parlance of machine learning, our task
2.104 @@ -291,7 +305,7 @@
2.105 todo
2.106 todo
2.107 _________________________________________
2.108 - 85-fold cross-validation.
2.109 + 135-fold cross-validation.
2.110 Research plan
2.111 todo amongst other things:
2.112 Develop algorithms that find genetic markers for anatomical regions
3.1 Binary file grant.odt has changed
4.1 Binary file grant.pdf has changed
5.1 --- a/grant.txt Mon Apr 13 14:53:12 2009 -0700
5.2 +++ b/grant.txt Mon Apr 13 19:38:30 2009 -0700
5.3 @@ -147,7 +147,7 @@
5.4 \cite{thompson_genomic_2008} describes an analysis of the anatomy of
5.5 the hippocampus using the ABA dataset. In addition to manual analysis,
5.6 two clustering methods were employed, a modified Non-negative Matrix
5.7 -Factorization (NNMF), and a hierarchial recursive bifurcation clustering scheme based on correlation as the similarity score. The paper yielded impressive results, proving the usefulness of such research. We have run NNMF on the cortical dataset and while the results are promising (see Preliminary Data), we think that it will be possible to find a better method\footnote{We ran "vanilla" NNMF, whereas the paper under discussion used a modified method. Their main modification consisted of adding a soft spatial contiguity constraint. However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was needed. The paper under discussion mentions that they also tried a hierarchial variant of NNMF, but since they didn't report its results, we assume that those result were not any more impressive than the results of the non-hierarchial variant.} (we also think that more automation of the parts that this paper's authors did manually will be possible).
5.8 +Factorization (NNMF), and a hierarchial recursive bifurcation clustering scheme based on correlation as the similarity score. The paper yielded impressive results, proving the usefulness of such research. We have run NNMF on the cortical dataset\footnote{We ran "vanilla" NNMF, whereas the paper under discussion used a modified method. Their main modification consisted of adding a soft spatial contiguity constraint. However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was needed. The paper under discussion mentions that they also tried a hierarchial variant of NNMF, but since they didn't report its results, we assume that those result were not any more impressive than the results of the non-hierarchial variant.} and while the results are promising (see Preliminary Data), we think that it will be possible to find a better method (we also think that more automation of the parts that this paper's authors did manually will be possible).
5.9
5.10
5.11 \cite{ng_anatomic_2009} describes AGEA, "Anatomic Gene Expression
5.12 @@ -164,11 +164,9 @@
5.13
5.14 * Clusters: AGEA includes a precomputed hierarchial clustering of voxels based on a recursive bifurcation algorithm with correlation as the similarity metric.
5.15
5.16 -At first glance AGEA seems similar to this proposal, but in fact it is different.
5.17 -
5.18 -Gene Finder is different from our Aim 1 in at least four ways. First, although the user chooses a seed voxel, Gene Finder, not the user, chooses the cluster for which genes will be found, and in our experience it never chooses cortical areas, instead preferring cortical layers. Therefore, Gene Finder cannot be used to find marker genes for cortical areas. Second, Gene Finder finds only single genes, whereas we will also look for combinations of genes. Third, gene finder can only use overexpression as a marker, whereas we will also look for underexpression. Fourth, Gene Finder uses a simple pointwise metric ("expression energy ratio", which captures overexpression), whereas we will also use geometric metrics such as gradient similarity.
5.19 -
5.20 -The hierarchial clustering is different from our Aim 2 in at least two ways. todo
5.21 +Gene Finder is different from our Aim 1 in at least four ways. First, although the user chooses a seed voxel, Gene Finder, not the user, chooses the cluster for which genes will be found, and in our experience it never chooses cortical areas, instead preferring cortical layers\footnote{\label{layersNotAreas}Because of the way in which Gene Finder chooses a cluster, layers will always be preferred to areas if pairwise correlations between the gene expression of voxels in different areas but the same layer are stronger than pairwise correlatios between the gene expression of voxels in different layers but the same area. This appears to be the case.}. Therefore, Gene Finder cannot be used to find marker genes for cortical areas. Second, Gene Finder finds only single genes, whereas we will also look for combinations of genes\footnote{See Preliminary Data for an example of an area which cannot be marked by any single gene in the dataset, but which can be marked by a combination.}. Third, gene finder can only use overexpression as a marker, whereas in the Preliminary Data we show that underexpression can also be used. Fourth, Gene Finder uses a simple pointwise score\footnote{"Expression energy ratio", which captures overexpression.}, whereas we will also use geometric metrics such as gradient similarity.
5.22 +
5.23 +The hierarchial clustering is different from our Aim 2 in at least three ways. First, the clustering finds clusters corresponding to layers, but no clusters corresponding to areas\footnote{This is for the same reason as in footnote \ref{layersNotAreas}.} \footnote{There are clusters which presumably correspond to the intersection of a layer and an area, but since one area will have many layer-area intersection clusters, further work is needed to make sense of these.} Our Aim 2 will not be accomplished until a clustering is produced which yields areas. Second, AGEA uses perhaps the simplest possible similarity score (correlation), and does no dimensionality reduction before calculating similarity. While it is possible that a more complex system will not do any better than this, we believe further exploration of alternative methods of scoring and dimensionality reduction is warranted. Third, AGEA did not look at clusters of genes; in Preliminary Data we have shown that clusters of genes may identify intersting spatial subregions such as cortical areas.
5.24
5.25
5.26
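
Illustrative note on the changed text: footnote 11 in the grant.html hunk describes ranking genes by a per-gene logistic regression, where the response is whether a surface pixel lies in the area and the predictor is that gene's expression. The Preliminary Data passage describes adding two genes' pixel values together to capture area MO better than either gene alone. The following is a minimal sketch of that idea on invented toy data, not the authors' code. The 12-pixel strip, the two expression patterns, the gradient-ascent fit, and the use of mean log-likelihood as the ranking score are all assumptions made for illustration; the changeset does not say which score was used.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(x, y, steps=2000, lr=0.5):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by batch gradient ascent."""
    w = b = 0.0
    n = len(x)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(w * xi + b)
            grad_w += err * xi
            grad_b += err
        w += lr * grad_w / n
        b += lr * grad_b / n
    return w, b


def mean_log_likelihood(x, y, w, b):
    """Average log-likelihood of the labels under the model (higher is better)."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(x, y):
        p = min(max(sigmoid(w * xi + b), eps), 1.0 - eps)
        total += math.log(p) if yi else math.log(1.0 - p)
    return total / len(x)


def score_predictor(x, y):
    # Assumed ranking score: mean log-likelihood of the fitted single-predictor model.
    w, b = fit_logistic_1d(x, y)
    return mean_log_likelihood(x, y, w, b)


# Toy 1-D strip of 12 "pixels"; the target area covers pixels 4..7.
in_area = [1 if 4 <= i <= 7 else 0 for i in range(12)]
# Hypothetical gene A matches the left boundary but overshoots on the right.
gene_a = [1.0 if i >= 4 else 0.0 for i in range(12)]
# Hypothetical gene B matches the right boundary but overshoots on the left.
gene_b = [1.0 if i <= 7 else 0.0 for i in range(12)]
# Pixelwise sum of the two expression maps.
combined = [a + b for a, b in zip(gene_a, gene_b)]

predictors = {"gene_a": gene_a, "gene_b": gene_b, "gene_a+gene_b": combined}
scores = {name: score_predictor(x, in_area) for name, x in predictors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

In this toy setup each single gene is ambiguous on half of the pixels where it is expressed, while the sum takes its highest value only inside the area, so the combined predictor ranks first.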