cg

diff grant.html @ 27:5db0420abbb6

.
author bshanks@bshanks.dyndns.org
date Mon Apr 13 03:25:42 2009 -0700
parents 9d0cc9c66ecd
children 01c118d1074b
--- a/grant.html Mon Apr 13 03:22:01 2009 -0700
+++ b/grant.html Mon Apr 13 03:25:42 2009 -0700
@@ -67,9 +67,9 @@
 tracted from the selected set depending on how much they raise the score. Such
 procedures are called “stepwise” or “greedy”.
 Although the classifier itself may only look at the gene expression data within
+ 2
+
 each voxel before classifying that voxel, the learning algorithm which constructs
- 2
-
 the classifier may look over the entire dataset. We can categorize score-based
 feature selection methods depending on how the score of calculated. Often
 the score calculation consists of assigning a sub-score to each voxel, and then
@@ -112,12 +112,12 @@
 to do would be to score the performance of each voxel by itself and then com-
 bine these scores (pointwise scoring). A more powerful approach is to also use
 information about the geometric relations between each voxel and its neighbors;
+ 3
+
 this requires non-pointwise, local scoring methods. See Preliminary Results for
 evidence of the complementary nature of pointwise and local scoring methods.
 Principle 4: Work in 2-D whenever possible
 There are many anatomical structures which are commonly characterized in
- 3
-
 terms of a two-dimensional manifold. When it is known that the structure that
 one is looking for is two-dimensional, the results may be improved by allowing
 the analysis algorithm to take advantage of this prior knowledge. In addition,
@@ -154,6 +154,8 @@
 special type of clustering task because we have an additional constraint on
 clusters; voxels grouped together into a cluster must be spatially contiguous.
 In Preliminary Results, we show that one can get reasonable results without
+ 4
+
 enforcing this constraint, however, we plan to compare these results against
 other methods which guarantee contiguous clusters.
 Perhaps the biggest source of continguous clustering algorithms is the field
@@ -162,8 +164,6 @@
 image into clusters, usually contiguous clusters. Aim 2 is similar to an image
 segmentation task. There are two main differences; in our task, there are thou-
 sands of color channels (one for each gene), rather than just three. There are
- 4
-
 imaging tasks which use more than three colors, however, for example multispec-
 tral imaging and hyperspectral imaging, which are often used to process satellite
 imagery. A more crucial difference is that there are various cues which are ap-
@@ -200,6 +200,8 @@
 Clustering genes rather than voxels
 Although the ultimate goal is to cluster the instances (voxels or pixels), one
 strategy to achieve this goal is to first cluster the features (genes). There are
+ 5
+
 two ways that clusters of genes could be used.
 Gene clusters could be used as part of dimensionality reduction: rather than
 have one feature for each gene, we could have one reduced feature for each gene
@@ -208,8 +210,6 @@
 This is because many genes have an expression pattern which seems to pick
 out a single, spatially continguous subregion. Therefore, it seems likely that an
 anatomically interesting subregion will have multiple genes which each individ-
- 5
-
 ually pick it out1. This suggests the following procedure: cluster together genes
 which pick out similar subregions, and then to use the more popular common
 subregions as the final clusters. In the Preliminary Data we show that a num-
@@ -240,6 +240,14 @@
 Significance
 The method developed in aim (1) will be applied to each cortical area to find
 a set of marker genes such that the combinatorial expression pattern of those
+__________________________
+ 1This would seem to contradict our finding in aim 1 that some cortical areas are combina-
+torially coded by multiple genes. However, it is possible that the currently accepted cortical
+maps divide the cortex into subregions which are unnatural from the point of view of gene
+expression; perhaps there is some other way to map the cortex for which each subregion can
+be identified by single genes.
+ 6
+
 genes uniquely picks out the target area. Finding marker genes will be useful
 for drug discovery as well as for experimentation because marker genes can be
 used to design interventions which selectively target individual cortical areas.
@@ -248,14 +256,6 @@
 finding markers for each individual cortical areas, we will find a small panel
 of genes that can find many of the areal boundaries at once. This panel of
 marker genes will allow the development of an ISH protocol that will allow
-__________________________
- 1This would seem to contradict our finding in aim 1 that some cortical areas are combina-
-torially coded by multiple genes. However, it is possible that the currently accepted cortical
-maps divide the cortex into subregions which are unnatural from the point of view of gene
-expression; perhaps there is some other way to map the cortex for which each subregion can
-be identified by single genes.
- 6
-
 experimenters to more easily identify which anatomical areas are present in
 small samples of cortex.
 The method developed in aim (3) will provide a genoarchitectonic viewpoint
@@ -292,21 +292,20 @@
 anatomy through computational methods.
 [?] describes an analysis of the anatomy of the hippocampus using the ABA
 dataset. In addition to manual analysis, two clustering methods were employed,
+ 7
+
 a modified Non-negative Matrix Factorization (NNMF), and a hierarchial bifur-
 cation clustering scheme based on correlation as the similarity score. The paper
 yielded impressive results, proving the usefulness of such research. We have run
 NNMF on the cortical dataset and while the results are promising (see Prelim-
 inary Data), we think that it will be possible to find a better method2 (we also
+ think that more automation of the parts that this paper’s authors did manually
+ will be possible).
+ and [?] describes AGEA. todo
 __________________________
 2We ran “vanilla” NNMF, whereas the paper under discussion used a modified method.
 Their main modification consisted of adding a soft spatial contiguity constraint. However,
 on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional
- 7
-
- think that more automation of the parts that this paper’s authors did manually
- will be possible).
- and [?] describes AGEA. todo
-__________________________
 constraint was needed. The paper under discussion mentions that they also tried a hierarchial
 variant of NNMF, but since they didn’t report its results, we assume that those result were
 not any more impressive than the results of the non-hierarchial variant.
@@ -343,9 +342,6 @@
 3 genes which most match area AUD, according to a pointwise method5. The
 bottom row displays the 3 genes which most match AUD according to a method
 which considers local geometry6 The pointwise method in the top row identifies
- genes which express more strongly in AUD than outside of it; its weakness is that
- this includes many areas which don’t have a salient border matching the areal
- border. The geometric method identifies genes whose salient expression border
 __________________________
 3“WW, C2 and coiled-coil domain containing 1”; EntrezGene ID 211652
 4“mitochondrial translational initiation factor 2”; EntrezGene ID 76784
@@ -377,6 +373,9 @@
 genes which (individually) best match area AUD, according to gradient similar-
 ity. From left to right and top to bottom, the genes are Ssr1, Efcbp1, Aph1a,
 Ptk7, Aph1a again, and Lepr
+ genes which express more strongly in AUD than outside of it; its weakness is that
+ this includes many areas which don’t have a salient border matching the areal
+ border. The geometric method identifies genes whose salient expression border
 seems to partially line up with the border of AUD; its weakness is that this
 includes genes which don’t express over the entire area. Genes which have high
 rankings using both pointwise and border criteria, such as Aph1a in the example,
@@ -393,16 +392,16 @@
 gene expression profiles. We achieved classification accuracy of about 81%7.
 As noted above, however, a classifier that looks at all the genes at once isn’t
 practically useful.
+____________
+ 75-fold cross-validation.
+ 11
+
 The requirement to find combinations of only a small number of genes limits
 us from straightforwardly applying many of the most simple techniques from
 the field of supervised machine learning. In the parlance of machine learning,
 our task combines feature selection with supervised learning.
 Decision trees
 todo
-____________________
- 75-fold cross-validation.
- 11
-
 Specific to Aim 2 (and Aim 3)
 Raw dimensionality reduction results
 todo
@@ -449,10 +448,10 @@
 each area, a short list of markers to identify that area; and we will also
 present lists of “panels” of genes that can be used to delineate many areas
 at once.
+ 13
+
 Develop algorithms to suggest a division of a structure into anatom-
 ical parts
- 13
-
 1. Explore dimensionality reduction algorithms applied to pixels: including
 TODO
 2. Explore dimensionality reduction algorithms applied to genes: including