cg

changeset 60:9381e0c1827f

.
author bshanks@bshanks.dyndns.org
date Sun Apr 19 14:29:24 2009 -0700
parents c46f8f975f7c
children cb5eed6525f2
files grant.html grant.odt grant.pdf grant.txt
--- a/grant.html Sun Apr 19 14:19:52 2009 -0700
+++ b/grant.html Sun Apr 19 14:29:24 2009 -0700
@@ -351,8 +351,7 @@
 One class of feature selection scoring method are those which calculate some sort of “match” between each gene image
 and the target image. Those genes which match the best are good candidates for features.
 One of the simplest methods in this class is to use correlation as the match score. We calculated the correlation between
-each gene and each cortical area.
-todo: fig
+each gene and each cortical area. The top row of Figure shows the three genes most correlated with area SS.
 Conditional entropy An information-theoretic scoring method is to find features such that, if the features (gene
 expression levels) are known, uncertainty about the target (the regional identity) is reduced. Entropy measures uncertainty,
 so what we want is to find features such that the conditional distribution of the target has minimal entropy. The distribution
@@ -369,10 +368,11 @@



-Figure 1: The top row shows the three genes which (individually) best predict area AUD, according to logistic regression.
-The bottom row shows the three genes which (individually) best match area AUD, according to gradient similarity. From
-left to right and top to bottom, the genes are Ssr1, Efcbp1, Aph1a, Ptk7, Aph1a again, and Lepr
-todo: fig
+Figure 1: Top row: Genes Nfic, A930001M12Rik, C130038G02Rik are the most correlated with area SS (somatosensory
+cortex). Bottom row: Genes C130038G02Rik, Cacna1i, Car10 are those with the best fit using logistic regression. Within
+each picture, the vertical axis roughly corresponds to anterior at the top and posterior at the bottom, and the horizontal
+axis roughly corresponds to medial at the left and lateral at the right. The red outline is the boundary of region MO. Pixels
+are colored according to correlation, with red meaning high correlation and blue meaning low.
 Gradient similarity We noticed that the previous two scoring methods, which are pointwise, often found genes whose
 pattern of expression did not look similar in shape to the target region. Fort his reason we designed a non-pointwise local
 scoring method to detect when a gene had a pattern of expression which looked like it had a boundary whose shape is similar
@@ -404,38 +404,51 @@
 such as Aph1a in the example, may be particularly good markers. None of these genes are, individually, a perfect marker
 for AUD; we deliberately chose a “difficult” area in order to better contrast pointwise with geometric methods.
 Combinations of multiple genes are useful
-Here we give an example of a cortical area which is not marked by any single gene, but which can be identified combi-
-natorially. according to logistic regression, gene wwc119 is the best fit single gene for predicting whether or not a pixel on
 _________________________________________
 17For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor
 variable was the value of the expression of the gene underneath that pixel. The resulting scores were used to rank the genes in terms of how well
 they predict area AUD.
 18For each gene the gradient similarity between (a) a map of the expression of each gene on the cortical surface and (b) the shape of area AUD,
 was calculated, and this was used to rank the genes.
- 19“WW, C2 and coiled-coil domain containing 1”; EntrezGene ID 211652
-
+
+
+
+Figure 2: The top row shows the three genes which (individually) best predict area AUD, according to logistic regression.
+The bottom row shows the three genes which (individually) best match area AUD, according to gradient similarity. From
+left to right and top to bottom, the genes are Ssr1, Efcbp1, Aph1a, Ptk7, Aph1a again, and Lepr


-Figure 2: Upper left: wwc1. Upper right: mtif2. Lower left: wwc1 + mtif2 (each pixel’s value on the lower left is the sum
-of the corresponding pixels in the upper row). Within each picture, the vertical axis roughly corresponds to anterior at the
-top and posterior at the bottom, and the horizontal axis roughly corresponds to medial at the left and lateral at the right.
-The red outline is the boundary of region MO. Pixels are colored approximately according to the density of expressing cells
-underneath each pixel, with red meaning a lot of expression and blue meaning little.
+Figure 3: Upper left: wwc1. Upper right: mtif2. Lower left: wwc1 + mtif2 (each pixel’s value on the lower left is the sum
+of the corresponding pixels in the upper row).
+Here we give an example of a cortical area which is not marked by any single gene, but which can be identified combi-
+natorially. according to logistic regression, gene wwc119 is the best fit single gene for predicting whether or not a pixel on
 the cortical surface belongs to the motor area (area MO). The upper-left picture in Figure shows wwc1’s spatial expression
 pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene, however the gene
 overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding to the
-overshoot is the medial surface of the cortex. MO is only found on the lateral surface (todo).
+overshoot is the medial surface of the cortex. MO is only found on the lateral surface.
 Gene mtif220 is shown in figure the upper-right of Fig. . Mtif2 captures MO’s upper-left boundary, but not its lower-right
 boundary. Mtif2 does not express very much on the medial surface. By adding together the values at each pixel in these
 two figures, we get the lower-left of Figure . This combination captures area MO much better than any single gene.
-Areas which can be identified by single genes
-todo
-Underexpression of a gene can serve as a marker
-todo
+Underexpression of a gene can serve as a marker Underexpression of a gene can sometimes serve as a marker.
+See, for example, Figure .
 Specific to Aim 1 (and Aim 3)
 Forward stepwise logistic regression todo
 SVM on all genes at once
 In order to see how well one can do when looking at all genes at once, we ran a support vector machine to classify cortical
+_________________________________________
+ 19“WW, C2 and coiled-coil domain containing 1”; EntrezGene ID 211652
+ 20“mitochondrial translational initiation factor 2”; EntrezGene ID 76784
+
+
+ Figure 4: Gene Pitx2 is selectively underexpressed in area SS (somatosensory).
+
+
+Figure 5: From left to right and top to bottom, single genes which roughly identify areas SS (somatosensory primary +
+supplemental), SSs (supplemental somatosensory), PIR (piriform), FRP (frontal pole), RSP (retrosplenial), COApm (Corti-
+cal amygdalar, posterior part, medial zone). Grouping some areas together, we have also found genes to identify the groups
+ACA+PL+ILA+DP+ORB+MO (anterior cingulate, prelimbic, infralimbic, dorsal peduncular, orbital, motor), posterior
+and lateral visual (VISpm, VISpl, VISI, VISp; posteromedial, posterolateral, lateral, and primary visual). The genes are
+Pitx2, Aldh1a2, Ppfibp1, Slco1a5, Tshz2, Trhr, Col12a1, Ets1.
 surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%21. As noted above,
 however, a classifier that looks at all the genes at once isn’t practically useful.
 The requirement to find combinations of only a small number of genes limits us from straightforwardly applying many
@@ -443,16 +456,22 @@
 combines feature selection with supervised learning.
 Decision trees
 todo
+Areas which can be identified by single genes
+Using all of the methods we have tried to far, we have already found single genes which roughly identify some areas and
+groupings of areas. For each of these areas, an example of a gene which roughly identifies it is shown in Figure . We have
+not yet cross-verified these genes in other atlases.
+In addition, there are a number of areas which are almost identified by single genes: COAa+NLOT (anterior part of
+cortical amygdalar area, nucleus of the lateral olfactory tract), ENT (entorhinal), ACAv (ventral anterior cingulate), VIS
+(visual), AUD (auditory).
 Specific to Aim 2 (and Aim 3)
 Raw dimensionality reduction results
 todo
 (might want to incld nnMF since mentioned above)
-_________________________________________
- 20“mitochondrial translational initiation factor 2”; EntrezGene ID 76784
- 215-fold cross-validation.
 Dimensionality reduction plus K-means or spectral clustering
 Many areas are captured by clusters of genes
 todo
+_________________________________________
+ 215-fold cross-validation.
 todo
 Research plan
 Further work on flatmapping
@@ -490,6 +509,7 @@
 4.Run the procedures that we developed on the cortex: we will present, for each area, a short list of markers to identify
 that area; and we will also present lists of “panels” of genes that can be used to delineate many areas at once.
 Develop algorithms to suggest a division of a structure into anatomical parts
+# mixture models, etc
 1.Explore dimensionality reduction algorithms applied to pixels: including TODO
 2.Explore dimensionality reduction algorithms applied to genes: including TODO
 3.Explore clustering algorithms applied to pixels: including TODO
@@ -596,7 +616,6 @@
 Principle 4: Work in 2-D whenever possible
 —
 note:
- do we need to cite: no known markers, impressive results?
 two hemis


Binary file grant.odt has changed
Binary file grant.pdf has changed
--- a/grant.txt Sun Apr 19 14:19:52 2009 -0700
+++ b/grant.txt Sun Apr 19 14:29:24 2009 -0700
@@ -263,14 +263,19 @@

 One class of feature selection scoring method are those which calculate some sort of "match" between each gene image and the target image. Those genes which match the best are good candidates for features.

-One of the simplest methods in this class is to use correlation as the match score. We calculated the correlation between each gene and each cortical area. Figure \ref{SScorr} shows the three genes most correlated with area SS.
-
-\begin{figure}\label{SScorr}
+One of the simplest methods in this class is to use correlation as the match score. We calculated the correlation between each gene and each cortical area. The top row of Figure \ref{SScorrLr} shows the three genes most correlated with area SS.
+
+\begin{figure}\label{SScorrLr}
 \includegraphics[scale=.31]{singlegene_SS_corr_top_1_2365_jet.eps}
 \includegraphics[scale=.31]{singlegene_SS_corr_top_2_242_jet.eps}
-\includegraphics[scale=.31]{singlegene_SS_corr_top_3_654_jet.eps}
-
-\caption{Genes (and predicted genes) Nfic, A930001M12Rik, C130038G02Rik are the most correlated with area SS (somatosensory cortex). Within each picture, the vertical axis roughly corresponds to anterior at the top and posterior at the bottom, and the horizontal axis roughly corresponds to medial at the left and lateral at the right. The red outline is the boundary of region MO. Pixels are colored according to correlation, with red meaning high correlation and blue meaning low.}
+\includegraphics[scale=.31]{singlegene_SS_corr_top_3_654_jet.eps}\\
+\\
+\includegraphics[scale=.31]{singlegene_SS_lr_top_1_654_jet.eps}
+\includegraphics[scale=.31]{singlegene_SS_lr_top_2_685_jet.eps}
+\includegraphics[scale=.31]{singlegene_SS_lr_top_3_724_jet.eps}
+
+
+\caption{Top row: Genes Nfic, A930001M12Rik, C130038G02Rik are the most correlated with area SS (somatosensory cortex). Bottom row: Genes C130038G02Rik, Cacna1i, Car10 are those with the best fit using logistic regression. Within each picture, the vertical axis roughly corresponds to anterior at the top and posterior at the bottom, and the horizontal axis roughly corresponds to medial at the left and lateral at the right. The red outline is the boundary of region MO. Pixels are colored according to correlation, with red meaning high correlation and blue meaning low.}
 \end{figure}


@@ -284,7 +289,6 @@

 This finds pairs of genes which are most informative (at least at these discretization thresholds) relative to the question, "Is this surface pixel a member of the target area?".

-todo: fig

 \vspace{0.3cm}**Gradient similarity**
 We noticed that the previous two scoring methods, which are pointwise, often found genes whose pattern of expression did not look similar in shape to the target region. Fort his reason we designed a non-pointwise local scoring method to detect when a gene had a pattern of expression which looked like it had a boundary whose shape is similar to the shape of the target region. We call this scoring method "gradient similarity".
@@ -318,7 +322,7 @@

 \vspace{0.3cm}**Combinations of multiple genes are useful**

-Here we give an example of a cortical area which is not marked by any single gene, but which can be identified combinatorially. according to logistic regression, gene wwc1\footnote{"WW, C2 and coiled-coil domain containing 1"; EntrezGene ID 211652} is the best fit single gene for predicting whether or not a pixel on the cortical surface belongs to the motor area (area MO). The upper-left picture in Figure \ref{MOcombo} shows wwc1's spatial expression pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene, however the gene overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding to the overshoot is the medial surface of the cortex. MO is only found on the lateral surface (todo).
+Here we give an example of a cortical area which is not marked by any single gene, but which can be identified combinatorially. according to logistic regression, gene wwc1\footnote{"WW, C2 and coiled-coil domain containing 1"; EntrezGene ID 211652} is the best fit single gene for predicting whether or not a pixel on the cortical surface belongs to the motor area (area MO). The upper-left picture in Figure \ref{MOcombo} shows wwc1's spatial expression pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene, however the gene overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding to the overshoot is the medial surface of the cortex. MO is only found on the lateral surface.

 Gene mtif2\footnote{"mitochondrial translational initiation factor 2"; EntrezGene ID 76784} is shown in figure the upper-right of Fig. \ref{MOcombo}. Mtif2 captures MO's upper-left boundary, but not its lower-right boundary. Mtif2 does not express very much on the medial surface. By adding together the values at each pixel in these two figures, we get the lower-left of Figure \ref{MOcombo}. This combination captures area MO much better than any single gene.

@@ -334,13 +338,17 @@



-\vspace{0.3cm}**Areas which can be identified by single genes**
-
-todo

 \vspace{0.3cm}**Underexpression of a gene can serve as a marker**
-
-todo
+Underexpression of a gene can sometimes serve as a marker. See, for example, Figure \ref{hole}.
+
+
+\begin{figure}\label{hole}
+\includegraphics[scale=.31]{holeExample_2682_SS_jet.eps}
+\caption{Gene Pitx2 is selectively underexpressed in area SS (somatosensory).}
+\end{figure}
+
+

 === Specific to Aim 1 (and Aim 3) ===
 \vspace{0.3cm}**Forward stepwise logistic regression**
@@ -359,6 +367,27 @@

 todo

+\vspace{0.3cm}**Areas which can be identified by single genes**
+
+Using all of the methods we have tried to far, we have already found single genes which roughly identify some areas and groupings of areas. For each of these areas, an example of a gene which roughly identifies it is shown in Figure \ref{singleSoFar}. We have not yet cross-verified these genes in other atlases.
+
+In addition, there are a number of areas which are almost identified by single genes: COAa+NLOT (anterior part of cortical amygdalar area, nucleus of the lateral olfactory tract), ENT (entorhinal), ACAv (ventral anterior cingulate), VIS (visual), AUD (auditory).
+
+
+\begin{figure}\label{singleSoFar}
+\includegraphics[scale=.31]{singlegene_example_2682_Pitx2_SS_jet.eps}
+\includegraphics[scale=.31]{singlegene_example_371_Aldh1a2_SSs_jet.eps}
+\includegraphics[scale=.31]{singlegene_example_2759_Ppfibp1_PIR_jet.eps}
+\includegraphics[scale=.31]{singlegene_example_3310_Slco1a5_FRP_jet.eps}
+\includegraphics[scale=.31]{singlegene_example_3709_Tshz2_RSP_jet.eps}
+\includegraphics[scale=.31]{singlegene_example_3674_Trhr_COApm_jet.eps}
+\includegraphics[scale=.31]{singlegene_example_925_Col12a1_ACA+PL+ILA+DP+ORB+MO_jet.eps}
+\includegraphics[scale=.31]{singlegene_example_1334_Ets1_post_lat_vis_jet.eps}
+
+\caption{From left to right and top to bottom, single genes which roughly identify areas SS (somatosensory primary + supplemental), SSs (supplemental somatosensory), PIR (piriform), FRP (frontal pole), RSP (retrosplenial), COApm (Cortical amygdalar, posterior part, medial zone). Grouping some areas together, we have also found genes to identify the groups ACA+PL+ILA+DP+ORB+MO (anterior cingulate, prelimbic, infralimbic, dorsal peduncular, orbital, motor), posterior and lateral visual (VISpm, VISpl, VISI, VISp; posteromedial, posterolateral, lateral, and primary visual). The genes are $Pitx2$, $Aldh1a2$, $Ppfibp1$, $Slco1a5$, $Tshz2$, $Trhr$, $Col12a1$, $Ets1$.}
+\end{figure}
+
+

 === Specific to Aim 2 (and Aim 3) ===

@@ -481,3 +510,8 @@
 %%"genomic anatomy" is a name found in the titles of one of the cited papers which seems good; maybe "computational genomic anatomy"

 %% todo: actually i'm pretty sure AGEA doesn't find ANY areas, but i said "most" and "often" to be cautious.
+
+%% todo: MO is only found on the lateral surface (todo).
+%% todo: predicted genes like Riken
+
+%% todo: should we disclose genes?