cg
changeset 39:9365a696c0b8
author | bshanks@bshanks.dyndns.org |
---|---|
date | Tue Apr 14 02:31:37 2009 -0700 |
parents | 82076af297cd |
children | cb2ac88dd526 |
files | grant.doc grant.html grant.odt grant.pdf grant.txt |
line diff
1.1 Binary file grant.doc has changed
2.1 --- a/grant.html Tue Apr 14 02:23:38 2009 -0700
2.2 +++ b/grant.html Tue Apr 14 02:31:37 2009 -0700
2.3 @@ -285,6 +285,7 @@
2.4 for features.
2.5 One of the simplest methods in this class is to use correlation as the match score. We calculated the correlation between
2.6 each gene and each cortical area.
2.7 +todo: fig
2.8 Conditional entropy An information-theoretic scoring method is to find features such that, if the features (gene
2.9 expression levels) are known, uncertainty about the target (the regional identity) is reduced. Entropy measures uncertainty,
2.10 so what we want is to find features such that the conditional distribution of the target has minimal entropy. The distribution
2.11 @@ -293,11 +294,11 @@
2.12 for each gene, five thresholded binary masks of the gene data. For each gene, we created a binary mask of its expression
2.13 levels over pixels using each of these thresholds: the mean of that gene, the mean minus one standard deviation, the mean
2.14 minus two standard deviations, the mean plus one standard deviation, the mean plus two standard deviations.
2.15 -Now, for each region, we ran a forward stepwise procedure which attempted to find pairs of gene expression binary masks
2.16 -such that the conditional entropy of the target area’s binary mask, conditioned upon the pair of gene expression binary
2.17 -masks, is minimized.
2.18 -This finds pairs of genes which are most informative, at least at these discretization thresholds.
2.19 -Gradient similarity todo
2.20 +Now, for each region, we created and ran a forward stepwise procedure which attempted to find pairs of gene expression
2.21 +binary masks such that the conditional entropy of the target area’s binary mask, conditioned upon the pair of gene expression
2.22 +binary masks, is minimized.
2.23 +This finds pairs of genes which are most informative (at least at these discretization thresholds) relative to the question,
2.24 +“Is this surface pixel a member of the target area?”.
2.25
2.26
2.27
2.28 @@ -306,6 +307,13 @@
2.29 top and posterior at the bottom, and the horizontal axis roughly corresponds to medial at the left and lateral at the right.
2.30 The red outline is the boundary of region MO. Pixels are colored approximately according to the density of expressing cells
2.31 underneath each pixel, with red meaning a lot of expression and blue meaning little.
2.32 +todo: fig
2.33 +Gradient similarity We noticed that the previous two scoring methods, which are pointwise, often found genes whose
2.34 +pattern of expression did not look similar in shape to the target region. Fort his reason we designed a non-pointwise local
2.35 +scoring method to detect when a gene had a pattern of expression which looked like it had a boundary whose shape is similar
2.36 +to the shape of the target region.
2.37 +had shape of the pattern of expression did not seem to match the shape of the target area.
2.38 +todo
2.39 Using combinations of multiple genes is necessary and sufficient to delineate some cortical areas
2.40 Here we give an example of a cortical area which is not marked by any single gene, but which can be identified combi-
2.41 natorially. according to logistic regression, gene wwc19 is the best fit single gene for predicting whether or not a pixel on
2.42 @@ -327,12 +335,7 @@
2.43 Aph1a in the example, may be particularly good markers. None of these genes are, individually, a perfect marker for AUD;
2.44 we deliberately chose a “difficult” area in order to better contrast pointwise with geometric methods.
2.45 Areas which can be identified by single genes
2.46 -todo
2.47 -Areas can sometimes be marked by underexpression
2.48 -todo
2.49 -Specific to Aim 1 (and Aim 3)
2.50 -Forward stepwise logistic regression todo
2.51 -__
2.52 +_________________________________________
2.53 9“WW, C2 and coiled-coil domain containing 1”; EntrezGene ID 211652
2.54 10“mitochondrial translational initiation factor 2”; EntrezGene ID 76784
2.55 11For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor
2.56 @@ -346,6 +349,11 @@
2.57 Figure 2: The top row shows the three genes which (individually) best predict area AUD, according to logistic regression.
2.58 The bottom row shows the three genes which (individually) best match area AUD, according to gradient similarity. From
2.59 left to right and top to bottom, the genes are Ssr1, Efcbp1, Aph1a, Ptk7, Aph1a again, and Lepr
2.60 +todo
2.61 +Areas can sometimes be marked by underexpression
2.62 +todo
2.63 +Specific to Aim 1 (and Aim 3)
2.64 +Forward stepwise logistic regression todo
2.65 SVM on all genes at once
2.66 In order to see how well one can do when looking at all genes at once, we ran a support vector machine to classify cortical
2.67 surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%13. As noted above,
3.1 Binary file grant.odt has changed
4.1 Binary file grant.pdf has changed
5.1 --- a/grant.txt Tue Apr 14 02:23:38 2009 -0700
5.2 +++ b/grant.txt Tue Apr 14 02:31:37 2009 -0700
5.3 @@ -229,13 +229,19 @@
5.4
5.5 The simplest way to use information theory is on discrete data, so we discretized our gene expression data by creating, for each gene, five thresholded binary masks of the gene data. For each gene, we created a binary mask of its expression levels over pixels using each of these thresholds: the mean of that gene, the mean minus one standard deviation, the mean minus two standard deviations, the mean plus one standard deviation, the mean plus two standard deviations.
5.6
5.7 -Now, for each region, we ran a forward stepwise procedure which attempted to find pairs of gene expression binary masks such that the conditional entropy of the target area's binary mask, conditioned upon the pair of gene expression binary masks, is minimized.
5.8 -
5.9 -This finds pairs of genes which are most informative, at least at these discretization thresholds.
5.10 +Now, for each region, we created and ran a forward stepwise procedure which attempted to find pairs of gene expression binary masks such that the conditional entropy of the target area's binary mask, conditioned upon the pair of gene expression binary masks, is minimized.
5.11 +
5.12 +This finds pairs of genes which are most informative (at least at these discretization thresholds) relative to the question, "Is this surface pixel a member of the target area?".
5.13
5.14 todo: fig
5.15
5.16 \vspace{0.3cm}**Gradient similarity**
5.17 +We noticed that the previous two scoring methods, which are pointwise, often found genes whose pattern of expression did not look similar in shape to the target region. Fort his reason we designed a non-pointwise local scoring method to detect when a gene had a pattern of expression which looked like it had a boundary whose shape is similar to the shape of the target region.
5.18 +
5.19 +
5.20 +
5.21 +had shape of the pattern of expression did not seem to match the shape of the target area.
5.22 +
5.23 todo
5.24
5.25
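
Illustration of the conditional-entropy scoring described in the hunks above. This is a minimal sketch, not the code used for the grant. It assumes that a mask marks pixels whose expression is strictly above the threshold, that the standard deviation is the population standard deviation, and that "forward stepwise" means a greedy search: pick the single mask with the lowest conditional entropy, then the partner mask that lowers it most. The function names `threshold_masks`, `conditional_entropy`, and `forward_stepwise_pair` are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def threshold_masks(expr):
    """Return five binary masks of one gene's per-pixel expression values.

    Thresholds are mean + k * std for k in (-2, -1, 0, 1, 2).
    Assumption: a pixel is "on" when its value is strictly above the threshold.
    """
    mu, sd = mean(expr), pstdev(expr)
    return [[x > mu + k * sd for x in expr] for k in (-2, -1, 0, 1, 2)]


def conditional_entropy(target, features):
    """H(target | features) in bits.

    target   -- list of booleans, one per pixel (is the pixel in the area?)
    features -- list of boolean masks, each the same length as target
    """
    n = len(target)
    states = list(zip(*features))           # joint feature state of each pixel
    state_counts = Counter(states)
    joint_counts = Counter(zip(states, target))
    h = 0.0
    for (state, _t), count in joint_counts.items():
        p_joint = count / n                  # p(features, target)
        p_cond = count / state_counts[state]  # p(target | features)
        h -= p_joint * math.log2(p_cond)
    return h


def forward_stepwise_pair(target, masks):
    """Greedy two-step search for a pair of masks with low H(target | pair).

    Assumption: greedy selection, not an exhaustive search over all pairs.
    Returns the indices (first, second) into masks.
    """
    first = min(range(len(masks)),
                key=lambda i: conditional_entropy(target, [masks[i]]))
    second = min((j for j in range(len(masks)) if j != first),
                 key=lambda j: conditional_entropy(target, [masks[first], masks[j]]))
    return first, second
```

In use, `masks` would hold the five thresholded masks of every gene concatenated into one list, and `target` would be the binary mask of one cortical area. A conditional entropy of 0 bits means the pair of masks fully answers the question "Is this surface pixel a member of the target area?" at these thresholds.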