cg

changeset 38:82076af297cd
.
author: bshanks@bshanks.dyndns.org
date: Tue Apr 14 02:23:38 2009 -0700
parents: af3389b432e9
children: 9365a696c0b8
files: grant.html grant.odt grant.pdf grant.txt
--- a/grant.html	Mon Apr 13 23:17:40 2009 -0700
+++ b/grant.html	Tue Apr 14 02:23:38 2009 -0700
@@ -251,7 +251,7 @@
-software to convert between the SEV, MATLAB, and NIFTI file formats, as well as some of Caret&#8217;s formats.
+software to convert between the SEV, MATLAB, and NIFTI file formats, as well as some of Caret&#8217;s file formats.
@@ -267,12 +267,45 @@
+We created a normalized version of the gene expression data by subtracting each gene&#8217;s mean expression level (over all
+surface pixels) and dividing the result by that gene&#8217;s standard deviation.
+Feature selection and scoring methods
+Correlation Recall that the instances are surface pixels, and consider the problem of attempting to classify each instance
+as either a member of a particular anatomical area, or not.  The target area can be represented as a binary mask over the
+surface pixels.
+The features and the target area are both functions on the surface pixels; alternatively, they can be thought of as images
+which can be displayed on the flatmapped surface. One class of feature selection scoring methods consists of those which calculate
+some sort of &#8220;match&#8221; between each gene image and the target image. Those genes which match the best are good candidates
+for features.
+One of the simplest methods in this class is to use correlation as the match score. We calculated the correlation between
+each gene and each cortical area.
+Conditional entropy An information-theoretic scoring method is to find features such that,  if the features (gene
+expression levels) are known, uncertainty about the target (the regional identity) is reduced. Entropy measures uncertainty,
+so what we want is to find features such that the conditional distribution of the target has minimal entropy. The distribution
+to which we are referring is the probability distribution over the population of surface pixels.
+The simplest way to use information theory is on discrete data, so we discretized our gene expression data by creating,
+for each gene, five thresholded binary masks of its expression levels over pixels, one for each of these thresholds:
+the mean of that gene, the mean minus one standard deviation, the mean minus two standard deviations, the mean plus
+one standard deviation, and the mean plus two standard deviations.
+Now, for each region, we ran a forward stepwise procedure which attempted to find pairs of gene expression binary masks
+such that the conditional entropy of the target area&#8217;s binary mask, conditioned upon the pair of gene expression binary
+masks, is minimized.
+This finds pairs of genes which are most informative, at least at these discretization thresholds.
+Gradient similarity todo
+
+                   
+
+Figure 1: Upper left: wwc1. Upper right: mtif2. Lower left: wwc1 + mtif2 (each pixel&#8217;s value on the lower left is the sum
+of the corresponding pixels in the upper row). Within each picture, the vertical axis roughly corresponds to anterior at the
+top and posterior at the bottom, and the horizontal axis roughly corresponds to medial at the left and lateral at the right.
+The red outline is the boundary of region MO. Pixels are colored approximately according to the density of expressing cells
+underneath each pixel, with red meaning a lot of expression and blue meaning little.
@@ -280,19 +313,26 @@
-Gnee mtif210 is shown in figure the upper-right of Fig. . Mtif2 captures MO&#8217;s upper-left boundary, but not its lower-right
+Gene mtif2 is shown in the upper-right of Fig. . Mtif2 captures MO&#8217;s upper-left boundary, but not its lower-right
-Correlation todo
-Conditional entropy todo
-Gradient similarity todo
-_________________________________________
+whose salient expression border seems to partially line up with the border of AUD; its weakness is that this includes genes
+which don&#8217;t express over the entire area. Genes which have high rankings using both pointwise and border criteria, such as
+Aph1a in the example, may be particularly good markers. None of these genes are, individually, a perfect marker for AUD;
+we deliberately chose a &#8220;difficult&#8221; area in order to better contrast pointwise with geometric methods.
+Areas which can be identified by single genes
+todo
+Areas can sometimes be marked by underexpression
+todo
+Specific to Aim 1 (and Aim 3)
+Forward stepwise logistic regression todo
+__
@@ -301,28 +341,11 @@
-                   
-
-Figure 1: Upper left: wwc1. Upper right: mtif2. Lower left: wwc1 + mtif2 (each pixel&#8217;s value on the lower left is the sum
-of the corresponding pixels in the upper row). Within each picture, the vertical axis roughly corresponds to anterior at the
-top and posterior at the bottom, and the horizontal axis roughly corresponds to medial at the left and lateral at the right.
-The red outline is the boundary of region MO. Pixels are colored approximately according to the density of expressing cells
-underneath each pixel, with red meaning a lot of expression and blue meaning little.
-whose salient expression border seems to partially line up with the border of AUD; its weakness is that this includes genes
-which don&#8217;t express over the entire area. Genes which have high rankings using both pointwise and border criteria, such as
-Aph1a in the example, may be particularly good markers. None of these genes are, individually, a perfect marker for AUD;
-we deliberately chose a &#8220;difficult&#8221; area in order to better contrast pointwise with geometric methods.
-Areas which can be identified by single genes
-todo
-Areas can sometimes be marked by underexpression
-todo
-Specific to Aim 1 (and Aim 3)
-Forward stepwise logistic regression todo
--- a/grant.txt	Mon Apr 13 23:17:40 2009 -0700
+++ b/grant.txt	Tue Apr 14 02:23:38 2009 -0700
@@ -183,7 +183,7 @@
-We have created software to (politely) download all of the SEV files from the Allen Institute website. We have also created software to convert between the SEV, MATLAB, and NIFTI file formats, as well as some of Caret's formats.
+We have created software to (politely) download all of the SEV files from the Allen Institute website. We have also created software to convert between the SEV, MATLAB, and NIFTI file formats, as well as some of Caret's file formats.
@@ -200,6 +200,8 @@
+We created a normalized version of the gene expression data by subtracting each gene's mean expression level (over all surface pixels) and dividing the result by that gene's standard deviation.
+
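The normalization described above is a per-gene z-score. A minimal sketch, assuming each gene's expression is held as a flat list with one value per surface pixel; the function name, the sample values, and the use of the population standard deviation are illustrative assumptions, not details taken from the grant text:

```python
from statistics import fmean, pstdev


def normalize_gene(expression):
    """Z-score one gene's expression values over all surface pixels."""
    mean = fmean(expression)
    # Population standard deviation is an assumption; the text does not say
    # whether the population or sample form was used.
    std = pstdev(expression)
    if std == 0:
        raise ValueError("gene has constant expression; cannot normalize")
    return [(x - mean) / std for x in expression]


# Made-up expression values for a single gene over five pixels.
gene = [1.0, 2.0, 3.0, 4.0, 5.0]
z = normalize_gene(gene)
```

After this step every gene has zero mean and unit variance over the pixels, so scores computed for different genes are on a comparable scale.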
@@ -210,6 +212,32 @@
+=== Feature selection and scoring methods ===
+
+
+\vspace{0.3cm}**Correlation**
+Recall that the instances are surface pixels, and consider the problem of attempting to classify each instance as either a member of a particular anatomical area, or not. The target area can be represented as a binary mask over the surface pixels. 
+
+The features and the target area are both functions on the surface pixels; alternatively, they can be thought of as images which can be displayed on the flatmapped surface. One class of feature selection scoring methods consists of those which calculate some sort of "match" between each gene image and the target image. Those genes which match the best are good candidates for features.
+
+One of the simplest methods in this class is to use correlation as the match score. We calculated the correlation between each gene and each cortical area.
+
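Correlation as a match score can be sketched as follows, assuming the Pearson correlation is meant and that each gene image and the target mask are flattened into lists over the same pixels. The gene names, the values, and the choice to rank by signed correlation are hypothetical; ranking by absolute value would also surface genes that mark an area by underexpression.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant image carries no information about the target
    return cov / sqrt(vx * vy)


def rank_genes(genes, mask):
    """Return (gene, score) pairs sorted from best to worst match."""
    scores = {name: pearson(values, mask) for name, values in genes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Binary mask of a target area over four pixels, and three made-up gene images.
mask = [1, 1, 0, 0]
genes = {
    "geneA": [0.9, 0.8, 0.1, 0.2],  # expressed inside the area
    "geneB": [0.1, 0.9, 0.2, 0.8],  # unrelated to the area
    "geneC": [0.0, 0.1, 0.9, 1.0],  # expressed outside the area
}
ranking = rank_genes(genes, mask)
```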
+todo: fig
+
+\vspace{0.3cm}**Conditional entropy**
+An information-theoretic scoring method is to find features such that, if the features (gene expression levels) are known, uncertainty about the target (the regional identity) is reduced. Entropy measures uncertainty, so what we want is to find features such that the conditional distribution of the target has minimal entropy. The distribution to which we are referring is the probability distribution over the population of surface pixels.
+
+The simplest way to use information theory is on discrete data, so we discretized our gene expression data by creating, for each gene, five thresholded binary masks of its expression levels over pixels, one for each of these thresholds: the mean of that gene, the mean minus one standard deviation, the mean minus two standard deviations, the mean plus one standard deviation, and the mean plus two standard deviations.
+
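The five-threshold discretization can be sketched like this. The direction of the threshold (a pixel is 1 when its expression is strictly above the threshold), the function name, and the sample values are assumptions made for illustration; the text only names the five thresholds.

```python
from statistics import fmean, pstdev

# Thresholds expressed as offsets from the mean, in standard deviations.
OFFSETS = (-2, -1, 0, 1, 2)


def threshold_masks(expression):
    """Return one binary mask per threshold, keyed by its offset."""
    mean = fmean(expression)
    std = pstdev(expression)  # population standard deviation (an assumption)
    return {
        k: [1 if x > mean + k * std else 0 for x in expression]
        for k in OFFSETS
    }


# Made-up expression values for a single gene over five pixels.
gene = [1.0, 2.0, 3.0, 4.0, 5.0]
masks = threshold_masks(gene)
```

Low thresholds give masks that are mostly 1 and high thresholds give masks that are mostly 0, so the five masks together record roughly how strongly each pixel expresses the gene.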
+Now, for each region, we ran a forward stepwise procedure which attempted to find pairs of gene expression binary masks such that the conditional entropy of the target area's binary mask, conditioned upon the pair of gene expression binary masks, is minimized.
+
+This finds pairs of genes which are most informative, at least at these discretization thresholds.
+
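One way to read the procedure above is sketched below: estimate the conditional entropy of the target mask given a pair of gene masks from pixel counts, pick the single mask that minimizes it, then pick the partner that minimizes it jointly. The two-step greedy search, the function names, and the toy masks are assumptions; the actual stepwise procedure used for the grant may differ.

```python
from collections import Counter
from math import log2


def conditional_entropy(target, mask_a, mask_b):
    """H(target | mask_a, mask_b) in bits, estimated from pixel counts."""
    n = len(target)
    joint = Counter(zip(mask_a, mask_b, target))
    given = Counter(zip(mask_a, mask_b))
    h = 0.0
    for (a, b, _t), count in joint.items():
        # p(a, b, t) * log2 p(t | a, b)
        h -= (count / n) * log2(count / given[(a, b)])
    return h


def forward_stepwise_pair(target, masks):
    """Greedily choose two masks that minimize the conditional entropy."""
    # Conditioning on (m, m) is the same as conditioning on m alone.
    first = min(masks, key=lambda name: conditional_entropy(
        target, masks[name], masks[name]))
    rest = [name for name in masks if name != first]
    second = min(rest, key=lambda name: conditional_entropy(
        target, masks[first], masks[name]))
    return first, second, conditional_entropy(
        target, masks[first], masks[second])


# Toy data over six pixels: the target is exactly "g1 AND g2".
target = [1, 1, 0, 0, 0, 0]
masks = {
    "g1": [1, 1, 1, 0, 0, 0],
    "g2": [1, 1, 0, 1, 1, 0],
    "g3": [0, 1, 0, 1, 0, 1],
}
first, second, h = forward_stepwise_pair(target, masks)
```

A conditional entropy of zero means the pair of masks determines the target mask exactly on these pixels; larger values mean more residual uncertainty about regional identity.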
+todo: fig
+
+\vspace{0.3cm}**Gradient similarity**
+todo
+
@@ -227,14 +255,7 @@
-\vspace{0.3cm}**Correlation**
-todo
-
-\vspace{0.3cm}**Conditional entropy**
-todo
-
-\vspace{0.3cm}**Gradient similarity**
-todo
+