diff grant.txt @ 38:82076af297cd
.
author: bshanks@bshanks.dyndns.org
date: Tue Apr 14 02:23:38 2009 -0700 (16 years ago)
parents: af3389b432e9
children: 9365a696c0b8
--- a/grant.txt	Mon Apr 13 23:17:40 2009 -0700
+++ b/grant.txt	Tue Apr 14 02:23:38 2009 -0700
@@ -183,7 +183,7 @@
-We have created software to (politely) download all of the SEV files from the Allen Institute website. We have also created software to convert between the SEV, MATLAB, and NIFTI file formats, as well as some of Caret's formats.
+We have created software to (politely) download all of the SEV files from the Allen Institute website. We have also created software to convert between the SEV, MATLAB, and NIFTI file formats, as well as some of Caret's file formats.
@@ -200,6 +200,8 @@
+We created a normalized version of the gene expression data by subtracting each gene's mean expression level (over all surface pixels) and dividing by that gene's standard deviation.
+
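The normalization described in the added line above is a per-gene z-score. A minimal sketch follows, assuming expression levels are held as one list of values per gene (one value per surface pixel). The names `normalize_gene` and `genes` and the toy data are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def normalize_gene(expression):
    """Z-score one gene's expression values over all surface pixels."""
    mu = mean(expression)
    sigma = pstdev(expression)  # assumption: population standard deviation
    if sigma == 0:
        # A gene with constant expression has no spatial pattern to scale.
        return [0.0 for _ in expression]
    return [(x - mu) / sigma for x in expression]


# Hypothetical toy data: gene name -> expression level at each surface pixel.
genes = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [10.0, 10.0, 30.0, 30.0],
}
normalized = {name: normalize_gene(values) for name, values in genes.items()}
```

After this step every gene has zero mean and unit standard deviation over the surface, so scores computed from different genes are on a comparable scale.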
@@ -210,6 +212,32 @@
+=== Feature selection and scoring methods ===
+
+
+\vspace{0.3cm}**Correlation**
+Recall that the instances are surface pixels, and consider the problem of attempting to classify each instance as either a member of a particular anatomical area, or not. The target area can be represented as a binary mask over the surface pixels. 
+
+The features and the target area are both functions on the surface pixels; alternatively, they can be thought of as images that can be displayed on the flatmapped surface. One class of feature selection scoring methods calculates some sort of "match" between each gene image and the target image. The genes that match best are good candidates for features.
+
+One of the simplest methods in this class is to use correlation as the match score. We calculated the correlation between each gene and each cortical area.
+
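The correlation score described above can be sketched as follows. This assumes Pearson correlation between a gene's expression values and the 0/1 target mask; the text says only "correlation", so the choice of Pearson is an assumption, and `pearson`, `rank_genes`, and the toy data are hypothetical names for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # Correlation is undefined for a constant image; score it as 0 here.
        return 0.0
    return cov / sqrt(vx * vy)


def rank_genes(genes, target_mask):
    """Return (gene, score) pairs sorted from best to worst match."""
    scores = {name: pearson(values, target_mask) for name, values in genes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: four surface pixels, the first two inside the area.
target_mask = [1, 1, 0, 0]
genes = {
    "in_area": [5.0, 6.0, 1.0, 1.0],
    "flat": [2.0, 2.0, 2.0, 2.0],
    "anti": [0.0, 0.0, 3.0, 3.0],
}
ranked = rank_genes(genes, target_mask)
```

Genes expressed mainly inside the target area rank first; genes expressed mainly outside it receive negative scores and rank last.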
+todo: fig
+
+\vspace{0.3cm}**Conditional entropy**
+An information-theoretic scoring method is to find features such that, if the features (gene expression levels) are known, uncertainty about the target (the regional identity) is reduced. Entropy measures uncertainty, so what we want is to find features such that the conditional distribution of the target has minimal entropy. The distribution to which we are referring is the probability distribution over the population of surface pixels.
+
+Information theory is simplest to apply to discrete data, so we discretized our gene expression data by creating five thresholded binary masks for each gene. The five thresholds were: the gene's mean expression level, the mean minus one standard deviation, the mean minus two standard deviations, the mean plus one standard deviation, and the mean plus two standard deviations.
+
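The discretization step can be sketched as below. The text does not say which side of each threshold maps to 1, so this sketch assumes a mask pixel is 1 where expression is at or above the threshold. `threshold_masks` and `OFFSETS` are hypothetical names, and the population standard deviation is again an assumption.

```python
from statistics import mean, pstdev

# Thresholds expressed as offsets from the mean, in standard deviations.
OFFSETS = (-2, -1, 0, 1, 2)


def threshold_masks(expression):
    """Return five binary masks for one gene, keyed by threshold offset."""
    mu = mean(expression)
    sigma = pstdev(expression)  # assumption: population standard deviation
    return {
        k: [1 if x >= mu + k * sigma else 0 for x in expression]
        for k in OFFSETS
    }


# Hypothetical toy gene over five surface pixels.
masks = threshold_masks([0.0, 1.0, 2.0, 3.0, 4.0])
```

Each gene therefore contributes five candidate binary features, one per threshold, to the search over masks.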
+Then, for each region, we ran a forward stepwise procedure to find pairs of gene expression binary masks that minimize the conditional entropy of the target area's binary mask, conditioned upon the pair.
+
+This finds pairs of genes which are most informative, at least at these discretization thresholds.
+
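One possible reading of the forward stepwise search is sketched below: pick the single mask that minimizes the conditional entropy of the target, then pick the second mask that minimizes it further given the first. This greedy two-step form is an assumption about the procedure, not a description of the software actually used, and `conditional_entropy`, `forward_stepwise_pair`, and the mask names are hypothetical.

```python
from collections import Counter
from math import log2


def conditional_entropy(target, *features):
    """H(target | features) in bits, estimated from counts over pixels."""
    n = len(target)
    joint = Counter(zip(target, *features))  # counts of (t, f1, f2, ...)
    # Counts of (f1, f2, ...); with no features, everything shares one cell.
    cond = Counter(zip(*features)) if features else Counter({(): n})
    h = 0.0
    for key, count in joint.items():
        # -sum over cells of p(t, f) * log2 p(t | f)
        h -= (count / n) * log2(count / cond[key[1:]])
    return h


def forward_stepwise_pair(target, masks):
    """Greedily choose two masks that minimize H(target | mask1, mask2)."""
    first = min(masks, key=lambda m: conditional_entropy(target, masks[m]))
    rest = [m for m in masks if m != first]
    second = min(
        rest,
        key=lambda m: conditional_entropy(target, masks[first], masks[m]),
    )
    return (first, second), conditional_entropy(target, masks[first], masks[second])


# Hypothetical toy data: the target is 1 only where both masks a and b are 1.
target = [1, 0, 0, 0]
masks = {
    "geneA@mean": [1, 1, 0, 0],
    "geneB@mean": [1, 0, 1, 0],
    "geneC@mean": [0, 0, 0, 0],
}
pair, entropy = forward_stepwise_pair(target, masks)
```

In the toy data neither mask alone determines the target, but the pair does, so the remaining conditional entropy is zero. A greedy search of this kind can miss pairs whose members are each uninformative alone; an exhaustive search over all pairs would avoid that at a higher cost.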
+todo: fig
+
+\vspace{0.3cm}**Gradient similarity**
+todo
+
@@ -227,14 +255,7 @@
-\vspace{0.3cm}**Correlation**
-todo
-
-\vspace{0.3cm}**Conditional entropy**
-todo
-
-\vspace{0.3cm}**Gradient similarity**
-todo
+