cg
diff grant.txt @ 38:82076af297cd
author | bshanks@bshanks.dyndns.org |
---|---|
date | Tue Apr 14 02:23:38 2009 -0700 |
parents | af3389b432e9 |
children | 9365a696c0b8 |
line diff
1.1 --- a/grant.txt Mon Apr 13 23:17:40 2009 -0700
1.2 +++ b/grant.txt Tue Apr 14 02:23:38 2009 -0700
1.3 @@ -183,7 +183,7 @@
1.4 == Preliminary work ==
1.5
1.6 === Format conversion between SEV, MATLAB, NIFTI ===
1.7 -We have created software to (politely) download all of the SEV files from the Allen Institute website. We have also created software to convert between the SEV, MATLAB, and NIFTI file formats, as well as some of Caret's formats.
1.8 +We have created software to (politely) download all of the SEV files from the Allen Institute website. We have also created software to convert between the SEV, MATLAB, and NIFTI file formats, as well as some of Caret's file formats.
1.9
1.10
1.11 === Flatmap of cortex ===
1.12 @@ -200,6 +200,8 @@
1.13 * A 2-D matrix whose entries represent the regional label associated with each surface pixel
1.14 * For each gene, a 2-D matrix whose entries represent the average expression level underneath each surface pixel
1.15
1.16 +We created a normalized version of the gene expression data by subtracting each gene's mean expression level (over all surface pixels) from its values and dividing the result by that gene's standard deviation.
1.17 +
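As an illustration of the normalization described above, here is a minimal sketch in plain Python. It assumes a gene's 2-D expression matrix has been flattened into a list of per-pixel values and that the population standard deviation is used; the text specifies neither, and the name `normalize_gene` is hypothetical.

```python
import statistics


def normalize_gene(expression):
    """Z-score one gene's expression values over all surface pixels.

    `expression` is a flat list of per-pixel values (assumed layout).
    """
    mean = statistics.fmean(expression)
    # Assumption: population standard deviation; the text does not say which.
    std = statistics.pstdev(expression)
    if std == 0:
        # A gene with constant expression carries no information.
        return [0.0 for _ in expression]
    return [(x - mean) / std for x in expression]


# Toy data: mean is 5 and population standard deviation is 2.
gene = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
normalized = normalize_gene(gene)
```

After this step every gene has zero mean and unit variance over the surface pixels, so expression levels are comparable across genes.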
1.18 To move beyond a single average expression level for each surface pixel, we plan to create a separate matrix for each cortical layer to represent the average expression level within that layer. Cortical layers are found at different depths in different parts of the cortex. In preparation for extracting the layer-specific datasets, we have extended Caret with routines that allow the depth of the ROI for volume-to-surface projection to vary.
1.19
1.20 In the Research Plan, we describe how we will automatically locate the layer depths. For validation, we have manually demarcated the depth of the outer boundary of cortical layer 5 throughout the cortex.
1.21 @@ -210,6 +212,32 @@
1.22
1.23
1.24
1.25 +=== Feature selection and scoring methods ===
1.26 +
1.27 +
1.28 +\vspace{0.3cm}**Correlation**
1.29 +Recall that the instances are surface pixels, and consider the problem of attempting to classify each instance as either a member of a particular anatomical area, or not. The target area can be represented as a binary mask over the surface pixels.
1.30 +
1.31 +The features and the target area are both functions on the surface pixels; alternatively, they can be thought of as images that can be displayed on the flatmapped surface. One class of feature selection scoring methods consists of those that calculate some sort of "match" between each gene image and the target image. The genes that match best are good candidates for features.
1.32 +
1.33 +One of the simplest methods in this class is to use correlation as the match score. We calculated the correlation between each gene image and the binary mask of each cortical area.
1.34 +
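The correlation score can be sketched as follows. This is only an illustration: it assumes Pearson correlation (the text says only "correlation"), flattened per-pixel lists, and invented gene names and values; `pearson` and `rank_genes` are hypothetical helpers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of pixel values."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant image matches nothing
    return cov / math.sqrt(vx * vy)


def rank_genes(genes, mask):
    """Score every gene image against the target mask, best match first."""
    scores = {name: pearson(values, mask) for name, values in genes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy target area covering the first two of six surface pixels.
mask = [1, 1, 0, 0, 0, 0]
genes = {
    "gene_a": [0.9, 0.8, 0.1, 0.2, 0.1, 0.0],  # expressed inside the area
    "gene_b": [0.1, 0.2, 0.9, 0.8, 0.7, 0.9],  # expressed outside the area
    "gene_c": [0.5, 0.4, 0.5, 0.6, 0.5, 0.4],  # roughly uniform
}
ranking = rank_genes(genes, mask)
```

Genes at the top of `ranking` are expressed mostly inside the target area. A strongly negative score (such as that of `gene_b` here) marks a gene expressed mostly outside it, which may also be useful as a feature.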
1.35 +todo: fig
1.36 +
1.37 +\vspace{0.3cm}**Conditional entropy**
1.38 +An information-theoretic scoring method is to find features such that, if the features (gene expression levels) are known, uncertainty about the target (the regional identity) is reduced. Entropy measures uncertainty, so what we want is to find features such that the conditional distribution of the target has minimal entropy. The distribution to which we are referring is the probability distribution over the population of surface pixels.
1.39 +
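For reference, the quantity being minimized can be written out as follows. The symbols are our notation rather than the document's: $T$ is the target area's binary mask, $G$ is the set of discretized gene features, and each probability is the fraction of surface pixels taking the given values.

```latex
\[
H(T \mid G) \;=\; -\sum_{g}\sum_{t} p(g,t)\,\log_2 p(t \mid g),
\qquad
p(t \mid g) \;=\; \frac{p(g,t)}{p(g)}
\]
```

Since $0 \le H(T \mid G) \le H(T)$, a value near zero means that the features almost completely determine whether a pixel lies in the target area.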
1.40 +Information theory is simplest to apply to discrete data, so we discretized our gene expression data. For each gene, we created five binary masks of its expression levels over the surface pixels, one for each of these thresholds: the mean of that gene, the mean minus one standard deviation, the mean minus two standard deviations, the mean plus one standard deviation, and the mean plus two standard deviations.
1.41 +
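A minimal sketch of this discretization is shown below. It assumes that a pixel's mask value is 1 when its expression is strictly above the threshold and that the population standard deviation is used; the text fixes neither choice, and `threshold_masks` is a hypothetical name.

```python
import statistics

# Thresholds expressed as offsets from the mean, in standard deviations.
OFFSETS = (-2, -1, 0, 1, 2)


def threshold_masks(expression):
    """Return five binary masks for one gene, keyed by threshold offset."""
    mean = statistics.fmean(expression)
    std = statistics.pstdev(expression)  # assumption: population SD
    masks = {}
    for k in OFFSETS:
        threshold = mean + k * std
        # Assumption: 1 means "expression strictly above the threshold".
        masks[k] = [1 if x > threshold else 0 for x in expression]
    return masks


# Toy data: mean is 5 and standard deviation is 2,
# so the thresholds are 1, 3, 5, 7 and 9.
gene = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
masks = threshold_masks(gene)
```

With this convention the masks are nested: any pixel set at a higher threshold is also set at every lower one.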
1.42 +Then, for each region, we ran a forward stepwise procedure that searched for pairs of gene expression binary masks minimizing the conditional entropy of the target area's binary mask given the pair.
1.43 +
1.44 +This procedure finds the pairs of genes that are most informative about the target area, at least at these discretization thresholds.
1.45 +
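The pair search can be sketched as below. This is one plausible reading of "forward stepwise", not necessarily the procedure actually used: pick the single mask with the lowest conditional entropy, then pick the partner that lowers it most. The function names and the toy masks are hypothetical.

```python
import math
from collections import Counter


def conditional_entropy(target, features):
    """H(target | features) in bits, with probabilities taken over pixels.

    `target` is a binary mask; `features` is a list of binary masks.
    """
    n = len(target)
    # Count each (feature combination, target value) and each combination.
    joint = Counter(zip(zip(*features), target))
    marginal = Counter(zip(*features))
    h = 0.0
    for (combo, _value), count in joint.items():
        h -= (count / n) * math.log2(count / marginal[combo])
    return h


def forward_stepwise_pair(target, masks):
    """Greedily choose the best single mask, then its best partner."""
    first = min(masks, key=lambda m: conditional_entropy(target, [masks[m]]))
    rest = [m for m in masks if m != first]
    second = min(
        rest,
        key=lambda m: conditional_entropy(target, [masks[first], masks[m]]),
    )
    best = conditional_entropy(target, [masks[first], masks[second]])
    return (first, second), best


# Toy example: the target area is exactly where masks "a" and "b" overlap.
target = [1, 1, 0, 0, 0, 0, 0, 0]
masks = {
    "a": [1, 1, 1, 1, 0, 0, 0, 0],
    "b": [1, 1, 0, 0, 1, 1, 0, 0],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
pair, entropy = forward_stepwise_pair(target, masks)
```

Because the search is greedy, it evaluates far fewer combinations than an exhaustive search over all pairs, at the cost of possibly missing a pair whose members are individually uninformative.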
1.46 +todo: fig
1.47 +
1.48 +\vspace{0.3cm}**Gradient similarity**
1.49 +todo
1.50 +
1.51
1.52
1.53
1.54 @@ -227,14 +255,7 @@
1.55 \caption{Upper left: $wwc1$. Upper right: $mtif2$. Lower left: wwc1 + mtif2 (each pixel's value on the lower left is the sum of the corresponding pixels in the upper row). Within each picture, the vertical axis roughly corresponds to anterior at the top and posterior at the bottom, and the horizontal axis roughly corresponds to medial at the left and lateral at the right. The red outline is the boundary of region MO. Pixels are colored approximately according to the density of expressing cells underneath each pixel, with red meaning a lot of expression and blue meaning little.}
1.56 \end{figure}
1.57
1.58 -\vspace{0.3cm}**Correlation**
1.59 -todo
1.60 -
1.61 -\vspace{0.3cm}**Conditional entropy**
1.62 -todo
1.63 -
1.64 -\vspace{0.3cm}**Gradient similarity**
1.65 -todo
1.66 +
1.67
1.68 \vspace{0.3cm}**Geometric and pointwise scoring methods provide complementary information**
1.69