cg

diff grant.html @ 70:5cdbbf86e10b
.
author: bshanks@bshanks.dyndns.org
date: Mon Apr 20 16:23:22 2009 -0700
parents: 60d7c1c1b94f
children: 48dae6cb2c09
--- a/grant.html	Mon Apr 20 15:08:40 2009 -0700
+++ b/grant.html	Mon Apr 20 16:23:22 2009 -0700
@@ -318,101 +318,146 @@
-
-                                  
-           Figure 1: Gene Pitx2 is selectively underexpressed in area SS (somatosensory).
-We downloaded the ABA data and applied a mask to select only those voxels which belong to cerebral cortex. We divided
-the cortex into hemispheres.
-Using Caret[5], we created a mesh representation of the surface of the selected voxels.  For each gene, for each node of
-the mesh, we calculated an average of the gene expression of the voxels &#8220;underneath&#8221; that mesh node.  We then flattened
-the cortex, creating a two-dimensional mesh.
-We sampled the nodes of the irregular, flat mesh in order to create a regular grid of pixel values. We converted this grid
-into a MATLAB matrix.
-We manually traced the boundaries of each of 49 cortical areas from the ABA coronal reference atlas slides.  We then
-converted these manual traces into Caret-format regional boundary data on the mesh surface.  We projected the regions
-onto the 2-d mesh, and then onto the grid, and then we converted the region data into MATLAB format.
-At this point, the data is in the form of a number of 2-D matrices, all in registration, with the matrix entries representing
-a grid of points (pixels) over the cortical surface:
+
+Figure 1: Gene Pitx2 is selectively underexpressed in area SS (somatosensory).
+                  We downloaded the ABA data and applied a mask to select only those voxels which belong to
+                cerebral cortex. We divided the cortex into hemispheres.
+                  Using Caret[5], we created a mesh representation of the surface of the selected voxels.  For
+                each gene, for each node of the mesh, we calculated an average of the gene expression of the
+                voxels &#8220;underneath&#8221; that mesh node.  We then flattened the cortex, creating a two-dimensional
+                mesh.
+                  We sampled the nodes of the irregular, flat mesh in order to create a regular grid of pixel
+                values. We converted this grid into a MATLAB matrix.
+                  We manually traced the boundaries of each of 49 cortical areas from the ABA coronal reference
+                atlas slides.  We then converted these manual traces into Caret-format regional boundary data
+                on the mesh surface.  We projected the regions onto the 2-d mesh, and then onto the grid, and
+                then we converted the region data into MATLAB format.
+                  At this point, the data is in the form of a number of 2-D matrices, all in registration, with
+                the matrix entries representing a grid of points (pixels) over the cortical surface:
-We created a normalized version of the gene expression data by subtracting each gene&#8217;s mean expression level (over all
-surface pixels) and dividing each gene by its standard deviation.
-The features and the target area are both functions on the surface pixels.  They can be referred to as scalar fields over
-the space of surface pixels; alternately, they can be thought of as images which can be displayed on the flatmapped surface.
-To move beyond a single average expression level for each surface pixel, we plan to create a separate matrix for each
-cortical layer to represent the average expression level within that layer.  Cortical layers are found at different depths in
-different parts of the cortex. In preparation for extracting the layer-specific datasets, we have extended Caret with routines
-that allow the depth of the ROI for volume-to-surface projection to vary.
-In the Research Plan, we describe how we will automatically locate the layer depths. For validation, we have manually
-demarcated the depth of the outer boundary of cortical layer 5 throughout the cortex.
-Feature selection and scoring methods
-Underexpression of a gene can serve as a marker Underexpression of a gene can sometimes serve as a marker. See,
-for example, Figure 1.
-Correlation Recall that the instances are surface pixels, and consider the problem of attempting to classify each instance
-as either a member of a particular anatomical area, or not. The target area can be represented as a boolean mask over the
-surface pixels.
-One class of feature selection scoring method are those which calculate some sort of &#8220;match&#8221; between each gene image
-and the target image. Those genes which match the best are good candidates for features.
-One of the simplest methods in this class is to use correlation as the match score. We calculated the correlation between
-each gene and each cortical area. The top row of Figure 2 shows the three genes most correlated with area SS.
-
-                                                        
-                                                        
-Figure 2:  Top row:  Genes Nfic, A930001M12Rik, C130038G02Rik are the most correlated with area SS (somatosensory
-cortex).  Bottom row: Genes C130038G02Rik, Cacna1i, Car10 are those with the best fit using logistic regression.  Within
-each picture, the vertical axis roughly corresponds to anterior at the top and posterior at the bottom, and the horizontal
-axis roughly corresponds to medial at the left and lateral at the right. The red outline is the boundary of region MO. Pixels
-are colored according to correlation, with red meaning high correlation and blue meaning low.
-Conditional entropy An information-theoretic scoring method is to find features such that,  if the features (gene
-expression levels) are known, uncertainty about the target (the regional identity) is reduced. Entropy measures uncertainty,
-so what we want is to find features such that the conditional distribution of the target has minimal entropy. The distribution
-to which we are referring is the probability distribution over the population of surface pixels.
-The simplest way to use information theory is on discrete data, so we discretized our gene expression data by creating,
-for each gene, five thresholded boolean masks of the gene data. For each gene, we created a boolean mask of its expression
-levels using each of these thresholds: the mean of that gene, the mean minus one standard deviation, the mean minus two
-standard deviations, the mean plus one standard deviation, the mean plus two standard deviations.
-Now, for each region, we created and ran a forward stepwise procedure which attempted to find pairs of gene expression
-boolean masks such that the conditional entropy of the target area&#8217;s boolean mask, conditioned upon the pair of gene
-expression boolean masks, is minimized.
-This finds pairs of genes which are most informative (at least at these discretization thresholds) relative to the question,
-&#8220;Is this surface pixel a member of the target area?&#8221;. Its advantage over linear methods such as logistic regression is that it
-takes account of arbitrarily nonlinear relationships; for example, if the XOR of two variables predicts the target, conditional
-entropy would notice, whereas linear methods would not.
-Gradient similarity We noticed that the previous two scoring methods, which are pointwise, often found genes whose
-pattern of expression did not look similar in shape to the target region.  For this reason we designed a non-pointwise local
-scoring method to detect when a gene had a pattern of expression which looked like it had a boundary whose shape is similar
-to the shape of the target region. We call this scoring method &#8220;gradient similarity&#8221;.
-One might say that gradient similarity attempts to measure how much the border of the area of gene expression and
-the border of the target region overlap.  However, since gene expression falls off continuously rather than jumping from its
-maximum value to zero, the spatial pattern of a gene&#8217;s expression often does not have a discrete border. Therefore, instead
-of looking for a discrete border, we look for large gradients.  Gradient similarity is a symmetric function over two images
-(i.e. two scalar fields). It is is high to the extent that matching pixels which have large values and large gradients also have
-gradients which are oriented in a similar direction. The formula is:
-                &#x2211;
-             pixel<img src="cmsy7-32.png" alt="&#x2208;" />pixels cos(abs(&#x2220;&#x2207;1 -&#x2220;&#x2207;2)) &#x22C5;|&#x2207;1| + |&#x2207;2| 
+              
+              
+Figure 2: Top row: Genes Nfic and A930001M12Rik are the most correlated with area SS (somatosensory cortex).
+Bottom row: Genes C130038G02Rik and Cacna1i are those with the best fit using logistic regression. Within
+each picture, the vertical axis roughly corresponds to anterior at the top and posterior at the bottom, and
+the horizontal axis roughly corresponds to medial at the left and lateral at the right. The red outline is
+the boundary of region MO. Pixels are colored according to correlation, with red meaning high correlation
+and blue meaning low.
+                                       We created a normalized version of the gene expression data by subtracting
+                                    each gene&#8217;s mean expression level (over all surface pixels) and
+                                    dividing each gene by its standard deviation.
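The normalization described above can be sketched in a few lines of Python. This is only an illustration: the authors worked with MATLAB matrices, and the function name `normalize_gene`, the sample values, and the choice of the population standard deviation are all assumptions made here.

```python
from statistics import fmean, pstdev

def normalize_gene(values):
    """Z-score one gene's expression over all surface pixels."""
    mean = fmean(values)
    sd = pstdev(values)  # assumption: population (not sample) standard deviation
    if sd == 0:
        # A gene with constant expression carries no spatial information.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]

# Hypothetical expression values of one gene at eight surface pixels.
expression = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = normalize_gene(expression)
```

After this step every gene has mean 0 and unit standard deviation over the surface, so scores computed for different genes are on a comparable scale.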
+                                       The features and the target area are both functions on the surface
+                                    pixels.  They can be referred to as scalar fields over the space of sur-
+                                    face pixels; alternately, they can be thought of as images which can be
+                                    displayed on the flatmapped surface.
+                                       To move beyond a single average expression level for each surface
+                                    pixel, we plan to create a separate matrix for each cortical layer to rep-
+                                    resent the average expression level within that layer. Cortical layers are
+                                    found at different depths in different parts of the cortex. In preparation
+                                    for extracting the layer-specific datasets, we have extended Caret with
+                                    routines that allow the depth of the ROI for volume-to-surface projection
+                                    to vary.
+                                       In the Research Plan, we describe how we will automatically locate
+                                    the layer depths. For validation, we have manually demarcated the depth
+                                    of the outer boundary of cortical layer 5 throughout the cortex.
+                                     Feature selection and scoring methods
+                                    Underexpression of a gene can serve as a marker Underexpression
+                                    of a gene can sometimes serve as a marker. See, for example, Figure 1.
+                                       Correlation Recall that the instances are surface pixels, and con-
+                                    sider the problem of attempting to classify each instance as either a
+                                    member of a particular anatomical area, or not. The target area can be
+                                    represented as a boolean mask over the surface pixels.
+                                       One class of feature selection scoring methods comprises those which
+                                    calculate some sort of &#8220;match&#8221; between each gene image and the target image.
+                                    Those genes which match the best are good candidates for features.
+                                       One of the simplest methods in this class is to use correlation as
+                                    the match score.  We calculated the correlation between each gene and
+                                    each cortical area. The top row of Figure 2 shows the two genes most
+correlated with area SS.
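As a rough illustration of correlation used as a match score, the sketch below ranks genes by the Pearson correlation between each gene image and the boolean target mask, both flattened to lists of surface pixels. The gene names `gene_a`, `gene_b`, `gene_c`, the toy values, and the helper names are hypothetical; the text does not say which correlation coefficient was used, so Pearson is an assumption.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant image cannot match anything
    return cov / sqrt(vx * vy)

def rank_genes(genes, mask):
    """Sort genes by correlation with the target mask, best first."""
    target = [1.0 if m else 0.0 for m in mask]
    scores = {name: pearson(values, target) for name, values in genes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data: four surface pixels, the first two inside the target area.
mask = [True, True, False, False]
genes = {
    "gene_a": [1.0, 1.0, 0.0, 0.0],  # expressed exactly inside the area
    "gene_b": [0.0, 0.0, 1.0, 1.0],  # expressed exactly outside the area
    "gene_c": [1.0, 0.0, 1.0, 0.0],  # unrelated to the area
}
ranked = rank_genes(genes, mask)
```

Note that a strongly negative score, as for `gene_b` here, corresponds to the underexpression marker case mentioned earlier, so ranking by absolute value may be preferable in practice.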
+              
+              
+Figure 3: The top row shows the two genes which (individually) best predict area AUD, according to logistic
+regression. The bottom row shows the two genes which (individually) best match area AUD, according to
+gradient similarity. From left to right and top to bottom, the genes are Ssr1, Efcbp1, Ptk7, and Aph1a.
+                                       Conditional entropy An information-theoretic scoring method is
+                                    to find features such that, if the features (gene expression levels) are
+                                    known, uncertainty about the target (the regional identity) is reduced.
+                                    Entropy measures uncertainty, so what we want is to find features such
+                                    that the conditional distribution of the target has minimal entropy. The
+                                    distribution to which we are referring is the probability distribution over
+                                    the population of surface pixels.
+                                       The simplest way to use information theory is on discrete data, so
+                                    we discretized our gene expression data by creating, for each gene, five
+                                    thresholded boolean masks of the gene data. For each gene, we created a
+                                    boolean mask of its expression levels using each of these thresholds: the
+                                    mean of that gene, the mean minus one standard deviation, the mean
+                                    minus two standard deviations, the mean plus one standard deviation,
+                                    the mean plus two standard deviations.
+                                       Now, for each region, we created and ran a forward stepwise pro-
+                                    cedure which attempted to find pairs of gene expression boolean masks
+                                    such that the conditional entropy of the target area&#8217;s boolean mask, con-
+                                    ditioned upon the pair of gene expression boolean masks, is minimized.
+                                       This finds pairs of genes which are most informative (at least at these
+                                    discretization thresholds) relative to the question, &#8220;Is this surface pixel
+                                    a member of the target area?&#8221;. Its advantage over linear methods such
+                                    as logistic regression is that it takes account of arbitrarily nonlinear re-
+                                    lationships; for example, if the XOR of two variables predicts the target,
+                                    conditional entropy would notice, whereas linear methods would not.
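The discretization and the conditional entropy score can be sketched as follows. This is a minimal Python illustration, not the authors' procedure: the function names are invented here, the masks are assumed to mark pixels strictly above each threshold (the text does not state the direction), and the forward stepwise search over pairs is omitted. The toy data reproduces the XOR case from the text.

```python
from collections import Counter
from math import log2
from statistics import fmean, pstdev

def threshold_masks(values):
    """Five boolean masks: mean - 2 SD, mean - 1 SD, mean, mean + 1 SD, mean + 2 SD."""
    mean, sd = fmean(values), pstdev(values)
    # Assumption: a pixel is True when its value is strictly above the threshold.
    return [[v > mean + k * sd for v in values] for k in (-2, -1, 0, 1, 2)]

def conditional_entropy(target, features):
    """H(target | features) in bits, estimated over the population of pixels.

    `features` is a non-empty list of masks, each as long as `target`.
    """
    n = len(target)
    rows = list(zip(*features))          # feature values seen at each pixel
    joint = Counter(zip(target, rows))   # counts of (target, feature row)
    given = Counter(rows)                # counts of feature row alone
    h = 0.0
    for (_, row), count in joint.items():
        h -= (count / n) * log2(count / given[row])
    return h

masks = threshold_masks([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# XOR example: neither mask alone says anything about the target,
# but the pair determines it completely.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
target = [0, 1, 1, 0]
h_single = conditional_entropy(target, [a])
h_pair = conditional_entropy(target, [a, b])
```

In this toy case conditioning on a single mask leaves the full one bit of uncertainty, while conditioning on the pair removes it, which is the nonlinear relationship a linear method would miss.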
+              
+   
+Figure 4: Upper left: wwc1. Upper right: mtif2. Lower left: wwc1 + mtif2 (each pixel&#8217;s value on the lower
+left is the sum of the corresponding pixels in the upper row).
+                                       Gradient similarity We noticed that the previous two scoring
+                                    methods, which are pointwise, often found genes whose pattern of ex-
+                                    pression did not look similar in shape to the target region.   For this
+                                    reason we designed a non-pointwise local scoring method to detect when
+                                    a gene had a pattern of expression which looked like it had a boundary
+                                    whose shape is similar to the shape of the target region.  We call this
+                                    scoring method &#8220;gradient similarity&#8221;.
+                                       One might say that gradient similarity attempts to measure how
+                                    much the border of the area of gene expression and the border of the
+                                    target region overlap.  However, since gene expression falls off continu-
+                                    ously rather than jumping from its maximum value to zero, the spatial
+                                    pattern of a gene&#8217;s expression often does not have a discrete border.
+                                    Therefore,  instead of looking for a discrete border,  we look for large
+                                    gradients.  Gradient similarity is a symmetric function over two images
+                                    (i.e.  two scalar fields).  It is high to the extent that matching pixels
+                                    which have large values and large gradients also have gradients which
+                                    are oriented in a similar direction. The formula is:
+                                       &#x2211;
+                                    pixel<img src="cmsy7-32.png" alt="&#x2208;" />pixels cos(abs(&#x2220;&#x2207;1 -&#x2220;&#x2207;2)) &#x22C5;|&#x2207;1| + |&#x2207;2| 
-where &#x2207;1  and &#x2207;2  are the gradient vectors of the two images at the current pixel; &#x2220;&#x2207;i is the angle of the gradient of
-image i at the current pixel; |&#x2207;i| is the magnitude of the gradient of image i at the current pixel; and pixel_valuei is the
-value of the current pixel in image i.
+                                       where &#x2207;1  and &#x2207;2  are the gradient vectors of the two images at the
+current pixel; &#x2220;&#x2207;i is the angle of the gradient of image i at the current pixel; |&#x2207;i| is the magnitude of the gradient of image
+i at the current pixel; and pixel_valuei is the value of the current pixel in image i.
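A hedged Python sketch of a gradient similarity score follows. The formula lines visible in this diff do not show how pixel_value_i enters, so the weighting used here (cosine of the angle difference, times the mean gradient magnitude, times the mean pixel value) is an assumption chosen to match the verbal description, not necessarily the authors' exact formula. Images are plain lists of rows, and the finite-difference gradient and all function names are likewise illustrative.

```python
from math import atan2, cos, hypot

def gradient(img):
    """Finite-difference gradient (gx, gy) at every pixel of a 2-D list."""
    rows, cols = len(img), len(img[0])
    gx = [[0.0] * cols for _ in range(rows)]
    gy = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Central differences inside the image, one-sided at the edges.
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            gx[r][c] = (img[r][c1] - img[r][c0]) / max(c1 - c0, 1)
            gy[r][c] = (img[r1][c] - img[r0][c]) / max(r1 - r0, 1)
    return gx, gy

def gradient_similarity(img1, img2):
    """Sum over pixels of cos(angle difference) * mean magnitude * mean value."""
    gx1, gy1 = gradient(img1)
    gx2, gy2 = gradient(img2)
    total = 0.0
    for r in range(len(img1)):
        for c in range(len(img1[0])):
            m1 = hypot(gx1[r][c], gy1[r][c])
            m2 = hypot(gx2[r][c], gy2[r][c])
            a1 = atan2(gy1[r][c], gx1[r][c])
            a2 = atan2(gy2[r][c], gx2[r][c])
            # Assumed weighting: mean gradient magnitude and mean pixel value.
            mean_value = (img1[r][c] + img2[r][c]) / 2
            total += cos(abs(a1 - a2)) * (m1 + m2) / 2 * mean_value
    return total

# Toy images: a vertical edge, and the same edge with the opposite polarity.
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
flipped = [[1.0, 1.0, 0.0, 0.0] for _ in range(3)]
same = gradient_similarity(edge, edge)
opposite = gradient_similarity(edge, flipped)
```

Two images whose edges coincide and point the same way score positively, while an edge of opposite polarity scores negatively. The score is symmetric in its two arguments, as the text requires.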
-Most of the genes in Figure 4 were identified via gradient similarity.
+Most of the genes in Figure 5 were identified via gradient similarity.
-
-                                                        
-                                                        
-Figure 3: The top row shows the three genes which (individually) best predict area AUD, according to logistic regression.
-The bottom row shows the three genes which (individually) best match area AUD, according to gradient similarity.  From
-left to right and top to bottom, the genes are Ssr1, Efcbp1, Aph1a, Ptk7, Aph1a again, and Lepr
+_________________________________________
+  17 For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor
@@ -420,93 +465,133 @@
-Areas which can be identified by single genes Using gradient similarity, we have already found single genes which
-roughly identify some areas and groupings of areas. For each of these areas, an example of a gene which roughly identifies
-it is shown in Figure 4. We have not yet cross-verified these genes in other atlases.
-In addition, there are a number of areas which are almost identified by single genes:  COAa+NLOT (anterior part of
-cortical amygdalar area, nucleus of the lateral olfactory tract), ENT (entorhinal), ACAv (ventral anterior cingulate), VIS
-(visual), AUD (auditory).
-These results validate our expectation that the ABA dataset can be exploited to find marker genes for many cortical
-areas, while also validating the relevancy of our new scoring method, gradient similarity.
-Combinations of multiple genes are useful and necessary for some areas
-In Figure 5, we give an example of a cortical area which is not marked by any single gene, but which can be identified
-combinatorially.  This shows that our proposal to develop a method to find combinations of marker genes is both possible
-and necessary.
-Feature selection integrated with prediction As noted earlier, in general, any predictive method can be used for
-feature selection by running it inside a stepwise wrapper. Also, some predictive methods integrate soft constraints on number
-of features used. Examples of both of these will be seen in the section &#8220;Multivariate Predictive methods&#8221;.
-Multivariate Predictive methods
-Forward stepwise logistic regression Logistic regression is a popular method for predictive modeling of categorial data.
-As a pilot run, for five cortical areas (SS, AUD, RSP, VIS, and MO), we performed forward stepwise logistic regression to
-find single genes, pairs of genes, and triplets of genes which predict areal identify.  This is an example of feature selection
-integrated with prediction using a stepwise wrapper. Some of the single genes found were shown in various figures throughout
-this document, and Figure 5 shows a combination of genes which was found.
+              
+              
+              
+              
+Figure 5: From left to right and top to bottom, single genes which roughly identify areas SS (somatosensory
+primary + supplemental), SSs (supplemental somatosensory), PIR (piriform), FRP (frontal pole), RSP
+(retrosplenial), COApm (Cortical amygdalar, posterior part, medial zone). Grouping some areas together, we
+have also found genes to identify the groups ACA+PL+ILA+DP+ORB+MO (anterior cingulate, prelimbic,
+infralimbic, dorsal peduncular, orbital, motor), posterior and lateral visual (VISpm, VISpl, VISl, VISp;
+posteromedial, posterolateral, lateral, and primary visual; the posterior and lateral visual area is
+distinguished from its neighbors, but not from the entire rest of the cortex). The genes are Pitx2, Aldh1a2,
+Ppfibp1, Slco1a5, Tshz2, Trhr, Col12a1, Ets1.
+                                       Areas which can be identified by single genes Using gradient
+                                    similarity,  we have already found single genes which roughly identify
+                                    some areas and groupings of areas. For each of these areas, an example
+                                    of a gene which roughly identifies it is shown in Figure 5.  We have not
+                                    yet cross-verified these genes in other atlases.
+                                       In addition, there are a number of areas which are almost identified
+                                    by single genes: COAa+NLOT (anterior part of cortical amygdalar area,
+                                    nucleus of the lateral olfactory tract), ENT (entorhinal), ACAv (ventral
+                                    anterior cingulate), VIS (visual), AUD (auditory).
+                                       These results validate our expectation that the ABA dataset can
+                                    be exploited to find marker genes for many cortical areas, while also
+                                    validating the relevancy of our new scoring method, gradient similarity.
+                                       Combinations of multiple genes are useful and necessary for
+                                    some areas
+                                       In Figure 4, we give an example of a cortical area which is not marked
+                                    by any single gene, but which can be identified combinatorially.  According
+                                    to logistic regression, gene wwc1 is the best fit single gene for
+                                    predicting whether or not a pixel on the cortical surface belongs to the
+                                    motor area (area MO). The upper-left picture in Figure 4 shows wwc1&#8217;s
+                                    spatial expression pattern over the cortex. The lower-right boundary of
+                                    MO is represented reasonably well by this gene; however, the gene overshoots
+                                    the upper-left boundary.  This flattened 2-D representation does
+                                    not show it, but the area corresponding to the overshoot is the medial
+                                    surface of the cortex. MO is only found on the lateral surface. Gene mtif2
+                                    is shown in the upper-right.  Mtif2 captures MO&#8217;s upper-left boundary,
+                                    but not its lower-right boundary.  Mtif2 does not express very much on
+                                    the medial surface. By adding together the values at each pixel in these
+                                    two figures, we get the lower-left image. This combination captures area
+                                    MO much better than any single gene.
+                                       This shows that our proposal to develop a method to find combina-
+                                    tions of marker genes is both possible and necessary.
+                                       Feature selection integrated with prediction As noted earlier,
+                                    in general, any predictive method can be used for feature selection by
+                                    running it inside a stepwise wrapper.  Also, some predictive methods
+                                    integrate soft constraints on number of features used. Examples of both
+                                    of these will be seen in the section &#8220;Multivariate Predictive methods&#8221;.
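The idea of running a predictive method inside a stepwise wrapper can be sketched generically in Python. Everything named here is hypothetical: `forward_stepwise` takes any scoring function, and the toy `or_accuracy` score merely stands in for the fit of a real predictive model such as logistic regression.

```python
def forward_stepwise(candidates, score, max_features):
    """Greedily add the candidate that most improves score(selected)."""
    selected = []
    best = float("-inf")
    while len(selected) < max_features:
        trials = [(score(selected + [c]), c) for c in candidates if c not in selected]
        if not trials:
            break
        trial_score, choice = max(trials, key=lambda t: t[0])
        if trial_score <= best:
            break  # no remaining candidate improves the score
        selected.append(choice)
        best = trial_score
    return selected, best

# Toy data: boolean gene masks over four surface pixels and a target area.
target = [1, 1, 1, 0]
genes = {
    "gene_a": [1, 1, 0, 0],
    "gene_b": [0, 0, 1, 0],
    "gene_c": [0, 0, 0, 1],
}

def or_accuracy(selected):
    """Stand-in score: accuracy of predicting the target as the OR of the masks."""
    n = len(target)
    predicted = [any(genes[g][i] for g in selected) for i in range(n)]
    return sum(p == bool(t) for p, t in zip(predicted, target)) / n

chosen, chosen_score = forward_stepwise(list(genes), or_accuracy, max_features=3)
```

Because the wrapper only sees a score, the same loop serves for single genes, pairs, or triplets by changing `max_features`. The greedy search is not guaranteed to find the best combination overall.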
+                                     Multivariate Predictive methods
+                                    Forward stepwise logistic regression Logistic regression is a popular
+                                    method for predictive modeling of categorical data.  As a pilot run,
+                                    for five cortical areas (SS, AUD, RSP, VIS, and MO), we performed
+                                    forward stepwise logistic regression to find single genes, pairs of genes,
+                                    and triplets of genes which predict areal identity.  This is an example
+                                    of feature selection integrated with prediction using a stepwise wrapper.
+                                    Some of the single genes found were shown in various figures throughout
-  17For each gene, a logistic regression in which the response variable was whether or not a surface pixel was within area AUD, and the predictor
+                                    this document, and Figure 4 shows a combination of genes which was
+                                    found.
+                                       We felt that, for single genes, gradient similarity did a better job
+than logistic regression at capturing our subjective impression of a &#8220;good gene&#8221;.
-                                                                   
-                                                                   
-Figure 4:  From left to right and top to bottom, single genes which roughly identify areas SS (somatosensory primary +
-supplemental), SSs (supplemental somatosensory), PIR (piriform), FRP (frontal pole), RSP (retrosplenial), COApm (Corti-
-cal amygdalar, posterior part, medial zone). Grouping some areas together, we have also found genes to identify the groups
-ACA+PL+ILA+DP+ORB+MO (anterior cingulate, prelimbic, infralimbic, dorsal peduncular, orbital, motor), posterior
-and lateral visual (VISpm, VISpl, VISI, VISp; posteromedial, posterolateral, lateral, and primary visual; the posterior and
-lateral visual area is distinguished from its neighbors, but not from the entire rest of the cortex).  The genes are Pitx2,
-Aldh1a2, Ppfibp1, Slco1a5, Tshz2, Trhr, Col12a1, Ets1.
-                                            
-                                
-Figure 5:  Upper left:  wwc1.  Upper right:  mtif2.  Lower left:  wwc1 + mtif2 (each pixel&#8217;s value on the lower left is the
-sum of the corresponding pixels in the upper row).  Acccording to logistic regression, gene wwc1 is the best fit single gene
-for predicting whether or not a pixel on the cortical surface belongs to the motor area (area MO). The upper-left picture in
-Figure 5 shows wwc1&#8217;s spatial expression pattern over the cortex. The lower-right boundary of MO is represented reasonably
-well by this gene, however the gene overshoots the upper-left boundary.  This flattened 2-D representation does not show
-it, but the area corresponding to the overshoot is the medial surface of the cortex. MO is only found on the lateral surface.
-Gene mtif2 is shown in the upper-right. Mtif2 captures MO&#8217;s upper-left boundary, but not its lower-right boundary. Mtif2
-does not express very much on the medial surface. By adding together the values at each pixel in these two figures, we get
-the lower-left image. This combination captures area MO much better than any single gene.
-We felt that, for single genes, gradient similarity did a better job than logistic regression at capturing our subjective
-impression of a &#8220;good gene&#8221;.
-SVM on all genes at once
-In order to see how well one can do when looking at all genes at once, we ran a support vector machine to classify cortical
-surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%19. This shows that
-the genes included in the ABA dataset are sufficient to define much of cortical anatomy. As noted above, however, a classifier
-that looks at all the genes at once isn&#8217;t as practically useful as a classifier that uses only a few genes.
-Data-driven redrawing of the cortical map
-We have applied the following dimensionality reduction algorithms to reduce the dimensionality of the gene expression
-profile associated with each voxel: Principal Components Analysis (PCA), Simple PCA (SPCA), Multi-Dimensional Scaling
-(MDS), Isomap, Landmark Isomap, Laplacian eigenmaps, Local Tangent Space Alignment (LTSA), Hessian locally linear
-embedding, Diffusion maps, Stochastic Neighbor Embedding (SNE), Stochastic Proximity Embedding (SPE), Fast Maximum
-Variance Unfolding (FastMVU), Non-negative Matrix Factorization (NNMF). Space constraints prevent us from showing
-many of the results, but as a sample, PCA, NNMF, and landmark Isomap are shown in the second, third, and fourth rows
-of Figure 6.
+
+
+                                          
+Figure 6: First row: the first 6 reduced dimensions, using PCA. Second row: the first 6 reduced dimensions,
+using NNMF. Third row: the first six reduced dimensions, using landmark Isomap. Bottom row: examples of
+kmeans clustering applied to reduced datasets to find 7 clusters. Left: 19 of the major subdivisions of the
+cortex. Second from left: PCA. Third from left: NNMF. Right: Landmark Isomap. Additional details: In the
+third and fourth rows, 7 dimensions were found, but only 6 displayed. In the last row: for PCA, 50
+dimensions were used; for NNMF, 6 dimensions were used; for landmark Isomap, 7 dimensions were used.
+                                                         SVM on all genes at once
+                                                           In order to see how well one can do when
+                                                         looking at all genes at once, we ran a support
+                                                         vector machine to classify cortical surface pix-
+                                                         els based on their gene expression profiles.  We
+                                                         achieved classification accuracy of about 81% (see footnote 19).
+                                                         This shows that the genes included in the ABA
+                                                         dataset are sufficient to define much of cortical
+                                                         anatomy. As noted above, however, a classifier
+                                                         that looks at all the genes at once isn&#8217;t as prac-
+                                                         tically useful as a classifier that uses only a few
+                                                         genes.
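As a rough illustration of classifying surface pixels from their full gene expression profiles, the sketch below trains a one-vs-rest linear SVM by subgradient descent on the hinge loss. The proposal does not state which SVM formulation, kernel, or software was used, so the linear model, the training loop, every function name, and the synthetic data (3 areas, 20 genes) are assumptions made for illustration only; the 81% figure above is not reproduced here.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.01, steps=1000):
    """Binary linear SVM: minimize lam/2*|w|^2 + mean hinge loss; y in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        active = y * (X @ w + b) < 1  # pixels that violate the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def fit_one_vs_rest(X, labels):
    """Train one binary SVM per cortical area (that area vs. all others)."""
    classes = np.unique(labels)
    models = [train_linear_svm(X, np.where(labels == c, 1.0, -1.0)) for c in classes]
    W = np.stack([w for w, _ in models])
    B = np.array([b for _, b in models])
    return classes, W, B

def predict(X, classes, W, B):
    """Assign each pixel to the area whose SVM gives the highest score."""
    return classes[np.argmax(X @ W.T + B, axis=1)]

# Synthetic stand-in for the data (hypothetical): rows are surface pixels,
# columns are genes, labels are cortical area indices.
rng = np.random.default_rng(0)
n_areas, n_genes, n_per_area = 3, 20, 100
centers = rng.normal(0.0, 3.0, size=(n_areas, n_genes))
X = np.vstack([c + rng.normal(size=(n_per_area, n_genes)) for c in centers])
labels = np.repeat(np.arange(n_areas), n_per_area)

# Hold out 20% of the pixels to estimate classification accuracy.
order = rng.permutation(len(X))
train, test = order[:240], order[240:]
classes, W, B = fit_one_vs_rest(X[train], labels[train])
accuracy = np.mean(predict(X[test], classes, W, B) == labels[test])
print(f"held-out accuracy: {accuracy:.2f}")
```

With real data, `X` would hold one row per cortical surface pixel and one column per gene, and accuracy would normally be estimated by cross-validation rather than a single split.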
+                                                         Data-driven redrawing of the cor-
+                                                         tical map
+                                                         We  have  applied  the  following  dimensional-
+                                                         ity reduction algorithms to reduce the dimen-
+                                                         sionality of the gene expression profile associ-
+                                                         ated  with  each  voxel:  Principal  Components
+                                                         Analysis (PCA), Simple PCA (SPCA), Multi-
+                                                         Dimensional  Scaling  (MDS),  Isomap,  Land-
+                                                         mark Isomap, Laplacian eigenmaps, Local Tan-
+                                                         gent Space Alignment (LTSA), Hessian locally
+                                                         linear  embedding,  Diffusion  maps,  Stochastic
+                                                         Neighbor Embedding (SNE), Stochastic Prox-
+                                                         imity Embedding (SPE), Fast Maximum Vari-
+                                                         ance Unfolding (FastMVU), Non-negative Ma-
+                                                         trix Factorization (NNMF). Space constraints
+                                                         prevent us from showing many of the results,
+                                                         but as a sample, PCA, NNMF, and landmark
+                                                         Isomap are shown in the first, second, and third
+                                                         rows of Figure 6.
-row of Figure 6.  To compare, the first row of Figure 6 shows some of the major subdivisions of cortex.  These results
-clearly show that different dimensionality reduction techniques capture different aspects of the data and lead to different
-clusterings, indicating the utility of our proposal to produce a detailed comparison of these techniques as applied to the
-domain of genomic anatomy.
-todo: nnmf 7
+row of Figure 6. To compare, the leftmost picture on the bottom row of Figure 6 shows some of the major subdivisions of
+cortex. These results clearly show that different dimensionality reduction techniques capture different aspects of the data
+and lead to different clusterings, indicating the utility of our proposal to produce a detailed comparison of these techniques
+as applied to the domain of genomic anatomy.
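The reduce-then-cluster procedure described above can be sketched as follows, using PCA as the reduction step and k-means to find 7 clusters. This is a minimal NumPy-only sketch, not the authors' pipeline: the function names, the random placeholder matrix, and the matrix size are hypothetical, while the choice of 50 PCA dimensions and 7 clusters follows the Figure 6 caption.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project mean-centered rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with centroids initialized from random data points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Placeholder expression matrix (hypothetical): rows are pixels, columns are genes.
rng = np.random.default_rng(0)
n_pixels, n_genes = 700, 200
expression = rng.normal(size=(n_pixels, n_genes))

reduced = pca_reduce(expression, n_components=50)  # 50 dimensions, as for PCA in Figure 6
cluster_labels, _ = kmeans(reduced, k=7)           # 7 clusters, as in Figure 6
print(np.bincount(cluster_labels, minlength=7))
```

Swapping `pca_reduce` for another reduction method while keeping the same `kmeans` call is one way to compare how the choice of technique changes the resulting clusters.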
-
-                                  
-             
-            
-             
-                                                        
-Figure 6:  Top row:  19 of the major subdivisions of the cortex.  Second row:  the first 6 reduced dimensions, using PCA.
-Third row:  the first 6 reduced dimensions, using NNMF. Fourth row:  the first six reduced dimensions, using landmark
-Isomap.  Bottom row:  examples of kmeans clustering applied to reduced datasets to find 7 clusters.  Left:  PCA. Middle:
-NNMF. Right:  Landmark Isomap.  Additional details:  In the third and fourth rows, 7 dimensions were found, but only 6
-displayed. In the last row: for PCA, 50 dimensions were used; for NNMF, 6 dimensions were used; for landmark Isomap, 7
-dimensions were used.