cg

changeset 18:5d6dfc57654a
.
author: bshanks@bshanks.dyndns.org
date: Sun Apr 12 15:34:12 2009 -0700
parents: ff9b47f2c7d3
children: 717d4025b861
files: grant.html grant.odt grant.pdf grant.txt
--- a/grant.html	Sun Apr 12 04:01:58 2009 -0700
+++ b/grant.html	Sun Apr 12 15:34:12 2009 -0700
@@ -145,7 +145,10 @@
-               todo
+               A crucial choice when designing a clustering method is how to measure
+            similarity, across either pairs of instances, or clusters, or both.  There is much
+            overlap between scoring methods for feature selection (discussed above under
+            Aim 1) and scoring methods for similarity.
@@ -173,11 +176,11 @@
+                                            4
+
-                                            4
-
@@ -213,11 +216,7 @@
-               The cortex is divided into areas and layers.  To a first approximation, the
-            parcellation of the cortex into areas can be drawn as a 2-D map on the surface of
-            the cortex.  In the third dimension, the boundaries between the areas continue
-            downwards into the cortical depth,  perpendicular to the surface.   The layer
-__________________________
+_______________
@@ -225,6 +224,10 @@
+               The cortex is divided into areas and layers.  To a first approximation, the
+            parcellation of the cortex into areas can be drawn as a 2-D map on the surface of
+            the cortex.  In the third dimension, the boundaries between the areas continue
+            downwards into the cortical depth,  perpendicular to the surface.   The layer
@@ -266,14 +269,50 @@
+                                            6
+
-            todo
+            There does not appear to be much work on the automated analysis of spatial
+            gene expression data.
+               There is a substantial body of work on the analysis of gene expression data,
+            however, most of this concerns gene expression data which is not fundamentally
+            spatial, for example, microarray datasets.  In some cases, a few locations have
+            been sampled, but such a dataset is still of a fundamentally different character
+            than a dataset containing a large grid of sampling points distributed over space.
+            In relating gene expression to anatomy, it is the spatial aspects of the problem
+            which are the most important.
+               As noted above, there has been much work on both supervised learning and
+            clustering, and there are many available algorithms for each.  Many of these
+            algorithms are flexible enough to accommodate new scoring measures; and the
+            performance of most of the algorithms is greatly affected by preprocessing and
+            by the choice of which representation to use for feature values. We think it likely
+            that for this application, the development of domain-specific scoring measures
+            (such as gradient similarity, which is discussed in Preliminary Work) will be
+            necessary in order to achieve the best results. In essence, the machine learning
+            community has provided algorithms, but the scientist must provide a framework
+            for representing the problem domain, and the way that this framework is set
+            up has a large impact on performance. Creating a good framework can require
+            creatively reconceptualizing the problem domain, and is not merely a mechanical
+            &#8220;fine-tuning&#8221; of numerical parameters.  Therefore, the completion of Aims 1
+            and 2 involves more than just reimplementing an existing algorithm, and more
+            than just choosing between a set of existing algorithms, and will constitute a
+            substantial contribution to biology.
+               We are aware of one other effort to computationally analyze spatial gene
+            expression data.
+               In the Preliminary Work, we show that
+               The creation of a domain-specific scoring measure may be required in order
+            to achieve good performance, and it is not impossible that the algorithms them-
+            selves will have to be extended.  We plan to test out existing algorithms and
+            scoring measures,
+               Therefore, we anticipate
+               Therefore, it is unclear which of the
+               todo
-                                            6
-
+                                            7
+
@@ -305,6 +344,12 @@
+            seems to partially line up with the border of AUD; its weakness is that this
+            includes genes which don&#8217;t express over the entire area. Genes which have high
+            rankings using both pointwise and border criteria, such as Aph1a in the example,
+            may be particularly good markers.   None of these genes are,  individually,  a
+            perfect marker for AUD; we deliberately chose a &#8220;difficult&#8221; area in order to
+            better contrast pointwise with geometric methods.
@@ -315,7 +360,7 @@
-                                            7
+                                            8
@@ -327,8 +372,6 @@
-                                            8
-
@@ -336,15 +379,11 @@
-            seems to partially line up with the border of AUD; its weakness is that this
-            includes genes which don&#8217;t express over the entire area. Genes which have high
-            rankings using both pointwise and border criteria, such as Aph1a in the example,
-            may be particularly good markers.   None of these genes are,  individually,  a
-            perfect marker for AUD; we deliberately chose a &#8220;difficult&#8221; area in order to
-            better contrast pointwise with geometric methods.
+                                            9
+
-             Aim 1 (and Aim 3)
+             Specific to Aim 1 (and Aim 3)
@@ -354,27 +393,18 @@
-__________________________
-   6Using the Shogun SVM package (todo:cite), with parameters type=GMNPSVM (multi-
-class b-SVM), kernal = gaussian with sigma = 0.1, c = 10, epsilon = 1e-1 &#8211; these are the
-first parameters we tried, so presumably performance would improve with different choices of
-parameters. 5-fold cross-validation.
-                                            9
-
-             Aim 2 (and Aim 3)
-             Raw dimensionality reduction results
-             Dimensionality reduction plus K-means or spectral clus-
-            tering
-            Many areas are captured by clusters of genes
+             Specific to Aim 2 (and Aim 3)
+            Raw dimensionality reduction results
+               Dimensionality reduction plus K-means or spectral clustering
+               Many areas are captured by clusters of genes
-            todo
-               amongst other thigns:
+            todo amongst other things:
@@ -387,6 +417,10 @@
+__________________________
+   65-fold cross-validation.
+                                            10
+
@@ -397,8 +431,6 @@
-                                            10
-
@@ -424,8 +456,9 @@
-______________________________________________
-    stuff  i  dunno  where  to  put  yet  (there  is  more  scattered  through  grant-
+                                            11
+
+            _______________________________________________________________________________________________________ stuff i dunno where to put yet (there is more scattered through grant-
@@ -436,16 +469,14 @@
-                                            11
-
-            manifold into a plane is optimal for this application. We will compare mappings
-            which attempt to preserve size (such as the one used by Caret??) with mappings
-            which preserve angle (conformal maps).
-               Although there is much 2-D organization in anatomy, there are also struc-
-            tures whose shape is fundamentally 3-dimensional.  If possible, we would like
-            the method we develop to include a statistical test that warns the user if the
-            assumption of 2-D structure seems to be wrong.
-               todo: replace aim # bullet pts with #s
+manifold into a plane is optimal for this application. We will compare mappings
+which attempt to preserve size (such as the one used by Caret??) with mappings
+which preserve angle (conformal maps).
+    Although there is much 2-D organization in anatomy, there are also struc-
+tures whose shape is fundamentally 3-dimensional.  If possible, we would like
+the method we develop to include a statistical test that warns the user if the
+assumption of 2-D structure seems to be wrong.
+    todo: replace aim # bullet pts with #s
--- a/grant.txt	Sun Apr 12 04:01:58 2009 -0700
+++ b/grant.txt	Sun Apr 12 15:34:12 2009 -0700
@@ -79,8 +79,7 @@
-
-todo
+A crucial choice when designing a clustering method is how to measure similarity, across either pairs of instances, or clusters, or both. There is much overlap between scoring methods for feature selection (discussed above under Aim 1) and scoring methods for similarity. 
@@ -137,6 +136,23 @@
+There does not appear to be much work on the automated analysis of spatial gene expression data. 
+
+There is a substantial body of work on the analysis of gene expression data, however, most of this concerns gene expression data which is not fundamentally spatial, for example, microarray datasets. In some cases, a few locations have been sampled, but such a dataset is still of a fundamentally different character than a dataset containing a large grid of sampling points distributed over space. In relating gene expression to anatomy, it is the spatial aspects of the problem which are the most important.
+
+As noted above, there has been much work on both supervised learning and clustering, and there are many available algorithms for each. Many of these algorithms are flexible enough to accommodate new scoring measures; and the performance of most of the algorithms is greatly affected by preprocessing and by the choice of which representation to use for feature values. We think it likely that for this application, the development of domain-specific scoring measures (such as gradient similarity, which is discussed in Preliminary Work) will be necessary in order to achieve the best results. In essence, the machine learning community has provided algorithms, but the scientist must provide a framework for representing the problem domain, and the way that this framework is set up has a large impact on performance. Creating a good framework can require creatively reconceptualizing the problem domain, and is not merely a mechanical "fine-tuning" of numerical parameters. Therefore, the completion of Aims 1 and 2 involves more than just reimplementing an existing algorithm, and more than just choosing between a set of existing algorithms, and will constitute a substantial contribution to biology.
+
+We are aware of one other effort to computationally analyze spatial gene expression data. 
+
+
+In the Preliminary Work, we show that 
+
+The creation of a domain-specific scoring measure may be required in order to achieve good performance, and it is not impossible that the algorithms themselves will have to be extended. We plan to test out existing algorithms and scoring measures, 
+
+Therefore, we anticipate 
+
+Therefore, it is unclear which of the 
+
@@ -199,14 +215,14 @@
-=== Aim 1 (and Aim 3) ===
+=== Specific to Aim 1 (and Aim 3) ===
-In order to see how well one can do when looking at all genes at once, we ran a support vector machine to classify cortical surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%\footnote{Using the Shogun SVM package (todo:cite), with parameters type=GMNPSVM (multiclass b-SVM), kernal = gaussian with sigma = 0.1, c = 10, epsilon = 1e-1 -- these are the first parameters we tried, so presumably performance would improve with different choices of parameters. 5-fold cross-validation.}. As noted above, however, a classifier that looks at all the genes at once isn't practically useful. 
+In order to see how well one can do when looking at all genes at once, we ran a support vector machine to classify cortical surface pixels based on their gene expression profiles. We achieved classification accuracy of about 81%\footnote{5-fold cross-validation.}. As noted above, however, a classifier that looks at all the genes at once isn't practically useful. 
@@ -217,12 +233,12 @@
-=== Aim 2 (and Aim 3) ===
-
-=== Raw dimensionality reduction results ===
-
-
-=== Dimensionality reduction plus K-means or spectral clustering ===
+=== Specific to Aim 2 (and Aim 3) ===
+
+**Raw dimensionality reduction results**
+
+
+**Dimensionality reduction plus K-means or spectral clustering**
@@ -244,9 +260,7 @@
-todo
-
-amongst other thigns:
+todo amongst other things: