cg

changeset 99:a48955c639d4
author: bshanks@bshanks.dyndns.org
date: Wed Apr 22 06:43:51 2009 -0700
parents: a75c226cbdd6
children: fa7c0a924e7a
files: grant.doc grant.html grant.odt grant.pdf grant.txt
--- a/grant.html	Wed Apr 22 06:23:09 2009 -0700
+++ b/grant.html	Wed Apr 22 06:43:51 2009 -0700
@@ -77,7 +77,7 @@
-have idiosyncratic anatomy.  Subjects may be improperly registred to the atlas.  The method used to measure
+have idiosyncratic anatomy.  Subjects may be improperly registered to the atlas.  The method used to measure
@@ -175,7 +175,7 @@
-                               query by demarcating regions and then specifing either the strength of
+                               query by demarcating regions and then specifying either the strength of
@@ -195,8 +195,8 @@
-match a target image. Their match score is Jaccard similarity.
-_________________________________________
+match a target image.
+_____________________
@@ -226,8 +226,8 @@
-                               a hierarchial tree of clusters, rather than a single set of clusters which
-partition the voxels. This is called hierarchial clustering.
+                               a hierarchical tree of clusters, rather than a single set of clusters which
+partition the voxels. This is called hierarchical clustering.
@@ -290,7 +290,7 @@
-                               which seems to pick out a single, spatially continguous region.  This
+                               which  seems  to  pick  out  a  single,  spatially  contiguous  region.   This
@@ -306,31 +306,30 @@
-                               tion (NNMF), and a hierarchial recursive bifurcation clustering scheme
+                               tion (NNMF), and a hierarchical recursive bifurcation clustering scheme
-                                  AGEA[15] includes a preset hierarchial clustering of voxels based
+                                  AGEA[15] includes a preset hierarchical clustering of voxels based
-                               cluster the genes within that dataset.  EMAGE clusters via hierarchial
-                               complete linkage clustering with un-centred correlation as the similarity
-                               score.
-                                  [6] clustered genes.  For each cluster, prototypical spatial expres-
-                               sion patterns were created by averaging the genes in the cluster.  The
-                               prototypes were analyzed manually, without clustering voxels.
+                               cluster the genes within that dataset. EMAGE clusters via hierarchical
+                               complete linkage clustering.
+                                  [6] clusters genes. For each cluster, prototypical spatial expression
+                               patterns were created by averaging the genes in the cluster.  The pro-
+                               totypes were analyzed manually, without clustering voxels.
-cation has not yet been found.  The projects using gene expression on cortex did not attempt to make use of
+                               cation has not yet been found. The projects using gene expression on
+cortex did not attempt to make use of the radial profile of gene expression.  Also, none of these projects did a
-the radial profile of gene expression.  Also, none of these projects did a separate dimensionality reduction step
-before clustering pixels, none tried to cluster genes first in order to guide automated clustering of pixels into
-spatial regions, and none used co-clustering algorithms.
+separate dimensionality reduction step before clustering pixels, none tried to cluster genes first in order to guide
+automated clustering of pixels into spatial regions, and none used co-clustering algorithms.
@@ -400,11 +399,11 @@
-areas; AGEA&#8217;s Gene Finder cannot be used to find marker genes for the cortical areas; and AGEA&#8217;s hierarchial
+areas; AGEA&#8217;s Gene Finder cannot be used to find marker genes for the cortical areas; and AGEA&#8217;s hierarchical
-work on computationally finding marker genes for cortical areas, or on finding a hierarchial clustering that will
+work on computationally finding marker genes for cortical areas, or on finding a hierarchical clustering that will
@@ -414,7 +413,7 @@
-are overlayed.                                        The  method  developed  in  aim  (1)  will  be  applied  to
+are overlaid.                                          The  method  developed  in  aim  (1)  will  be  applied  to
@@ -529,7 +528,7 @@
-can be identified combinatorially.  Acccording to logistic regression, gene wwc1 is the best fit single gene for
+can be identified combinatorially.   According to logistic regression,  gene wwc1 is the best fit single gene for
@@ -542,7 +541,7 @@
-gorial data. As a pilot run, for five cortical areas (SS, AUD, RSP, VIS, and MO), we performed forward stepwise
+gorical data. As a pilot run, for five cortical areas (SS, AUD, RSP, VIS, and MO), we performed forward stepwise
@@ -565,7 +564,7 @@
-produce a detailed comparion of these techniques as applied to the domain of genomic anatomy.
+produce a detailed comparison of these techniques as applied to the domain of genomic anatomy.
@@ -605,7 +604,7 @@
-support vector machines.
+support vector machines (SVMs).
@@ -644,7 +643,7 @@
-profiles, the same techniques can be applied instead to the pixels13. It is possible that the features generated in
+profiles, the same techniques can be applied instead to the pixels.  It is possible that the features generated in
@@ -660,7 +659,7 @@
-Co-clustering There are some algorithms which simultaineously incorporate clustering on instances and on
+Co-clustering There are some algorithms which simultaneously incorporate clustering on instances and on
@@ -675,12 +674,6 @@
-_________________________________________
-   13Consider a matrix whose rows represent pixel locations, and whose columns represent genes. An entry in this matrix represents the
-gene expression level at a given pixel. One can look at this matrix as a collection of pixels, each corresponding to a vector of many gene
-expression levels; or one can look at it as a collection of genes, each corresponding to a vector giving that gene&#8217;s expression at each
-pixel.  Similarly, dimensionality reduction can be used to replace a large number of genes with a small number of features, or it can be
-used to replace a large number of pixels with a small number of features.
@@ -689,7 +682,7 @@
-Using the methods developed in Aim 2, we will present one or more hierarchial cortical maps. We will identify
+Using the methods developed in Aim 2, we will present one or more hierarchical cortical maps. We will identify
@@ -698,35 +691,26 @@
-October 2009-April 2010: Develop scoring methods and to test them in various supervised learning frameworks.
-Also test out various dimensionality reduction schemes in combination with supervised learning. create or extend
-supervised learning frameworks which use multivariate versions of the best scoring methods.
+October 2009-April 2010:  Develop scoring methods, dimensionality reduction, and supervised learning meth-
+ods.
-February-July 2010:  Continue to develop scoring methods and supervised learning frameworks.  Explore the
-best way to integrate radial profiles with supervised learning. Explore the best way to make supervised learning
-techniques robust against incorrect labels (i.e.   when the areas drawn on the input cortical map are slightly
-off). Quantitatively compare the performance of different supervised learning techniques. Validate marker genes
-found in the ABA dataset by checking against other gene expression datasets.  Create documentation and unit
-tests for software toolbox for Aim 1. Respond to user bug reports for Aim 1 software toolbox.
+February-July 2010: Continue to develop scoring methods and supervised learning frameworks.  Extend tech-
+niques for robustness.   Compare the performance of techniques.   Validate marker genes.   Prepare software
+toolbox for Aim 1.
-June 2010-March 2011:  Explore dimensionality reduction algorithms for Aim 2.  Explore standard hierarchial
-clustering algorithms, used in combination with dimensionality reduction, for Aim 2.  Explore co-clustering algo-
-rithms. Think about how radial profile information can be used for Aim 2. Adapt clustering algorithms to use radial
-profile information.  Quantitatively compare the performance of different dimensionality reduction and clustering
-techniques.  Quantitatively compare the value of different flatmapping methods and ways of representing radial
-profiles.
+June 2010-March 2011:  Explore dimensionality reduction algorithms for Aim 2.  Explore clustering algorithms.
+Adapt clustering algorithms to use radial profile information. Compare the performance of techniques.
-new ways of organizing the cortex into areas are discovered, read the literature and talk to people to learn about
-research related to interpreting our results.  Create documentation and unit tests for software toolbox for Aim 2.
-Respond to user bug reports for Aim 2 software toolbox.
+new ways of organizing the cortex into areas are discovered, interpret the results.  Prepare software toolbox for
+Aim 2.
-Follow up on responses to our papers. Possibly submit another paper.
+Possibly submit another paper.
--- a/grant.txt	Wed Apr 22 06:23:09 2009 -0700
+++ b/grant.txt	Wed Apr 22 06:43:51 2009 -0700
@@ -79,7 +79,7 @@
-Both gene expression data and anatomical atlases have errors, due to a variety of factors. Individual subjects have idiosyncratic anatomy. Subjects may be improperly registred to the atlas. The method used to measure gene expression may be noisy. The atlas may have errors. It is even possible that some areas in the anatomical atlas are "wrong" in that they do not have the same shape as the natural domains of gene expression to which they correspond. These sources of error can affect the displacement and the shape of both the gene expression data and the anatomical target areas. Therefore, it is important to use feature selection methods which are robust to these kinds of errors.
+Both gene expression data and anatomical atlases have errors, due to a variety of factors. Individual subjects have idiosyncratic anatomy. Subjects may be improperly registered to the atlas. The method used to measure gene expression may be noisy. The atlas may have errors. It is even possible that some areas in the anatomical atlas are "wrong" in that they do not have the same shape as the natural domains of gene expression to which they correspond. These sources of error can affect the displacement and the shape of both the gene expression data and the anatomical target areas. Therefore, it is important to use feature selection methods which are robust to these kinds of errors.
@@ -146,7 +146,7 @@
-GeneAtlas\cite{carson_digital_2005} and EMAGE \cite{venkataraman_emage_2008} allow the user to construct a search query by demarcating regions and then specifing either the strength of expression or the name of another gene or dataset whose expression pattern is to be matched. Neither GeneAtlas nor EMAGE allow one to search for combinations of genes that define a region in concert but not separately.
+GeneAtlas\cite{carson_digital_2005} and EMAGE \cite{venkataraman_emage_2008} allow the user to construct a search query by demarcating regions and then specifying either the strength of expression or the name of another gene or dataset whose expression pattern is to be matched. Neither GeneAtlas nor EMAGE allow one to search for combinations of genes that define a region in concert but not separately.
@@ -164,7 +164,7 @@
-\cite{hemert_matching_2008} describes a technique to find combinations of marker genes to pick out an anatomical region. They use an evolutionary algorithm to evolve logical operators which combine boolean (thresholded) images in order to match a target image. Their match score is Jaccard similarity.
+\cite{hemert_matching_2008} describes a technique to find combinations of marker genes to pick out an anatomical region. They use an evolutionary algorithm to evolve logical operators which combine boolean (thresholded) images in order to match a target image. %%Their match score is Jaccard similarity.
@@ -188,9 +188,9 @@
-%%It is desirable to determine not just one set of regions, but also how these regions relate to each other, if at all; perhaps some of the regions are more similar to each other than to the rest, suggesting that, although at a fine spatial scale they could be considered separate, on a coarser spatial scale they could be grouped together into one large region. This suggests the outcome of clustering may be a hierarchial tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchial clustering.
-
-It is desirable to determine not just one set of regions, but also how these regions relate to each other. The outcome of clustering may be a hierarchial tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchial clustering.
+%%It is desirable to determine not just one set of regions, but also how these regions relate to each other, if at all; perhaps some of the regions are more similar to each other than to the rest, suggesting that, although at a fine spatial scale they could be considered separate, on a coarser spatial scale they could be grouped together into one large region. This suggests the outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
+
+It is desirable to determine not just one set of regions, but also how these regions relate to each other. The outcome of clustering may be a hierarchical tree of clusters, rather than a single set of clusters which partition the voxels. This is called hierarchical clustering.
@@ -231,7 +231,7 @@
-Gene clusters could also be used to directly yield a clustering on instances. This is because many genes have an expression pattern which seems to pick out a single, spatially continguous region. This suggests the following procedure: cluster together genes which pick out similar regions, and then to use the more popular common regions as the final clusters. In Preliminary Studies, Figure \ref{geneClusters}, we show that a number of anatomically recognized cortical regions, as well as some "superregions" formed by lumping together a few regions, are associated with gene clusters in this fashion.
+Gene clusters could also be used to directly yield a clustering on instances. This is because many genes have an expression pattern which seems to pick out a single, spatially contiguous region. This suggests the following procedure: cluster together genes which pick out similar regions, and then to use the more popular common regions as the final clusters. In Preliminary Studies, Figure \ref{geneClusters}, we show that a number of anatomically recognized cortical regions, as well as some "superregions" formed by lumping together a few regions, are associated with gene clusters in this fashion.
@@ -249,20 +249,20 @@
-Factorization (NNMF), and a hierarchial recursive bifurcation clustering scheme based on correlation as the similarity score. The paper yielded impressive results, proving the usefulness of computational genomic anatomy. We have run NNMF on the cortical dataset
-
-%% \footnote{We ran "vanilla" NNMF, whereas the paper under discussion used a modified method. Their main modification consisted of adding a soft spatial contiguity constraint. However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was needed. The paper under discussion also mentions that they tried a hierarchial variant of NNMF, which we have not yet tried.} and while the results are promising, they also demonstrate that NNMF is not necessarily the best dimensionality reduction method for this application (see Preliminary Studies, Figure \ref{dimReduc}).
+Factorization (NNMF), and a hierarchical recursive bifurcation clustering scheme based on correlation as the similarity score. The paper yielded impressive results, proving the usefulness of computational genomic anatomy. We have run NNMF on the cortical dataset
+
+%% \footnote{We ran "vanilla" NNMF, whereas the paper under discussion used a modified method. Their main modification consisted of adding a soft spatial contiguity constraint. However, on our dataset, NNMF naturally produced spatially contiguous clusters, so no additional constraint was needed. The paper under discussion also mentions that they tried a hierarchical variant of NNMF, which we have not yet tried.} and while the results are promising, they also demonstrate that NNMF is not necessarily the best dimensionality reduction method for this application (see Preliminary Studies, Figure \ref{dimReduc}).
-%% \cite{thompson_genomic_2008} reports that both mNNMF and hierarchial mNNMF clustering were useful, and that hierarchial recursive bifurcation gave similar results.
-
-
-AGEA\cite{ng_anatomic_2009} includes a preset hierarchial clustering of voxels based on a recursive bifurcation algorithm with correlation as the similarity metric. EMAGE\cite{venkataraman_emage_2008} allows the user to select a dataset from among a large number of alternatives, or by running a search query, and then to cluster the genes within that dataset. EMAGE clusters via hierarchial complete linkage clustering with un-centred correlation as the similarity score.
+%% \cite{thompson_genomic_2008} reports that both mNNMF and hierarchical mNNMF clustering were useful, and that hierarchical recursive bifurcation gave similar results.
+
+
+AGEA\cite{ng_anatomic_2009} includes a preset hierarchical clustering of voxels based on a recursive bifurcation algorithm with correlation as the similarity metric. EMAGE\cite{venkataraman_emage_2008} allows the user to select a dataset from among a large number of alternatives, or by running a search query, and then to cluster the genes within that dataset. EMAGE clusters via hierarchical complete linkage clustering. %% with un-centered correlation as the similarity score.
-\cite{chin_genome-scale_2007} clustered genes. For each cluster, prototypical spatial expression patterns were created by averaging the genes in the cluster. The prototypes were analyzed manually, without clustering voxels.
+\cite{chin_genome-scale_2007} clusters genes. For each cluster, prototypical spatial expression patterns were created by averaging the genes in the cluster. The prototypes were analyzed manually, without clustering voxels.
@@ -309,12 +309,12 @@
-\cite{ng_anatomic_2009} describes the application of AGEA to the cortex. The paper describes interesting results on the structure of correlations between voxel gene expression profiles within a handful of cortical areas. However, this sort of analysis is not related to either of our aims, as it neither finds marker genes, nor does it suggest a cortical map based on gene expression data. Neither of the other components of AGEA can be applied to cortical areas; AGEA's Gene Finder cannot be used to find marker genes for the cortical areas; and AGEA's hierarchial clustering does not produce clusters corresponding to the cortical areas\footnote{In both cases, the cause is that pairwise correlations between the gene expression of voxels in different areas but the same layer are often stronger than pairwise correlations between the gene expression of voxels in different layers but the same area. Therefore, a pairwise voxel correlation clustering algorithm will tend to create clusters representing cortical layers, not areas (there may be clusters which presumably correspond to the intersection of a layer and an area, but since one area will have many layer-area intersection clusters, further work is needed to make sense of these). The reason that Gene Finder cannot the find marker genes for cortical areas is that, although the user chooses a seed voxel, Gene Finder chooses the ROI for which genes will be found, and it creates that ROI by (pairwise voxel correlation) clustering around the seed.}.
+\cite{ng_anatomic_2009} describes the application of AGEA to the cortex. The paper describes interesting results on the structure of correlations between voxel gene expression profiles within a handful of cortical areas. However, this sort of analysis is not related to either of our aims, as it neither finds marker genes, nor does it suggest a cortical map based on gene expression data. Neither of the other components of AGEA can be applied to cortical areas; AGEA's Gene Finder cannot be used to find marker genes for the cortical areas; and AGEA's hierarchical clustering does not produce clusters corresponding to the cortical areas\footnote{In both cases, the cause is that pairwise correlations between the gene expression of voxels in different areas but the same layer are often stronger than pairwise correlations between the gene expression of voxels in different layers but the same area. Therefore, a pairwise voxel correlation clustering algorithm will tend to create clusters representing cortical layers, not areas (there may be clusters which presumably correspond to the intersection of a layer and an area, but since one area will have many layer-area intersection clusters, further work is needed to make sense of these). The reason that Gene Finder cannot the find marker genes for cortical areas is that, although the user chooses a seed voxel, Gene Finder chooses the ROI for which genes will be found, and it creates that ROI by (pairwise voxel correlation) clustering around the seed.}.
-In summary, for all three aims, (a) only one of the previous projects explores combinations of marker genes, (b) there has been almost no comparison of different algorithms or scoring methods, and (c) there has been no work on computationally finding marker genes for cortical areas, or on finding a hierarchial clustering that will yield a map of cortical areas de novo from gene expression data.
+In summary, for all three aims, (a) only one of the previous projects explores combinations of marker genes, (b) there has been almost no comparison of different algorithms or scoring methods, and (c) there has been no work on computationally finding marker genes for cortical areas, or on finding a hierarchical clustering that will yield a map of cortical areas de novo from gene expression data.
@@ -322,7 +322,7 @@
-\caption{Prototypes corresponding to sample gene clusters, clustered by gradient similarity. Region boundaries for the region that most matches each prototype are overlayed.}
+\caption{Prototypes corresponding to sample gene clusters, clustered by gradient similarity. Region boundaries for the region that most matches each prototype are overlaid.}
@@ -443,14 +443,14 @@
-In Figure \ref{MOcombo}, we give an example of a cortical area which is not marked by any single gene, but which can be identified combinatorially. Acccording to logistic regression, gene wwc1 is the best fit single gene for predicting whether or not a pixel on the cortical surface belongs to the motor area (area MO). The upper-left picture in Figure \ref{MOcombo} shows wwc1's spatial expression pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene, but the gene overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding to the overshoot is the medial surface of the cortex. MO is only found on the dorsal surface. Gene mtif2 is shown in the upper-right. Mtif2 captures MO's upper-left boundary, but not its lower-right boundary. Mtif2 does not express very much on the medial surface. By adding together the values at each pixel in these two figures, we get the lower-left image. This combination captures area MO much better than any single gene. 
+In Figure \ref{MOcombo}, we give an example of a cortical area which is not marked by any single gene, but which can be identified combinatorially. According to logistic regression, gene wwc1 is the best fit single gene for predicting whether or not a pixel on the cortical surface belongs to the motor area (area MO). The upper-left picture in Figure \ref{MOcombo} shows wwc1's spatial expression pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene, but the gene overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding to the overshoot is the medial surface of the cortex. MO is only found on the dorsal surface. Gene mtif2 is shown in the upper-right. Mtif2 captures MO's upper-left boundary, but not its lower-right boundary. Mtif2 does not express very much on the medial surface. By adding together the values at each pixel in these two figures, we get the lower-left image. This combination captures area MO much better than any single gene. 
-%%Acccording to logistic regression, gene wwc1\footnote{"WW, C2 and coiled-coil domain containing 1"; EntrezGene ID 211652} is the best fit single gene for predicting whether or not a pixel on the cortical surface belongs to the motor area (area MO). The upper-left picture in Figure \ref{MOcombo} shows wwc1's spatial expression pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene, but the gene overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding to the overshoot is the medial surface of the cortex. MO is only found on the lateral surface.
+%%According to logistic regression, gene wwc1\footnote{"WW, C2 and coiled-coil domain containing 1"; EntrezGene ID 211652} is the best fit single gene for predicting whether or not a pixel on the cortical surface belongs to the motor area (area MO). The upper-left picture in Figure \ref{MOcombo} shows wwc1's spatial expression pattern over the cortex. The lower-right boundary of MO is represented reasonably well by this gene, but the gene overshoots the upper-left boundary. This flattened 2-D representation does not show it, but the area corresponding to the overshoot is the medial surface of the cortex. MO is only found on the lateral surface.
@@ -466,7 +466,7 @@
-Logistic regression is a popular method for predictive modeling of categorial data. As a pilot run, for five cortical areas (SS, AUD, RSP, VIS, and MO), we performed forward stepwise logistic regression to find single genes, pairs of genes, and triplets of genes which predict areal identify. This is an example of feature selection integrated with prediction using a stepwise wrapper. Some of the single genes found were shown in various figures throughout this document, and Figure \ref{MOcombo} shows a combination of genes which was found. 
+Logistic regression is a popular method for predictive modeling of categorical data. As a pilot run, for five cortical areas (SS, AUD, RSP, VIS, and MO), we performed forward stepwise logistic regression to find single genes, pairs of genes, and triplets of genes which predict areal identify. This is an example of feature selection integrated with prediction using a stepwise wrapper. Some of the single genes found were shown in various figures throughout this document, and Figure \ref{MOcombo} shows a combination of genes which was found. 
@@ -489,7 +489,7 @@
-After applying the dimensionality reduction, we ran clustering algorithms on the reduced data. To date we have tried k-means and spectral clustering. The results of k-means after PCA, NNMF, and landmark Isomap are shown in the last row of Figure \ref{dimReduc}. To compare, the leftmost picture on the bottom row of Figure \ref{dimReduc} shows some of the major subdivisions of cortex. These results clearly show that different dimensionality reduction techniques capture different aspects of the data and lead to different clusterings, indicating the utility of our proposal to produce a detailed comparion of these techniques as applied to the domain of genomic anatomy.
+After applying the dimensionality reduction, we ran clustering algorithms on the reduced data. To date we have tried k-means and spectral clustering. The results of k-means after PCA, NNMF, and landmark Isomap are shown in the last row of Figure \ref{dimReduc}. To compare, the leftmost picture on the bottom row of Figure \ref{dimReduc} shows some of the major subdivisions of cortex. These results clearly show that different dimensionality reduction techniques capture different aspects of the data and lead to different clusterings, indicating the utility of our proposal to produce a detailed comparison of these techniques as applied to the domain of genomic anatomy.
@@ -533,7 +533,7 @@
-We will develop a feature selection procedure for choosing the best small set of marker genes for a given anatomical area. In addition to using the scoring measures that we develop, we will also explore (a) feature selection using a stepwise wrapper over "vanilla" classifiers such as logistic regression, (b) supervised learning methods such as decision trees which incrementally/greedily combine single gene markers into sets, and (c) supervised learning methods which use soft constraints to minimize number of features used, such as sparse support vector machines. 
+We will develop a feature selection procedure for choosing the best small set of marker genes for a given anatomical area. In addition to using the scoring measures that we develop, we will also explore (a) feature selection using a stepwise wrapper over "vanilla" classifiers such as logistic regression, (b) supervised learning methods such as decision trees which incrementally/greedily combine single gene markers into sets, and (c) supervised learning methods which use soft constraints to minimize number of features used, such as sparse support vector machines (SVMs). 
@@ -552,8 +552,9 @@
-Instead of applying dimensionality reduction to the gene expression profiles, the same techniques can be applied instead to the pixels\footnote{Consider a matrix whose rows represent pixel locations, and whose columns represent genes. An entry in this matrix represents the gene expression level at a given pixel. One can look at this matrix as a collection of pixels, each corresponding to a vector of many gene expression levels; or one can look at it as a collection of genes, each corresponding to a vector giving that gene's expression at each pixel. Similarly, dimensionality reduction can be used to replace a large number of genes with a small number of features, or it can be used to replace a large number of pixels with a small number of features.}. It is possible that the features generated in this way by some dimensionality reduction techniques will directly correspond to interesting spatial regions.
-
+Instead of applying dimensionality reduction to the gene expression profiles, the same techniques can be applied instead to the pixels. It is possible that the features generated in this way by some dimensionality reduction techniques will directly correspond to interesting spatial regions.
+
+%% \footnote{Consider a matrix whose rows represent pixel locations, and whose columns represent genes. An entry in this matrix represents the gene expression level at a given pixel. One can look at this matrix as a collection of pixels, each corresponding to a vector of many gene expression levels; or one can look at it as a collection of genes, each corresponding to a vector giving that gene's expression at each pixel. Similarly, dimensionality reduction can be used to replace a large number of genes with a small number of features, or it can be used to replace a large number of pixels with a small number of features.}
@@ -564,7 +565,7 @@
-There are some algorithms which simultaineously incorporate clustering on instances and on features (in our case, genes and pixels), for example, IRM\cite{kemp_learning_2006}. These are called co-clustering or biclustering algorithms.
+There are some algorithms which simultaneously incorporate clustering on instances and on features (in our case, genes and pixels), for example, IRM\cite{kemp_learning_2006}. These are called co-clustering or biclustering algorithms.
@@ -583,7 +584,7 @@
-Using the methods developed in Aim 2, we will present one or more hierarchial cortical maps. We will identify and explain how the statistical structure in the gene expression data led to any unexpected or interesting features of these maps, and we will provide biological hypotheses to interpret any new cortical areas, or groupings of areas, which are discovered.
+Using the methods developed in Aim 2, we will present one or more hierarchical cortical maps. We will identify and explain how the statistical structure in the gene expression data led to any unexpected or interesting features of these maps, and we will provide biological hypotheses to interpret any new cortical areas, or groupings of areas, which are discovered.
@@ -600,18 +601,18 @@
-\\ **October 2009-April 2010**: Develop scoring methods and to test them in various supervised learning frameworks. Also test out various dimensionality reduction schemes in combination with supervised learning. create or extend supervised learning frameworks which use multivariate versions of the best scoring methods.
+\\ **October 2009-April 2010**: Develop scoring methods, dimensionality reduction, and supervised learning methods.
-\\ **February-July 2010**: Continue to develop scoring methods and supervised learning frameworks. Explore the best way to integrate radial profiles with supervised learning. Explore the best way to make supervised learning techniques robust against incorrect labels (i.e. when the areas drawn on the input cortical map are slightly off). Quantitatively compare the performance of different supervised learning techniques. Validate marker genes found in the ABA dataset by checking against other gene expression datasets. Create documentation and unit tests for software toolbox for Aim 1. Respond to user bug reports for Aim 1 software toolbox.
+\\ **February-July 2010**: Continue to develop scoring methods and supervised learning frameworks. Extend techniques for robustness. Compare the performance of techniques. Validate marker genes. Prepare software toolbox for Aim 1.
-\\ **June 2010-March 2011**: Explore dimensionality reduction algorithms for Aim 2. Explore standard hierarchial clustering algorithms, used in combination with dimensionality reduction, for Aim 2. Explore co-clustering algorithms. Think about how radial profile information can be used for Aim 2. Adapt clustering algorithms to use radial profile information. Quantitatively compare the performance of different dimensionality reduction and clustering techniques. Quantitatively compare the value of different flatmapping methods and ways of representing radial profiles.
+\\ **June 2010-March 2011**: Explore dimensionality reduction algorithms for Aim 2. Explore clustering algorithms. Adapt clustering algorithms to use radial profile information. Compare the performance of techniques. 
-\\ **February-May 2011**: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If new ways of organizing the cortex into areas are discovered, read the literature and talk to people to learn about research related to interpreting our results. Create documentation and unit tests for software toolbox for Aim 2. Respond to user bug reports for Aim 2 software toolbox.
+\\ **February-May 2011**: Using the methods developed for Aim 2, explore the genomic anatomy of the cortex. If new ways of organizing the cortex into areas are discovered, interpret the results. Prepare software toolbox for Aim 2.
-\\ **May-August 2011**: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1. Follow up on responses to our papers. Possibly submit another paper.
+\\ **May-August 2011**: Revisit Aim 1 to see if what was learned during Aim 2 can improve the methods for Aim 1. Possibly submit another paper.