Loraine Lab Research

Finding Functional Neighbors with Co-expression

Expression microarray data are getting better and more abundant. As a result, it is now possible to answer many previously intractable questions simply by mining microarray data in clever ways.

Our group is interested in taking advantage of microarray data to explore a wide range of questions, from the practical issues of how individual probes behave across hundreds of sample types to more far-reaching questions addressing how co-expression networks in different cell types adjust to abiotic stress or disease.

Highly co-expressed genes from the Arabidopsis glycolysis pathway.

Thus far we have focused mainly on analyzing publicly available data sets from Arabidopsis thaliana, a small, easy-to-grow plant that is well established as a model system in plant biology.

The project began as a collaboration with the Chris Somerville lab at the Carnegie Institution. Working with Staffan Persson, then a postdoc in the Somerville lab, our postdoc Hairong Wei examined patterns of co-expression among genes encoding metabolic pathway enzymes, using an early version of AraCyc, a database of metabolic pathways in Arabidopsis thaliana. We found that genes encoding enzymes of the same pathway are more highly co-expressed with one another than genes selected at random from the entire pool.
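
A comparison like this can be sketched as a simple resampling test: compute the mean pairwise r² among a pathway's genes, then compare it with the same statistic for many random gene sets of equal size drawn from the whole pool. This is a minimal, hypothetical illustration, not the analysis from the original study. It assumes expression profiles stored in a dictionary mapping gene IDs to values over the same samples; the function names, the number of random draws, and the toy data are all illustrative assumptions.

```python
import itertools
import random
from typing import Dict, List, Sequence, Tuple


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """r^2 of a simple linear regression, i.e. the squared Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return (sxy * sxy) / (sxx * syy)


def mean_pairwise_r2(expression: Dict[str, List[float]], genes: Sequence[str]) -> float:
    """Average r^2 over every unordered pair of genes in the set."""
    pairs = list(itertools.combinations(genes, 2))
    return sum(r_squared(expression[a], expression[b]) for a, b in pairs) / len(pairs)


def pathway_coexpression_test(
    expression: Dict[str, List[float]],
    pathway: Sequence[str],
    n_draws: int = 1000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Compare a pathway's mean pairwise r^2 with equal-sized random gene sets.

    Returns (observed, mean over random sets, one-sided empirical p-value).
    """
    rng = random.Random(seed)
    pool = sorted(expression)  # the entire gene pool, pathway genes included
    observed = mean_pairwise_r2(expression, pathway)
    null = [
        mean_pairwise_r2(expression, rng.sample(pool, len(pathway)))
        for _ in range(n_draws)
    ]
    # Add-one correction keeps the empirical p-value strictly above zero.
    p_value = (1 + sum(v >= observed for v in null)) / (n_draws + 1)
    return observed, sum(null) / n_draws, p_value


# Hypothetical toy data: P1-P3 stand in for pathway genes that rise together
# across six samples; G1-G4 are unrelated background genes.
expression = {
    "P1": [1, 2, 3, 4, 5, 6],
    "P2": [2, 4, 6, 8, 10, 12],
    "P3": [3, 5, 7, 9, 11, 13],
    "G1": [6, 1, 5, 2, 4, 3],
    "G2": [2, 6, 1, 5, 3, 4],
    "G3": [4, 4, 1, 6, 2, 5],
    "G4": [1, 5, 6, 2, 3, 4],
}
observed, null_mean, p_value = pathway_coexpression_test(expression, ["P1", "P2", "P3"])
print(f"observed={observed:.3f} random={null_mean:.3f} p={p_value:.3f}")
```

Drawing the random sets from the whole pool, pathway genes included, mirrors the comparison described above. A real analysis would also need to account for genes that appear in several pathways.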

This result suggested a way to use co-expression analysis to search for genes that might play a role in these same pathways, whether as new members, as regulators, or as consumers of pathway end-products.

We developed a search method called Pathway-Level Co-expression (PLC) that uses patterns of co-expression among a group of query genes to identify candidate functional neighbors: other genes in the genome that might play a role in whatever process the query genes mediate. The method accepts a list of query genes and compares each query with every other gene in the genome. It then ranks each gene according to its degree of co-expression with the query group. Genes that are more strongly co-expressed (e.g., with larger r² values from pairwise linear regression) with more members of the query group appear nearer the top of the results list. The top-ranking genes are likely candidates for involvement in the query genes' common function. The method works best when the query genes are themselves co-expressed and when the process they mediate is regulated at the level of mRNA abundance.
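
The ranking step can be illustrated with a short sketch. It assumes expression profiles stored in a hypothetical dictionary of gene IDs to values over the same samples. Each candidate gene is scored by how many query genes it matches at or above an r² cutoff, with ties broken by mean r² across all queries. The cutoff of 0.7, the tie-breaking rule, and the names `plc_rank` and `r_squared` are illustrative assumptions; the scoring used by CressExpress may differ.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """r^2 of a simple linear regression, i.e. the squared Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return (sxy * sxy) / (sxx * syy)


def plc_rank(
    expression: Dict[str, List[float]],
    queries: Iterable[str],
    threshold: float = 0.7,  # hypothetical cutoff, chosen for illustration
) -> List[Tuple[str, int, float]]:
    """Rank non-query genes by co-expression with a query group.

    Returns (gene, number of queries with r^2 >= threshold, mean r^2) tuples,
    best candidates first.
    """
    query_list = list(queries)
    query_set = set(query_list)
    ranked = []
    for gene, profile in expression.items():
        if gene in query_set:
            continue  # compare the queries only with the other genes
        scores = [r_squared(expression[q], profile) for q in query_list]
        hits = sum(s >= threshold for s in scores)
        ranked.append((gene, hits, sum(scores) / len(scores)))
    # More query hits first, then higher mean r^2, then gene ID for stable output.
    ranked.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return ranked


# Hypothetical toy data: two co-expressed queries and three candidates.
expression = {
    "Q1": [1, 2, 3, 4, 5],
    "Q2": [2, 1, 4, 3, 5],
    "A": [3, 3, 7, 7, 10],  # tracks both queries
    "C": [1, 2, 3, 4, 5],   # identical to Q1 only
    "B": [3, 1, 4, 1, 5],   # weakly related to either
}
ranked = plc_rank(expression, ["Q1", "Q2"], threshold=0.7)
for gene, hits, mean_r2 in ranked:
    print(gene, hits, round(mean_r2, 3))
```

In the toy data, gene A ranks above C even though C matches Q1 perfectly, because A is co-expressed with both queries. Because r² discards the sign of the correlation, an anti-correlated gene would score as highly as a positively correlated one under this sketch.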

We published an implementation of the method as a free, user-friendly Web site at CressExpress.org. Using CressExpress, biologists can enter Arabidopsis query genes, select subsets of sample types, and then use those samples to compute linear regressions between the query genes and the remaining genes in the genome. The tool then uses these regression results to perform the PLC search. When the regression and search phases finish, the tool sends the user an email message with a link to a zip file containing the results and a record of all the parameters used, including the list of samples.
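
One way to picture this kind of job is a function that restricts the expression matrix to the user's chosen samples, fits an ordinary least-squares regression of each remaining gene on each query, and bundles the results with a parameter record in a zip archive. This is a hypothetical sketch in Python. The names `run_job` and `linregress`, the file names `regressions.tsv` and `parameters.json`, and the toy data are illustrative assumptions, not CressExpress's actual code or output format.

```python
import io
import json
import zipfile
from typing import Dict, List, Sequence, Tuple


def linregress(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = slope * x + intercept; returns (slope, intercept, r^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0:
        raise ValueError("query profile is constant across the selected samples")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


def run_job(
    expression: Dict[str, List[float]],
    samples: List[str],
    queries: List[str],
    selected: List[str],
) -> bytes:
    """Regress every non-query gene on each query over the chosen samples and
    return a zip archive holding the results plus a record of the parameters."""
    columns = [samples.index(s) for s in selected]  # keep only the chosen samples
    subset = {g: [values[i] for i in columns] for g, values in expression.items()}
    rows = ["query\tgene\tslope\tintercept\tr2"]
    for q in queries:
        for gene in sorted(subset):
            if gene in queries:
                continue
            slope, intercept, r2 = linregress(subset[q], subset[gene])
            rows.append(f"{q}\t{gene}\t{slope:.4f}\t{intercept:.4f}\t{r2:.4f}")
    # Record the inputs alongside the results so the run can be reproduced.
    params = {"queries": queries, "samples": selected}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("regressions.tsv", "\n".join(rows) + "\n")
        archive.writestr("parameters.json", json.dumps(params, indent=2))
    return buffer.getvalue()


# Hypothetical toy input: three genes measured in four sample types.
samples = ["root", "leaf", "flower", "seed"]
expression = {"Q": [1, 2, 3, 4], "X": [2, 4, 6, 8], "Y": [5, 5, 5, 5]}
archive_bytes = run_job(expression, samples, ["Q"], ["root", "leaf", "flower"])
```

Storing the sample list next to the results means a later reader can rerun the same regressions, which matters because co-expression values can change substantially depending on which sample types are included.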

Working with expression data from Arabidopsis has given us the opportunity to refine and test our methods. Now we are working on ways to expand our system to handle data from other genomes and use these data to address new questions. In parallel, we are planning to use mutant collections from Arabidopsis to test some of the candidates identified through PLC. Of particular interest are pathways that generate phytochemicals that can protect against disease in humans and animals.

To perform co-expression analysis using Arabidopsis data, visit the CressExpress Web site.