
GOAT: an R Tool for Analyzing Gene Ontology Term Enrichment

Xu Q and Shaulsky G. GOAT: an R tool for analyzing Gene Ontology term enrichment. 2005. Applied Bioinformatics 4:281-283
Note: These packages are for R 1.8.x or 1.9.x only; the GOAT package for R > 2.0 will come soon.

Introduction

GOAT is a component of the microarray analysis tool package used in our lab to analyze Dictyostelium discoideum expression data. It has been extended so that it can analyze GO enrichment for gene lists from any organism with genome-scale annotation data. GOAT is an R package with a Tk graphical user interface (GUI), so it can be integrated into other R-based microarray data analysis packages such as Bioconductor.

Both the source code and compiled library are freely distributed "as is".

Download

The current version of GOAT is 2.0. It requires R version 1.8.1 or higher, which can be downloaded from the R Project website.

  1. Binary distribution for Windows
  2. Binary distribution for Linux (Redhat 9.0)
  3. Binary distribution for Mac OS X (10.3)

Sample gene list data:

  1. Dictyostelium sample gene list
  2. S. cerevisiae sample gene list

Installation

The GOAT package requires an R installation (1.8.1 or higher). For the Tk GUI to work, you also need the Tcl/Tk library installed. How to install the Tcl/Tk library depends on your platform:

  1. For Microsoft Windows, the current R distribution is bundled with the Tcl/Tk library.
  2. For Linux users, contact your system administrator to install the library. In some cases you will need to reinstall R after the library is installed so that R is configured properly.
  3. For Mac OS X users, X11 should be installed on your system first; then install both Tcl/Tk for X11 and Tcl/Tk for Aqua.

To install the GOAT package:

(1). In Windows, on the R menu, go to "Packages" -> "install packages from local zip files..." and select the downloaded "GOAT.zip".

(2). In a Linux console, type "R CMD INSTALL GOAT_2.0_linux.tar.gz" as root. After installation, make the "data" folder in the installed GOAT library (usually /usr/lib/R/library/GOAT/data) writable by regular users, so that GOAT can download the ontology and annotation data into it.

(3). In Mac OS X, on the R menu, go to "Packages" -> "install from local files" -> "source directory" and select the decompressed GOAT directory.
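
Because GOAT needs to write into its "data" folder, it can be useful to check the folder's permissions from R before the first run. The following is a minimal sketch using only base R; the helper name is_writable_dir is hypothetical and not part of GOAT, and the system.file() call in the last comment assumes the package is installed under the name "GOAT".

```r
# Hypothetical helper, not part of GOAT: TRUE if `path` is an existing
# directory that the current user can write to.
is_writable_dir <- function(path) {
  isTRUE(nzchar(path) &&
           file.exists(path) &&
           file.info(path)$isdir &&
           file.access(path, mode = 2) == 0)
}

# Demonstrated on R's session temp directory, which is normally writable.
print(is_writable_dir(tempdir()))

# For an installed GOAT, the folder to check could be located with:
# is_writable_dir(system.file("data", package = "GOAT"))
```

system.file() returns an empty string when the package or folder is not found, which the helper reports as not writable.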

Running the package

The GOAT package is distributed without ontology or annotation data. The first time the package is loaded, it downloads these data from the GO Consortium website. For the input gene list, GOAT by default uses gene IDs from each organism's standard database, which is the second column in the standard annotation file: DB_Object_ID (additional details are available at the GO website). For example, Dictyostelium genes use dictyBase IDs, Drosophila genes use FlyBase IDs, yeast genes use SGD IDs, and so on. If your genes are not in this format, you need to either convert them or change GOAT's default to your format, provided that format is also listed in the annotation file (see the end of this section for details).

Here is a typical running session:
> library(GOAT)
# load the library
# GOAT checks whether the "data" folder is writable. If it is not, a warning is issued,
# and you have to fix the permissions before continuing the analysis.

> initGOAT()
# now you'll see the GUI
# [Screenshot of the Tk GUI]

# Next, select the target gene list: choose the file containing the gene IDs
# (click the "Browse" button) or type in the name of the R object containing the genes.
# The R object should be a vector of gene IDs. By default, gene IDs use the
# same format as the 2nd column of the standard annotation files.
# Next, select the reference gene list: select "whole genome" or choose
# a gene list file or R object.
# Next, select the organism you are working on.
# Click the "Analyze" button to start the enrichment analysis.
# When it is done, plots are generated for each ontology.
# The first time you run GOAT on a gene list from a given organism, GOAT
# takes an extra step to download the annotation file for that organism, parse it
# and traverse the GO structures to generate GO-gene mapping files, so that
# future runs of the analysis are much quicker.
# Now you can save the displayed images, for example, the image displayed in device 2:
> dev.set(2)
> dev.print(png, file="anywhere/anyname.png", height=800, width=1000)
# GOAT saves all the enrichment analysis results in tabular format, with one table for
# each ontology. In this example, they are atest.txt.proc for the biological process ontology,
# atest.txt.func for the molecular function ontology and atest.txt.comp for the cellular component
# ontology. You can write them to a file and review them:
> write.table(atest.txt.proc, file="atest.proc.xls", sep="\t", quote=FALSE, col.names=FALSE, row.names=FALSE)
# Remember that since the ontology is a DAG, some GO terms will show up multiple times in the
# output table, possibly at different levels.

The following is a figure for the Biological Process ontology of a Dictyostelium gene list:


The left panel shows the enrichment ratio and hierarchy. The numbers to the
left of the bars are GO node levels. The right panel shows information about
those GO nodes: the number of genes in the target gene list (list), the number
of genes in the reference gene list (total), the multiple-testing corrected
p-value (p-value) and the GO term (Annotation).
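
To make the quantities in the figure more concrete, here is a minimal sketch of how an enrichment ratio and a corrected p-value could be computed for a single GO node in base R. This page does not state which statistical test or correction GOAT uses, so the hypergeometric test, the Bonferroni correction and the definition of the ratio below are all assumptions chosen for illustration, and the counts are made up.

```r
# Illustrative numbers only (not taken from any real gene list).
list_hits  <- 12     # genes in the target list annotated to the GO term
list_size  <- 100    # genes in the target list
total_hits <- 200    # genes in the reference list annotated to the term
total_size <- 10000  # genes in the reference list

# Assumed definition of the enrichment ratio: fraction of annotated genes in
# the target list divided by the fraction in the reference list.
ratio <- (list_hits / list_size) / (total_hits / total_size)

# One-sided hypergeometric p-value: probability of seeing at least
# `list_hits` annotated genes in a random draw of `list_size` genes.
p_raw <- phyper(list_hits - 1, total_hits, total_size - total_hits,
                list_size, lower.tail = FALSE)

# Correct this p-value together with two other made-up p-values for
# multiple testing (Bonferroni is used here only as an example).
p_adj <- p.adjust(c(p_raw, 0.04, 0.5), method = "bonferroni")

print(ratio)
print(p_adj)
```

With these made-up counts the term is about six times more frequent in the target list than in the reference list.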

If the Tcl/Tk library is not installed, you can still run the enrichment
analysis and the updates without the GUI. Here is a sample session:

> library(GOAT)
>
> # read in the target list
> mtarget <- read.table("/tmp/dicty.sample.txt", as.is=c(1))[,1]
>
> # enrichment analysis
> tmplist <- list()
> tmplist[[1]] <- mtarget
> tmplist[[2]] <- NA  # compare against the entire genome, or assign another reference list here
> analyzeEnrichment(tmplist, "ddb", "mtarget")
> # it will return 3 objects: mtarget.proc, mtarget.func and mtarget.comp for
> # the three ontologies respectively.
>
> # now you can plot the returned result:
> plotGO(mtarget.proc)
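
The session above compares the target list against the whole genome. As a sketch of the alternative mentioned in the comment, the following builds the same two-element list with a custom reference list read from a file. The helper read_gene_ids is hypothetical and not part of GOAT, the gene IDs are made up, and temporary files stand in for real gene list files; the GOAT call itself is left as a comment because it needs the package and its data.

```r
# Hypothetical helper, not part of GOAT: read the first column of a file
# as a character vector of gene IDs.
read_gene_ids <- function(path) {
  as.character(read.table(path, as.is = TRUE)[, 1])
}

# Made-up IDs written to temporary files so the sketch runs on its own.
target_file <- tempfile(fileext = ".txt")
reference_file <- tempfile(fileext = ".txt")
writeLines(c("GENE0001", "GENE0002"), target_file)
writeLines(c("GENE0001", "GENE0002", "GENE0003", "GENE0004"), reference_file)

gene_lists <- list()
gene_lists[[1]] <- read_gene_ids(target_file)     # target list
gene_lists[[2]] <- read_gene_ids(reference_file)  # custom reference instead of NA

# With GOAT loaded, the call from the session above would then be:
# analyzeEnrichment(gene_lists, "ddb", "mtarget")
```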

If your genes are not in the standard format, you can make your format GOAT's default. First find out which column in the annotation file matches the IDs in your gene list. For example, a gene ID like "YAR103C" matches column 11 of the yeast annotation file:

> library(GOAT)
> updateAnno("sgd", column=11)

Once this is done, you can use your gene ID format in the future.
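
If it is not obvious which annotation column matches your IDs, a quick count in R can help. The sketch below is one possible way to do this and is not part of GOAT: the helper count_id_matches is hypothetical, and the small data frame is a made-up stand-in for a real annotation file with only three columns.

```r
# Hypothetical helper, not part of GOAT: report, for each annotation column,
# how many of the supplied gene IDs appear in it.
count_id_matches <- function(anno, ids) {
  sapply(anno, function(col) {
    # Some columns (e.g. synonyms) hold several values separated by "|".
    values <- unique(unlist(strsplit(as.character(col), "|", fixed = TRUE)))
    sum(ids %in% values)
  })
}

# Toy stand-in for an annotation file (made-up rows, three columns only).
anno <- data.frame(
  db      = c("SGD", "SGD", "SGD"),
  id      = c("S0000001", "S0000002", "S0000003"),
  synonym = c("YAR103C|ABC1", "YBR001W", "YCL002C|XYZ9"),
  stringsAsFactors = FALSE
)
my_ids <- c("YAR103C", "YCL002C")

matches <- count_id_matches(anno, my_ids)
best_column <- which.max(matches)  # index of the column with the most matches
print(matches)
print(unname(best_column))
```

For a real tab-delimited annotation file, the data frame could be read with something like read.delim(path, header=FALSE, comment.char="!", quote="", as.is=TRUE), assuming comment lines in the file start with "!"; the resulting column index is the value you would pass as the column argument of updateAnno.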

Update

Simply select the organism and click the "Update" button in the Tk GUI. GOAT will automatically download the current version of ontology flat files and annotation files from the GO website and parse them. If you don't select any organisms, only the ontologies will be updated.

Affymetrix Chips

With the release of GOAT 2.0, we have added support for Affymetrix chips. Here is a sample session:

> library(GOAT)
> initGOAT(chip="affy")
> # now just input your target Affy array ID list, choose your chip and
> # run the analysis as described in the "Running the package" section.

Note that for Affy chips there is an extra step after the enrichment analysis: associating the Affy IDs in your list, as well as the gene IDs, with each GO term. As a result, the tabular output for each ontology has an extra column showing the Affy IDs.

For now, GOAT supports only the Affymetrix Mouse 430 v2 chip. Other chips will be supported depending on user demand. Requests can be sent to Q. Xu. Update: Support for the Human hgu133 plus v2 chip has been added!


Contact: Q. Xu
Last updated on March 08, 2005
 