Download files for database

These databases can be used freely by academic and non-profit research organizations. Researchers at other institutions, please contact us before downloading.

Arabidopsis thaliana

1.       All gene sets

a.  Excel format with detailed information

b.  GSEA .gmt format

2.       Gene sets in .gmt format by category

a.  Co-expression

b.  Gene Ontology

c.  KEGG, AraCyc, Plant Ontology, miRNA, TF-targets, and Computational

Mus musculus (mouse)

1.       All gene sets

a.       Excel format with detailed information

b.       GSEA .gmt format with gene symbols

c.        .gmt format with Mouse Genome Informatics (MGI) IDs

d.       .gmt format with Entrez IDs

e.        .gmt format with Ensembl IDs

2.       Gene sets in .gmt format by categories

a.       Co-expression

b.       Gene Ontology

c.        Curated Pathway

d.       Metabolic

e.        TF targets

f.        miRNA targets

g.        Location

h.       Others

i.         New: TF binding from ChIP-Seq

3.       Information:

a.       Summary of sources

b.       BioDBcore description

c.        Public annotation databases

d.       Original literature for lists of differentially expressed genes

Saccharomyces cerevisiae (yeast)

1.       All gene sets

a.       Excel format with detailed information

b.       GSEA .gmt format

2.       Gene sets in .gmt format by categories

a.       Co-expression

b.       Gene Ontology

Bos taurus (cow)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Caenorhabditis elegans (nematode)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Canis familiaris (dog)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Danio rerio (zebrafish)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Drosophila melanogaster (fruit-fly)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Equus ferus caballus (horse)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Felis catus (cat)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Gallus gallus (chicken)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Homo sapiens (human)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Oryctolagus cuniculus (rabbit)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Pan troglodytes (chimpanzee)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Rattus norvegicus (brown rat)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Sus scrofa (pig)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Xenopus tropicalis (frog)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Dictyostelium discoideum (slime mold)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Leishmania major (Aleppo boil)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Plasmodium falciparum (malarial parasite)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Escherichia coli k12 (intestinal bacteria)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Brachypodium distachyon (purple false brome)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Glycine max (soybean)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Hordeum vulgare (barley)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Oryza sativa (Asian rice)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Oryza sativa indica (Indica group rice)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Physcomitrella patens (moss)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Populus trichocarpa (black cottonwood)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Solanum lycopersicum (tomato)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Solanum tuberosum (potato)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Sorghum bicolor (sorghum grass)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Vitis vinifera (grape vine)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format

Zea mays (corn)

1.       All gene sets (only Gene Ontology)

a. Excel format with detailed information

b. GSEA .gmt format
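The .gmt files listed above are tab-delimited text with one gene set per line: the set name, a description, then the member genes. A minimal sketch for reading such a file in Python is shown here. The function name read_gmt and the two example lines are made up for illustration and are not taken from the actual downloads.

```python
import io


def read_gmt(handle):
    """Parse GMT text into {set_name: (description, [gene, ...])}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and lines without any genes
        name, description = fields[0], fields[1]
        # Drop empty fields left by trailing tabs.
        genes = [g for g in fields[2:] if g]
        gene_sets[name] = (description, genes)
    return gene_sets


# Hypothetical two-line example; real files come from the links above.
example = io.StringIO(
    "SET_A\tdemo description\tAT1G01010\tAT1G01020\n"
    "SET_B\tna\tAT2G01008\n"
)
sets = read_gmt(example)
for name, (description, genes) in sets.items():
    print(name, len(genes))
```

To read a downloaded file, pass an open file handle instead of the StringIO object, for example open("my_gene_sets.gmt") with whatever file name you saved.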

Hints on how to run gene set enrichment analysis (GSEA):

1.  Download the GSEA program from the Broad Institute.

2.  Download gene sets from the links above.

3.  Load your expression data into GSEA. The GSEA documentation describes several ways to do so. For a simple experiment, such as the 24-hour cold-treated plants vs. control in GSE5534, I have found it best to convert the data to an RNK (ranked list) file. Based on a statistical analysis such as a t-test, I eliminate genes that are not significantly different between experiment and control. I often use less stringent cutoffs for P values or Q values in filtering, so that a few hundred to a few thousand genes remain. The genes are then ranked by fold change and saved in a text file with the .rnk extension. This is the file that I used in the paper. Note that running GSEA on RNK files must be initiated from the GSEA main menu: Tools, then GseaPreranked.

4.  Another essential file is a .chip file that maps probe IDs to gene symbols. This information needs to be downloaded from the microarray manufacturer. This file is for the Affymetrix ATH1 chip.
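The filtering and ranking in step 3 could be sketched roughly as follows. This assumes the per-gene statistics (log2 fold change and P value) have already been computed elsewhere. The function name write_rnk, the cutoff of 0.1, and the three example genes are illustrative assumptions, not the values used in the paper. An RNK file is plain text with two tab-separated columns: gene identifier and ranking score.

```python
import csv
import io


def write_rnk(rows, out, p_cutoff=0.1):
    """Write a ranked list from (gene_id, log2_fold_change, p_value) rows.

    Genes with p_value above p_cutoff are dropped; the rest are sorted
    by fold change, largest first. Returns the number of genes written.
    """
    kept = [(gene, fc) for gene, fc, p in rows if p <= p_cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for gene, fc in kept:
        writer.writerow([gene, f"{fc:.4f}"])
    return len(kept)


# Hypothetical statistics for three genes (not real data).
stats = [
    ("AT1G01010", 2.5, 0.001),
    ("AT1G01020", -1.8, 0.02),
    ("AT1G01030", 0.1, 0.90),  # removed by the P-value filter
]
buf = io.StringIO()
n_written = write_rnk(stats, buf)
print(buf.getvalue(), end="")
```

To produce a file for GseaPreranked, replace the StringIO buffer with a file opened for writing, for example open("cold_vs_control.rnk", "w", newline=""), where the file name is your own choice.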

Update log:

1.  8/20/2012: Updated literature gene sets to include additional gene sets from ArrayExpress.

2.  8/27/2012: Corrected a duplicate entry in the literature gene sets, LIT_SAKUMA_FLUORESCENCE-1H-DROUGHT-STRESS/CONTROL_DN, which caused errors in GSEA. The two gene sets were renamed. The literature gene sets, all gene sets, and the Excel file were updated.

3.  8/19/2013: Updated the AraPath database (Arabidopsis). There are a total of 4,774 updated gene sets, including 1,426 literature gene sets from GEO and ArrayExpress and 3,348 Gene Ontology gene sets.

4.  9/8/2013: Added the new database GSKB (Gene Set Knowledgebase in Mouse), which includes a total of 42,056 mouse gene sets.

5.  9/11/2013: Added the new yeast database, which includes a total of 8,457 yeast gene sets.

6.  9/12/2013: Added new databases (Gene Ontology gene sets only) for 30 species: Bos taurus (cow), Caenorhabditis elegans (nematode), Canis familiaris (dog), Danio rerio (zebrafish), Drosophila melanogaster (fruit-fly), Equus ferus caballus (horse), Felis catus (cat), Gallus gallus (chicken), Homo sapiens (human), Oryctolagus cuniculus (rabbit), Pan troglodytes (chimpanzee), Rattus norvegicus (brown rat), Sus scrofa (pig), Xenopus tropicalis (frog), Dictyostelium discoideum (slime mold), Leishmania major (Aleppo boil), Plasmodium falciparum (malarial parasite), Escherichia coli k12 (intestinal bacteria), Brachypodium distachyon (purple false brome), Glycine max (soybean), Hordeum vulgare (barley), Oryza sativa (Asian rice), Oryza sativa indica (Indica group rice), Physcomitrella patens (moss), Populus trichocarpa (black cottonwood), Solanum lycopersicum (tomato), Solanum tuberosum (potato), Sorghum bicolor (sorghum grass), Vitis vinifera (grape vine), and Zea mays (corn).
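As the 8/27/2012 entry notes, a repeated gene set name in a .gmt file makes GSEA fail. If you merge or edit gene set files yourself, a quick check for repeated names might look like the sketch below. The function name duplicate_set_names and the example lines are hypothetical.

```python
import io
from collections import Counter


def duplicate_set_names(handle):
    """Return a sorted list of gene set names that occur more than once."""
    names = [
        line.rstrip("\r\n").split("\t", 1)[0]  # first column is the set name
        for line in handle
        if line.strip()  # ignore blank lines
    ]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical GMT content in which SET_A appears twice.
example = io.StringIO(
    "SET_A\tna\tAT1G01010\n"
    "SET_B\tna\tAT1G01020\n"
    "SET_A\tna\tAT1G01030\n"
)
dupes = duplicate_set_names(example)
print(dupes)
```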

Please feel free to email us at Xijin.Ge@sdstate.edu with any questions.
