geneMouse {GenomicFeatures}    R Documentation

UCSC Gene Predictions for mm9

Description

Gene coordinates and annotations for M. musculus from UCSC. Coordinates are relative to the mm9 build and are in nucleotides from the 5' end of the positive "+" strand. Each “gene”, or row in the dataset, corresponds to a unique combination of transcript (TSS, TES and exons) and coding sequence (start and end).

Usage

data(geneMouse)

Format

A data frame with 49409 observations on the following 12 variables.

name
The name of the gene.
chrom
The name of the chromosome the gene is located on.
strand
The strand the gene is coded on, either "+" or "-".
txStart
Transcription start site.
txEnd
Transcription stop site.
cdsStart
Start position of the coding sequence.
cdsEnd
End position of the coding sequence.
exonCount
The number of exons.
exonStarts
A comma-separated list of the exon start positions.
exonEnds
A comma-separated list of the exon stop positions.
proteinID
An ID for the protein produced; missing values are coded as NA.
alignID
Unique identifier of each gene and RNA alignment pair, apparently redundant with name.
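
Because exonStarts and exonEnds are stored as comma-separated strings, they usually need to be split into numeric vectors before use. The following is a minimal sketch on a made-up row, not on the real dataset: the toy values, the helper parse_coords, and the trailing comma in the strings are all assumptions for illustration.

```r
# Toy row with the documented column names; all values are hypothetical.
toy <- data.frame(
  name       = "hypotheticalGene.1",
  exonCount  = 3L,
  exonStarts = "100,300,500,",
  exonEnds   = "200,400,650,",
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one comma-separated coordinate string into
# an integer vector, dropping empty fields (e.g. from a trailing comma).
parse_coords <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)[[1]]
  as.integer(parts[nzchar(parts)])
}

starts <- parse_coords(toy$exonStarts[1])
ends   <- parse_coords(toy$exonEnds[1])

# Sanity check: the number of parsed positions should match exonCount.
stopifnot(length(starts) == toy$exonCount[1],
          length(ends)   == toy$exonCount[1])

# Pair each exon start with its stop position.
exons <- data.frame(start = starts, end = ends)
print(exons)
```

Whether an exon's length is end - start or end - start + 1 depends on the coordinate convention of the table, which this page does not state, so the sketch stops short of computing lengths.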

Details

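Coordinates are always measured along the positive strand, so txStart is less than or equal to txEnd regardless of strand. For genes coded on the negative strand, txStart is therefore really the transcription end site and txEnd the start site; the same holds for cdsStart and cdsEnd.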
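
The strand rule above can be applied with a simple conditional. This is a sketch on two made-up rows rather than the real data; the gene names, coordinates, and the new columns tss and tes are hypothetical.

```r
# Two toy genes, one per strand; all values are hypothetical.
toy <- data.frame(
  name    = c("geneA", "geneB"),
  strand  = c("+", "-"),
  txStart = c(1000L, 5000L),
  txEnd   = c(2000L, 7000L),
  stringsAsFactors = FALSE
)

# On "+" the biological start is txStart; on "-" it is txEnd.
toy$tss <- ifelse(toy$strand == "+", toy$txStart, toy$txEnd)
# The biological end is the opposite column in each case.
toy$tes <- ifelse(toy$strand == "+", toy$txEnd, toy$txStart)

print(toy[, c("name", "strand", "tss", "tes")])
```

The same pattern would apply to cdsStart and cdsEnd when the strand-aware start of the coding sequence is needed.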

Source

This table was taken directly from the knownGene table in the UCSC database for mm9; see http://genome.ucsc.edu/cgi-bin/hgTables and Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M, Haussler D. The UCSC Known Genes. Bioinformatics. 2006 May 1;22(9):1036-46.

Examples

data(geneMouse)
str(geneMouse)
transcripts(geneMouse)

[Package GenomicFeatures version 0.0.9 Index]