Search our database (version 4) of validated splicing mutations
Populate the search field with a sample mutation (ATM), a sample mutation (TP53), or a pair of variants (natural and cryptic) in linkage disequilibrium
(Optional) Enter a gene name and click "Populate fields" to fill in the range search fields for the specified gene
A peer-reviewed manuscript describing this resource is available through F1000Research
Shirley BC, Mucaki EJ, Rogan PK. Pan-cancer repository of validated natural and cryptic mRNA splicing mutations. F1000Research 2019, 7:1908 (https://doi.org/10.12688/f1000research.17204.3)
Information theory has been shown to accurately predict the impact of a mutation on mRNA splicing, and has been used to interpret coding and non-coding mutations that alter mRNA splicing in both common and rare diseases (Caminsky et al., 2014; Burke et al., 2018; Dos Santos et al., 2018; Yang et al., 2017; Caminsky et al., 2016; Mucaki et al., 2016; Peterlongo et al., 2014; Mucaki et al., 2013; Mucaki et al., 2011; Rogan et al., 2003; Rogan et al., 1998; Rogan and Schneider, 1995). Mucaki et al. (2016) describe an information theory-based framework for interpreting and prioritizing non-coding variants of uncertain significance, which has been applied in multiple studies of novel variants in cancer patients (Burke et al., 2018; Dos Santos et al., 2018; Caminsky et al., 2016; Mucaki et al., 2016). The information theory-based software used in these studies is available in the bioinformatics suite MutationForecaster (www.mutationforecaster.com). This suite includes the Shannon Pipeline, which can rapidly analyze millions of variants for their impact on mRNA splicing (Shirley et al., 2013), and Veridical, which validates Shannon Pipeline output by analyzing RNAseq BAM files from patients carrying the variant of interest for evidence of mutant splicing (Viner et al., 2014; Dorman et al., 2014).
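As an illustration of how information theory can quantify splice site strength, the sketch below uses the individual information formulation: each base b at position l of a site contributes 2 + log2 f(b, l) bits, where f(b, l) is the frequency of that base at that position among known sites. A site's strength Ri is the sum of these contributions, and a variant's predicted effect is ΔRi = Ri(mutant) - Ri(wild type). This is a minimal sketch under stated assumptions, not the Shannon Pipeline's implementation: the three-position frequency matrix is a toy example, the function names are hypothetical, the small-sample correction used in practice is omitted, and unobserved bases are given an assumed floor frequency of 0.001.

```python
import math

BASES = "ACGT"


def weight_matrix(freqs, floor=1e-3):
    """Convert per-position base frequencies into weights in bits.

    Each weight is 2 + log2 f(b, l). Bases never observed at a position get
    an assumed floor frequency so the logarithm stays finite. The small-sample
    correction used in practice is omitted in this sketch.
    """
    return [{b: 2.0 + math.log2(max(col[b], floor)) for b in BASES} for col in freqs]


def individual_information(matrix, seq):
    """Ri of a sequence: the sum of the weights of its bases, position by position."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match the matrix width")
    return sum(col[base] for col, base in zip(matrix, seq.upper()))


def delta_ri(matrix, wild_type, mutant):
    """Change in site strength; negative values mean the variant weakens the site."""
    return individual_information(matrix, mutant) - individual_information(matrix, wild_type)


# Hypothetical toy site: position 0 uninformative, position 1 an invariant G,
# position 2 either A or T.
matrix = weight_matrix([
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
    {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5},
])
print(individual_information(matrix, "AGT"))        # 3.0
print(round(delta_ri(matrix, "AGT", "ACT"), 2))      # -9.97
```

Replacing the invariant G in this toy site drops Ri by roughly 10 bits, the kind of large negative ΔRi that would flag a variant as likely to disrupt splicing.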
We examined variants from thousands of patients obtained through The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC). Using the Shannon Pipeline and Veridical, we analyzed over 209 million variants (>168 million TCGA variants in 33 cancer types and >41 million ICGC variants in 7 cancer types) and validated 341,486 variants for their direct impact on mRNA splicing. Users can query the resource for a single mutation or for a range of coordinates containing mutations. For each variant of interest, the resource reports the predicted change in splice site strength and the observed mutant splicing event (i.e., sequencing reads indicating cryptic splice site use). For some mutations, an image of the RNAseq data for the region of interest is also provided. Expression levels of the gene containing the mutation can be viewed for different tissues.
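Querying by a single mutation or by a range of coordinates can be pictured as lookups against a coordinate-sorted index. The sketch below is hypothetical: the Variant record, the VariantIndex class, and its methods are illustrative names, not the resource's actual schema or API, and the coordinates are assumed to be 1-based and inclusive. Keeping positions sorted per chromosome lets a range query find its bounds by binary search instead of scanning every variant.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; pos is assumed to be a 1-based coordinate."""
    chrom: str
    pos: int
    ref: str
    alt: str


class VariantIndex:
    """Hypothetical per-chromosome index supporting point and range queries."""

    def __init__(self, variants):
        self._by_chrom = {}
        for v in variants:
            self._by_chrom.setdefault(v.chrom, []).append(v)
        # Sort each chromosome's variants by position so bisect can be used.
        for vs in self._by_chrom.values():
            vs.sort(key=lambda v: v.pos)
        self._positions = {c: [v.pos for v in vs] for c, vs in self._by_chrom.items()}

    def query_range(self, chrom, start, end):
        """Return all variants on chrom with start <= pos <= end."""
        positions = self._positions.get(chrom, [])
        lo = bisect_left(positions, start)
        hi = bisect_right(positions, end)
        return self._by_chrom.get(chrom, [])[lo:hi]

    def query_point(self, chrom, pos, ref, alt):
        """Return True if this exact substitution is present in the index."""
        return Variant(chrom, pos, ref, alt) in self.query_range(chrom, pos, pos)


# Synthetic coordinates for illustration only.
idx = VariantIndex([
    Variant("chr1", 300, "G", "A"),
    Variant("chr1", 100, "C", "T"),
    Variant("chr2", 50, "A", "G"),
])
print([v.pos for v in idx.query_range("chr1", 90, 300)])  # [100, 300]
print(idx.query_point("chr1", 100, "C", "T"))             # True
```

A point query is treated here as a one-base range query followed by an exact match on the alleles, so both search modes share the same lookup path.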
The following table breaks down the unique splicing mutations found in each TCGA and ICGC tumor type. Variants are counted as unique on a per-tissue basis, and only variants that are novel or have a population frequency below 1% (dbSNP150) are counted:
Tumor Type | Tumor Description | # Unique Variants |
---|---|---|
TCGA-ACC | Adrenocortical Carcinoma | 1717 |
TCGA-LAML | Acute Myeloid Leukemia | 19503 |
TCGA-BLCA | Bladder Urothelial Carcinoma | 24181 |
TCGA-BRCA | Breast Invasive Carcinoma | 9865 |
TCGA-CESC | Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma | 25822 |
TCGA-CHOL | Cholangiocarcinoma | 9817 |
TCGA-COAD | Colon Adenocarcinoma | 7512 |
TCGA-DLBC | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma | 6036 |
TCGA-ESCA | Esophageal Carcinoma | 19361 |
TCGA-GBM | Glioblastoma Multiforme | 935 |
TCGA-HNSC | Head and Neck Squamous Cell Carcinoma | 2840 |
TCGA-KICH | Kidney Chromophobe | 26519 |
TCGA-KIRC | Kidney Renal Clear Cell Carcinoma | 6711 |
TCGA-KIRP | Kidney Renal Papillary Cell Carcinoma | 4892 |
TCGA-LGG | Brain Lower Grade Glioma | 1346 |
TCGA-LIHC | Liver Hepatocellular Carcinoma | 12461 |
TCGA-LUAD | Lung Adenocarcinoma | 18262 |
TCGA-LUSC | Lung Squamous Cell Carcinoma | 2628 |
TCGA-MESO | Mesothelioma | 303 |
TCGA-OV | Ovarian Serous Cystadenocarcinoma | 88136 |
TCGA-PAAD | Pancreatic Adenocarcinoma | 1585 |
TCGA-PCPG | Pheochromocytoma and Paraganglioma | 90 |
TCGA-PRAD | Prostate Adenocarcinoma | 944 |
TCGA-READ | Rectum Adenocarcinoma | 3083 |
TCGA-SARC | Sarcoma | 20024 |
TCGA-SKCM | Skin Cutaneous Melanoma | 12515 |
TCGA-STAD | Stomach Adenocarcinoma | 20245 |
TCGA-TGCT | Testicular Germ Cell Tumors | 467 |
TCGA-THCA | Thyroid Carcinoma | 56962 |
TCGA-THYM | Thymoma | 16599 |
TCGA-UCEC | Uterine Corpus Endometrial Carcinoma | 28524 |
TCGA-UCS | Uterine Carcinosarcoma | 10716 |
TCGA-UVM | Uveal Melanoma | 2498 |
ICGC-CLLE | Chronic Lymphocytic Leukemia | 2041 |
ICGC-ESAD | Esophageal Adenocarcinoma | 61 |
ICGC-LIRI | Liver Cancer | 2255 |
ICGC-MALY | Malignant Lymphoma | 2652 |
ICGC-OV | Ovarian Cancer | 2818 |
ICGC-PACA | Pancreatic Cancer Endocrine Neoplasms | 3182 |
ICGC-RECA | Renal Cell Cancer | 4255 |
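The counting rule stated before the table (each variant counted once per tumor type, keeping only variants that are novel or below 1% population frequency in dbSNP150) can be expressed as a short sketch. The input layout is an assumption: each record is a hypothetical (tumor_type, variant_key, population_af) tuple, with population_af set to None for novel variants absent from dbSNP. The variant keys in the example are synthetic.

```python
from collections import defaultdict


def count_unique_per_tissue(records, max_af=0.01):
    """Count distinct qualifying variants per tumor type.

    records: iterable of (tumor_type, variant_key, population_af) tuples,
    where population_af is None for novel variants (absent from dbSNP).
    A variant qualifies if it is novel or its frequency is below max_af.
    """
    unique = defaultdict(set)
    for tumor_type, variant_key, af in records:
        if af is None or af < max_af:
            # A set ignores repeats of the same variant within one tumor type.
            unique[tumor_type].add(variant_key)
    # Tumor types with no qualifying variants do not appear in the result.
    return {tumor_type: len(keys) for tumor_type, keys in unique.items()}


records = [
    ("TCGA-ACC", "chr1:100:C>T", None),   # novel: counted
    ("TCGA-ACC", "chr1:100:C>T", None),   # same variant, another patient: not recounted
    ("TCGA-ACC", "chr2:50:A>G", 0.05),    # 5% population frequency: excluded
    ("TCGA-UVM", "chr1:100:C>T", None),   # same variant, different tumor type: counted
    ("TCGA-UVM", "chr3:200:G>A", 0.002),  # rare (0.2%): counted
]
print(count_unique_per_tissue(records))  # {'TCGA-ACC': 1, 'TCGA-UVM': 2}
```

Because uniqueness is tracked per tumor type, a variant seen in several tissues contributes to each of their counts, so summing the table's column would overcount distinct variants overall.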
This resource is not intended to be a comprehensive database of all human splicing mutations, and the absence of a variant from the database should not be taken as evidence that the variant does not affect splicing. If the variant(s) you submitted are not present, there is no substitute for appropriate laboratory testing and/or analysis of potential splicing mutations with robust bioinformatic methods. We recommend subscribing to MutationForecaster to evaluate the variants in your genomic sequence for possible splicing effects.