Somatic mutational signatures (MSs) identified by genome sequencing play important roles in exploring the cause and development of cancer. Many such signatures have been identified so far, and some of them point to specific causes of cancer. However, a major bottleneck is that the potential meanings (i.e. carcinogenesis or biological functions) and contributing genes of most signatures remain unknown. Here, we present a computational framework, Gene Somatic Genome Pattern (GSGP), which can decipher the molecular mechanisms of MSs. More importantly, GSGP can, for the first time, process MSs derived from ribonucleic acid (RNA) sequencing, which greatly extends the applications of both MS analysis and RNA sequencing (RNAseq). GSGP analyses are consistent with previous reports and identify the etiologies of a number of novel signatures. Notably, we applied GSGP to RNAseq data and revealed an RNA-derived MS involved in deficient deoxyribonucleic acid mismatch repair and microsatellite instability in colorectal cancer. Researchers can perform customized GSGP analyses using the web tools or scripts we provide.
