Bioinformatic analyses of protein sequences play an important role in the discovery and subsequent safety assessment of insect control proteins in genetically modified (GM) crops. Owing to the rapid adoption of high-throughput sequencing methods over the last decade, the number of protein sequences in GenBank and other public databases has increased dramatically. Many of these protein sequences are the product of whole-genome sequencing efforts coupled with automated protein sequence prediction and annotation pipelines. Published genome sequencing studies provide a rich and expanding foundation of new source organisms and proteins for insect control or other desirable traits in GM products. However, data generated by automated pipelines can also confound regulatory safety assessments that employ bioinformatics. For the most part, this issue arises not from the underlying sequence itself but from its annotation or associated metadata, and from the downstream integration of those data into existing repositories. Observations made during bioinformatic safety assessments are described.
