Membrane protein classification is key to inferring the function of uncharacterized membrane proteins. To avoid time-consuming and expensive biochemical experiments in the wet lab, much research has focused on developing fast and reliable bioinformatics or computer modeling methods for membrane protein prediction. However, most of this research tends to incorporate as many types of protein data as possible, whereas in many cases the number of accessible protein data types is quite limited. To address this challenge, a channel-attention-adapted deep learning model that takes the position-specific scoring matrix (PSSM) as its only input and a simplified version without channel attention have been developed. They are named SE-BLTCNN and BLTCNN, respectively (abbreviations for “SE embedded BiLSTM-TextCNN” and “plain BiLSTM-TextCNN”). The basic idea is to embed the Squeeze-and-Excitation (SE) block into the architecture of the text convolutional neural network (textCNN) and combine the resulting architecture with a bidirectional long short-term memory (BiLSTM) layer. An ablation experiment is also conducted to verify the effectiveness of using BiLSTM to extract high-level features from the PSSM. On the benchmark sample set, BLTCNN achieves an average precision as high as 96.2% and is, to the best of our knowledge, the state-of-the-art membrane protein type predictor based solely on PSSM data; SE-BLTCNN is second-best among all compared methods, with a slightly lower average precision of 95.7%. In addition, empirical analysis pinpoints excessive zero-padding in the training examples as the major cause of the performance loss of SE-BLTCNN; on an adjusted sample set, it is confirmed that SE-BLTCNN outperforms BLTCNN once this cause is suppressed. The core code and dataset are available at https://github.com/Raymond-2017/membrane-protein-classifiction.
Copyright © 2022 Elsevier Ltd. All rights reserved.
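To make the channel-attention idea in the abstract concrete, the following is a minimal, dependency-free sketch of a generic Squeeze-and-Excitation gate applied to a (positions × channels) feature map, such as the output of a convolution over a PSSM. It is not the authors' implementation: the function name `se_block`, the toy weights `w1` and `w2`, the omission of biases, and the tiny 2 × 2 input are all assumptions chosen for illustration, and the abstract does not specify where inside the textCNN the block is placed.

```python
import math


def se_block(features, w1, w2):
    """Apply a Squeeze-and-Excitation gate to a (length x channels) feature map.

    features: list of rows, each a list of C channel values.
    w1: C x R weight matrix (reduction); w2: R x C weight matrix (expansion).
    Biases are omitted for brevity (an assumption of this sketch).
    """
    length = len(features)
    channels = len(features[0])
    reduced = len(w1[0])

    # Squeeze: global average pooling over positions, one value per channel.
    squeezed = [sum(row[c] for row in features) / length for c in range(channels)]

    # Excitation, step 1: bottleneck layer followed by ReLU.
    hidden = [
        max(0.0, sum(squeezed[c] * w1[c][r] for c in range(channels)))
        for r in range(reduced)
    ]

    # Excitation, step 2: expand back to C channels; sigmoid gives gates in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(hidden[r] * w2[r][c] for r in range(reduced))))
        for c in range(channels)
    ]

    # Scale: reweight each channel at every position by its gate.
    scaled = [[row[c] * gates[c] for c in range(channels)] for row in features]
    return scaled, gates


# Toy example with hypothetical values: 2 positions, 2 channels, bottleneck of 1.
features = [[1.0, 2.0], [3.0, 4.0]]
w1 = [[1.0], [0.0]]   # C x R
w2 = [[0.0, 1.0]]     # R x C
scaled, gates = se_block(features, w1, w2)
```

In a trained model the two weight matrices would be learned, and the block would normally be written as a layer in a deep learning framework rather than with plain lists. One property of this sketch is that the squeeze step averages over every position, so all-zero padded rows lower each channel mean; the abstract does not state whether this is the mechanism behind the zero-padding effect it reports.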