Features of Genetic Sigma Factors Learning Model Simulating Neural Network
Life Sciences - Bioinformatics
DOI:
https://doi.org/10.22376/ijlpr.2023.13.5.L267-L273
Keywords:
sigma factors, motif prediction, LSTM, DNA binding motif, pattern learning
Abstract
Sigma factors play a crucial role in gene regulation: they bind to RNA polymerase and help unwind the gene sequence by recognizing the transcription start motif pattern, a combination of the nucleobases adenine (A), cytosine (C), guanine (G), and thymine (T). Advances in DNA analysis help geneticists learn patterns from different perspectives in order to identify mutations in genetic structures, characterize new organisms ranging from unicellular to multicellular, and create new gene patterns that offer relief from hereditary diseases. To meet these challenging needs, the proposed research work aims to predict the DNA motif patterns of various sigma factors. The main objective is to create a novel method, named "Features of Genetic Sigma Factors Learning Model Simulating Neural Network", that predicts the prefix motif patterns of the major sigma factors sigma 70, sigma 32, sigma 24, sigma 19, and sigma 38 (σ70, σ32, σ24, σ19, and σ38). Each sigma factor has a significant function, such as vegetative growth for the development of nutrients. The novel idea in the proposed model is the generation of a dictionary of DNA motifs that mimics the n-grams of natural language processing. The model is trained on this DNA motif dictionary, which consists of positive and negative motif patterns. It is tested on an array of k-mer motif patterns taken from the whole E. coli bacterial genome, downloaded from the NCBI website (Escherichia coli str. K-12 substr. MG1655, complete genome, ACCESSION: NC_000913). In the reported experiments, the proposed model yielded 100% accuracy. The model's outcome is a set of patterns that can help experts in biological fields identify new gene patterns.
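The n-gram-style motif dictionary described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the k-mer length, the dictionary layout, or the numeric encoding, so the function names (kmers, build_motif_dictionary, encode), the choice of k, and the convention that index 0 marks an unknown k-mer are all assumptions made for illustration.

```python
from collections import Counter


def kmers(sequence, k):
    """Return all overlapping k-mers of a DNA string.

    Windows containing symbols other than A, C, G, T (for example N)
    are skipped. This mirrors character n-grams in text processing.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    return [
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    ]


def build_motif_dictionary(kmer_list):
    """Map each distinct k-mer to a positive integer index.

    More frequent k-mers get smaller indices; ties are broken
    alphabetically so the result is deterministic.
    Index 0 is left unused here (assumed to mean "unknown k-mer").
    """
    counts = Counter(kmer_list)
    ordered = sorted(counts, key=lambda m: (-counts[m], m))
    return {motif: index + 1 for index, motif in enumerate(ordered)}


def encode(kmer_list, vocab):
    """Turn a list of k-mers into integer indices; unknown k-mers map to 0."""
    return [vocab.get(motif, 0) for motif in kmer_list]


if __name__ == "__main__":
    # Toy fragment for illustration only, not data from the paper.
    fragment = "TTGACATATAAT"
    tokens = kmers(fragment, k=6)          # assumed k; the abstract gives none
    vocab = build_motif_dictionary(tokens)
    print(tokens)
    print(encode(tokens, vocab))
```

Integer sequences of this kind are a common input form for an embedding layer followed by an LSTM, which is one plausible way to feed a motif dictionary to a sequence model. Positive and negative motif labels would be supplied separately as training targets.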
Copyright (c) 2023 Sasikala S, Dr. Ratha Jeyalakshmi T

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.