Journal of Hebei Medical University ›› 2024, Vol. 45 ›› Issue (2): 165-171. doi: 10.3969/j.issn.1007-3205.2024.02.008


Integrated identification of the chemokine-related key genes underlying the progression of nonalcoholic steatohepatitis via bioinformatics and machine learning

  

  1. Department of Gastroenterology, Liuzhou People's Hospital Affiliated to Guangxi Medical University, Liuzhou 545006, China; 2. Department of Infectious Diseases, Liuzhou People's Hospital Affiliated to Guangxi Medical University, Liuzhou 545006, China

  • Online: 2024-02-25 Published: 2024-02-06

Abstract: Objective To identify, through an integrated bioinformatics and machine learning approach, the chemokine-related key genes underlying the progression of nonalcoholic steatohepatitis (NASH).
Methods The NASH dataset GSE49541 was downloaded from the Gene Expression Omnibus (GEO) public database, and differentially expressed genes (DEGs) were identified using RStudio software. Gene Ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were then performed. The DEGs were intersected with chemokine-related gene sets to identify the differentially expressed chemokine-related genes. Key genes were selected with two machine learning methods: LASSO regression and support vector machine-recursive feature elimination (SVM-RFE). A key gene interaction network was established via the GeneMANIA database. A nomogram prediction model based on the key genes was then constructed, and its performance was validated with the receiver operating characteristic (ROC) curve.
Results A total of 148 DEGs were identified. GO and KEGG analyses revealed that the DEGs were mainly enriched in fatty acid metabolic process, the chemokine signaling pathway, and the extracellular matrix. Four key genes (CCL19, CD24, ROBO1, and SLC12A2) were identified, and a key gene interaction network diagram was constructed. Based on the key genes, a NASH nomogram prediction model was established, with an area under the ROC curve (AUC) of 0.997 (95% confidence interval [CI]: 0.988-1.000).
Conclusion CCL19, CD24, ROBO1, and SLC12A2 may be closely related to the occurrence and development of NASH and are potential targets for its early diagnosis and precise treatment.


Key words: nonalcoholic steatohepatitis, bioinformatics, machine learning, chemokine
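
Two steps named in the Methods are simple enough to illustrate directly: intersecting the DEGs with a chemokine-related gene set, and summarising a prediction model with the area under the ROC curve. The following is a minimal Python sketch under stated assumptions, not the authors' code (the abstract reports that the analysis was run in RStudio). The function names `intersect_genes` and `roc_auc`, the placeholder genes `GENE_X` and `GENE_Y`, and the sample labels and scores are all hypothetical and chosen only for illustration. The AUC is computed as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as one half.

```python
from itertools import product


def intersect_genes(degs, gene_set):
    """Return the genes present in both lists, sorted alphabetically."""
    return sorted(set(degs) & set(gene_set))


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 for a case, 0 for a control; scores: model output per sample.
    A tie between a positive and a negative score counts as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical inputs: GENE_X and GENE_Y are placeholders, not real findings.
degs = ["CCL19", "CD24", "ROBO1", "SLC12A2", "GENE_X"]
chemokine_set = ["CCL19", "CD24", "ROBO1", "SLC12A2", "GENE_Y"]
candidates = intersect_genes(degs, chemokine_set)

# Hypothetical labels and model scores for six samples.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.35, 0.40, 0.30, 0.80, 0.90]
auc = roc_auc(labels, scores)

print(candidates)
print(round(auc, 3))
```

The pairwise definition is quadratic in the number of samples, which is acceptable for a small expression dataset; a rank-based formula would give the same value more efficiently on larger data.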