[Keywords]
[Abstract]
Objective: To identify pyroptosis-related genes in colon cancer (CC) tissues using a bioinformatics approach and to explore their relationship with patient prognosis, with the aim of providing new therapeutic targets for CC patients.
Methods: Gene expression data, transcriptome data, and clinical data of CC patients were downloaded from the TCGA and GEO databases. R software was used to extract the expression levels of pyroptosis-related genes from the TCGA transcriptome data, differentially expressed genes (DEGs) were identified, and a protein-protein interaction network of the DEGs was constructed. Univariate analysis and cluster analysis were used for subtyping, and survival was compared between the two subtypes to obtain prognosis-related genes. Lasso regression with cross-validation and optimization was then used to obtain the gene coefficients (Coef) and to construct a prognostic prediction model for CC. The risk score of each TCGA sample was calculated with this model, and the samples were divided into high- and low-risk groups at the median risk score. With the GEO samples as the validation cohort, survival analysis (Kaplan-Meier analysis), ROC curves, risk curves, PCA, and t-SNE analysis were performed on the TCGA and GEO samples separately. Univariate and multivariate analyses incorporating the model risk score were conducted to identify independent prognostic factors for CC patients. GO and KEGG analyses were performed for the high- and low-risk groups. Finally, ssGSEA was used to score immune cells and immune-related functions in each sample, and the scores were compared between the high- and low-risk groups.
Results: The expression of 52 pyroptosis-related genes was examined in CC and normal colon tissues, and 40 DEGs were identified. A 15-gene prognostic risk prediction model for CC was constructed by Cox regression and Lasso regression analysis, and the CC patients were divided into high- and low-risk groups, with a significant difference in survival between the two groups (P<0.001). The risk scores of the TCGA samples were calculated with the prediction model, and the resulting median risk score was used for validation in CC patients from the GEO database, where survival also differed significantly between the high- and low-risk groups (P=0.013). The risk score calculated by the prediction model was an independent prognostic factor for the survival of CC patients. GO enrichment analysis and KEGG enrichment analysis of the DEGs, together with ssGSEA, showed significantly reduced immune cell infiltration in the high-risk group.
Conclusion: A 15-gene prognostic risk prediction model for CC patients was constructed using a bioinformatics approach. These genes also play an important role in CC immunity.
[CLC number]
[Funding]
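The risk-scoring step summarized in the abstract (a linear combination of gene expression values weighted by the Lasso coefficients, followed by a median split into high- and low-risk groups) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the gene names `GENE_A` to `GENE_C`, the coefficient values, and the toy expression data are hypothetical placeholders and are not the 15-gene signature or any data from the study. The sketch also assumes that the median of the training cohort is reused as the cutoff for the validation cohort, and that a score strictly above the cutoff counts as high risk.

```python
from statistics import median

# Hypothetical coefficients: placeholders, NOT the study's 15-gene signature.
COEFS = {"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 0.125}


def risk_score(expression, coefs=COEFS):
    """Risk score as the sum of coefficient * expression over the model genes."""
    return sum(coef * expression[gene] for gene, coef in coefs.items())


def assign_groups(scores, cutoff):
    """Label a sample 'high' if its score is strictly above the cutoff, else 'low'."""
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}


# Toy training cohort (stands in for the TCGA samples; values are made up).
train = {
    "T1": {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 8.0},
    "T2": {"GENE_A": 4.0, "GENE_B": 0.0, "GENE_C": 0.0},
    "T3": {"GENE_A": 0.0, "GENE_B": 4.0, "GENE_C": 0.0},
    "T4": {"GENE_A": 6.0, "GENE_B": 0.0, "GENE_C": 8.0},
}
# Toy validation cohort (stands in for the GEO samples; values are made up).
valid = {
    "V1": {"GENE_A": 4.0, "GENE_B": 4.0, "GENE_C": 0.0},
    "V2": {"GENE_A": 2.0, "GENE_B": 0.0, "GENE_C": 8.0},
}

train_scores = {s: risk_score(e) for s, e in train.items()}
cutoff = median(train_scores.values())  # median risk score of the training cohort
train_groups = assign_groups(train_scores, cutoff)

# Assumption: the validation cohort is split at the training-cohort median.
valid_scores = {s: risk_score(e) for s, e in valid.items()}
valid_groups = assign_groups(valid_scores, cutoff)

print(cutoff, train_groups, valid_groups)
```

In a real analysis the resulting group labels would then be passed to a survival comparison such as a Kaplan-Meier analysis with a log-rank test; that step is omitted here to keep the sketch free of third-party dependencies.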