20 June 2025: Database Analysis
Machine Learning-Driven Discovery of TRIM Genes as Diagnostic Biomarkers for Idiopathic Pulmonary Fibrosis
Xiangfei Huang CEFG 1, Wen Yu BCE 1, Fuzhou Hua E 2, Aiping Wei E 1, Xifeng Wang AE 1*, Shibiao Chen AEG 1
DOI: 10.12659/MSM.948510
Med Sci Monit 2025; 31:e948510
Abstract
BACKGROUND: Idiopathic pulmonary fibrosis (IPF) is a progressive lung disease with limited effective treatments and significant challenges in early diagnosis. Identifying reliable biomarkers is crucial for improving diagnostic accuracy and patient outcomes.
MATERIAL AND METHODS: We analyzed TRIM family gene expression in IPF patients and healthy controls using GSE93606, GSE33566, and GSE38958 datasets. Consensus clustering and WGCNA identified IPF subtypes and hub genes. Machine learning models (RF, GLM, SVM, XGB) were built to identify key disease genes. A nomogram for clinical prediction was developed and validated. Peripheral blood samples from IPF patients and healthy controls were used to validate gene expression via qPCR.
RESULTS: TRIM family genes were significantly differentially expressed between IPF patients and healthy controls. Two distinct IPF subtypes (C1 and C2) were identified, each exhibiting unique biological functions and signaling pathways. The RF model outperformed other machine learning models, identifying TNIK, NCL, ROPN1L, MTR, and HNRNPH1 as key disease-characteristic genes. The nomogram demonstrated good predictive accuracy (AUC: 0.741, 95% CI: 0.556-0.897). qPCR validation confirmed increased expression of 4 genes in IPF patients, except for ROPN1L, which showed decreased expression.
CONCLUSIONS: This study identifies and validates TRIM family genes as potential biomarkers for IPF diagnosis using clinical samples. The findings support the integration of these biomarkers into diagnostic workflows, potentially enhancing early diagnosis and personalized treatment strategies for IPF patients. Further research is needed to explore the prognostic value and underlying mechanisms of these genes.
Keywords: biomarkers, machine learning, nomograms, Pulmonary Fibrosis, Tripartite Motif Proteins, Humans, idiopathic pulmonary fibrosis, Gene Expression Profiling, Male, Female, Case-Control Studies, Databases, Genetic
Introduction
Idiopathic pulmonary fibrosis (IPF) is a progressive, chronic, and ultimately fatal interstitial lung disease of unknown etiology that primarily affects elderly people [1]. Clinically, IPF is characterized by relentless scarring of the lung parenchyma, leading to irreversible decline in pulmonary function and debilitating symptoms such as chronic dry cough, fatigue, dyspnea, and chest discomfort [2]. From a public health perspective, IPF imposes a substantial burden: the global prevalence ranges from 2 to 8 per 10 000 individuals [3], and its incidence continues to rise with population aging. Economically, IPF contributes significantly to healthcare costs due to frequent hospitalizations, prolonged diagnostic delays, and long-term disease management [4]. Currently, only 2 antifibrotic drugs – pirfenidone and nintedanib – are approved for IPF treatment. While they can decelerate disease progression, they neither halt nor reverse fibrosis, and the prognosis remains poor [5]. These limitations underscore the urgent need for early diagnostic biomarkers and innovative therapeutic targets that can alter the disease trajectory.
Recent studies have highlighted the potential involvement of tripartite motif-containing (TRIM) proteins in fibrotic diseases [6–8]. As a large family of E3 ubiquitin ligases, TRIM proteins play critical roles in autophagy, apoptosis, innate immunity, and cellular homeostasis [9,10]. Intriguingly, several TRIM family members have been implicated in fibrosis through modulation of key pro-fibrotic pathways such as transforming growth factor-beta (TGF-β), nuclear factor kappa B (NF-κB), and Wnt/β-catenin signaling [11,12]. However, the precise roles of TRIM family proteins in the pathogenesis of IPF remain largely unexplored. Most existing research has focused on their function in tumors and immune responses [13,14], leaving a significant knowledge gap in understanding how these proteins contribute to lung fibrogenesis.
The present study systematically investigated the expression landscape of TRIM family genes in peripheral blood samples from patients with IPF. We utilized consensus clustering and weighted gene co-expression network analysis (WGCNA) to identify hub genes associated with IPF. Machine learning models, including random forest (RF), generalized linear models (GLM), support vector machines (SVM), and extreme gradient boosting (XGB), were employed to identify disease-characteristic genes. These core genes were subsequently integrated into a nomogram model for IPF prediction, with performance validated in independent datasets. Finally, their expression patterns were experimentally confirmed using quantitative PCR, and correlations with clinical parameters were analyzed.
This study identifies TRIM-related genes significantly associated with IPF through comprehensive bioinformatics and machine learning analyses. These candidate biomarkers hold the potential for early diagnosis and clinical prediction. The results offer a foundation for developing TRIM-focused diagnostic strategies and novel therapeutic interventions.
Data Collection
IDENTIFICATION AND CORRELATION ANALYSIS OF DIFFERENTIALLY-EXPRESSED TRIM FAMILY-RELATED GENES (DETGS):
Differentially-expressed genes (DEGs) were identified using the Wilcoxon rank-sum test in R, with a significance threshold of |log2FC| >1.0 and a
CONSENSUS CLUSTERING ANALYSIS OF PATIENTS WITH IPF BASED ON DETGS:
Using the DETGs and gene expression profile from GSE93606, we conducted consensus clustering analysis with the “ConsensusClusterPlus” package, resulting in the identification of 2 clusters. Principal component analysis (PCA) was then performed using the “limma” package to assess the effectiveness of clustering. Subsequently, gene set variation analysis (GSVA) was conducted with the “GSVA” package to discern differences in signal pathways and biological functions between these 2 clusters.
WGCNA:
Using the gene expression data from IPF patients and healthy donors in the training cohort, along with the 2 clusters obtained from consensus clustering analysis, we conducted WGCNA using the “WGCNA” package. This analysis aimed to identify hub genes associated with IPF. Subsequently, the hub genes identified in the training cohort and the 2 clusters were intersected and visualized using a Venn diagram.
CONSTRUCTION OF MACHINE LEARNING MODEL:
We constructed 4 machine learning models – RF, SVM, XGB, and GLM – based on the common hub genes identified through WGCNA. The complete parameter settings for each model were as follows:
RF was implemented using the caret package (method=“rf”) with default settings. SVM used a radial basis kernel (method=“svmRadial”) with probability outputs enabled (prob.model=TRUE), while the cost parameter (C) and sigma were automatically tuned by caret. XGB was configured with the DART booster (method=“xgbDART”) and default hyperparameters. GLM was implemented as logistic regression (method=“glm”, family=“binomial”) with default regularization.
We used the variable_importance() function from the DALEX package to evaluate and rank genes based on their relative contributions to predictive performance, yielding gene importance scores for all 4 models. The best-performing model was selected based on residual analysis and the area under the receiver operating characteristic (ROC) curve (AUC). Finally, the 5 most important genes identified in the best model were designated as disease-characteristic genes.
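The AUC criterion used for model selection can be illustrated with a short sketch. The Python function below (the study itself used R) computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function name roc_auc and the toy scores are hypothetical and are not taken from the study data.

```python
def roc_auc(labels, scores):
    """AUC by pairwise comparison (equivalent to the scaled Mann-Whitney U).

    labels: sequence of 0 (control) or 1 (case)
    scores: predicted probabilities or any monotone risk score
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # case ranked above control
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores (not study data)
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 case/control pairs are ordered correctly
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is why a model with a higher AUC on held-out samples is preferred.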
CONSTRUCTION AND VALIDATION OF NOMOGRAM:
A nomogram is a graphical calculating device that provides a simple way to obtain a numerical probability of a clinical event based on individual patient data. By incorporating relevant variables into a visual representation, nomograms help clinicians make informed decisions about patient treatment. Thus, the gene expression data of the disease-characteristic genes from GSE93606 were utilized to establish a nomogram using the “rms” package. A calibration curve was plotted to evaluate the relationship between the prediction model and the actual risk, while decision curve analysis (DCA) was used to assess the net benefit of the prediction model. Additionally, gene expression data and clinical information extracted from GSE33566 were used to test the effectiveness of the nomogram, and ROC analysis was performed to assess the accuracy of the constructed prediction model.
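The net benefit evaluated in DCA follows a standard formula: at a threshold probability pt, net benefit = TP/N − (FP/N) × pt/(1 − pt). The Python sketch below illustrates that calculation and the "treat all" reference strategy. It is a minimal illustration under assumed inputs, not the authors' R workflow; the function names and toy values are hypothetical.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # Odds of the threshold: how many false positives one true positive is worth
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: every patient is classified as positive."""
    return net_benefit(labels, [1.0] * len(labels), threshold)


# Toy example with made-up predictions (not study data)
toy_labels = [1, 1, 1, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
nb_model = net_benefit(toy_labels, toy_probs, 0.5)    # 2/6 - 1/6 = 1/6
nb_all = net_benefit_treat_all(toy_labels, 0.5)       # 3/6 - 3/6 = 0
```

The "treat none" strategy always has a net benefit of 0, so a model is clinically useful at a given threshold only if its net benefit exceeds both 0 and the "treat all" value.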
CORRELATION ANALYSIS BETWEEN DISEASE-CHARACTERISTIC GENES AND CLINICAL FEATURES:
Clinical information on IPF patients was extracted from GSE93606. The correlation between disease-characteristic genes and clinical features, such as the diffusing capacity of the lung for carbon monoxide (DLco), forced vital capacity (FVC), and survival time of IPF patients, was calculated using GraphPad Prism 9.0.
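The text does not state which correlation coefficient was computed in GraphPad Prism. As an illustration only, and assuming Pearson's r, the calculation for one gene against one clinical variable could be sketched in Python as follows; the function name and the toy values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Toy example: made-up expression values vs. a made-up clinical measure
toy_expression = [1.0, 2.0, 3.0, 4.0]
toy_clinical = [2.0, 4.0, 6.0, 8.0]
toy_r = pearson_r(toy_expression, toy_clinical)  # perfectly linear, so r is 1
```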
VALIDATION OF GENE EXPRESSION USING QPCR:
We further validated the expression levels of the selected genes (TNIK, NCL, ROPN1L, MTR, HNRNPH1) using qPCR in healthy controls and clinical IPF patients. Total RNA extraction from peripheral blood samples was followed by cDNA synthesis via reverse transcription. qPCR was carried out with gene-specific primers (Table 1), and the housekeeping gene GAPDH was used as an internal control. Expression levels were quantified using the 2^−ΔΔCt method. The study was approved by the Ethics Committee of the First Affiliated Hospital of Nanchang University (Ethics Approval No. IIT 2024356).
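The relative quantification step can be written out as a short worked example. The Python sketch below applies the standard ΔΔCt arithmetic, with GAPDH as the reference gene as stated above; the function name and the Ct values are hypothetical. In practice the control ΔCt is usually the mean across the control group.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-(ddCt) method.

    ct_ref_* are the Ct values of the reference (housekeeping) gene.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample      # normalize patient sample
    delta_ct_control = ct_target_control - ct_ref_control   # normalize control sample
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Made-up Ct values: target gene 24 vs. GAPDH 18 in a patient sample,
# target gene 26 vs. GAPDH 18 in a control sample.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
# dCt(patient) = 6, dCt(control) = 8, ddCt = -2, so the fold change is 2^2 = 4
```

A fold change above 1 indicates higher expression in the patient sample than in the control, and a value below 1 indicates lower expression.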
STATISTICAL ANALYSIS:
All statistical analyses were conducted using R software version 4.2.0 and GraphPad Prism 9.0. Machine-learning analysis was performed using RF, SVM, XGBoost (DART version), and GLM in R (v4.2.0). The dataset was randomly split into training (70%) and testing (30%) sets with stratified sampling to maintain class distribution. All models were trained using 5-fold cross-validation. RF used default parameters, SVM used a radial basis function kernel, XGBoost used the DART algorithm with dropout, and GLM was configured with binomial family for binary classification. Model performance was evaluated using the DALEX package. Probability thresholds were set at 0.5 for binary classification, and all analyses used a fixed random seed (123) for reproducibility. In the validation section, we used two-tailed independent-samples
Results
DIFFERENTIALLY-EXPRESSED TRIM FAMILY GENES (DETGS) IN IPF PATIENTS AND CORRELATION ANALYSIS:
Using the gene expression data from dataset GSE93606, we conducted a screening of differentially-expressed TRIM family genes (DETGs) between patients with IPF and healthy donors. Figure 1A presents the box plot illustrating the expression levels of these DETGs, highlighting 14 genes that exhibited statistically significant differences. Furthermore, the heat map in Figure 1B provides a visual representation of the expression patterns of these DETGs.
In our correlation analysis, we observed a positive correlation between the expression of TRIM27 and TRIM33 within the identified DETGs (r=0.244, P=0.0023). Interestingly, these 2 genes showed predominantly negative correlations with other DETGs, as depicted in Figure 1C.
CLUSTER ANALYSIS OF PATIENTS WITH IPF:
We conducted a consensus clustering analysis to identify subtypes of IPF based on the expression of DETGs. Using the gene expression data from GSE93606, patients with IPF were stratified into 2 distinct clusters, labeled C1 and C2 (Figure 2A). Additionally, PCA was performed to demonstrate the distinguishability of patients with IPF based on DETGs, as shown in Figure 2B. We observed significant differences in the expression of DETGs between clusters C1 and C2, illustrated through box plots and heat maps (Figure 2C, 2D), showing 12 genes that exhibited statistically significant differences.
DIFFERENCES IN SIGNAL PATHWAYS AND BIOLOGICAL FUNCTIONS OF C1 AND C2 CLUSTERS:
We used GSVA to compare the biological functions and signal pathways enriched in clusters C1 and C2. Figure 3A illustrates that the biological functions enriched in C1 were primarily related to T cell differentiation, neuronal signal transduction, and cardiac muscle cell fate commitment. Biological functions enriched in C2 were mainly associated with protein targeting to mitochondrion, methyltransferase complex, and cellular nitrogen compound catabolic process.
Regarding signal pathways, Figure 3B shows that pathways enriched in C1 include tyrosine metabolism, melanogenesis, and phosphatidylinositol signaling. In contrast, pathways enriched in C2 were ribosome, primary immunodeficiency, RNA polymerase, and intestinal immune network for IgA production.
IDENTIFICATION OF HUB GENES RELATED TO IPF AND SUBTYPES:
WGCNA was used to identify hub genes associated with IPF and its identified subtypes separately. Blue modules were selected in the IPF/Control group (Figure 4A), while turquoise modules were selected in the C1/C2 group based on the highest correlation coefficient and the most significant P values (Figure 4B). In our analysis, 18 common hub genes were identified between the 2 WGCNAs, as depicted in the Venn diagram (Figure 4C). These genes are likely associated with TRIM family gene expression patterns and the pathophysiology of IPF.
CONSTRUCTION AND ESTIMATION OF MACHINE LEARNING MODELS:
We constructed RF, GLM, SVM, and XGB models using the expression data of the common hub genes. For each model, 70% of the samples formed the training cohort and the remaining 30% formed the validation cohort. After evaluating the residuals, the reverse cumulative distributions of the residuals (Figure 5A), and the AUC of the ROC curve for each model, the RF model was selected for further analysis (Figure 5B).
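The 70/30 stratified split described in the Statistical Analysis section can be sketched as follows. This Python example only illustrates the idea of splitting samples within each class so that the class ratio is preserved. It does not reproduce the partition produced in R, since a seed of 123 yields different random draws in different languages. The function name and toy labels are hypothetical.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=123):
    """Return (train_indices, test_indices), splitting within each class."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, test = [], []
    for y in sorted(by_class):
        idxs = by_class[y][:]
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_fraction)
        train.extend(idxs[:cut])   # first part of the shuffled class goes to training
        test.extend(idxs[cut:])    # remainder goes to testing
    return sorted(train), sorted(test)


# Toy labels: 10 cases (1) and 10 controls (0); not study data
toy_labels = [1] * 10 + [0] * 10
train_idx, test_idx = stratified_split(toy_labels)
```

With 10 samples per class, each class contributes 7 samples to training and 3 to testing, so both subsets keep the original 1:1 class ratio.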
We used the variable importance function from the DALEX package to evaluate and rank genes based on their relative contributions to predictive performance, yielding gene importance scores for all 4 models. The results showed that TNIK, NCL, ROPN1L, MTR, and HNRNPH1 were the top 5 disease-associated genes (Figure 5C). ROC curves were generated to assess the diagnostic potential of these hub genes specifically for predicting IPF. The results demonstrated that all 5 hub genes had significant discriminatory power, indicating their relevance as biomarkers for IPF (Figure 5D). This highlights their potential role in understanding disease mechanisms and guiding therapeutic strategies.
THE NOMOGRAM OF THE DISEASE-CHARACTERISTIC GENES:
The 5 disease-characteristic genes obtained were used for the construction of the nomogram (Figure 6A). The calibration curve suggested a good consistency between the observed and predicted results (Figure 6B). Decision curve analysis (DCA) is a valuable tool for assessing the clinical utility of predictive models by comparing the net benefits of different decision-making strategies within specific threshold probabilities. Our results indicate that the model offers a greater net benefit compared to treating all patients or treating none, highlighting its practical application in clinical decision-making for IPF (Figure 6C). Additionally, the gene expression data extracted from the dataset GSE33566 were used to test the sensitivity and specificity of the constructed prediction model. The ROC analysis yielded an AUC of 0.741 (95% CI: 0.556–0.897, P=0.017), indicating good predictive capability of the model (Figure 6D). This suggests that our model can differentiate between IPF patients and healthy controls, providing a potentially valuable tool for early diagnosis.
DISEASE-CHARACTERISTIC GENES AND THE CLINICAL FEATURES:
Based on the clinical data of the patients with IPF extracted from GSE93606, we calculated the correlation between the 5 identified disease-characteristic genes and the clinical features of the patients. TNIK expression correlated significantly with DLco% predicted, FVC% predicted, and survival time, and HNRNPH1 expression correlated with FVC% predicted. The correlations between the other 3 disease-characteristic genes and the clinical features were not statistically significant (Table 2). These findings suggest that TNIK and HNRNPH1 may play more pivotal roles in the progression of IPF and could serve as potential prognostic biomarkers.
VALIDATION OF THE DISEASE-CHARACTERISTIC GENES IN IPF PATIENTS AND HEALTHY CONTROLS:
The expression levels of the 5 identified hub genes were validated in 2 external datasets, GSE33566 and GSE38958. The results showed that, except for ROPN1L, which exhibited decreased expression in IPF patients, the other 4 hub genes had increased expression (Figure 7A, 7B). Additionally, we collected CT imaging data and peripheral blood samples from patients clinically diagnosed with IPF and healthy controls. Compared to the healthy controls, CT images of IPF patients showed significant pulmonary tissue damage, characterized by reticular opacities, linear or nodular opacities, and irregularly shaped cystic spaces with well-defined walls, indicating areas of lost gas exchange function within the lungs (Figure 7C). Total RNA was extracted from peripheral blood samples, and qPCR experiments were conducted to validate the expression levels of the selected disease-characteristic genes. Consistent with the computational analysis, the qPCR results showed an upward trend in the expression of 4 genes, whereas ROPN1L exhibited decreased expression in IPF patients (Figure 7D). These qPCR validation results further support the potential of these genes as biomarkers for IPF diagnosis.
Discussion
IPF is a global health concern that places a substantial burden on healthcare systems, leading to increased expenditures both before and after diagnosis [17]. Although IPF has been recognized since the 1940s [18], its pathogenesis remains incompletely understood due to its complex and multifactorial nature. Accumulating evidence indicates that repeated micro-injuries to the alveolar epithelium serve as key initiating events in the disease process [19]. These injuries elicit aberrant wound-healing responses, which activate resident fibroblasts [20] and promote the transdifferentiation of epithelial cells into mesenchymal cells via epithelial-mesenchymal transition (EMT) [21,22]. A hallmark of IPF is the accumulation of activated myofibroblasts, primarily driven by TGF-β signaling. This pathway not only facilitates fibroblast-to-myofibroblast differentiation and excessive extracellular matrix (ECM) deposition through EMT [23], but also expands the myofibroblast population by recruiting and activating other cellular sources, such as endothelial cells undergoing endothelial-to-mesenchymal transition (EndoMT) and circulating fibrocytes [24].
In addition, dysregulated TGF-β signaling interacts with other key pathways – including Wnt/β-catenin [25,26], PI3K/AKT [27–29], and MAPK [30,31] – to form a complex signaling network that perpetuates fibrogenesis. Despite major advances in understanding these molecular pathways, the precise mechanisms that govern the onset and progression of IPF remain elusive. As a result, there is an urgent need to identify reliable molecular biomarkers that can facilitate early diagnosis, enable risk stratification, and guide therapeutic decision-making.
With the advent of artificial intelligence and the use of big data, establishing machine learning models for disease diagnosis and prognosis prediction has emerged as a promising direction [32]. Over the years, researchers have identified various diagnostic and prognostic biomarkers for IPF, including fibroblast foci [33], serum lipid metabolites [34], plasma matrix metalloproteinase [35], and peripheral blood transcriptomic signatures [15]. However, there is still a lack of robust and easily accessible biomarkers. Given the significant roles of TRIM family genes in IPF [11], their screening as characteristic genes holds promise. Zhou et al [36] recently analyzed TRIM family gene profiles in bronchoalveolar lavage (BAL) cells, identifying 4 DETGs – TRIM7, MEFV, TRIM45, and TRIM47 – that were used to establish a prognostic signature.
To improve clinical applicability, we focused on TRIM gene expression in peripheral blood. Using consensus clustering based on differentially expressed TRIM genes (DETGs), we classified samples into 2 molecular subtypes. We then used weighted gene co-expression network analysis (WGCNA) to identify the gene modules most closely associated with these subtypes, which likely reflect TRIM gene expression patterns. In parallel, WGCNA was also applied to differentiate IPF samples from controls. The intersecting genes from both analyses may therefore be central regulators of IPF that are also linked to TRIM family gene expression. Subsequently, we applied machine learning algorithms to identify the 5 core genes with the highest importance scores, which were used to construct a nomogram predictive of IPF. External validation using an independent dataset confirmed the model’s strong predictive power, highlighting its potential utility for early screening and diagnosis.
Furthermore, we examined correlations between the 5 core genes and clinical characteristics to evaluate their prognostic value. TNIK expression was significantly correlated with diffusion capacity for carbon monoxide (DLco% predicted), forced vital capacity (FVC% predicted), and overall survival time in patients with IPF. HNRNPH1 expression was also significantly correlated with FVC% predicted. These findings suggest that TNIK and HNRNPH1, among the identified genes, play particularly important roles in IPF progression and clinical outcomes.
Among the 5 identified genes, TNIK has been the most extensively studied in the context of IPF. Recently, Ren et al [37] identified TNIK as a promising target for IPF treatment using AI-driven approaches. Their study demonstrated that a TNIK inhibitor not only mitigated bleomycin (BLM)-induced pulmonary fibrosis but also reduced lipopolysaccharide (LPS)-induced lung inflammation in mice. The underlying mechanism may involve the inhibition of EMT and fibroblast-to-myofibroblast transition (FMT) signaling. Notably, they reported that the TNIK inhibitor INS018_055 is currently being evaluated in phase II clinical trials for IPF [38].
Nucleolin, encoded by the NCL gene, is a protein that has been implicated in promoting the proliferation and migration of heat-denatured human dermal fibroblasts, likely by enhancing TGF-β signaling [39]. Additionally, nucleolin has been reported to mediate the increased synthesis of collagen prolyl 4-hydroxylase (C-P4H) in human HT1080 fibroblasts, a key enzyme in collagen synthesis [40]. Nucleolin has been shown to facilitate the proliferation and fibrosis of mesangial cells in a diabetic nephropathy model [41], and it is also involved in silica-induced pulmonary fibrosis [42]. These findings collectively suggest that nucleolin plays a significant role in fibrotic processes across different cell types and pathological conditions.
Regarding HNRNPH1, while direct evidence linking it to fibrosis is lacking, Milosevic et al [43] reported a rapid increase in its cytosolic localization alongside a decrease in nuclear localization in A549, a human lung adenocarcinoma cell line with alveolar epithelial characteristics, in response to TGF-β1. This observation suggests that HNRNPH1 may play a role in TGF-β signaling, but further validation is needed to confirm this hypothesis. Additionally, the roles of MTR and ROPN1L in IPF remain largely unknown, warranting further investigation.
Despite these findings, several limitations should be acknowledged. Although our analyses suggest associations between the identified hub genes and TRIM family expression patterns, the specific regulatory relationships remain undefined. TRIM proteins typically function as E3 ubiquitin ligases, mediating protein degradation through the ubiquitin-proteasome system (UPS) [44]. Thus, changes in the expression of the identified genes may result from indirect regulatory effects rather than direct interactions. Further studies are needed to elucidate these molecular mechanisms. In addition, the use of GEO datasets entails inherent limitations, including relatively small sample sizes and the lack of detailed clinical information such as comorbidities and longitudinal follow-up, which restricts comprehensive interpretation. Although peripheral blood is an accessible sample type for biomarker screening, it may not fully reflect gene expression changes occurring within lung tissue – the primary site of fibrosis. Further mechanistic investigations are also essential to explore the therapeutic potential of these candidate genes in IPF.
Conclusions
This study provides evidence that 5 TRIM family-related genes – TNIK, NCL, ROPN1L, MTR, and HNRNPH1 – are promising biomarkers for early diagnosis of IPF. The machine learning models developed in this study, particularly the random forest model, demonstrated strong diagnostic performance, offering the potential for improved IPF detection in clinical practice. The nomogram based on these biomarkers presents a novel and non-invasive approach to assist clinicians in the early diagnosis of IPF, potentially leading to better patient outcomes through timely intervention.
The clinical application of these findings could significantly enhance the accuracy of IPF screening and improve early detection, enabling more effective treatment of the disease. While peripheral blood biomarkers hold great promise, further validation in larger, more diverse patient populations is essential to confirm their diagnostic utility and applicability across different stages of IPF. Additionally, investigating the expression of these biomarkers in lung tissue will provide a better understanding of their relevance in the fibrotic processes occurring in the lungs.
Future research should focus on exploring the underlying mechanisms through which these TRIM family-related genes influence IPF progression and evaluating their prognostic value in predicting disease outcomes. Longitudinal studies will be crucial to assess the potential of these biomarkers in monitoring disease progression. Furthermore, combining these biomarkers with other diagnostic tools could offer more comprehensive insights into the early detection and management of IPF. Ultimately, translating these findings into clinical practice may provide valuable tools for personalized treatment strategies and improved patient care in IPF.
Figures
Figure 1. Differential expression of TRIM family genes (DETGs) between patients with IPF and healthy donors, and correlations among DETGs. (A) Box plot illustrating the expression levels of DETGs. (B) Heat map visualizing the expression patterns of DETGs. (C) Correlations between each DETG. * P≤0.05; ** P≤0.01; *** P≤0.001. The figures were created with R software version 4.2.0.
Figure 2. Clustering of patients with IPF based on DETG expression. (A) Clustering of patients with IPF into distinct clusters C1 and C2. (B) Principal component analysis demonstrates the effectiveness of clustering. (C) Box plot illustrating the expression of DETGs in clusters C1 and C2. (D) Heat map visualizing the expression patterns of DETGs in clusters C1 and C2. * P≤0.05; ** P≤0.01; *** P≤0.001. The figures were created with R software version 4.2.0.
Figure 3. Enrichment analysis of signal pathways and biological functions between clusters C1 and C2. (A) Enriched biological functions in clusters C1 and C2. (B) Enriched signal pathways in clusters C1 and C2. The figures were created with R software version 4.2.0.
Figure 4. Identification of hub genes related to IPF and subtypes. (A) WGCNA of control and IPF patients. (B) WGCNA of subtypes C1 and C2. (C) There were 18 common hub genes identified using the Venn diagram. The figures were created with R software version 4.2.0.
Figure 5. Construction and validation of machine learning models (RF, GLM, SVM, and XGB) and evaluation of gene importance. (A) Residuals of the 4 models. (B) Evaluation of machine learning models using ROC curves. (C) Importance scores of disease-characteristic genes in the RF model. (D) ROC curve of HNRNPH1, MTR, ROPN1L, NCL, and TNIK. The figures were created with R software version 4.2.0 and GraphPad Prism 9.0.
Figure 6. Construction and assessment of the nomogram model. (A) Construction of a nomogram model with disease-characteristic genes. (B) Calibration curve of the nomogram model. (C) Decision curve analysis of the nomogram model. (D) Evaluation of the nomogram model using ROC curves (AUC: 0.741, 95% CI: 0.556–0.897). The figures were created with R software version 4.2.0.
Figure 7. Validation of hub genes in external datasets and clinical samples. (A) The expression levels of the hub genes in GSE33566. (B) The expression levels of the hub genes in GSE38958. (C) The representative CT images of healthy controls and IPF patients. (D) The relative mRNA expression of disease-characteristic genes. n=3 in each group. Data are presented as the mean±SD. Statistical significance was determined using a two-tailed t test. * P<0.05, ** P<0.01, *** P<0.001. The figures were created with GraphPad Prism 9.0.
References
1. Podolanczuk AJ, Thomson CC, Remy-Jardin M, Idiopathic pulmonary fibrosis: State of the art for 2023: Eur Respir J, 2023; 61(4); 2200957
2. Bajwah S, Higginson IJ, Ross JR, Specialist palliative care is more than drugs: A retrospective study of ILD patients: Lung, 2012; 190(2); 215-20
3. Bonella F, Spagnolo P, Ryerson C, Current and future treatment landscape for idiopathic pulmonary fibrosis: Drugs, 2023; 83(17); 1581-93
4. Cox IA, de Graaff B, Ahmed H, The economic burden of idiopathic pulmonary fibrosis in Australia: A cost of illness study: Eur J Health Econ, 2023; 24(7); 1121-39
5. Maher TM, Interstitial lung disease: A review: JAMA, 2024; 331(19); 1655-65
6. Jian J, Liu Y, Zheng Q, The E3 ubiquitin ligase TRIM39 modulates renal fibrosis induced by unilateral ureteral obstruction through regulating proteasomal degradation of PRDX3: Cell Death Discov, 2024; 10(1); 17
7. McNab FW, Rajsbaum R, Stoye JP, O’Garra A, Tripartite-motif proteins and innate immune regulation: Curr Opin Immunol, 2011; 23(1); 46-56
8. Hatakeyama S, TRIM family proteins: Roles in autophagy, immunity, and carcinogenesis: Trends Biochem Sci, 2017; 42(4); 297-311
9. Jiang T, Xia Y, Li Y, TRIM29 promotes antitumor immunity through enhancing IGF2BP1 ubiquitination and subsequent PD-L1 downregulation in gastric cancer: Cancer Lett, 2024; 581; 216510
10. Ahsan N, Shariq M, Surolia A, Multipronged regulation of autophagy and apoptosis: emerging role of TRIM proteins: Cell Mol Biol Lett, 2024; 29(1); 13
11. Huang X, Yu W, Wei A, Beyond tumors: The pivotal role of TRIM proteins in chronic non-tumor lung diseases: J Inflamm Res, 2025; 18; 1899-910
12. Li Q, Yan J, Mao A-P, Tripartite motif 8 (TRIM8) modulates TNFα- and IL-1β-triggered NF-κB activation by targeting TAK1 for K63-linked polyubiquitination: Proc Natl Acad Sci USA, 2011; 108(48); 19341-46
13. Bai X, Tang J, TRIM proteins in breast cancer: Function and mechanism: Biochem Biophys Res Commun, 2023; 640; 26-31
14. Vadon C, Magiera MM, Cimarelli A, TRIM proteins and antiviral microtubule reorganization: A novel component in innate immune responses?: Viruses, 2024; 16(8); 1328
15. Yang IV, Luna LG, Cotter J, The peripheral blood transcriptome identifies the presence and extent of disease in idiopathic pulmonary fibrosis: PLoS One, 2012; 7(6); e37708
16. Prasse A, Binder H, Schupp JC, BAL cell gene expression is indicative of outcome and airway basal cell involvement in idiopathic pulmonary fibrosis: Am J Respir Crit Care Med, 2019; 199(5); 622-30
17. Tarride JE, Hopkins RB, Burke N, Clinical and economic burden of idiopathic pulmonary fibrosis in Quebec, Canada: Clinicoecon Outcomes Res, 2018; 10; 127-37
18. Robbins LL, Idiopathic pulmonary fibrosis; Roentgenologic findings: Radiology, 1948; 51(4); 459-67
19. Spagnolo P, Kropski JA, Jones MG, Idiopathic pulmonary fibrosis: Disease mechanisms and drug development: Pharmacol Ther, 2021; 222; 107798
20. Chanda D, Otoupalova E, Smith SR, Developmental pathways in the pathogenesis of lung fibrosis: Mol Aspects Med, 2019; 65; 56-69
21. Phan THG, Paliogiannis P, Nasrallah GK, Emerging cellular and molecular determinants of idiopathic pulmonary fibrosis: Cell Mol Life Sci, 2021; 78(5); 2031-57
22. Goldmann T, Zissel G, Watz H, Human alveolar epithelial cells type II are capable of TGFβ-dependent epithelial-mesenchymal-transition and collagen-synthesis: Respir Res, 2018; 19(1); 138
23. Willis BC, Borok Z, TGF-beta-induced EMT: mechanisms and implications for fibrotic lung disease: Am J Physiol Lung Cell Mol Physiol, 2007; 293(3); L525-L34
24. Ortiz-Zapater E, Signes-Costa J, Montero P, Roger I, Lung fibrosis and fibrosis in the lungs: Is it all about myofibroblasts?: Biomedicines, 2022; 10(6); 1423
25. Shi C, Chen X, Yin W, Wnt8b regulates myofibroblast differentiation of lung-resident mesenchymal stem cells via the activation of Wnt/β-catenin signaling in pulmonary fibrogenesis: Differentiation, 2022; 125; 35-44
26. Wang J, Li K, Hao D, Pulmonary fibrosis: pathogenesis and therapeutic strategies: MedComm (2020), 2024; 5(10); e744
27. Wang J, Hu K, Cai X, Targeting PI3K/AKT signaling for treatment of idiopathic pulmonary fibrosis: Acta Pharm Sin B, 2022; 12(1); 18-32
28. Pan L, Cheng Y, Yang W, Nintedanib ameliorates bleomycin-induced pulmonary fibrosis, inflammation, apoptosis, and oxidative stress by modulating PI3K/Akt/mTOR pathway in mice: Inflammation, 2023; 46(4); 1531-42
29. Huang G, Yang X, Yu Q, Overexpression of STX11 alleviates pulmonary fibrosis by inhibiting fibroblast activation via the PI3K/AKT/mTOR pathway: Signal Transduct Target Ther, 2024; 9(1); 306
30. Lee JH, Massagué J, TGF-β in developmental and fibrogenic EMTs: Semin Cancer Biol, 2022; 86(Pt 2); 136-45
31. Huang L, Yang X, Feng Y, ShaShen-MaiDong decoction attenuates bleomycin-induced pulmonary fibrosis by inhibiting TGF-β/smad3, AKT/MAPK, and YAP/TAZ pathways: J Ethnopharmacol, 2025; 337(Pt 1); 118755
32. Scott IA, Zuccon G, The new paradigm in machine learning – foundation models, large language models and beyond: A primer for physicians: Intern Med J, 2024; 54(5); 705-15
33. Mäkelä K, Mäyränpää MI, Sihvo HK, Artificial intelligence identifies inflammation and confirms fibroblast foci as prognostic tissue biomarkers in idiopathic pulmonary fibrosis: Hum Pathol, 2021; 107; 58-68
34. Yang XH, Wang FF, Chi XS, Disturbance of serum lipid metabolites and potential biomarkers in the bleomycin model of pulmonary fibrosis in young mice: BMC Pulm Med, 2022; 22(1); 176
35. Rosas IO, Richards TJ, Konishi K, MMP1 and MMP7 as potential peripheral blood biomarkers in idiopathic pulmonary fibrosis: PLoS Med, 2008; 5(4); e93
36. Zhou M, Ouyang J, Zhang G, Zhu X, Prognostic value of tripartite motif (TRIM) family gene signature from bronchoalveolar lavage cells in idiopathic pulmonary fibrosis: BMC Pulm Med, 2022; 22(1); 467
37. Ren F, Aliper A, Chen J, A small-molecule TNIK inhibitor targets fibrosis in preclinical and clinical models: Nat Biotechnol, 2024; 43(1); 63-75
38. Aladinskiy V, Kruse C, Qin L, Discovery of Bis-imidazolecarboxamide derivatives as novel, potent, and selective TNIK inhibitors for the treatment of idiopathic pulmonary fibrosis: J Med Chem, 2024; 67(21); 19121-42
39. Jiang B, Li Y, Liang P, Nucleolin enhances the proliferation and migration of heat-denatured human dermal fibroblasts: Wound Repair Regen, 2015; 23(6); 807-18
40. Fähling M, Mrowka R, Steege A, Translational control of collagen prolyl 4-hydroxylase-alpha(I) gene expression under hypoxia: J Biol Chem, 2006; 281(36); 26089-101
41. Wang S, Chen X, Wang M, Long non-coding RNA CYP4B1-PS1-001 inhibits proliferation and fibrosis in diabetic nephropathy by interacting with nucleolin: Cell Physiol Biochem, 2018; 49(6); 2174-87
42. Zhou Q, Guan Y, Hou R, PolyG mitigates silica-induced pulmonary fibrosis by inhibiting nucleolin and regulating DNA damage repair pathway: Biomed Pharmacother, 2020; 125; 109953
43. Milosevic J, Bulau P, Mortz E, Eickelberg O, Subcellular fractionation of TGF-beta1-stimulated lung epithelial cells: A novel proteomic approach for identifying signaling intermediates: Proteomics, 2009; 9(5); 1230-40
44. Kiss L, Rhinesmith T, Luptak J, Trim-Away ubiquitinates and degrades lysine-less and N-terminally acetylated substrates: Nat Commun, 2023; 14(1); 2160
Figures
Figure 1. Differential expression of TRIM family genes (DETGs) between patients with IPF and healthy donors, and correlations among DETGs. (A) Box plot illustrating the expression levels of DETGs. (B) Heat map visualizing the expression patterns of DETGs. (C) Correlations between each DETG. * P≤0.05; ** P≤0.01; *** P≤0.001. The figures were created with R software version 4.2.0.
Figure 2. Clustering of patients with IPF based on DETG expression. (A) Clustering of patients with IPF into distinct clusters C1 and C2. (B) Principal component analysis demonstrates the effectiveness of clustering. (C) Box plot illustrating the expression of DETGs in clusters C1 and C2. (D) Heat map visualizing the expression patterns of DETGs in clusters C1 and C2. * P≤0.05; ** P≤0.01; *** P≤0.001. The figures were created with R software version 4.2.0.
Figure 3. Enrichment analysis of signal pathways and biological functions between clusters C1 and C2. (A) Enriched biological functions in clusters C1 and C2. (B) Enriched signal pathways in clusters C1 and C2. The figures were created with R software version 4.2.0.
Figure 4. Identification of hub genes related to IPF and subtypes. (A) WGCNA of control and IPF patients. (B) WGCNA of subtypes C1 and C2. (C) There were 18 common hub genes identified using the Venn diagram. The figures were created with R software version 4.2.0.
Figure 5. Construction and validation of machine learning models (RF, GLM, SVM, and XGB) and evaluation of gene importance. (A) Residuals of the 4 models. (B) Evaluation of machine learning models using ROC curves. (C) Importance scores of disease-characteristic genes in the RF model. (D) ROC curve of HNRNPH1, MTR, ROPN1L, NCL, and TNIK. The figures were created with R software version 4.2.0 and GraphPad Prism 9.0.
Figure 6. Construction and assessment of the nomogram model. (A) Construction of a nomogram model with disease-characteristic genes. (B) Calibration curve of the nomogram model. (C) Decision curve analysis of the nomogram model. (D) Evaluation of the nomogram model using ROC curves (AUC: 0.741, 95% CI: 0.556–0.897). The figures were created with R software version 4.2.0.
Figure 7. Validation of hub genes in external datasets and clinical samples. (A) The expression levels of the hub genes in GSE33566. (B) The expression levels of the hub genes in GSE38958. (C) The representative CT images of healthy controls and IPF patients. (D) The relative mRNA expression of disease-characteristic genes. n=3 in each group. Data are presented as the mean±SD. Statistical significance was determined using a two-tailed t test. * P<0.05, ** P<0.01, *** P<0.001. The figures were created with GraphPad Prism 9.0.