10 October 2025: Special Reports
Machine Learning-Based Pathomics Signature for Perineural Invasion in Colorectal Cancer
Tianyi Pu ABE 1,2, Jiazheng Sun CD 1, Jian Yue CF 3, Zhi Zhang BD 2*, Hongzhong Li FG 1, Guosheng Ren FG 1
DOI: 10.12659/MSM.951110
Med Sci Monit 2025; 31:e951110
Abstract
BACKGROUND: Perineural invasion (PNI) is strongly associated with poor clinical outcomes in colorectal cancer (CRC). However, no machine learning diagnostic model based on pathomics has been established for PNI detection in CRC. To address this issue, we sought to construct a predictive model for PNI grounded in pathological features to enhance diagnostic efficiency.
MATERIAL AND METHODS: We analyzed hematoxylin and eosin–stained histopathological slides from the CRC tissues retrospectively. Segmentation of the acquired images was conducted via CellProfiler, an automated pipeline supporting the extraction of morphological features. To optimize feature selection, we applied the LASSO algorithm, followed by multiple machine learning models to develop diagnostic classifiers for PNI. Furthermore, we investigated the clinicopathological significance of PNI, including its association with T stage, lymph node metastasis, lymphovascular invasion, and molecular biomarkers.
RESULTS: We used 430 CRC surgical resection slides for training, testing, and external validation. A total of 615 histopathological features were extracted, and 10 of them were screened by LASSO to construct diagnostic models for PNI. The models demonstrated robust predictive performance across all cohorts. LightGBM achieved the highest diagnostic accuracy, yielding AUCs of 0.996 (95% CI: 0.991-1.000, training), 0.935 (95% CI: 0.888-0.978, testing), and 0.918 (95% CI: 0.861-0.967, external validation). Patients with CRC with PNI exhibited higher T stage, increased lymph node metastasis, and more frequent lymphovascular invasion.
CONCLUSIONS: The LightGBM model, based on histopathological features, can improve the diagnostic efficiency of PNI. CRC with PNI is associated with poor prognosis.
Keywords: Artificial Intelligence, Colorectal Neoplasms, Nerve Fibers, Pathology, Humans, machine learning, Male, Female, Neoplasm Invasiveness, Middle Aged, Retrospective Studies, Aged, Lymphatic Metastasis, Algorithms, Peripheral Nerves, Neoplasm Staging
Introduction
Recent data show that colorectal cancer (CRC) has one of the highest incidence rates among cancers [1]. Notably, increasing attention has been directed toward perineural invasion (PNI) – a histopathological feature characterized by tumor cell infiltration, encirclement, or progression along nerve fibers – due to its clinical implications in the progression of CRC [2]. PNI correlates with increased risks of tumor recurrence and metastasis (particularly in head and neck, prostate, and pancreatic cancers) [3]. PNI involvement can indicate a need for more aggressive treatment, such as extended surgical margins or adjuvant chemoradiotherapy [4].
Histopathological examination remains the criterion standard for diagnosing PNI [5]. According to the current diagnostic criteria, PNI is pathologically described as either cancer encircling greater than or equal to one-third of the nerve circumference, or neoplastic infiltration into the endoneurium, with histologically confirmed malignant cells within the perineurial sheath [6]. However, pathological assessment of PNI presents notable diagnostic challenges due to the labor-intensive microscopic evaluation required and its inherent susceptibility to false-negative interpretations.
With recent advancements in high-resolution whole-slide imaging technology and declining costs for digital data storage, the comprehensive digitization of histopathological specimens has become feasible. Within this context, the emerging field of pathomics, which encompasses large-scale computational derivation of measurable traits in histopathological whole-slide scans followed by analysis using advanced algorithms for diagnostic or prognostic evaluation, has gained significant traction in precision medicine [7,8]. However, machine learning (ML) models based on pathomics features for predicting PNI in CRC remain unexplored to date. This study aimed to extract digital pathological features from hematoxylin and eosin (H&E)-stained pathological slides and use ML algorithms to establish diagnostic models of PNI that are able to increase the diagnostic efficiency and offer precise real-time pathological decision support.
Material and Methods
STUDY POPULATION:
This retrospective study was approved by the Ethics Committee of Chongqing Hospital of Jiangsu Province Hospital (Approval No. 2025009). We enrolled all patients who underwent surgical resection for CRC and had complete clinicopathological data. Cases with unavailable slides or suboptimal staining quality were excluded. Pathological slides from January 2020 to May 2025 were used for model training and internal testing, while those from January 2018 to December 2019 served for external validation. Figure 1 shows a schematic representation of the analytical procedures employed.
ACQUISITION OF IMAGES:
H&E-stained histological sections were generated from formalin-fixed paraffin-embedded tissues and scanned at 40× resolution to generate whole-slide images in .svs format. Two experienced gastrointestinal pathologists independently assessed PNI. Cases with disagreement underwent consensus review by both pathologists. Using QuPath (v0.5.1), regions of interest were annotated based on the following criteria: (1) exclusion of non-tissue (white background) areas, (2) optimal staining quality (defined by high transparency, distinct nuclear-cytoplasmic contrast, and absence of artifacts), and (3) representation of tumor histology. These regions of interest were then cropped into non-overlapping 512×512-pixel patches (PNG format) for downstream computational analysis. Quality control was performed by manually selecting 10 representative patches of interest per slide from the cropped images. The patch selection criteria were as follows: optimal image fidelity, presence of diagnostic tumor cells, and, for slides exhibiting PNI, inclusion of unequivocal images capturing neurotropic involvement.
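As a rough illustration of the tiling step described above, the following Python sketch enumerates the top-left corners of non-overlapping 512×512-pixel patches inside a rectangular region of interest. The function name `tile_origins`, the simplification to a rectangular region, and the choice to discard partial tiles at the border are assumptions made for illustration only; the text does not state which tool performed the cropping or how incomplete edge tiles were handled.

```python
from typing import List, Tuple


def tile_origins(width: int, height: int, patch: int = 512) -> List[Tuple[int, int]]:
    """Return (x, y) top-left corners of non-overlapping square patches.

    Hypothetical helper: partial tiles at the right/bottom border are
    dropped (an assumption; the source does not say how they were treated).
    """
    if patch <= 0:
        raise ValueError("patch size must be positive")
    return [
        (x, y)
        for y in range(0, height - patch + 1, patch)  # step by one full patch
        for x in range(0, width - patch + 1, patch)
    ]


if __name__ == "__main__":
    # A 1024 x 1536 region holds 2 columns x 3 rows of 512-pixel patches.
    print(len(tile_origins(1024, 1536)))
```

Stepping by the full patch size is what makes the patches non-overlapping; a smaller stride would produce overlapping tiles, which the study did not use.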
PATHOLOGICAL FEATURES EXTRACTION:
A multi-modular CellProfiler-based pipeline was developed to systematically segment images and extract discriminative pathological features [8]. We split each H&E-stained image into separate hematoxylin- and eosin-stained images using the UnmixColors module and converted them to grayscale images using the ColorToGray module. Afterward, the IdentifyPrimaryObjects module (object diameter range: 10–15 pixels) was used for image segmentation and subsequent nuclei detection [9]. A suite of CellProfiler modules – MeasureImageIntensity, MeasureImageQuality, MeasureGranularity, MeasureColocalization, MeasureTexture, MeasureObjectSizeShape, MeasureObjectIntensityDistribution, and MeasureObjectIntensity – was then applied for automated quantification of pathological features using the grayscale images, the separated eosin and hematoxylin images, and the identified objects. The results were exported in Excel format [9].
FEATURES SELECTION AND ML MODELS CONSTRUCTION:
Following the normalization of data, inter-feature collinearity was assessed using Pearson correlation analysis, retaining only 1 feature from any pair with a correlation coefficient exceeding 0.9. To further refine the feature set, LASSO regression was implemented, which uses an L1-penalty term to drive the coefficients of non-informative features to zero, thereby performing embedded feature selection. LASSO operates by balancing model simplicity (reduced feature count) and predictive accuracy (minimized mean squared error [MSE]). Through iterative penalty tuning, we identified an optimal regularization strength that retained only the most discriminative features (n=10) while maintaining a low MSE. Subsequently, 9 ML algorithms were implemented for classification modeling: K-Nearest Neighbors, Light Gradient Boosting Machine (LightGBM), Support Vector Machine, Logistic Regression, ExtraTrees, Random Forests, Gradient Boosting, eXtreme Gradient Boosting, and AdaBoost. We partitioned the cohort using a stratified hold-out split into training and test sets (7:3), with an independent external validation cohort (Table 1).
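The collinearity filter described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `pearson` and `drop_collinear` are hypothetical, and the greedy rule of keeping whichever feature of a correlated pair appears first is an assumption, since the text does not say which member of a pair was retained.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def drop_collinear(features: Dict[str, Sequence[float]], threshold: float = 0.9) -> List[str]:
    """Greedy filter: keep a feature only if |r| <= threshold with every kept feature.

    Assumption: the first-seen feature of a correlated pair is the one retained.
    """
    kept: List[str] = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    toy = {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [2.0, 4.0, 6.0, 8.1],  # almost a linear copy of "a"
        "c": [4.0, 1.0, 3.0, 2.0],  # weakly (negatively) correlated with "a"
    }
    print(drop_collinear(toy))
```

Using the absolute value of the coefficient means strongly negative correlations are treated as redundant too, which matches the |r| >0.9 threshold reported in the Results.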
STATISTICAL ANALYSIS:
For continuous data exhibiting Gaussian distribution, statistical comparisons were performed with the 2-sample
Results
DATA SETS:
In total, 430 pathological slides from 430 patients with CRC were enrolled in this study, including 240 slides for model training, 102 slides for internal testing, and 88 slides from distinct time periods for external validation. In the training-testing cohort (mean age 65.82±13.95 years) and external validation cohort (64.55±10.93 years), the age difference was non-significant (
EXTRACTION AND SCREENING OF PATHOLOGICAL FEATURES:
Collectively, 615 quantitative pathological features were extracted from 4300 images using CellProfiler. The feature sets included 319 nuclear morphometric features, including texture, size, shape, granularity, and pixel intensity distribution, along with 296 global image features, including quality metrics, intensity histograms, co-localization indices, and inter-channel intensity correlations. For each slide (n=430), feature robustness was ensured by averaging values across 10 representative images. Following quality control (removal of 34 outliers/irrelevant features), 581 features were retained. Collinearity screening via pairwise Pearson correlation (threshold: |r| >0.9) reduced the feature space to 381 candidates. Subsequently, the optimal λ value of 0.01868817 yielded the minimal MSE, with 10 non-zero features retained in the LASSO regression model (Figure 2).
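For reference, the role of λ in the selection step can be written out using the standard LASSO objective. The notation is an assumption introduced here for illustration: y_i is the PNI label of slide i, x_i is its vector of p candidate features (p=381 after collinearity screening), and the 1/(2n) scaling of the squared-error term is one common convention that the text does not confirm.

```latex
\hat{\beta}(\lambda) \;=\; \arg\min_{\beta_0,\,\beta}\;
\frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^{2}
\;+\; \lambda \sum_{j=1}^{p}\lvert \beta_j \rvert
```

Larger values of λ push more coefficients β_j exactly to zero; at the reported λ, 10 coefficients remained non-zero.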
ML-BASED PNI DIAGNOSTIC MODELS:
Most ML algorithms demonstrated relatively robust diagnostic performance across the training set (AUC: 0.8622–1.000), testing set (AUC: 0.8551–0.9337), and external validation cohort (AUC: 0.7787–0.9432). The LightGBM model showed the highest AUC values: 0.9963 (training), 0.9349 (testing), 0.9182 (validation). Detailed performance metrics of all models are summarized in Figure 3 and Table 1. Furthermore, histogram analysis and DCA confirmed the LightGBM model’s favorable calibration accuracy and clinical net benefit (Figure 4).
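To make the AUC values above concrete, the sketch below computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen PNI-positive case receives a higher score than a randomly chosen PNI-negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical; this is not the evaluation code used in the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score contribute 0.5.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 0]                 # hypothetical PNI labels
    y_score = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]    # hypothetical model scores
    print(round(roc_auc(y_true, y_score), 4))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for cohorts of a few hundred slides; a rank-based implementation would be preferable for much larger datasets.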
INTERPRETATION OF PERSONALIZED PREDICTIONS:
The SHAP (Shapley Additive Explanations) framework was applied for LightGBM model interpretability assessment. Critical determinants were identified and are shown in a ranking and summary plot (Figure 5), which demonstrates the relative predictive contributions of these variables. The analysis revealed that increased values of Hematoxylin_OrigGray_RWC.Coefficient, OrigGray_Hematoxylin_Manders.Coefficient, and OrigGray_Hematoxylin_RWC.Coefficient, as well as decreased values of OrigGray_Eosin_Manders.Coefficient (Costes) and Eosin_OrigGray_Slope, were significantly associated with a higher probability of PNI.
SHAP-based visualization displayed how the 6 leading continuous features influenced the LightGBM model output (Figure 6). Figure 6A, 6B, and 6D demonstrate concordant variation between SHAP values and the features Hematoxylin_OrigGray_RWC.Coefficient, OrigGray_Hematoxylin_Manders.Coefficient, and OrigGray_Hematoxylin_RWC.Coefficient, with threshold values of 0.4, 0.8, and 0.3, respectively. Beyond these thresholds, lower values of these features are associated with a reduced risk of PNI in patients with CRC. Figure 6C and 6E show that increasing values of OrigGray_Eosin_Manders.Coefficient (Costes) and Eosin_OrigGray_Slope have a negative impact on the model’s prediction. Additionally, Figure 6F reveals that OrigGray_Eosin_Manders.Coefficient values between 0.2 and 0.6 contribute positively to the model, while values outside this range are associated with a negative effect.
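The SHAP values discussed above are based on the Shapley value from cooperative game theory. As a general reminder rather than a description of the specific estimator used in this study, the contribution of feature j to the prediction for a case x can be written as below, where F is the set of model features (here, the 10 LASSO-selected features), v_x(S) denotes the expected model output when only the features in S are known, and φ_0 is the baseline (average) output. This notation is introduced here for illustration.

```latex
\phi_j(x) \;=\; \sum_{S \subseteq F \setminus \{j\}}
\frac{|S|!\,\bigl(|F|-|S|-1\bigr)!}{|F|!}\,
\bigl[\, v_x(S \cup \{j\}) - v_x(S) \,\bigr],
\qquad
f(x) \;=\; \phi_0 + \sum_{j \in F} \phi_j(x)
```

The second identity is the additivity property: the per-feature contributions sum to the difference between the model output for x and the baseline, which is why a positive SHAP value can be read as pushing the prediction toward PNI.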
Hematoxylin_OrigGray_RWC.Coefficient, OrigGray_Hematoxylin_Manders.Coefficient, and OrigGray_Hematoxylin_RWC.Coefficient quantify the co-localization intensity between nuclei and collagen bundles or nerve fibers. Notably, nerve fibers exhibit denser staining than do typical collagen bundles. When the coefficient exceeds 0.4, the spatial coupling between tumor cell nuclei and nerve fibers is significantly increased. The features OrigGray_Eosin_Manders.Coefficient (Costes), Eosin_OrigGray_Slope, and OrigGray_Eosin_Manders.Coefficient reflect the cytoplasmic region (excluding nuclei); higher values of these metrics are associated with a reduced probability of perineural invasion.
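For readers unfamiliar with the Manders coefficient named in these features, the sketch below implements its textbook form on two flattened intensity channels: the fraction of total intensity in channel A that lies in pixels where channel B exceeds a threshold. The function name `manders_m1`, the threshold parameter, and the toy pixel values are assumptions for illustration; CellProfiler's MeasureColocalization module applies its own thresholding rules, so its output may differ from this simplified version.

```python
from typing import Sequence


def manders_m1(a: Sequence[float], b: Sequence[float], b_threshold: float = 0.0) -> float:
    """Textbook Manders coefficient of channel `a` with respect to channel `b`.

    Sum of `a` over pixels where `b` > b_threshold, divided by the total of `a`.
    Returns 0.0 when channel `a` carries no signal.
    """
    if len(a) != len(b):
        raise ValueError("channels must have the same number of pixels")
    total = sum(a)
    if total == 0:
        return 0.0
    colocalized = sum(ai for ai, bi in zip(a, b) if bi > b_threshold)
    return colocalized / total


if __name__ == "__main__":
    # Hypothetical flattened pixel intensities for two channels.
    hematoxylin = [0.2, 0.5, 0.3, 0.0]
    gray = [0.0, 0.4, 0.1, 0.9]
    print(manders_m1(hematoxylin, gray))
    print(manders_m1(gray, hematoxylin))
```

The coefficient is asymmetric: swapping the two channels generally gives a different value, which is consistent with the feature list containing both Hematoxylin_OrigGray and OrigGray_Hematoxylin variants.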
ASSOCIATION OF PNI WITH CLINICOPATHOLOGICAL FEATURES IN CRC:
Although the ML model improved the diagnostic efficiency of PNI in CRC, the precise impact of PNI on patient prognosis remains unclear, especially in the pathological context. We therefore conducted a comprehensive analysis to define the clinicopathological relationships among PNI, lymphovascular invasion, lymph node metastasis, and tumor invasion depth in CRC. Given the intestinal neural architecture, in which afferent pathways consist of sensory and parasympathetic fibers while efferent pathways consist of sympathetic/parasympathetic nerves converging at the mesenteric plexus, we investigated whether PNI facilitates deeper tumor spread along the nerves. Indeed, PNI-positive tumors demonstrated a significantly greater invasion depth (Figure 7A, Table 2), suggesting that neural infiltration parallels local progression. Tumor cells capable of invading neural tissues can penetrate the outermost fibrous membrane of nerves, which suggests they may also be able to infiltrate the fibrous outer layers of vascular structures and therefore exhibit lymphovascular invasion potential. We therefore investigated whether CRC patients with PNI also demonstrated increased lymphovascular invasion. Our findings revealed that the incidence of lymphovascular invasion in CRC patients with PNI was nearly twice that in CRC patients without PNI (Figure 7B, Table 2). Once tumor cells infiltrate lymphatic vessels, consequent lymph node metastasis becomes more likely. Accordingly, we analyzed the correlation between PNI and lymph node metastasis, observing that CRC patients with PNI indeed exhibited higher rates of lymph node metastasis (Figure 7C, Table 2). While molecular markers (P53, Ki-67, mismatch repair) showed no correlation with PNI (Figure 8A, 8B, 8E–8H, Table 2), HER2 overexpression (3+) exhibited a non-significant trend toward increased PNI (Figure 8C, 8D, Table 2).
Discussion
LIMITATIONS:
This study has several limitations that should be acknowledged. First, the retrospective design may have introduced inherent selection bias that could not be entirely eliminated. Second, all data were obtained from a single medical center with a relatively limited sample size, which can affect the generalizability of our findings. External validation using multicenter datasets acquired with diverse equipment is required to confirm the robustness of our model. Furthermore, although the current feature extraction framework demonstrated satisfactory discriminative performance, it is possible that some relevant features were not fully captured during the feature engineering process. Future prospective and multicenter studies are needed to strengthen the validity and generalizability of these findings. Despite these limitations, our results demonstrate promising clinical applicability and establish a solid foundation for future investigations.
In conclusion, we constructed a diagnostic model grounded in histopathological traits to identify PNI in CRC. The model improved diagnostic consistency and accuracy while significantly reducing pathologists’ interpretation time. Furthermore, our investigation clarified the associations between PNI and key pathological features, including tumor invasion depth, lymphovascular invasion, and lymph node metastasis, providing clinically relevant evidence for risk stratification and prognostic evaluation.
Conclusions
The LightGBM model improved the diagnostic efficiency of PNI in CRC, which is associated with deeper tumor infiltration, increased lymph node metastasis, and increased lymphovascular invasion. Our findings have the potential to enhance histopathologic detection of PNI and support precision therapy stratification.
References
1. Abdalwahab AR, Abdelhamed MA, Gad M, Prophylactic para-aortic lymph node dissection in Colo-rectal cancer; Pilot study: World J Surg Oncol, 2024; 22(1); 253
2. Pu T, Sun J, Ren G, Li H, Neuro-immune crosstalk in cancer: Mechanisms and therapeutic implications: Signal Transduct Target Ther, 2025; 10(1); 176
3. Weusthof C, Burkart S, Semmelmayer K, Establishment of a machine learning model for the risk assessment of perineural invasion in head and neck squamous cell carcinoma: Int J Mol Sci, 2023; 24(10); 8938
4. Tai SK, Li WY, Yang MH, Perineural invasion in T1 oral squamous cell carcinoma indicates the need for aggressive elective neck dissection: Am J Surg Pathol, 2013; 37(8); 1164-72
5. Guo JA, Hoffman HI, Shroff SG, Pan-cancer transcriptomic predictors of perineural invasion improve occult histopathologic detection: Clin Cancer Res, 2021; 27(10); 2807-15
6. Schmitd LB, Beesley LJ, Russo N, Redefining perineural invasion: Integration of biology with clinical outcome: Neoplasia, 2018; 20(7); 657-67
7. Bera K, Schalper KA, Rimm DL, Artificial intelligence in digital pathology – new tools for diagnosis and precision oncology: Nat Rev Clin Oncol, 2019; 16(11); 703-15
8. Chen D, Fu M, Chi L, Prognostic and predictive value of a pathomics signature in gastric cancer: Nat Commun, 2022; 13(1); 6903
9. Lan Y, Han B, Zhai T, Clinical application of machine learning-based pathomics signature of gastric atrophy: Front Oncol, 2024; 14; 1289265
10. Wang H, Huo R, He K, Perineural invasion in colorectal cancer: Mechanisms of action and clinical relevance: Cell Oncol (Dordr), 2024; 47(1); 1-17
11. Ying H, Shao J, Liao N, The effect of adjuvant chemotherapy on survival in node negative colorectal cancer with or without perineural invasion: A systematic review and meta-analysis: Front Surg, 2023; 10; 1308757
12. Marra A, Morganti S, Pareja F, Artificial intelligence entering the pathology arena in oncology: Current applications and future perspectives: Ann Oncol, 2025; 36(7); 712-25
13. Tang N, Pan S, Zhang Q, Radiomics for prediction of perineural invasion in colorectal cancer: A systematic review and meta-analysis: Abdom Radiol (NY), 2025; 50(8); 3415-34
14. Liu Z, Luo C, Chen X, Noninvasive prediction of perineural invasion in intrahepatic cholangiocarcinoma by clinicoradiological features and computed tomography radiomics based on interpretable machine learning: A multicenter cohort study: Int J Surg, 2024; 110(2); 1039-51
15. Li Y, Eresen A, Shangguan J, Preoperative prediction of perineural invasion and KRAS mutation in colon cancer using machine learning: J Cancer Res Clin Oncol, 2020; 146(12); 3165-74
16. Wang R, Dai W, Gong J, Development of a novel combined nomogram model integrating deep learning-pathomics, radiomics and immunoscore to predict postoperative outcome of colorectal cancer lung metastasis patients: J Hematol Oncol, 2022; 15(1); 11
17. Haight TJ, Eshaghi A, Deep learning algorithms for brain imaging: from black box to clinical toolbox?: Neurology, 2023; 100(12); 549-50
18. Wong F, Zheng EJ, Valeri JA, Discovery of a structural class of antibiotics with explainable deep learning: Nature, 2024; 626(7997); 177-85
19. Niazi MKK, Parwani AV, Gurcan MN, Digital pathology and artificial intelligence: Lancet Oncol, 2019; 20(5); e253-e61
20. Carpenter AE, Jones TR, Lamprecht MR, CellProfiler: Image analysis software for identifying and quantifying cell phenotypes: Genome Biol, 2006; 7(10); R100
21. Stirling DR, Swain-Bowden MJ, Lucas AM, CellProfiler 4: Improvements in speed, utility and usability: BMC Bioinformatics, 2021; 22(1); 433
22. Ge HT, Chen JW, Wang LL, Preoperative prediction of lymphovascular and perineural invasion in gastric cancer using spectral computed tomography imaging and machine learning: World J Gastroenterol, 2024; 30(6); 542-55
23. Jia H, Li R, Liu Y, Preoperative prediction of perineural invasion and prognosis in gastric cancer based on machine learning through a radiomics-clinicopathological nomogram: Cancers (Basel), 2024; 16(3); 614
24. Liu Y, Sun BJ, Zhang C, Preoperative prediction of perineural invasion of rectal cancer based on a magnetic resonance imaging radiomics model: A dual-center study: World J Gastroenterol, 2024; 30(16); 2233-48
25. Wang H, He K, Huo R, MAGEA6 engages a YY1-dependent transcription to dictate perineural invasion in colorectal cancer: Adv Sci (Weinh), 2025; 12(25); e2501119
26. Li J, Sun Y, Cao L, Wang F, Correlation of NPDC1 expression and perineural invasion status with clinicopathological features in patients with colon cancer: Int J Gen Med, 2023; 16; 4549-63
27. Chen L, Zhang H, Gao K, Investigation of the correlation between AGRN expression and perineural invasion in colon cancer: Front Mol Biosci, 2024; 11; 1510478
28. Zhang F, Chen H, Luo D, Lymphovascular or perineural invasion is associated with lymph node metastasis and survival outcomes in patients with gastric cancer: Cancer Med, 2023; 12(8); 9401-8
29. Wang Y, Fan X, Luo Z, A comprehensive study on the radiomic score derived from perineural invasion in gastric cancer and its correlation with the overall survival of patients: Radiol Med, 2025; 130(6); 865-79
30. Sun ZG, Chen SX, Sun BL, Important role of lymphovascular and perineural invasion in prognosis of colorectal cancer patients with N1c disease: World J Gastroenterol, 2025; 31(5); 102210
31. Liu F, Chu Y, Zheng Q, Major and minor perineural invasion in salivary gland cancer: Front Oncol, 2024; 14; 1466196
32. Selvaggi F, Bannone E, Melchiorre E, Perineural invasion in pancreatic cancer: Current biological function in R status, prognosis, and pain: Surg Open Sci, 2025; 24; 58-60