Ensemble Deep Learning Model for Multicenter Classification of Thyroid Nodules on Ultrasound Images

Xi Wei; Ming Gao; Ruiguo Yu; Zhiqiang Liu; Qing Gu; Xun Liu; Zhiming Zheng; Xiangqian Zheng; Jialin Zhu; Sheng Zhang

doi:10.12659/MSM.926096

18 June 2020: Clinical Research

Med Sci Monit 2020; 26:e926096

Abstract

BACKGROUND: Thyroid nodules are extremely common, and ultrasound is typically used to determine whether they are benign or malignant. Imaging diagnosis assisted by artificial intelligence has attracted much attention in recent years. The aim of our study was to build an ensemble deep learning classification model to accurately differentiate benign and malignant thyroid nodules.

MATERIAL AND METHODS: Building on current advanced image segmentation and classification algorithms, we proposed an ensemble deep learning classification model for thyroid nodules (EDLC-TN) that classifies nodules after precise localization. We compared its diagnostic performance with that of four other state-of-the-art deep learning algorithms and three ultrasound radiologists who used ACR TI-RADS criteria. Finally, we demonstrated the general applicability of EDLC-TN for diagnosing thyroid cancer using ultrasound images from multiple medical centers.

RESULTS: The proposed method was trained and tested on a thyroid ultrasound image dataset containing 26 541 images, and its accuracy reached 98.51%. EDLC-TN demonstrated the highest area under the curve, sensitivity, specificity, and accuracy among five state-of-the-art algorithms. Combining the EDLC-TN model with radiologists could improve diagnostic accuracy. EDLC-TN achieved excellent diagnostic performance when applied to ultrasound images from another independent hospital.

CONCLUSIONS: The proposed ensemble deep learning approach is superior to other existing methods of thyroid nodule classification, as well as to ultrasound radiologists. Moreover, our network represents a generalized platform that can potentially be applied to medical images from multiple medical centers.

Keywords: Artificial Intelligence; Image Processing, Computer-Assisted; Thyroid Nodule; Ultrasonography; Adenocarcinoma, Follicular; Adenoma; Adolescent; Aged, 80 and over; Carcinoma, Neuroendocrine; Deep Learning; Goiter, Nodular; Granuloma; Image Interpretation, Computer-Assisted; Thyroid Cancer, Papillary; Thyroid Carcinoma, Anaplastic; Young Adult

Background

Thyroid nodules are clinically common, and with the application of high-frequency ultrasound, their incidence has increased. Ultrasound diagnosis of benign and malignant nodules is mainly performed under guidelines from the American College of Radiology (ACR) [1] and the ultrasound section of the American Thyroid Association (ATA) [2], both of which have been steadily improved in recent years. However, some shortcomings remain, and diagnostic accuracy is inconsistent because radiologists performing ultrasound differ in experience [3]. With the gradual development of machine learning in recent years, intelligent medical image diagnosis has become available. As network layers deepen, deep learning can reveal subtler and more abstract information embedded in images. In addition, use of artificial intelligence (AI) for medical or auxiliary medical care can lighten the burden on doctors and optimize medical treatments. Medical image processing is one of the breakthroughs in this field. Deep learning and AI have achieved high accuracy for classification of skin cancer and detection of pneumonia [4,5], even exceeding that of physicians. This is also true for diagnosis of thyroid nodules [6–9].

In 2008, Lim KJ et al. [10] were the first to apply a neural network to the differentiation of benign and malignant thyroid nodules. In 2017, Ma J et al. [11] were the first to use a convolutional neural network in this field. They separately trained two networks on the ImageNet database. Then, by concatenating feature maps, they used a softmax classifier to diagnose thyroid nodules with an accuracy of 83.02%±0.72%. Imaging diagnosis assisted by AI has attracted much attention in the past several years. If the diagnostic effectiveness of AI (including accuracy, sensitivity, and specificity) is found to be comparable to that of an experienced radiologist performing ultrasound, it will have a tremendous impact on imaging diagnosis.

However, if ultrasound images are used directly as inputs to a neural network, the shape information of thyroid nodules may be lost. Thus, we trained two different AI models on the basis of ensemble learning [12]. To accurately diagnose thyroid nodules, we calculated the mean output of these two types of models and determined whether the thyroid nodules were benign or malignant using a new model, EDLC-TN (ensemble deep learning-based classifier for thyroid nodules). The aim of our research was to use deep learning to differentiate benign and malignant thyroid nodules, thereby improving the accuracy of lesion identification.

Material and Methods

STUDY COHORT AND DATASETS:

We used four independent ultrasound datasets from four different hospitals to develop and evaluate EDLC-TN: Tianjin Medical University Cancer Institute and Hospital (Center 1), Jilin Integrated Traditional Chinese and Western Medicine Hospital (Center 2), Cangzhou Hospital of Integrated Traditional Chinese and Western Medicine of Hebei Province (Center 3), and Peking University BinHai Hospital (Center 4). Consecutive patients who underwent diagnostic thyroid ultrasound examination and subsequent surgery at these four medical centers between January 2015 and December 2017 were included in the study. Exclusion criteria were: (1) images from anatomical sites that were judged as not having tumor according to postoperative pathology; (2) nodules with incomplete or low-quality ultrasound images; and (3) cases with incomplete clinicopathological information. Finally, three datasets from Centers 1 to 3, including a total of 25 509 thyroid ultrasound images, were used to train and test the model; of these images, 15 255 were malignant and 10 254 were benign (confirmed by postoperative pathological diagnosis). Images from Center 4 (n=1032) differed greatly from those of the other three centers in style, clarity, and machine type. Therefore, the dataset from Center 4 was used only as an external validation set for verifying the generalizability of the model. Data from each of Centers 1 to 3 were randomly divided into training and testing sets at a ratio of approximately 7:3 (Table 1). In all settings, testing data did not include any images used in training.
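The per-center random split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_by_center`, the image identifiers, and the fixed seed are assumptions, and the source does not state whether the split was made at the image level (as here) or at the patient level.

```python
import random

def split_by_center(images_by_center, train_fraction=0.7, seed=0):
    """Randomly split each center's images into disjoint train/test lists."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    train, test = [], []
    for center in sorted(images_by_center):
        images = list(images_by_center[center])
        rng.shuffle(images)
        cut = round(len(images) * train_fraction)  # about 70% go to training
        train.extend(images[:cut])
        test.extend(images[cut:])
    return train, test

# Hypothetical image identifiers for three centers.
images_by_center = {
    "center1": [f"c1_{i}" for i in range(100)],
    "center2": [f"c2_{i}" for i in range(50)],
    "center3": [f"c3_{i}" for i in range(30)],
}
train, test = split_by_center(images_by_center)
```

Splitting within each center, rather than pooling first, keeps every center represented in both sets at roughly the same ratio.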

This study was approved by the Tianjin Medical University Cancer Institute and Hospital ethics committee. Informed consent from patients was waived due to the retrospective nature of the study. In the training and test datasets, ultrasound images were collected and stored using ultrasound equipment of various brands, such as PHILIPS, GE, Siemens, Mindray, and TOSHIBA. All images were acquired with superficial probes.

EXPERIMENTAL PATHWAYS:

Our experimental pathway included three main parts (Figure 1): segmentation of nodules, ensemble learning for classification, and testing of the model’s diagnostic performance. The segmentation model was trained only to locate the nodule automatically. To verify that the algorithm was effective, we manually checked 500 images, reaching a relevance ratio of more than 98%. Using the segmentation model, the region of interest (ROI) containing the nodule was first segmented and then classified. The classification results were calculated quantitatively as a comprehensive evaluation of the two processes. The classification model was an improved version of DenseNet [13] and was used in a multistep cascade pathway, as shown in Figure 2. The classification result was determined by combining three weak models using the average method and the voting method. Finally, we compared the diagnostic performance of EDLC-TN with that of ultrasound radiologists and four advanced deep learning models, and conducted an external test.

EDLC-TN MODEL:

A multistep cascade experiment pathway was adopted, as shown in Figure 2.

First, the annotated image boundary was cropped (Supplementary Table 1) for data cleaning. Then, the nodule and its surrounding area (region of interest, ROI) were extracted. We used a semiautomatic method to achieve this: we carefully annotated the boundaries of thyroid nodules in 3000 images by hand and trained a nodule segmentation model on these annotated images to segment the remaining images. The structure of the segmentation model is shown in Supplementary Table 2, and the method of converting the segmentation results to an ROI is shown in Supplementary Table 1.
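One simple way to turn a binary segmentation mask into an ROI is to take the bounding box of the nodule pixels and pad it with a margin. The sketch below illustrates that idea only. The authors' actual conversion is the one given in Supplementary Table 1 and may differ; the function names and the `margin` parameter are assumptions.

```python
def mask_to_roi_box(mask, margin=2):
    """Return inclusive (top, bottom, left, right) bounds of the nodule plus a margin."""
    height, width = len(mask), len(mask[0])
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # the mask contains no nodule pixels
    cols = [c for c in range(width) if any(row[c] for row in mask)]
    top = max(rows[0] - margin, 0)              # clip to the image border
    bottom = min(rows[-1] + margin, height - 1)
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin, width - 1)
    return top, bottom, left, right

def crop(image, box):
    """Crop a 2D image (list of rows) to the inclusive box."""
    top, bottom, left, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]

# Toy 6x8 mask with a 2x3 "nodule".
mask = [[0] * 8 for _ in range(6)]
for r in range(2, 4):
    for c in range(3, 6):
        mask[r][c] = 1
box = mask_to_roi_box(mask, margin=1)
roi_mask = crop(mask, box)
```

The same box can be applied to the ultrasound image and to the mask, so that the ROI and its mask stay aligned.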

Through the above process, each image generated a three-channel ROI R, and a one-channel mask M. We used these data to train nodule classification models based on the structure shown in Supplementary Table 3. For better performance, we trained multiple models and combined them through two ensemble learning methods, namely the average and voting methods. The average method calculates the mean value of all base model results. For the voting method, each base model votes on the category of the image, and the final result is the category with more votes.
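The two combination rules can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: each base model is assumed to output a malignancy probability, label 1 means malignant, the 0.5 threshold is a placeholder, and a tied vote falls back to benign.

```python
def average_ensemble(probabilities, threshold=0.5):
    """Average method: mean of base-model probabilities, then threshold."""
    mean = sum(probabilities) / len(probabilities)
    return mean, int(mean >= threshold)

def voting_ensemble(probabilities, threshold=0.5):
    """Voting method: each base model votes, and the majority label wins."""
    votes = [int(p >= threshold) for p in probabilities]
    return int(sum(votes) > len(votes) / 2)

# Hypothetical outputs of three base models for one image.
probs = [0.9, 0.4, 0.45]
mean_prob, avg_label = average_ensemble(probs)  # mean is about 0.583, so label 1
vote_label = voting_ensemble(probs)             # one vote out of three, so label 0
```

The toy input shows that the two rules can disagree: one very confident model can pull the average over the threshold while still being outvoted.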

The Adam optimizer was used during training. The learning rate was initialized at 0.1. After 60 epochs, it was decreased to 0.01 and then reduced tenfold after every 200 epochs. The batch size was set to the maximum allowed by the computer memory. We trained our models on an NVIDIA TITAN Xp GPU using the TensorFlow framework.
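The step schedule above can be written as a small function of the epoch index. The text leaves one detail open, so this sketch assumes that the 200-epoch intervals are counted from epoch 60; the function name is hypothetical.

```python
def learning_rate(epoch):
    """Step schedule: 0.1 for the first 60 epochs, then 0.01, divided by 10 every 200 epochs.

    Assumption: the 200-epoch intervals start at epoch 60.
    """
    if epoch < 60:
        return 0.1
    return 0.01 * 0.1 ** ((epoch - 60) // 200)

# Learning rate at a few epochs of interest.
schedule = {epoch: learning_rate(epoch) for epoch in (0, 59, 60, 259, 260)}
```

In TensorFlow, a function of this shape could be passed to `tf.keras.callbacks.LearningRateScheduler`, although the source does not say how the schedule was implemented.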

RADIOLOGIST EVALUATION AND COMPARISON:

To assess the predictive performance of the deep learning algorithm, we compared it with three radiologists (W.X., Z.J.L., and Z.S.) on 1000 ultrasound images (11.52%, 1000/8682) randomly selected from the test set, measuring accuracy in differentiating between benign and malignant thyroid nodules. The radiologists assessed nodules according to ACR TI-RADS criteria [1] and predicted whether each nodule was benign or malignant. Each radiologist independently judged and labeled each ultrasound image in a double-blind fashion, and we used the postoperative pathological results as the reference standard for statistical analysis. Finally, accuracy was calculated for each radiologist, along with the average across the three. The radiologists involved in the evaluation were attending doctors or associate professors. The first reader (W.X.) had 13 years of experience, the second reader (Z.J.L.) had 8 years of experience, and the third reader (Z.S.) had more than 30 years of experience in diagnosing thyroid nodules.

COMPARISON WITH FOUR STATE-OF-THE-ART DEEP LEARNING MODELS:

We compared the diagnostic performance of our model with that of four of the most popular and advanced deep learning algorithms: ResNeXt [14], SE_Inception_v4 [15], SE_Net [16], and Xception [17]. These models are widely used in AI for medical imaging [18,19]. For this part of the study, 3000 ultrasound images were randomly selected from the Center 1 test set. The area under the receiver operating characteristic (ROC) curve with a 95% confidence interval (CI), accuracy, sensitivity, and specificity were calculated to compare capability for diagnosing thyroid cancer on ultrasound.

GENERAL APPLICABILITY TEST:

In this section, we aimed to investigate the general applicability of our AI system for diagnosing thyroid cancer. We did so by testing our network on a dataset of ultrasound images (n=1032) from Peking University BinHai Hospital, including 502 benign nodule images and 530 malignant nodule images (Table 1).

STATISTICAL ANALYSIS:

Continuous variables are shown as means and standard deviations. Categorical variables are reported as numbers of patients and images. Diagnostic performance of the EDLC-TN and the radiologists was evaluated by calculating sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy. To determine whether the diagnostic performance of the models differed significantly, the AUC of EDLC-TN was compared with that of each of the other four models using the Z test. The intraclass correlation coefficient (ICC) and Kappa value were used to assess test-retest reliability and inter-reader agreement among the radiologists. All statistical analyses were performed using MedCalc for Windows v15.8 (MedCalc Software, Ostend, Belgium), and P<0.05 was considered statistically significant.
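For reference, the five performance measures named above follow directly from the counts of a 2x2 confusion matrix, with malignant taken as the positive class. The sketch below uses the standard definitions; the counts are made up for illustration and are not results from this study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant nodules correctly flagged
        "specificity": tn / (tn + fp),  # benign nodules correctly cleared
        "ppv": tp / (tp + fp),          # predicted malignant that are malignant
        "npv": tn / (tn + fn),          # predicted benign that are benign
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts for 100 malignant and 100 benign images.
m = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
```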

Results

FOUR IMAGE DATASETS AND STUDY POPULATION:

The total number of ultrasound images in this work was 26 541, including 10 756 benign nodule images and 15 785 malignant nodule images. Of these, 17 859 (67.29%) images from Centers 1 to 3 were used for training, and 7 650 (28.82%) images from Centers 1 to 3 were used for internal testing. The dataset from Center 4, containing 1 032 (3.89%) images, was used only as an external test set, without training, to verify the generalizability of the model. Table 1 summarizes the number of images used in our training and testing datasets.

A total of 11 865 patients who underwent ultrasound examination and surgery between January 2015 and December 2017 at one of these four centers were included in this research. Demographic data and image information for all patients from four medical centers are shown in Table 2.

CLASSIFICATION BY EDLC-TN:

In this paper, accuracy, specificity, and sensitivity were the main evaluation criteria for classification. The two models were similar in structure, so we analyzed the experimental results of one of them, Classifier 1, as the main model. The results of ensemble learning using different combination strategies are shown in Table 3. The voting method requires at least three weak models, so two instances of weak classifier 1 were used.

The accuracy of the two weak models was already high. Strong classifier 1 and strong classifier 2 were both obtained by combining two models. Of the three combination methods, the averaging method calculates the arithmetic mean of the results obtained from the two models, the competition method takes the result with the higher confidence level as the predicted value, and the voting method combines the results of multiple (at least three) models: all the models vote for benign or malignant, and the majority vote serves as the final result. We found that the strong classifiers had higher accuracy than each weak classifier. The test results for weak and strong classifiers in diagnosis of thyroid nodules are shown in Supplementary Table 4.
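The competition method can be sketched as follows. The text does not define "confidence", so this example assumes that each model outputs a malignancy probability and that the more confident model is the one whose probability lies farther from 0.5; the function name and threshold are also assumptions.

```python
def competition_ensemble(prob_a, prob_b, threshold=0.5):
    """Competition method: keep the output of the more confident model.

    Assumption: confidence is the distance of the probability from 0.5.
    """
    chosen = max((prob_a, prob_b), key=lambda p: abs(p - 0.5))
    return chosen, int(chosen >= threshold)

# Hypothetical outputs: model A leans malignant, model B is firmly benign.
chosen, label = competition_ensemble(0.62, 0.10)
```

Here model B is the more confident of the two, so its benign prediction is kept.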

The proposed model follows a “classification after segmentation” structure. The performance of ensemble learning is shown in Figure 3A. As the threshold changes, accuracy, specificity, and sensitivity change with it. When the threshold was around 0.54, accuracy, sensitivity, and specificity were all at a high level (93.70%, 93.19%, and 94.01%, respectively).
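The trade-off behind the threshold choice can be illustrated by sweeping a decision threshold over ensemble scores and recomputing the three measures at each value. The labels and scores below are invented for illustration, and the function name is an assumption; the sketch requires at least one benign and one malignant case.

```python
def metrics_at_threshold(labels, scores, threshold):
    """Accuracy, sensitivity, and specificity when score >= threshold means malignant."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical ground truth (1 = malignant) and ensemble scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.40, 0.55, 0.30, 0.20, 0.05]
sweep = {t: metrics_at_threshold(labels, scores, t) for t in (0.3, 0.5, 0.7)}
```

Raising the threshold trades sensitivity for specificity, which is why an intermediate value can keep all three measures high at once.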

RADIOLOGIST EVALUATION AND COMPARISON:

In this experiment, three thyroid radiologists in the hospital were randomly selected to independently evaluate and annotate thyroid ultrasound images as benign or malignant (the same test dataset used for deep learning). The accuracy of each radiologist and the average values are shown in Table 3. The results indicate that the proposed deep learning model was more accurate than individual radiologists.

We also carried out experiments with multi-expert cooperative diagnosis: the three radiologists each judged an ultrasound image as benign or malignant, and the majority vote was taken as the final result. In the comparison of a single model with a single radiologist, the highest accuracy of the model was 93.70%. However, the consultation of the three radiologists was more accurate than the model, with an accuracy of 95.43%. Finally, combining the model and the radiologists gave an accuracy of 96.54%, which was higher than that of independent diagnosis by either (Table 3).

The ICC and Kappa value were used to assess test-retest reliability and inter-reader agreement for the three radiologists. The ICC of the diagnostic results from the three radiologists was 0.7052 (95% CI: 0.6836–0.7260). The Kappa values for Radiologist 1 vs. 2, Radiologist 2 vs. 3, and Radiologist 1 vs. 3 were 0.649 (95% CI: 0.609–0.689), 0.656 (95% CI: 0.616–0.696), and 0.774 (95% CI: 0.741–0.808), respectively.
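For readers unfamiliar with the Kappa statistic, the sketch below computes Cohen's kappa for two readers: observed agreement corrected for the agreement expected by chance. It uses the standard definition with invented ratings. The study's values were computed in MedCalc, and the function name here is an assumption.

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two readers rating the same items.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    categories = set(ratings_a) | set(ratings_b)
    # Chance agreement: product of the two readers' marginal rates per category.
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Hypothetical benign (0) / malignant (1) calls by two readers on ten images.
reader_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
reader_2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]
kappa = cohens_kappa(reader_1, reader_2)
```

In this toy case the readers agree on 8 of 10 images, chance agreement is 0.5, and kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6.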

COMPARISON WITH FOUR STATE-OF-THE-ART DEEP LEARNING MODELS:

The diagnostic performance of EDLC-TN and the four other deep learning models is shown in Table 4 and Figure 3B. The EDLC-TN model demonstrated the highest value for AUC (0.941, 95% CI: 0.935–0.946), which was significantly higher than that of the other four models (P<0.0001). The EDLC-TN model also had the highest values for sensitivity (93.77%), specificity (94.44%), and accuracy (98.51%).

GENERALIZABILITY OF EDLC-TN:

To investigate the generalizability of EDLC-TN in diagnosis of thyroid cancer, we applied the same deep learning framework to ultrasound images from Peking University BinHai Hospital (Center 4), which were not contained in the training set (Table 1). In this test, the EDLC-TN achieved an accuracy of 95.76%, with a sensitivity of 95.88% and a specificity of 93.75% in differentiating between benign and malignant thyroid nodules. The ROC curve is shown in Figure 3C and the area under the ROC curve of EDLC-TN for diagnosing thyroid cancer was 0.979 (95% CI: 0.958–0.992).

Discussion

Many researchers have made significant contributions to the field of deep learning models for differentiating between benign and malignant thyroid lesions. Xia J et al. [20] proposed an extreme learning machine (ELM) based on ultrasound features, such as composition, echogenicity, margin, shape, and calcification, to classify malignant and benign thyroid nodules, and it achieved 87.72% diagnostic accuracy. Liu T et al. [21] used a CNN model learned from ImageNet as a pretrained feature extractor for an ultrasound image dataset. Their experimental results with 1 037 images demonstrated an accuracy of 93.1%. Li et al. [6] also constructed an ensemble model for diagnosis of thyroid cancer based on ResNet 50 and Darknet 19. However, the diagnostic accuracy was only 85.7% to 88.9% because the two sub-models were of similar types.

In this study, we proposed a new ensemble deep learning classification model called EDLC-TN for classifying benign and malignant thyroid nodules on ultrasound, with evidence from multiple centers. The strengths of the EDLC-TN model are fourfold. First, the core of the method is training deep learning models on the segmented ROI, which is the area where the thyroid nodule is located. Second, the accuracy of the model was the highest among the state-of-the-art algorithms and the other models mentioned above. Third, the accuracy of our model in diagnosing benign and malignant thyroid nodules was higher than that of a single radiologist, and the model could help improve the diagnostic accuracy of radiologists. Fourth, the model represents a generalized platform that can be applied to ultrasound images from different medical centers. Moreover, remarkable progress has been made with deep learning in image processing, resulting in mature segmentation, localization, and classification models for natural images. We used ensemble learning methods to combine the results of multiple deep learning models. With that method, it was possible to distinguish between malignant and benign nodules with higher accuracy than the other advanced deep learning models. The diagnostic performance of radiologists in diagnosing thyroid cancer can be improved if combined with EDLC-TN. Therefore, the model could substantially benefit radiologists in diagnosis.

Furthermore, our network is a general platform that can be applied to ultrasound images from different medical centers. When applied to ultrasound images from a hospital with entirely different types of ultrasound equipment, the EDLC-TN achieved excellent accuracy, sensitivity, and specificity. Our model also has advantages compared with a radiologist’s performance. The high accuracy of the model in our study suggests that EDLC-TN has the potential to learn effectively from different types of medical images with a high degree of generalization. This could benefit screening programs and produce more efficient referral systems in all medical fields, particularly in low-resource or remote areas. The result might have a wide-ranging impact on both clinical care and public health.

There are several limitations to this study. Our datasets contained a high percentage of malignant nodules and nodular goiters, which may have introduced bias. Only three senior radiologists were chosen as the comparison group, which may also have contributed to bias. The model did not analyze a wide range of pathological types of thyroid nodules; these will be assessed in future studies. Our algorithm only gives a classification result and does not provide a classification standard or texture analysis. In medicine, a good predictive algorithm is often insufficient. What is needed is the ability to explain an algorithm’s decisions and increase the credibility of diagnostic results [22]. We do not know whether this model can be applied to other types of medical images. These limitations will be addressed by expanding the ultrasound image datasets with various image types.

Conclusions

In this work, we proposed an ensemble deep learning classification model called EDLC-TN for distinguishing between benign and malignant thyroid nodules in ultrasound images. In addition, our network represents a generalized platform that can potentially be applied to different medical centers to assist radiologists.

Figures

Figure 1. Pathways of experiments. Our experimental pathways mainly included three parts. (A) Data desensitization, removal of the sections of the patient’s personal information in the images. (B) Training and validation of ensemble learning for classification of thyroid nodules. In the segmentation part, the nodule area was manually marked and used to train the segmentation model. ROI and mask were extracted by the segmentation model. Then, three weak models were trained and combined to obtain an advanced classification model. (C) Comparison experiments with radiologists and other deep learning models, and external validation experiment. We then compared performance of the classification model with that of three ultrasound radiologists and four state-of-the-art deep learning models. Finally, we conducted an external validation using an independent dataset.

Figure 2. The multistep cascade experiment pathway of EDLC-TN. (A) The process of extracting the ROI and mask. First, the boundary was cut off (a). Second, the nodule area was segregated (b). Then, the mask image of the thyroid nodule was depicted (c). Finally, the ROI was segmented (d). (B) The process of classifying images by the ensemble learning model. After obtaining the ROI and its corresponding mask, three classification models were trained and combined to obtain an advanced classification model. The ROI was fed into the models, and the final classification result was obtained through the voting method.

Figure 3. Performance of the EDLC-TN in identification of thyroid cancer in different datasets. (A) Performance of the EDLC-TN on the training dataset. The accuracy, sensitivity and specificity were 93.70%, 93.19%, and 94.01%, respectively. (B) Diagnostic performance of the EDLC-TN and four other state-of-the-art machine learning algorithms. The EDLC-TN demonstrated the highest value for AUC (0.941, 95% CI: 0.935–0.946), sensitivity (93.77%), specificity (94.44%), and accuracy (98.51%). (C) The performance of EDLC-TN on the external validation dataset. The EDLC-TN achieved an accuracy of 95.76%, with a sensitivity of 95.88%, a specificity of 93.75% and an AUC of 0.979 (95% CI: 0.958–0.992).

Tables

Table 1. Number of training and testing images from four datasets.

Table 2. Demographic data and image information for all patients from four medical centers.

Table 3. Comparison of the diagnostic performance of EDLC-TN with radiologists.

Table 4. Comparison of the diagnostic performance of EDLC-TN with four other state-of-the-art algorithms.

Supplementary Table 1. The algorithm for finding the upper and lower boundaries of a nodule.

Supplementary Table 2. ROI extraction algorithm structure.

Supplementary Table 3. Classification algorithm structure.

Supplementary Table 4. Test results of weak and strong classifiers in the diagnosis of thyroid nodules.

References

1. Tessler FN, Middleton WD, Grant EG, ACR thyroid imaging, reporting and data system (TI-RADS): White paper of the ACR TI-RADS Committee: J Am Coll Radiol, 2017; 14; 587-95

2. Haugen BR, Alexander EK, Bible KC, 2015 American Thyroid Association management guidelines for adult patients with thyroid nodules and differentiated thyroid cancer: The American Thyroid Association Guidelines Task Force on thyroid nodules and differentiated thyroid cancer: Thyroid, 2016; 26; 1-133

3. Hoang JK, Middleton WD, Farjat AE, Reduction in thyroid nodule biopsies and improved accuracy with American College of Radiology Thyroid Imaging Reporting and Data System: Radiology, 2018; 287; 185-93

4. Esteva A, Kuprel B, Novoa RA, Dermatologist-level classification of skin cancer with deep neural networks: Nature, 2017; 542; 115-18

5. Zech JR, Badgeley MA, Liu M, Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study: PLoS Med, 2018; 15; e1002683

6. Li X, Zhang S, Zhang Q, Diagnosis of thyroid cancer using deep convolutional neural network models applied to sonographic images: A retrospective, multicohort, diagnostic study: Lancet Oncol, 2019; 20; 193-201

7. Zhang B, Tian J, Pei S, Machine learning-assisted system for thyroid nodule diagnosis: Thyroid, 2019; 29; 858-67

8. Jeong EY, Kim HL, Ha EJ, Computer-aided diagnosis system for thyroid nodules on ultrasonography: Diagnostic performance and reproducibility based on the experience level of operators: Eur Radiol, 2019; 29; 1978-85

9. Buda M, Wildman-Tobriner B, Hoang JK, Management of thyroid nodules seen on US images: Deep learning may match performance of radiologists: Radiology, 2019; 292; 695-701

10. Lim KJ, Choi CS, Yoon DY, Computer-aided diagnosis for the differentiation of malignant from benign thyroid nodules on ultrasonography: Acad Radiol, 2008; 15; 853-58

11. Ma J, Wu F, Zhu J, A pre-trained convolutional neural network based method for thyroid nodule diagnosis: Ultrasonics, 2017; 73; 221-30

12. Igelnik B, Pao YH, LeClair SR, Shen CY, The ensemble approach to neural-network learning and generalization: IEEE Trans Neural Netw, 1999; 10; 19-30

13. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ, Densely connected convolutional networks; 4700-8

14. Xie S, Girshick R, Dollár P, Aggregated residual transformations for deep neural networks; 1492-1500

15. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA, Inception-v4, inception-resnet and the impact of residual connections on learning: AAAI, 2017; 4; 12

16. Hu J, Shen L, Sun G, Squeeze-and-excitation networks; 7132-41

17. Chollet F, Xception: Deep learning with depthwise separable convolutions; 1251-58

18. Lee JH, Ha EJ, Kim D, Application of deep learning to the diagnosis of cervical lymph node metastasis from thyroid cancer with CT: External validation and clinical utility for resident training: Eur Radiol, 2020; 30; 3066-72

19. Wu P, Cui Z, Gan Z, Liu F, Three-dimensional resnext network using feature fusion and label smoothing for hyperspectral image classification: Sensors (Basel), 2020; 20; 1652

20. Xia J, Chen H, Li Q, Ultrasound-based differentiation of malignant and benign thyroid nodules: An extreme learning machine approach: Comput Methods Programs Biomed, 2017; 147; 37-49

21. Liu T, Xie S, Yu J, Classification of thyroid nodules in ultrasound images using deep model based transfer learning and hybrid features, 2017; 919-23

22. Sollini M, Cozzi L, Chiti A, Kirienko M, Texture analysis and machine learning to characterize suspected thyroid nodules and differentiated thyroid cancer: Where do we stand?: Eur J Radiol, 2018; 99; 1-8

Related articles Order reprints Share article Share by email

Figures

Figure 1. Pathways of experiments. Our experimental pathways mainly included three parts. (A) Data desensitization, removal of the sections of the patient’s personal information in the images. (B) Training and validation of ensemble learning for classification of thyroid nodules. In the segmentation part, the nodule area was manually marked and used to train the segmentation model. ROI and mask were extracted by the segmentation model. Then, three weak models were trained and combined to obtain an advanced classification model. (C) Comparison experiments with radiologists and other deep learning models, and external validation experiment. We then compared performance of the classification model with that of three ultrasound radiologists and four state-of-the-art deep learning models. Finally, we conducted an external validation using an independent dataset.

Figure 2. The multistep cascade experiment pathway of EDLC-TN. (A) The process of extracting the ROI and mask. First, the boundary was cut off (a). Second, the nodule area was separated out (b). Then, the mask image of the thyroid nodule was generated (c). Finally, the ROI was segmented (d). (B) The process of classifying images with the ensemble learning model. After the ROI and its corresponding mask were obtained, three classification models were trained and combined to obtain an advanced classification model. The ROI was fed into the models, and the final classification result was obtained by voting.

Figure 3. Performance of EDLC-TN in identifying thyroid cancer in different datasets. (A) Performance of EDLC-TN on the training dataset. The accuracy, sensitivity, and specificity were 93.70%, 93.19%, and 94.01%, respectively. (B) Diagnostic performance of EDLC-TN and four other state-of-the-art machine learning algorithms. EDLC-TN demonstrated the highest values for AUC (0.941, 95% CI: 0.935–0.946), sensitivity (93.77%), specificity (94.44%), and accuracy (98.51%). (C) Performance of EDLC-TN on the external validation dataset. EDLC-TN achieved an accuracy of 95.76%, a sensitivity of 95.88%, a specificity of 93.75%, and an AUC of 0.979 (95% CI: 0.958–0.992).
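The voting step described in Figure 2B and the metrics reported in Figure 3 can be illustrated with a minimal Python sketch. It assumes simple majority (hard) voting over three binary classifiers, with malignant coded as 1 and benign as 0; the captions do not state whether the votes are weighted, and the function names `majority_vote` and `diagnostic_metrics` are hypothetical, not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most models (hard voting).

    `predictions` holds one label per model, e.g. [1, 0, 1].
    Assumption: an odd number of binary classifiers, so no ties occur.
    """
    return Counter(predictions).most_common(1)[0][0]


def diagnostic_metrics(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity, and accuracy from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        # Guard against empty classes instead of dividing by zero.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
    }


# Toy data (not from the study): each row holds the three models'
# predictions for one ROI.
per_image_votes = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
ensemble_pred = [majority_vote(v) for v in per_image_votes]
reference = [1, 0, 0, 0]  # hypothetical pathology labels
metrics = diagnostic_metrics(reference, ensemble_pred)
print(ensemble_pred, metrics)
```

In this toy example the ensemble label follows whichever class at least two of the three models predict, and the metrics are derived from the resulting confusion counts; AUC is omitted because it requires continuous scores rather than hard labels.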

Tables

Table 1. Number of training and testing images from four datasets.

Table 2. Demographic data and image information for all patients from four medical centers.

Table 3. Comparison of the diagnostic performance of EDLC-TN with radiologists.

Table 4. Comparison of the diagnostic performance of EDLC-TN with four other state-of-the-art algorithms.

Supplementary Table 1. The algorithm for finding the upper and lower boundaries of a nodule.

Supplementary Table 2. ROI extraction algorithm structure.

Supplementary Table 3. Classification algorithm structure.

Supplementary Table 4. Test results of weak and strong classifiers in the diagnosis of thyroid nodules.
