Can ChatGPT Provide Patient-Friendly and Reliable Information on Cervical Cancer Screening? A Study of ChatGPT-Generated Information in Polish

Adriana A. Michalska; Małgorzata M. Stefaniak; Joanna Gotlib-Małkowska

doi:10.12659/MSM.947992

03 July 2025: Database Analysis

Med Sci Monit 2025; 31:e947992

Abstract

BACKGROUND: Cervical cancer (CC) mortality remains a global health problem, and women’s awareness of the need for regular CC screening is insufficient. In the era of rapid development of artificial intelligence (AI), large language models (LLMs) such as ChatGPT may become an important source of information about health, including women’s health and screening. The aim of the study was to assess the quality of ChatGPT-generated medical information on cervical cancer screening (MICC_GPT).

MATERIAL AND METHODS: A total of 196 MICC_GPT items were assessed. Objective readability was assessed on a scale of 1 to 7 using the Jasnopis.pl software. The MICC_GPT were subjectively assessed on a scale of 1 to 5 by 4 independent expert reviewers.

RESULTS: The average difficulty level of the MICC_GPT was 5.314, and full understanding of the MICC_GPT requires the target audience to have completed at least 15 years of formal education. In the opinion of the expert reviewers, most of the MICC_GPTs (71%) met the criteria of relevance, accuracy, completeness, clarity, coherence, and appropriateness.

CONCLUSIONS: ChatGPT-generated information in Polish on the topic of cervical cancer screening is accurate, detailed, comprehensive, clear, coherent and appropriate for the target audience, confirming the great potential of ChatGPT in health education, as well as promotion of screening tests and cervical cancer prophylaxis. To reduce the level of difficulty and increase the accessibility of the MICC_GPT for less educated end-users, the MICC_GPT needs to be simplified, especially in terms of specialized medical terminology.

Keywords: Artificial Intelligence, Evidence-Based Medicine, Health Policy, Humans, Uterine Cervical Neoplasms, Female, Poland, Early Detection of Cancer, Adult, Middle Aged, Mass Screening, Generative Artificial Intelligence

Introduction

THE AIM OF THE STUDY:

The aim of the study was to assess the quality of ChatGPT-generated medical information on cervical cancer screening (MICC_GPT) provided in the Polish language. We performed an objective assessment of the readability of the MICC_GPT and a subjective assessment of the MICC_GPT content, following the RACCCA model criteria of relevance, accuracy, completeness, clarity, coherence, and appropriateness. The ChatGPT-4omni Plus chatbot model (model GPT-4o) was used.

Material and Methods

STUDY DESIGN:

The study was conducted at Warsaw Medical University, Poland, between 2 September 2024, when the pilot phase began, and 5 November 2024, when the evaluation of the content generated by ChatGPT was completed.

THEORETICAL FRAMEWORK:

The study followed Sallam et al’s guidelines for good practice in healthcare education research and practice using content generated by artificial intelligence algorithms, including LLMs [13]. As suggested by Sallam et al, there are 9 aspects to consider when designing a study and describing its results: (1) the design of the LLM model used to generate the content, (2) the methods of evaluation of the output data (objective vs subjective assessment), (3) the exact time and date of the LLM model testing and output generation, (4) the transparency of the input data, (5) the scope of the input data, (6) the randomness of the input data, (7) individual factors affecting the consistency of evaluation of input and output data, (8) the number of queries performed, and (9) prompt design [13].

FORMULATION OF CERVICAL CANCER SCREENING QUESTIONS:

The authors analyzed recent international scientific publications on women’s knowledge of CC screening [14–18], identified 5 key thematic areas, and created an initial database of 76 questions on CC screening: (1) general information about the test (16 items), (2) preparation for the test (15 items), (3) specimen collection for testing (16 items), (4) interpretation of the test results (16 items), and (5) management of an abnormal test result (13 items) (Raw Data 1: Selection of Questions for the Study on Cervical Cancer Screening).

PILOT STUDY:

A pilot study was conducted on 2 September 2024 using the ChatGPT-4omni chatbot (GPT-4o model). From the raw CC screening questions intended for the main part of the study (entered as written, with no specific prompts), one of the co-authors (JGM) selected 6 items: 3 on general information about the test and 3 on test preparation.

The questions were formulated with the simple text-based instruction method, which is most commonly used in conversations with LLMs by people unfamiliar with advanced prompt engineering techniques [19,20].

A total of 30 ChatGPT-generated responses were analyzed for this pilot study. Five responses were generated for each of the 6 questions, with the response generated in a new ChatGPT conversation window each time to avoid model memory and in-context learning effects. A preliminary assessment of the quality of the ChatGPT-generated responses and medical information on CC screening was performed by the first co-author of the publication (AM).

The pilot study aimed to identify potential issues related to the formulation of questions or the quality of responses that might require modification before the main study started. The preliminary quality assessment of the ChatGPT-generated medical information also aimed to determine whether the responses were: (1) consistent with medical knowledge, (2) clear and understandable, (3) complete, and (4) useful for patient education. It was also intended to distinguish satisfactory from unsatisfactory responses before the actual study began. Analysis of the pilot-generated content did not reveal any need for question modification.

EXPERT REVIEW OF THE USEFULNESS OF CC SCREENING QUESTIONS:

Seven out of 76 questions were rejected as unsuitable following a preliminary assessment of their usefulness by an expert midwife (MS), a professor with 14 years’ experience and substantial academic achievements in midwifery.

Next, 20 licensed expert midwives further assessed the relevance of the questions. The sampling was purposive, as the aim was to recruit experienced midwives recognized as experts in performing cervical cytology. The selection criteria included: (1) professional experience in conducting cervical screening, (2) completion of specialized training in cervical cytology, and (3) active involvement in clinical practice related to cervical cancer prevention. Midwives who met these criteria were identified through professional networks and invited to participate in the study. The selection process was designed to ensure that the participants had a high level of expertise and practical competence in cervical cytology.

On 4 October 2024, the experts anonymously rated the usefulness of the remaining questions by completing a review form available online. The rating was done on a 5-point scale: 1 – definitely not useful, 2 – rather not useful, 3 – no opinion, 4 – rather useful, 5 – definitely useful. Each expert reviewer evaluated their own separate set of ChatGPT-generated answers. No single expert assessed all responses included in the study. This approach was chosen to distribute the workload and ensure an efficient evaluation process.

Items rated ‘definitely useful’ by at least 50% of the experts, or rated ‘rather useful’ or ‘definitely useful’ by at least 50% of the experts combined, were included in the subsequent stage of the study. Twenty questions did not meet the expert criterion for inclusion, and 49 items were finally included in the main part of the study.
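The inclusion rule above can be expressed as a short check. The following is a minimal sketch, assuming each question comes with one integer rating (1 to 5) per expert; the function name and the example ratings are hypothetical and are not study data.

```python
from collections import Counter


def is_included(ratings, threshold=0.5):
    """Return True if a question passes the expert usefulness criterion.

    ratings: list of integer scores (1-5), one per expert.
    """
    counts = Counter(ratings)
    n = len(ratings)
    share_definitely = counts[5] / n               # 'definitely useful' only
    share_useful = (counts[4] + counts[5]) / n     # 'rather' + 'definitely useful'
    return share_definitely >= threshold or share_useful >= threshold


# Hypothetical ratings from 20 experts (illustration only)
example_pass = [5] * 6 + [4] * 5 + [3] * 4 + [2] * 5   # 11 of 20 rated 4 or 5
example_fail = [5] * 3 + [4] * 4 + [3] * 8 + [2] * 5   # 7 of 20 rated 4 or 5

print(is_included(example_pass), is_included(example_fail))
```

Because every ‘definitely useful’ rating also counts toward the combined share, the second condition is the binding one in practice; both are kept in the sketch to mirror the wording of the criterion.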

The list of all 76 questions, together with information on the exclusion of particular items at each stage of the study, is presented in Raw Data 1: Selection of Questions for the Study on Cervical Cancer Screening.

THE PROCESS OF CONTENT GENERATION BY CHATGPT:

The answers to the 49 questions ultimately included in the study were generated by ChatGPT between 7 October and 11 October 2024. The study used the ChatGPT-4omni Plus chatbot model (model GPT-4o), with no personalization, and with ChatGPT learning and customization disabled. The responses were generated by the same researcher (JGM) as in the pilot study. The same computer and IP address were used.

The answers to each of the 49 questions were generated 4 times, starting with Question 1. This resulted in a total of 196 answers. To avoid learning effects of the ChatGPT model, each answer was generated in a new conversation window. When ChatGPT generated 2 answers to choose from simultaneously, both answers were stored in the database. An Excel spreadsheet (Raw Data 2) was used to record all questions and answers. The data were then cleaned by removing special characters and formatting marks (### and **) used to tag some of the words selected by ChatGPT.
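The cleaning step described above can be illustrated with a few lines of code. This is only a sketch, assuming the stored answers are plain strings containing Markdown-style heading markers (###) and bold markers (**); the function name and the sample answer are hypothetical, and the authors do not state which tool they used for cleaning.

```python
import re


def clean_response(text):
    """Strip Markdown heading and bold markers while keeping the wording."""
    # Remove heading markers such as "### " at the start of a line
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)
    # Remove bold markers "**" but keep the enclosed words
    text = text.replace("**", "")
    return text.strip()


# Hypothetical raw answer (illustration only)
raw = "### Cytologia\n**Badanie** trwa kilka minut."
cleaned = clean_response(raw)
print(cleaned)
```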

An analysis of all 196 ChatGPT-generated responses confirmed that none of the items contained misleading, harmful, or potentially dangerous medical information.

READABILITY OF CHATGPT-GENERATED RESPONSES TO CC SCREENING QUESTIONS:

The readability of the MICC_GPT was analyzed using Jasnopis.pl (https://www.jasnopis.pl/), an online software tool that objectively assesses the readability of texts in Polish. It analyzes text complexity and assigns a readability score on a scale from 1 (an easy-to-understand text, accessible to those with elementary education) to 7 (a very difficult text, requiring a doctoral degree or expert knowledge of the field); scores around 5 correspond to approximately 15 years of formal education, equivalent to a bachelor’s or engineering degree. Jasnopis was used solely for this purpose, providing an objective measure of how understandable the generated content is for people of different education levels. Three FOG index variants were also analyzed, based on headwords (words whose standardized/dictionary entry forms have 4 syllables or more), run-on words (words whose running forms, as they appear in the text, have 4 syllables or more), and low-frequency headwords (difficult words, ie, those whose keyword forms have 4 syllables or more).
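To make the FOG index concrete, the sketch below computes the classic Gunning formula, 0.4 × (average sentence length + percentage of difficult words), with the 4-syllable threshold mentioned above. It is not the Jasnopis implementation, which is not described here: the syllable counter is a deliberately simplified assumption (it counts Polish vowel letters and skips an "i" that directly precedes another vowel), and it works on running word forms only, since the headword variants would require a lemmatizer.

```python
import re

VOWELS = "aąeęioóuy"


def count_syllables(word):
    """Approximate Polish syllable count (simplified assumption)."""
    word = word.lower()
    count = 0
    for i, ch in enumerate(word):
        if ch not in VOWELS:
            continue
        # "i" directly before another vowel only softens the consonant
        if ch == "i" and i + 1 < len(word) and word[i + 1] in VOWELS:
            continue
        count += 1
    return count


def fog_index(text, min_syllables=4):
    """Gunning FOG: 0.4 * (words per sentence + percent of difficult words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        return 0.0
    hard = [w for w in words if count_syllables(w) >= min_syllables]
    return 0.4 * (len(words) / len(sentences) + 100 * len(hard) / len(words))


# Hypothetical two-sentence sample (illustration only)
sample = "Cytologia to badanie profilaktyczne. Trwa kilka minut."
print(round(fog_index(sample), 2))
```

In this sample, 2 of the 7 words ("Cytologia" and "profilaktyczne") reach the 4-syllable threshold, which shows how specialized medical vocabulary drives the index up even in short sentences.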

CONTENT ANALYSIS OF THE QUALITY OF CHATGPT-GENERATED RESPONSES TO CC SCREENING QUESTIONS:

For the content quality analysis of the MICC_GPT, 4 further independent expert midwives were invited as expert reviewers (ERs). They had not been involved in the initial survey item selection.

EXPERT REVIEWER CHARACTERISTICS:

All ERs have a university degree in midwifery, a master’s degree in midwifery, are professionally active, and have a licence to practise midwifery in Poland. The detailed characteristics of ERs are presented in Table 1.

The ERs were unaware that the content was generated by ChatGPT. They used the RACCCA framework, which evaluates various aspects of information quality and draws on information reliability theories, information quality research in information systems, text readability and comprehensibility evaluation, text linguistic quality evaluation, context analysis, and style evaluation. The framework comprises 6 criteria: relevance refers to whether the answer directly addresses the query; accuracy is a criterion of precision and consistency with scientific data; completeness indicates whether the answer is comprehensive and provides all essential information; clarity indicates whether the answer is clear, unambiguous, logical, and well-structured; coherence analyzes whether the answer is free from contradictions, is logically structured, thematically consistent, and flows smoothly; and appropriateness indicates whether the answer is appropriate in terms of the context, form, and intended audience (https://www.linkedin.com/pulse/frameworks-effective-prompting-using-raccca-clear-models-wadsworth-sjpke/).

A 5-point Likert scale was used for all criteria, with 1 representing complete failure to meet the criterion and 5 signifying full compliance. Standardization of scoring was ensured by providing all reviewers with detailed written instructions and a clear description of each evaluation criterion.

ChatGPT generated answers to 49 questions, each 4 times, for a total of 196 responses (49 questions × 4 answers). This created 4 unique sets of 49 answers, 1 for each expert reviewer. Each ER received a link to their own evaluation form and rated their set of 49 MICC_GPTs. The evaluation of the MICC_GPTs took place between 27 October and 5 November 2024.
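The allocation of responses to reviewers can be sketched as follows. The sketch assumes that reviewer k received the k-th generated answer to every question; the text states only that each reviewer rated a unique set of 49 answers, so this mapping and the function name are assumptions made for illustration.

```python
def build_reviewer_sets(n_questions=49, n_generations=4):
    """Map each reviewer to one (question, generation) pair per question."""
    return {
        reviewer: [(question, reviewer) for question in range(1, n_questions + 1)]
        for reviewer in range(1, n_generations + 1)
    }


sets = build_reviewer_sets()
all_items = [item for items in sets.values() for item in items]
print(len(sets), len(all_items), len(set(all_items)))
```

A consequence of this design is that no response is rated by more than one reviewer, which is why agreement between reviewers cannot be computed from these data.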

STATISTICAL ANALYSIS:

The initial analysis involved the extraction of 17 linguistic features of text readability. To ensure data consistency, the numerical values of each feature were standardized. The analysis of the effect of these features on text difficulty was carried out in 2 steps.

In the first step, machine learning techniques were applied using Microsoft Azure cloud computing to identify the main predictors of text difficulty. The most relevant features were selected using the Voting Ensemble algorithm. In the second step, an ordinal regression model was built based on the selected predictors, which allowed quantitative assessment of the impact of each predictor on readability.

The quality and reliability of the Voting Ensemble model were assessed through cross-validation and the calculation of prediction error rates (RMSE, MAE) and Nagelkerke pseudo-R². The significance of the predictors was assessed using a matrix of regression coefficients and a weight distribution analysis with box plots. For the ordinal regression model, the odds ratio (OR) and 95% confidence interval (CI) were calculated for each predictor.
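For readers less familiar with the two error measures named above, the following minimal sketch shows how RMSE and MAE are computed from observed and predicted difficulty levels. The numbers are hypothetical and serve only to illustrate the definitions; they are not taken from the study or from the Azure output.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error: penalizes large deviations more strongly."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error: average size of the deviation."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical difficulty levels (illustration only)
observed = [5, 6, 5, 4]
predicted = [5.5, 5.5, 5.0, 4.0]
print(round(rmse(observed, predicted), 3), round(mae(observed, predicted), 3))
```

Both measures are expressed in the units of the outcome, here points on the 1 to 7 difficulty scale.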

STATISTICA™ 13.3 (Tibco Software) and Microsoft Azure software were used for the calculations. The level of statistical significance was set at α=0.05.

Results

Objective Analysis

OBJECTIVE ANALYSIS:

None of the ChatGPT-generated responses included in our study contained any potentially harmful information, such as recommendations against cervical cancer screening or advice that could negatively impact patient safety.

The average MICC_GPT difficulty level was 5.314, and full comprehension of the MICC_GPT required the target audience to have participated in formal education for at least 15 years (eg, to have completed a bachelor’s or engineering degree).

The FOG index for technical terms was 12.07 on average, indicating texts of intermediate difficulty. In contrast, the FOG index for low-frequency complex headwords was higher (13), suggesting that audiences with only elementary education may struggle with specialized medical content (Table 2).

The vast majority of MICC_GPTs (82%) required the audience to have a university degree to fully understand the text (Table 3).

INFLUENCE OF SELECTED LINGUISTIC FEATURES ON READABILITY:

The results of the analysis using the Voting Ensemble technique indicate a high explanatory power of the model, with an R² value of 0.695, which means that 69.5% of the variance in text difficulty was explained by the included predictors. Prediction error rates, such as RMSE (0.429) and MAE (0.332), showed low deviations of predicted difficulty levels from the observed values. Among the predictors, ‘Percentage of verbs’ stands out as the only one with a negative effect on the level of difficulty, while ‘Number of difficult words,’ ‘Percentage of nouns,’ and ‘Average word length [syllables]’ had a significant positive effect on the level of difficulty.

The ordinal regression model showed a good quality of fit (Nagelkerke R²=0.625). The overall test of the model (χ²=268, df=6, P<0.001) confirmed its significance and adequacy. The regression analysis revealed the following significant predictors of text difficulty: the percentage of verbs (negative effect, OR=0.662), the number of difficult words (OR=1.073), the percentage of nouns (OR=1.337), the average word length expressed in syllables (OR=17886), the percentage of difficult words (OR=1.248), and the FOG index for complex headwords (OR=1.753). All these variables had a statistically significant impact on text difficulty (Table 4).
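The odds ratios above can be read as percentage changes in the odds of a text falling into a higher difficulty category for each one-unit increase in the predictor, which is the usual interpretation for ordinal regression. The sketch below applies that conversion to the reported values; the helper names are hypothetical, and the very large OR reported for average word length is left out to keep the output readable.

```python
import math

# Odds ratios as reported in the text (average word length omitted)
odds_ratios = {
    "Percentage of verbs": 0.662,
    "Number of difficult words": 1.073,
    "Percentage of nouns": 1.337,
    "Percentage of difficult words": 1.248,
    "FOG index for complex headwords": 1.753,
}


def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds per one-unit increase in the predictor."""
    return (odds_ratio - 1.0) * 100.0


def log_odds_coefficient(odds_ratio):
    """Regression coefficient on the log-odds scale, since OR = exp(beta)."""
    return math.log(odds_ratio)


for name, value in odds_ratios.items():
    print(f"{name}: {percent_change_in_odds(value):+.1f}% odds, "
          f"beta={log_odds_coefficient(value):+.3f}")
```

For example, an OR of 0.662 for the percentage of verbs corresponds to roughly 34% lower odds of a higher difficulty level per additional percentage point of verbs, while an OR of 1.753 for the FOG index corresponds to roughly 75% higher odds per index point.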

CONTENT ANALYSIS OF CHATGPT-GENERATED RESPONSES TO CC SCREENING QUESTIONS:

In the opinion of the ERs, most MICC_GPTs met (31%) or fully met (40%) the assessed criteria of relevance, accuracy, completeness, clarity, coherence, and appropriateness (Table 5).

Individual reviewers differed in their assessment of the MICC_GPTs: Expert Reviewer 2 rated the largest number of MICC_GPTs (38%) as failing to meet the criteria. As each expert reviewer assessed a different set of 49 responses, it was not possible to statistically analyze inter-rater reliability.

RELEVANCE:

An assessment of the relevance of the MICC_GPTs showed that most responses met or fully met the criterion. More than 77% of the responses received one of the two highest ratings (4 or 5 on a 5-point scale), suggesting that ChatGPT generally provides relevant information. Discrepancies were observed between the rating of Expert Reviewer 2 and the other expert reviewers (Raw Data 3).

ACCURACY:

The assessment of the accuracy of the MICC_GPTs was also largely positive. As many as 71% of the responses were rated as meeting or fully meeting the accuracy criterion, indicating a high level of accuracy and adherence to scientific fact. Significant differences were noted between the rating of Expert Reviewer 2 and the other expert reviewers, particularly for responses rated as failing the criterion (Raw Data 3).

COMPLETENESS:

The analysis of the completeness of the MICC_GPTs showed a high degree of compliance with expert expectations for most of the responses. In total, 75% of the responses were rated as either meeting (4) or fully meeting (5) the completeness criterion, indicating the ability of the model to provide comprehensive and sufficient information. Expert Reviewer 2 rated a significant percentage of responses as not fully informative, which may indicate areas for further improvement (Raw Data 3).

CLARITY:

The results for the clarity of the ChatGPT-generated responses showed mostly positive ratings. Approximately 72% of the responses were found to meet (4) or fully meet (5) the clarity criterion. This shows that the information was generally clear and understandable to the audience. However, as with the other criteria, there were differences in the ratings between Expert Reviewer 2 and the other expert reviewers (Raw Data 3).

COHERENCE:

The evaluation of the coherence of the ChatGPT-generated content showed most responses to be logical and direct. Approximately 70% of the responses were rated as either meeting (4) or fully meeting (5) the coherence criterion. The results indicate the model’s ability to generate responses that are clear and free of inconsistencies. However, as with the other criteria, discrepancies were observed in the ratings of particular expert reviewers (Raw Data 3).

APPROPRIATENESS:

The assessment of the appropriateness of the ChatGPT-generated responses identified more than 68% of the responses as meeting (4) or fully meeting (5) the criterion, indicating the model’s ability to generate content tailored to the context and expectations of the audience. However, a significant proportion of responses were rated as not meeting (2) or completely failing (1) the criterion (Raw Data 3).

Discussion

LIMITATIONS:

Our results are not without limitations. The expert reviewers’ assessment of MICC_GPT quality was subjective in nature, and their professional experience may have had a significant influence on their judgement. In addition, as each reviewer assessed a different set of 49 MICC_GPTs, it was not possible to measure inter-rater reliability.

FURTHER DIRECTIONS FOR ON-GOING RESEARCH:

Further research should focus on improving communication between end-users and ChatGPT, particularly in terms of tailoring responses to patients’ varying levels of education and medical knowledge. Key areas for analysis include: (1) communication effectiveness, including evaluation of query formulation techniques (simple text-based instructions) for users unfamiliar with advanced prompt engineering; (2) language simplification, including the impact of simplifying medical terminology on comprehension of responses and on patients’ health decision-making; and (3) personalization of content, including ChatGPT’s ability to tailor responses to individual needs such as age, medical knowledge, or communication preferences.

Given the critical role of the human papillomavirus (HPV) vaccine in cervical cancer prevention and its frequent targeting in anti-vaccination discourse, future research should include comparative analyses of different large language models (LLMs), such as ChatGPT, Gemini, or Claude, in providing information on this topic. Such studies would allow for a comprehensive evaluation of the accuracy, completeness, and potential biases in the information these models generate, particularly concerning vaccine safety and efficacy. Understanding how various LLMs address controversial topics, including misinformation commonly found in anti-vaccination narratives, is essential for assessing their suitability as tools for public health education. Further exploration in this area could support efforts to enhance the reliability of AI-generated content and promote informed decision-making among patients and healthcare consumers.

Conclusions

PRACTICAL IMPLICATIONS:

The findings of this study highlight the potential for responsible integration of ChatGPT into midwifery education, patient counseling, and public health campaigns focused on cervical cancer screening. In midwifery education, ChatGPT can serve as a supplementary tool to enhance students’ understanding of screening guidelines and patient communication strategies by providing accessible explanations and generating patient-friendly educational materials. For patient counseling, ChatGPT may assist healthcare professionals in preparing clear and consistent information about the importance, process, and outcomes of cervical cancer screening, supporting informed decision-making. In public screening campaigns, ChatGPT can be utilized to draft educational content, such as FAQs or social media posts, tailored to different literacy levels, thus promoting awareness and improving participation rates. To ensure safety and accuracy, all ChatGPT-generated materials should be reviewed and validated by qualified healthcare professionals before dissemination.

Tables

Table 1. Detailed characteristics of expert reviewers.

Table 2. Readability analysis of ChatGPT-generated answers in Polish to CC screening questions: mean readability score and mean FOG index in the Jasnopis.pl app.

Table 3. Detailed analysis of readability of MICC_GPTs using Jasnopis.pl software.

Table 4. Ordinal regression parameters for predictors of readability.

Table 5. Expert reviewers’ overall assessment of the quality of the MICC_GPTs.

References

1. International Agency for Research on Cancer, World Health Organization, Global Cancer Observatory, Cancer Today: GCO Online Early https://gco.iarc.who.int/today/en

2. European Commission, Cancer screening statistics [Internet]: Eurostat Statistics Explained, 2024 [cited 2024 Nov 28]. Available from: https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Cancer_screening_statistics#Cervical_cancer_screening

3. United Health Foundation, America’s Health Rankings analysis of CDC, Behavioral Risk Factor Surveillance System: National highlights: 2022 Health of Women and Children Report America’s Health Rankings https://www.americashealthrankings.org/learn/reports/2022-health-of-women-and-children-report/national-highlights

4. Bruni L, Serrano B, Roura E, Cervical cancer screening programmes and age-specific coverage estimates for 202 countries and territories worldwide: A review and synthetic analysis: Lancet Glob Health, 2023; 11(7); e1011

5. Wu S, Jiao J, Yue X, Wang Y, Cervical cancer incidence, mortality, and burden in China: A time-trend analysis and comparison with England and India based on the global burden of disease study 2019: Front Public Health, 2024; 12; 1358433

6. Mijiti Y, Yusupu H, Liu H, Survey on cervical cancer knowledge and its influencing factors among 2,578 women in Shache county, Kashi, China: BMC Womens Health, 2023; 23; 246

7. Thiel de Bocanegra H, Dehlendorf C, Kuppermann M, Impact of an educational tool on young women’s knowledge of cervical cancer screening recommendations: Cancer Causes Control, 2022; 33(6); 813-21

8. Al Yahyai T, Al Raisi M, Al Kindi R, Knowledge, attitudes, and practices regarding cervical cancer screening among Omani women attending primary healthcare centers in Oman: A cross-sectional study: Asian Pac J Cancer Prev, 2021; 22(3); 775-83

9. Bach RL, Wenz A, Studying health-related internet and mobile device use using web logs and smartphone records: PLoS One, 2020; 15(6); e0234663

10. Bujnowska-Fedak MM, Węgierek P, The impact of online health information on patient health behaviours and making decisions concerning health: Int J Environ Res Public Health, 2020; 17(3); 880

11. Scarano Pereira JP, Martinino A, Manicone F, Bariatric surgery on social media: A cross-sectional study: Obes Res Clin Pract, 2022; 16(2); 158-62

12. Shahsavar Y, Choudhury A, User intentions to use ChatGPT for self-diagnosis and health-related purposes: Cross-sectional survey study: JMIR Hum Factors, 2023; 10; e47564

13. Sallam M, Barakat M, Sallam M, A preliminary checklist (METRICS) to standardize the design and reporting of studies on generative artificial intelligence–based models in health care education and practice: development study involving a literature review: Interact J Med Res, 2024; 13; e54704

14. Rezaie-Chamani S, Mohammad-Alizadeh-Charandabi S, Kamalifard M, Knowledge, attitudes, and practice about Pap smear among women referring to a public hospital: J Family Reprod Health, 2012; 6(4); 177-82

15. Al Ghamdi NH, Knowledge of human papilloma virus (HPV), HPV vaccine, and Pap smear among adult Saudi women: J Family Med Prim Care, 2022; 11(6); 2989-99

16. Stefanek A, Durka P, Poziom świadomości kobiet na temat profilaktyki raka szyjki macicy: Polski Przegląd Nauk o Zdrowiu, 2014; 1(38); 29-38 [in Polish]

17. Osowiecka K, Yahuza S, Szwiec M, Students’ knowledge about cervical cancer prevention in Poland: Medicina (Kaunas), 2021; 57(10); 1045

18. Deguara M, Calleja N, England K, Cervical cancer and screening: knowledge, awareness and attitudes of women in Malta: J Prev Med Hyg, 2021; 61(4); E584-92

19. Törnberg P, How to use LLMs for text analysis: arXiv preprint, 2023; arXiv:2307.13106

20. Freyer N, Kempt H, Klöser L, Easy-read and large language models: On the ethical dimensions of LLM-based text simplification: Ethics and Information Technology, 2024; 26(3); 50

21. Hermann CE, Patel JM, Boyd L, Let’s chat about cervical cancer: Assessing the accuracy of ChatGPT responses to cervical cancer questions: Gynecol Oncol, 2023; 179; 164-68

22. Patel JM, Hermann CE, Growdon WB, ChatGPT accurately performs genetic counseling for gynecologic cancers: Gynecol Oncol, 2024; 183; 115-19

23. Sengupta P, Dutta S, Chakravarthi S, Comparative efficacy of ChatGPT 3.5, ChatGPT 4, and other large language models in gynecology and infertility research: Gynecol Obstet Clin Med, 2023; 3; 203-6

24. Krückel A, Brückner L, Psilopatis I, Evaluation of ChatGPT’s potential in tailoring gynecological cancer therapies: In Vivo, 2024; 38; 1649-59

25. Allahqoli L, Ghiasvand MM, Mazidimoradi A, Diagnostic and management performance of ChatGPT in obstetrics and gynecology: Gynecol Obstet Invest, 2023; 88; 310-13

26. Khromchenko K, Shaikh S, Singh M, ChatGPT-3.5 versus Google Bard: Which large language model responds best to commonly asked pregnancy questions?: Cureus, 2024; 16(7); e65543

27. Stalp JL, Denecke A, Jentschke M, Quality of ChatGPT-generated therapy recommendations for breast cancer treatment in gynecology: Curr Oncol, 2024; 31; 3845-54

28. Onder CE, Koc G, Gokbulut P, Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy: Sci Rep, 2024; 14; 243

29. Johnson CM, Bradley CS, Kenne KA, Evaluation of ChatGPT for pelvic floor surgery counseling: Urogynecology, 2024; 30; 245-50

30. Riedel M, Kaefinger K, Stuehrenberg A, ChatGPT’s performance in German OB/GYN exams – paving the way for AI-enhanced medical education and clinical practice: Front Med, 2023; 10; 1296615

31. Peled T, Sela HY, Weiss A, Evaluating the validity of ChatGPT responses on common obstetric issues: Potential clinical applications and implications: Int J Gynecol Obstet, 2024; 166; 1127-33

32. Yurtcu E, Ozvural S, Keyif B, Analyzing the performance of ChatGPT in answering inquiries about cervical cancer: Int J Gynecol Obstet, 2025; 168(2); 502-7

33. Levin G, Brezinov Y, Meyer R, Exploring the use of ChatGPT in OBGYN: A bibliometric analysis of the first ChatGPT-related publications: Arch Gynecol Obstet, 2023; 308; 1785-89

34. Bachmann M, Duta I, Mazey E, Exploring the capabilities of ChatGPT in women’s health: Obstetrics and gynecology: NPJ Women’s Health, 2024; 2; 26

35. Temel MH, Erden Y, Başgcıer F, Information quality and readability: ChatGPT’s responses to the most common questions about spinal cord injury: World Neurosurgery, 2024; 181; e1138-e44

36. Gawey L, Dagenet CB, Tran KA, Readability of information generated by ChatGPT for hidradenitis suppurativa: JMIR Dermatology, 2024; 7; e55204

37. Fahy S, Niemann M, Böhm P, Assessment of the quality and readability of information provided by ChatGPT in relation to the use of platelet-rich plasma therapy for osteoarthritis: J Pers Med, 2024; 14(5); 495

38. Roster K, Kann RB, Farabi B, Readability and health literacy scores for ChatGPT-generated dermatology public education materials: Cross-sectional analysis of sunscreen and melanoma questions: JMIR Dermatology, 2024; 7; e50163

39. Beyoğlu MM, Kaya E, Karabulut E, Assessment of the quality, readability, and usefulness of ChatGPT-generated medical information for ten common cancer types: Universal Access in the Information Society, 2024
