A team of medical researchers from the Icahn School of Medicine at Mount Sinai conducted a study of artificial intelligence (AI) chatbots in which they claim that "generative large language models are autonomous practitioners of evidence-based medicine (EBM)." The study, published on arXiv, tested several consumer-facing large language models: ChatGPT 3.5 and 4, Gemini Pro, LLaMA v2, and Mixtral-8x7B. The models were asked to suggest evidence-based medical treatments for a set of test cases, and ChatGPT 4 achieved the highest accuracy.


Key Points:

  • Study on AI Chatbots: Researchers from the Icahn School of Medicine at Mount Sinai conducted a study on AI chatbots, testing their ability to suggest evidence-based medical treatments for various cases.

  • Large Language Models Tested: The study involved testing various large language models, including ChatGPT 3.5 and 4, Gemini Pro, LLaMA v2, and Mixtral-8x7B, to determine their effectiveness in practicing evidence-based medicine.

  • ChatGPT 4's Performance: ChatGPT 4 was the most accurate of the tested models, reaching 74% accuracy across all cases. The researchers concluded that large language models can function as autonomous practitioners of evidence-based medicine.

  • Mitigating Information Overload: The study suggests that large language models can help mitigate information overload for clinicians by performing tasks such as ordering tests, interpreting investigations, or issuing alarms, allowing human experts to focus on physical care.

  • Limitations and Ethical Considerations: The researchers acknowledge the limitations of large language models, including the potential for hallucinations (fabricating nonsense), and note the ethical considerations raised by integrating them into clinical workflows.

  • Undefined Claims: The paper makes claims about the models' reasoning abilities and their potential implications for artificial general intelligence, but it neither defines these terms clearly nor elaborates on the assertions.

Conclusion: The study by researchers from the Icahn School of Medicine at Mount Sinai suggests that large language models, particularly ChatGPT 4, can function as autonomous practitioners of evidence-based medicine. However, its claims about the models' reasoning abilities are left undefined, and it remains an open question how much benefit general-purpose chatbots would bring to a clinical evidence-based medicine environment. Ethical considerations and limitations, such as the risk of hallucinations, still need to be addressed before AI chatbots are integrated into medical practice.


(TRISTAN GREENE, COINTELEGRAPH, 2023)