Getting a paper published in Science is a highlight of many researchers’ careers. But for internist and clinical artificial intelligence researcher Adam Rodman, it’s also been a source of some agita.
On Thursday, Rodman and his colleagues published a compilation of experiments, including one using real-world data from a Boston emergency department, that show a large language model from OpenAI can outperform physicians in case-based diagnostic and clinical reasoning evaluations. To Rodman, the paper’s co-senior author, it’s a response to a gauntlet thrown down in Science in 1959. That paper “described how you would know that a clinical decision support system was capable of doing diagnosis better than humans,” he said. “And they can do it.”
But as generative AI tools like chatbots are heavily marketed to both patients and clinicians, he worries that the experiments, all based on simulated and historical cases, will be misconstrued as proof that AI is safe and effective for treating real patients.