Hospitals Adopt Transcription Tools Built on an OpenAI Model Prone to Hallucinations

A few months ago, my doctor demonstrated an AI transcription tool used to record and summarize patient visits. While my own summary was accurate, researchers have raised serious concerns about OpenAI’s Whisper, the model behind many of these tools: reports indicate that Whisper sometimes fabricates passages of text that were never spoken, a failure mode known as hallucination.
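
The Whisper behind these products is also available as an open-source model. As a minimal sketch of how a transcription tool might invoke it, assuming the `openai-whisper` Python package and an illustrative file name and model size (neither is from the article):

```python
# Minimal sketch: transcribing a recording with the open-source Whisper model.
# Requires `pip install openai-whisper` plus ffmpeg; the file name and model
# size are illustrative assumptions, not details from the article.
import whisper

model = whisper.load_model("base")            # small general-purpose checkpoint
result = model.transcribe("visit_audio.mp3")  # returns a dict with text + segments

print(result["text"])  # full transcript; this is where hallucinated text can appear
```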

Nabla, a company that uses Whisper in its medical transcription tool, says it has transcribed roughly 7 million medical conversations for more than 30,000 clinicians across 40 health systems. The company is reportedly aware of Whisper’s tendency to “hallucinate” and says it is actively working to address the issue.

A study by researchers from Cornell University, the University of Washington, and other institutions found that Whisper produced hallucinations in about 1% of transcriptions, inserting sentences with inappropriate or nonsensical content, most often during silences in the recordings. The problem is especially acute when transcribing speech from people with aphasia, a language disorder that often makes speech slower and more halting, leaving longer pauses for the model to fill.
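
Because the hallucinations cluster in silent stretches, one obvious mitigation is to strip long silences from the audio before handing it to the model. The sketch below is illustrative only, not a description of any vendor’s pipeline; it uses the `pydub` library, and the silence thresholds are assumed values.

```python
# Illustrative mitigation sketch: drop long silent stretches before transcription,
# since the study found Whisper tends to hallucinate during silences.
# Requires `pip install pydub` and ffmpeg; the thresholds below are assumptions.
from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_file("visit_audio.mp3")

# Keep only non-silent chunks; anything quieter than -40 dBFS for 1 second
# or longer is treated as silence and removed.
chunks = split_on_silence(audio, min_silence_len=1000, silence_thresh=-40)

cleaned = AudioSegment.empty()
for chunk in chunks:
    cleaned += chunk

cleaned.export("visit_audio_trimmed.mp3", format="mp3")
# The trimmed file can then be fed to Whisper as in the earlier sketch.
```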

Allison Koenecke of Cornell University shared examples of these fabrications, which included invented medical terms as well as phrases more typical of online videos, such as “Thank you for watching!” The latter likely reflects training data: OpenAI has reportedly used large amounts of YouTube content to train its models.

The findings were presented at the Association for Computing Machinery’s FAccT conference in Brazil in June, though it is unclear whether the study has undergone peer review.

In response to these concerns, OpenAI spokesperson Taya Christianson said the company is committed to reducing hallucinations and improving its technology. OpenAI’s usage policies prohibit applying Whisper in high-stakes decision-making contexts, and the company has issued guidance against its use in high-risk domains.

The episode highlights the ongoing challenge of ensuring AI reliability, particularly in sensitive areas like healthcare.
