Study: Human notes best AI scribes across the board


April 17, 2026


Key takeaways:

  • Notes taken by humans were rated higher than AI scribe tools in all five cases studied.
  • AI can draft documentation, but it requires thorough editing and should not substitute for clinician-authored notes.

SAN FRANCISCO — Human-generated notes were rated higher in quality and preferable to those from AI scribes in a new study presented today at the annual American College of Physicians meeting.

Ashok Reddy, MD, MS, an associate professor of medicine at the University of Washington School of Medicine and a health services researcher at U.S. Department of Veterans Affairs Puget Sound, presented the findings in ACP’s plenary session.


“We all know the problem. Every one of us has probably stayed late at the clinic — or, more likely, headed home to finish our notes at night. Ambient AI scribes promise to help,” Reddy said. “These tools have rolled out fast outside of the VA and show promise in reducing documentation time. However, almost every study done has focused on a single vendor or adopting institution, with little focus on the quality of the note.”

So, Reddy and colleagues conducted a cross-sectional evaluation of notes generated from five invented clinical cases likely to appear in primary care, comparing notes from 18 human notetakers with notes from 11 AI scribe tools. Thirty human raters evaluated the notes with the modified Physician Documentation Quality Instrument (PDQI-9), which rates 10 areas of note quality on a 5-point Likert scale, scoring each note out of a maximum of 50 points.
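The study's own analysis code is not public, but the scoring arithmetic described above is straightforward. As an illustration only (not the researchers' implementation, and with generic domain handling since the modified instrument's exact domain labels are not given here), a rater's modified PDQI-9 total can be computed like this:

```python
# Illustrative sketch (not the study's code): the modified PDQI-9 described
# above rates 10 note-quality domains on a 1-5 Likert scale, so each
# note's total score from one rater ranges from 10 to 50.

def pdqi9_total(domain_ratings: list[int], n_domains: int = 10) -> int:
    """Sum one rater's Likert scores across all domains (max 5 * n_domains)."""
    if len(domain_ratings) != n_domains:
        raise ValueError(f"expected {n_domains} domain ratings")
    if not all(1 <= r <= 5 for r in domain_ratings):
        raise ValueError("each rating must be on the 1-5 Likert scale")
    return sum(domain_ratings)

# A note rated 5 in every domain earns the 50-point maximum.
print(pdqi9_total([5] * 10))  # 50
print(pdqi9_total([4, 4, 5, 3, 4, 5, 4, 4, 5, 4]))  # 42
```

Per-note scores like these, averaged across the 30 raters, are what the overall human-versus-AI comparisons in the study rest on.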

The researchers found that, in all five cases, notes from humans received higher overall modified PDQI-9 scores than notes that were AI-generated; three of these differences were statistically significant. The biggest gap appeared in the acute low-back pain case, where human-written notes scored 43.8 (95% CI, 37.4-50.3) and AI-generated notes scored 20.3 (95% CI, 15.4-25.2), a difference of 23.5 points (95% CI, 17.9-29.2).

The pooled domain analysis also revealed that AI scored lower across all 10 measures, with the largest deficits in usefulness (1.03 [95% CI, 0.44-1.61]), organization (1.06 [95% CI, 0.47-1.65]) and thoroughness (1.23 [95% CI, 0.65-1.82]).

“As we continue to deploy these tools in clinical practice, we should be doing more quality evaluations in different clinical environments to better understand the tool’s benefits and its limitations,” Reddy said.

The researchers wrote that, even though AI scribes “hold promise for reducing clinical burden, independent, vendor-neutral evaluations of note quality are essential before large-scale clinical deployment.”

“AI scribes should be regarded as tools for generating draft documentation that requires review and editing, rather than as a substitute for clinician-authored notes,” Reddy and colleagues concluded. “Rigorous and ongoing evaluation of [AI note] quality is essential to ensure that these tools enhance rather than compromise the quality of clinical care.”

Reddy also commented on the concern of clinician de-skilling, drawing a parallel to the way many newer physicians now struggle to communicate without technology during a crisis.

“We have to think of [AI scribes] as a tool that we use and really identify its limitations when we use it. Without reviewing the note, I do think there will be a degradation in quality of the assessments and plans,” Reddy said. “We really need to think of it as a tool where we can identify how do we use this to critically review the assessment and plan, synthesize the information, and really understand its limitations in terms of biases or errors or hallucinations.”

In a related editorial, Aaron A. Tierney, PhD, a collaborative scientist at the Kaiser Permanente Division of Research, and Kristine Lee, MD, associate executive director of virtual medicine, technology and innovation at The Permanente Medical Group and an internal medicine physician in the San Francisco area, wrote that future analyses of clinical note quality should include patients, especially considering “that as many as two-thirds of patients reference clinical documentation in the form of after-visit summaries to facilitate their self-care and remember next steps to manage their health after their clinical visit.”

“Regardless of the future of documentation, it is imperative that we ensure that AI-assisted notes are digestible for patients, are written at the appropriate reading level, and contain the information essential to patients,” Tierney and Lee wrote. “If these documents do not become more patient-centered, we could face a future in which clinicians and patients each have their own AI scribes, which would likely impair trust building in clinicians, the health care system and health AI tools.”


