The FDA’s new AI assistant, “Elsa,” which was launched to improve the speed of drug approvals, is reportedly creating fake studies, raising concerns about the reliability of AI in critical roles. According to a CNN report from July 23, this tool has been found to be unreliable for important tasks, leading to increased workload for human reviewers.
The situation at the FDA’s Maryland headquarters contrasts sharply with the administration’s enthusiastic public promotion of the tool. It underscores the dangers of deploying untested technology in critical government functions, a concern raised with growing frequency across the tech sector.

The AI was introduced with considerable fanfare; HHS Secretary Robert F. Kennedy Jr. proclaimed, “The AI revolution has arrived.” However, FDA employees have told CNN that Elsa often misrepresents research and demands constant oversight, which undermines its intended purpose.
FDA Commissioner Dr. Marty Makary dismissed the internal worries, stating, “I have not heard those specific concerns, but it’s optional. They don’t have to use Elsa if they don’t find it to have value.” This response, however, overlooks the crucial issue of reliability for a tool meant to enhance efficiency and expedite vital reviews.
Inside the agency, the reality appears more troubling. One employee indicated that the AI is not trustworthy for any task that requires verification, noting that it “hallucinates confidently.” Another worker expressed frustration over the additional workload, saying, “I waste a lot of extra time just due to the heightened vigilance that I have to have.”
The root of this extra work lies in Elsa’s inherent limitations. Staff members pointed out that it cannot access many important documents, such as confidential industry submissions, rendering it ineffective for the essential scientific work of evaluating drug safety and efficacy. When asked simple questions, it often provided incorrect answers.
Jeremy Walsh, the agency’s head of AI, acknowledged the technical challenges, admitting, “Elsa is no different from lots of large language models and generative AI. They could potentially hallucinate.” This issue, where AI produces confident yet entirely false information, is a significant flaw affecting many current models.
This problem is not exclusive to the FDA’s tool. Even leading commercial models face similar issues. OpenAI, for example, noted in its safety data that its newer models, o3 and o4-mini, paradoxically generate false information more frequently on some tests than earlier versions did.
Researchers believe this occurs because models are rewarded for delivering correct final answers, prompting them to create plausible-sounding but fabricated steps to reach those conclusions. This issue is worsened when a model lacks access to its prior reasoning, leading it to invent elaborate justifications when questioned about its logic.
Experts have begun to caution against the premature deployment of this technology. Dr. Jonathan Chen, a professor at Stanford University, bluntly described the situation, stating, “It’s really kind of the Wild West right now. The technology moves so fast, it’s hard to even comprehend exactly what it is.”
His evaluation highlights the growing divide between the optimistic promises of AI in government and the dangerous reality of using untested, unreliable systems for decisions that affect public health.
The FDA’s problems are part of a larger trend of high-stakes failures in the industry. In May 2025, the legal team of AI company Anthropic had to apologize after its Claude AI fabricated a legal citation in a copyright case. The presiding judge remarked on the significant difference between a missed citation and an AI-generated hallucination.
A month earlier, users of the AI code editor Cursor encountered a support bot that invented a fake company policy, prompting a public apology from the co-founder due to user backlash.
This issue also affects consumer products. A bug in Gmail’s AI has led to incorrect translations of German emails, resulting in significant content alterations.
T-Online’s Editor-in-Chief, Florian Harms, stated, “For the journalistic reputation and credibility of serious media, such text manipulations are devastating,” emphasizing the harm to professional credibility.
These recurring failures have prompted some experts to warn against excessive reliance on AI technology. An analyst from Sauce Labs remarked after the Cursor incident that simply informing users that “this response was generated by AI” is unlikely to restore user trust.
This perspective indicates that the industry is gradually realizing that basic disclaimers and transparency measures cannot resolve the deeper issues of trust and reliability.
The series of high-profile errors is driving a broader reassessment within the industry. A study on AI in call centers found that it often created more issues for human agents than it resolved.
Analyst firm Gartner has even revised its forecasts, now predicting that half of the organizations planning to cut customer service staff in favor of AI will abandon those plans.
The prevailing view is shifting toward a hybrid model, in which AI complements rather than replaces human expertise. This more pragmatic approach acknowledges the technology’s current limitations and recognizes that the cost of AI mistakes can outweigh the gains from automation.
The Elsa incident coincided with the White House’s announcement of its “AI Action Plan,” which aims to promote deregulation to accelerate AI development. This push for rapid, unchecked innovation conflicts with the evidence of the technology’s unreliability.
The plan emphasizes reducing “bureaucratic red tape” and reversing previous orders on AI risk management, which could hasten the deployment of tools like Elsa, despite clear signs of their flaws.
Experts caution that rushing to deploy AI without adequate oversight poses significant risks. Dr. Chen, who studies AI in clinical settings, has likened the current environment to a “Wild West” in which technological advances have far outpaced the safety protocols and regulatory frameworks needed to govern them.
The FDA has issued a statement acknowledging the difficulties associated with using generative AI. However, the situation with Elsa serves as a powerful reminder of the dangers linked to the hasty deployment of AI in government.