What AI Hallucinations Are and Why They Matter for Lawyers
The phenomenon that threatens to undermine AI-assisted legal research—and how to protect your practice.
AI hallucinations represent one of the most significant risks in legal technology adoption. When an AI system fabricates case citations, invents judicial holdings, or misrepresents statutory language, the consequences for attorneys range from professional embarrassment to sanctions, malpractice claims, and lasting reputational damage.
Understanding the mechanics of hallucinations—and their prevalence in legal AI tools—is no longer optional knowledge. It is a core competency requirement.
Defining the Problem
An AI hallucination occurs when a large language model generates content that appears authoritative but is factually incorrect or entirely fabricated. In legal contexts, hallucinations manifest in two primary forms:
Incorrect responses: The AI describes the law incorrectly or makes factual errors about holdings, procedural history, or statutory language.
Misgrounded responses: The AI describes the law correctly but cites a source that does not actually support its claims—or does not exist at all.
Both failure modes can slip past initial review because the output appears well-reasoned and properly formatted. The citations look legitimate. The legal reasoning seems sound. The danger lies precisely in this surface-level plausibility.
The Stanford Study: Quantifying the Risk
In 2024, researchers at Stanford's RegLab and Institute for Human-Centered AI (HAI) conducted a preregistered empirical evaluation of legal AI research tools—the first rigorous study of its kind. The findings challenged vendor marketing claims head-on.
Key results:
- Lexis+ AI hallucinated on 17% of queries
- Westlaw AI-Assisted Research hallucinated on 33% of queries—nearly double the rate
- Lexis+ AI achieved accurate responses 65% of the time
- Westlaw achieved accurate responses only 42% of the time
These figures stand in stark contrast to vendor marketing. Prior to the study, LexisNexis claimed Lexis+ AI delivered "100% hallucination-free linked legal citations." A Thomson Reuters executive asserted that retrieval-augmented generation "dramatically reduces hallucinations to nearly zero."
The Stanford researchers concluded that "the providers' claims are overstated."
For context, general-purpose LLMs like ChatGPT hallucinate on legal queries between 58% and 82% of the time. So specialized legal AI tools do offer meaningful improvement—but "better than ChatGPT" is a low bar for professional legal research.
Real-World Consequences
The Mata v. Avianca case remains the leading precedent on AI misuse in legal proceedings. In 2023, Steven Schwartz of Levidow, Levidow & Oberman submitted a brief containing six fabricated case citations generated by ChatGPT. When asked to verify the citations, ChatGPT reaffirmed their authenticity.
Judge P. Kevin Castel found the lawyers acted with "subjective bad faith" and imposed sanctions under Rule 11. He described one of the AI-generated legal analyses as "gibberish" and called it "an unprecedented circumstance."
The case count has grown substantially since then. A database maintained by HEC Paris research fellow Damien Charlotin now documents over 300 identified instances of AI hallucinations in court filings worldwide. Notable subsequent cases include:
- February 2024 (Massachusetts): Judge Brian Davis sanctioned attorneys for submitting AI-generated fictitious citations, warning of "the tendency of some attorneys and law firms to utilize AI in the preparation of motions, pleadings, memoranda, and other court papers, then blindly file their resulting work product in court without first checking."
- Utah (Richard Bednar): Attorney sanctioned for submitting a brief with fake ChatGPT citations, ordered to pay attorney fees, refund client fees, and donate $1,000 to a legal nonprofit.
- July 2025 (Johnson v. Dunn): A federal court declared that "monetary sanctions are proving ineffective at deterring false, AI-generated statements of law in legal pleadings"—signaling potential escalation in judicial responses.
Why Hallucinations Occur
Large language models are probabilistic text generators. They predict the most likely next word in a sequence based on patterns in their training data. They do not "know" facts in any meaningful sense—they generate statistically plausible continuations.
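To make the point concrete, the toy Python sketch below is a deliberately oversimplified illustration, not how any production model works internally: it generates text one step at a time from a hand-written probability table. Every step picks a statistically likely continuation; no step consults an authority or checks whether the finished sentence is true.

```python
import random

# Deliberately oversimplified sketch of next-token generation -- not any
# vendor's actual model, but the same basic idea: pick the next piece of
# text from a probability distribution, one step at a time.
NEXT_TOKEN_PROBS = {
    "The court": {"held": 0.6, "found": 0.4},
    "held":      {"that": 1.0},
    "found":     {"that": 1.0},
    "that":      {"the claim is tolled": 0.5, "the claim is time-barred": 0.5},
}

def generate(prompt: str, steps: int = 3) -> str:
    text, last = prompt, prompt
    for _ in range(steps):
        dist = NEXT_TOKEN_PROBS.get(last)
        if dist is None:
            break
        tokens, weights = zip(*dist.items())
        last = random.choices(tokens, weights=weights)[0]
        text += " " + last
    return text

print(generate("The court"))
# e.g. "The court held that the claim is tolled" -- fluent and confident,
# yet nothing in the loop consulted an authority or verified a source.
```

A real model works with vastly larger learned distributions, but the core limitation is the same: fluency and apparent confidence come from the statistics, not from verification.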
This architecture creates several failure modes in legal contexts:
Training data limitations: Models may have incomplete or outdated legal information, leading them to fill gaps with plausible-sounding fabrications.
Pattern matching over reasoning: The model may generate a citation format that looks correct without verifying that the cited case exists or holds what it claims.
Confidence without accuracy: Models cannot distinguish between high-confidence correct outputs and high-confidence incorrect outputs. Both feel equally authoritative.
Retrieval-augmented generation (RAG) limitations: Even systems that retrieve from authoritative legal databases can fail when the retrieval step returns incomplete or incorrect documents, or when the model misinterprets what it retrieves.
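The retrieval limitation can be illustrated with a minimal RAG sketch, assuming a toy two-document corpus, naive keyword scoring standing in for a production search index, and a stub generation step; the case names and holdings are invented for illustration. When the retriever surfaces a superseded authority and misses the case that overruled it, the generation step fluently summarizes the wrong source.

```python
# Minimal retrieval-augmented generation (RAG) sketch. Assumptions: a toy
# two-document corpus, crude keyword scoring in place of vector search,
# and a stub "answer" step; the cases and holdings are invented.
CORPUS = {
    "Smith v. Jones (2019)": "Statute of limitations tolled during minority.",
    "Smith v. Jones (2021)": "Prior tolling holding overruled on appeal.",
}

def retrieve(query: str, k: int = 1) -> list[tuple[str, str]]:
    # Rank documents by keyword overlap with the query.
    words = query.lower().split()
    scored = sorted(
        CORPUS.items(),
        key=lambda item: sum(w in item[1].lower() for w in words),
        reverse=True,
    )
    return scored[:k]

def answer(query: str) -> str:
    doc_id, text = retrieve(query)[0]
    # Stub for the LLM step: the answer can only be as good as the retrieval.
    return f"Per {doc_id}: {text}"

print(answer("Is the statute of limitations tolled during minority?"))
# Returns the 2019 holding and never surfaces the 2021 case that overruled it:
# a grounded-looking answer built on an incomplete retrieval.
```

Grounding output in retrieved documents constrains the model, but the answer is only as reliable as what was retrieved and how the model interprets it.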
The Path Forward
Understanding hallucination risk does not mean abandoning AI tools. It means adopting them with appropriate safeguards:
- Independent verification is mandatory: Every citation, quotation, and factual claim requires checking against primary sources. This is not a best practice—it is a professional obligation.
- Understand tool-specific error rates: Different tools hallucinate at different rates and in different ways. The Stanford study provides baseline data; your own evaluation of tools in your practice areas will be more relevant.
- Recognize that RAG is not a complete solution: Retrieval-augmented systems reduce but do not eliminate hallucinations. The 17% hallucination rate for Lexis+ AI demonstrates that even purpose-built legal research tools with proprietary databases still fail.
- Build verification into workflow: Citation checking cannot be an afterthought. Structured verification protocols must be embedded in the research-to-filing process.
- Maintain documentation: Record when AI tools are used and what verification steps were taken. This documentation may prove essential if questions arise later (a minimal sketch of such a record follows this list).
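The sketch below is a hypothetical illustration of the last two safeguards, not tied to any particular tool or practice-management system: a simple record of the citations an AI tool produced and whether each was verified against a primary source before filing. The names (CitationCheck, AIResearchLog, ready_to_file) and the Smith v. Jones citation are invented for illustration, not a prescribed standard.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical sketch only: field names and the ready_to_file rule are
# illustrative, not a prescribed or vendor-specific standard.
@dataclass
class CitationCheck:
    citation: str                     # as produced by the AI tool
    source_consulted: str = ""        # e.g. official reporter, court docket
    verified: bool = False            # confirmed against the primary source?
    checked_by: str = ""
    checked_on: date | None = None

@dataclass
class AIResearchLog:
    tool: str                         # which AI tool was used
    query: str                        # what it was asked
    checks: list[CitationCheck] = field(default_factory=list)

    def ready_to_file(self) -> bool:
        # Nothing is treated as filing-ready until every AI-supplied
        # citation has been independently verified.
        return bool(self.checks) and all(c.verified for c in self.checks)

log = AIResearchLog(tool="(legal research tool)", query="tolling during minority")
log.checks.append(CitationCheck(citation="Smith v. Jones, 123 F.3d 456"))
print(log.ready_to_file())  # False until the citation is actually verified
```

Whatever form the record takes, the point is the same: verification is tracked per citation, and the file does not go out until every check is complete.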
Key Takeaways
- AI hallucinations in legal research are not edge cases—they occur in 17-33% of queries even with specialized tools
- Vendor claims of "hallucination-free" AI have been empirically disproven
- Courts have imposed sanctions ranging from fines to fee refunds, with increasing judicial frustration at recurring violations
- Every AI-generated citation, quotation, and legal conclusion requires independent verification against primary sources
- The duty of competence now includes understanding AI limitations—this is a continuing obligation as the technology evolves
Sources
Stanford HAI: Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
This preregistered empirical study tested legal AI tools from LexisNexis and Thomson Reuters, finding hallucination rates of 17-33%. The study challenged vendor marketing claims of "hallucination-free" legal AI and demonstrated that even specialized tools with retrieval-augmented generation still produce significant error rates.
Mata v. Avianca, Inc. - Federal Court Opinion
Judge P. Kevin Castel imposed sanctions on attorneys who submitted six fabricated case citations generated by ChatGPT, finding they acted with "subjective bad faith." The case established that AI hallucinations in court filings constitute sanctionable conduct under Rule 11.
ABA Formal Opinion 512: Generative AI Tools
The American Bar Association's July 2024 ethics opinion establishes that lawyers using AI must understand the technology's limitations, verify all outputs, and maintain competence as the technology evolves. Uncritical reliance on AI output without appropriate verification may violate the duty of competence.
Journal of Empirical Legal Studies (2025): Published Study Results
The peer-reviewed publication of the Stanford hallucination study in the Journal of Empirical Legal Studies provides the full methodology and updated findings, confirming that Westlaw's AI-Assisted Research hallucinates at nearly double the rate of Lexis+ AI.

