

A Thousand Lies: How the Legal Profession Proved the Case for AI Competence

March 4, 2026 | White paper

Nearly 1,000 documented cases of AI hallucinations in courts worldwide — told through three expert voices. A white paper examining why courts are failing, why sanctions don't work, and why Europe's proactive approach to AI literacy under Article 4 is the only credible answer.


A white paper by Alex Blumentals, Liga, and Edgars — Twin Ladder


Executive Summary

Alex: I have watched organisations adopt transformative technologies for thirty years. I have never seen anything like this.

What you are about to read is not a technology report. It is the story of the most comprehensively documented competence failure in any profession, anywhere in the world.

Between late 2022 and early 2026, nearly 1,000 instances have been catalogued in which lawyers, barristers, solicitors, and other legal professionals submitted court filings containing fabricated citations generated by artificial intelligence. The incidents span six countries, four continents, and every level of the legal profession — from solo practitioners to partners at the 42nd largest law firm in the United States.

These are not stories about bad lawyers. They are stories about a profession that adopted a technology without building the competence to use it. And they are, inadvertently, the most powerful evidence base in existence for why the European Union's Article 4 of the AI Act — requiring AI literacy before deployment — was not only right, but urgently necessary.

Liga: The legal implications are precise and severe. Every one of these cases involves violations of professional conduct rules that have existed for decades. What changed is not the rules. What changed is that a technology arrived that made it trivially easy to break them without realising you were doing so.

Edgars: And the technology itself was never designed to be a research tool. That is the part that still frustrates me. These systems generate plausible text. Plausibility and truth are different engineering problems. Every case in this paper flows from that single confusion.

This white paper tells the story in three voices. Liga examines the human cost and the regulatory landscape as a corporate lawyer who understands what compliance actually requires. Edgars explains why the machines produce convincing falsehoods, as an engineer who builds verification systems. And I try to make sense of what this means for organisations, for the profession, and for the proactive European approach to AI governance that the American experience has now vindicated.

— Alex Blumentals, March 2026


I. The Human Cost

Liga: Let me tell you about Steven Schwartz. Not as a case study. As a person.

Steven Schwartz had been practising law for over three decades when a case called Mata v. Avianca crossed his desk in 2022. Roberto Mata had been struck by a metal serving cart on an Avianca flight and had filed a personal injury claim. Schwartz needed to oppose a motion to dismiss, and he turned to a tool that millions of professionals were discovering at the same time: ChatGPT.

ChatGPT gave him case citations. They looked right. They had proper case names, docket numbers, the names of real judges. When Avianca's lawyers told the court they could not locate several of the cited cases, Schwartz did something that reveals everything about the competence gap: he went back to ChatGPT and asked whether the cases were real. ChatGPT assured him they "indeed exist" and "can be found in reputable legal databases such as LexisNexis and Westlaw."

He relayed this to the court.

Every cited case was fictitious. Judge P. Kevin Castel described the legal analysis as "gibberish." The $5,000 sanction that followed was accompanied by an order requiring Schwartz and his colleague Peter LoDuca to write individual letters of apology to each judge falsely identified as the author of a fabricated opinion.

But here is the sentence that matters most. In his testimony, Schwartz said he was "operating under the false perception that ChatGPT could not possibly be fabricating cases on its own."

A thirty-year lawyer. Not incompetent. Not lazy. Simply unaware that the tool he was using generates text probabilistically rather than retrieving information from a database. He did not know the most basic fact about the technology he was relying on for professional work.

Edgars: That sentence — "could not possibly be fabricating" — is the most expensive misunderstanding in AI history. And it keeps happening. Not because people are foolish. Because the interface looks exactly like a search engine, and nobody told them it is not one.


Now consider Amir Mostafavi. In September 2025, the California Court of Appeal published its opinion in Noland v. Land of the Free explicitly as "a warning" to the profession. Mostafavi had submitted appellate briefs in an employment dispute where nearly all legal quotations were fabricated. Twenty-one of twenty-three citations contained quotations or topics that did not appear in the cited cases. Several cases did not exist at all.

When confronted, Mostafavi said he "had not been aware that generative AI frequently fabricates or hallucinates legal sources." The court fined him $10,000 and referred him to the California State Bar, then established what should have been obvious from the start: "No brief, pleading, motion, or any other paper filed in any court should contain any citations — whether provided by generative AI or any other source — that the attorney responsible for submitting the pleading has not personally read and verified."

This was the first published appellate opinion on AI-hallucinated citations in California. It should not have been necessary.


Liga: If there is one case that keeps me awake at night, it is Goldberg Segalla.

In December 2025, Cook County Judge Thomas Cushing imposed $59,500 in sanctions — the largest single AI-hallucination penalty to date — against law firm Goldberg Segalla and attorney Larry Mason. The case involved lead paint poisoning claims against the Chicago Housing Authority that had already resulted in over $32 million in judgments.

Attorney Danielle Malaty had used ChatGPT to help research a brief, which cited a fabricated Illinois Supreme Court case called "Mack v. Anderson." Investigation uncovered at least fourteen additional instances where Goldberg Segalla attorneys had fabricated quotes from decisions or misrepresented case outcomes. Mason signed the brief without checking the citations. Malaty was fired.

Here is why this case matters more than the sanction amount suggests. Danielle Malaty had previously written publicly about AI ethics in legal practice. She knew, at an intellectual level, that AI could hallucinate. She did not have a workflow that turned that knowledge into practice.

Alex: This is the distinction I keep returning to. Awareness is not competence. Knowing that AI can hallucinate, conceptually, while continuing to use it without verification — that is the gap. It is the gap between reading about nutrition and actually eating well. Between knowing you should check your blind spot and checking it every time. Competence is not knowledge. It is practice that has become habit.


And then there is Mr Dayal in Australia. In a family law matter, the solicitor tendered a list and summary of authorities that had been generated by AI, containing inaccurate citations and summaries. His explanation to the Victorian Legal Services Board was nine words: "Did not fully understand how the research tool worked."

Nine words that could serve as the epitaph for a thousand cases.

The Board stripped his right to practise as a principal lawyer, prohibited him from handling trust money, and imposed two years of supervised practice with quarterly reporting. It was the first regulatory sanction against an Australian lawyer for AI misuse.

These are not stories about technology failing. Technology performed exactly as designed. These are stories about humans who did not understand what they were using, and a profession that did not ensure they would.


II. Why Machines Lie

Edgars: I need to explain something, and I want to do it without jargon, because this single concept — once you understand it — makes every case in this paper completely predictable.

A large language model does not know things. It does not look up information. It does not have a database of court decisions that it searches through when you ask a legal question.

What it does is predict the next word.

Given a sequence of text, the model calculates, based on statistical patterns learned from billions of documents, which word is most likely to come next. Then it generates that word, adds it to the sequence, and predicts the next one. And the next. And the next. Millions of times per response.
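To make that loop concrete, here is a deliberately toy sketch in Python. Everything in it is invented for illustration (the vocabulary, the scoring function, the sampling); no real system is this small. But the shape of the loop is faithful: score the candidate next words, pick one, append it, repeat. Notice that nothing in it ever consults a database of facts.

import math
import random

# Toy stand-in for a language model: it scores how plausible each candidate
# word is after the current context. A real model scores on the order of
# 100,000 tokens with a neural network, but the generation loop is the same.
VOCAB = ["the", "court", "held", "in", "Smith", "v.", "Jones", "that", "damages", "apply"]

def toy_scores(context):
    # Deterministic pseudo-random scores per context; no fact lookup anywhere.
    rng = random.Random(" ".join(context))
    return [rng.uniform(0.0, 1.0) for _ in VOCAB]

def softmax(scores):
    # Turn raw scores into probabilities that sum to one.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt, n_words=8, seed=0):
    rng = random.Random(seed)
    context = prompt.split()
    for _ in range(n_words):
        probs = softmax(toy_scores(context))
        # Sample the next word in proportion to its probability, append,
        # and go again. Fluency comes from the learned scores; truth never
        # enters the loop at any point.
        context.append(rng.choices(VOCAB, weights=probs, k=1)[0])
    return " ".join(context)

print(generate("cite a case holding that"))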

This is why the output reads so well. The model has learned the patterns of legal writing — case name formatting, citation conventions, the rhythm of judicial reasoning. When you ask it for a case supporting a particular legal proposition, it generates text that looks exactly like a case citation because it has seen hundreds of thousands of case citations during training. It produces the pattern of a citation without any mechanism to ensure the citation corresponds to something real.

Liga: So when Steven Schwartz asked ChatGPT to confirm whether the cases existed, and it said yes —

Edgars: It was predicting that "yes, these cases exist" was the most probable response to that question, given the context. The model is not checking a database. It is generating the most statistically likely continuation of the conversation. And in a conversation where someone asks "are these cases real?", the statistically likely answer in the training data is some variation of "yes."

This is why confidence and correctness are unrelated in large language models. The model does not become less fluent when it is wrong. It does not hedge or qualify fabricated content. A hallucinated case citation reads identically to a real one because the same statistical process generated both.

The Stanford data makes this concrete.

In January 2024, Stanford's RegLab found that general-purpose large language models hallucinate on 69% to 88% of legal research queries. Not occasionally. Not in edge cases. In the overwhelming majority of responses. A lawyer using ChatGPT, Claude, or Gemini for legal research is working with a tool that fabricates content more often than it produces accurate content.

"But what about the specialised tools?" is the question I get most often. Fair question. In May 2024, Stanford's follow-up study tested the commercial legal AI platforms. LexisNexis Lexis+ AI hallucinated on 17% of queries. Thomson Reuters' Westlaw AI-Assisted Research reached 33%.

Liga: One in six queries from LexisNexis. One in three from Westlaw. These are the platforms that legal professionals trust as authoritative.

Edgars: And both vendors had made remarkable marketing claims beforehand. LexisNexis claimed "100% hallucination-free linked legal citations." A Thomson Reuters executive said retrieval-augmented generation "dramatically reduces hallucinations to nearly zero." Stanford's researchers concluded diplomatically that "the providers' claims are overstated."

Retrieval-Augmented Generation: better, not solved

The specialised legal platforms use a technique called Retrieval-Augmented Generation — RAG. Instead of relying solely on the model's training data, a RAG system first searches an actual database of real documents, retrieves relevant passages, and then feeds those passages to the language model as context for generating its response.

This is genuinely better. It grounds the model's output in real sources. But it introduces new failure modes. The retrieval step can return the wrong documents. The model can misinterpret or misquote retrieved passages. It can blend information from multiple retrieved sources in misleading ways. And it can still hallucinate details that appear nowhere in the retrieved context.
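The pipeline shape is simple enough to show in a few lines. In this sketch the corpus, the keyword-overlap retrieval, and the ask_llm placeholder are all our own simplifications rather than any vendor's implementation; production systems use vector search over millions of documents. The structure is what matters: retrieve first, then generate from what was retrieved.

# All names below (CORPUS, retrieve, ask_llm) are illustrative inventions,
# not a real product's API.
CORPUS = [
    ("Smith v. Jones (1999)", "held that carriers owe passengers a duty of care"),
    ("Doe v. Roe (2005)", "addressed the limitation period for injury claims"),
]

def retrieve(query, corpus, k=1):
    # Naive retrieval: rank documents by how many words they share with the query.
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: -len(q & set(doc[1].lower().split())))
    return ranked[:k]

def ask_llm(prompt):
    # Hypothetical stand-in for any language-model API call.
    return f"[model output conditioned on {len(prompt)} characters of context]"

def rag_answer(question):
    passages = retrieve(question, CORPUS)
    context = "\n".join(f"{title}: {text}" for title, text in passages)
    # Grounding helps, but the model can still misread, blend, or embellish
    # the retrieved text, which is where the residual errors come from.
    return ask_llm(f"Answer using only these sources:\n{context}\n\nQuestion: {question}")

print(rag_answer("What duty do carriers owe passengers?"))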

RAG reduces hallucination rates from catastrophic to merely dangerous. A February 2025 benchmarking study found Harvey Assistant achieving 94.8% accuracy on document Q&A — impressive, but that remaining 5.2% matters enormously when the output is going into a court filing.
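To see why the residual rate matters, a back-of-envelope calculation (our own arithmetic, not a figure from the study): if each query is independently correct 94.8% of the time, the probability that a brief built on twenty such queries contains at least one error is 1 - (0.948)^20, roughly 66 percent. Per-query accuracy that sounds excellent compounds into filing-level risk that is anything but.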

Alex: So what I hear you saying is: the best tools available to the profession are wrong between one in six and one in twenty times. And the tools most people actually use are wrong more often than they are right.

Edgars: That is precisely what I am saying. And the tools do not tell you which responses are the wrong ones.


III. The Acceleration

Liga: Let me walk you through the numbers, because the trajectory is what makes this a crisis rather than a collection of anecdotes.

Metric | Value | Source
Documented hallucination cases globally | ~979 | Charlotin Database, early 2026
Cases in California alone | 52+ | Charlotin Database
Rate of new incidents (late 2025) | 2-3 per day | Jones Walker analysis
Largest single sanction | $59,500 | Goldberg Segalla / CHA (Dec. 2025)
Countries with documented cases | 6+ | US, UK, Canada, Australia, France, Israel
State bars issuing AI guidance | 30+ | Paxton AI analysis

In 2023, the problem seemed like a curiosity. Mata v. Avianca made headlines because it was novel — a lawyer sanctioned for AI-generated fake citations. Zachariah Crabill became the first attorney suspended in the United States for AI hallucinations when Colorado imposed a ninety-day suspension after he used ChatGPT to draft a motion with fictitious citations and then blamed a legal intern. These were treated as cautionary tales. Isolated incidents. They were not.

In 2024, the database grew to dozens. Stanford published its landmark studies quantifying the hallucination rates. In British Columbia, a lawyer in Zhang v. Chen was ordered to pay costs after submitting a brief citing two fabricated ChatGPT-generated cases. The geographic spread began.

By 2025, the curve went exponential. Hundreds of documented cases. Sanctions became routine. The rate reached two to three new incidents per day by autumn. Morgan & Morgan — a firm with thousands of employees — was sanctioned $5,000 after attorney Rudwin Ayala used the firm's own proprietary AI platform, MX2.law, to generate eight nonexistent cases for a products liability filing in Wyoming. The MyPillow defamation case produced another $6,000 in sanctions when attorneys Christopher Kachouroff and Jennifer DeMaster filed a motion with more than two dozen errors, including hallucinated cases.

Alex: Here is what the acceleration tells us. This is not a problem that self-corrects. The profession has had three years of warnings, sanctions, published opinions, bar guidance, and front-page coverage. The rate is still increasing. If naming and shaming were going to solve this, it would have worked by now.

In June 2025, the High Court of England and Wales entered the picture. In Ayinde v. London Borough of Haringey, a barrister and solicitor submitted five fabricated cases in a judicial review — including one purportedly from the Court of Appeal. The barrister denied using AI. The court found her explanation "incoherent" and concluded there were only two possibilities: deliberate fabrication, or AI use with "untruthful evidence" about the source. Both counsel were referred to their regulators.

In Canada, the pattern replicated across four provinces. Ontario, Quebec, British Columbia, and Alberta all saw cases — including a self-represented litigant in Quebec fined $5,000 and an Ontario lawyer who admitted being "not comfortable with technology such as generative AI" and was "shocked" when cited cases could not be found.

January 2026 opened with the Seventh Circuit warning that AI large language models generate "output that is fictional, inaccurate or nonsensical" — extending the verification duty even to pro se litigants. In Pennsylvania, at least thirteen Commonwealth Court cases contained confirmed or suspected AI hallucinations in 2025, with a judge questioning during oral argument whether an attorney's brief contained fabricated quotations. This is particularly alarming because appellate courts set binding precedent — fabricated authorities at this level could corrupt the law itself.

Liga: Nearly a thousand cases. Six countries. Every level of the profession. And the curve is still climbing.


IV. Two Approaches to a Crisis

Alex: There are, broadly, two responses to this epidemic playing out simultaneously on opposite sides of the Atlantic. One is reactive. The other is proactive. And the evidence from nearly a thousand cases tells us which one works.

The American Approach: Sanction After the Fact

The United States has responded to AI hallucinations through its traditional mechanisms: judicial sanctions, ethics opinions, and a patchwork of state-level guidance.

In July 2024, the American Bar Association issued Formal Opinion 512 — its first formal ethics opinion on generative AI. The opinion is thorough. It addresses competence under Model Rule 1.1, communication under Rule 1.4, confidentiality under Rule 1.6, and fees under Rule 1.5. It tells lawyers they must understand the "capacity and limitations" of generative AI and periodically update that understanding.

What it does not do is tell them how. It creates an obligation without a pathway.

By December 2025, the ABA Task Force Year Two Report acknowledged that AI "is no longer an abstract or experimental technology for lawyers — it is rapidly becoming core infrastructure for law practice, courts, legal education and access-to-justice efforts." This is true. It is also an acknowledgment that the guidance came years too late.

The state-level response is a patchwork. New York requires two annual CLE credits in practical AI competency. Pennsylvania mandates AI disclosure in court submissions. California's published opinion in Noland establishes a verification duty. Over thirty states have issued various forms of AI guidance. None of it is coordinated. None of it imposes a consistent standard. And none of it prevented the acceleration from continuing after each new guidance was issued.

Liga: The fundamental problem with the American model is that it punishes after the harm has occurred. A lawyer submits fabricated citations. Opposing counsel or the court catches the error. A sanction follows — weeks or months later. The client has already been harmed. The court's time has been wasted. The justice system's integrity has been compromised. The sanction is a post-mortem, not a prevention.

Edgars: And the sanctions themselves are inconsistent. Schwartz paid $5,000. Goldberg Segalla paid $59,500. Crabill was suspended. Mostafavi was fined and referred. There is no consistent relationship between the severity of the conduct and the consequence imposed, which means there is no clear deterrent signal.

The American approach assumes that rational professionals will modify their behaviour when they see others sanctioned. Three years of evidence disproves this assumption. The rate of new incidents accelerated throughout 2025 despite continuous high-profile sanctions. The profession is not failing to respond to incentives. It is failing because individual sanctions cannot solve a systemic competence gap.

The European Approach: Require Competence Before Deployment

Article 4 of the EU AI Act, effective February 2, 2025, takes a fundamentally different approach. It requires that providers and deployers of AI systems ensure their staff have "an adequate degree of AI literacy" — before the systems are used, not after something goes wrong.

Liga: The difference is structural. Article 4 does not wait for harm. It creates an affirmative obligation: if you deploy AI, you must ensure the people using it understand what it is, what it can do, and what it cannot do. The EU AI Office's guidance makes clear that this literacy must account for the person's technical knowledge, experience, education, and the context in which the AI will be used.

This is not bureaucratic overreach. It is a direct response to precisely the pattern documented in this white paper. Every single case — from Mata v. Avianca through the Pennsylvania appellate filings — involved a person who lacked adequate understanding of the AI system they were using in their professional context.

Article 4 would not have prevented every case. But it would have required every law firm, every legal department, every court, and every AI vendor serving legal professionals to ask: have we ensured our people understand this technology well enough to use it safely? The honest answer, for most of the profession, would have been no. And that answer, under Article 4, triggers an obligation to act — before the filing, before the sanction, before the harm.

Alex: I want to be precise about what I am arguing here. I am not arguing that European regulation is perfect, or that Article 4's language could not be clearer, or that enforcement will be seamless. I am arguing something simpler: the proactive approach is structurally superior to the reactive approach. The evidence is overwhelming. Three years of American sanctions have not slowed the rate of AI hallucination incidents. The curve is still climbing. If you want to prevent competence failures, you need to require competence. Punishing incompetence after the fact is not working.

Italy has gone further. Law 132/2025 made it the first EU member state to codify mandatory AI disclosure requirements for lawyers, with civil liability and fee dispute consequences for non-compliance. The UK occupies a middle ground — the Bar Council warned in January 2024 that "blind reliance on AI risks incompetence or gross negligence," and the Law Society's May 2025 guidance effectively codifies AI literacy as a baseline professional competence. But neither has the binding, systemic character of Article 4.


V. The Competence Argument

Alex: Let me step back from the cases and the regulations for a moment, because there is a deeper pattern here that I think explains why the problem has been so resistant to the usual fixes.

Every documented AI hallucination case in the legal profession is, at its core, a Level 0 failure on the Twin Ladder framework. The lawyer did not understand what a large language model is. Not at a technical level — nobody needs to understand transformer architectures. At a functional level. They did not understand that the tool generates text rather than retrieving information. That confidence and correctness are unrelated in the output. That verification against primary sources is not optional but essential.

This is not a hard thing to learn. It takes perhaps two hours of structured training. But it has to happen before the tool is used, and it has to be specific enough that the practitioner changes their behaviour.

The Morgan & Morgan case illustrates why policy alone fails. The firm had an AI policy. It had trained employees not to use AI in the manner that produced the sanctioned filing. Attorney Rudwin Ayala used the firm's own proprietary platform anyway, without verifying the output, and three attorneys signed the brief. Judge Kelly H. Rankin declined to sanction the firm itself, noting the policy existed. But the policy did not prevent the harm. Competence would have.

Liga: The professional conduct rules were already clear. Rule 1.1 requires competence. Rule 3.3 prohibits misleading the court. Rule 8.4 prohibits conduct prejudicial to the administration of justice. These rules have existed for decades. Not one of them prevented any of the nearly one thousand cases in this paper. Rules tell you what to do. They do not teach you how.

There is a concept I keep returning to: competence debt. AI automates the entry-level tasks — document review, legal research, first-draft analysis — where junior lawyers have traditionally built the pattern recognition and judgment that allows them to catch errors. Simultaneously, it creates outputs that look authoritative and require exactly that pattern recognition to evaluate.

The profession is building a generation of practitioners who can direct AI tools but who may lack the foundational expertise to evaluate what those tools produce. The hallucination epidemic is not the disease. It is the first symptom. The disease is a growing gap between the competence the profession needs and the competence its current training pathways produce.

The Twin Ladder framework maps this directly:

Level 0 — AI Literacy is where Article 4 draws the mandatory floor. Understanding what AI is, how it generates outputs, and why verification is essential. Every sanctioned lawyer in this paper failed at Level 0.

Level 1 — Professional Twin means being able to use AI within your professional domain with appropriate verification, critical evaluation, and disclosure. This is where the profession needs to reach — and where almost no current training programme takes it.

Level 2 — Operational Twin means systematically incorporating AI into workflows with governance, quality controls, and continuous improvement. A minority of firms, roughly the 10% with formal AI governance boards, are working toward this.

Level 3 — Ecosystem Twin means building organisational capability that extends beyond individual tools and adapts to the evolving technology landscape. This remains aspirational.

Alex: The solution is not more rules. It is not harsher sanctions. It is not better AI tools, though those help. The solution is building genuine competence — the kind that becomes habit, that changes how people work, that makes them instinctively verify before they trust. Article 4 is the legal framework for requiring this. The Twin Ladder is a framework for delivering it.


VI. What Good Looks Like

Edgars: I want to be practical here. The cases are alarming. The data is clear. But people need to know what to do. So let me describe what verification actually looks like when it is done properly.

Step 1: Understand the tool's capabilities and limitations. Before using any AI system for legal research, know its documented accuracy rates. If you are using a general-purpose LLM, expect the majority of legal citations to be fabricated. If you are using a specialised RAG-based tool, expect accuracy between 67% and 95% depending on the tool and task.

Step 2: Treat every AI output as a draft, not a source. AI generates hypotheses about what the law might say. Your job is to confirm or reject those hypotheses against primary sources. Every case name, every citation, every quotation, every holding must be independently verified.

Step 3: Verify against primary sources. Go to the actual reporter, the actual database, the actual statute. Read the case. Confirm the holding matches what the AI told you. Check the quotations word by word. If you cannot find the case, it probably does not exist.

Step 4: Document your verification. Keep a record of what you checked and when. This protects you if questions arise later and creates an institutional practice that others can follow.
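To show how steps 2 through 4 can become a habit rather than an aspiration, here is a minimal sketch in Python. Every name in it is hypothetical (the citation pattern, the VERIFIED set, the log format), and it deliberately cannot establish that a citation is real. It can only refuse to mark anything verified until a human has read the primary source and said so.

import re
from datetime import date

# Sketch of steps 2 to 4 only, under our own assumptions: citations are
# matched with a simple "Name v. Name" pattern, and VERIFIED is a set a
# lawyer adds to only after personally reading the case in a primary source.
CITATION_RE = re.compile(r"[A-Z][A-Za-z.']+ v\. [A-Z][A-Za-z.']+")

VERIFIED = {"Mata v. Avianca"}  # grows one case at a time, after reading

def audit_draft(text):
    # Flag every citation-shaped string that no human has yet verified,
    # and keep a dated record of the check (step 4).
    log = []
    for cite in sorted(set(CITATION_RE.findall(text))):
        status = "verified" if cite in VERIFIED else "NOT VERIFIED: read before filing"
        log.append(f"{date.today()} | {cite} | {status}")
    return log

draft = "As in Mata v. Avianca and Mack v. Anderson, the duty is settled."
for entry in audit_draft(draft):
    print(entry)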

Liga: From a compliance perspective, firms need three things. First, a written AI use policy that classifies tasks by risk level — some firms use a traffic-light system where high-risk uses like court filings require dual-lawyer verification. Second, training that goes beyond awareness to build actual verification habits. Third, disclosure protocols for when and how to inform courts and clients about AI use — Italy already requires this by statute, and it is only a matter of time before other jurisdictions follow.
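As one hypothetical shape for that risk classification (the task names and requirements below are invented for illustration, not drawn from any firm's actual scheme), the traffic-light idea reduces to a small lookup that defaults to the highest risk tier whenever a task is unfamiliar:

# Hypothetical traffic-light policy table; every entry is illustrative.
RISK_POLICY = {
    "court_filing": ("red", "dual-lawyer verification of every citation"),
    "client_advice": ("amber", "single-lawyer verification against primary sources"),
    "internal_brainstorm": ("green", "no filing-grade verification required"),
}

def verification_requirement(task):
    # Unknown tasks fall through to the most demanding tier by design.
    tier, requirement = RISK_POLICY.get(task, ("red", "unclassified task: treat as highest risk"))
    return f"{task}: {tier} tier, {requirement}"

print(verification_requirement("court_filing"))
print(verification_requirement("novel_use_case"))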

The UK's Law Society guidance effectively codifies AI literacy as a professional baseline. The SRA has not issued AI-specific rules, but the existing Codes of Conduct — requiring accuracy, prohibiting misleading the court, maintaining professional standards — apply fully to AI use. The Ayinde referrals demonstrate enforcement intent.

Alex: At the organisational level, good looks like treating AI competence the way we treat other critical professional capabilities: not as a one-time checkbox, but as an ongoing programme. The leading firms are building infrastructure. Some 80% of the Am Law 100 have established AI governance boards, and firms like White & Case have invested in privately licensed, legally trained language models rather than general-purpose tools. But governance boards exist in only about 10% of all law firms. The gap between the top of the market and the rest is vast.

The technology will improve. A February 2025 benchmarking study found some tools achieving accuracy above 94%. But even at 95% accuracy, one in twenty research queries produces fabricated content. Vendor improvements are necessary and welcome. They are not sufficient. Human verification remains the essential safeguard, and human verification requires human competence.


VII. The Case for Action

Alex: We began this paper with a number: nearly a thousand. Let me close by saying what that number means.

It means we have the most extensively documented evidence base for competence failure in any profession, in any sector, anywhere in the world. Not theoretical risk. Not projected impact. Documented cases, with names and docket numbers and sanctions and careers ended and clients harmed.

It means that the American model of reactive sanctions — punish after the harm, hope the example deters others — has been tested at scale and has failed. Three years in, the rate is still accelerating.

It means that Article 4 of the EU AI Act, which requires AI literacy before deployment, was not premature regulation. It was prescient. The legal profession's experience since 2022 is the strongest possible vindication of the proactive European approach to AI governance.

Liga: For any law firm, legal department, or professional services organisation deploying AI tools without structured competence training, the risk profile is now inescapable. Court sanctions ranging from $1,000 to $59,500 per incident. Attorney suspension or disbarment. Malpractice liability. Regulatory referrals. Reputational damage that follows a published court opinion forever. And from August 2025, EU AI Act civil liability for failing to ensure adequate AI literacy of staff.

These are not theoretical consequences. They have all occurred. Repeatedly. In multiple jurisdictions. To experienced professionals at respected institutions.

Edgars: From an engineering perspective, I will say this plainly: the technology is not going to solve this for you. AI tools will improve. Hallucination rates will decline. RAG systems will get more accurate. But they will not reach zero. And as long as a nonzero percentage of AI outputs contain fabricated content, the human in the loop must be competent enough to catch it. Building that competence is not optional. It is the only reliable safeguard.

Alex: The legal profession did not set out to become the case study for AI competence failure. But it has become one — the most comprehensive case study imaginable. The evidence is in. Nearly a thousand data points, across six countries, spanning three years. The question is no longer whether AI competence training is necessary. The question is whether your organisation will build that competence proactively, or whether it will become the next case study.

Article 4 exists because of exactly this pattern. The Twin Ladder exists to help organisations respond. And the thousand cases in between are the evidence that waiting is no longer an option.


This white paper is maintained by TwinLadder Research. Last updated March 4, 2026. For inquiries, contact research@twinladder.lv.