"57 Million Records, Zero Governance" -- When Ambition Outpaces Accountability

Twin Ladder Casebook Series | Twin Ladder | February 2026


The Hook

Sometime in 2024, a GP in England received a briefing about an AI model called Foresight. The model could predict patient health trajectories -- anticipating hospitalisations, identifying candidates for early intervention, mapping the probable course of chronic disease. It was, by any measure, an impressive piece of medical AI. It was also trained on GP data from their practice. Data from fifty-seven million patients across the country. Data that had been collected during the pandemic for COVID-19 research purposes.

Nobody had told them.

The GP had not been consulted about the use of their patients' data for AI training. The patients had not been informed. The professional bodies that represent general practitioners -- the British Medical Association and the Royal College of General Practitioners -- had not been made aware that COVID-era data collected under emergency provisions was being repurposed to train a generative AI model. The data sharing agreement existed. The governance framework did not.

In June 2025, the Joint General Practice IT Committee wrote to NHS England with four words that would pause the most ambitious public health AI project in British history: "serious concerns about lawfulness."


The Story

Foresight: The Promise and the Gap

Foresight was conceived as a foundation model for the National Health Service. Developed by researchers at University College London and King's College London under a data sharing agreement brokered by the British Heart Foundation Data Science Centre, the model was designed to function as an auto-complete system for medical timelines. Feed it a patient's history, and it predicts what happens next -- not with the crude pattern-matching of a decision tree, but with the probabilistic sophistication of a large language model trained on decades of clinical records.
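
To make the auto-complete analogy concrete, here is a deliberately minimal sketch of the task: a coded event history goes in, a ranked list of probable next events comes out. Foresight itself is a generative model of a different order entirely; the bigram counter below is a toy stand-in for the shape of the task only, and every event code in it is invented.

```python
from collections import Counter, defaultdict

# Toy illustration of "auto-complete for medical timelines": a bigram
# frequency model over coded clinical events. This sketch shows only the
# shape of the task -- history in, probable next events out -- not the
# probabilistic sophistication of a model like Foresight.

def train(timelines: list[list[str]]) -> dict[str, Counter]:
    """Count which event tends to follow which across patient timelines."""
    following: dict[str, Counter] = defaultdict(Counter)
    for events in timelines:
        for current, nxt in zip(events, events[1:]):
            following[current][nxt] += 1
    return following

def predict_next(model: dict[str, Counter], history: list[str], k: int = 3):
    """Rank the k most probable next events given the last event seen."""
    counts = model.get(history[-1], Counter())
    total = sum(counts.values()) or 1
    return [(event, n / total) for event, n in counts.most_common(k)]

# Synthetic timelines -- every event code here is invented.
timelines = [
    ["type2_diabetes", "hba1c_high", "insulin_started", "admission"],
    ["type2_diabetes", "hba1c_high", "insulin_started"],
    ["type2_diabetes", "hba1c_high", "metformin_started"],
]

model = train(timelines)
print(predict_next(model, ["type2_diabetes", "hba1c_high"]))
# [('insulin_started', 0.666...), ('metformin_started', 0.333...)]
```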

The technical ambition was legitimate. The NHS holds one of the richest longitudinal health datasets in the world. The potential for earlier diagnosis, targeted intervention, and resource allocation guided by predictive intelligence is enormous. NHS England described Foresight as "ground-breaking" and positioned it as a centrepiece of the country's digital health strategy.

The data pipeline, however, told a different story. The fifty-seven million patient records used to train Foresight had been collected through the General Practice Data for Planning and Research (GPDPR) programme. That programme had itself been controversial -- paused in 2021 after public outcry over insufficient transparency. When the data flowed again, it did so under a data sharing agreement that the GP Profession Advisory Group had reviewed. But the agreement covered the general data-sharing framework, not the specific application to AI model training. Doctors who reviewed the general agreement were not specifically consulted about Foresight. The distinction between reviewing a framework and consenting to a specific use case is precisely where governance lives -- and precisely where it was absent.

The Joint GP IT Committee, representing both the BMA and RCGP, did not object to the concept of predictive health AI. It objected to the process. The committee stated that it had "serious concerns about the lawfulness of the data use for this project" and highlighted the "apparent absence of strict governance arrangements." It asked NHS England to refer itself to the Information Commissioner's Office so that the full circumstances could be understood. NHS England agreed to pause the project while its data protection officer conducted a review to confirm that GDPR principles were being upheld.

The pause was not a rejection of AI. It was a recognition that fifty-seven million people's medical histories had entered an AI training pipeline without the governance infrastructure that such a use demands.

Oracle Health in Sweden: When Data Corrupts

The NHS was not the only public health system discovering that digital ambition without governance infrastructure produces failure. In November 2024, the Västra Götaland region of Sweden -- responsible for healthcare across 1.7 million residents -- launched Oracle Health's Millennium electronic health records system. The project carried a budget of 2.1 billion Swedish kronor, approximately 190 million US dollars. It lasted three days.

Within seventy-two hours of going live, healthcare staff reported that the system was producing transcription errors in patient records. Words were missing from clinical notes. The integrity of the health record -- the single most critical document in patient care -- was compromised. Staff protested that patient safety could no longer be guaranteed. The system was halted, and the region reverted to its legacy infrastructure.

Sweden's Medical Products Agency opened an investigation. The regional authority filed an initial report with IMY, Sweden's data protection authority, acknowledging that the incidents involved personal data. The 190 million dollar deployment was permanently abandoned.

The Oracle Health failure in Västra Götaland was not an AI story in the narrow sense. It was a data governance story. The system failed not because the technology was incapable, but because the deployment lacked the governance mechanisms to ensure data integrity at scale. The most expensive health records system ever deployed in Scandinavia could not guarantee that a patient's record would contain the words their clinician had written.

The Chest Diagnostics Rollout: Governance at the Last Mile

Back in England, a parallel story was unfolding in diagnostic imaging. In 2023, NHS England launched a twenty-one million pound programme to deploy AI diagnostic tools across sixty-six hospital trusts, organised into twelve imaging networks. The tools were designed to prioritise urgent chest scan cases, highlight abnormalities for radiologist review, and support earlier detection of lung cancer.

The technology worked. The governance did not scale.

A study published in The Lancet's eClinicalMedicine in September 2025 -- one of the first rigorous evaluations of large-scale AI deployment outside a laboratory setting -- found that eighteen months after contracting was supposed to be complete, a third of the participating trusts were still not using the AI tools in clinical practice. Twenty-three of sixty-six trusts had the technology available but had not integrated it into their diagnostic workflows.

The barriers were not technical. Contracting had taken four to ten months longer than anticipated. Clinical staff, already operating under severe workload pressure, could not find time to engage with the selection process, support integration with local IT systems, or obtain local governance approvals. Some trusts were overwhelmed by the volume of highly technical documentation required to evaluate and approve the tools. Integration with ageing, heterogeneous IT infrastructure slowed progress further.

The study revealed a pattern that the Foresight case illuminated from a different angle: the NHS had the ambition and the funding to deploy AI at national scale, but the governance capacity at the point of implementation -- the individual trust, the individual clinician, the individual IT system -- had not been built. The technology arrived. The organisational competence to receive it had not.

Meanwhile, the safety frameworks meant to govern these deployments had not kept pace. Clinical risk standards DCB0129 and DCB0160, the regulatory backbone for digital health technology safety in the NHS, were designed for predictable, fixed-logic software -- systems that behave consistently and produce deterministic outputs. Modern AI systems adapt over time. They change their outputs based on prior input. They are probabilistic, not deterministic. A cross-sectional study published in the Journal of Medical Internet Research found that only 25.6 percent of deployed digital health technologies in the NHS were fully assured against these standards. The frameworks built for static software were being applied to dynamic AI, and the gap was growing.
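
The mismatch is easy to demonstrate. A fixed-logic system can be assured with exact-match tests: same input, same output, every time. A probabilistic system cannot -- the strongest available assertion is that its output distribution stays within a tolerance band. The sketch below is purely illustrative; DCB0129 and DCB0160 define clinical risk management processes, not test code, and the triage rule and thresholds here are invented.

```python
import random

# Why exact-match assurance breaks for adaptive AI. A deterministic rule
# passes the same test forever; a probabilistic model only admits
# statistical checks. Illustrative only: the triage rule, probabilities,
# and tolerance band below are all invented for this sketch.

def fixed_logic_triage(age: int, spo2: float) -> str:
    """Deterministic rule: same input, same output, every time."""
    return "urgent" if spo2 < 92 or age > 80 else "routine"

def probabilistic_triage(age: int, spo2: float) -> str:
    """Stand-in for a model whose output varies from run to run."""
    p_urgent = 0.9 if spo2 < 92 else 0.1
    return "urgent" if random.random() < p_urgent else "routine"

# Exact-match assurance works for the fixed rule...
assert all(fixed_logic_triage(70, 90.0) == "urgent" for _ in range(1000))

# ...but for the model, the best we can assert is a rate within bounds.
urgent_rate = sum(
    probabilistic_triage(70, 90.0) == "urgent" for _ in range(1000)
) / 1000
assert 0.85 < urgent_rate < 0.95, "output distribution drifted from baseline"
```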


Through the Twin Ladder Lens

The Twin Ladder framework defines four progressive levels of AI competence. Level 0 is the AI Literacy Foundation -- the baseline ability to critically evaluate what AI produces, to understand where it comes from, and to assess whether its outputs can be trusted. Level 1 is the Professional Twin. Level 2 is the Operational Twin. Level 3 is the Ecosystem Twin. Each level builds on the one below. The ladder is climbed, not skipped.

The NHS Foresight case is a Level 0 failure of the most fundamental kind.

Level 0, as described in the framework, means the baseline ability to critically assess AI-generated output. But it also means something more basic: knowing what data you have, where it came from, who consented to its use, and under what conditions it may be processed. Data governance is not a separate concern from AI literacy. It is the substrate on which AI literacy is built. You cannot critically evaluate AI output if you do not know what the AI was trained on. You cannot assess the reliability of a prediction if you cannot trace the provenance of the data that produced it.

The NHS had the data volume to build a world-class predictive health model. Fifty-seven million longitudinal patient records represent a dataset of extraordinary richness. What the NHS did not have was the governance infrastructure to ensure that the data's journey from clinical encounter to AI training set was transparent, lawful, and accountable. The data sharing agreement existed. The specific consent for this specific use did not. The professional bodies that should have been consulted were not. The patients whose records were processed were not informed.
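
The gap between "a data sharing agreement exists" and "this specific use is authorised" can be made mechanical. The sketch below -- every name and field in it is hypothetical -- shows one way to encode that distinction: each dataset carries its legal basis and an explicit list of permitted purposes, and any processing request outside that list is refused and logged. The point is the check, not the schema.

```python
from dataclasses import dataclass, field

# A minimal purpose-limitation gate. All names and fields are hypothetical;
# the point is that "may we train a model on this data?" becomes an
# answerable, auditable question rather than an assumption.

@dataclass
class DatasetRecord:
    name: str
    legal_basis: str                        # instrument the data was collected under
    permitted_purposes: set[str] = field(default_factory=set)

def authorise(dataset: DatasetRecord, requested_purpose: str) -> bool:
    """Refuse any use not explicitly covered by the sharing agreement."""
    allowed = requested_purpose in dataset.permitted_purposes
    # Log every decision so the provenance chain can be reconstructed later.
    print(f"{dataset.name}: '{requested_purpose}' -> {'ALLOW' if allowed else 'DENY'}")
    return allowed

gp_extract = DatasetRecord(
    name="gp_covid_extract",
    legal_basis="pandemic research provisions",   # simplified for illustration
    permitted_purposes={"covid19_research", "service_planning"},
)

authorise(gp_extract, "covid19_research")    # ALLOW
authorise(gp_extract, "ai_model_training")   # DENY -- the question Foresight never asked
```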

This is what it means to attempt AI deployment without climbing the ladder. No amount of algorithmic sophistication compensates for ungoverned data. A model trained on fifty-seven million records is not more trustworthy than a model trained on fifty-seven thousand if neither can demonstrate that the data was collected, shared, and processed within a governance framework that patients, clinicians, and regulators can trust. The predictive power is irrelevant if the foundation is compromised.

The chest diagnostics rollout tells the same story from a different vantage point. The AI tools were available. The trusts that could not deploy them were not lacking technology. They were lacking the governance capacity -- the staff time, the procurement expertise, the IT integration capability, the local approval processes -- to receive the technology responsibly. The ladder cannot be climbed if the ground floor has not been built.

The Twin Ladder framework predicts this outcome. An organisation that attempts to operate at Level 2 or Level 3 -- deploying operational AI across dozens of hospitals, training ecosystem-level models on national datasets -- without establishing Level 0 governance will produce exactly what the NHS produced: ambitious technology that stalls, pauses, or fails when it encounters the governance questions it should have answered before deployment began.


The Pattern

The NHS is not an isolated case. It is the most visible instance of a pattern that recurs across public sector AI deployments in Europe: ambitious technology, insufficient governance, and overworked staff who cannot engage with either.

In Sweden, beyond the Oracle Health debacle, the national Social Insurance Agency (Försäkringskassan) was found to be using a machine learning system that disproportionately flagged women, people with foreign backgrounds, low-income earners, and individuals without university degrees for fraud investigations. The system was suspended amid an investigation. In the United Kingdom, Amnesty International and Big Brother Watch documented that the Department for Work and Pensions' Universal Credit Advances model displayed statistically significant bias across age, nationality, relationship status, and reported illness -- while the DWP shielded its AI deployments from public scrutiny.
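
Detecting that kind of disparity does not require exotic tooling. A first-pass audit can be as simple as comparing flag rates between groups with a two-proportion z-test, as in the sketch below. The counts are invented and a real audit would control for far more, but a check this simple, run internally, would have surfaced the pattern the external investigations later found.

```python
import math

# A first-pass disparity check: compare flag rates between a protected
# group (A) and everyone else (B) with a two-proportion z-test. All counts
# below are invented; a real audit would control for far more.

def flag_rate_z(flags_a: int, n_a: int, flags_b: int, n_b: int) -> float:
    """z-statistic for the difference in flag rates between groups A and B."""
    p_a, p_b = flags_a / n_a, flags_b / n_b
    pooled = (flags_a + flags_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical counts: group A flagged at 8%, group B at 4%.
z = flag_rate_z(flags_a=160, n_a=2000, flags_b=80, n_b=2000)
print(f"z = {z:.2f}")   # |z| > 1.96: disparity unlikely to be chance
```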

The pattern has three consistent elements. First, the data exists and the technical capability to process it exists, creating a gravitational pull toward deployment. Second, the governance infrastructure -- consent frameworks, bias auditing, transparency mechanisms, staff training -- does not exist at the same level of maturity. Third, the people who must implement, oversee, and be affected by the system are either too overworked to engage or too excluded to object.

The European Union's AI Act, which entered into force in August 2024 and will be fully applicable by August 2026, was designed to address precisely this pattern. The Act classifies AI systems used in public services, healthcare, and welfare as high-risk, imposing requirements for data governance, transparency, human oversight, and bias monitoring. The European Health Data Space, with phased implementation beginning in March 2026, introduces further obligations for interoperability, access controls, and detailed logging of electronic health records.

But regulation alone does not build competence. As of mid-September 2025, approximately a third of EU member states had not yet designated the national competent authorities required by the Act. The regulation exists. The institutional capacity to implement it is still under construction. This is the governance gap at continental scale -- and it mirrors the trust-level gap that stalled the NHS chest diagnostics rollout.

The lesson from every case in this pattern is the same. The technology is not the bottleneck. The governance is the bottleneck. And governance is not built by purchasing software or passing legislation. It is built by developing the organisational competence to know what data exists, where it came from, who has access to it, and whether its use is lawful, ethical, and transparent. That competence is Level 0. Without it, the ladder cannot be climbed.


The Lesson

Governance is not bureaucracy. It is not the paperwork that slows innovation down. It is the competence to know what data you have, where it came from, who consented to its use, and what happens when it moves from one purpose to another. It is the institutional capacity to answer, at any moment, the question: "Should we be doing this, and can we demonstrate why?"

The NHS did not lack data. It did not lack technical talent. It did not lack ambition. It lacked the governance infrastructure to ensure that fifty-seven million patients' records moved through a pipeline that was transparent, accountable, and lawful. The Oracle Health deployment in Sweden did not lack funding. It lacked the governance mechanisms to ensure that data integrity was maintained at the point of implementation. The chest diagnostics programme did not lack technology. It lacked the governance capacity at the trust level to receive that technology responsibly.

Every organisation considering AI deployment should begin not with the algorithm but with the inventory. What data do you hold? Under what legal basis was it collected? Who consented, and to what? What happens when the purpose changes? Who is responsible for answering these questions, and do they have the authority and resources to do so?
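
Those questions can be held as a living artefact rather than an aspiration. In the sketch below -- the schema is hypothetical, and the example entry paraphrases the Foresight facts described above -- an inventory entry that cannot answer a question is itself the finding.

```python
# A sketch of the inventory as a machine-checkable artefact. The schema is
# hypothetical; the example entry paraphrases the Foresight case as
# described above, with None marking the questions left unanswered.

REQUIRED_FIELDS = [
    "what_data", "legal_basis", "who_consented", "consented_to_what",
    "current_purpose", "purpose_change_record", "accountable_owner",
]

def inventory_gaps(entry: dict) -> list[str]:
    """Return the governance questions this dataset cannot yet answer."""
    return [field for field in REQUIRED_FIELDS if not entry.get(field)]

foresight_like = {
    "what_data": "GP records, 57 million patients (GPDPR extract)",
    "legal_basis": "pandemic-era COVID-19 research provisions",
    "who_consented": None,              # patients were not informed
    "consented_to_what": None,
    "current_purpose": "generative AI model training",
    "purpose_change_record": None,      # research -> AI training, undocumented
    "accountable_owner": None,
}

print(inventory_gaps(foresight_like))
# ['who_consented', 'consented_to_what', 'purpose_change_record', 'accountable_owner']
```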

Start with the inventory, not the algorithm. Build the governance before the model. Establish Level 0 before reaching for Level 3. The ladder is climbed, not skipped -- and the first rung is knowing what you stand on.

Monday Morning Question: If a regulator asked you today to demonstrate the complete provenance chain for every dataset feeding your AI systems -- from original collection to current use, including every change of purpose along the way -- could you produce that documentation by Friday?


Sources

  1. Digital Health Net, "AI project to predict health outcomes paused over GP data concern," June 2025. https://www.digitalhealth.net/2025/06/ai-project-to-predict-health-outcomes-paused-over-gp-data-concern/

  2. The Lancet eClinicalMedicine, "Procurement and early deployment of artificial intelligence tools for chest diagnostics in NHS services in England: a rapid, mixed method evaluation," September 2025. https://www.thelancet.com/journals/eclinm/article/PIIS2589-5370(25)00414-6/fulltext

  3. The Register, "Authorities probe Oracle Cerner project in Sweden," November 2024. https://www.theregister.com/2024/11/27/oracle_cerner_project/

  4. Digital Health Net, "We need to act fast to close the NHS AI safety gap," August 2025. https://www.digitalhealth.net/2025/08/we-need-to-act-fast-to-close-the-nhs-ai-safety-gap/

  5. Amnesty International, "UK: Government's unchecked use of tech and AI systems leading to exclusion of people with disabilities and other marginalized groups," July 2025. https://www.amnesty.org/en/latest/news/2025/07/uk-governments-unchecked-use-of-tech-and-ai-systems-leading-to-exclusion-of-people-with-disabilities-and-other-marginalized-groups/

  6. Computer Weekly, "Swedish welfare authorities suspend 'discriminatory' AI model," 2025. https://www.computerweekly.com/news/366634703/Swedish-welfare-authorities-suspend-discriminatory-AI-model

  7. Journal of Medical Internet Research, "Digital Health Technology Compliance With Clinical Safety Standards In the National Health Service in England: National Cross-Sectional Study," 2025. https://www.jmir.org/2025/1/e80076

  8. BMA, "Tech adoption poses risk to NHS," 2025. https://www.bma.org.uk/news-and-opinion/tech-adoption-poses-risk-to-nhs

  9. Pulse Today, "NHS England pauses 'ground-breaking' AI project following GP data concerns," June 2025. https://www.pulsetoday.co.uk/news/technology/nhs-england-pauses-ground-breaking-ai-project-following-gp-data-concerns/