The FCA has quietly changed the question. For two years it asked firms whether they had implemented the Consumer Duty. In 2026 it is asking something harder: are customers genuinely better off, and can you prove it? The regulator has moved to a supervision-led posture that intervenes earlier and expects boards to articulate, in their own words, how their firm delivers good outcomes. Treating the Duty as a paper exercise now carries real consequences, not just a follow-up letter.
That shift matters because it turns the Consumer Duty from a policy document into an operating loop: test a proposition before you launch it, learn from the outcomes once it is in customers’ hands, and evidence both to a board that can be held to account. Most banks can do parts of that loop. Very few run it as one motion. And two changes now arriving, agents that act for customers and money that moves as tokens, are about to make the loop both harder to run and more important to get right.
The loop the Duty actually asks for
Read past the language and the Duty describes a cycle, not a checklist.
Test. The FCA expects firms to test their products and their important communications against the four outcomes, with effort proportionate to their size and role. In its consumer understanding work it points to comprehension checks, A/B testing, surveys and frontline feedback, with randomised controlled trials as the rigorous end of the range, and it now expects a behavioural lens rather than a tick-box read of whether a disclosure was sent.
Learn. Once a product is in the market, firms are expected to find where consumers actually get stuck, drawing on call listening, complaints, chat transcripts, website analytics, drop-off data and surveys, and to monitor outcomes for different groups of customers, including vulnerable ones.
Evidence. Then they have to hold the records, the management information and the board reporting that show not just what the firm did but whether outcomes were good, and whether a change made to fix a poor outcome actually worked.
That last clause is where the loop usually breaks. In its reviews of board reports the FCA has repeatedly found firms leaning on repackaged data and process-completion metrics, and unable to demonstrate how improvements to outcomes were monitored after a change was made. The loop is meant to be one continuous thing. Most banks run it as three disconnected activities, by three different teams, at three different times.
How banks test today, and where it strains
Today the testing end of the loop is usually late and detached. A comprehension test runs on a near-final letter. A usability lab runs on a near-final journey. An A/B test runs on an email subject line. The work is often good, and the FCA has praised firms that improved understanding with clearer FAQs and explainer videos. But it is bolted onto finished assets, sampled, and separated from the decision the customer will actually face.
The learning end is lagging and process-shaped: dashboards that count steps completed, complaints that arrive after harm has landed, and red-amber-green ratings that, as the FCA found, can stay green while the underlying trend deteriorates. The evidencing end is reconstruction, a board report assembled by hand from data that was never built to answer the question.
The net effect is that a bank tends to learn an outcome is poor only after the product is built, when the cost of change is highest. The loop runs slowly and late, at exactly the moment the regulator wants it fast and early.
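The red-amber-green point can be made concrete with a small sketch. It assumes a single outcome metric (the share of customers passing a comprehension check), made-up monthly figures and illustrative thresholds; none of these numbers come from the FCA or from any firm. It shows a rating that reads green on the latest value while a simple least-squares trend over the same series is falling.

```python
from statistics import mean

# Hypothetical thresholds, chosen only for illustration.
GREEN_THRESHOLD = 0.80
AMBER_THRESHOLD = 0.70


def rag_status(value: float) -> str:
    """Point-in-time rating of the latest value against fixed thresholds."""
    if value >= GREEN_THRESHOLD:
        return "green"
    if value >= AMBER_THRESHOLD:
        return "amber"
    return "red"


def trend_slope(series: list[float]) -> float:
    """Least-squares slope per period; a negative value means deterioration."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = mean(series)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(series))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Made-up monthly share of customers who passed a comprehension check.
monthly = [0.93, 0.91, 0.89, 0.86, 0.84, 0.82]

status = rag_status(monthly[-1])
slope = trend_slope(monthly)

# The case worth escalating: still green today, but heading the wrong way.
green_but_deteriorating = status == "green" and slope < 0
```

Reporting the trend alongside the rating is one way to stop a dashboard from staying green until the threshold is finally crossed.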
Agents change what you are testing
Now add an agent to the journey. When software acts on a customer’s behalf, the unit of the test changes. A comprehension test on a letter cannot tell you whether a customer understood what an agent did for them, whether they meant to allow it, or whether they could step in before it completed. The outcome surface moves from the screen the customer reads to the action the agent takes.
That is genuinely new, and it is hard to test with a focus group looking at static assets. It needs a journey that actually runs the agent under controls, so you can watch a real person meet an agent’s action and see whether they understood it and could stop it. The Consumer Duty questions do not go away when an agent appears; they move to where the agent acts, and most testing methods cannot follow them there.
This is the gap our Customer Outcomes proving ground is built to close, and it is why the foundation matters: our Family Wealth demonstration already runs an agent that proposes and a human who must approve before any material action, with the evidence recorded at each step. That is the Capability, Context and Consent discipline in a working demonstration on synthetic data. The structured outcome capture on top of it is, honestly, a direction in which we can take that prototype rather than a product we are claiming to ship. The point stands either way: you cannot test an agentic outcome on a PDF.
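The propose-then-approve pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of the demonstration itself: the names Proposal, EvidenceLog and run_gated, the "material" flag and the step labels are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Proposal:
    action: str
    amount: float
    material: bool  # hypothetical flag: does this action need human approval?


@dataclass
class EvidenceLog:
    entries: list[dict] = field(default_factory=list)

    def record(self, step: str, proposal: Proposal, detail: str = "") -> None:
        """Append a timestamped record of one step in the journey."""
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "step": step,
            "action": proposal.action,
            "detail": detail,
        })


def run_gated(proposal: Proposal, approved: bool, log: EvidenceLog) -> bool:
    """Execute only if the action is non-material or a human approved it."""
    log.record("proposed", proposal)
    if proposal.material and not approved:
        log.record("declined", proposal, "human did not approve")
        return False
    if proposal.material:
        log.record("approved", proposal, "human approved")
    log.record("executed", proposal)
    return True


# Illustrative run: the agent proposes a material action and the human declines.
log = EvidenceLog()
done = run_gated(Proposal("move savings", 5000.0, material=True), approved=False, log=log)
steps = [entry["step"] for entry in log.entries]
```

The design choice worth noting is that the log is written inside the gate, so the evidence exists whether or not the action goes ahead.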
Tokens change what is at stake
Then there is the money itself. Value is starting to move as tokens. In May 2026 the FCA and the Bank of England set out a shared vision for tokenisation in UK wholesale markets. The FCA has selected firms to test stablecoin products in its regulatory sandbox, with a UK regime expected to follow rather than already in force. Tokenised sterling deposits are in industry pilots. None of this is settled retail law yet, and it should not be described as if it were, but the direction is set.
For the Consumer Duty, the interesting properties of tokenised money are speed, finality and programmability. Tokenised value tends to settle quickly and with strong finality; legal finality is the phrase that recurs through tokenised-deposit roadmaps. And it is programmable: a smart contract can move it when conditions are met. That reshapes the outcome questions. Can a customer understand a product whose money is programmable and whose settlement may be hard to reverse? Can they dispute, unwind or exit? What happens to a vulnerable customer when a payment is final the instant it is made?
There is a neat symmetry worth noticing. Programmable money is described in terms of who may initiate, approve, settle, reverse or dispute a movement, under thresholds and rules. That is consent and control, the same grammar an agent needs. The thing that makes tokenised money risky for a customer and the thing that makes it safe are the same primitive: an explicit, evidenced control over what may happen and who may stop it.
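To make that shared grammar concrete, here is a minimal sketch of a control policy expressed as data. It is an assumption-laden illustration, not a description of any real tokenised-deposit scheme or smart-contract standard: the roles, operations, the 500 threshold and the names Rule, POLICY and permitted are all invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    operation: str                      # e.g. "initiate", "approve", "reverse", "dispute"
    allowed_roles: frozenset[str]
    max_amount: Optional[float] = None  # None means no threshold applies


# Hypothetical policy: an agent may initiate small movements on its own,
# but only the customer may initiate larger ones, approve, or dispute.
POLICY = [
    Rule("initiate", frozenset({"customer", "agent"}), max_amount=500.0),
    Rule("initiate", frozenset({"customer"})),
    Rule("approve", frozenset({"customer"})),
    Rule("reverse", frozenset({"customer", "bank"})),
    Rule("dispute", frozenset({"customer"})),
]


def permitted(role: str, operation: str, amount: float, policy=POLICY) -> bool:
    """Return True if any rule lets this role perform this operation at this amount."""
    for rule in policy:
        if rule.operation != operation or role not in rule.allowed_roles:
            continue
        if rule.max_amount is None or amount <= rule.max_amount:
            return True
    return False


agent_small = permitted("agent", "initiate", 200.0)
agent_large = permitted("agent", "initiate", 5000.0)
customer_large = permitted("customer", "initiate", 5000.0)
```

The same table answers both questions the paragraph raises: what an agent may do unaided, and who retains the right to stop or unwind a movement.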
Agent plus token is the real frontier
The sharp case is the combination. Picture an agent moving tokenised value for a customer, quickly, programmatically, and with strong finality. That compresses the moment where a customer would understand, consent and be able to intervene into something close to instant, and possibly irreversible. It is precisely the kind of outcome the Consumer Duty exists to protect, and precisely the kind a letter-comprehension test will never reach.
Which is why the test-learn-evidence loop has to move to where the action happens. You cannot assess these outcomes from the outside, after the fact, on sampled assets. You assess them on a prototype that runs the agent and, in time, the tokenised rail under controls, with the metrics and the evidence built in, before the bank commits to build and launch.
Fold the loop into one motion, before launch
The answer is not a fourth testing team. It is to run test, learn and evidence as a single loop, earlier, on a prototype that already has the agent, the controls and the metrics in it, and to generate the evidence as you test rather than reconstruct it for a board afterwards. Define the outcome a proposition should produce and the groups it must hold for; put real people through the agentic journey; capture where understanding and consent break down; and let the evidence pack fall out of the run. That is evidence by construction, the same discipline that runs through our work on governed data and C3.
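As a rough sketch of what "let the evidence pack fall out of the run" could look like, the example below scores made-up test-consumer sessions by group against a target. Everything in it is an assumption for illustration: the group names, the two measures ("understood" and "could_stop"), the 0.75 target and the function names are hypothetical, and the data is invented.

```python
from collections import defaultdict

# Made-up sessions with test consumers; not real customer data.
sessions = [
    {"group": "general", "understood": True, "could_stop": True},
    {"group": "general", "understood": True, "could_stop": True},
    {"group": "general", "understood": True, "could_stop": True},
    {"group": "general", "understood": True, "could_stop": False},
    {"group": "vulnerable", "understood": True, "could_stop": True},
    {"group": "vulnerable", "understood": False, "could_stop": True},
    {"group": "vulnerable", "understood": False, "could_stop": True},
    {"group": "vulnerable", "understood": True, "could_stop": True},
]

MEASURES = ("understood", "could_stop")


def outcome_rates(rows, measures=MEASURES):
    """Share of sessions in each group where each measure was met."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for row in rows:
        totals[row["group"]] += 1
        for m in measures:
            hits[row["group"]][m] += int(row[m])
    return {g: {m: hits[g][m] / totals[g] for m in measures} for g in totals}


def breaches(rates, target):
    """Group and measure pairs that fall below the target outcome."""
    return sorted(
        (g, m) for g, by_measure in rates.items()
        for m, rate in by_measure.items() if rate < target
    )


rates = outcome_rates(sessions)
below_target = breaches(rates, target=0.75)
```

Because the rates are computed from the same session records the test produced, the per-group view a board needs is a by-product of the run rather than a later reconstruction.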
A short honesty note, because it is how we work. The prototype is a working platform, the same one running the demonstrations on this site, with its agent actions, approval gates and recorded evidence on synthetic data. It is a proving ground by design: it runs with test consumers, never with real customers, and it is not a production system. The structured outcome capture is configured, and where needed extended, for the proposition under test, and any tokenised rail is something we would add for an engagement. What a client takes away is a working specification, ready for whatever vendor and build they choose to go to market with. The tokenisation references above are to a developing regime, attributed to the FCA and the Bank of England, not a claim about live retail products. Consumer Duty is an FCA framework; we help firms test and evidence good outcomes, we do not certify compliance.
Where this leaves a board
The FCA has made its move. It wants outcomes, evidenced, and it is supervising for them now rather than waiting for harm to surface. Agents and tokens are arriving into exactly that environment, and they will make outcomes harder to predict and evidence harder to fake. A poor outcome from an agent acting on tokenised money is faster, more final, and less visible than a confusing letter ever was.
The banks that come through this well will be the ones that can test, learn and evidence in one motion, before they build, on a prototype that already has the agent and the controls in it. That is the cheapest place to find a poor outcome. It is also the only place to find it before a customer does.
Sources and attribution: FCA, Consumer Duty focus areas; FCA, consumer understanding good and poor practice; FCA, board reports good practice and areas for improvement; Bank of England and FCA, shared vision for tokenisation (May 2026); FCA, stablecoin regulatory sandbox. Tokenisation and stablecoin rules are a developing regime and are described as proposed, not settled law. Consumer Duty is an FCA framework; our role is to help firms test and evidence good outcomes, not to certify compliance.