Zero-copy data, and the bank spending €2 million a year on moving data around

Carl Heaton · Infrastructure Commentary

BNP Paribas operates in 64 countries and spends, on its own estimate, up to €2 million a year on data copying, transformation, and reconciliation. The bank told The Stack on 27 May that adding a new data source to its analytics estate had been taking over a year, gated by approvals, mainframe constraints, and the slow process of building yet another copy of the data into yet another store. The fix was to stop copying.

The piece of news is small, but the lesson it carries is one that almost every SME with more than one business system runs into. The data sits in different places, the systems that need it run against different copies, and the cost of keeping the copies consistent grows non-linearly with the number of systems. By the time the firm is large enough to feel it, the cost is real and the rework is hard.

This filing covers what BNP Paribas actually did, why "zero copy" is a useful idea that is not a product, and what an SME can take from it.

What BNP Paribas did

The bank's transformation strategist Bruno Micaelli described the new approach as treating third-party data as "a strategic control point" rather than as input to be ingested. The execution partner is Denodo, a data-virtualisation vendor that has been quietly making the case for not-copying for over a decade. The shift was motivated by BCBS 239, the Basel Committee's principles for risk data aggregation and risk reporting at banks, plus the broader regulatory load of anti-money-laundering, know-your-customer, and risk-management reporting.

The architectural problem was familiar to anyone who has worked in a bank's data team: a mainframe core layered with Java, a series of internal data warehouses, and an analytics estate that pulled copies of the same data into different tools for different teams. Each new data source required a new copy. Each new copy needed reconciliation against the source. Each reconciliation needed people, processes, and approvals. By the bank's own description, this had become "high cost, slow approval, and weak data quality".

The zero-copy idea is that consumers of data (dashboards, reports, AI models) query the data in place rather than ingest a copy. A virtualisation layer presents a federated view: the consumer queries one logical schema, the layer translates the query and sends it to the systems that actually hold the data, then combines the results. The data does not move. The query does.
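The mechanics can be sketched in a few lines of Python. This is a toy illustration of the federated-view idea, not how Denodo or BNP Paribas implement it: two in-memory SQLite databases stand in for separate source systems, and every table, column, and function name is a hypothetical chosen for the example.

```python
import sqlite3

# Two independent databases stand in for two source systems.
# All table and column names are hypothetical.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme Ltd"), (2, "Birch & Co")],
)

ledger = sqlite3.connect(":memory:")
ledger.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
ledger.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0)],
)


def revenue_by_customer():
    """Answer one logical question by querying each source in place."""
    # Sub-query 1 goes to the system that holds customer names.
    names = dict(crm.execute("SELECT id, name FROM customers"))
    # Sub-query 2 goes to the system that holds invoices. The aggregation
    # runs at the source, so only the totals cross the boundary.
    totals = ledger.execute(
        "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id"
    )
    # The layer combines the partial results and stores nothing.
    return {names[customer_id]: total for customer_id, total in totals}


result = revenue_by_customer()
print(result)
```

Because nothing is stored, a new invoice in the ledger shows up the next time the function is called. The price is visible too: every call pays for a round-trip to each source.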

This is not magic, and the trade-offs are real. Federated queries are slower than queries against a local copy. Joining data across systems requires network round-trips. Some queries are not possible without materialising intermediate results. The virtualisation vendor's job is to hide as much of this as possible; the architect's job is to know when the abstraction will break.

What BNP Paribas gets back is governance. The data lives in one authoritative place. The lineage is industrialised, meaning the question "where did this number come from" has a machine-answerable response. The quality controls run against the source rather than against twenty copies that have drifted. And, importantly for AI projects of the kind we covered in the recent filing "Your staff are using AI, you're paying twice", the bank gets a clean, current dataset to feed into the AI tools rather than the year-old extract that was sitting in someone's notebook.

Why this is an SME idea, not just a bank idea

The BCBS 239 piece is bank-specific. The underlying problem is not.

Most SMEs run more than one business system within a year of being founded. The accounting tool, the CRM, the email marketing platform, the e-commerce store, the helpdesk, the project management tool. Each system holds an overlapping subset of the same customer, product, and transaction data. Most SMEs reconcile these by hand, periodically, when somebody notices that the numbers do not add up.

The cost of that hand-reconciliation is real but mostly invisible. It shows up as the finance director spending the last week of the month rebuilding the management accounts from CSV exports. It shows up as the sales team asking the support team for "the real customer list" because their CRM is wrong. It shows up as a bad number reaching the board pack, somebody noticing, and three people spending a day finding where the bad number came from. None of these are catastrophic. All of them sum to a meaningful tax on the business.

A small business does not need a data-virtualisation vendor to fix this. It needs the same insight BNP Paribas had: stop copying the data, and start treating the question "where does this number come from" as a first-class concern.

What a small business can copy from this

Three small moves deliver a useful share of the benefit without a bank-sized budget.

Identify the system of record for each piece of data, and write it down. This is the cheapest, most useful step. For each kind of data the business holds (customer master records, product master, transaction records, financial transactions, marketing consent, employee records), name the one system that is the source of truth. Every other system is a downstream consumer. The list takes an afternoon. The discipline lasts for years.
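Written down, the list can be as small as a lookup table. The sketch below uses the data types named above; the system names on the right-hand side and the may_write helper are hypothetical placeholders, to be replaced with whatever the business actually decides.

```python
# A system-of-record register. The data types come from the list above;
# the system names assigned to them are hypothetical examples.
SYSTEM_OF_RECORD = {
    "customer master records": "CRM",
    "product master": "e-commerce store",
    "transaction records": "e-commerce store",
    "financial transactions": "accounting tool",
    "marketing consent": "email marketing platform",
    "employee records": "HR system",
}


def may_write(system: str, data_type: str) -> bool:
    """Return True only if `system` is the source of truth for `data_type`."""
    if data_type not in SYSTEM_OF_RECORD:
        # A missing entry is a missing decision, not a tooling problem.
        raise KeyError(f"No system of record decided for {data_type!r}")
    return SYSTEM_OF_RECORD[data_type] == system


print(may_write("CRM", "customer master records"))              # True
print(may_write("accounting tool", "customer master records"))  # False
```

The point of making a lookup fail loudly on an unknown data type is that an undecided source of truth surfaces as an error rather than as a quiet guess.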

Push data out of the source, do not pull it back. When the CRM is the source for customer records, the accounting tool should subscribe to changes from the CRM, not periodically pull a refresh. The webhook-and-sync model that most modern SaaS supports is the small-business equivalent of BNP Paribas's federation. The tools that automate this are mature and cheap: Zapier, Make, Tray.io, Workato, n8n. The cost is somewhere between £20 and a few hundred pounds per month, depending on the volume.
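The difference between push and pull is easier to see in code. This is an in-process stand-in for the webhook-and-sync model, assuming a source system that notifies subscribers on every change. The class and method names are invented for the illustration and do not correspond to the API of Zapier, Make, or any other tool named above.

```python
from typing import Callable


class SourceSystem:
    """Hypothetical system of record that pushes each change to subscribers."""

    def __init__(self) -> None:
        self.records = {}
        self._subscribers = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self._subscribers.append(handler)

    def upsert(self, record: dict) -> None:
        self.records[record["id"]] = record
        # Push: downstream systems hear about the change as it happens,
        # instead of discovering it at the next scheduled refresh.
        for handler in self._subscribers:
            handler(dict(record))  # send a copy so consumers cannot mutate the source


class DownstreamSystem:
    """Hypothetical consumer, e.g. an accounting tool's customer list."""

    def __init__(self) -> None:
        self.customers = {}

    def on_customer_changed(self, record: dict) -> None:
        self.customers[record["id"]] = record


crm = SourceSystem()
accounts = DownstreamSystem()
crm.subscribe(accounts.on_customer_changed)

crm.upsert({"id": 1, "email": "old@example.com"})
crm.upsert({"id": 1, "email": "new@example.com"})
print(accounts.customers[1]["email"])  # new@example.com
```

The downstream system still holds a copy, but it is a copy made on purpose: it changes only when the source says so, and only in one direction.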

Make the management dashboard query the live source, not a CSV. Most SME dashboards are built on a weekly export to a Google Sheet, with the inevitable consequence that the dashboard is out of date the moment somebody changes a record. The modern alternative is a tool like Metabase, Looker Studio, or Power BI connected directly to the source database or API. The setup cost is real (half a day to a few days), but the dashboard is then always current and the question "is this the right number" stops being asked.
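A small sketch shows why the export goes stale. It assumes an orders table in an in-memory SQLite database standing in for the source system; the table, the figures, and the function names are all made up for the example.

```python
import sqlite3

# An in-memory database stands in for the live source system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
source.executemany(
    "INSERT INTO orders (amount) VALUES (?)", [(100.0,), (250.0,)]
)

# The "weekly export": values frozen at the moment of export.
weekly_export = [row[0] for row in source.execute("SELECT amount FROM orders")]


def revenue_from_export() -> float:
    """Dashboard figure computed from the frozen export."""
    return sum(weekly_export)


def revenue_live() -> float:
    """Dashboard figure computed by querying the source at read time."""
    row = source.execute("SELECT COALESCE(SUM(amount), 0) FROM orders").fetchone()
    return row[0]


# A new order arrives after the export was taken.
source.execute("INSERT INTO orders (amount) VALUES (?)", (75.0,))

print(revenue_from_export())  # 350.0, already out of date
print(revenue_live())         # 425.0
```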

A fourth, larger piece is worth flagging. If your business runs on multiple systems and you cannot decide which one is the source of truth for a particular data point, that is not a tooling problem. It is a process problem. The right answer is for the owner-manager to decide, write it down, and tell every system owner what was decided. Tools cannot fix a missing decision.

Why the bigger story matters

The deeper reason zero-copy is getting attention now is that the data load on businesses has grown faster than the systems that hold it. Twenty years ago a bank had a few canonical systems and a data warehouse. Now it has hundreds of SaaS tools, dozens of cloud services, and a long tail of analytics platforms. The cost of copying and reconciling grows with the number of pairs of systems that share data, and the number of possible pairs grows roughly with the square of the number of systems. A move to zero-copy is partly an admission that the old model has hit its limit.

For an SME, the same dynamic plays out on a smaller scale. The five tools the firm started with become twenty within a few years. The reconciliation cost grows from "someone deals with it" to "we hired a finance manager partly to deal with it" to "we are losing money to bad data". The right time to start managing the problem is before it becomes someone's full-time job.
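The arithmetic behind that growth is short. As a worst case, assume every pair of tools shares some data that could need reconciling; real estates will have fewer overlapping pairs, so treat this as an upper bound rather than a measurement. The function name is ours.

```python
from math import comb


def reconciliation_pairs(systems: int) -> int:
    """Upper bound on the pairs of systems that could need reconciling."""
    # Choosing 2 systems out of n gives n * (n - 1) / 2 distinct pairs.
    return comb(systems, 2)


print(reconciliation_pairs(5))   # 10
print(reconciliation_pairs(20))  # 190
```

Four times as many tools means nineteen times as many potential pairings, which is why the cost moves from background noise to somebody's job.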

The BNP Paribas piece is useful because it puts a number on the cost at scale: up to €2 million a year, just on moving the data around. The number for an SME is smaller in absolute terms and plausibly similar as a percentage of the bottom line. The insight is to spend a fraction of it now on the discipline of "one source per data type, push not pull, dashboards from the source" rather than to discover the cost in five years when reconciliation has become the work of two full-time people.

Stop copying the data. Or at least, copy it on purpose.

How Steelwise can help

Identifying the systems of record across the business, designing the push-not-pull integration, and replacing the weekly CSV dashboards with live ones is the kind of practical work we do with clients. Get in touch.
