
The TSB Mainframe Migration Disaster: Planning for Guaranteed Failure



Disclaimer: Even though this blog post sounds harsh, we believe the people involved in TSB's 2018 migration project acted in good faith and with the best interests of the business and its customers in mind. They acted according to the best knowledge available at the time and with good intentions.

The Project Worth £62 Million in Fines

On April 22, 2018, TSB Bank embarked on what they described as "one of the most complex migrations in UK banking history." The project was planned according to all conventional wisdom available: three years of planning, 85 specialized subcontractors, multiple board-level reviews, third-party audits, and comprehensive testing frameworks.

Within hours of going live, a significant portion of TSB's 5.2 million customers lost access to their accounts. Digital banking collapsed. Branch systems failed. And the chaos continued for weeks.

The final damage was a £62 million fine from regulators, over £32 million in customer compensation, a CEO resignation, and hundreds of millions of pounds' worth of effort to fix the mess afterwards. The UK's Financial Conduct Authority (FCA) conducted a comprehensive investigation, and its findings reveal uncomfortable truths about how large-scale IT migrations often unfold, and what it takes for them to succeed.

The Scale of Mainframe Migration

Mainframe migrations aren't just software updates; they're institutional transformations. When TSB set out to migrate from Lloyds Banking Group's platform to a new system called Proteo4UK, they weren't just moving code. They were attempting to crystallize and transfer decades' worth of business logic, regulatory requirements, edge cases, and institutional knowledge embedded in systems serving 5.2 million customers.

The FCA report documents that the Proteo4UK Platform was a collection of 221 applications, most of which were off-the-shelf systems for easy plug-and-play action. Beyond COBOL, CICS, and VSAM, there were more exotic languages like Easytrieve, Telon, and PL/1. Databases ranged from the usual Db2 to IMS and IDMS. There were complex batch jobs with intricate dependencies, and integration points with dozens of external systems.

Each of these components represents years of business decisions, regulatory compliance requirements, and operational knowledge. Missing any connection creates cascading failures.
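To make the cascading effect concrete, here is a minimal sketch in Python. The component names and the dependency map are hypothetical, invented purely for illustration, and are not taken from the FCA report. The sketch assumes you already have some record of which component depends on which, and simply walks that map to list everything that would be affected if one component were missed.

```python
from collections import defaultdict, deque

# Hypothetical dependency map: each key depends on the components in its list.
# These names are illustrative only and do not describe TSB's real systems.
DEPENDS_ON = {
    "nightly_interest_batch": ["accounts_db"],
    "statement_batch": ["nightly_interest_batch"],
    "mobile_app_balance": ["accounts_db"],
    "fraud_feed": ["statement_batch", "payments_gateway"],
}


def downstream_impact(depends_on, failed):
    """Return every component that directly or indirectly depends on `failed`."""
    # Invert the map so we can look up "who depends on X".
    dependents = defaultdict(set)
    for component, dependencies in depends_on.items():
        for dependency in dependencies:
            dependents[dependency].add(component)

    # Breadth-first walk from the failed component through its dependents.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted


if __name__ == "__main__":
    for name in sorted(downstream_impact(DEPENDS_ON, "accounts_db")):
        print(name)
```

In this toy map, a problem in a single database reaches every other listed component, including ones that never touch the database directly. A real estate of 221 applications would have a far larger and messier graph, which is the point.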

The Impossibility of Planning Everything Upfront

Craft a master plan over three years, then just hand that plan to 85 contractors to get the job done. Easy peasy?
Here's the uncomfortable truth that emerges from the TSB disaster: you cannot know in advance how long a mainframe migration will take or what it will cost.

TSB tried. They created an "Integrated Master Plan" (IMP) in March 2016 targeting November 2017 for migration, a deliberately ambitious two-year timeline that was publicly announced. The FCA report notes this timeline was "based on very little information" and described as "deliberately very ambitious" to act as a "forcing mechanism."

But then reality started to hit. By September 2017, TSB admitted they would miss the November deadline. They spent weeks re-planning, coming up with a "Defender Plan" with new target dates in Q1 2018. Then came round after round of further re-planning, but they had already locked themselves into a public deadline before understanding what work remained.

The lesson: Having a target is fine and necessary, but setting fixed timelines and budgets for a migration before you truly understand the system is not planning. It's wishful thinking that creates pressure to cut corners later.

The Third-Party Trap: 85 Contractors and Zero Control

One of the most striking findings in the FCA report is TSB's reliance on 85 third-party subcontractors managed through their outsourcing partner, SABIS. We’ve all been there, trying to coordinate projects across multiple partners, but a project with 85? That’s a recipe for failure right from day 1.

The lesson: Outsourcing the core migration work to a web of contractors means outsourcing knowledge, control, and most importantly, accountability. When things go wrong (and in complex migrations, things will go wrong) you need people who can make decisions immediately, understand the full context, and take ownership. You can't schedule a call with 85 subcontractors once it hits the fan.

How to Successfully Migrate in the AI Era

The FCA report and success stories from the industry, combined with the Bank of England's operational resilience framework, show what really works. Step 1 is abandoning the illusion of complete upfront planning and embracing a fundamentally different approach, especially in the age of AI.

1. Plan only as much as needed

The TSB disaster proves that comprehensive upfront planning is a myth. Despite three years of planning, the FCA found that "TSB prepared a plan in circumstances where it had not yet finished defining its requirements (what the system was supposed to do and how it was supposed to be)."

In the AI era, code is cheaper and quicker to make, making an agile approach to large projects more topical than ever. Plan enough to start intelligently, and accept that most learning happens when you actually begin the work. You'll discover things you never expected. You'll find business rules that nobody knew existed. You'll encounter edge cases that only emerge under real conditions.

Build your plan around a continuous learning cycle, not comprehensive prediction. Then, just get to work and move incrementally, learning as you go.

2. Context Is All You Need

Whether you are a human or an LLM, you need the right context to make decisions or create anything useful. With too little context, your decisions are not well-informed, and with too much, you can’t see the forest for the trees. The same applies to entire organisations: everybody should share the right context to make good decisions. This was one of the things that TSB failed to achieve: a shared and adequate context.

In my opinion, the right context for anyone working with a mainframe contains:

  • The vision: Why are we doing what we do
  • The system: How does the system really work
  • The users: What needs do customers and other users really have
  • Business logic and regulation: What compliance requirements and business processes are embedded in the code

Of course, a developer and a business analyst need different depths of understanding for each category, but everybody should share the basics.

The problem is, it’s a lot of information to take in. Schedules are full and people’s working memories are constantly at their limit. So how can people take in and comprehend all this information while completing their daily tasks? I think this is the most significant use case for AI: distilling huge amounts of information for easy human consumption. Tools like Nomain analyse your complete code base, enrich the information with whatever data you have available, and provide full, tailored context for everybody across the organisation, from CTO to junior developer. All they need to do is ask.

3. Small Teams Can Achieve Amazing Things

As TSB demonstrated, having a massive number of developers doesn’t mean that you can achieve massive things in a short time. Mostly, it’s just a massive waste of money.

A good analogy for this is startups. They are very good at achieving great results with very limited resources. If you compare the average startup size now and prior to 2020, you can see that team sizes have steadily declined. In fact, it seems that a team of 15 (the average size of a Series A startup in 2024), equipped with the right AI tools, can build tools that are worth tens of millions.

Of course, a dozen developers will never be able to migrate a whole core banking system anywhere, but the logic is sound. The mentality should always be that a “special forces team” equipped with AI understanding and coding tools beats the approach of “just throwing bodies at the problem” every time.

It's also important to keep an in-house team at the core of all development. When you move incrementally, your team learns constantly, getting better and more efficient every day. With an in-house team, all that learning stays with you. If you use outsourced teams, however, the organisational learning gets disrupted by high employee turnover and you have a new team every year.

4. Make Sure You Stay In Control

AI is a good servant, but a bad master. Sure, you can prompt AI to generate an app for you, but when it comes to legacy systems with tens or hundreds of millions of lines of code, you just can’t hand the whole rewriting task to AI agents, even though that sounds very tempting.

If you have ever tried out vibe coding, you know the fundamental problem. It’s great at the start, but once the code base grows and problems start to compound, it’s very difficult to keep moving ahead. As you yourself have no idea about the architecture or tech choices, it becomes harder and harder to explain to the AI agent what is actually not working. As a result, you start fixing bugs manually and going through tons of code, just as you would if you rewrote the code yourself anyway.

My advice: always use AI as a tool to increase the speed of humans, not to replace them. The same applies to mainframe migration projects.

5. The Right Tools

The landscape for mainframe development tools is remarkably inconsistent. Some companies embrace the latest development and AI tools: GitHub Copilot, CI/CD pipelines, modern IDEs. Others use tools from the 1980s. Their developers write code directly on the mainframe through terminal emulators: no assisted editing, no debuggers, just white or green text on a black screen. For organizations like these, adopting a modern code editor like VS Code would already be a good start. The report about TSB doesn’t include any information about developer tooling, but my guess is tooling wasn’t their greatest strength.

One interesting thing about modern AI tools is that most of them focus on code generation. It's an obvious use case, but I think they accelerate the wrong thing. Mainframe developers spend only 16% of their time actually writing code, meaning that these tools have been optimising a fraction of the activities that tech teams engage in.

Consider this: mainframe developers typically spend 3-7 days per feature just locating where in the codebase they need to change something. Most of the remaining time goes to meetings, aligning with business, and piecing together how the system works, leaving very little time for actual coding.
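A quick back-of-the-envelope calculation shows why this matters. The sketch below applies Amdahl's law to the 16% figure quoted above. It assumes, for simplicity, that a code-generation tool speeds up only the coding share of the work and leaves everything else untouched; the function name and the speedup factors are illustrative choices, not measurements.

```python
def overall_speedup(fraction_accelerated: float, factor: float) -> float:
    """Amdahl's law: total speedup when only part of the work gets faster.

    fraction_accelerated: share of total time the tool affects (0..1)
    factor: how many times faster that share becomes (> 0)
    """
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    untouched = 1.0 - fraction_accelerated
    return 1.0 / (untouched + fraction_accelerated / factor)


# Share of time spent writing code, as quoted in the article.
CODING_SHARE = 0.16

if __name__ == "__main__":
    # Illustrative speedup factors for the coding share only.
    for factor in (2, 10, 1000):
        total = overall_speedup(CODING_SHARE, factor)
        print(f"{factor}x faster coding -> {total:.2f}x faster overall")
```

Under these assumptions, even an arbitrarily fast code generator cannot make the team more than about 1.19 times faster overall, because the remaining 84% of the work is unchanged. Speeding up the understanding and coordination share moves the total far more.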

This is why the true AI efficiency breakthrough for mainframe development won’t come from generating code faster. It will come from understanding existing code faster and sharing that knowledge across the entire organization. That’s also the primary use case for AI knowledge platforms like Nomain, and that’s why they should be the first tool in your mainframe modernization toolkit.

Platforms like Nomain enable users to grasp the big picture and drill into details in a fraction of the time it used to take. Users can ask questions about the codebase, locate functionality, diagnose bugs, discuss new features, and share insights, all in seconds rather than days or weeks.

Developers aren't the only beneficiaries. Business analysts, product owners, and other stakeholders need to make very technical decisions about the mainframe, but can't read mainframe code. A knowledge platform gives them superpowers. They can now assess how difficult a new feature might be to implement before interrupting the day of development teams, saving everyone's time and enabling better-informed decisions.

AI knowledge platforms can analyze legacy codebases to:

  • Map business logic, dependencies, and data flows automatically, revealing connections that would take weeks to discover manually
  • Connect code to business rules and operational data, bridging the gap between technical implementation and business intent
  • Create accessible, current knowledge instead of documentation, ensuring critical understanding doesn't exist only in retiring experts' heads
  • Enable rapid system comprehension, eliminating weeks of code archaeology before teams can make changes

In essence, they bring back the tribal knowledge that has been leaking out of mainframe organisations, creating a comprehensive, connected understanding of how your mainframe systems actually work.

The Path Forward

The TSB migration failed not because of insufficient planning or technical incompetence, but because of too much faith in upfront planning and in the idea that throwing bodies at a problem will solve it. They set public deadlines before understanding the work. They outsourced the implementation to 85 subcontractors. They skipped testing to meet arbitrary dates. They put all of their chips on red with a big-bang migration and no rollback option.

Each of these decisions seemed reasonable at the time. They're what conventional project management recommended and what executives expected. They're what regulators thought they were supervising.

And they led to catastrophic failure.

The successful path in the AI era looks like this:

  1. Build your core team in-house with people who have complete context about your systems, business, users, and regulatory requirements.
  2. Create shared context across your entire organization to enable everyone to know what they are dealing with.
  3. Equip your team with modern AI tools like Nomain, Cursor and/or GitHub Copilot, but maintain control and human judgement throughout the project.
  4. Plan only what you need to start intelligently, then learn and adapt as you actually start working.
  5. Migrate incrementally: start small, learn what works, then scale up.
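As one possible illustration of the last step, the sketch below shows a common way to move customers in small, reversible cohorts instead of all at once. It is an assumption about how an incremental cutover could be wired, not a description of anything TSB or the FCA report proposed. The function names, the platform labels, and the percentage-based rollout are all hypothetical.

```python
import hashlib


def in_migrated_cohort(customer_id: str, rollout_percent: int) -> bool:
    """Decide deterministically whether a customer is served by the new platform.

    The same customer always lands in the same bucket (0..99), so raising
    rollout_percent only ever adds customers and lowering it rolls them back.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent


def route(customer_id: str, rollout_percent: int) -> str:
    """Return which (hypothetical) platform should serve this customer."""
    if in_migrated_cohort(customer_id, rollout_percent):
        return "new_platform"
    return "legacy_platform"


if __name__ == "__main__":
    # Hypothetical customer IDs, routed at a 5% rollout.
    for customer in ("cust-001", "cust-002", "cust-003"):
        print(customer, route(customer, rollout_percent=5))
```

Setting the percentage back to 0 sends everyone to the legacy platform again, which is the rollback option a big-bang cutover lacks. In practice this only works if both platforms can run side by side and data stays in sync between them, which is itself a significant piece of work.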

The bottom line for successful mainframe migration isn't more planning. It's better understanding, incremental progress, and the right tools to maintain clarity as you go.

Stop planning everything upfront. Start understanding everything deeply. Get Nomain to help your team see clearly. Then get to work. One piece at a time.

Ready to build the deep understanding your migration needs before you commit to a timeline? Learn how Nomain helps teams get over the communication overhead and move faster here: www.nomain.com