
The Retirement Cliff: When Your COBOL Lead Walks Out, Your Risk Doesn't

In every bank we walk into, there is one engineer who knows where the bodies are buried. When they retire, the bank inherits a system nobody can change safely. The answer is not to replace the person. It is to capture what they know - before the cliff.

Ohad Kotler | March 18, 2026 | 10 min read

In every Tier-1 banking conversation we have, there is a moment where someone - usually the person closest to the system - leans forward and lowers their voice.

"There's one guy," they say. "He's been here twenty-five years. He's the only person who knows why we don't touch that batch on the 18th of the month."

A senior consultant at a banking-transformation consultancy put it more precisely: "There is one guy that has been there for the past 25 years and he knows them all before he retires. Which I'm also aspiring to." In every organization in his portfolio, he had already spotted the pattern - the old-timers who have to be in every meeting because no one else knows the blind spots, the legacy stuff, the system behavior the new management doesn't know anything about. "And the new guys and the consultants don't know anything about it. And usually when they find out, it is too late."

This is the retirement cliff. It is not a hypothetical succession-planning problem. It is a live risk inside almost every Tier-1 bank running mainframe or near-mainframe systems today. And it is the single most under-priced risk in modernization budgets.

The pattern is identical across every bank we've walked into

This is not a COBOL problem. It is not a German-bank problem. It is not a Tier-1 problem.

A senior modernization executive at a Tier-1 European bank described it to us bluntly: "We have a bunch of legacy code that was written by a bunch of people who are, in what I like to call post-retirement, i.e. RIP. And they wrote it. They were massively good at their job. They didn't need any documentation, they didn't put any commentary into the code. They just coded it. If something went wrong, you went to them, they went in, they knew their code, they could fix it. And now they're gone. We don't know what it does anymore."

A senior engineer at a Tier-1 European bank, born a generation after the people who wrote the system he now has to change, said the same thing from the other side: "People have been working for their bank for 25 years. Some leave, they go into retirement, and they take the business with them - or like the knowledge. Me, as a young person who is trying to understand what has been done before - it's just difficult, because it's another generation, to be honest as well."

A service manager at a European insurance company raising an incident on a 25-year-old RPG system described the operational consequence: "There's a system, and that is ASIS for people internally. If we've got an incident that is written by programmers that are not yet available anymore - and documentation doesn't give the right answers."

A CITO at a European card-issuer subsidiary of a major bank made the structural point: "These systems are already in place for 25 to 27 years. Aging systems also basically means aging population, aging teams. So our teams are fairly skeleton as well. That's a challenge that we have for the upcoming period where we still need to do stuff."

Documentation captures what the system was designed to do. Tribal knowledge captures the dozens of times the system did something else and someone fixed it without writing it down. The first is recoverable from the spec. The second leaves with the person.

Why "reverse-document with AI" hasn't solved this

The senior modernization executive didn't tell us about the retirement problem and then ask us what to do about it. He told us what his team had already tried.

"We've done some work using AI to make sense of legacy systems. We've done reverse documentation of existing code. We sent some AI in there, said, 'go read the code, figure out what it does, tell us.' So we've done these sorts of things."

How did it go?

"Mixed. In some cases, actually, it's not bad. We had some success in generating a level of documentation on a number of our platforms. At the other end of it, we've also - the way I like to put it - we've had some systems we sent three or four AIs in there. They never came back."

This is what reverse-documentation tools do well: they produce something that looks like documentation. Plausible prose. Reasonable-sounding summaries. Code that, at a surface level, has been "explained."

What they don't do is produce what the retiring expert actually holds in their head:

  • The structural truth. Which programs are actually called by which, through what mechanism, carrying what data - not summarized, not approximated, traced.
  • The data lineage. Where this balance field comes from. Where it goes. Which seven programs read it. Which one writes it. Which JCL step has to run before another for the value to be valid.
  • The business rule, in context. Not "IF WS-ACCT-TYPE = 'S' AND WS-BALANCE > 10000" as a line of code, but "we apply the premium savings rate when a savings account exceeds ten thousand - and that rule exists because of a 2008 regulatory change that the retiring engineer remembers and the documentation does not."
  • The provenance. Every claim about the system, anchored back to the specific lines of code it came from. Wikipedia-style references. One click and you're in the source. If you can't verify it, you can't trust it. If you can't trust it, you can't sign off the change.
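
To make the lineage and provenance bullets concrete, here is a minimal sketch of what a traced, source-anchored record could look like. It is not Tweezr's data model, and every program, field, JCL step, file name, and line number below is hypothetical. It only illustrates answering "who writes this field, who reads it, and does any reader run before the writer?" with every answer pointing back to specific source lines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRef:
    """Anchor pointing back to the exact source lines a claim was derived from."""
    file: str
    first_line: int
    last_line: int

    def __str__(self) -> str:
        return f"{self.file}:{self.first_line}-{self.last_line}"


@dataclass(frozen=True)
class FieldAccess:
    """One traced read or write of a data field by a program, in a given JCL step."""
    program: str
    field: str
    mode: str  # "read" or "write"
    jcl_step: str
    ref: SourceRef


# Hypothetical accesses; a real model would extract these from the code, not type them in.
ACCESSES = [
    FieldAccess("ACCRUE01", "ACCT-BALANCE", "write", "STEP020", SourceRef("ACCRUE01.cbl", 412, 418)),
    FieldAccess("STMTGEN", "ACCT-BALANCE", "read", "STEP030", SourceRef("STMTGEN.cbl", 88, 90)),
    FieldAccess("RISKRPT", "ACCT-BALANCE", "read", "STEP040", SourceRef("RISKRPT.cbl", 1503, 1507)),
]

# Hypothetical step order of the nightly job stream.
STEP_ORDER = ["STEP010", "STEP020", "STEP030", "STEP040"]


def lineage(field: str) -> dict:
    """Return who writes and who reads a field, plus readers that run before any writer."""
    readers = [a for a in ACCESSES if a.field == field and a.mode == "read"]
    writers = [a for a in ACCESSES if a.field == field and a.mode == "write"]
    # A reader sees a freshly computed value only if some writer runs in an earlier step.
    early_reads = [
        r for r in readers
        if not any(STEP_ORDER.index(w.jcl_step) < STEP_ORDER.index(r.jcl_step) for w in writers)
    ]
    return {"writers": writers, "readers": readers, "reads_before_write": early_reads}


if __name__ == "__main__":
    result = lineage("ACCT-BALANCE")
    for access in result["writers"] + result["readers"]:
        print(f"{access.mode:5} {access.program:8} {access.jcl_step} [{access.ref}]")
    print("reads before any write:", [r.program for r in result["reads_before_write"]])
```

The design point is that no answer is free text: each reader and writer carries a `SourceRef`, so a reviewer can jump straight to the lines behind it.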

Reverse-documentation tools that produce text can give you the surface of this. They cannot give you the rest. And as the same senior modernization executive told us - invoking Karl Popper directly - "It's the scientific principle: you can never prove something works through experiment, you can only prove that something doesn't work. Even if you used AI to generate the tests, theoretically you cannot prove to me through tests that it works."

The retiring expert isn't a documentation producer. They are a deterministic system model on two legs. The replacement has to be a deterministic system model too.

What the bank actually needs is a system model that doesn't retire

Three things have to be true on the day the senior engineer walks out:

1. The system has to be queryable. Not "searchable" - queryable. Someone who joins the bank next week, who has never seen the codebase, has to be able to ask "how does interest accrual work in this product line?" and get back not an LLM's interpretation but a traced, structured answer: these are the programs involved, in this order, reading this data, calling these external interfaces, executing these business rules. The senior consultant's framing landed because it matched what banks already have for business processes: "Aris but for systems. System process management, not business process management."

2. Every claim has to be verifiable, one click away. This is the single most load-bearing property. Every business rule the new engineer reads. Every dependency they trust. Every flow they consult before making a change. All of it has to be anchored back to the actual code lines it came from. The framing one of our engineers used with a banking customer is the right one: "It's like references on Wikipedia." You don't have to take any claim on faith. You can verify it in seconds. That property is what converts the model from "another AI tool we don't trust" into "the artifact that survived the retirement."

3. The model has to update as the system changes. A system model that was true on the day it was built and is stale six months later has just become tribal knowledge in a different form - except now no one knows which parts to trust. The replacement for the retiring expert is not a one-time document dump. It is a living blueprint that re-ingests when the code changes and tells the team what moved.
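As an illustration of the first property, a "traced, structured answer" to "how does interest accrual work?" can be thought of as a walk over a call graph from an entry point, where every edge records its mechanism and the call site that proves it. This is a minimal sketch under stated assumptions: the graph, program names, mechanisms, and file:line anchors are all hypothetical, and the depth-first order follows the order of call sites, which only approximates runtime order when calls are conditional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    caller: str
    callee: str
    mechanism: str  # e.g. "JCL EXEC", "static CALL", "dynamic CALL", "MQ PUT"
    evidence: str   # file:line anchor of the call site


# Hypothetical call graph behind an interest-accrual entry point.
CALLS = {
    "JOB-ACCRUAL": [Edge("JOB-ACCRUAL", "ACCRUE01", "JCL EXEC", "ACCRJOB.jcl:14")],
    "ACCRUE01": [
        Edge("ACCRUE01", "RATELKUP", "static CALL", "ACCRUE01.cbl:230"),
        Edge("ACCRUE01", "POSTGL", "dynamic CALL", "ACCRUE01.cbl:517"),
    ],
    "RATELKUP": [],
    "POSTGL": [Edge("POSTGL", "GLAPI", "MQ PUT", "POSTGL.cbl:96")],
}


def trace(entry: str) -> list[Edge]:
    """Depth-first walk from an entry point, returning call edges in call-site order."""
    ordered: list[Edge] = []
    visited = {entry}

    def walk(program: str) -> None:
        for edge in CALLS.get(program, []):
            ordered.append(edge)
            # Descend into each program once, even if several callers reach it.
            if edge.callee not in visited:
                visited.add(edge.callee)
                walk(edge.callee)

    walk(entry)
    return ordered


if __name__ == "__main__":
    for e in trace("JOB-ACCRUAL"):
        print(f"{e.caller} -> {e.callee} via {e.mechanism} [{e.evidence}]")
```

Because every edge carries its own evidence, the answer is something a new engineer can check line by line rather than an interpretation they have to trust.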

This is what we mean when we say Tweezr captures what the retiring engineer knows - not their personality, not their judgment about which 18th-of-the-month batches to leave alone, but the structural and behavioral truth that lets the next generation of engineers develop the same judgment in months, not decades.

Why this matters for the modernization budget specifically

Every modernization roadmap we have ever seen treats the retirement cliff as an HR risk on page 47 of the program plan.

It is not an HR risk. It is the primary determinant of whether the program succeeds.

Here is why. The COBOL programs that convert cleanly to Java with AWS Transform or watsonx Code Assistant for Z are the ones whose behavior is well understood: the 30-40% that go green on the dashboard in month three. The programs that stall the effort - the 20-30% that turn the project from "we'll be done in 24 months" into "we wrote down $80M of progress" - are the ones whose behavior was carried by the retiring engineers. Conversion tools cannot rescue what nobody can specify.

If you go into a modernization program without capturing the retiring engineer's model of the system first, you are not running a modernization program. You are running a controlled demolition with a deadline.

Whether the right answer for any given system is rewrite, evolve, or keep-and-modernize-around is a decision that depends on what the system actually does. You cannot make that decision before you have the model. And you cannot get the model after the person who holds it has retired.

What "before the cliff" looks like in practice

In the conversations where this lands hardest, we describe the work in three concrete deliverables - none of which require the retiring expert to write a single line of documentation:

A blueprint of what the system actually does. Capability map, business processes anchored to real entry points, business rules surfaced in human-readable form. Built from the code, not from interviews. The retiring expert reviews - they don't author. Their job is to catch the handful of places where the model is wrong, not to produce it.

A blast-radius answer for any proposed change. When the new engineer asks "what happens if I change the rounding rule in the interest-accrual program?" - the model returns the affected processes, the downstream consumers, the data flows that depend on it, and the source lines that prove it. The new engineer develops the same intuition the retiring expert has, against an artifact they can verify, not a heuristic they have to memorize.
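One way to picture the blast-radius question is as a breadth-first walk downstream from the changed component, keeping the chain of evidence that links each affected component back to the change, then mapping affected components onto business processes. The sketch below assumes a hypothetical dependency list, process map, and source anchors; it shows the shape of the answer, not how any particular product computes it.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    upstream: str    # program or dataset that changes
    downstream: str  # program or dataset affected by that change
    via: str         # the field, file, or call that links them
    evidence: str    # source-line anchor


# Hypothetical dependencies: ACCRUE01 writes ACCT-BALANCE, which others read.
DEPS = [
    Dependency("ACCRUE01", "ACCT-MASTER", "writes ACCT-BALANCE", "ACCRUE01.cbl:412"),
    Dependency("ACCT-MASTER", "STMTGEN", "reads ACCT-BALANCE", "STMTGEN.cbl:88"),
    Dependency("ACCT-MASTER", "RISKRPT", "reads ACCT-BALANCE", "RISKRPT.cbl:1503"),
    Dependency("RISKRPT", "REG-FEED", "writes exposure extract", "RISKRPT.cbl:1620"),
]

# Hypothetical mapping of business processes to the components they run through.
PROCESSES = {
    "Interest accrual": {"ACCRUE01"},
    "Monthly statements": {"STMTGEN"},
    "Regulatory reporting": {"RISKRPT", "REG-FEED"},
    "Card authorisation": {"AUTH01"},
}


def blast_radius(changed: str) -> tuple[dict[str, list[Dependency]], list[str]]:
    """Walk downstream from a changed component, keeping the evidence path to each node."""
    downstream: dict[str, list[Dependency]] = {}
    for dep in DEPS:
        downstream.setdefault(dep.upstream, []).append(dep)

    paths: dict[str, list[Dependency]] = {changed: []}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dep in downstream.get(node, []):
            if dep.downstream not in paths:  # first (shortest) path wins
                paths[dep.downstream] = paths[node] + [dep]
                queue.append(dep.downstream)

    affected = {node: path for node, path in paths.items() if node != changed}
    # Processes touched by the changed component itself or anything downstream of it.
    processes = sorted(name for name, parts in PROCESSES.items() if not parts.isdisjoint(paths))
    return affected, processes


if __name__ == "__main__":
    affected, processes = blast_radius("ACCRUE01")
    for node, path in affected.items():
        print(node, "<=", " -> ".join(f"{d.via} [{d.evidence}]" for d in path))
    print("Affected processes:", processes)
```

Keeping the whole path, not just the affected node, is what lets the new engineer see why something is in the blast radius and verify each hop in the source.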

A continuously updated source of truth. Code changes are re-ingested. The blueprint stays aligned with the system as it evolves. The model doesn't become stale tribal knowledge in a different form. It becomes the bank's permanent answer to "what does this system actually do."
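A hedged sketch of what "re-ingested" could mean in practice: fingerprint each source file at ingestion, diff the dependency edges between two ingestions, and flag every claim whose supporting file changed so it is re-verified instead of silently trusted. The file contents, edges, and claims below are hypothetical, the edges are supplied by hand rather than extracted from the code, and whole-file hashing is deliberately coarse; a real system would likely track changes at a finer grain.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A statement in the model, anchored to the source file that supports it."""
    text: str
    evidence_file: str


def fingerprint(sources: dict[str, str]) -> dict[str, str]:
    """Content hash per source file, taken at ingestion time."""
    return {name: hashlib.sha256(text.encode("utf-8")).hexdigest() for name, text in sources.items()}


def diff_edges(old: set[tuple[str, str]], new: set[tuple[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Dependencies that appeared or disappeared between two ingestions."""
    return {"added": sorted(new - old), "removed": sorted(old - new)}


def stale_claims(claims: list[Claim], old_fp: dict[str, str], new_fp: dict[str, str]) -> list[Claim]:
    """Claims whose evidence file changed or vanished since they were last verified."""
    return [c for c in claims if old_fp.get(c.evidence_file) != new_fp.get(c.evidence_file)]


# Hypothetical snapshots: the ROUNDED phrase was dropped and a new call was added.
OLD_SOURCES = {
    "ACCRUE01.cbl": "COMPUTE WS-INT ROUNDED = WS-BAL * WS-RATE.",
    "STMTGEN.cbl": "MOVE ACCT-BALANCE TO STMT-BAL.",
}
NEW_SOURCES = {
    "ACCRUE01.cbl": "COMPUTE WS-INT = WS-BAL * WS-RATE. CALL 'RATEV2'.",
    "STMTGEN.cbl": "MOVE ACCT-BALANCE TO STMT-BAL.",
}
OLD_EDGES = {("ACCRUE01", "RATELKUP")}
NEW_EDGES = {("ACCRUE01", "RATELKUP"), ("ACCRUE01", "RATEV2")}
CLAIMS = [
    Claim("Accrued interest is rounded before posting", "ACCRUE01.cbl"),
    Claim("Statements take the balance from ACCT-BALANCE", "STMTGEN.cbl"),
]

if __name__ == "__main__":
    print(diff_edges(OLD_EDGES, NEW_EDGES))
    for claim in stale_claims(CLAIMS, fingerprint(OLD_SOURCES), fingerprint(NEW_SOURCES)):
        print("re-verify:", claim.text, f"[{claim.evidence_file}]")
```

The output is the part that matters for the team: what moved, and which previously trusted statements now need a second look.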

Together, these three deliverables are what the retiring expert held in their head. They are also what regulators increasingly expect banks to be able to produce on demand, regardless of who is in the building.

The 25-year veteran is not the problem to solve

We talk to a lot of banks. Almost every one of them, when this conversation lands, asks the same question: "How fast can we get this in front of our team?"

But there is a quieter version of the question. The senior consultant who described the retiring-expert pattern - the one with twenty-five years and all the blind spots - said something we haven't forgotten: he is "also aspiring to" be that retiree.

The people who have been holding these systems together for two and three decades are not adversaries of modernization. They are the keepers of the only complete model of the bank's operational reality. They want what they know to survive them. They want the next generation to be able to do the work without spending another twenty-five years rebuilding the same intuition from incidents and stack traces.

The retirement cliff is not a problem to defer. It is a deadline you don't get to negotiate. Every quarter you wait, the model gets more expensive to recover - and at some point on every legacy estate, the cost crosses from "manageable" to "irrecoverable."

The choice is not whether to capture what the senior engineers know. It is whether you capture it from them while they are still here - or reconstruct it after they retire, with auditors asking the same questions and showing much less patience.


Frequently Asked Questions

Doesn't reverse-documentation AI already solve this?

It solves a slice. Reverse-documentation tools produce plausible summaries from source code. They are useful for orientation. They are not useful for the kinds of decisions that the retirement cliff actually puts at risk - the ones that require knowing exactly which programs are affected by a change, with verifiable evidence, before the change ships. The Tier-1 European bank team that ran this experiment described the result honestly: mixed. Sometimes useful. Sometimes the AI went in and never came back. That is not a foundation a regulated bank can sign off on.

How is this different from interviewing the senior engineers and writing it down?

Interviews capture what the engineer can articulate. They miss what the engineer knows but can't articulate - the dozens of small decisions a person makes intuitively after twenty-five years, the heuristics that are easier to demonstrate than describe, the system behaviors that only show up under specific conditions the engineer hasn't thought about in a decade. The right model is built from the code (which doesn't forget) and reviewed by the engineer (who corrects, but doesn't have to produce). That inverts the burden. The engineer who has ten years left becomes a reviewer of a complete model, not the sole producer of a fragmentary one.

What if the senior engineer has already retired?

The model can still be built - the code is still there. What you lose is the highest-leverage validator: the person who would have spotted the few places where the deterministic model is wrong because of historical context that no longer appears in the code. Everything else can be recovered. But the recovery cost rises sharply, and the certainty drops. The right time to do this work is while the senior engineers are still in the building.

How does this fit into a modernization program already in flight?

It belongs before the program starts - or, if you have already started without it, as early as possible from wherever you are now. The 70% failure rate of large transformations is not driven by execution. It is driven by programs entering Phase 3 - the "wall" where the remaining hard programs reveal that nobody on the team knows what they do. A system-level model captured before the wall changes that phase from "we have to stop" to "we have a verified spec." If you are already inside the wall, the model is still the way out - but it is no longer the easier path.


Tweezr captures the system model that the retiring engineer was holding - built from the code, verified against the engineer while they are still in the building, kept current as the system evolves. If a senior engineer on your team is within five years of retirement, see how the blueprint is built or book a conversation to talk about your retirement-cliff timeline.
