Business Continuity Strategies in Times of Conflict – Lessons identified from the financial services sector in Ukraine

Authorities and banks in Ukraine have taken a number of measures to ensure continued access to banking services during a prolonged conflict. Despite the war, most banks have continued operating since the start of the invasion on 24 February 2022, and confidence in the financial system has been maintained [1]. The National Bank of Ukraine, individual banks such as Raiffeisen Bank [2] and MTB Bank, and other firms have shared their experiences to date, and I wanted to consider what lessons could already be identified that would have wider applicability from an operational resilience perspective.

People and Process Continuity

At the start of the invasion, a priority for banks was the safety of their people. Some firms left it to individual employees to decide whether to stay in Kyiv or relocate to safer areas in Western Ukraine or abroad. Companies activated flexible working policies and developed plans for paying employees. Raiffeisen noted that approximately 50% of staff who would normally work on-site in the office moved to safer places inside and outside Ukraine, while other staff were already working remotely. Raiffeisen provided financial assistance, additional remuneration, evacuation and accommodation, as well as continuing to pay conscripted employees of the bank.

Resilient Power & Communications

One of the most interesting developments was the so-called ‘Power Banking’ project [3][4]. Key branches of systemically important banks were identified that could operate through independent power supplies (diesel generators and batteries), satellite and secure communication channels (Starlink terminals, for example) and redundant internet connections (enabled by investment in redundant communication routes and a decentralised network architecture to avoid single points of failure). This has allowed banking services to continue during both planned and unplanned power outages. In the case of Raiffeisen Bank, 120 branches (42%) were identified for upgrade under this programme [5]. Customers are informed about the availability, location and working schedules of these branches in cities and towns across the country.
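
To make the selection criteria concrete, here is a minimal sketch of how a bank might screen branches for this kind of programme. The qualifying rules, branch names and data structure are my own illustration, not the NBU’s or Raiffeisen’s actual criteria.

```python
from dataclasses import dataclass

@dataclass
class Branch:
    name: str
    power_sources: set   # e.g. {"grid", "generator", "battery"}
    comms_links: set     # e.g. {"fibre", "mobile_data", "starlink"}

def power_banking_ready(branch: Branch) -> bool:
    """Illustrative rule: a branch qualifies only if it can run without grid
    power and has at least two independent communication paths (no single
    point of failure)."""
    independent_power = bool(branch.power_sources - {"grid"})
    redundant_comms = len(branch.comms_links) >= 2
    return independent_power and redundant_comms

branches = [
    Branch("Branch A", {"grid", "generator", "battery"}, {"fibre", "starlink"}),
    Branch("Branch B", {"grid"}, {"fibre"}),
]

for b in branches:
    print(b.name, "ready" if power_banking_ready(b) else "needs upgrade")
```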

Cloud Services & ‘Friendly’ Shoring

Another early continuity enabler was the regulator allowing banks, for the first time, to migrate their IT infrastructure to cloud services hosted outside the country. Services could be hosted in the EU, UK, US or Canada. This move helped to ensure continuity of operations even if physical infrastructure in Ukraine was damaged. MTB Bank noted [6], by way of example, the threat of missile attack on its headquarters as the main driver for establishing an independent disaster recovery site for essential services outside Ukrainian borders; customers could be reassured that their data was backed up and protected. Raiffeisen Bank [7] accelerated its cloud programme, moving 1,000 servers to the cloud within three months without loss of service to customers, thanks to staff working around the clock.
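
The underlying control is simple to express: every essential service should have a recovery location in an approved jurisdiction outside the home country. The sketch below illustrates that check; the service names, country codes and inventory format are hypothetical.

```python
# Approved 'friendly shoring' jurisdictions named in the text: EU, UK, US, Canada.
APPROVED = {"EU", "UK", "US", "CA"}

# Hypothetical service inventory: primary and disaster recovery (DR) locations.
services = {
    "core-banking":    {"primary": "UA", "dr": "EU"},
    "payments":        {"primary": "UA", "dr": "UA"},  # DR still in-country
    "card-processing": {"primary": "UA", "dr": "UK"},
}

def dr_gaps(inventory: dict) -> list:
    """Return services whose DR site is not in an approved jurisdiction
    outside the primary country."""
    return [
        name for name, s in inventory.items()
        if s["dr"] == s["primary"] or s["dr"] not in APPROVED
    ]

print("Services still needing an out-of-country DR site:", dr_gaps(services))
```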

Learning & Adapting from Earlier Events

Banks were also among the targets for cyber attacks. The ability to withstand these attacks is attributed to preparations stretching back to 2014, when Crimea was annexed, together with support from international partnerships (Microsoft, Amazon, and NATO-aligned cyber security teams) and rapid incident response teams. Several companies and the central bank [8] have referenced the Crimean crisis in 2014 as the driver for developing more comprehensive business continuity arrangements [9] around severe but plausible scenarios, albeit with caution required about assuming the exact same scenario as 2014 in planning assumptions [10]. Firms also reference the lessons from the Covid-19 pandemic, which provided a widely adopted capability for remote working and work transfer.

Wider Applicability

While there are some unique aspects to operational continuity in time of war, the lessons identified from this situation have not been lost on other countries: it was interesting to read that other central banks, including the Bank of England and the Riksbank, have directly engaged with the Ukrainian central bank to learn from the country’s experiences. While armed conflict is a scenario that many countries may not have to consider, there are scenarios which would draw upon the broad set of capabilities established and being matured in Ukraine, whether responding and adapting to a pandemic, cyber attacks on critical infrastructure, or extended technical failures in critical national infrastructure.

Footnotes

  1. NBU and Banks Implement Action Plan to Ensure Banking System’s Uninterrupted Operation amid Long-Term Blackouts
  2. Ukraine war: how do you keep a bank going during a conflict? – The Banker
  3. NBU Annual Report 2023: From Strategy of Survival to That of Recovery
  4. annual_report_2023_eng.pdf
  5. Based on results for 2022 Raiffeisen Bank Ukraine became TOP3 for clients’ funds and one of the leaders in lending and FX operations | Raiffeisen Bank
  6. MTB Bank mitigates crisis with Azure VMware | Microsoft Customer Stories
  7. A cloud migration in wartime | McKinsey
  8. Payments in wartime: the story of the National Bank of Ukraine | European Payments Council
  9. Putting business continuity plans to the test in Ukraine – ADP ReThink Q
  10. Business crisis management in wartime: Insights from Ukraine – Opatska – 2024 – Journal of Contingencies and Crisis Management – Wiley Online Library


Four Years Later: Where are we with operational resilience?

It has been just over four years since the last post on this blog, but the importance of operational resilience has been underscored several times in that short period!

We had a global pandemic, with unprecedented measures implemented by governments around the world that introduced ‘lockdowns’ into pandemic planning assumptions. As noted in a post back in 2014, this threat was very much underrated, and once again it has quickly disappeared from the list of top risks in many horizon-scanning reports.

Geo-political and country risks have come to the fore in recent years, presenting challenges for business continuity in its broadest sense and underscoring the importance of understanding country risk, as posted earlier. A future post will look at how banks faced the challenges of continuing to operate in Ukraine during the war, and the lessons identified for other countries when considering similar situations in their planning.

Third-party resilience risk is one of the enduring topics, and incidents such as the disruption caused by a faulty CrowdStrike update have shown the potential contagion effects of tightly coupled supply chains and common software platforms. Regulators in financial services have picked up on this concern with new regimes to address concentration risk, including closer scrutiny of cloud service providers.

One interesting topic to explore will be how well operational resilience practices are set up to address this challenging environment, recognising that we are dealing with a ‘wicked problem’!


Operational resilience – enter the regulator

This week saw the long-awaited publication* of the regulatory authorities’ policy changes to drive greater levels of operational resilience into the financial services sector and better mitigate the impact on customers of future events such as the one at TSB, covered in an earlier post.

It’s important to note that regulators acknowledge that incidents will still happen, but if the impact on customers and the wider financial sector can be better contained, then this will be a big step forward. It will also start to align the efforts already practised for financial resilience with operational disruptions and move us closer to achieving broader, organisational resilience outcomes.

Identifying important business services and setting impact tolerances are two key requirements introduced by the regulations; both require an external perspective in terms of understanding the harm that customers can experience when critical services are not available or are substantially impaired, as well as the cascading effect of disruptions at one institution on the wider financial sector. These considerations complement traditional concerns of customer experience, increased costs and longer-term reputation impact.

Naturally, firms regularly exceeding impact tolerances during disruptions is something that the regulators want to avoid. To understand whether firms would be likely to exceed these limits before the event, they need to identify severe but plausible scenarios and test their business services against them. Where a scenario causes a service’s impact tolerance to be breached, vulnerabilities identified in the underpinning processes or resources need to be addressed and investments made to ensure that the overall business service can operate within its impact tolerance.
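
As a way of illustrating the mechanics, the sketch below compares hypothetical scenario test results against impact tolerances and flags the breaches that would trigger remediation. The services, scenarios and figures are invented for illustration and are not drawn from the policy statement.

```python
# Hypothetical impact tolerances (maximum tolerable disruption, in hours)
# for two important business services, and simulated recovery times under
# severe but plausible scenarios. All names and figures are invented.
impact_tolerance_hours = {"retail payments": 8, "mortgage applications": 48}

scenario_results_hours = {
    "retail payments":       {"data centre loss": 6, "ransomware": 14},
    "mortgage applications": {"data centre loss": 24, "ransomware": 40},
}

for service, tolerance in impact_tolerance_hours.items():
    for scenario, recovery in scenario_results_hours[service].items():
        if recovery > tolerance:
            print(f"{service}: '{scenario}' breaches tolerance "
                  f"({recovery}h > {tolerance}h) - invest to remediate")
        else:
            print(f"{service}: '{scenario}' within tolerance")
```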

The timescales set out for complying with the regulations would be pretty demanding from a cold start, especially given investment cycles in larger firms. There are only four years from publication of the new rules this week to get to the position of being able to demonstrate the capability to operate within impact tolerances in 2025.

While some of the changes may seem daunting, a lot of the elements that require better orchestration and optimisation under these new rules are already in place, whether business continuity management, IT resilience, operational risk management, supply chain management or cyber security practices. The regulatory agenda actually provides an opportunity to put operational resilience on a new footing with a fully engaged senior management, driving better business outcomes.

* FCA’s Building Operational Resilience Policy Statement 21/3, as an example.


A bridge too far – a view on the TSB IT migration based on the Slaughter & May report

At the end of the film A Bridge Too Far, the generals reflect on the root causes of the disaster of Operation Market Garden. Many causes were identified (‘it was Nijmegen’, ‘it was the single road getting to Nijmegen’, ‘it was after Nijmegen’, ‘it was the fog in England’). In reality these all contributed to the magnitude of the failure, but they ignored the real root cause: a culture from the top that led to an overly ambitious operation, characterised by one of the generals as ‘a bridge too far’.

Slaughter & May likewise call out the board’s decision to go for a big-bang solution (a single-event migration) as the ‘bridge too far’ moment.

Unfortunately, the report doesn’t really get behind why that call was made. The terms of reference for the review include the requirement to determine the role that financial commitments may have played in the decisions taken. The report is silent on this point in its conclusions, only indicating that there was a business case for the acquisition of TSB that seemed heavily predicated on migrating to a common Sabadell platform within a predetermined timetable, one which proved inflexible to change until very late in the programme. However, the report does point to a number of cultural challenges: a reported over-reliance on previous experience and a systemic lack of challenge; ‘the culture of presenting a confident face’ to fellow executives and non-executives is also called out by the report’s authors.

It is quite possible that some of the actions and behaviours observed in the Slaughter & May report are not unique to TSB, Sabadell or its ‘in-house’ IT supplier, SABIS. So the event offers a learning opportunity. Indeed, it is likely to serve as an operational resilience case study for many years to come, just as its authors intended. It is not one control that fails in these events: it is the pyramid of controls that collapses from top to bottom, and this case study can therefore support the development of severe but plausible scenarios to assess one’s own resilience in similar situations.

To this end, there are naturally some operational learnings to take away from this incident as well:

  • The programme ran out of time but the main migration event went ahead. The new platform was not ready to support TSB’s full customer base, and SABIS was not ready to operate the new platform. The report highlights the pitfalls of ‘right-to-left planning’ (which occurred for the original plan and the re-plan, without left-to-right validation). Shortcuts were therefore made and decisions taken to get the platform live and then fix forward.
  • The importance of non-functional requirements (NFRs) was not well understood at senior levels compared with functional requirements, which can be explained in customer terms quite readily. The report calls out some specific NFRs: the test environment was not like the production environment, and capacity management thresholds for channels were changed to pass the test.
  • Supplier management was lacking. SABIS, the in-house IT supplier, was not subject to the due diligence and governance applied to an external supplier, specifically for assessment of its capacity and capability. Independent control testing was available but not reviewed. Monitoring of the quality of the early cutovers would have served as an early warning indicator. SABIS pursued a traditional supplier–customer relationship with its own supply chain rather than a more collaborative, shared-outcome approach.
  • Pre-planned contingency arrangements proved inadequate. TSB rightly anticipated increased customer enquiries following the migration and increased BAU resources to this end. However, the magnitude of the disruption had not been anticipated, even though the first line had essentially identified a scenario in which they would struggle with multiple major incidents and multiple emergency changes happening at the same time in the period immediately post go-live. During the incident itself, key employees transferred within Sabadell to SABIS and TSB received additional external support, including from IBM, while the rest of the group continued to operate.
  • Risk management lacked cohesion and coherence. The report’s authors noted that the second and third lines did not co-ordinate to cover the breadth of risks and depth of assurance required. The ‘15 Guiding Principles’ model was conceptually a good decision-support tool, but in practice it was not independently assured, and the approach of generating packs of around 600 pages for review by board members shortly ahead of meetings did not lend itself to thorough review and created a dependency on executive narrative.

The post-migration comments by executives in the TSB annual report highlight the tensions inherent in the operational resilience challenge specifically, and in risk management more broadly. While deep regret is expressed at the harm caused to their customers, executives note that the new platform is now providing the business with a competitive platform for the future and that customers will benefit from an improved product offering and experience. While perhaps not seeking vindication, it does remind us that the lower-risk option of remaining in part or in whole on the older Lloyds platform would have restricted achievement of TSB’s business objectives. Over time this incident may well get rationalised as ‘right strategy, poor execution’ with an associated £300M unplanned cost. After all, in A Bridge Too Far, Operation Market Garden was seen as ‘90% successful’ by the field marshal.


Carillion – Lessons for Risk Management?

The liquidation of Carillion back in January 2018 made me reach for its last annual report, covering 2016 and published in early 2017, just months before the profits warning in July 2017. I was interested in the risk management section of the annual report and in particular the risk heat map (see image). The gross risk levels looked pretty accurate in light of what unfolded, but what made the senior management team, risk teams and auditors think that the residual levels were in any way accurate?
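
For readers less familiar with the gross/residual distinction, here is a minimal sketch of how a typical likelihood-times-impact heat map turns a ‘red’ gross risk into a ‘green’ residual one once controls are assumed to be effective. The scoring scales and the control-effectiveness adjustment are generic illustrations, not Carillion’s actual methodology.

```python
# Generic 5x5 heat-map scoring: gross = likelihood x impact; residual applies
# a discount for assumed control effectiveness. The numbers are invented.
def gross_score(likelihood: int, impact: int) -> int:
    return likelihood * impact  # both on a 1-5 scale

def residual_score(likelihood: int, impact: int, control_effectiveness: float) -> float:
    """Control effectiveness between 0 (no mitigation) and 1 (fully effective).
    The key question is whether that assumed effectiveness is ever evidenced."""
    return gross_score(likelihood, impact) * (1 - control_effectiveness)

gross = gross_score(likelihood=4, impact=5)                  # 20: a 'red' gross risk
residual = residual_score(4, 5, control_effectiveness=0.8)   # 4: reported as 'green'
print(f"gross={gross}, residual={residual}")
```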


Resilience as a ‘wicked problem’

According to Rittel & Webber (1973), a ‘wicked problem’ is a complex problem for which there is no simple method of solution. Further, Camillus (2008) offers five characteristics of a wicked problem:

  • The problem involves many stakeholders with different values and priorities.
  • The issue’s roots are complex and tangled.
  • The problem is difficult to come to grips with and changes with every attempt to address it.
  • The challenge has no precedent.
  • There’s nothing to indicate the right answer to the problem.

Camillus goes on to write that wicked problems are ‘the opposite of hard but ordinary problems, which people can solve in a finite time period by applying standard techniques. Not only do conventional processes fail to tackle wicked problems, but they may exacerbate situations by generating undesirable consequences.’

Everyone can agree that they want the benefits of resilience without being clear on how it can be delivered. Most can agree that it is the sum of a number of changing, interdependent parts, both internal and external to the organisation. But then the search for a framework begins, and traditional management-systems approaches, which work well for well-defined problems, are not so easily extensible to wicked problems: for example, ISO models are good at ‘feedback’ loops, but can they create ‘feed-forward’ (Camillus) insights?

Recognising that you are dealing with a wicked problem is therefore ‘half the battle’, as one might say, and the thinking by strategy academics such as Camillus is helpful in making this recognition. The fun part starts with addressing the problem you’ve now recognised!


Is it time Business Continuity published its own control sets?

One of the advantages of moving away from business continuity (BC) and working in the field of information security for a period of time is the perspective it provides when contrasting the relative fortunes of embedding the respective professional practices in value chains across the world. One area that I feel contributes to business continuity’s comparative deficit of traction relative to information security is the lack of defined controls, whether in published guidance or standards.

The ISO 27001 standard comes with 114 controls defined, and these are developed through additional guidance in ISO 27002. This approach provides a common language across the value chain, enables the robust design of information security management systems and supports assurance across participants. Contrast this with business continuity, where each value chain member essentially develops its own controls and expends considerable time and effort ‘aligning’.
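
To make the point concrete, the sketch below shows what a shared, published BC control catalogue could enable: every participant in the value chain assessing and reporting against the same control identifiers. The control IDs and wording are invented for illustration and are not taken from any existing standard.

```python
# A minimal sketch of a shared BC control catalogue. The identifiers and
# wording are invented; they are not drawn from ISO 22301 or any other
# published standard.
bc_controls = {
    "BC-01": "A business impact analysis is performed and refreshed annually.",
    "BC-02": "Recovery time objectives are defined for prioritised activities.",
    "BC-03": "Continuity plans are exercised at least annually and actions tracked.",
}

# With a common catalogue, a supplier's self-assessment can be compared
# directly against the customer's expectations, control by control.
supplier_assessment = {"BC-01": "implemented", "BC-02": "partial", "BC-03": "not implemented"}

gaps = [cid for cid, status in supplier_assessment.items() if status != "implemented"]
print("Controls requiring remediation or assurance:", gaps)
```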

The challenge will be exacerbated with the adoption of operational resilience frameworks, as the protective disciplines supporting the framework are asked to come together to determine a holistic resilience profile; BC will not have a unified approach to contribute. It may be too late to tackle the lack of a common controls framework, given that recent revisions to BC standards have not addressed this omission. It will more likely fall to the emerging operational resilience standards to ensure that a comprehensive control set is made available to drive consistency and increased adoption of business continuity practices across the value chain.


Can ignoring supplier risk really bring your company down?

Sometimes it is useful to challenge one’s own assumptions. The importance of understanding and responding to supplier risk is one of those assumptions that seems obvious, based on many well-documented incidents of diminished supplier capacity and capability; yet some may argue that, in most cases, the operational or reputational harm is short-term and rarely fatal in isolation from wider strategic factors.

To help work this through, I decided to take 30 minutes to write down as many scenarios as came to mind and evaluate whether any fatal ones were among them. Here’s my list of scenarios where harm is incurred (in no particular order!):

  1. Supplier achieves lock-in and pushes up prices impacting cost base due to over-concentration of business
  2. Supplier is black-listed by a regulator/government causing service transition pain
  3. Supplier becomes a successful competitor, i.e. forward integrates
  4. Supplier has superior knowledge of risks and does not share with the customer, leading to lack of preparedness and extended disruption
  5. An unknown critical widget (including software) is revealed too late due to lack of visibility into supply chain
  6. Key supplier becomes insolvent
  7. Poor performance by supplier
  8. Supplier cannot resource the service required
  9. Change of control – supplier acquired by a company that has “issues”
  10. Supplier does not have competence to deliver the service
  11. Breach of regulation that is licence-affecting for the customer leading to sanction, financial loss and reputational harm
  12. Supplier changes strategy – the product or service bought is no longer attractive to them and they wind it down or divest; legacy issues in maintaining skilled support on the supplier side
  13. Too much focus on contract and ‘use of stick’ over relationship development and reputation leverage, i.e. not ‘customer of choice’ from supplier’s perspective
  14. Collusion (fraud) through sourcing process between customer representatives and the supplier
  15. Outsourcing a service with inadequate retained organisation leading to extended outages – only knowing about the risks when the incident occurs
  16. Country risk – offshoring and outsourcing to unfamiliar countries and locations exposed to significant political and environmental risks
  17. A new Y2K type bug
  18. Supplier lies about capability and contract is awarded (inadequate due diligence)
  19. 3rd party treats customer’s customers badly
  20. Key suppliers prioritise their other customers over you when they face severe disruption and would rather pay the penalties in the contract (taking a risk-based approach on their side)
  21. Force majeure event declared (correctly)
  22. Supplier insolvency and reputation harm to customer caused by customer terminating agreement
  23. Upstream Tier 2 supplier found to be more significant than Tier 1 supplier  – but too late as it fails to deliver
  24. A direct or indirect supplier is found to have poor working practices by the media/NGOs leading to regulatory/statutory breaches and/or reputational harm
  25. Supplier introduces unproven product into a production environment causing a major and extended outage
  26. New services can’t be launched because supplier cannot deliver (on time) or too difficult to exit from incumbent (risk of change is too great)
  27. Data breach involving customer data at third party or caused by third party’s poor controls

I’m sure there are many more! How impactful these events are will naturally depend on context, but I’m pretty sure that number 3 has the potential to be terminal. Given the trend towards virtualised organisations and business-model dependency on orchestrating third-party relationships, it seems reasonable to reaffirm that the capability to manage third-party risk effectively will be a key determinant of the success or failure of companies.


Checking for the exits on the way in…

No, this is not taken from the script of the Homeland TV series, but it is advice often given – but not often taken – when contracting with third parties. Good exit management is after all a key enabler of resilient supply chains.

The reasons for exiting a service or relationship vary, so it is important to ensure that exit planning fits the nature of the service rather than using a one-size-fits-all approach. A good example is considering how long, and what type of resources, it took to introduce the service currently being contracted: if the planning and transition of the new service takes 9–12 months when everyone is on board and keen to deliver on time, then an exit period of 3–6 months under very different relationship circumstances is unlikely to be sufficient. In short, the approach to exit, or exit strategy, needs to be thought through at the time of sourcing, not after sourcing has completed. In this way the contract and operational relationship can be set up to support exit.

Another challenge in exit management is maintaining a current exit plan, one that captures sufficient detail of the service, especially an outsourced one where you may want (or need) the option of bringing it in-house. This requires the exit plan to be developed within a reasonable period post-contract, and for it to be regularly reviewed and updated as changes to the service occur through the BAU or run phase.

The advantage of this approach is that in a stressed or unplanned exit situation – caused by financial failure of the supplier, poor performance, early termination, etc. – there is a baseline plan that can support the exit, recognising that the exit will need to be accelerated compared with the target plan. This plays to the famous motto that while a plan may prove useless on first contact with the enemy, the activity of planning is invaluable. From a resilience perspective, this is clearly a key factor, and should tie into the organisation’s business continuity planning.

Taking a formal approach to exit management sounds like common sense, but it does tend to get pushed to the back of the queue when transitioning to a new service, and in some instances this failure of upfront thinking and action has proved expensive for the customer, as per Hutchison 3G vs Ericsson.

So, thinking seriously about exit only at the time of exit is an approach with many risks attached to it – mostly foreseeable ones, it has to be said. Waiting until exit can actually mean that you stay with a provider or service that is not delivering the value you expect or need because the risk of change appears too great. Even if the service is satisfactory, it may be the case that a wider strategy cannot be enabled with the current service construct, and hence opportunities are missed by the business. Learning a lesson from the spy dramas would clearly be a step worth taking.


The rise and rise of Business Continuity…

I’ve just had the pleasure of reading Yossi Sheffi’s latest book, The Power of Resilience, having pre-ordered it on Amazon. Aside from the accessible writing style and the many case studies included within the book, what struck me most was the recurring reference to Business Continuity and the key role it plays in the supply chain resilience story. What’s more, this position is relatively new in supply chain circles. When I looked back at Sheffi’s earlier book from 2005, The Resilient Enterprise, there is not a single reference to Business Continuity.

Sheffi’s book is not a “how to” guide for building resilience, but it does identify relevant tools for the kit bag, with case studies drawn generally from larger corporations. The new idea in the book, from my perspective, concerned detectability lead times and how interpreting signals early can buy time to respond. Sheffi notes that many events have negative lead times, i.e. you find out about the problem after the risk event has materialised – product defects specifically, though data breaches could be added to this list too. In my mind, identifying and implementing strong risk indicators is a key capability for developing operational resilience. The promise that Sheffi offers is that this capability will turn unknown unknowns into known unknowns, and known unknowns into known knowns.
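
A minimal sketch of the lead-time idea, using invented events and dates: the lead time is the gap between when a signal is detected and when the impact is felt, and a negative value means the event was only discovered after it had materialised.

```python
from datetime import date

# Invented events and dates to illustrate detectability lead time: the gap
# between detecting a signal and feeling the impact. Negative values mean
# the problem was only discovered after it had materialised.
events = [
    {"name": "supplier plant fire reported", "detected": date(2015, 3, 1),  "impact": date(2015, 3, 20)},
    {"name": "latent product defect",        "detected": date(2015, 6, 15), "impact": date(2015, 5, 1)},
]

for e in events:
    lead_days = (e["impact"] - e["detected"]).days
    status = "negative lead time - strengthen risk indicators" if lead_days < 0 else "time to respond"
    print(f"{e['name']}: {lead_days} days ({status})")
```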

In re-reading The Resilient Enterprise, however, the limitations of current thinking on resilience are exposed. The poster child of the resilience movement at the time was Nokia mobile phones, and the case study lauded the huge business advantage over rival Ericsson that Nokia gained through its resilient response to the Philips semiconductor plant fire in New Mexico. But where is Nokia mobile phones today? Gone. Its operational resilience did not translate into sustainable business resilience, i.e. Nokia mobile phones was not able to manage the market disruption of Apple and co. Perhaps it is a bridge too far to stretch resilience thinking into the strategic layer, and it should instead be firmly positioned as an operational capability linked tightly to execution. I’m not convinced.
