A team of academic writers and researchers explores the organisational considerations for software automation in CNI. With care and respect, they suggest, AI can help keep the lights on, water flowing and nuclear power plants working.

Critical National Infrastructure (CNI) covers thirteen sectors and comprises infrastructure deemed by the UK government to provide essential services, such as emergency services, transport, finance, and communications, as well as infrastructure that requires additional regulation to ensure public safety, such as civil nuclear and chemical sites. Building digital resilience in CNI is a priority for nation-states, and a fundamental function of digital resilience is the ability to withstand cyber threats.

Operational Technology (OT) systems control physical processes in many CNI sectors, and considerable work has been done to protect these systems from cyber threats. Much of this effort is ongoing, and CNI organisations continue exploring and deploying cyber prevention and detection mechanisms.

Significant advancement in these mechanisms is occurring in the application of machine learning algorithms for cyber incident prevention and detection, such as k-means, a clustering algorithm, and Random Forest (RF), a decision tree-based algorithm. Consequently, machine learning algorithms are beginning to appear in commercial OT cyber security tools. Yet, the application of these technologies is viewed with considerable caution, which currently tends to block their adoption.
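As a purely illustrative sketch of how a clustering algorithm such as k-means might flag unusual activity, the following uses scikit-learn on synthetic data standing in for OT network telemetry. The feature values, cluster count, and distance tolerance are all assumptions for the example, not a description of any specific commercial tool.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic "telemetry": two clusters of normal behaviour plus one
# far-off point standing in for anomalous OT traffic.
rng = np.random.default_rng(0)
normal = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])
anomaly = np.array([[20.0, 20.0]])
data = np.vstack([normal, anomaly])

# Cluster the data, then flag points far from their nearest centroid.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
distances = np.linalg.norm(data - km.cluster_centers_[km.labels_], axis=1)
threshold = np.percentile(distances, 99)  # tolerance chosen for illustration
flags = distances > threshold
```

In practice the tolerance and the number of clusters would need to be tuned against real baseline data, which is precisely where the organisational considerations discussed below come in.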

Additionally, the touted benefit of machine learning and future applications of artificial intelligence (AI) algorithms is their ability to process and interpret large quantities of data. This technology could facilitate partially or completely offloading tasks from a human practitioner to a machine; a task from which the human practitioner is completely removed would become fully automated. However, most systems would not deploy full automation and would instead rely on collaboration between the practitioner and the machine. The expectation of human-machine collaboration introduces additional considerations centred around how the technology, the organisation, and the practitioner are expected to interact.

What follows is a summary of considerations for CNI organisations with OT that are considering adopting ML/AI-based solutions to enhance their digital resilience. They are designed to help these organisations navigate the changing technological landscape by exploring topics that can help identify where ML/AI-based solutions are applicable and where they should be avoided.

Safety, regulatory, and legal considerations and risk tolerance

Safety, regulatory, and legal considerations will affect an organisation's perceived risk tolerance, which in turn will ultimately affect the level of automation that can be deployed. Appropriately defining the level of risk that processes and components can tolerate is therefore important, as this will guide the level of automation possible.

Motivation for automation, tool purpose, and the role of the human-in-the-loop

Increased automation can free up operators for other functions by reducing the time spent on existing tasks or removing those tasks from their workload entirely. For this to be practical, tools must be chosen carefully to ensure they meet these business requirements and are fit for purpose. Critically, when the operator is expected to function in the loop, the operator's role in this human-machine team should be clearly defined, specifically regarding whether the machine or the operator is the final decision-maker, and whether the operator is expected to supervise machine decisions with the ability to intervene.

Understanding different system states or modes

The different states of an OT system need to be understood when deciding where or how to implement machine learning software; a nuclear power station will have different considerations when generating electricity than during an outage. These states can influence many factors, including the motivation for introducing more automated software. Planned outages, during which upgrades occur, increase the risk of introducing cyber incidents into the system and therefore present an opportunity to introduce machine learning-based cyber prevention and detection software. While in electricity-generating mode, more data could be captured to establish a baseline against which abnormalities can be more accurately identified, reducing false positives, which might be considered more critical in that mode.

Variation in automation levels

In CNI OT environments, fully automated systems are often perceived as unattainable. However, an increased level of automation can be applied to a specific task – it is not meant to be a catch-all category for a whole system. Therefore, systems and processes should be broken down as far as is practical, and each should then be assessed individually to determine whether an increase in automation is feasible.

Testing and assurance of machine learning algorithms and access to expertise

In CNI OT systems, safety risks to the public and employees must be managed. A primary means of managing these risks is through safety and security cases. Safety cases require rigorous testing, so that organisations can be confident that technology operates within their risk tolerance.

Currently, ML/AI-based software deployment for digital resilience is expected to conform with these existing mechanisms. However, ML/AI-based software is not necessarily deterministic, which makes predicting its potential effects on a system complex. This is often considered one of the most significant barriers to deploying software on CNI OT systems.

Not all ML/AI-based software works in the same way, and how each algorithm functions could affect how suitable it is for deployment and the extent to which it should be tested. Hence, organisations need to identify and develop the type of testing required for different models and the factors that affect it. An algorithm that uses a decision tree model, and is therefore deterministic, and operates offline on an isolated data set pulled from a system poses potentially different challenges than a reinforcement learning model operating online directly in a system, even though both are machine learning algorithms. Therefore, organisations should consider whether all machine learning algorithms must be tested to the same standard, or according to factors that vary with the algorithm, its implementation, and its use.
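To make the determinism point concrete, one elementary assurance check, sketched here on synthetic data with scikit-learn (the features, labels, and model settings are all assumptions for illustration), is that a trained decision tree must produce identical predictions on repeated runs over the same offline data set. A check of this kind would not be meaningful for a model that continues to learn online.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical assurance check: a trained decision tree is deterministic,
# so repeated inference on the same offline data set must agree exactly.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))            # stand-in for offline OT telemetry features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in labels for the example

model = DecisionTreeClassifier(random_state=0).fit(X, y)
first = model.predict(X)
second = model.predict(X)

deterministic = np.array_equal(first, second)
```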

Organisations will need an appropriate suitably qualified and experienced person (SQEP) or a trusted partner to assist them with these decisions. This will allow the organisation to understand whether a rules-based approach or a comparison to a baseline would be sufficient for its requirements, whether the software could operate offline on an isolated network, or whether real-time access to data on the system's network is essential.

Functionality and design of software for the end-user

One criticism of existing tools, even those without ML capabilities, is that the end user often has difficulty accessing required information, such as reports from an antivirus scanning tool, or is not presented with relevant information, such as the times and dates when signatures were updated alongside antivirus scan times. Therefore, when designing or selecting a tool, it is essential to understand the end-user's role and what information they rely on to complete their tasks. The software then needs to provide the practitioner with all the relevant information in an accessible manner. Ensuring that tools adequately support the end-user is essential for effective human-machine teams and, thus, for the successful deployment of software that assists in increased automation.

Information overload and false positives

One of the touted benefits of implementing ML-based software is that it can help reduce information overload and, therefore, also reduce operator burnout. However, this reasoning typically assumes that the software is taking over an existing task of the operator or end-user. In OT, roles related to cyber security are relatively new, and responsibility for it is often given to employees in addition to their existing workload.


Therefore, the introduction of such tools typically adds to workload rather than reducing it, and this can create tensions between safety and security concerns, as these new additional tasks can be perceived as distracting to a practitioner whose primary focus is often to ensure the safe operation of the system.

However, safety and security concerns in OT are intertwined wherever physical processes are concerned. While introducing new technology has the potential to increase the tension between these two concerns, it can also be designed to minimise this tension. Ensuring algorithms are optimised to reduce false positives would significantly help alleviate it. Hence, false positive rates, and the extent to which increased automation may reduce practitioner workload and so minimise information overload, should be considered when looking to increase automation in cyber security protection, detection, and response.
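One simple way to operationalise a false-positive target is to set the alert threshold from the score distribution of benign events. The sketch below uses synthetic anomaly scores (the distributions and the 0.1% budget are assumptions for illustration, not recommended values) to show the trade-off between false-positive burden and detection.

```python
import numpy as np

# Illustrative anomaly scores: many benign events, a handful of incidents.
rng = np.random.default_rng(1)
benign_scores = rng.normal(loc=0.2, scale=0.1, size=1000)
incident_scores = rng.normal(loc=0.9, scale=0.05, size=10)

# Set the alert threshold so that at most ~0.1% of benign events trigger
# an alert, keeping the false-positive burden on the practitioner low.
threshold = np.quantile(benign_scores, 0.999)

false_positive_rate = np.mean(benign_scores > threshold)
detection_rate = np.mean(incident_scores > threshold)
```

In a real deployment the benign score distribution would come from baseline data captured during normal operation, and the budget would be set against the practitioner's actual capacity to triage alerts.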

Piloting new approaches

Several early use cases, in which digital resilience would be improved alongside a modest increase in automation, should be carefully selected and piloted first. These should initially be low-risk or safe-to-fail use cases. Carefully selecting appropriate use cases for piloting new approaches will help build trust in the technology among users, the organisation, and even the regulator, trust that must be built up from scratch with the adoption of new technologies.

Conclusion

We would argue that introducing software automation (to some degree) for digital resilience in CNI is ambitious but ultimately inevitable. However, today, the introduction of machine learning or AI-based software in CNI OT is often met with considerable caution as it is perceived to conflict with an organisation's ability to ensure the safe operation of its physical processes.

This caution is not misplaced, but it often blocks progress as the challenges usually seem too great. In the context of increased connectivity and digitalisation, which introduce additional cyber threats, advancing digital resilience in CNI organisations is essential, and there is a role for some increase in software automation in facilitating this.

This process should be undertaken cautiously, with specific tasks in mind. It should ensure that expertise, whether internal or from trusted partners, is leveraged and that there is a comprehensive understanding of the affected systems, physical processes, and the safety, legal, and regulatory implications. By bringing together all these aspects, the organisation will be able to identify areas where software automation for digital resilience can be advanced.

Authors

Kelsey Collington, David Flynn FRSE MIET, Dimitrios Pezaros CEng FBCS FIET, University of Glasgow, Martin Gale CEng MIET, EDF and CINIF, and Andrew Deacon, Defence Science and Technology Laboratory.