As more AI systems are deployed into production, it becomes critical to ensure that they are secure and trustworthy. Here at AI Singapore, the SecureAI team is dedicated to developing processes and tools to support the creation of secure and trustworthy AI systems.
As shared in the previous article, one of the key ingredients of robust AI systems is process. However, there is currently a lack of operationalizable process guidelines to guide organizations in developing, verifying, deploying, and monitoring AI systems.
To fill this gap, the SecureAI team has worked on developing a set of guidelines that draws upon AI Singapore’s experience in delivering 100E projects, and consolidates knowledge and best practices from the larger AI community – notably from the Berryville Institute of Machine Learning (BIML) Architectural Risk Analysis (ARA) and Google’s ML test score paper.
In this article, we will share an overview of our findings and how we operationalized them in the organization.
Engineering AI Systems Securely
An AI system is a specific type of software system. The field of software engineering has a relatively well-established set of best practices for the development of software systems. In comparison, the domain of AI engineering is in its infancy and the best practices are constantly being updated and improved.
The full life cycle of an AI system generally consists of the stages shown in Figure 1.
The considerations for engineering an AI system can be grouped into one of the following four areas of focus: data, modelling, infrastructure, and monitoring. Each of these areas can pertain to one or more parts of the life cycle. The following are a selection of key considerations under each area, which we have identified to be important for the development of secure AI systems.
Data is a key area where AI projects differ from traditional software projects. Traditional software systems have their logic coded in their source code, whereas AI systems learn their behavior from the data provided. This means that any bias or compromise in the data can result in vastly different behaviors and unwanted outcomes in the AI system. Therefore, it is critical to ensure that the data used is trustworthy and reliable.
As data is arguably one of the most important components of an AI system, there are many other considerations in this category. This includes, but is not limited to, checking for input-output feedback loops, proper representation of the problem space, data splitting methodology, avoiding unwanted bias from data processing, and ensuring privacy/anonymity.
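Two of the checks above, class balance and clean train/test separation, can be illustrated with a minimal sketch. The record layout (a list of dicts with hypothetical `id` and `label` fields) is purely illustrative, not a prescribed format:

```python
# Illustrative data checks on a toy dataset; the "id"/"label" field names
# are hypothetical and would be adapted to the project's actual schema.
from collections import Counter

def class_balance(records, label_key="label"):
    """Return the fraction of examples belonging to each class."""
    counts = Counter(r[label_key] for r in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def split_leakage(train, test, id_key="id"):
    """Return ids that appear in both splits (should be empty)."""
    return {r[id_key] for r in train} & {r[id_key] for r in test}

train = [{"id": 1, "label": "spam"}, {"id": 2, "label": "ham"},
         {"id": 3, "label": "ham"}]
test = [{"id": 3, "label": "ham"}, {"id": 4, "label": "spam"}]

balance = class_balance(train)       # e.g. {'spam': 0.33…, 'ham': 0.66…}
leaked = split_leakage(train, test)  # record 3 leaks into the test set
```

In practice such checks would run automatically as part of the data pipeline, so that a skewed class distribution or a contaminated split fails fast rather than silently degrading the model.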
The model or algorithm is typically what people think of when it comes to AI systems. The model chosen needs to be suitable for the complexity of the problem. It is also important to identify and verify assumptions associated with the models.
Beyond the choice of algorithm, model development is a complex process in which many small decisions made along the way can have a critical impact on the performance of the model. It is important to examine these choices systematically: for example, to evaluate the sensitivity of hyperparameters, and whether the metric used for the machine learning task is appropriate.
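A hyperparameter sensitivity check can be as simple as sweeping a value and recording the validation metric at each setting. The sketch below uses a toy one-dimensional ridge regression fitted in closed form; the data and the lambda grid are illustrative assumptions:

```python
# Minimal hyperparameter-sensitivity sweep on a toy 1-D ridge regression.
# Data, lambda grid, and the closed-form fit are illustrative only.
def fit_ridge_1d(xs, ys, lam):
    """Closed-form slope for y ~ w*x with L2 penalty strength lam."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(xs, ys, w):
    """Mean squared error of the fitted slope on a held-out set."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
val_x, val_y = [4.0, 5.0], [8.0, 10.1]

# Record validation error for each regularisation strength.
sensitivity = {lam: mse(val_x, val_y, fit_ridge_1d(train_x, train_y, lam))
               for lam in [0.0, 0.1, 1.0, 10.0]}
best_lam = min(sensitivity, key=sensitivity.get)
```

If the validation error swings sharply between neighbouring settings, the model is sensitive to that hyperparameter and the chosen value deserves extra scrutiny.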
Beyond basic functional requirements, an AI system can also be tested for non-functional requirements such as fairness, robustness, and interpretability. Robustness testing, in particular, is an area of focus for the SecureAI team, and we will share our work in this area in much greater detail in subsequent articles of this series.
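As a flavour of what robustness testing involves, one basic probe is to perturb clean inputs with noise and measure how far accuracy falls. The sketch below uses a toy threshold classifier and Gaussian input noise; both the model and the noise level are hypothetical stand-ins:

```python
# Minimal robustness probe: compare accuracy on clean vs. noise-perturbed
# inputs. The threshold "model" and noise level are illustrative only.
import random

def predict(x):
    """Toy classifier: label 1 if the input exceeds a threshold."""
    return 1 if x > 0.5 else 0

def accuracy(inputs, labels, noise_std=0.0, seed=0):
    """Accuracy after adding Gaussian noise of a given scale to each input."""
    rng = random.Random(seed)  # fixed seed keeps the probe reproducible
    correct = sum(predict(x + rng.gauss(0.0, noise_std)) == y
                  for x, y in zip(inputs, labels))
    return correct / len(inputs)

inputs = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]

clean_acc = accuracy(inputs, labels)                # perfect on clean data
noisy_acc = accuracy(inputs, labels, noise_std=0.3)
drop = clean_acc - noisy_acc  # a large drop signals poor robustness
```

Real robustness test suites go much further (adversarial perturbations, distribution shift, corruption benchmarks), but the structure is the same: perturb, re-evaluate, compare.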
Infrastructure supports the entire life cycle of the AI system. This is not limited to training and testing, but also to deployment and future enhancement of models. The infrastructure should facilitate the process of model training, model validation, and model rollback when needed.
It is important to have proper access control and versioning of the data, model, and code, for traceability, reproducibility, and security. The development and production environments should also be properly isolated.
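One simple ingredient of traceable versioning is deriving artefact identifiers from content rather than from manually chosen names. The sketch below shows the idea with SHA-256 content hashes; in practice a dedicated tool (e.g. a model registry or data versioning system) would manage this:

```python
# Content-addressed versioning sketch: identical bytes always yield the
# same id, so a run can be traced back to the exact data and model used.
# The payloads below are placeholder bytes, not real artefacts.
import hashlib

def artefact_version(payload: bytes) -> str:
    """Derive a stable, short version id from an artefact's content."""
    return hashlib.sha256(payload).hexdigest()[:12]

data_v = artefact_version(b"col_a,col_b\n1,2\n3,4\n")
model_v = artefact_version(b"model-weights-bytes")

# Re-hashing the same content reproduces the same id.
assert artefact_version(b"model-weights-bytes") == model_v
```

Logging these ids alongside each training run and deployment makes it possible to answer "exactly which data and model produced this prediction?" long after the fact.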
The performance of an AI system can change in unexpected ways over time, due to reasons such as changing trends or degradation of physical hardware, e.g. the sensors that provide input data or the computational devices the model runs on. It is important to continually monitor the performance of the system to ensure that it meets requirements. The monitoring should automatically alert the relevant teams when performance deviates from what is expected, so that the necessary actions can be taken promptly, e.g. retraining the model, updating dependencies, or maintaining hardware.
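The alerting logic described above can be sketched as a rolling check of a production metric against a baseline. The baseline, tolerance, and window size below are illustrative; a production setup would wire the alert into the team's paging or dashboard tooling:

```python
# Minimal monitoring sketch: alert when the rolling mean of a production
# metric drifts too far from its expected baseline. All thresholds here
# are illustrative placeholders.
from collections import deque

class MetricMonitor:
    def __init__(self, baseline, tolerance, window=5):
        self.baseline = baseline            # expected metric value, e.g. accuracy
        self.tolerance = tolerance          # allowed absolute deviation
        self.values = deque(maxlen=window)  # rolling window of observations

    def record(self, value):
        """Record a new observation; return True if an alert should fire."""
        self.values.append(value)
        mean = sum(self.values) / len(self.values)
        return abs(mean - self.baseline) > self.tolerance

monitor = MetricMonitor(baseline=0.90, tolerance=0.05)
alerts = [monitor.record(v) for v in [0.91, 0.89, 0.84, 0.78, 0.70]]
# The declining readings drag the rolling mean far below baseline,
# so only the final observation triggers an alert.
```

Using a rolling mean rather than single readings trades a little detection latency for far fewer false alarms from one-off noisy measurements.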
All of the four aforementioned aspects must be managed properly in order to ensure that the AI system is reliable and secure. This is not an exhaustive list but rather an introduction to the topic of secure AI engineering. For more in-depth discussion on the topic, interested readers may refer to the linked resources.
Operationalizing the Principles
In order to put the above principles into practice, the SecureAI team has developed the following process that involves a knowledge sharing and security review.
At the start of a project, the SecureAI team conducts a sharing session on the common risks faced during the development and deployment of AI systems. The target audience comprises AI practitioners, engineers, and project stakeholders from both AI Singapore and its industry partners.
The primary goal of the session is to ensure that everybody involved with the project understands the importance and implications of AI risks and is aligned with the goal of minimizing them.
It also enables AI developers and engineers to proactively secure the AI system as they develop it, and helps practitioners from a traditional cybersecurity background understand the security implications of deploying AI in their systems.
The project team is provided with a checklist that consists of questions that are designed to aid them in systematically identifying and mitigating potential risks in an AI system. Throughout the development process, the project team can refer to the checklist for guidance.
When the project team is ready, they can fill in the checklist with the details of their system design. Based on the responses, the SecureAI team provides an overall risk assessment and recommendations for mitigating potential risks. This process can be iterative in nature, to facilitate the development of more secure AI systems.
At the end of the project, the final version of the report will be handed over along with the project deliverables to the project sponsors.
Risk Control Checklist Examples
The questions in the checklist are organized into four sections reflecting the four areas of focus described above (data, modelling, infrastructure, and monitoring). Sample questions and recommendations are shown in Tables 1 and 2, respectively.
| Section | Question | Answer [Y/N/NA] | Elaboration [Please justify all answers, including 'NA'] |
|---|---|---|---|
| Data | Is your dataset representative of the problem space? | | Please describe the problem space that the ML system aims to address. Please elaborate on how you have ensured that the distribution of the data is representative of the problem (e.g. data covers all intended operating conditions/target demographic, term frequency matches the natural distribution of the target corpus, classes are balanced). Please note down any constraints in obtaining a representative dataset, if any. |
| Modelling | Have you ensured that your model is sufficiently robust to noise in the inputs? | | Please elaborate on how the model was tested for robustness. |
| Infrastructure | Is your ML pipeline integration tested? | | Please elaborate on how you have ensured that your full ML pipeline is integration tested and how often (e.g. automated test that runs the entire pipeline – data prep, feature engineering, model training and verification, deployment to the production serving system – using a small subset of data, at regular intervals or whenever changes are made to the code, model or server). |
| Monitoring | Will any degradations in model quality or computational performance be detected and reported for the deployed model? | | Please elaborate on how degradations of model performance in the production environment are detected and reported. |
| Checklist Section | Areas of Improvement | Recommendation |
|---|---|---|
| Modelling | The ability to explain the model is relatively low due to the application of a deep learning model. | Post-hoc explainers, like LIME or Grad-CAM, could be applied. |
| Infrastructure | The data and model artefacts are manually versioned with timestamps. | It is suggested that a proper model lifecycle management tool be used. This would help to keep an inventory of different models and their corresponding performance and model stage transitions (e.g. promoting a model to production stage or rolling back from production). |
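To make the pipeline integration-testing question from the checklist concrete, the sketch below runs a toy pipeline end to end on a small data subset. Every stage function (`prepare`, `featurise`, `train`) is a hypothetical stand-in for a project's real components:

```python
# End-to-end pipeline test sketch. All stage functions are hypothetical
# stand-ins; a real project would run its actual stages on a small data
# subset in CI, e.g. via pytest.
def prepare(raw):
    """Data prep stand-in: drop missing records."""
    return [r for r in raw if r is not None]

def featurise(rows):
    """Feature engineering stand-in: raw value plus its square."""
    return [(x, x * x) for x in rows]

def train(features, labels):
    """Training stand-in: a majority-class 'model'."""
    majority = max(set(labels), key=labels.count)
    return lambda feats: [majority for _ in feats]

def test_pipeline_end_to_end():
    raw = [1.0, None, 2.0, 3.0]
    labels = [0, 1, 1]
    feats = featurise(prepare(raw))
    model = train(feats, labels)
    preds = model(feats)
    assert len(preds) == len(feats)          # one prediction per row
    assert all(p in {0, 1} for p in preds)   # predictions stay in label space

test_pipeline_end_to_end()  # in CI this would be discovered by the test runner
```

The value of such a test is not the trivial assertions themselves but that it exercises every stage boundary, so interface changes in any one stage break the build instead of breaking production.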
Following this process allows us to have more confidence that the AI systems developed in AI Singapore are secure and trustworthy. This checklist is continually improved based on feedback and experience from executing projects.
Hopefully, this article has given the reader an idea of how we practice secure AI engineering in AI Singapore. In the subsequent articles of this series, we will be diving into a focus area of SecureAI as mentioned in the ‘modelling’ section above: robustness testing. Stay tuned to learn more about the topic and our work in this area!