Ensuring robustness in AI systems against malicious attacks garners greater attention
Over the past few years, Artificial Intelligence (AI) has exploded in capability and is becoming increasingly common in our everyday lives: from Tesla’s Autopilot, to the wider adoption of voice assistants on smartphones, to diagnostic assistance for medical workers in medical imaging and, more recently, in the detection of Covid-19.
However, not all of these deployments have been free of unintended consequences. For example, a 2018 collision involving a Tesla killed the driver; Amazon’s AI recruitment system discriminated against women in the job application process; and an AI cancer diagnosis model exposed private information about the individuals whose data was used to train it.
These accidents are all vivid reminders that, with more AI models deployed in systems impacting human life, it will be important to manage the unintended consequences that could become harmful. Otherwise, they will erode society’s trust in AI and its applications, which will in turn hinder the full realization of its benefits.
The benefits that AI brings are substantial. According to a report by the McKinsey Global Institute published in September 2018, AI has the potential to add 16 percent, or around $13 trillion, to global economic output by 2030, amounting to an average contribution to productivity growth of about 1.2 percent annually.
It is therefore critical to enable adoption that is both growth-driven and safeguarded, an approach commonly referred to as Trustworthy AI.
Although there is no formal definition of Trustworthy AI, it is well accepted that a few qualities need to be taken into consideration when designing an AI system, including explainability, fairness, and robustness. Each of them warrants a separate discussion. This article focuses on robustness, with future articles covering the other principles. In the following, we elaborate on what robustness in AI is and how we can manage it, so that accidents are less likely to happen and adoption can grow based on trust in AI.
What is AI?
An AI system, to a large extent, is a software system but the key difference lies in the way it is built. With AI (specifically machine learning), we no longer write code to tell the system explicitly what to do. Instead, we “teach” the system by providing it with examples.
This process is called training and the examples that are provided to the AI system constitute the training data.
Through the training process, the system develops a model which is capable of completing a task, for example, the detection of road signs (for autonomous vehicles). The training process needs information on how to utilize the data, what type of model (e.g., decision tree or neural network) to use, etc. This information is usually codified in the training algorithm.
The model’s effectiveness and accuracy are then evaluated using a separate set of examples called the testing data.
When the model achieves the required level of performance, it is deployed into production together with a set of logic that enables it to interact with other system components or the external world.
An AI system thus comprises three main components: data, model and code.
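To make the contrast with traditional software concrete, here is a minimal, purely illustrative sketch of the difference between writing a rule explicitly and learning it from examples. The "tall person" task, the data, and the midpoint-threshold training rule are all invented for illustration:

```python
# Explicit programming: the developer writes the rule by hand.
def is_tall_explicit(height_cm):
    return height_cm > 180

# Machine learning: the rule is learned from labelled examples instead.
heights = [150, 160, 170, 178, 183, 190, 200]   # training data (features)
labels  = [0,   0,   0,   0,   1,   1,   1]     # training data (1 = "tall")

# "Training": place the decision boundary midway between the tallest
# negative example and the shortest positive example.
boundary = (max(h for h, l in zip(heights, labels) if l == 0)
            + min(h for h, l in zip(heights, labels) if l == 1)) / 2

def is_tall_learned(height_cm):
    return height_cm > boundary

# "Testing": evaluate the learned model on held-out examples it never saw.
test_set = [(165, 0), (185, 1), (179, 0)]
accuracy = sum(is_tall_learned(h) == bool(l) for h, l in test_set) / len(test_set)
print(boundary, accuracy)   # boundary is 180.5; accuracy is 1.0
```

The learned boundary depends entirely on the examples provided, which is exactly why the quality of the training data matters so much in what follows.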
Robustness against potential attacks
Traditional software systems are often secured by measures such as establishing a security perimeter to prevent intruders from gaining access to the system and writing secure code to prevent exploits such as SQL injection. With AI systems, however, there are two additional dimensions that expand the attack surface.
The first has to do with the data that is used to train the AI model. The volume and quality of the data are key to the effectiveness and accuracy of the AI model. This data is usually collected from sources that lie outside the security perimeter of the system, and could potentially expose the system to a new suite of attack vectors.
For example, attackers could inject bad data into the training data and cause the AI to extract wrong patterns from the manipulated data. According to a recent study by MIT, the 10 most cited AI datasets are riddled with label errors. ImageNet, the canonical image recognition dataset, has an estimated label error rate of 5.8%. The researchers also looked at 34 models whose performance had previously been measured against the ImageNet test set, re-evaluating each model against 1,500 examples whose labels were found to be wrong. Models that had performed poorly against the original, incorrect labels performed much better once the labels were corrected, suggesting that label errors have a negative impact on measured model performance.
Now, imagine that, in the case of road sign detection in autonomous vehicles, an attacker deliberately injects a large number of stop signs wrongly labelled as speed limit signs. If this data poisoning goes undetected and the model extracts the wrong pattern, the AI model in the autonomous vehicle could misclassify a stop sign as a speed limit sign, with potentially life-threatening consequences.
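To illustrate how poisoned labels corrupt what a model learns, here is a toy sketch with invented data: a nearest-centroid classifier trained on clean labels versus the same classifier trained after an attacker flips 70% of the training labels. The two 1-D "classes" are hypothetical stand-ins for, say, stop-sign versus speed-limit feature vectors:

```python
import random

random.seed(0)

def make_data(n_per_class):
    # Toy 1-D features: class 0 clusters near 0.0, class 1 near 10.0
    # (hypothetical stand-ins for two kinds of road-sign images).
    xs = ([random.gauss(0.0, 1.0) for _ in range(n_per_class)]
          + [random.gauss(10.0, 1.0) for _ in range(n_per_class)])
    ys = [0] * n_per_class + [1] * n_per_class
    return xs, ys

def train(xs, ys):
    # Nearest-centroid "model": the mean feature value of each class.
    return {c: sum(x for x, y in zip(xs, ys) if y == c) / ys.count(c)
            for c in (0, 1)}

def accuracy(model, xs, ys):
    preds = [min(model, key=lambda c: abs(x - model[c])) for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

train_x, train_y = make_data(200)
test_x, test_y = make_data(50)

clean_model = train(train_x, train_y)

# Data poisoning: the attacker flips 70% of the training labels.
poisoned_y = [1 - y if i % 10 < 7 else y for i, y in enumerate(train_y)]
poisoned_model = train(train_x, poisoned_y)

print("clean model accuracy:   ", accuracy(clean_model, test_x, test_y))
print("poisoned model accuracy:", accuracy(poisoned_model, test_x, test_y))
```

With clean labels the centroids sit near 0 and 10 and the model classifies the held-out test set almost perfectly; after poisoning, the centroids swap sides and the model gets nearly everything wrong, even though the training *features* were never touched.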
Another dimension is the AI model itself. When training an AI model, the focus is usually on its generalization capability, i.e., how well it performs on unseen data. Using the road sign example, the model should be able to detect road signs on the road even if they look different from those in the training data, e.g., due to discoloration of the sign by the sun or the growth of moss and vines on the signboard.
To achieve this, the model is trained on one set of training data and then evaluated on a set of testing data which it had not been exposed to during the training phase. The testing data is usually derived from the same distribution as the training data, so that the evaluation could provide an unbiased estimation of the AI model’s performance. A fundamental assumption here is that all future unseen data will be from a similar distribution as the training data (in-distribution generalization).
However, in the real world, another type of unseen data is more common – unseen data that is statistically different from the training data. This type of unseen data can undermine the robustness of AI models and cause them to be brittle.
The inability to handle this type of unseen data can have serious implications; the Tesla Autopilot failure mentioned earlier is a real-world example of its impact.
Attacks that exploit these vulnerabilities are called evasion attacks. An evasion attack happens when the AI model is given a carefully perturbed input that is indistinguishable to a human from the original but completely confuses the AI model (see Figure 2). For example, an attacker could attempt an evasion attack by erecting a road sign designed to mislead the AI model, e.g., a sign that is interpreted as a higher speed limit than is allowed on that particular road. It has also been reported that a similar evasion attack can be launched by flashing a laser beam on the target object, causing an AI model to misclassify a trolleybus as an amphibian or a parking sign as a soap dispenser.
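A minimal sketch of the idea behind such perturbation attacks, assuming a toy linear classifier with made-up weights and input: because the model's score is differentiable in its input, an attacker can nudge each feature slightly in the direction of the score's gradient (an FGSM-style step) and flip the prediction, even though no feature changes by more than a small epsilon:

```python
# Hypothetical trained linear classifier: score > 0 means "speed limit",
# score <= 0 means "stop sign". Weights, bias, and input are invented.
w = [0.9, -0.4, 0.3]
b = -0.2

def score(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign(v):
    return 1.0 if v > 0 else -1.0

x = [0.1, 0.8, 0.2]   # original input; score(x) < 0, i.e. "stop sign"
eps = 0.3             # perturbation budget per feature

# FGSM-style step: for a linear model, the gradient of the score with
# respect to the input is simply w, so move each feature a little in
# the direction that increases the score.
x_adv = [xi + eps * sign(wi) for xi, wi in zip(x, w)]

print(score(x), score(x_adv))   # the score flips from negative to positive
```

Real attacks on deep networks follow the same principle, using the network's gradients instead of a fixed weight vector, and keeping the perturbation small enough to be invisible to a human.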
To enhance the robustness of AI models and ensure that they can be deployed safely, the models will need to be trained to go beyond in-distribution generalization and work better with unseen data that is statistically different from the training data (out-of-distribution generalization) as well.
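The gap between in-distribution and out-of-distribution performance can be made visible by evaluating the same trained model on a held-out test set drawn from the training distribution and on a shifted test set. A toy sketch with invented 1-D data, where the shift stands in for a real-world change such as sensor drift or discoloured signs:

```python
import random

random.seed(1)

def sample(n_per_class, shift=0.0):
    # Toy 1-D data: class 0 near 0.0, class 1 near 4.0. `shift` simulates
    # a real-world change (e.g., sensor drift) unseen during training.
    xs = ([random.gauss(0.0 + shift, 1.0) for _ in range(n_per_class)]
          + [random.gauss(4.0 + shift, 1.0) for _ in range(n_per_class)])
    ys = [0] * n_per_class + [1] * n_per_class
    return xs, ys

train_x, train_y = sample(500)

# The "model" is a single learned threshold: the midpoint of the class means.
mean0 = sum(x for x, y in zip(train_x, train_y) if y == 0) / train_y.count(0)
mean1 = sum(x for x, y in zip(train_x, train_y) if y == 1) / train_y.count(1)
threshold = (mean0 + mean1) / 2

def accuracy(xs, ys):
    return sum((x > threshold) == (y == 1) for x, y in zip(xs, ys)) / len(ys)

iid_x, iid_y = sample(500)             # in-distribution test set
ood_x, ood_y = sample(500, shift=3.0)  # shifted, out-of-distribution test set

print("in-distribution accuracy:    ", accuracy(iid_x, iid_y))   # high
print("out-of-distribution accuracy:", accuracy(ood_x, ood_y))   # much lower
```

The model looks excellent on the in-distribution test set yet degrades sharply on the shifted data, which is exactly the kind of gap an out-of-distribution evaluation is meant to surface before deployment.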
The key to Robust AI
Three key ingredients will play a big role in making AI robust: people, process, and technology.
First, we will need to train people with a good knowledge of how AI works and where it can go wrong.
AI Singapore has been playing a significant role in this aspect through its flagship talent programme, the AI Apprenticeship Programme (AIAP)®. AIAP sets out to train AI professionals for the industry through a combination of deep-skilling training and hands-on experience on a real-world project, all with an emphasis on self-directed learning.
After completing the programme, apprentices are well equipped with the skill set needed to tackle the challenges of developing and deploying AI systems in the industry.
AI Singapore also has other programmes for less technical audiences like AI for Everyone (AI4E)®, AI for Kids (AI4K)®, AI for Students (AI4S)®, and AI for Industry (AI4I)® to prepare the future generations of AI talents.
In addition, AI Singapore is working together with the national cybersecurity community to equip cybersecurity professionals with new AI skillsets. To that end, AI Singapore has signed a collaboration MOU with the International Information System Security Certification Consortium (ISC)2 Singapore Chapter to augment cybersecurity professionals with AI proficiency.
Second, as many organizations start to move AI use cases from pilot to production, there are growing concerns about the unintended consequences of AI systems. There are emerging attempts to regulate AI systems, e.g., the EU’s proposed AI regulation.
Nevertheless, an operationalizable process guideline to help organizations develop, verify, deploy, and monitor AI systems is still missing. To this end, AI Singapore is working with a number of partners from both the public and private sectors to define a technical reference that focuses on aspects such as robustness and resilience, transparency, explainability, and data protection in AI systems, and on how to evaluate these qualities.
We are also putting great effort into making these process guidelines operationalizable by incorporating them into our MLOps pipeline. Last but not least, on the technology front, we will need to build new tools that make the best practices and standards we are setting up more operationalizable. For example, AI Singapore is currently working with its collaborators from the NTU Cybersecurity Lab to develop a tool that evaluates a trained AI model’s out-of-distribution generalization. Every time a new model is trained, a report is generated prior to deployment outlining the model’s ability to perform on new and unseen data, in order to estimate its readiness for the real world.
Ensuring the robustness of AI systems is trickier than securing traditional software because of the additional aspects of data and model training. The resulting vulnerabilities can lead to new attacks such as data poisoning and model evasion. With more AI models being deployed in systems with significant impact on human life, these attacks could have wider security and safety implications that have to be addressed with the right combination of people, process, and technology. In this article, we have presented a quick overview of what AI robustness is and what AI Singapore is working on. In upcoming articles of this series, we will delve deeper into our endeavours regarding the related principles of trustworthy AI.
- Tesla Autopilot: https://www.tesla.com/en_SG/autopilot
- Software 2.0: https://karpathy.medium.com/software-2-0-a64152b37c35
- Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks: https://arxiv.org/abs/2103.14749. Accessed on 28 April 2021
- DARTS: Deceiving Autonomous Cars with Toxic Signs: https://arxiv.org/abs/1802.06430. Accessed on 28 May 2021.
- Adversarial Laser Beam: Effective Physical-World Attack to DNNs in a Blink: https://arxiv.org/abs/2103.06504. Accessed on 17 Jun 2021.
- These are just two of the many possible attacks against AI systems. NIST has published a taxonomy and terminology of possible attacks at https://csrc.nist.gov/publications/detail/nistir/8269/draft.