Optimism in the Face of Adversity

By LI Haoyang 2020.12.17 | 2021.1.9

Content

Optimism in the Face of Adversity - 2020
    Geometric View of Adversarial Robustness
        Classification Framework
        Adversarial Robustness
        Geometric Insights of Robustness
        Why Geometry Matters?
        Discussion
    Understanding Deep Learning Through the Lens of Adversarial Robustness
        Adversarial Robustness and Generalization
        Dynamics of Learning
    Applications of Adversarial Robustness in Machine Learning
        Interpretability of Deep Neural Networks
        Transfer Learning
        Robustness to Distribution Shifts
        Other applications of adversarial robustness
    Future Research and Open Questions
Inspirations

Optimism in the Face of Adversity - 2020

Guillermo Ortiz-Jimenez, Apostolos Modas, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard. Optimism in the Face of Adversity: Understanding and Improving Deep Learning through Adversarial Robustness. arXiv preprint 2020. arXiv:2010.09624

This is a review. Most of these notes are directly taken from the original paper.

In this article, we provide an in-depth review of the field of adversarial robustness in deep learning, and give a self contained introduction to its main notions.

The goal of this article is to provide readers with a set of new perspectives to understand deep learning, and to supply them with intuitive tools and insights on how to use adversarial robustness to improve it.

Geometric View of Adversarial Robustness

Classification Framework

A deep neural network classifier can be described as a function $f_\theta: \mathbb{R}^D \to \{1, \dots, C\}$, parametrized by a set of variables $\theta \in \mathbb{R}^P$, which maps any input vector $x \in \mathbb{R}^D$ to a label $\hat{y} \in \{1, \dots, C\}$.

Assume the weight space is $\mathbb{R}^P$, the input space is $\mathbb{R}^D$ and the output space is the label set $\{1, \dots, C\}$. The role of a classifier is then to partition the whole input space $\mathbb{R}^D$ into a set of regions, each associated with a classification label.

The decision boundary of the classifier can then be defined as the set of points that lies at the intersection of two classification regions with different labels.

Given a data distribution $\mathcal{D}$ over pairs $(x, y)$, the goal of a learning algorithm is to find a classifier $f_\theta$ that maps any input $x$ to a label $\hat{y} = f_\theta(x)$, such that its expected risk on $\mathcal{D}$ is minimized, i.e.

$$\min_\theta \; \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\mathcal{L}(f_\theta(x), y)\right],$$

where $\mathcal{L}$ defines a suitable loss function between the output of the network $f_\theta(x)$ and the true class label $y$.

In practice, we only have a set of training samples $\{(x_i, y_i)\}_{i=1}^N$ drawn from $\mathcal{D}$. The problem is usually approximated with empirical risk minimization, i.e.

$$\min_\theta \; \frac{1}{N} \sum_{i=1}^N \mathcal{L}(f_\theta(x_i), y_i).$$

The difference between the expected risk and the empirical risk attained by a classifier is known as its generalization gap.
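To make this concrete, here is a minimal PyTorch sketch of empirical risk minimization with SGD; the toy network, the random data, and all hyperparameters are illustrative assumptions rather than anything prescribed by the paper.

```python
# Minimal sketch of empirical risk minimization (ERM) with SGD in PyTorch.
# The toy network, data shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
N, D, C = 512, 20, 3                      # samples, input dim, classes
X = torch.randn(N, D)                     # stand-in training inputs
y = torch.randint(0, C, (N,))             # stand-in training labels

f = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, C))
loss_fn = nn.CrossEntropyLoss()           # surrogate loss L
opt = torch.optim.SGD(f.parameters(), lr=0.1)

for epoch in range(20):
    opt.zero_grad()
    empirical_risk = loss_fn(f(X), y)     # (1/N) * sum_i L(f(x_i), y_i)
    empirical_risk.backward()
    opt.step()
```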

Modern feed-forward neural networks are formed by the composition of multiple layers, where the output of layer $\ell$ depends only on the output of the previous layer, i.e.

$$x^{(\ell)} = \phi_\ell\left(x^{(\ell-1)}; \theta_\ell\right), \quad \ell = 1, \dots, L, \quad x^{(0)} = x,$$

where $\phi_\ell$ can be any general differentiable or subdifferentiable mapping parametrized by some weights $\theta_\ell$.

It normally consists of the combination of linear operators (e.g., convolutions) and pointwise non-linearities (e.g., rectified linear units (ReLU)).

In classification settings, the entries of the output of the last layer, $z = x^{(L)} \in \mathbb{R}^C$ with one entry per class, are generally referred to as logits.

Logits are normally mapped to a set of probabilities using a softmax operator, i.e.

$$p_k = \frac{e^{z_k}}{\sum_{j=1}^{C} e^{z_j}}, \quad k = 1, \dots, C,$$

and the predicted class of a neural network classifier is the index of the highest estimated probability, $\hat{y} = \arg\max_k \, p_k$.
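As a small illustration of the logits-to-prediction pipeline, the following sketch computes logits, softmax probabilities, and the predicted class; the tiny feed-forward network is a hypothetical stand-in.

```python
# Sketch: from logits to softmax probabilities to a predicted class.
# The small feed-forward network below is a hypothetical example.
import torch
import torch.nn as nn

D, C = 20, 3
f = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, C))

x = torch.randn(1, D)
logits = f(x)                              # z = x^(L), one entry per class
probs = torch.softmax(logits, dim=1)       # p_k = exp(z_k) / sum_j exp(z_j)
pred = probs.argmax(dim=1)                 # index of the highest estimated probability
print(logits, probs, pred)
```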

In fact, for most neural networks used in practice, a closed-form analysis of their properties, such as the analytical characterization of their classification regions, is not possible with our current mathematical tools.

A useful way to study the properties of the decision boundary of a neural network is to visualize its cross-section with some two-dimensional plane.

Given two orthonormal vectors $u, v \in \mathbb{R}^D$, such that $u^\top v = 0$ and $\|u\| = \|v\| = 1$, and a data sample $x$, we can visualize the decision boundary of a neural network in the vicinity of $x$ by plotting

$$f_\theta(x + \alpha u + \beta v)$$

for some specific values of $\alpha$ and $\beta$. These cross-sections vary a lot depending on the choice of $u$ and $v$.
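A possible way to produce such cross-section plots, assuming a toy PyTorch classifier and arbitrary ranges for $\alpha$ and $\beta$, is sketched below.

```python
# Sketch: visualize a 2D cross-section of a classifier's decision regions
# around a sample x, spanned by two orthonormal directions u and v.
# The network f and the alpha/beta ranges are illustrative assumptions.
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

D, C = 20, 3
f = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, C))
x = torch.randn(D)

u = torch.randn(D); u /= u.norm()
v = torch.randn(D); v -= (v @ u) * u; v /= v.norm()    # Gram-Schmidt: u and v orthonormal

alphas = torch.linspace(-10, 10, 101)
betas = torch.linspace(-10, 10, 101)
A, B = torch.meshgrid(alphas, betas, indexing="ij")
pts = x + A.reshape(-1, 1) * u + B.reshape(-1, 1) * v  # x + alpha*u + beta*v on a grid

with torch.no_grad():
    labels = f(pts).argmax(dim=1).reshape(A.shape)     # predicted class at each grid point

plt.contourf(A.numpy(), B.numpy(), labels.numpy())
plt.xlabel("alpha"); plt.ylabel("beta"); plt.title("Decision regions around x")
plt.show()
```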

Besides, a notable feature of most neural networks trained on high-dimensional datasets is that, in most random cross-sections, the decision boundary appears relatively far from any typical data sample, as shown in Figure 3(d).

This can be made more rigorous if one studies the robustness of a neural network to additive random noise, i.e. the probability that a given data sample $x$ perturbed by a random vector $v$ is classified differently,

$$\mathbb{P}_{v \sim \mathcal{N}(0, \sigma^2 I)}\left[f_\theta(x + v) \neq f_\theta(x)\right].$$

Indeed, for most neural networks used in practice one needs to add noise with a very large variance to fool a classifier [6].
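One way to estimate this probability is simple Monte Carlo sampling; the network, noise level $\sigma$, and sample count below are assumptions for illustration.

```python
# Sketch: Monte Carlo estimate of the probability that Gaussian noise
# changes the prediction on a sample x, i.e. P[f(x + v) != f(x)].
# Network, sigma and the number of noise samples are illustrative assumptions.
import torch
import torch.nn as nn

D, C = 20, 3
f = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, C))
x = torch.randn(1, D)

sigma, n_samples = 3.0, 1000
with torch.no_grad():
    clean_label = f(x).argmax(dim=1)
    noisy = x + sigma * torch.randn(n_samples, D)      # v ~ N(0, sigma^2 I)
    noisy_labels = f(noisy).argmax(dim=1)
    flip_rate = (noisy_labels != clean_label).float().mean()
print(f"Estimated P[f(x+v) != f(x)] at sigma={sigma}: {flip_rate:.3f}")
```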

This suggests that, despite their complex structure, neural networks create simple decision boundaries in the vicinity of data samples.

Adversarial Robustness

Surprisingly, for virtually any network $f_\theta$ and data sample $x$ we can always find some adversarial perturbation, which suggests that there always exist some directions for which the decision boundary of a neural network is very close to a given data sample.

An adversarial perturbation is defined as the solution to the following optimization problem

$$\delta^\star \in \arg\max_{\delta \in \mathcal{C}} \; g(x + \delta), \tag{8}$$

where $g$ represents a general objective function and $\mathcal{C}$ denotes a general set of constraints that characterize the perturbations. The perturbed samples $x + \delta^\star$ are referred to as adversarial examples.

Different objectives and constraints lead to different types of adversarial examples (e.g., minimal-norm perturbations versus maximal-loss perturbations within an $\ell_p$-ball of radius $\epsilon$).

In the adversarial robustness literature, the algorithms that try to solve (8) are typically referred to as adversarial attacks.

Since (8) is in general a non-convex problem, most attacks only obtain an approximate solution to it. Surprisingly though, such approximate solutions are usually easy to find using first-order optimization methods [27]–[30].
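As an example of such a first-order method, here is a minimal sketch of an $\ell_\infty$ PGD attack [30]; the perturbation budget, step size, and iteration count are illustrative assumptions.

```python
# Minimal sketch of an l_inf PGD attack [30]: iterated signed-gradient ascent
# on the loss, projected back onto the epsilon-ball around x.
# epsilon, step size and iteration count are illustrative assumptions.
import torch
import torch.nn as nn

def pgd_attack(f, x, y, eps=0.03, alpha=0.01, steps=10):
    loss_fn = nn.CrossEntropyLoss()
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(f(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()          # first-order ascent step
            delta.clamp_(-eps, eps)                     # project onto the l_inf ball
        delta.grad.zero_()
    return (x + delta).detach()

# Usage on a toy network and sample (shapes are assumptions):
D, C = 20, 3
f = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, C))
x, y = torch.randn(1, D), torch.tensor([0])
x_adv = pgd_attack(f, x, y)
```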

The adversarial robustness of a classifier can be defined as the worst-case accuracy of a neural network subject to some adversarial perturbation, i.e.

$$\mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\min_{\|\delta\| \leq \epsilon} \mathbb{1}\{f_\theta(x + \delta) = y\}\right]. \tag{12}$$

In particular, the value of $\epsilon$ – more generally, the size of the constraint set in (8) – reflects the strength of the attacker, and, in combination with the choice of metric, e.g., $\ell_p$-norm, determines the threat model of an adversary.

Measuring robustness by attack gives an upper bound on adversarial robustness, i.e. the true adversarial robustness cannot be higher than the measured value.

One can also measure robustness of a neural network as the average distance of any data sample to the decision boundary of a network, i.e.

$$\mathbb{E}_{x \sim \mathcal{D}}\left[\min_{\delta} \|\delta\| \;\; \text{s.t.} \;\; f_\theta(x + \delta) \neq f_\theta(x)\right]. \tag{13}$$

In this geometric formulation, robustness becomes purely a property of the classifier and it is agnostic to the type of algorithm used to craft adversarial perturbations.

Still, one can verify the robustness of a classifier by checking if a safe distance exists between its decision boundaries and all data samples. That is, a classifier is certifiably $\epsilon$-robust (in an $\ell_p$-sense) if it always outputs a constant label in an $\ell_p$-ball of radius $\epsilon$ around any typical data sample.

Measuring robustness by the distance to the boundary is attack-agnostic and gives a lower bound on adversarial robustness, i.e. the adversarial robustness is at least this value.
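A crude way to probe this geometric notion is to estimate the distance from a sample to the boundary along a single direction by bisection; note that this only upper-bounds the true minimal distance, and the network, direction, and search range below are assumptions.

```python
# Sketch: estimate the distance from x to the decision boundary along a fixed
# direction d by bisection on the scale t at which f(x + t*d) changes label.
# This gives an upper bound on the true (minimal) distance to the boundary.
import torch
import torch.nn as nn

def boundary_distance_along(f, x, d, t_max=100.0, iters=30):
    d = d / d.norm()
    with torch.no_grad():
        label = f(x).argmax(dim=1)
        if f(x + t_max * d).argmax(dim=1) == label:
            return float("inf")                 # no label change within the search range
        lo, hi = 0.0, t_max
        for _ in range(iters):                  # bisection on the crossing point
            mid = (lo + hi) / 2
            if f(x + mid * d).argmax(dim=1) == label:
                lo = mid
            else:
                hi = mid
    return hi                                   # ||hi * d|| = hi, since ||d|| = 1

D, C = 20, 3
f = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, C))
x, d = torch.randn(1, D), torch.randn(1, D)
print(boundary_distance_along(f, x, d))
```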

Adversarial training takes the attack process into the objective, i.e.

$$\min_\theta \; \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\max_{\|\delta\| \leq \epsilon} \mathcal{L}(f_\theta(x + \delta), y)\right], \tag{14}$$

but solving it introduces much more complexity in the training.

Alternatively, there exist multiple constructive methods that try to improve robustness by maximizing (12) or (13). These algorithms are generally known as adversarial defenses.

In fact, adversarial training [27], [29], [30], a method that augments/replaces the training data with adversarial examples crafted during training, is up to this date one of the most effective adversarial defense methods.

Nevertheless, its greater computational cost – sometimes up to fifty times higher than standard training – makes it impractical for many large-scale applications.

Adversarial training, i.e. (14), is the most effective defense method established in the literature, but it is computationally very expensive.
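For concreteness, here is a minimal sketch of the adversarial training loop (14). The inner maximization is approximated with a single FGSM step [29] for brevity (a multi-step PGD attack, as sketched earlier, is the common stronger choice); the data and hyperparameters are assumptions.

```python
# Minimal sketch of adversarial training (14): an approximate inner maximization
# (one FGSM step) followed by a standard SGD step on the adversarial examples.
# Network, data and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
N, D, C = 512, 20, 3
X, Y = torch.randn(N, D), torch.randint(0, C, (N,))

f = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, C))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(f.parameters(), lr=0.1)
eps = 0.1                                        # l_inf budget of the inner maximization

for epoch in range(10):
    # Inner (approximate) maximization: one signed-gradient step around the clean batch.
    X_ = X.clone().requires_grad_(True)
    loss_fn(f(X_), Y).backward()
    X_adv = (X + eps * X_.grad.sign()).detach()

    # Outer minimization: standard SGD step on the adversarial examples.
    opt.zero_grad()
    loss_fn(f(X_adv), Y).backward()
    opt.step()
```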

Moreover, it has empirically been shown that performing adversarial training to improve the robustness to a certain type of perturbation, e.g., $\ell_\infty$-bounded ones, does not improve robustness to other types of attacks, e.g., $\ell_2$-bounded ones [48], making the system vulnerable to other threat models [49], [50].

Adversarial training can only increase robustness in the training threat model.

Geometric Insights of Robustness

We define the local geometry of a neural network as the geometric properties of the input space of a deep classifier in an $\epsilon$-neighborhood of a data sample $x$.

The characteristics of most adversarial examples are linked to the local geometry of a deep classifier by construction, i.e. they are found in the vicinity of clean examples.

For example, the success of adversarial attacks based on first-order methods to approximate (8) demonstrates that deep classifiers are relatively smooth and simple, at least in the vicinity of data samples.

Indeed, even if popular adversarial attack methods like FGSM [29], DeepFool [27], C&W [28], or PGD [30] are prone to get trapped in local optima in non-convex settings, their relative success demonstrates that the local geometry of deep neural networks is, in practice, approximately smooth and free from many irregularities (see Figure 3 and Figure 4b).

Attacks that only find local minima of the adversarial objective still work well, which indicates that the decision boundary is relatively smooth and simple.

The aforementioned attacks can, therefore, be used to identify points lying exactly on the decision boundary, i.e., minimal adversarial examples.

These minimal adversarial perturbations are approximately normal to the decision boundary at the points they reach. Surprisingly, these normals are highly correlated among different data points, and the set of all perturbations of a network only spans a low-dimensional subspace of the input space [53], [54].

This explains the intriguing observation that adversarial examples transfer well between different architectures, i.e., an adversarial perturbation crafted on a sample for one network is likely to also fool another network trained on the same dataset.

Different classifiers learn the same dataset in a similar way, which gives rise to the transferability of adversarial examples.

This naturally leads to the investigation of curvature as a central property of the local geometry of a neural network.

The loss landscape of a neural network is a function that maps any input sample $x$ to $\mathcal{L}(f_\theta(x), y)$, where $y$ is usually taken as the true label of $x$.

Using this function we can approximate the local geometry of a neural network using a second-order Taylor decomposition around $x$ and write

$$\mathcal{L}(f_\theta(x + \delta), y) \approx \mathcal{L}(f_\theta(x), y) + \delta^\top \nabla_x \mathcal{L} + \frac{1}{2}\, \delta^\top H_x\, \delta,$$

where $\nabla_x \mathcal{L}$ denotes the gradient of the loss with respect to the input, and $H_x$ its second-order derivative, or Hessian.
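Both terms of this decomposition can be probed with automatic differentiation: the sketch below computes the input gradient and a Hessian-vector product by double backpropagation on an assumed toy network, and uses them to estimate the loss change for a small input perturbation.

```python
# Sketch: first- and second-order local geometry of the loss at x.
# The gradient is taken with respect to the input, and a Hessian-vector
# product is computed with double backprop (no need to form the full Hessian).
# The network and sample are illustrative assumptions.
import torch
import torch.nn as nn

D, C = 20, 3
f = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, C))
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1, D, requires_grad=True)
y = torch.tensor([0])

loss = loss_fn(f(x), y)
grad = torch.autograd.grad(loss, x, create_graph=True)[0]     # nabla_x L

delta = 0.01 * torch.randn_like(x)                            # small input perturbation
Hdelta = torch.autograd.grad((grad * delta).sum(), x)[0]      # H_x * delta via double backprop

# Second-order Taylor estimate of the loss change for this perturbation:
taylor = (grad * delta).sum() + 0.5 * (delta * Hdelta).sum()
print(taylor.item())
```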

By studying the terms in this decomposition for a trained neural network, we can discover two main things.

Minimal adversarial perturbations are strongly aligned with the input gradients in most networks, which makes attack methods based on first-order derivatives effective.

Furthermore, it has been shown [54], [55] that the principally curved directions of a deep classifier are also aligned, in the same way that the normals to the decision boundary are correlated among data points.

This explains the existence of universal adversarial perturbations (UAPs), i.e. single perturbations $v$ with $\|v\| \leq \epsilon$ such that

$$\mathbb{P}_{x \sim \mathcal{D}}\left[f_\theta(x + v) \neq f_\theta(x)\right] \geq 1 - \xi,$$

where $\xi$ controls the misclassification probability of the perturbations.

The connection between curvature and UAPs stems from the fact that most of the energy of UAPs is concentrated in the subspace spanned by the shared directions of high curvature.
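The sketch below is a deliberately simplified illustration of this idea: it accumulates signed input gradients over many samples into a single shared perturbation projected onto an $\ell_\infty$-ball. It is not the original UAP algorithm (which iterates a DeepFool-style attack over the dataset); the network, data, and hyperparameters are assumptions.

```python
# Simplified sketch of a universal perturbation: accumulate signed input
# gradients across many samples and project onto a shared l_inf epsilon-ball.
# NOT the original UAP algorithm; it only illustrates that one shared
# direction can raise the loss for many samples at once.
import torch
import torch.nn as nn

def universal_perturbation(f, X, Y, eps=0.1, alpha=0.01, epochs=5):
    loss_fn = nn.CrossEntropyLoss()
    v = torch.zeros(1, X.shape[1])
    for _ in range(epochs):
        for i in range(X.shape[0]):
            x, y = X[i:i+1], Y[i:i+1]
            v_ = v.clone().requires_grad_(True)
            loss = loss_fn(f(x + v_), y)
            grad = torch.autograd.grad(loss, v_)[0]
            v = (v + alpha * grad.sign()).clamp(-eps, eps)   # shared l_inf projection
    return v

D, C = 20, 3
f = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, C))
X, Y = torch.randn(256, D), torch.randint(0, C, (256,))
v = universal_perturbation(f, X, Y)
with torch.no_grad():
    fool_rate = (f(X + v).argmax(1) != f(X).argmax(1)).float().mean()
print(f"Fraction of samples fooled by the shared perturbation: {fool_rate:.3f}")
```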

The existence of universal adversarial perturbations indicates that the decision boundaries for most classes are curved in a similar way....

Why Geometry Matters?

In fact, the “true” robustness is intrinsically a geometric concept, as it necessarily implies that the distance between the data points and the decision boundaries should be large.

They think that a geometrically robust model is a truly robust model, since the decision boundary is pushed away from the examples significantly.

This is a little too harsh. In practice, a model can be viewed as robust as long as attackers cannot find an adversarial example within an economically profitable amount of time, or any adversarial example they craft has a semantically different meaning from the original example.

The first – and perhaps expected – feature to notice is that adversarially trained networks seem to create boundaries that are further apart from most data samples [27].

However, the most interesting geometric change is the fact that the decision boundaries of adversarially trained networks exhibit a lower mean curvature than that of standard models [59] (see Figure 4a).

Adversarially trained models push the decision boundary away from data points by construction and also make the decision boundary smoother.

Indeed, certifiable adversarial defenses, like randomized smoothing [63]–[65], also implicitly regularize curvature by averaging the decision of a classifier on randomly perturbed samples.

Making the decision boundary smoother also increases adversarial robustness.
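As an illustration, here is a minimal sketch of the prediction step of randomized smoothing [63]–[65]: the smoothed classifier returns the majority vote of the base classifier under Gaussian noise. The noise level and sample count are assumptions, and the full certification procedure additionally performs a statistical test to turn the vote margin into a certified radius.

```python
# Sketch of randomized smoothing at prediction time: the smoothed classifier
# outputs the majority vote of the base classifier over Gaussian noise around x.
# sigma and the number of noise samples are illustrative assumptions.
import torch
import torch.nn as nn

def smoothed_predict(f, x, sigma=0.5, n_samples=1000):
    with torch.no_grad():
        noisy = x + sigma * torch.randn(n_samples, x.shape[1])
        votes = torch.bincount(f(noisy).argmax(dim=1))
    return votes.argmax()            # majority-vote class of the smoothed classifier

D, C = 20, 3
f = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, C))
x = torch.randn(1, D)
print(smoothed_predict(f, x))
```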

This “local flatness” property has been very important in the design of computationally efficient adversarial attacks in challenging settings.

The local linearity can also be exploited for better attack methods.

Discussion

Many properties of adversarial examples actually suggest that their existence is tightly linked to the way neural networks learn. Interestingly, this also means that adversarial examples may reveal important information about the inner workings of deep neural networks.

Informally speaking, one can see humans and neural networks as two classifiers with good generalization capacity that leverage very different information.

Indeed, neural networks have good test accuracy, but the existence of adversarial examples is a very strong evidence that the way neural networks generalize is by exploiting different features of the data than those humans use.

Understanding Deep Learning Through the Lens of Adversarial Robustness

Adversarial Robustness and Generalization

How is it possible that neural networks generalize so well, when their outputs are so sensitive to perturbations?

Why do most adversarial defenses, such as adversarial training, hurt performance on the clean test set?

Despite the current debate, however, there is a general consensus about the tight connection between the adversarial robustness and the generalization phenomena.

Dynamics of Learning

Understanding how a neural network evolves during training is a complicated yet crucial problem that has lately attracted a lot of attention.

Applications of Adversarial Robustness in Machine Learning

Adversarial robustness is not only relevant to security or theoretical understanding of deep networks, it has had a significant impact on many other fields of machine learning, such as anomaly detection [117], [118], privacy [119]–[121], or fairness [122]–[127].

Interpretability of Deep Neural Networks

Interpretability and adversarial robustness are heavily related, not only because both fields make an extensive use of geometry, but also because the recently discovered connection between adversarial examples and features has opened the door to a completely new framework for interpretability.

Transfer Learning

Transfer learning [153], [154] refers to the common practice in deep learning by which a neural network trained on one task (source) is adapted to boost performance on another related task (target).

Here, the main assumption – which has been validated in practice [155]–[157] – is that the learned representations of the source model can be translated, or transferred, to the target model via fine-tuning.
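A minimal sketch of this practice, assuming a torchvision ResNet-18 backbone and a hypothetical 10-class target task, freezes the transferred representations and fine-tunes only a new classification head.

```python
# Sketch of transfer learning by fine-tuning: freeze the source model's
# feature extractor (backbone) and train only a new classification head on
# the target task. Backbone choice and target head size are assumptions.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")   # source model (downloads pretrained weights)
for p in backbone.parameters():
    p.requires_grad = False                           # freeze transferred representations

num_target_classes = 10                               # hypothetical target task
backbone.fc = nn.Linear(backbone.fc.in_features, num_target_classes)  # new trainable head

opt = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative fine-tuning step on a dummy batch:
x, y = torch.randn(8, 3, 224, 224), torch.randint(0, num_target_classes, (8,))
opt.zero_grad()
loss_fn(backbone(x), y).backward()
opt.step()
```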

Robustness to Distribution Shifts

Normally trained neural networks are brittle, and not only in the adversarial sense: their performance also suffers when the test data is slightly different from the data used for training.

In the machine learning community, the robustness to this type of general transformations is known as robustness to distribution shifts or out-of-distribution generalization.

Naturally occurring shifts can be caused by multiple factors, such as common corruptions [166] or dataset collection biases [167] (see Figure 14).

Robustness to out-of-distribution data is more naturally meaningful and is a research line with a much longer history.

Adversarial robustness can act as a strong proxy for this task, allowing one to compute lower bounds on the robustness of these systems, but also to improve general robustness to a wide range of shifts when tuned properly.

Other applications of adversarial robustness

In the context of anomaly detection, adversarial perturbations have been used to generate synthetic anomalous samples which do not belong to the data manifold [117], [118].

Use adversarial examples as synthetic anomalies.

In data privacy, adversarial techniques have also been exploited, for example, to identify sensitive features of a sample which can be vulnerable to attribute inference attacks [119].

Use adversarial robustness as feature filters.

However, it has also recently been argued that adversarial robust models are more vulnerable to membership inference attacks.

An adversarially trained model gains robustness and learns the dataset better, so it likely contains more information that can be exploited by membership inference attacks.

In the context of fairness, adversarial machine learning has recently found multiple applications [122]–[126], from the use of adversarial training to favour invariance towards racially- or gender-biased features [122], to the formulation of different notions of fairness, such as the right to an equally robust machine learning prediction [127].

Adversarial training brings some invariance for sure....

In particular, the ability of adversarial defenses to induce invariance to certain transformations of the data, and hence, learn better representations of its features, is making adversarial robustness a fundamental piece of the deep learning toolbox.

Future Research and Open Questions

Whether it is possible to obtain robust models that perform on par with standard ones in terms of accuracy, even in the strongest attack setting, is still an open problem.

How to get a robust and accurate model?

Factoring in the effect of adversarial examples in the development of new generalization bounds for deep learning is, therefore, an important line of research.

What are the generalization bounds in the presence of an adversary?

Then, in relation to constructive methods, the important question of how and why standardly trained neural networks choose, and prefer, non-robust features of a dataset remains widely open.

Why doesn't a standard model work the way we want it to?

However, obtaining models that are robust to a diverse set of naturally occurring distribution shifts stays an open problem.

Natural adversarial examples are not well handled yet....

Also, it is important to emphasize that adversarial training is a computationally expensive procedure. For this reason, despite its ability to learn more robust representations, it is not widely adopted outside of computer vision applications.

How to get a cheaper robust model?

Nevertheless, if we only judge the performance of robust models based on their vulnerability to certain adversarial attacks, e.g., with a specific value of $\epsilon$, we might overlook some other aspects of robustness beyond merely security concerns.

Using attack as the metric may overlook other benefits of the defense methods.

In general, it is important that research on adversarial robustness is steered towards defining a novel set of benchmarks that also test the performance of adversarially robust models in other applications such as transfer learning, interpretable machine learning, and image generation.

Inspirations

This is a nice tutorial about adversarial learning, from the origin of adversarial examples to the various applications of adversarial robustness.

The following points are most interesting to me: