Hello, this is futabato.

I recently read through the paper One pixel attack for fooling deep neural networks (Su, Jiawei, Danilo Vasconcellos Vargas, and Kouichi Sakurai, 2017), so I am leaving my notes on it here on the blog.

One pixel attack for fooling deep neural networks

Paper overview

Authors: Su, Jiawei, Danilo Vasconcellos Vargas, and Kouichi Sakurai
Year: 2017
Paper URL: https://arxiv.org/abs/1710.08864
Citations: 1611
Tags: Blackbox, Untargeted Attack

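One Pixel Attack is a (semi-)black-box adversarial attack that uses differential evolution (DE) to change only a single pixel. Unlike FGSM, which adds a perturbation to every pixel, it focuses on a small number of pixels and places no limit on how strongly those pixels are modified.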
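To make "change only one pixel, with no limit on the strength" concrete, here is a minimal sketch of applying such a perturbation to an image. The candidate layout `(x, y, r, g, b)` and the helper name `apply_one_pixel` are assumptions made for illustration; this is not code from the paper.

```python
import numpy as np

def apply_one_pixel(image, candidate):
    """Return a copy of `image` with exactly one pixel overwritten.

    image: H x W x 3 uint8 array.
    candidate: (x, y, r, g, b), where x is the column and y is the row
               (this layout is an assumption for illustration).
    """
    x, y, r, g, b = candidate
    height, width = image.shape[:2]
    # Keep the position inside the image and the colour inside 0..255.
    col = int(np.clip(round(x), 0, width - 1))
    row = int(np.clip(round(y), 0, height - 1))
    perturbed = image.copy()
    perturbed[row, col] = np.clip([r, g, b], 0, 255).astype(np.uint8)
    return perturbed

# A black 32 x 32 image, the same size as a CIFAR-10 image.
image = np.zeros((32, 32, 3), dtype=np.uint8)
adv = apply_one_pixel(image, (5, 10, 255, 0, 128))
num_changed = int(np.any(adv != image, axis=-1).sum())
```

The new colour can be anything in the valid range, which is the sense in which the strength is unbounded, while the number of modified pixels stays fixed at one.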

Fig. 1. One-pixel attacks created with the proposed algorithm that successfully fooled three types of DNNs trained on CIFAR-10 dataset

Abstract

Recent research has revealed that the output of Deep Neural Networks (DNN) can be easily altered by adding relatively small perturbations to the input vector. In this paper, we analyze an attack in an extremely limited scenario where only one pixel can be modified. For that we propose a novel method for generating one-pixel adversarial perturbations based on differential evolution (DE). It requires less adversarial information (a black-box attack) and can fool more types of networks due to the inherent features of DE. The results show that 67.97% of the natural images in Kaggle CIFAR-10 test dataset and 16.04% of the ImageNet (ILSVRC 2012) test images can be perturbed to at least one target class by modifying just one pixel with 74.03% and 22.91% confidence on average. We also show the same vulnerability on the original CIFAR-10 dataset. Thus, the proposed attack explores a different take on adversarial machine learning in an extreme limited scenario, showing that current DNNs are also vulnerable to such low dimension attacks. Besides, we also illustrate an important application of DE (or broadly speaking, evolutionary computation) in the domain of adversarial machine learning: creating tools that can effectively generate low-cost adversarial attacks against neural networks for evaluating robustness.

What is impressive compared with prior work?

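An untargeted attack can succeed by changing just one pixel.
It is a semi-black-box attack that needs only the probability labels returned at inference time, and it focuses directly on increasing the probability label of the target class, which makes it simpler than existing approaches.
It is flexible enough to attack a wider range of DNN models.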
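Since the attack only sees probability labels, its objective can be written as a function of the classifier's output alone. The following is a hedged sketch: `predict_proba`, `toy_predict_proba`, and both fitness function names are hypothetical, and the toy classifier is only a stand-in so that the example runs without a trained network.

```python
import numpy as np

def untargeted_fitness(predict_proba, image, true_class):
    """Lower is better: the probability still assigned to the true class."""
    return float(predict_proba(image)[true_class])

def targeted_fitness(predict_proba, image, target_class):
    """Lower is better: the negated probability of the target class."""
    return -float(predict_proba(image)[target_class])

def toy_predict_proba(image):
    # Hypothetical stand-in for a real model: a softmax over the mean
    # intensity of each colour channel, giving a 3-class "classifier".
    logits = image.reshape(-1, 3).mean(axis=0) / 255.0
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# An all-red image, which the toy classifier assigns mostly to class 0.
red = np.zeros((32, 32, 3), dtype=np.uint8)
red[..., 0] = 255
p_true = untargeted_fitness(toy_predict_proba, red, true_class=0)
neg_p_target = targeted_fitness(toy_predict_proba, red, target_class=1)
```

Neither function touches gradients or model internals, which is what allows the same objective to be reused across different network architectures.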

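What is the key to the technique?

Adversarial perturbations are generated using differential evolution (DE).

DE does not use gradient information for optimization, so it does not require the objective function to be differentiable or even known. It can therefore be applied to a wider range of optimization problems than gradient-based methods. Using DE to generate adversarial examples has the following main advantages: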

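A higher probability of finding the global optimum
Less information is required from the target system
Simple, and independent of the classifier being used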
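The gradient-free nature of DE can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the mutation-only update, the scale factor of 0.5, the population size, and all names are assumptions chosen for the example. The only thing the optimizer ever asks of `fitness` is a score for a candidate vector.

```python
import numpy as np

def differential_evolution_minimize(fitness, bounds, pop_size=20,
                                    scale=0.5, iterations=50, seed=0):
    """Minimize `fitness` over a box, using only function evaluations."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(bounds, dtype=float).T
    population = rng.uniform(lower, upper, size=(pop_size, len(lower)))
    scores = np.array([fitness(p) for p in population])
    for _ in range(iterations):
        children = np.empty_like(population)
        for i in range(pop_size):
            # Combine three distinct, randomly chosen members.
            r1, r2, r3 = rng.choice(pop_size, size=3, replace=False)
            mutant = population[r1] + scale * (population[r2] - population[r3])
            children[i] = np.clip(mutant, lower, upper)
        child_scores = np.array([fitness(c) for c in children])
        # Each child replaces its parent only if it scores strictly better.
        improved = child_scores < scores
        population[improved] = children[improved]
        scores[improved] = child_scores[improved]
    best = int(np.argmin(scores))
    return population[best], float(scores[best])

def toy_fitness(v):
    # Hypothetical objective, queried only through its value.
    return float(np.sum((v - np.array([1.0, -2.0])) ** 2))

bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best, best_score = differential_evolution_minimize(toy_fitness, bounds)
```

For a one-pixel attack, the candidate vector would hold a pixel position and colour, and `fitness` would query the classifier's probability labels. Because selection is greedy, the best score in the population never gets worse from one generation to the next.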

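How was its effectiveness validated?

On Kaggle CIFAR-10 (which differs slightly from the standard CIFAR-10 in that some images contain noise or have been processed, so it can simulate a more general scenario), the attack was run against All Convolution Network, Network in Network, and VGG16 as target models.
On the (original) CIFAR-10, the attack was run against All Convolution Network, Network in Network, and VGG16 as target models.
On ImageNet (the ILSVRC 2012 test data), the attack was run against BVLC AlexNet as the target model.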

Is there any discussion?

Original-Target Class Pairs

Fig. 6. Heat-maps of the number of times a successful attack is present with the corresponding original-target class pair in one, three and five-pixel attack cases.

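The red (vertical) and blue (horizontal) indices indicate the original class and the target class, respectively. The numbers 0 to 9 stand for airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck, in that order.

Each heat-map matrix is roughly symmetric, and there are pairs that are easy to fool in both directions, such as dog (5) → cat (3) and cat (3) → dog (5). It is intuitive that the dog and cat pair is easier to attack than other class pairs. For the ship (8) and airplane (0) pair, however, ship (8) → airplane (0) is easy, while airplane (0) → ship (8) does not succeed to the same degree.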

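This may be due to the shape of the decision boundary and how close the natural images lie to it. If the region enclosed by the boundary is wide enough, natural images can lie far from the boundary, and it is hard to craft adversarial images from them. Conversely, if the region is long and thin and the natural images lie close to the boundary, it is easy to craft adversarial images from them but hard to craft adversarial images that land in that class.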

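Classes from which adversarial images are easy to craft could be exploited by malicious users and make the whole system vulnerable. In this case, however, these exceptions are not shared across networks, so even under an attack like this one, the vulnerability seems unlikely to be exploited.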

Adversarial Perturbation

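The FGSM paper hypothesized that small perturbations to pixel values across many dimensions accumulate into a large change in the output. This paper succeeded in attacking by changing just one pixel, which suggests that the hypothesis is not strictly necessary to explain why natural images are sensitive to perturbations.

The one-pixel attack was found to generalize across different networks and different image sizes. In addition, the attack success rate should improve further with more DE iterations or a larger set of initial candidate solutions. Using Covariance Matrix Adaptation Evolution Strategy (CMA-ES) in place of DE may also work.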