Hello, this is futabato.

I recently read through Explaining and Harnessing Adversarial Examples (Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy, 2015), so I am leaving my notes on the paper here on the blog.

https://arxiv.org/abs/1412.6572

Explaining and Harnessing Adversarial Examples

Paper overview

Authors: Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy.
Year: 2015
Paper URL: https://arxiv.org/abs/1412.6572
Citations: 11296
Tags: Whitebox, Untargeted

This is probably the most famous paper on adversarial examples.
It is the one everyone loves, where FGSM is used to get a panda misclassified as a gibbon.

Figure 1: A demonstration of fast adversarial example generation applied to GoogLeNet (Szegedy et al., 2014a) on ImageNet.

Abstract

Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. This explanation is supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets. Moreover, this view yields a simple and fast method of generating adversarial examples. Using this approach to provide examples for adversarial training, we reduce the test set error of a maxout network on the MNIST dataset.

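What makes it better than prior work?

The paper hypothesizes that neural networks are vulnerable to adversarial perturbations because of their linearity, and argues that linear behavior in a sufficiently high-dimensional space is enough to cause adversarial examples.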
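
To make the linearity argument concrete, here is a minimal sketch. For a linear unit w·x, a perturbation η with every |η_i| ≤ ε can shift the activation by as much as ε·Σ|w_i| when η = ε·sign(w), so the shift grows with the number of dimensions n even though no single input changes by more than ε. The Gaussian weights, the value of ε, and the function name `activation_shift` are assumptions made for this illustration and do not come from the paper.

```python
import numpy as np

def activation_shift(w, eps):
    """Change in w.x when x is perturbed by eta = eps * sign(w)."""
    eta = eps * np.sign(w)   # max-norm bounded: every |eta_i| <= eps
    return float(w @ eta)    # equals eps * sum(|w_i|)

# Hypothetical setup: random weights in increasingly high dimensions.
rng = np.random.default_rng(0)
eps = 0.01
for n in (10, 1_000, 100_000):
    w = rng.normal(size=n)
    # The per-element perturbation stays at eps, yet the shift scales with n.
    print(n, activation_shift(w, eps))
```

The per-pixel change is fixed, but the total effect on the activation adds up across dimensions, which is the intuition behind the high-dimensional linearity claim.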

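What is the key to the technique or method?

FGSM (Fast Gradient Sign Method), proposed in this paper, is an algorithm that generates adversarial examples using the gradient of the neural network.

FGSM is a white-box attack. In an adversarial attack the model itself cannot be modified, so the parameters θ of the trained model are treated as constants, and the loss is increased by adjusting the input x instead. FGSM computes the gradient of the loss with respect to each pixel of the image, which indicates how much that pixel contributes to the loss, and adds a perturbation of size ε in the direction of the sign of that gradient. The gradient is cheap to compute with backpropagation.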
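
As a rough sketch of the update x_adv = x + ε·sign(∇_x J(θ, x, y)), the code below applies FGSM to a tiny logistic regression model. Logistic regression is used only because its input gradient can be written by hand; for a real network the same gradient would come from backpropagation. The weights, input, label, ε, and all function names are hypothetical values chosen for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy J(theta, x, y) for one example, y in {0, 1}."""
    p = sigmoid(w @ x + b)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)))

def grad_wrt_input(w, b, x, y):
    """dJ/dx for the loss above; theta = (w, b) is held fixed."""
    return (sigmoid(w @ x + b) - y) * w

def fgsm(w, b, x, y, eps):
    """Return x + eps * sign(grad_x J(theta, x, y))."""
    return x + eps * np.sign(grad_wrt_input(w, b, x, y))

# Hypothetical toy model and input (not from the paper).
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.5, -0.5, 1.0])
y = 1

x_adv = fgsm(w, b, x, y, eps=0.25)
print("clean loss:      ", loss(w, b, x, y))
print("adversarial loss:", loss(w, b, x_adv, y))
```

Only the input moves, every coordinate by exactly ε, and the loss on the true label goes up while θ stays untouched.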

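How was it validated?

As an example of applying FGSM, the paper shows that attacking GoogLeNet trained on ImageNet causes a panda to be misclassified as a gibbon.

On MNIST, the error rate dropped from 0.94% without adversarial training to 0.84% with adversarial training (a regularizer based on FGSM), confirming an improvement in generalization.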
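
For reference, the FGSM-based regularizer can be written as a modified objective that mixes the loss on the clean input with the loss on its FGSM-perturbed version. As far as I can tell from the paper, the mixing weight was α = 0.5; the notation below follows the paper's J(θ, x, y).

```latex
\tilde{J}(\theta, x, y)
  = \alpha \, J(\theta, x, y)
  + (1 - \alpha) \, J\bigl(\theta,\; x + \epsilon \,\mathrm{sign}\bigl(\nabla_x J(\theta, x, y)\bigr),\; y\bigr)
```

Because the adversarial example is regenerated from the current θ at every step, the model is continually trained against its own latest weaknesses.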

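Any discussion?

The direction of the perturbation matters most, not the specific point in space.
Linear models lack the capacity to resist adversarial perturbations; only structures with a hidden layer should be trained to resist them.
RBF (Radial Basis Function) networks are resistant to adversarial examples.
Ensembles are not resistant to adversarial examples.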

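Which paper should I read next?

Smooth Adversarial Training, which recently came across my Twitter feed, reportedly argues that the non-smooth nature of ReLU hinders adversarial training, so I am curious about it.
Apparently FGSM can also be used as a black-box attack with a small modification. I don't know which paper describes this, but I want to read it once I find it.

Thank you for reading to the end.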