Hello, this is futabato.

This time I read through Robust Detection of Adversarial Attacks by Modeling the Intrinsic Properties of Deep Neural Networks (Zheng, Zhihao, and Pengyu Hong, 2018), so I am leaving my notes on the paper here on the blog.

Robust Detection of Adversarial Attacks by Modeling the Intrinsic Properties of Deep Neural Networks

Paper overview

Authors: Zheng, Zhihao, and Pengyu Hong
Year: 2018
Paper URL: https://proceedings.neurips.cc/paper/2018/hash/e7a425c6ece20cbc9056f98699b53c6f-Abstract.html
Citations: 81
Tags: unsupervised learning, defense

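The paper proposes a strategy for detecting adversarial examples by modeling the intrinsic properties of a DNN classifier. The proposed and implemented method, I-defender, models the hidden state distribution of a DNN classifier presented with natural data and uses that model to detect adversarial examples.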

Figure 1: Hidden state distribution examples

Abstract

It has been shown that deep neural network (DNN) based classifiers are vulnerable to human-imperceptive adversarial perturbations which can cause DNN classifiers to output wrong predictions with high confidence. We propose an unsupervised learning approach to detect adversarial inputs without any knowledge of attackers. Our approach tries to capture the intrinsic properties of a DNN classifier and uses them to detect adversarial inputs. The intrinsic properties used in this study are the output distributions of the hidden neurons in a DNN classifier presented with natural images. Our approach can be easily applied to any DNN classifiers or combined with other defense strategy to improve robustness. Experimental results show that our approach demonstrates state-of-the-art robustness in defending black-box and gray-box attacks.

What makes it better than prior work?

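Existing defenses struggle with unknown attacks (adversarial patterns). Defense-GAN is also costly in time, its performance depends on the quality of the GAN, and training it for complex tasks is difficult.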

This method does not need to know the attack method, and it does not need to train the classifier on adversarial samples.

What is the key to the technique?

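The hidden state space often has far fewer dimensions than the input space, so the hidden state distribution can be said to be much easier to model than the input distribution.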

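The hidden state distribution of a DNN presented with natural data is called the IHSD (intrinsic hidden state distribution). Because adversarial inputs tend to produce hidden states that lie in low-density regions of the IHSD, I-defender uses the classifier's IHSD to reject adversarial inputs.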
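As a rough way to write this down (my own notation, not taken from the paper): let h(x) be the hidden state the classifier produces for input x, and let p-hat be an estimate of the IHSD density fitted on natural data only. The threshold tau is a hypothetical tuning parameter that I am assuming here.

```latex
\text{reject } x \ \text{as adversarial} \iff \log \hat{p}\bigl(h(x)\bigr) < \tau
```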

A Gaussian Mixture Model (GMM) is used to approximate the hidden state distribution of the DNN classifier.
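To make the idea concrete for myself, here is a minimal sketch of density-based rejection with a GMM. It is not the authors' implementation. Everything specific is my assumption: the hand-rolled EM with diagonal covariances, fitting one GMM per predicted class, the percentile-based threshold (`reject_rate`), and the synthetic 2-D points that stand in for real hidden states. All function names are hypothetical.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def component_logs(x, weights, means, variances):
    """log(weight_k) + log N(x; mean_k, var_k) for every component k."""
    return [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]


def gmm_log_density(x, weights, means, variances):
    """Log density of the mixture at x, computed with log-sum-exp."""
    logs = component_logs(x, weights, means, variances)
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def fit_gmm(points, k=2, iters=50, seed=0, min_var=1e-3):
    """Fit a k-component diagonal GMM with plain EM (illustrative only)."""
    rng = random.Random(seed)
    dim = len(points[0])
    weights = [1.0 / k] * k
    means = [list(p) for p in rng.sample(points, k)]
    variances = [[1.0] * dim for _ in range(k)]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in points:
            logs = component_logs(x, weights, means, variances)
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(points)
            means[j] = [
                sum(r[j] * x[d] for r, x in zip(resp, points)) / nj
                for d in range(dim)
            ]
            variances[j] = [
                max(min_var, sum(r[j] * (x[d] - means[j][d]) ** 2
                                 for r, x in zip(resp, points)) / nj)
                for d in range(dim)
            ]
    return weights, means, variances


def fit_detector(hidden_by_class, k=2, reject_rate=0.05):
    """Assumption: one GMM per class, threshold = low percentile of natural scores."""
    detector = {}
    for label, states in hidden_by_class.items():
        params = fit_gmm(states, k=k)
        scores = sorted(gmm_log_density(h, *params) for h in states)
        detector[label] = (params, scores[int(reject_rate * len(scores))])
    return detector


def is_adversarial(detector, predicted_label, hidden_state):
    """Flag a hidden state whose density under the predicted class is too low."""
    params, threshold = detector[predicted_label]
    return gmm_log_density(hidden_state, *params) < threshold


# Synthetic stand-in for hidden states of natural data (hypothetical, 2-D).
data_rng = random.Random(42)
hidden_by_class = {
    0: [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(200)],
    1: [[data_rng.gauss(5.0, 1.0), data_rng.gauss(5.0, 1.0)] for _ in range(200)],
}
detector = fit_detector(hidden_by_class)

# A state typical of class 0, then one that looks like class 1 but was predicted as 0.
print(is_adversarial(detector, 0, [0.0, 0.0]))
print(is_adversarial(detector, 0, [5.0, 5.0]))
```

In a real setting the lists in `hidden_by_class` would be hidden-layer activations collected from the trained classifier on natural training data, and a library GMM would likely replace the hand-written EM. Note that the detector never sees an adversarial sample during fitting, which matches the unsupervised framing of the paper.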

How was it shown to be effective?

The defense was evaluated on MNIST, Fashion-MNIST, and CIFAR-10 against attacks such as FGSM and DeepFool.
The results are organized and evaluated by attack type.
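For reference, FGSM, one of the attacks named above, is usually written as a single signed-gradient step on the input. The form below is the standard one as I understand it; this memo does not record which epsilon values or variants the paper actually used, so treat the symbols as generic: J is the classifier's loss, theta its parameters, y the true label, and epsilon the perturbation size.

```latex
x_{\mathrm{adv}} = x + \epsilon \cdot \operatorname{sign}\bigl(\nabla_{x} J(\theta, x, y)\bigr)
```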

Is there any discussion?

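Depending on the application, the GMM used to approximate the hidden state distribution can be replaced with another, more suitable model. Because the method models the DNN's hidden states rather than its inputs, it can also be applied to other modalities such as text.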

Which paper should I read next?

I could not find anything that looked like a stated weakness, so I would like to learn more from the papers that cite this one.

Thank you for reading to the end.

今日をどう過ごそうか

Twitter @01futabato10
