Hello, this is futabato.

I recently read through the paper Physical Adversarial Examples for Object Detectors (Song, Dawn, et al., 2018), so I am leaving my notes on it here on the blog.

Physical Adversarial Examples for Object Detectors

Paper Overview

Authors: Song, Dawn, et al.
Year: 2018
Paper URL: https://arxiv.org/abs/1807.07769
Citations: 261
Tags: Physical, Object Detection, Targeted Attack, Disappearance Attack

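Perturbations that fool image recognition models can also be produced with physical objects.

It is very interesting that deep learning models used in the real world, with autonomous driving as the typical example, carry this kind of security risk.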

Figure 7: Sample frames from our attack videos after being processed by YOLO v2. In the majority of frames, the detector fails to recognize the Stop sign.

Abstract

Deep neural networks (DNNs) are vulnerable to adversarial examples - maliciously crafted inputs that cause DNNs to make incorrect predictions. Recent work has shown that these attacks generalize to the physical domain, to create perturbations on physical objects that fool image classifiers under a variety of real-world conditions. Such attacks pose a risk to deep learning models used in safety-critical cyber-physical systems. In this work, we extend physical attacks to more challenging object detection models, a broader class of deep learning algorithms widely used to detect and label multiple objects within a scene. Improving upon a previous physical attack on image classifiers, we create perturbed physical objects that are either ignored or mislabeled by object detection models. We implement a Disappearance Attack, in which we cause a Stop sign to "disappear" according to the detector - either by covering the sign with an adversarial Stop sign poster, or by adding adversarial stickers onto the sign. In a video recorded in a controlled lab environment, the state-of-the-art YOLOv2 detector failed to recognize these adversarial Stop signs in over 85% of the video frames. In an outdoor experiment, YOLO was fooled by the poster and sticker attacks in 72.5% and 63.5% of the video frames respectively. We also use Faster R-CNN, a different object detection model, to demonstrate the transferability of our adversarial perturbations. The created poster perturbation is able to fool Faster R-CNN in 85.9% of the video frames in a controlled lab environment, and 40.2% of the video frames in an outdoor environment. Finally, we present preliminary results with a new Creation Attack, wherein innocuous physical stickers fool a model into detecting nonexistent objects.

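What makes it better than prior work?

The paper proposes the Disappearance Attack, in which a physical object is ignored by the detector. The authors build perturbed physical objects that actually prevent YOLOv2 from localizing an object, and, with the Creation Attack, objects that make it detect something that does not exist.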

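What is the key to the technique?

The paper extends the RP2 algorithm (Eykholt et al.) with respect to position and rotation invariance, and shows how to attack object detectors in a relatively controlled experimental environment.

Because an object detector has a richer output structure than an image classifier, the paper proposes new losses suited to object detectors.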

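Disappearance Attack Loss

Goal: prevent the detector from localizing the object.

The attack succeeds once the score falls below the threshold the detector uses to decide whether something counts as an object (the value you usually set as the confidence score threshold). So the loss function computes the maximum confidence score for the object while it is in the scene, and that value is minimized directly until it drops below the threshold.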
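To make this concrete for myself, here is a minimal sketch of such a loss. It is an illustration under my own assumptions, not the authors' implementation: the nested-list output layout, the use of objectness times class probability as the confidence score, the function names, and the 0.5 default threshold are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical detector output: grid[row][col] is a list of candidate boxes,
# each box being (objectness, class_probabilities).
Box = Tuple[float, Sequence[float]]
Grid = List[List[List[Box]]]


def disappearance_loss(grid: Grid, target_class: int) -> float:
    """Return the highest confidence for target_class over all cells and boxes.

    Assumption: confidence = objectness * class probability.
    """
    return max(
        objectness * class_probs[target_class]
        for row in grid
        for cell in row
        for objectness, class_probs in cell
    )


def is_detected(grid: Grid, target_class: int, threshold: float = 0.5) -> bool:
    """The object counts as detected while the max confidence reaches the threshold."""
    return disappearance_loss(grid, target_class) >= threshold
```

An optimizer would update the perturbation to push `disappearance_loss` down, and could stop as soon as `is_detected` returns False.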

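Creation Attack Loss

Goal: fool the model into recognizing an object that does not exist.

The attack creates physical stickers that can be added to any existing scene. A composite loss function first makes the detector localize the added object (the sticker), and then makes it misclassify that object as the target class.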
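The two-stage idea can be sketched as follows. This is my own simplified reading, not the formula from the paper: the piecewise form, the `box_threshold` parameter, and the function name are assumptions made purely for illustration.

```python
from typing import Sequence


def creation_loss(
    objectness: float,
    class_probs: Sequence[float],
    target_class: int,
    box_threshold: float = 0.5,  # hypothetical threshold for "a box is predicted"
) -> float:
    """Composite loss for one candidate box; lower is better for the attacker."""
    if objectness < box_threshold:
        # Stage 1: no box is predicted yet, so reward higher objectness.
        # The +1 offset keeps stage 1 strictly more costly than stage 2.
        return 1.0 + (1.0 - objectness)
    # Stage 2: a box exists, so push its class toward the target class.
    return 1.0 - class_probs[target_class]
```

With this shape, the loss stays above 1 until the sticker is picked up as an object at all, and only then does the target class probability start to matter.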

Figure 5: Patch created by the Creation Attack, aimed at fooling YOLO v2 into detecting nonexistent Stop signs.

Figure 6: Sample frame from our creation attack video after being processed by YOLO v2. The scene includes 4 adversarial stickers reliably recognized as Stop signs.

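How was it validated?

The attacks were evaluated against YOLOv2 in two separate settings, an indoor lab and an outdoor environment. The success rate was higher indoors: over 85% of video frames in the lab, versus 72.5% (poster) and 63.5% (stickers) outdoors. The poster perturbation was also shown to transfer to Faster R-CNN.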
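The percentages above are per-frame rates. A minimal sketch of how such a rate could be computed, assuming we already have the detector's maximum Stop sign confidence for each frame (the input format, function name, and 0.5 threshold are my assumptions):

```python
from typing import Sequence


def attack_success_rate(max_confidences: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of frames in which the target object goes undetected.

    max_confidences holds one value per video frame: the highest confidence
    the detector assigned to the target class in that frame.
    """
    if not max_confidences:
        raise ValueError("at least one frame is required")
    fooled = sum(1 for confidence in max_confidences if confidence < threshold)
    return fooled / len(max_confidences)
```

For example, `attack_success_rate([0.1, 0.9, 0.3, 0.2])` counts three of four frames as fooled.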

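Any discussion?

This work attacks the detector so that it either fails to localize an object or detects an object that does not exist, but several other attack ideas are possible.
- Keeping the bounding box while changing the class (the same as a Targeted Attack)
- Generating objects that look meaningless to humans but that only the object detector recognizes
How to carry attacks on image classifiers and object detectors over to semantic segmentation remains an open problem.
This work only fools a single object detector. Whether an end-to-end pipeline can be compromised is what determines the real-world impact. Control decisions are generally based on the majority of predictions, so fooling a few frames is not enough; the attack has to fool the majority of frames.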
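To illustrate the last point, here is a small sketch of a majority vote over recent frames. The sliding-window design, the window size, and the function name are my own assumptions for illustration; the paper does not describe a specific pipeline.

```python
from collections import deque
from typing import Iterable, List


def majority_vote_detections(frame_detected: Iterable[bool], window: int = 5) -> List[bool]:
    """For each frame, report whether the object was detected in a strict
    majority of the most recent `window` frames (current frame included)."""
    recent = deque(maxlen=window)  # oldest entries drop out automatically
    decisions = []
    for detected in frame_detected:
        recent.append(bool(detected))
        decisions.append(sum(recent) * 2 > len(recent))
    return decisions
```

With `window=3`, a single missed frame in `[True, True, False, True, True]` leaves every decision True, which is why an attack needs to fool most frames rather than a few.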

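What should I read next?

A method that fools detectors by wearing a T-shirt. I think it also targeted YOLOv2, and I am curious whether YOLOv2 was the target simply because of timing or because it is easier to fool.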


Thank you for reading to the end.

アルゴリズム弱太郎

Twitter @01futabato10
