Deep Neural Networks are "Black Boxes": we don't fully understand their decision boundaries. Adversarial Attacks exploit this by finding tiny, targeted perturbations that push an input across the boundary while it still looks unchanged to a human (sketched below).
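A minimal sketch of the idea using the Fast Gradient Sign Method (FGSM). The model, input, and epsilon here are toy stand-ins, not taken from any real system; with an untrained model the label flip is not guaranteed, the point is the gradient-sign step.

```python
# Sketch: FGSM -- nudge each input value slightly in the direction that
# increases the loss, hoping to push the input across the decision boundary.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
model.eval()

x = torch.rand(1, 1, 28, 28)   # stand-in "image"
y = torch.tensor([3])          # its true label
epsilon = 0.05                 # perturbation budget (kept small)

x.requires_grad_(True)
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

# Step every pixel by +/- epsilon in the direction that raises the loss.
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

print("clean prediction:      ", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```

The perturbation is bounded by epsilon per pixel, which is why adversarial examples can look identical to the original input.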

Model Extraction (Theft)

Companies spend millions of dollars training models (e.g., GPT-4).
Attackers can steal the model by simply querying it.
"Input: A. Output: 1."
"Input: B. Output: 2."
After enough queries, the attacker can train a "Shadow Model" that closely mimics the victim model's behavior, without ever having access to its source code or weights.
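A minimal sketch of the extraction loop, assuming the attacker can only call a prediction endpoint. The endpoint is simulated here by a local query_victim_api function, and all models and data are toys invented for the example.

```python
# Sketch: model extraction. The "victim" is a local stand-in for a remote
# prediction API; a real attacker only ever sees its outputs.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Victim model (its weights would be unknown to a real attacker).
X_secret = rng.normal(size=(500, 4))
y_secret = (X_secret[:, 0] + X_secret[:, 1] > 0).astype(int)
victim = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
victim.fit(X_secret, y_secret)

def query_victim_api(x):
    """Attacker's only access: send an input, get a label back."""
    return victim.predict(x)

# Attacker: send many probe inputs and record the answers...
X_probe = rng.normal(size=(2000, 4))
y_probe = query_victim_api(X_probe)

# ...then train a shadow model on the stolen input/output pairs.
shadow = DecisionTreeClassifier(random_state=0).fit(X_probe, y_probe)

X_test = rng.normal(size=(500, 4))
agreement = (shadow.predict(X_test) == query_victim_api(X_test)).mean()
print(f"shadow model agrees with victim on {agreement:.0%} of new inputs")
```

The agreement score on fresh inputs is how closely the stolen copy tracks the original; the attacker never touches the victim's weights.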

1. Sponge Attacks

A Denial of Service (DoS) for AI.
The attacker feeds inputs that are mathematically crafted to maximize energy consumption and inference latency.
For NLP models, this might be a paragraph with complex nested grammar that forces the Transformer to do far more work per query than ordinary text (see the sketch below).
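A toy illustration of the asymmetry a sponge attack exploits: self-attention cost grows roughly quadratically with sequence length, so an input that turns into far more tokens burns far more compute per query. The encoder and "tokenizer" below are stand-ins, not a real deployed pipeline.

```python
# Sketch: sponge-style cost blow-up on a toy Transformer encoder.
import time
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2).eval()

def fake_tokenize(text):
    # Stand-in tokenizer: one "token" per character, embedded randomly.
    return torch.randn(1, len(text), 128)

def timed_forward(text):
    x = fake_tokenize(text)
    start = time.perf_counter()
    with torch.no_grad():
        encoder(x)
    return time.perf_counter() - start

normal = "please summarise this short paragraph for me. " * 2
sponge = "a" * 4000   # crafted to yield a very long token sequence

print(f"normal input: {timed_forward(normal):.3f}s")
print(f"sponge input: {timed_forward(sponge):.3f}s")
```

The defender pays per query in compute and electricity, so a handful of sponge inputs can cost as much as thousands of normal ones.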

2. Evasion Attacks

Bypassing antivirus/malware detection.
If an AI scans for malware, hackers can append "Good Strings" (from a benign calculator app) to their virus code.
This shifts the file's feature vector far enough that the classifier labels it benign (see the sketch below).
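A toy illustration, assuming a detector that scores the printable strings found in a binary. The detector, training strings, and the "calculator" padding below are all invented for the sketch.

```python
# Sketch: evasion by feature-space padding. A toy string-based "detector"
# is dragged toward the benign class by strings ripped from a calculator app.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: string dumps of known-benign and known-malicious files.
samples = [
    "CreateWindow DrawText add subtract multiply divide sqrt",   # calculator
    "OpenFile SaveFile print page margin font",                  # editor
    "keylog GetAsyncKeyState send_to_c2 encrypt_files ransom",   # malware
    "inject_process disable_defender send_to_c2 keylog",         # malware
]
labels = [0, 0, 1, 1]  # 0 = benign, 1 = malicious

detector = make_pipeline(CountVectorizer(), LogisticRegression())
detector.fit(samples, labels)

virus = "keylog send_to_c2 encrypt_files"
# Attacker appends harmless strings lifted from the calculator app.
padded = virus + " " + "add subtract multiply divide sqrt DrawText " * 50

for name, blob in [("original virus", virus), ("padded virus", padded)]:
    p_malicious = detector.predict_proba([blob])[0][1]
    print(f"{name}: P(malicious) = {p_malicious:.2f}")
```

The malicious strings are still present; the padding simply drowns them out in the feature space the detector actually looks at.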