Image Scaling Exposes AI Systems To Hidden Prompt Attacks
As artificial intelligence tools for image generation and processing become more integrated into everyday workflows, robust security measures matter more than ever. Security researchers have now identified a new attack vector that exploits AI systems for malicious purposes such as data exfiltration. The technique combines a known weakness, image scaling attacks, with prompt injection to execute harmful commands covertly.
How the Image Scaling Attack Works
Researchers at the cybersecurity firm Trail of Bits have detailed how prompt injection can be coupled with image scaling to compromise AI tools. This technique can trigger actions ranging from opening an application to stealing sensitive data, all while remaining hidden from the user.
The foundation of this method lies in image scaling attacks, a concept first introduced by German researchers in 2020. AI systems typically downscale large images to process them more efficiently. Attackers can exploit this resizing process to alter how the model perceives the image. The Trail of Bits team has now weaponized this for prompt injection.
(Image source: Trail of Bits)
In this attack, a malicious prompt is embedded within an image in such a way that it is invisible at the image's full resolution. However, when an AI system automatically downscales the image, the change in resolution reveals the hidden text to the model. The AI then interprets this text as a direct instruction and executes the embedded command without the user’s awareness.
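To make the mechanism concrete, the sketch below simulates the idea with a toy nearest-neighbour downscaler: the attacker writes a payload only into the pixels that the downscaler will sample, so the payload is faint at full resolution but becomes the entire downscaled image. This is a simplified illustration of the principle rather than the researchers' exact method, which targets the bicubic and bilinear interpolation kernels used by real pipelines.

```python
# Toy illustration of an image-scaling attack: the payload is written only
# into the pixels a nearest-neighbour downscaler will sample, so it is nearly
# invisible at full resolution but dominates the downscaled result.
import numpy as np

SCALE = 4  # the pipeline shrinks each image dimension by this factor

def downscale_nearest(img: np.ndarray, scale: int = SCALE) -> np.ndarray:
    """Toy downscaler: keep only the top-left pixel of every scale x scale block."""
    return img[::scale, ::scale]

def embed_payload(decoy: np.ndarray, payload: np.ndarray, scale: int = SCALE) -> np.ndarray:
    """Overwrite just the pixels the downscaler will sample with the payload."""
    crafted = decoy.copy()
    crafted[::scale, ::scale] = payload  # touches 1 pixel in every scale*scale block
    return crafted

# Decoy: a bland light-grey image. Payload: dark stripes standing in for hidden text.
decoy = np.full((256, 256), 200, dtype=np.uint8)
stripes = np.where(np.arange(64)[:, None] % 8 < 4, 30, 200).astype(np.uint8)
payload = np.broadcast_to(stripes, (64, 64)).copy()

crafted = embed_payload(decoy, payload)
seen_by_model = downscale_nearest(crafted)

# Only a few percent of the full-resolution pixels change, yet the downscaled
# image the model "sees" is exactly the payload the attacker chose.
print("fraction of pixels modified:", (crafted != decoy).mean())
print("downscaled image equals payload:", np.array_equal(seen_by_model, payload))
```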
To prove the concept, the researchers demonstrated an attack on the Gemini CLI. They uploaded an image with a hidden prompt designed to exfiltrate a user's Google Calendar data to an external email address. You can read the full technical details in their comprehensive post.
Widespread Vulnerability Across Major AI Platforms
The researchers warn that this attack is not isolated to a single system. With minor modifications, it can be effective against a wide range of popular AI platforms, including:
- Gemini’s web interface
- Gemini’s API
- Vertex AI with a Gemini back end
- Google Assistant on Android
- Genspark
To help others test for this vulnerability, the team has released an open-source tool called "Anamorpher" on GitHub. This Python-based tool lets users craft images designed for multimodal prompt injection and visualize how they appear once downscaled.
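For readers who want to inspect the effect of different resampling algorithms on their own images, the snippet below offers a plain-Pillow sketch of that kind of visualization (it is not Anamorpher's actual interface, and the target size is an assumption): it saves one downscaled preview per common filter for side-by-side comparison.

```python
# Preview how an image will look after downscaling with several common
# resampling filters. Requires Pillow 9.1+ for the Image.Resampling enum.
from PIL import Image

TARGET = (256, 256)  # assumed size an AI pipeline might downscale to

FILTERS = {
    "nearest": Image.Resampling.NEAREST,
    "bilinear": Image.Resampling.BILINEAR,
    "bicubic": Image.Resampling.BICUBIC,
    "lanczos": Image.Resampling.LANCZOS,
}

def preview_downscales(path: str) -> None:
    """Save one downscaled copy per resampling filter for side-by-side review."""
    original = Image.open(path)
    for name, resample in FILTERS.items():
        original.resize(TARGET, resample=resample).save(f"preview_{name}.png")

if __name__ == "__main__":
    preview_downscales("crafted_image.png")  # hypothetical input file
```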
Recommended Safeguards and Mitigations
Simply restricting certain downscaling algorithms is not an effective solution due to the broad nature of the attack vector. Instead, the researchers propose several more robust mitigation strategies.
One recommendation is to limit the dimensions of uploaded images and, where possible, avoid downscaling them altogether. Furthermore, providing users with an exact preview of the image as the model will see it could help reveal hidden prompts that are not visible in the full-size version.
Most importantly, the researchers urge developers to implement stronger defenses against multimodal prompt injection. A key strategy is to require mandatory user confirmation before the AI model executes any instruction it interprets from text found within an image.
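One possible shape for that confirmation step, sketched here with hypothetical names rather than any specific framework's API, is a gate that holds back any tool call proposed while image content is in the model's context until a human explicitly approves it.

```python
# Hypothetical sketch of the "explicit confirmation" mitigation: image-derived
# tool calls are blocked until a human approves them. Names and structure are
# illustrative, not taken from any real agent framework.
from dataclasses import dataclass

@dataclass
class ProposedToolCall:
    tool_name: str
    arguments: dict
    derived_from_image: bool  # set by the pipeline when image content is in the model's context

def run_tool(name: str, args: dict) -> None:
    """Placeholder executor; a real agent framework would dispatch the call here."""
    print(f"executing {name}({args})")

def execute_with_confirmation(call: ProposedToolCall) -> None:
    """Hold image-derived tool calls until the user explicitly approves them."""
    if call.derived_from_image:
        print(f"The model wants to run {call.tool_name} with {call.arguments}")
        if input("Allow this action? [y/N] ").strip().lower() != "y":
            print("Action blocked.")
            return
    run_tool(call.tool_name, call.arguments)
```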