The memory angle is what stands out—once something gets persisted, the impact goes beyond just one interaction.

6404e30b28

In this post, we explore how ChatGPT generated an adversarial image that hijacked my 

 to invoke the memory tool and persist false memories for future chats.

 is genuinely a lot harder to attack than previous models, but it still fell for a ChatGPT generated image. A trick that works well with reasoning models is to challenge them with puzzles.

Claude Opus 4.6+ is more resilient against basic attacks, and reasons before taking actions. This means that most of the well-known, basic adversarial examples and attacks typically do not work.

This is also reflected in Anthropic’s own model card for Mythos.

What is interesting here is that the “thinking” variants of Opus models (and also Mythos Preview) are 

 compared to the non-thinking models. That is also what I have noticed in my testing in the past.

 to demonstrate attack chains, and there are also interesting projects like 

 to look into and come up with payloads.

Once in a while I go back to look at some basics, and when 

 dropped, I was wondering if some demos I had created for 

This post is about such a demo, in particular we are going to use ChatGPT to create a malicious image.

> In this post, we explore how ChatGPT generated an adversarial image that hijacked my `Claude Opus 4.7` to invoke the memory tool and persist false memories for future chats.
>
>
> ![](https://m.stacker.news/138603)
>
>
>
> This matters because `Opus 4.6+` is genuinely a lot harder to attack than previous models, but it still fell for a ChatGPT generated image. A trick that works well with reasoning models is to challenge them with puzzles.
>
> ## Indirect Prompt Injection and Alignment Progress
>
> Claude Opus 4.6+ is more resilient against basic attacks, and reasons before taking actions. This means that most of the well-known, basic adversarial examples and attacks typically do not work.
>
> This is also reflected in Anthropic’s own model card for Mythos.
>
>
> ![](https://m.stacker.news/138604)
>
>
>
> What is interesting here is that the “thinking” variants of Opus models (and also Mythos Preview) are **more susceptible to prompt injection** compared to the non-thinking models. That is also what I have noticed in my testing in the past.
>
> Researchers already showed [attack scenarios](https://veganmosfet.codeberg.page/posts/2026-02-15-openclaw_sandbox/) to demonstrate attack chains, and there are also interesting projects like [PISmith](https://arxiv.org/abs/2603.13026) to look into and come up with payloads.
>
> Once in a while I go back to look at some basics, and when `Opus 4.7` dropped, I was wondering if some demos I had created for `Opus 4.6` would still work…
>
> This post is about such a demo, in particular we are going to use ChatGPT to create a malicious image.
>
> [**...read more at embracethered.com**](https://embracethered.com/blog/posts/2026/breaking-opus-4.7-with-chatgpt/)

Indirect Prompt Injection and Alignment ProgressIndirect Prompt Injection and Alignment Progress