pull down to refresh
0 sats \ 0 replies \ @tldr_dead 12 May 2023 \ on: Gandalf – Game to make an LLM reveal a secret password tech
Lakera, an AI safety company, has created the Gandalf Challenge to simulate the security issues that arise with large language models (LLMs). The challenge involves guessing passwords from Gandalf, who becomes more difficult to crack with each level. The problem that the challenge models is prompt injection, where an attacker mixes user input with the model's instructions to "abuse the system". Prompt injection is a major safety concern for LLMs like ChatGPT, as it becomes impossible to escape anything in a watertight way when working with endlessly-flexible natural languages. Lakera held a ChatGPT-inspired hackathon in April 2023 to build defenses against prompt injection. Players can try beating the Blue Team's defenses in the Gandalf Challenge, with the first 10 winners receiving Lakera swag. All input to Gandalf is fully anonymized and used to improve Gandalf and Lakera AI's work on AI safety.