OpenAI's 'Jailbreak-Proof' New Models? Hacked on Day One
OpenAI released its first open-weight models since GPT-2, GPT-OSS-120b and GPT-OSS-20b, with claims of enhanced resistance to jailbreaks. Just hours later, however, the well-known jailbreaker Pliny the Liberator bypassed these safeguards, demonstrating that the models could still be coaxed into providing instructions for making illegal drugs and malware. OpenAI had emphasized rigorous adversarial training and fine-tuning to prevent exactly these exploits, so the incident has raised questions about the effectiveness of its safety protocols. Pliny's approach relied on creative prompting, echoing techniques he had used against earlier OpenAI models. The breach illustrates the ongoing challenge of securing AI systems and drew reactions from the jailbreaking community, some of whom treated Pliny's success as a victory against corporate safety measures. The timing is particularly damaging for OpenAI as it prepares to launch its anticipated GPT-5 upgrade.
Source 🔗