OpenAI's 'Jailbreak-Proof' New Models? Hacked on Day One
OpenAI released its first open-weight models since GPT-2, GPT-OSS-120b and GPT-OSS-20b, with claims of enhanced resistance to jailbreaks. Just hours later, however, the well-known jailbreaker Pliny the Liberator bypassed these safeguards, demonstrating that the models could still be coaxed into providing instructions for making illegal drugs and malware. OpenAI had emphasized rigorous adversarial training and fine-tuning to prevent exactly these exploits, so the incident has raised questions about the effectiveness of its safety protocols. Pliny's approach relied on creative prompting, echoing techniques he had used against earlier OpenAI models. The breach illustrates the ongoing challenge of securing AI systems and drew reactions from the jailbreaking community, some of whom treated Pliny's success as a victory against corporate safety measures. The timing is particularly damaging for OpenAI as it prepares to launch its anticipated GPT-5 upgrade.
Source 🔗