By Alan Mislove, OSTP Assistant Director for Data and Democracy
President Biden has been clear: To seize the benefits of artificial intelligence (AI), we must first manage its risks. The Biden-Harris Administration has been working to ensure that AI is developed safely and responsibly to protect Americans from harm and discrimination. Earlier this month, leading AI companies provided their large language models (LLMs) for the first-ever public “red-teaming” assessment event. The event was held at the AI Village during DEF CON 31, one of the world’s leading cybersecurity conferences, and organized by AI Village, SeedAI, Humane Intelligence, and their partners. Thousands of conference attendees participated—including participants from community organizations and 16 community colleges—poking and prodding the LLMs to see if they could make the systems produce undesirable outputs or otherwise fail, with the goal of better understanding the risks that these systems present.
The Administration announced this event in May, when President Biden and Vice President Harris convened some of the leading AI companies at the White House, and in the intervening months, OSTP provided technical expertise to the event organizers to help design the challenge. The event was based on the risks outlined in the Administration’s Blueprint for an AI Bill of Rights and the AI Risk Management Framework, and it aligns with the recent voluntary commitments by AI companies, announced by the President, to mitigate the risks of AI, including external security testing by independent experts.
The results of the event are still being analyzed, and the organizers will provide the AI companies with the data so they can improve their LLMs. Later, the organizers will provide a full report and hope to make data available to approved researchers.
In the meantime, there are a few immediate takeaways. The event demonstrated how external red-teaming can be an effective tool to identify novel AI risks, not only for safety and security, but also for other key AI risks including bias, discrimination, and privacy. These systems must be continuously tested by groups of people with different backgrounds and expertise to identify and mitigate unsafe or discriminatory behavior. External red-teaming has long been used in other areas like cybersecurity to identify novel risks, and this event demonstrated that it can successfully be adapted at scale to help identify risks for AI systems as well.
The AI Village event serves as a prototype for future LLM red-teaming events, and much can be learned from the participants’ attempts at the various challenges. The data from the event will help researchers better understand how participants approached the challenges, and how effectively the mitigations the companies put in place prevented the LLMs from producing undesirable outputs.
More broadly, the event helped to create norms for continuous, external red-teaming of LLMs for risks to rights and safety, which will be a key method for increasing transparency and accountability for AI companies. We hope that this event and others like it will lead to more familiarity with “red-teaming” of LLMs, help establish a robust ecosystem for red-teaming LLMs in the cybersecurity community and beyond, and ultimately provide critical information to the public about the impacts of these systems.