We are excited to announce the release of our latest AI audit report, which focuses on a pretrained large language model called RoBERTa. Check out the full report here.
Large Language Models (LLMs) like RoBERTa have been in the press a lot recently, thanks to OpenAI's release of ChatGPT. These models are tremendously powerful, but also concerning, in part because of their potential to generate offensive, stereotyped, and racist text. Since LLMs are trained on extremely large text datasets scraped from the internet, it is difficult to know how they will perform in a specific context, or to anticipate undesirable biases in model output.
Our report describes several concerns we uncovered while auditing this model, including:
- Two security vulnerabilities in Jupyter Notebook and Jupyter Server (both of which have since been patched).
- RoBERTa's proclivity to assign negative sentiment to rare names (a minimal probing sketch follows this list).
- A backdoor we discovered, based on a subword from Saisiyat, a language spoken in Taiwan. (We will present our paper on this work next week at the 5th Workshop on Big Data for CyberSecurity at the 2022 IEEE International Conference on Big Data.)
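To give a flavor of the kind of behavioral probing these findings call for (this is not the report's actual test harness), the sketch below scores templated sentences that differ only in the name they contain, using a publicly available RoBERTa-based sentiment checkpoint. The model ID, template, and names here are placeholders chosen for illustration.

```python
from transformers import pipeline

# Illustrative RoBERTa-based sentiment checkpoint from the Hugging Face Hub;
# the model actually audited in the report may differ.
classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment",
)

# Sentences are identical except for the name, so any score differences
# reflect the model's reaction to the name itself.
template = "{} ordered a coffee this morning."
names = ["Emily", "Michael", "Qillaq", "Saghani"]  # placeholder common vs. rare names

for name in names:
    result = classifier(template.format(name))[0]
    print(f"{name:>10}  {result['label']}  {result['score']:.3f}")
```

Scaling this idea up, running many templates over long lists of names rather than a handful, is what surfaces systematic patterns like the negative-sentiment skew toward rare names noted above.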
We also describe how we did the audit, building on the methodology we developed while auditing the deepfake detection tool, FakeFinder. For example:
- We mined the AI Incident Database for previous failures of LLMs and used these incidents to help us construct an ethical matrix for RoBERTa.
- We worked with BNH.AI to define general categories of bias and create a high-level bias testing plan.
For more information on IQT Labs AI audits, check out "Interrogating RoBERTa: Inside the challenge of learning to audit AI models and tools," or get in touch with us by emailing labsinfo@iqt.org.