GenAI
NLP
Findings of ACL

Realistic Evaluation of Toxicity in Large Language Models

June 28, 2024

Large language models (LLMs) have become integral to our professional workflows and daily lives. Nevertheless, these machine companions have a critical flaw: the huge amount of data that endows them with vast and diverse knowledge also exposes them to inevitable toxicity and bias. While most LLMs incorporate defense mechanisms to prevent the generation of harmful content, these safeguards can be easily bypassed with minimal prompt engineering. In this paper, we introduce the new Thoroughly Engineered Toxicity (TET) dataset, comprising manually crafted prompts designed to nullify the protective layers of such models. Through extensive evaluations, we demonstrate the pivotal role of TET in providing a rigorous benchmark for evaluating toxicity awareness in several popular LLMs: it highlights toxicity in LLMs that might remain hidden under normal prompts, thus revealing subtler issues in their behavior.

Tinh Son Luong, Thanh-Thien Le, Linh Van Ngo, and Thien Huu Nguyen

ACL-Findings

Related publications

GenAI
CV
NeurIPS
November 28, 2024

Hao Phung*, Quan Dao*, Trung Dao, Viet Hoang Phan, Dimitris N. Metaxas, Anh Tran

GenAI
ML
NeurIPS
November 28, 2024

Long Tung Vuong, Anh Tuan Bui, Khanh Doan, Trung Le, Paul Montague, Tamas Abraham, Dinh Phung

GenAI
ML
NeurIPS
November 28, 2024

Minh Le, An Nguyen, Huy Nguyen, Trang Nguyen, Trang Pham, Linh Van Ngo, Nhat Ho

GenAI
NLP
EMNLP
November 28, 2024

Quyen Tran*, Nguyen Xuan Thanh*, Nguyen Hoang Anh*, Nam Le Hai, Trung Le, Linh Van Ngo, Thien Huu Nguyen