Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation

October 4, 2023

Preparing training data for deep vision models is a labor-intensive task. To address this, generative models have emerged as an effective solution for generating synthetic data. While current generative models produce image-level category labels, we propose a novel method for generating pixel-level semantic segmentation labels using the text-to-image generative model Stable Diffusion (SD). By utilizing the text prompts, cross-attention, and self-attention of SD, we introduce three new techniques: class-prompt appending, class-prompt cross-attention, and self-attention exponentiation. These techniques enable us to generate segmentation maps corresponding to synthetic images. These maps serve as pseudo-labels for training semantic segmenters, eliminating the need for labor-intensive pixel-wise annotation. To account for the imperfections in our pseudo-labels, we incorporate uncertainty regions into the segmentation, allowing us to disregard loss from those regions. We conduct evaluations on two datasets, PASCAL VOC and MSCOCO, and our approach significantly outperforms concurrent work. Our benchmarks and code will be released at https://github.com/VinAIResearch/Dataset-Diffusion.
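As a rough illustration of the last two ideas in the abstract, the sketch below refines a class attention map with an exponentiated self-attention matrix and then marks low-confidence pixels as uncertain so that the loss skips them. The abstract does not give the formulas, so the matrix power of 4, the min-max normalization, the two thresholds, and the ignore value 255 are all assumptions chosen for illustration. The function names are hypothetical, and plain NumPy arrays stand in for attention maps that would in practice be extracted from Stable Diffusion.

```python
import numpy as np

IGNORE_INDEX = 255  # assumed label for uncertain pixels, skipped by the loss


def refine_cross_attention(self_attn, cross_attn, power=4):
    """Propagate class scores through an exponentiated self-attention matrix.

    self_attn:  (HW, HW) pixel-to-pixel attention.
    cross_attn: (HW, C) pixel-to-class attention for the C prompted classes.
    The power and the normalization below are illustrative assumptions.
    """
    refined = np.linalg.matrix_power(self_attn, power) @ cross_attn
    # Min-max normalize each class column to [0, 1].
    lo = refined.min(axis=0, keepdims=True)
    hi = refined.max(axis=0, keepdims=True)
    return (refined - lo) / np.maximum(hi - lo, 1e-8)


def to_pseudo_label(scores, height, width, low=0.3, high=0.6):
    """Convert per-class scores to a label map (0 = background, 1..C = classes).

    Pixels whose best score falls in [low, high) are marked IGNORE_INDEX.
    Both thresholds are assumed values, not taken from the source.
    """
    best = scores.max(axis=1)
    label = scores.argmax(axis=1) + 1
    label[best < low] = 0
    label[(best >= low) & (best < high)] = IGNORE_INDEX
    return label.reshape(height, width).astype(np.uint8)


def masked_cross_entropy(logits, label):
    """Mean cross-entropy over pixels whose label is not IGNORE_INDEX.

    logits: (H, W, K) segmenter outputs, label: (H, W) pseudo-label map.
    """
    valid = label != IGNORE_INDEX
    if not valid.any():
        return 0.0
    z = logits[valid]                       # (N, K) logits of the kept pixels
    z = z - z.max(axis=1, keepdims=True)    # stabilize the softmax
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    target = label[valid].astype(np.int64)
    return float(-log_prob[np.arange(z.shape[0]), target].mean())
```

In a PyTorch training loop the same masking is usually obtained by passing `ignore_index=255` to `torch.nn.CrossEntropyLoss`, which is why a single reserved label value is a convenient way to encode the uncertain regions.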


Quang Nguyen, Vu Tuan Truong, Anh Tran, Khoi Nguyen

NeurIPS 2023


