NLP
NAACL Findings

Cross-Lingual Summarization with Pseudo-Label Regularization

November 28, 2024

Cross-Lingual Summarization (XLS) aims to summarize a document in the source language into a condensed version in the target language, effectively removing language barriers for non-native readers. Previous approaches, however, share a common limitation: only a single reference (gold summary) is exploited during model training, which exposes the base model to an underrepresented hypothesis space, since the actual number of possible hypotheses is exponentially large. To alleviate this problem, we present a study that adopts pseudo-labels to regularize standard cross-lingual summarization training. Through experiments involving 8 diverse languages from different families, we investigate the components that lead to gains under regularization training. Conclusively, we show that pseudo-labeling is a simple and effective approach that significantly improves over standard gold-reference training in XLS.
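The sketch below illustrates the general idea of pseudo-label regularization in sequence-to-sequence training: alongside the usual loss on the single gold summary, the model is also trained toward a pseudo-label summary for the same source document. It assumes a Hugging Face-style encoder-decoder model that returns a `.loss` when given `labels`; the specific weighting scheme, the `alpha` parameter, and the batch field names (`src_ids`, `gold_ids`, `pseudo_ids`) are illustrative assumptions, not the paper's exact formulation.

```python
def xls_regularized_loss(model, batch, alpha=0.5):
    """Combine the gold-reference loss with a pseudo-label regularization term.

    Assumed batch fields (hypothetical names):
      src_ids    - source-language document token ids
      src_mask   - attention mask for the source document
      gold_ids   - target-language gold summary token ids
      pseudo_ids - target-language pseudo-label summary token ids
                   (e.g. produced by a teacher model or prior checkpoint)
    """
    # Standard maximum-likelihood loss on the single gold reference.
    gold_loss = model(
        input_ids=batch["src_ids"],
        attention_mask=batch["src_mask"],
        labels=batch["gold_ids"],
    ).loss

    # Regularization term: the same objective applied to a pseudo-label,
    # widening the set of hypotheses the model is trained toward.
    pseudo_loss = model(
        input_ids=batch["src_ids"],
        attention_mask=batch["src_mask"],
        labels=batch["pseudo_ids"],
    ).loss

    # alpha trades off fidelity to the gold summary against pseudo-label coverage.
    return (1 - alpha) * gold_loss + alpha * pseudo_loss
```

In practice, such a loss would simply replace the standard gold-only cross-entropy inside an existing training loop; how the pseudo-labels are generated and filtered is the substantive design choice studied in the paper.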


Thang Le

