
WIKIGENBENCH: Exploring Full-length Wikipedia Generation under Real-World Scenario

Jiebin Zhang, Eugene J. Yu, Qinyu Chen, Chenhao Xiong, Dawei Zhu, Han Qian, Mingbo Song, Weimin Xiong, Xiaoguang Li, Qun Liu, Sujian Li

International Conference on Computational Linguistics (2025)

Peking University School of Computer Science

Abstract
Generating comprehensive and accurate Wikipedia articles for newly emerging events under a real-world scenario presents significant challenges. Existing attempts fall short either by focusing only on short snippets or by using metrics that are insufficient to evaluate real-world scenarios. In this paper, we construct WIKIGENBENCH, a new benchmark consisting of 1,320 entries, designed to align with real-world scenarios in both generation and evaluation. For generation, we explore a real-world scenario where structured, full-length Wikipedia articles with citations are generated for new events using input documents from web sources. For evaluation, we integrate systematic metrics and LLM-based metrics to assess verifiability, organization, and other aspects aligned with real-world scenarios. Based on this benchmark, we conduct extensive experiments using various models within three commonly used frameworks: direct RAG, hierarchical structure-based RAG, and RAG with a fine-tuned generation model. Experimental results show that hierarchical structure-based methods can generate more comprehensive content, while fine-tuned methods achieve better verifiability. However, even the best methods still show a significant gap compared to existing Wikipedia content, indicating that further research is necessary.
Chat Paper

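[Key points]: This paper proposes WIKIGENBENCH, a new benchmark dataset for generating comprehensive and accurate Wikipedia articles under real-world scenarios, and explores how different model frameworks perform in generation and evaluation.

[Method]: The authors construct the WIKIGENBENCH dataset of 1,320 entries, which simulates real-world scenarios by requiring complete Wikipedia articles with citations, and they evaluate outputs with both systematic metrics and large language model-based metrics.

[Experiments]: The authors conduct extensive experiments on WIKIGENBENCH using various models within three commonly used frameworks: direct RAG, hierarchical structure-based RAG, and RAG with a fine-tuned generation model. The results show that hierarchical structure-based methods generate more comprehensive content, while fine-tuned methods achieve better verifiability.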