WIKIGENBENCH: Exploring Full-length Wikipedia Generation under Real-World Scenario

Jiebin Zhang, Eugene J. Yu, Qinyu Chen, Chenhao Xiong, Dawei Zhu, Han Qian, Mingbo Song, Weimin Xiong, Xiaoguang Li, Qun Liu, Sujian Li

International Conference on Computational Linguistics (2025)

Peking University School of Computer Science


Abstract

Generating comprehensive and accurate Wikipedia articles for newly emerging events under a real-world scenario presents significant challenges. Existing attempts fall short either by focusing only on short snippets or by using metrics that are insufficient for evaluating real-world scenarios. In this paper, we construct WIKIGENBENCH, a new benchmark of 1,320 entries designed to align with real-world scenarios in both generation and evaluation. For generation, we explore a real-world scenario in which structured, full-length Wikipedia articles with citations are generated for new events from input documents collected from web sources. For evaluation, we integrate systematic metrics and LLM-based metrics to assess verifiability, organization, and other aspects aligned with real-world scenarios. Based on this benchmark, we conduct extensive experiments with various models under three commonly used frameworks: direct RAG, hierarchical structure-based RAG, and RAG with a fine-tuned generation model. Experimental results show that hierarchical methods generate more comprehensive content, while fine-tuned methods achieve better verifiability. However, even the best methods still show a significant gap compared with existing Wikipedia content, indicating that further research is needed.
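To make the difference between the first two frameworks concrete, here is a minimal sketch, assuming a generic text-in/text-out model and a made-up prompt format. It is not the paper's implementation: the names `direct_rag`, `hierarchical_rag`, `format_docs`, and `fake_llm` are hypothetical. Direct RAG writes the whole article from one prompt, while the hierarchical variant first asks for an outline and then writes each section separately. Real systems may also retrieve different documents per section; this sketch reuses every document for simplicity.

```python
from typing import Callable, List

# Hypothetical model interface: any function mapping a prompt to generated text.
LLM = Callable[[str], str]


def format_docs(docs: List[str]) -> str:
    """Number the input documents so the model can cite them as [1], [2], ..."""
    return "\n".join(f"[{i}] {d}" for i, d in enumerate(docs, start=1))


def direct_rag(event: str, docs: List[str], llm: LLM) -> str:
    """Direct RAG: a single prompt with all documents yields the whole article."""
    prompt = (
        f"Write a Wikipedia article about {event} with citations.\n"
        f"Documents:\n{format_docs(docs)}"
    )
    return llm(prompt)


def hierarchical_rag(event: str, docs: List[str], llm: LLM) -> str:
    """Outline-first RAG: generate section headings, then write each section."""
    outline_prompt = (
        f"List section headings, one per line, for a Wikipedia article about {event}.\n"
        f"Documents:\n{format_docs(docs)}"
    )
    headings = [h.strip() for h in llm(outline_prompt).splitlines() if h.strip()]
    sections = []
    for heading in headings:
        section_prompt = (
            f"Write the '{heading}' section of a Wikipedia article about {event} "
            f"with citations.\nDocuments:\n{format_docs(docs)}"
        )
        sections.append(f"== {heading} ==\n{llm(section_prompt)}")
    return "\n\n".join(sections)


# Usage with a hypothetical stub model instead of a real LLM.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("List section headings"):
        return "Background\nAftermath"
    return "Text citing [1]."


print(hierarchical_rag("Example event", ["doc a", "doc b"], fake_llm))
```

The outline step is what gives the hierarchical variant explicit section structure, which is one plausible reason such methods can cover more aspects of an event than a single-pass prompt.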




Chat Paper

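[Key Points]: This paper proposes WIKIGENBENCH, a new benchmark dataset for generating comprehensive and accurate Wikipedia articles under real-world scenarios, and examines how different model frameworks perform in generation and evaluation.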

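[Method]: The authors construct WIKIGENBENCH, a dataset of 1,320 entries that simulates real-world scenarios and requires generating complete Wikipedia articles with citations; outputs are evaluated with both systematic metrics and metrics based on large language models.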

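[Experiments]: The authors run extensive experiments on WIKIGENBENCH with various models under three commonly used frameworks: direct RAG, hierarchical structure-based RAG, and RAG with a fine-tuned generation model. The results show that hierarchical methods produce more comprehensive content, while fine-tuned methods perform better on verifiability.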
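As an illustration of what a citation-based verifiability check can look like, here is a minimal sketch. It is an assumption, not the benchmark's actual metric definition: `verifiability`, `citation_coverage`, `citation_precision`, and the toy `overlap_judge` are hypothetical names. The sketch splits an article into sentences, reads `[n]` citation markers, and reports (a) the fraction of sentences carrying at least one citation and (b) the fraction of citations whose document a pluggable judge accepts as supporting the sentence. In practice the judge would more likely be an NLI model or an LLM; the word-overlap judge here only keeps the example self-contained.

```python
import re
from typing import Callable, Dict, List

# Hypothetical judge: returns True if the document supports the claim.
Judge = Callable[[str, str], bool]

CITATION = re.compile(r"\[(\d+)\]")


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def verifiability(article: str, docs: List[str], supports: Judge) -> Dict[str, float]:
    sentences = split_sentences(article)
    cited_sentences = total_citations = supported_citations = 0
    for sentence in sentences:
        ids = [int(m) for m in CITATION.findall(sentence)]
        if ids:
            cited_sentences += 1
        claim = CITATION.sub("", sentence).strip()  # drop markers before judging
        for i in ids:
            total_citations += 1
            # Citations are 1-based; out-of-range ids count as unsupported.
            if 1 <= i <= len(docs) and supports(claim, docs[i - 1]):
                supported_citations += 1
    return {
        "citation_coverage": cited_sentences / len(sentences) if sentences else 0.0,
        "citation_precision": (
            supported_citations / total_citations if total_citations else 0.0
        ),
    }


def overlap_judge(claim: str, doc: str) -> bool:
    """Toy judge: every word of the claim also appears in the document."""
    words = set(re.findall(r"\w+", claim.lower()))
    return words <= set(re.findall(r"\w+", doc.lower()))


docs = ["The storm hit the coast in May.", "Officials reported damage."]
article = "The storm hit the coast [1]. Officials reported damage [2]. It was large."
print(verifiability(article, docs, overlap_judge))
```

Here two of three sentences carry a citation and both citations are supported, so coverage is 2/3 and precision is 1.0. Separating coverage from precision matters because a model can score well on one while citing carelessly on the other.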
