GneissWeb: Preparing High Quality Data for LLMs at Scale
Hajar Emami Gohari, Swanand Ravindra Kadhe, Syed Yousaf Shah, Constantin Adam, Abdulhamid Adebayo, Praneet Adusumilli, Farhan Ahmed, Nathalie Baracaldo Angel, Santosh Borse, Yuan-Chi Chang, Xuan-Hong Dang, Nirmit Desai, Ravital Eres, Ran Iwamoto, Alexei Karve, Yan Koyfman, Wei-Han Lee, Changchang Liu, Boris Lublinsky, Takuyo Ohko, Pablo Pesce, Maroun Touma, Shiqiang Wang, Shalisha Witherspoon, Herbert Woisetschlager, David Wood, Kun-Lung Wu, Issei Yoshida, Syed Zawad, Petros Zerfos, Yi Zhou, Bishwaranjan Bhattacharjee. CoRR (2025)
