LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang
ECCV 2024 (2024)
Citations: 60 | Views: 48
Keywords: Visual Question Answering, Image Captioning