Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

CVPR 2024 (2024)

Cited 150 | Views 130
Keywords
Multimodal Model, Benchmark, Natural Language, Autoregressive Model, Bounding Box, Image Generation, Tokenized, Robot Manipulator, Natural Language Understanding, Denoising, Input Image, Object Detection, Data Augmentation, Diffusion Model, Patch Size, Changes In Architecture, Line Of Work, Language Model, Efficient Implementation, Input Modalities, Vision Transformer, Text Generation, Pre-training Data, Input Text, Audio Segments, Special Token, View Synthesis, Text Output, Sparse Structure, Output Image