A Vision Check-up for Language Models
CVPR 2024 (2024)
Massachusetts Institute of Technology
Abstract
What does learning to model relationships between strings teach large language models (LLMs) about the visual world? We systematically evaluate LLMs' abilities to generate and recognize an assortment of visual concepts of increasing complexity and then demonstrate how a preliminary visual representation learning system can be trained using models of text. As language models lack the ability to consume or output visual information as pixels, we use code to represent images in our study. Although LLM-generated images do not look like natural images, results on image generation and the ability of models to correct these generated images indicate that precise modeling of strings can teach language models about numerous aspects of the visual world. Furthermore, experiments on self-supervised visual representation learning, utilizing images generated with text models, highlight the potential to train vision models capable of making semantic assessments of natural images using just LLMs.
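As a rough illustration of what "using code to represent images" can mean, the sketch below describes a tiny scene entirely as text (SVG source built in Python) that a renderer could later turn into pixels. The function name `draw_scene`, the choice of SVG, and the shapes are assumptions made for this example only; the abstract does not say which programming languages or rendering tools the study uses.

```python
def draw_scene(width: int = 64, height: int = 64) -> str:
    """Return SVG source for a hypothetical scene: a red circle above a blue rectangle.

    The image exists only as a string here; no pixels are produced
    until some external renderer rasterizes the SVG.
    """
    shapes = [
        '<rect x="16" y="32" width="32" height="24" fill="blue"/>',
        '<circle cx="32" cy="16" r="10" fill="red"/>',
    ]
    body = "\n  ".join(shapes)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">\n'
        f"  {body}\n"
        "</svg>"
    )


# A text-only model could emit (or edit) a string like this one.
svg_source = draw_scene()
print(svg_source)
```

Because the scene is plain text, a model that only reads and writes strings can in principle produce it, inspect it, or revise it, which is the property the abstract relies on.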
Key words
Language Model, Visual System, Ability Of The Model, Natural Images, Representation Learning, Image Generation, Capability Of Model, Visual Model, Visual Learning, Visual World, Visual Concepts, Natural Language, Programming Language, Text Data, Evaluation Protocol, Visual Scene, Visual Capabilities, Concept Of Perception, Fréchet Inception Distance, Visual Hierarchy, Fidelity Scores, Rounds Of Feedback