StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

ICML 2024 (2024)

Soochow University

Cited 12 | Views 135
Abstract
To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, while disrupting the model's ability to capture the true semantic representation of visual scenes. This paper posits that an alternative representation of images, vector graphics, can effectively surmount this limitation by enabling a more natural and semantically coherent segmentation of the image information. Thus, we introduce StrokeNUWA, a pioneering work exploring a better visual representation "stroke tokens" on vector graphics, which is inherently visual semantics rich, naturally compatible with LLMs, and highly compressed. Equipped with stroke tokens, StrokeNUWA can significantly surpass traditional LLM-based and optimization-based methods across various metrics in the vector graphic generation task. Besides, StrokeNUWA achieves up to a 94x speedup in inference over the speed of prior methods with an exceptional SVG code compression ratio of 6.9
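
As a rough illustration of why vector graphics segment more naturally than a pixel grid, the sketch below splits an SVG path string into one segment per drawing command. This is not the StrokeNUWA tokenizer, which the abstract does not describe; the helper name `split_path` and the restriction to the M, L, C, Q and Z commands are assumptions made only for this example.

```python
import re

# Hypothetical helper for illustration only (not the paper's method):
# split an SVG path "d" attribute into (command, coordinates) segments.
# Assumption: only the M, L, C, Q and Z commands (either case) appear.
_SEGMENT = re.compile(r"([MLCQZmlcqz])([^MLCQZmlcqz]*)")
_NUMBER = re.compile(r"-?\d*\.?\d+")


def split_path(d):
    """Return a list of (command, [float, ...]) pairs, one per drawing command."""
    segments = []
    for command, args in _SEGMENT.findall(d):
        coords = [float(x) for x in _NUMBER.findall(args)]
        segments.append((command, coords))
    return segments


if __name__ == "__main__":
    path = "M 0 0 L 10 0 C 10 5 5 10 0 10 Z"
    for command, coords in split_path(path):
        print(command, coords)
```

Each segment here corresponds to one drawing primitive (a move, a line, a curve), so the unit of segmentation follows the shape itself and not an arbitrary grid cell. A learned stroke vocabulary would be built on top of units like these, but how that is done is outside what the abstract states.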
Key words
Virtual Prototyping