
Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference

Wei Tao, Bin Zhang, Xiaoyang Qu, Jiguang Wan, Jianzong Wang

Design, Automation, and Test in Europe (2025)

Key words
long-context LLM inference, KV cache, chunk-level quantization search, chunk-level KV cache computation
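The keywords describe quantizing the KV cache at chunk granularity, assigning different bit-widths to different chunks. A minimal sketch of that idea is below; the range-based rule for picking a chunk's bit-width, the chunk size, and all function names are illustrative assumptions, not the paper's actual search procedure.

```python
import numpy as np

def quantize_chunk(chunk, n_bits):
    """Uniform symmetric quantization of one KV-cache chunk to n_bits."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = float(np.abs(chunk).max()) / qmax
    scale = scale or 1.0  # avoid division by zero for all-zero chunks
    q = np.clip(np.round(chunk / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def chunk_adaptive_quantize(kv, chunk_size=4, high_bits=8, low_bits=4, thresh=1.0):
    """Split the sequence axis into chunks and give outlier-heavy chunks
    more bits. The magnitude threshold is a stand-in for a real
    chunk-level precision search."""
    out = []
    for start in range(0, kv.shape[0], chunk_size):
        chunk = kv[start:start + chunk_size]
        bits = high_bits if np.abs(chunk).max() > thresh else low_bits
        q, scale = quantize_chunk(chunk, bits)
        out.append((q, scale, bits))
    return out

def dequantize(chunks):
    """Reconstruct the float KV cache from per-chunk (ints, scale, bits)."""
    return np.concatenate([q.astype(np.float32) * s for q, s, _ in chunks])
```

Chunks whose values stay in a narrow range tolerate fewer bits, so the mixed scheme spends memory only where the cache is hard to compress.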