
Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference

Wei Tao, Bin Zhang, Xiaoyang Qu, Jiguang Wan, Jianzong Wang

Design, Automation, and Test in Europe (2025)

Key words
long-context LLM inference, KV cache, chunk-level quantization search, chunk-level KV cache computation
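The keywords describe quantizing the KV cache at chunk granularity, assigning different bit-widths to different chunks. A minimal sketch of that idea is below; the range-based rule for picking a chunk's bit-width, the chunk size, and all function names are illustrative assumptions, not the paper's actual search procedure.

```python
import numpy as np

def quantize_chunk(chunk, n_bits):
    """Uniform symmetric quantization of one KV-cache chunk to n_bits."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = float(np.abs(chunk).max()) / qmax
    scale = scale or 1.0  # avoid division by zero for all-zero chunks
    q = np.clip(np.round(chunk / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def chunk_adaptive_quantize(kv, chunk_size=4, high_bits=8, low_bits=4, thresh=1.0):
    """Split the sequence axis into chunks and give outlier-heavy chunks
    more bits. The magnitude threshold is a stand-in for a real
    chunk-level precision search."""
    out = []
    for start in range(0, kv.shape[0], chunk_size):
        chunk = kv[start:start + chunk_size]
        bits = high_bits if np.abs(chunk).max() > thresh else low_bits
        q, scale = quantize_chunk(chunk, bits)
        out.append((q, scale, bits))
    return out

def dequantize(chunks):
    """Reconstruct the float KV cache from per-chunk (ints, scale, bits)."""
    return np.concatenate([q.astype(np.float32) * s for q, s, _ in chunks])
```

Chunks whose values stay in a narrow range tolerate fewer bits, so the mixed scheme spends memory only where the cache is hard to compress.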