Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference
Design, Automation and Test in Europe (DATE), 2025
Keywords
long-context LLM inference, KV cache, chunk-level quantization search, chunk-level KV cache computation