
NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention

Tianyi Zhang, Jonah Wonkyu Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava

NeurIPS 2024 (2024)

Keywords
large language model, efficiency, CPU inference, attention