A Case for Low Bitwidth Floating Point Arithmetic on FPGA for Transformer Based DNN Inference
2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2024
Keywords
Deep Neural Network, Deep Neural Network Inference, Low Bit-width, Transformer Model, Linear Layer, Floating-point Operations, 32-bit Floating-point, Exponent, Nonlinear Function, Processing Unit, Per Cycle, Lookup Table, Partial Products, Deep Neural Network Model, Product Term, Arithmetic Operations, Partial Sums, Least Significant Bit, Hardware Architecture, Minimal Overhead, Sign Bit, Hardware Overhead