Mirage: Towards Low-interruption Services on Batch GPU Clusters with Reinforcement Learning
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (2023)
Keywords
GPU Cluster, Deep Learning, Random Forest, XGBoost, Policy Gradient, Deep Q-network, Wall-clock Time, Reinforcement Learning Techniques, Deep Learning Training, Deep Learning Research, Deep Network, Deep Neural Network, Transition State, State Space, Ensemble Method, Language Model, Waiting Time, Transformer Model, Policy Learning, Gradient Boosting Decision Tree, Ensemble Learning Method, Foundation Model, Job Completion Time, Policy Network, Reinforcement Learning Methods, Policy Gradient Method, Deep Q-learning, Medium Load, Single Job, Light Load