Huaiguang Cai

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Jun 16, 2025

MiniMax: Aili Chen, Aonian Li, Bangwei Gong, Binyang Jiang, Bo Fei, Bo Yang, Boji Shan, Changqing Yu (+118 more)

Abstract: We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. The M1 model natively supports a context length of 1 million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning attention mechanism in MiniMax-M1 enables efficient scaling of test-time compute. These properties make M1 particularly suitable for complex tasks that require processing long inputs and thinking extensively. MiniMax-M1 is trained using large-scale reinforcement learning (RL) on diverse problems including sandbox-based, real-world software engineering environments. In addition to M1's inherent efficiency advantage for RL training, we propose CISPO, a novel RL algorithm to further enhance RL efficiency. CISPO clips importance sampling weights rather than token updates, outperforming other competitive RL variants. Combining hybrid-attention and CISPO enables MiniMax-M1's full RL training on 512 H800 GPUs to complete in only three weeks, with a rental cost of just $534,700. We release two versions of MiniMax-M1 models with 40K and 80K thinking budgets respectively, where the 40K model represents an intermediate phase of the 80K training. Experiments on standard benchmarks show that our models are comparable or superior to strong open-weight models such as the original DeepSeek-R1 and Qwen3-235B, with particular strengths in complex software engineering, tool utilization, and long-context tasks. We publicly release MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1.

* A technical report from MiniMax. The authors are listed in alphabetical order. We open-source our MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1
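
The abstract's contrast between clipping importance sampling weights and clipping token updates can be illustrated with a toy calculation. The sketch below is not the MiniMax-M1 implementation: it assumes the standard PPO clipped surrogate as the "token update" baseline, and it assumes that clipping the importance sampling weight means bounding the probability ratio and using it as a constant multiplier on the log-probability gradient. The function names and the epsilon values are hypothetical.

```python
import numpy as np

def ppo_clip_grad_coeff(ratio, adv, eps=0.2):
    """Per-token coefficient on grad(log pi) under the PPO clipped surrogate.

    When the surrogate is clipped, the objective is constant in the policy,
    so that token contributes no gradient at all.
    """
    ratio = np.asarray(ratio, dtype=float)
    adv = np.asarray(adv, dtype=float)
    clipped = ((adv > 0) & (ratio > 1 + eps)) | ((adv < 0) & (ratio < 1 - eps))
    return np.where(clipped, 0.0, ratio * adv)

def clipped_is_grad_coeff(ratio, adv, eps_low=0.2, eps_high=0.2):
    """Per-token coefficient when only the importance weight is clipped.

    Assumption: the clipped ratio is treated as a constant (no gradient flows
    through it), so every token still contributes, with a bounded weight.
    """
    ratio = np.asarray(ratio, dtype=float)
    adv = np.asarray(adv, dtype=float)
    return np.clip(ratio, 1 - eps_low, 1 + eps_high) * adv

# Three tokens: on-policy, ratio too high with positive advantage,
# ratio too low with negative advantage.
ratio = [1.0, 1.5, 0.5]
adv = [1.0, 1.0, -1.0]
ppo_coeff = ppo_clip_grad_coeff(ratio, adv)
is_coeff = clipped_is_grad_coeff(ratio, adv)
```

In this toy setting the PPO-style rule drops the second and third tokens entirely, while the weight-clipping rule keeps them with a bounded coefficient.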

CAMs as Shapley Value-based Explainers

Jan 09, 2025

Huaiguang Cai

Abstract: Class Activation Mapping (CAM) methods are widely used to visualize neural network decisions, yet their underlying mechanisms remain incompletely understood. To enhance the understanding of CAM methods and improve their explainability, we introduce the Content Reserved Game-theoretic (CRG) Explainer. This theoretical framework clarifies the theoretical foundations of GradCAM and HiResCAM by modeling the neural network prediction process as a cooperative game. Within this framework, we develop ShapleyCAM, a new method that leverages gradients and the Hessian matrix to provide more precise and theoretically grounded visual explanations. Due to the computational infeasibility of exact Shapley value calculation, ShapleyCAM employs a second-order Taylor expansion of the cooperative game's utility function to derive a closed-form expression. Additionally, we propose the Residual Softmax Target-Class (ReST) utility function to address the limitations of pre-softmax and post-softmax scores. Extensive experiments across 12 popular networks on the ImageNet validation set demonstrate the effectiveness of ShapleyCAM and its variants. Our findings not only advance CAM explainability but also bridge the gap between heuristic-driven CAM methods and compute-intensive Shapley value-based methods. The code is available at https://github.com/caihuaiguang/pytorch-shapley-cam.

* Accepted by The Visual Computer
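
Why a second-order Taylor expansion yields a closed-form Shapley value can be checked on a toy game. The sketch below is not the ShapleyCAM formula from the paper: it assumes a purely quadratic utility expanded around a zero baseline, v(S) = g·x_S + ½ x_Sᵀ H x_S, where x_S keeps only the players in S, and it assumes H is symmetric. All names are hypothetical. For such a utility, each pairwise interaction is split evenly between its two players, which gives φ_i = x_i (g_i + ½ (H x)_i).

```python
import itertools
import math
import numpy as np

def quadratic_utility(mask, x, g, H):
    """Utility of a coalition: quadratic form on the masked input."""
    xs = x * mask
    return float(g @ xs + 0.5 * xs @ H @ xs)

def exact_shapley(x, g, H):
    """Brute-force Shapley values by averaging over all player orderings."""
    n = len(x)
    phi = np.zeros(n)
    for perm in itertools.permutations(range(n)):
        mask = np.zeros(n)
        prev = 0.0  # utility of the empty coalition is zero
        for i in perm:
            mask[i] = 1.0
            cur = quadratic_utility(mask, x, g, H)
            phi[i] += cur - prev
            prev = cur
    return phi / math.factorial(n)

def closed_form_shapley(x, g, H):
    """Closed form for a quadratic utility with symmetric H."""
    return x * (g + 0.5 * (H @ x))

rng = np.random.default_rng(0)
n = 4
x = rng.normal(size=n)
g = rng.normal(size=n)
A = rng.normal(size=(n, n))
H = 0.5 * (A + A.T)  # symmetrize

phi_exact = exact_shapley(x, g, H)
phi_closed = closed_form_shapley(x, g, H)
```

The brute-force version needs n! utility evaluations per ordering sweep, while the closed form needs one matrix-vector product, which is the kind of saving a closed-form expression buys.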

CHG Shapley: Efficient Data Valuation and Selection towards Trustworthy Machine Learning

Jun 18, 2024

Huaiguang Cai

Abstract: Understanding the decision-making process of machine learning models is crucial for ensuring trustworthy machine learning. Data Shapley, a landmark study on data valuation, advances this understanding by assessing the contribution of each datum to model accuracy. However, the resource-intensive and time-consuming nature of multiple model retraining poses challenges for applying Data Shapley to large datasets. To address this, we propose the CHG (Conduct of Hardness and Gradient) score, which approximates the utility of each data subset on model accuracy during a single model training. By deriving the closed-form expression of the Shapley value for each data point under the CHG score utility function, we reduce the computational complexity to the equivalent of a single model retraining, an exponential improvement over existing methods. Additionally, we employ CHG Shapley for real-time data selection, demonstrating its effectiveness in identifying high-value and noisy data. CHG Shapley facilitates trustworthy model training through efficient data valuation, introducing a novel data-centric perspective on trustworthy machine learning.
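
The retraining cost that motivates the abstract can be made concrete with a toy exact Shapley computation over data points. This is a minimal sketch, not the CHG Shapley method: the utility below is a hypothetical additive stand-in for "accuracy of a model retrained on this subset", and the per-point quality scores are made up for illustration. The counter shows that exact valuation touches every one of the 2^n subsets, each of which would be one retraining in practice.

```python
from itertools import combinations
from math import comb

def exact_data_shapley(n_points, utility):
    """Exact Shapley value of each data point.

    `utility` maps a frozenset of point indices to a score. Results are
    cached so the second return value counts distinct utility evaluations.
    """
    cache = {}

    def u(subset):
        if subset not in cache:
            cache[subset] = utility(subset)
        return cache[subset]

    values = [0.0] * n_points
    for i in range(n_points):
        others = [j for j in range(n_points) if j != i]
        for k in range(n_points):
            # Shapley weight for coalitions of size k that exclude point i
            weight = 1.0 / (n_points * comb(n_points - 1, k))
            for subset in combinations(others, k):
                s = frozenset(subset)
                values[i] += weight * (u(s | {i}) - u(s))
    return values, len(cache)

# Hypothetical per-point quality: three clean points and one noisy point.
quality = [1.0, 1.0, 1.0, -0.5]

def toy_utility(subset):
    # Stand-in for "retrain on subset, then measure accuracy".
    return sum(quality[j] for j in subset)

shapley_values, n_evaluations = exact_data_shapley(len(quality), toy_utility)
```

With four points the computation already needs 16 utility evaluations, and the count doubles with every added point, which is why a valuation that fits inside a single training run is attractive.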

Online Resource Allocation for Edge Intelligence with Colocated Model Retraining and Inference

May 25, 2024

Huaiguang Cai, Zhi Zhou, Qianyi Huang

Abstract: With edge intelligence, AI models are increasingly pushed to the edge to serve ubiquitous users. However, due to drift in the model, data, and task, AI models deployed at the edge suffer from degraded accuracy in the inference serving phase. Model retraining handles such drift by periodically retraining the model with newly arrived data. When colocating model retraining and model inference serving for the same model on resource-limited edge servers, a fundamental challenge arises in balancing the resource allocation for model retraining and inference, aiming to maximize long-term inference accuracy. This problem is particularly difficult because the underlying mathematical formulation is time-coupled, non-convex, and NP-hard. To address these challenges, we introduce a lightweight and explainable online approximation algorithm, named ORRIC, designed to optimize resource allocation for adaptively balancing the accuracy of model training and inference. The competitive ratio of ORRIC outperforms that of the traditional Inference-Only paradigm, especially when data drift persists for a sufficiently long time. This highlights the advantages and applicable scenarios of colocating model retraining and inference. Notably, ORRIC can be translated into several heuristic algorithms for different resource environments. Experiments conducted in real scenarios validate the effectiveness of ORRIC.

* This paper has been accepted by the IEEE INFOCOM 2024 Main Conference
