Abstract:We introduce MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01, which are comparable to top-tier models while offering superior capabilities in processing longer contexts. The core lies in lightning attention and its efficient scaling. To maximize computational capacity, we integrate it with Mixture of Experts (MoE), creating a model with 32 experts and 456 billion total parameters, of which 45.9 billion are activated for each token. We develop an optimized parallel strategy and highly efficient computation-communication overlap techniques for MoE and lightning attention. This approach enables us to conduct efficient training and inference on models with hundreds of billions of parameters across contexts spanning millions of tokens. The context window of MiniMax-Text-01 can reach up to 1 million tokens during training and extrapolate to 4 million tokens during inference at an affordable cost. Our vision-language model, MiniMax-VL-01 is built through continued training with 512 billion vision-language tokens. Experiments on both standard and in-house benchmarks show that our models match the performance of state-of-the-art models like GPT-4o and Claude-3.5-Sonnet while offering 20-32 times longer context window. We publicly release MiniMax-01 at https://github.com/MiniMax-AI.
Abstract:Influence maximization aims to find a subset of seeds that maximize the influence spread under a given budget. In this paper, we mainly address the data-driven version of this problem, where the diffusion model is not given but needs to be inferred from the history cascades. Several previous works have addressed this topic in a statistical way and provided efficient algorithms with theoretical guarantee. However, in their settings, though the diffusion parameters are inferred, they still need users to preset the diffusion model, which can be an intractable problem in real-world practices. In this paper, we reformulate the problem on the attributed network and leverage the node attributes to estimate the closeness between the connected nodes. Specifically, we propose a machine learning-based framework, named DSCom, to address this problem in a heuristic way. Under this framework, we first infer the users' relationship from the diffusion dataset through attention mechanism and then leverage spectral clustering to overcome the influence overlap problem in the lack of exact diffusion formula. Compared to the previous theoretical works, we carefully designed empirical experiments with parameterized diffusion models based on real-world social networks, which prove the efficiency and effectiveness of our algorithm.