Jiahao Bu

Tsinghua University

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Nov 05, 2024

Xingwu Sun, Yanfeng Chen, Yiqing Huang, Ruobing Xie, Jiaqi Zhu, Kai Zhang, Shuaipeng Li, Zhen Yang, Jonny Han, Xiaobo Shu(+97 more)

Abstract: In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture-of-experts model, with a total of 389 billion parameters and 52 billion activated parameters, capable of handling up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large's superior performance across various benchmarks including language understanding and generation, logical reasoning, mathematical problem-solving, coding, long-context, and aggregated tasks, where it outperforms LLama3.1-70B and exhibits performance comparable to the significantly larger LLama3.1-405B model. Key practices of Hunyuan-Large include large-scale synthetic data that is orders of magnitude larger than in previous literature, a mixed expert routing strategy, a key-value cache compression technique, and an expert-specific learning rate strategy. Additionally, we investigate the scaling laws and learning rate schedule of mixture-of-experts models, providing valuable insights and guidance for future model development and optimization. The code and checkpoints of Hunyuan-Large are released to facilitate future innovations and applications. Code: https://github.com/Tencent/Hunyuan-Large Models: https://huggingface.co/tencent/Tencent-Hunyuan-Large

* 17 pages, 4 Figures

ASAP: A Chinese Review Dataset Towards Aspect Category Sentiment Analysis and Rating Prediction

Mar 11, 2021

Jiahao Bu, Lei Ren, Shuang Zheng, Yang Yang, Jingang Wang, Fuzheng Zhang, Wei Wu

Abstract: Sentiment analysis has attracted increasing attention in e-commerce. The sentiment polarities underlying user reviews are of great value for business intelligence. Aspect category sentiment analysis (ACSA) and review rating prediction (RP) are two essential tasks to detect the fine-to-coarse sentiment polarities. ACSA and RP are highly correlated and usually employed jointly in real-world e-commerce scenarios. However, most public datasets are constructed for ACSA and RP separately, which may limit further exploitation of both tasks. To address the problem and advance related research, we present ASAP, a large-scale Chinese restaurant review dataset including 46,730 genuine reviews from a leading online-to-offline (O2O) e-commerce platform in China. Besides a 5-star scale rating, each review is manually annotated according to its sentiment polarities towards 18 pre-defined aspect categories. We hope the release of the dataset could shed some light on the field of sentiment analysis. Moreover, we propose an intuitive yet effective joint model for ACSA and RP. Experimental results demonstrate that the joint model outperforms state-of-the-art baselines on both tasks.

* 10 Pages, 4 Figures, Accepted at NAACL 2021

Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications

Feb 12, 2018

Haowen Xu, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, Ying Liu, Youjian Zhao, Dan Pei, Yang Feng(+3 more)

Abstract: To ensure undisrupted business, large Internet companies need to closely monitor various KPIs (e.g., Page Views, number of online users, and number of orders) of their Web applications, to accurately detect anomalies and trigger timely troubleshooting/mitigation. However, anomaly detection for these seasonal KPIs with various patterns and data quality has been a great challenge, especially without labels. In this paper, we propose Donut, an unsupervised anomaly detection algorithm based on VAE. Thanks to a few of our key techniques, Donut greatly outperforms a state-of-the-art supervised ensemble approach and a baseline VAE approach, and its best F-scores range from 0.75 to 0.9 for the studied KPIs from a top global Internet company. We come up with a novel KDE interpretation of reconstruction for Donut, making it the first VAE-based anomaly detection algorithm with a solid theoretical explanation.

* 12 pages (including references), 17 figures, submitted to WWW 2018: The 2018 Web Conference, April 23-27, 2018, Lyon, France. The contents discarded from the conference version due to the 9-page limitation are also included in this version

Person Re-identification Meets Image Search

Feb 07, 2015

Liang Zheng, Liyue Shen, Lu Tian, Shengjin Wang, Jiahao Bu, Qi Tian

Abstract: For a long time, person re-identification and image search have been two separately studied tasks. However, for person re-identification, the effectiveness of local features and the "query-search" mode make it well posed for image search techniques. In the light of recent advances in image search, this paper proposes to treat person re-identification as an image search problem. Specifically, this paper claims two major contributions. 1) By designing an unsupervised Bag-of-Words representation, we are devoted to bridging the gap between the two tasks by integrating techniques from image search into person re-identification. We show that our system sets up an effective yet efficient baseline that is amenable to further supervised/unsupervised improvements. 2) We contribute a new high-quality dataset which uses the DPM detector and includes a number of distractor images. Our dataset reaches closer to realistic settings, and new perspectives are provided. Compared with approaches that rely on feature-feature match, our method is faster by over two orders of magnitude. Moreover, on three datasets, we report competitive results compared with the state-of-the-art methods.
