Yanyu Chen

Developing and Utilizing a Large-Scale Cantonese Dataset for Multi-Tasking in Large Language Models

Mar 05, 2025

Jiyue Jiang, Alfred Kar Yin Truong, Yanyu Chen, Qinghang Bao, Sheng Wang, Pengan Chen, Jiuming Wang, Lingpeng Kong, Yu Li, Chuan Wu

Abstract:High-quality data resources play a crucial role in training large language models (LLMs), particularly for low-resource languages like Cantonese. Despite having more than 85 million native speakers, Cantonese is still considered a low-resource language in the field of natural language processing (NLP) due to factors such as the dominance of Mandarin, lack of cohesion within the Cantonese-speaking community, diversity in character encoding and input methods, and the tendency of overseas Cantonese speakers to prefer using English. In addition, the rich colloquial vocabulary of Cantonese, English loanwords, and code-switching characteristics add to the complexity of corpus collection and processing. To address these challenges, we collect Cantonese texts from a variety of sources, including open source corpora, Hong Kong-specific forums, Wikipedia, and Common Crawl data. We conduct rigorous data processing through language filtering, quality filtering, content filtering, and de-duplication steps, successfully constructing a high-quality Cantonese corpus of over 2 billion tokens for training large language models. We further refine the model through supervised fine-tuning (SFT) on curated Cantonese tasks, enhancing its ability to handle specific applications. Upon completion of the training, the model achieves state-of-the-art (SOTA) performance on four Cantonese benchmarks. After training on our dataset, the model also exhibits improved performance on other mainstream language tasks.
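
The language-filtering and de-duplication steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say how either step is implemented, so everything here is an assumption: the marker-character heuristic, the `CANTONESE_MARKERS` set, the `min_ratio` threshold, and exact-match hashing are hypothetical stand-ins, not the authors' pipeline.

```python
import hashlib
import unicodedata

# Hypothetical marker set: characters common in written Cantonese but rare in
# standard written Chinese. The paper does not specify its language filter.
CANTONESE_MARKERS = set("嘅咗喺唔佢嚟哋冇乜啲")


def looks_cantonese(text: str, min_ratio: float = 0.02) -> bool:
    """Heuristic filter: share of marker characters among non-space characters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    hits = sum(1 for c in chars if c in CANTONESE_MARKERS)
    return hits / len(chars) >= min_ratio


def normalize(text: str) -> str:
    """NFKC-normalize and collapse whitespace so trivial variants hash alike."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def filter_and_dedup(docs):
    """Keep documents that pass the language heuristic, dropping exact duplicates."""
    seen = set()
    kept = []
    for doc in docs:
        if not looks_cantonese(doc):
            continue
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same normalized text already kept
        seen.add(digest)
        kept.append(doc)
    return kept
```

Exact hashing only catches identical documents after normalization; near-duplicate detection would need a different technique, which this sketch does not attempt.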

Advancing Oyster Phenotype Segmentation with Multi-Network Ensemble and Multi-Scale mechanism

Jan 20, 2025

Wenli Yang, Yanyu Chen, Andrew Trotter, Byeong Kang

Abstract:Phenotype segmentation is pivotal in analysing visual features of living organisms, enhancing our understanding of their characteristics. In the context of oysters, meat quality assessment is paramount, focusing on shell, meat, gonad, and muscle components. Traditional manual inspection methods are time-consuming and subjective, prompting the adoption of machine vision technology for efficient and objective evaluation. We explore machine vision's capacity for segmenting oyster components, leading to the development of a multi-network ensemble approach with a global-local hierarchical attention mechanism. This approach integrates predictions from diverse models and addresses challenges posed by varying scales, ensuring robust instance segmentation across components. Finally, we provide a comprehensive evaluation of the proposed method's performance using different real-world datasets, highlighting its efficacy and robustness in enhancing oyster phenotype segmentation.

GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments

Dec 06, 2024

Yanyu Chen, Ganhong Huang

Abstract:Efficiently deploying large language models (LLMs) in real-world scenarios remains a critical challenge, primarily due to hardware heterogeneity, inference framework limitations, and workload complexities. These challenges often lead to inefficiencies in memory utilization, latency, and throughput, hindering the effective deployment of LLMs, especially for non-experts. Through extensive experiments, we identify key performance bottlenecks, including sudden drops in memory utilization, latency fluctuations with varying batch sizes, and inefficiencies in multi-GPU configurations. These insights reveal a vast optimization space shaped by the intricate interplay of hardware, frameworks, and workload parameters. This underscores the need for a systematic approach to optimize LLM inference, motivating the design of our framework, GUIDE. GUIDE leverages dynamic modeling and simulation-based optimization to address these issues, achieving prediction errors between 25% and 55% for key metrics such as batch latency, TTFT, and decode throughput. By effectively bridging the gap between theoretical performance and practical deployment, our framework empowers practitioners, particularly non-specialists, to make data-driven decisions and unlock the full potential of LLMs in heterogeneous environments cheaply.

Training and Serving System of Foundation Models: A Comprehensive Survey

Jan 05, 2024

Jiahang Zhou, Yanyu Chen, Zicong Hong, Wuhui Chen, Yue Yu, Tao Zhang, Hui Wang, Chuanfu Zhang, Zibin Zheng

Abstract:Foundation models (e.g., ChatGPT, DALL-E, PengCheng Mind, PanGu-$\Sigma$) have demonstrated extraordinary performance in key technological areas, such as natural language processing and visual recognition, and have become the mainstream trend of artificial general intelligence. This has led more and more major technology giants to dedicate significant human and financial resources to actively develop their foundation model systems, which drives continuous growth of these models' parameters. As a result, the training and serving of these models have posed significant challenges, including substantial computing power, memory consumption, bandwidth demands, etc. Therefore, employing efficient training and serving strategies becomes particularly crucial. Many researchers have actively explored and proposed effective methods. So, a comprehensive survey of them is essential for system developers and researchers. This paper extensively explores the methods employed in training and serving foundation models from various perspectives. It provides a detailed categorization of these state-of-the-art methods, including finer aspects such as network, computing, and storage. Additionally, the paper summarizes the challenges and presents a perspective on the future development direction of foundation model systems. Through comprehensive discussion and analysis, it hopes to provide a solid theoretical basis and practical guidance for future research and applications, promoting continuous innovation and development in foundation model systems.

SPOT! Revisiting Video-Language Models for Event Understanding

Dec 01, 2023

Gengyuan Zhang, Jinhe Bi, Jindong Gu, Yanyu Chen, Volker Tresp

Abstract:Understanding videos is an important research topic for multimodal learning. Leveraging large-scale datasets of web-crawled video-text pairs as weak supervision has become a pre-training paradigm for learning joint representations and has shown remarkable potential in video understanding tasks. However, videos can be multi-event and multi-grained, while these video-text pairs usually contain only broad-level video captions. This raises a question: with such weak supervision, can video representations in video-language models gain the ability to distinguish even factual discrepancies in textual descriptions and understand fine-grained events? To address this, we introduce SPOT Prober to benchmark existing video-language models' capacity to distinguish event-level discrepancies as an indicator of the models' event understanding ability. Our approach involves extracting events as tuples (<Subject, Predicate, Object, Attribute, Timestamps>) from videos and generating false event tuples by manipulating tuple components systematically. We reevaluate the existing video-language models with these positive and negative captions and find that they fail to distinguish most of the manipulated events. Based on our findings, we propose to plug in these manipulated event captions as hard negative samples and find them effective in enhancing models for event understanding.
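
The tuple manipulation described in the abstract can be pictured with a small sketch. The abstract gives only the tuple layout (<Subject, Predicate, Object, Attribute, Timestamps>); the `Event` class, the `make_negatives` helper, the `swaps` replacement table, and the caption template below are hypothetical illustrations, not the SPOT Prober implementation.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    subject: str
    predicate: str
    obj: str
    attribute: str
    timestamps: Tuple[float, float]  # (start, end) in seconds


def make_negatives(event: Event, swaps: Dict[str, Any]) -> List[Event]:
    """Build false events, each differing from `event` in exactly one component.

    `swaps` maps a field name to a replacement value (a hypothetical,
    hand-picked table; how replacements are chosen is not given in the abstract).
    """
    negatives = []
    for f in fields(event):
        if f.name in swaps and swaps[f.name] != getattr(event, f.name):
            negatives.append(replace(event, **{f.name: swaps[f.name]}))
    return negatives


def caption(event: Event) -> str:
    """Render an event as a caption using a simple, assumed template."""
    start, end = event.timestamps
    return (f"{event.attribute} {event.subject} {event.predicate} "
            f"{event.obj} from {start:.0f}s to {end:.0f}s")
```

Changing one component at a time keeps each negative caption close to the positive one, which is what makes it usable as a hard negative.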

CDR-Adapter: Learning Adapters to Dig Out More Transferring Ability for Cross-Domain Recommendation Models

Nov 04, 2023

Yanyu Chen, Yao Yao, Wai Kin Victor Chan, Li Xiao, Kai Zhang, Liang Zhang, Yun Ye

Abstract:Data sparsity and cold-start problems are persistent challenges in recommendation systems. Cross-domain recommendation (CDR) is a promising solution that utilizes knowledge from the source domain to improve the recommendation performance in the target domain. Previous CDR approaches have mainly followed the Embedding and Mapping (EMCDR) framework, which involves learning a mapping function to facilitate knowledge transfer. However, these approaches necessitate re-engineering and re-training the network structure to incorporate transferrable knowledge, which can be computationally expensive and may result in catastrophic forgetting of the original knowledge. In this paper, we present a scalable and efficient paradigm to address data sparsity and cold-start issues in CDR, named CDR-Adapter, by decoupling the original recommendation model from the mapping function, without requiring re-engineering the network structure. Specifically, CDR-Adapter is a novel plug-and-play module that employs adapter modules to align feature representations, allowing for flexible knowledge transfer across different domains and efficient fine-tuning with minimal training costs. We conducted extensive experiments on the benchmark dataset, which demonstrated the effectiveness of our approach over several state-of-the-art CDR approaches.

* 8 pages, 4 figures, 5 tables, Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval Workshop on eCommerce

Hybrid CNN-Interpreter: Interpret local and global contexts for CNN-based Models

Oct 31, 2022

Wenli Yang, Guan Huang, Renjie Li, Jiahao Yu, Yanyu Chen, Quan Bai, Byeong Kang

Abstract:Convolutional neural network (CNN) models have seen advanced improvements in performance in various domains, but lack of interpretability is a major barrier to assurance and regulation during operation for acceptance and deployment of AI-assisted applications. There have been many works on input interpretability focusing on analyzing the input-output relations, but the internal logic of models has not been clarified in the current mainstream interpretability methods. In this study, we propose a novel hybrid CNN-interpreter through: (1) An original forward propagation mechanism to examine the layer-specific prediction results for local interpretability. (2) A new global interpretability that indicates the feature correlation and filter importance effects. By combining the local and global interpretabilities, hybrid CNN-interpreter enables us to have a solid understanding and monitoring of model context during the whole learning process with detailed and consistent representations. Finally, the proposed interpretabilities have been demonstrated to adapt to various CNN-based model structures.

Predicting Blossom Date of Cherry Tree With Support Vector Machine and Recurrent Neural Network

Oct 10, 2022

Hongyi Zheng, Yanyu Chen, Zihan Zhang

Abstract:Our project probes the relationship between temperatures and the blossom date of cherry trees. Through modeling, future flowering will become predictable, helping the public plan travel and avoid pollen season. Predicting the exact date on which the cherry trees will blossom can be viewed as a multiclass classification problem, so we applied the multi-class Support Vector Classifier (SVC) and Recurrent Neural Network (RNN), particularly Long Short-term Memory (LSTM), to formulate the problem. In the end, we evaluate and compare the performance of these approaches to find out which one might be more applicable in reality.

* 6 Pages, 6 Figures

DIWIFT: Discovering Instance-wise Influential Features for Tabular Data

Jul 06, 2022

Pengxiang Cheng, Hong Zhu, Xing Tang, Dugang Liu, Yanyu Chen, Xiaoting Wang, Weike Pan, Zhong Ming, Xiuqiang He

Abstract:Tabular data is one of the most common data storage formats in business applications, ranging from retail and banking to e-commerce. These applications rely heavily on machine learning models to achieve business success. One of the critical problems in learning from tabular data is to distinguish influential features from all the predetermined features. Global feature selection has been well studied for quite some time, assuming that all instances have the same influential feature subsets. However, different instances rely on different feature subsets in practice, which is why instance-wise feature selection has received increasing attention in recent studies. In this paper, we first propose a novel method for discovering instance-wise influential features for tabular data (DIWIFT), the core of which is to introduce the influence function to measure the importance of an instance-wise feature. DIWIFT is capable of automatically discovering influential feature subsets of different sizes in different instances, unlike global feature selection, which assigns all instances the same influential feature subset. Moreover, unlike previous instance-wise feature selection methods, DIWIFT minimizes the loss on the validation set and is thus more robust to the distribution shift between the training and test datasets, which is important for tabular data. Finally, we conduct extensive experiments on both synthetic and real-world datasets to validate the effectiveness of DIWIFT, comparing it with baseline methods. We also demonstrate the robustness of our method via ablation experiments.

SAR-ShipNet: SAR-Ship Detection Neural Network via Bidirectional Coordinate Attention and Multi-resolution Feature Fusion

Mar 29, 2022

Yuwen Deng, Donghai Guan, Yanyu Chen, Weiwei Yuan, Jiemin Ji, Mingqiang Wei

Abstract:This paper studies a practically meaningful problem: detecting ships in synthetic aperture radar (SAR) images with a neural network. We broadly extract different types of SAR image features and raise the intriguing question of whether these extracted features help to (1) suppress data variations (e.g., complex land-sea backgrounds, scattered noise) of real-world SAR images, and (2) enhance the features of ships, which are small objects with different aspect (length-width) ratios, thereby improving ship detection. To answer this question, we propose a SAR-ship detection neural network (called SAR-ShipNet for short), by newly developing Bidirectional Coordinate Attention (BCA) and Multi-resolution Feature Fusion (MRF) based on CenterNet. Moreover, considering the varying length-width ratio of arbitrary ships, we adopt an elliptical Gaussian probability distribution in CenterNet to improve the performance of base detector models. Experimental results on the public SAR-Ship dataset show that our SAR-ShipNet achieves competitive advantages in both speed and accuracy.

* This paper was accepted by the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2022
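
The elliptical Gaussian mentioned in the abstract can be illustrated with a minimal sketch of a center heatmap whose spread differs along the two axes. The abstract does not give the formula or how the spreads are chosen, so the axis-aligned form, the `elliptical_gaussian_heatmap` function, and the `scale` factor tying each standard deviation to the box size are assumptions, not the SAR-ShipNet implementation.

```python
import math


def elliptical_gaussian_heatmap(height, width, cx, cy, box_w, box_h, scale=6.0):
    """Return a height x width grid peaking at (cx, cy).

    Each cell holds exp(-((x-cx)^2 / (2*sx^2) + (y-cy)^2 / (2*sy^2))), with
    sx and sy proportional to the box width and height. `scale` is a
    hypothetical choice; a circular Gaussian would use one sigma for both axes.
    """
    sigma_x = max(box_w / scale, 1e-6)
    sigma_y = max(box_h / scale, 1e-6)
    heatmap = []
    for y in range(height):
        row = []
        for x in range(width):
            exponent = ((x - cx) ** 2) / (2 * sigma_x ** 2) \
                     + ((y - cy) ** 2) / (2 * sigma_y ** 2)
            row.append(math.exp(-exponent))
        heatmap.append(row)
    return heatmap
```

For a long, narrow box the response decays slowly along the long axis and quickly along the short one, so the target's shape follows the ship's length-width ratio rather than a single radius.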
