Ye Qi

HiLight: A Hierarchy-aware Light Global Model with Hierarchical Local ConTrastive Learning

Aug 11, 2024

Zhijian Chen, Zhonghua Li, Jianxin Yang, Ye Qi

Abstract: Hierarchical text classification (HTC) is a special sub-task of multi-label classification (MLC) in which the taxonomy is constructed as a tree and each sample is assigned at least one path in the tree. Recent HTC models contain three modules: a text encoder, a structure encoder, and a multi-label classification head. Specifically, the structure encoder is designed to encode the hierarchy of the taxonomy. However, the structure encoder has a scaling problem: as the taxonomy size increases, the number of learnable parameters in recent HTC works grows rapidly. Recursive regularization is another widely used method for introducing hierarchical information, but it suffers from a collapse problem and is generally relaxed by assigning it a small weight (e.g., 1e-6). In this paper, we propose a Hierarchy-aware Light Global model with Hierarchical local conTrastive learning (HiLight), a lightweight and efficient global model consisting only of a text encoder and a multi-label classification head. We propose a new learning task, called Hierarchical Local Contrastive Learning (HiLCL), to introduce the hierarchical information. Extensive experiments are conducted on two benchmark datasets to demonstrate the effectiveness of our model.
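The recursive regularization mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used form of the penalty, in which each label's classifier weights are pulled toward those of its parent in the taxonomy tree; the abstract does not spell out the formula, and the function name, label names, and toy weights below are hypothetical. The second call hints at the collapse issue: the penalty is smallest when every child simply copies its parent, which is one reason the term is usually given a small weight such as 1e-6.

```python
# Hypothetical sketch of a recursive-regularization penalty (not the paper's code).
def recursive_regularization(weights, parent_of, coeff=1e-6):
    """Half the sum of squared distances between child and parent weights.

    weights:   dict mapping label -> list of floats (per-label classifier weights)
    parent_of: dict mapping child label -> parent label (edges of the taxonomy tree)
    coeff:     penalty weight; the abstract mentions small values such as 1e-6
    """
    penalty = 0.0
    for child, parent in parent_of.items():
        penalty += sum(
            (c - p) ** 2 for c, p in zip(weights[child], weights[parent])
        )
    return 0.5 * coeff * penalty


# Toy taxonomy (assumed for illustration): science -> {physics, biology}
weights = {
    "science": [1.0, 0.0],
    "physics": [1.0, 1.0],
    "biology": [0.0, 0.0],
}
parent_of = {"physics": "science", "biology": "science"}

# coeff=1.0 only to make the toy numbers easy to read.
penalty = recursive_regularization(weights, parent_of, coeff=1.0)

# If every child copies its parent, the penalty vanishes.
collapsed = {label: [1.0, 0.0] for label in weights}
collapsed_penalty = recursive_regularization(collapsed, parent_of, coeff=1.0)
```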

The Llama 3 Herd of Models

Jul 31, 2024

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan(+521 more)

Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

CycleGAN Face-off

Jul 04, 2018

Xiaohan Jin, Ye Qi, Shangxuan Wu

Abstract: Face-off is an interesting case of style transfer in which the facial expressions and attributes of one person can be fully transferred to another face. We are interested in the unsupervised training process, which requires only two sequences of unaligned video frames, one from each person, and automatically learns which shared attributes to extract. In this project, we explored various improvements for adversarial training (i.e., CycleGAN [Zhu et al., 2017]) to capture details in facial expressions and head poses and thus generate transformation videos of higher consistency and stability.

* GitHub repo: https://github.com/ShangxuanWu/CycleGAN-Face-off

When and Why are Pre-trained Word Embeddings Useful for Neural Machine Translation?

Apr 18, 2018

Ye Qi, Devendra Singh Sachan, Matthieu Felix, Sarguna Janani Padmanabhan, Graham Neubig

Abstract: The performance of Neural Machine Translation (NMT) systems often suffers in low-resource scenarios where sufficiently large-scale parallel corpora cannot be obtained. Pre-trained word embeddings have proven to be invaluable for improving performance in natural language analysis tasks, which often suffer from a paucity of data. However, their utility for NMT has not been extensively explored. In this work, we perform five sets of experiments that analyze when we can expect pre-trained word embeddings to help in NMT tasks. We show that such embeddings can be surprisingly effective in some cases, providing gains of up to 20 BLEU points in the most favorable setting.
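One plausible way pre-trained embeddings enter an NMT model is as the initial values of its embedding table, and the sketch below illustrates that idea only. The abstract does not describe the mechanism, so this setup is an assumption, and the function name, toy vocabulary, vectors, and initialization range are all hypothetical. Words with a pre-trained vector are copied in; the rest fall back to small random values.

```python
import random


# Hypothetical sketch: build an embedding table from pre-trained vectors.
def init_embeddings(vocab, pretrained, dim, seed=0):
    """Return one vector per vocabulary word, in vocabulary order.

    vocab:      list of words in the model's vocabulary
    pretrained: dict mapping word -> pre-trained vector of length dim
    dim:        embedding dimension
    seed:       seed for the random fallback, for reproducibility
    """
    rng = random.Random(seed)
    table = []
    for word in vocab:
        if word in pretrained:
            # Copy so later training updates do not mutate the source vectors.
            table.append(list(pretrained[word]))
        else:
            # No pre-trained vector available: small random initialization.
            table.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return table


# Toy data, assumed for illustration only.
vocab = ["<unk>", "cat", "dog"]
pretrained = {"cat": [0.1, 0.2, 0.3], "dog": [0.4, 0.5, 0.6]}
table = init_embeddings(vocab, pretrained, dim=3)
```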

* NAACL 2018

XNMT: The eXtensible Neural Machine Translation Toolkit

Mar 01, 2018

Graham Neubig, Matthias Sperber, Xinyi Wang, Matthieu Felix, Austin Matthews, Sarguna Padmanabhan, Ye Qi, Devendra Singh Sachan, Philip Arthur, Pierre Godard(+3 more)

Abstract: This paper describes XNMT, the eXtensible Neural Machine Translation toolkit. XNMT distinguishes itself from other open-source NMT toolkits by its focus on modular code design, with the purpose of enabling fast iteration in research and replicable, reliable results. In this paper we describe the design of XNMT and its experiment configuration system, and demonstrate its utility on the tasks of machine translation, speech recognition, and multi-tasked machine translation/parsing. XNMT is available open-source at https://github.com/neulab/xnmt

* To be presented at AMTA 2018 Open Source Software Showcase
