Zhunchen Luo

DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation

Mar 10, 2025

Ming Wang, Fang Wang, Minghao Hu, Li He, Haiyang Wang, Jun Zhang, Tianwei Yan, Li Li, Zhunchen Luo, Wei Luo(+2 more)

Abstract: Long-form article generation (LFAG) presents challenges such as maintaining logical consistency, comprehensive topic coverage, and narrative coherence across extended articles. Existing datasets often lack both the hierarchical structure and the fine-grained annotation needed to decompose tasks effectively, resulting in shallow, disorganized article generation. To address these limitations, we introduce DeFine, a Decomposed and Fine-grained annotated dataset for long-form article generation. DeFine is characterized by its hierarchical decomposition strategy and the integration of domain-specific knowledge with multi-level annotations, ensuring granular control and enhanced depth in article generation. To construct the dataset, a multi-agent collaborative pipeline is proposed, which systematically segments the generation process into four parts: Data Miner, Cite Retriever, Q&A Annotator and Data Cleaner. To validate the effectiveness of DeFine, we designed and tested three LFAG baselines: the web retrieval, the local retrieval, and the grounded reference. We fine-tuned the Qwen2-7b-Instruct model using the DeFine training dataset. The experimental results showed significant improvements in text quality, specifically in topic coverage, depth of information, and content fidelity. Our dataset is publicly available to facilitate future research.

Deep Ranking Based Cost-sensitive Multi-label Learning for Distant Supervision Relation Extraction

Jul 25, 2019

Hai Ye, Zhunchen Luo

Abstract: Knowledge bases provide a potential way to improve the intelligence of information retrieval (IR) systems, because a knowledge base contains numerous relations between entities that can help IR systems conduct inference from one entity to another. Relation extraction is one of the fundamental techniques for constructing a knowledge base. Distant supervision is a semi-supervised learning method for relation extraction which learns with labeled and unlabeled data. However, this approach suffers from the problem of relation overlapping, in which one entity tuple may have multiple relation facts. We believe that relation types can have latent connections, which we call class ties, and that these can be exploited to enhance relation extraction. However, this property between relation classes has not been fully explored before. In this paper, to exploit class ties between relations to improve relation extraction, we propose a general ranking based multi-label learning framework combined with convolutional neural networks, in which ranking based loss functions with a regularization technique are introduced to learn the latent connections between relations. Furthermore, to deal with the problem of class imbalance in distant supervision relation extraction, we adopt cost-sensitive learning to rescale the costs from the positive and negative labels. Extensive experiments on a widely used dataset show the effectiveness of our model in exploiting class ties and relieving the class imbalance problem.

* Preprint submitted to Journal of Information Processing and Management. arXiv admin note: text overlap with arXiv:1612.07602

Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation

Oct 23, 2018

Xiao Liu, Zhunchen Luo, Heyan Huang

Abstract: Event extraction is of practical utility in natural language processing. In the real world, it is common for multiple events to exist in the same sentence, and extracting them is more difficult than extracting a single event. Previous works that model the associations between events with sequential modeling methods suffer from low efficiency in capturing very long-range dependencies. In this paper, we propose a novel Jointly Multiple Events Extraction (JMEE) framework to jointly extract multiple event triggers and arguments by introducing syntactic shortcut arcs to enhance information flow and attention-based graph convolution networks to model graph information. The experimental results demonstrate that our proposed framework achieves competitive results compared with state-of-the-art methods.

* EMNLP. 1 (2018) 1247-1256
* accepted by EMNLP 2018

Interpretable Charge Predictions for Criminal Cases: Learning to Generate Court Views from Fact Descriptions

Feb 23, 2018

Hai Ye, Xin Jiang, Zhunchen Luo, Wenhan Chao

Abstract: In this paper, we propose to study the problem of COURT VIEW GENeration from the fact description in a criminal case. The task aims to improve the interpretability of charge prediction systems and help automatic legal document generation. We formulate this task as a text-to-text natural language generation (NLG) problem. Sequence-to-sequence (Seq2Seq) models have achieved cutting-edge performance in many NLG tasks. However, because fact descriptions are often not distinctive, it is hard for a Seq2Seq model to generate charge-discriminative court views. In this work, we explore charge labels to tackle this issue. We propose a label-conditioned Seq2Seq model with attention for this problem, which decodes court views conditioned on encoded charge labels. Experimental results show the effectiveness of our method.

* To appear in NAACL 2018, Long paper

Jointly Extracting Relations with Class Ties via Effective Deep Ranking

Aug 05, 2017

Hai Ye, Wenhan Chao, Zhunchen Luo, Zhoujun Li

Abstract: Connections between relations in relation extraction, which we call class ties, are common. In the distantly supervised scenario, one entity tuple may have multiple relation facts. Exploiting class ties between the relations of one entity tuple is promising for distantly supervised relation extraction. However, previous models either model this property ineffectively or ignore it. In this work, to effectively leverage class ties, we propose to perform joint relation extraction with a unified model that integrates a convolutional neural network (CNN) with a general pairwise ranking framework, in which three novel ranking loss functions are introduced. Additionally, an effective method is presented to relieve the severe class imbalance problem from NR (not relation) for model training. Experiments on a widely used dataset show that leveraging class ties enhances extraction and demonstrate the effectiveness of our model in learning class ties. Our model outperforms the baselines significantly, achieving state-of-the-art performance.

* To appear in ACL 2017
