Ping Jian

TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting

Jun 23, 2025

Zhongbin Guo, Yuhao Wang, Ping Jian, Xinyue Chen, Wei Peng, Ertai E

Abstract: Satellite image time-series analysis demands fine-grained spatio-temporal reasoning, which remains a challenge for existing multimodal large language models (MLLMs). In this work, we study the capabilities of MLLMs on a novel task that jointly targets temporal change understanding and future scene generation, aiming to assess their potential for modeling complex multimodal dynamics over time. We propose TAMMs, a Temporal-Aware Multimodal Model for satellite image change understanding and forecasting, which enhances frozen MLLMs with lightweight temporal modules for structured sequence encoding and contextual prompting. To guide future image generation, TAMMs introduces a Semantic-Fused Control Injection (SFCI) mechanism that adaptively combines high-level semantic reasoning and structural priors within an enhanced ControlNet. This dual-path conditioning enables temporally consistent and semantically grounded image synthesis. Experiments demonstrate that TAMMs outperforms strong MLLM baselines in both temporal change understanding and future image forecasting tasks, highlighting how carefully designed temporal reasoning and semantic fusion can unlock the full potential of MLLMs for spatio-temporal understanding.

* Submitted to the 33rd ACM International Conference on Multimedia. Our dataset can be found at https://huggingface.co/datasets/IceInPot/TAMMs

Thought-Path Contrastive Learning via Premise-Oriented Data Augmentation for Logical Reading Comprehension

Sep 24, 2024

Chenxu Wang, Ping Jian, Zhen Yang

Abstract: Logical reading comprehension is a challenging task that entails grasping the underlying semantics of text and applying reasoning to deduce the correct answer. Prior research has primarily focused on enhancing logical reasoning capabilities through Chain-of-Thought (CoT) or data augmentation. However, previous work constructing chain-of-thought rationales concentrates solely on analyzing correct options, neglecting the incorrect alternatives. Additionally, earlier efforts on data augmentation by altering contexts rely on rule-based methods, which result in generated contexts that lack diversity and coherence. To address these issues, we propose a Premise-Oriented Data Augmentation (PODA) framework. This framework can generate CoT rationales including analyses for both correct and incorrect options, while constructing diverse and high-quality counterfactual contexts from incorrect candidate options. We integrate summarizing premises and identifying premises for each option into rationales. Subsequently, we employ multi-step prompts with identified premises to construct counterfactual contexts. To help the model better differentiate the reasoning process associated with each option, we introduce a novel thought-path contrastive learning method that compares reasoning paths between the original and counterfactual samples. Experimental results on three representative LLMs demonstrate that our method can improve the baselines substantially across two challenging logical reasoning benchmarks (ReClor and LogiQA 2.0). The data and code are released at https://github.com/lalalamdbf/TPReasoner.
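
The abstract does not give the exact form of the contrastive objective. As a rough illustration only, the sketch below assumes an InfoNCE-style loss over cosine similarities between reasoning-path embeddings. The function names, the temperature value, and the idea of representing a path as a plain vector are all assumptions made for this example and are not taken from the paper or its released code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-probability of the positive among all candidates.

    `anchor`, `positive` and each item in `negatives` are hypothetical
    embeddings of reasoning paths (assumed, not from the paper).
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_denominator - logits[0]
```

The loss shrinks when the anchor path is closer to its positive than to any negative, which is the general behaviour a path-level contrastive objective relies on.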

Prompt-based Logical Semantics Enhancement for Implicit Discourse Relation Recognition

Nov 01, 2023

Chenxu Wang, Ping Jian, Mu Huang

Abstract: Implicit Discourse Relation Recognition (IDRR), which infers discourse relations without the help of explicit connectives, is still a crucial and challenging task for discourse parsing. Recent work tends to exploit the hierarchical structure of the annotated senses, demonstrating that enhanced discourse relation representations can be obtained by integrating the sense hierarchy. Nevertheless, the performance and robustness of IDRR are significantly constrained by the availability of annotated data. Fortunately, there is a wealth of unannotated utterances with explicit connectives that can be used to acquire enriched discourse relation features. Motivated by this, we propose a Prompt-based Logical Semantics Enhancement (PLSE) method for IDRR. Essentially, our method seamlessly injects knowledge relevant to discourse relations into pre-trained language models through prompt-based connective prediction. Furthermore, because prompt-based connective prediction exhibits local dependencies owing to the weakness of masked language models (MLMs) in capturing global semantics, we design a novel self-supervised learning objective based on mutual information maximization to derive enhanced representations of logical semantics for IDRR. Experimental results on the PDTB 2.0 and CoNLL16 datasets demonstrate that our method achieves outstanding and consistent performance against the current state-of-the-art models.

* This paper is accepted by the EMNLP 2023 Main Conference
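
Prompt-based connective prediction, as described in the abstract, can be pictured as masking the explicit connective between two arguments and asking a masked language model to recover it. The sketch below shows only the data-preparation side. The template (`arg1 [MASK] arg2`), the function names, and the example sentence are assumptions for illustration; the abstract does not specify the template the authors use.

```python
def build_connective_prompt(arg1, arg2, mask_token="[MASK]"):
    """Place a mask token where the connective would appear (assumed template)."""
    return f"{arg1.strip()} {mask_token} {arg2.strip()}"


def make_training_example(arg1, connective, arg2, mask_token="[MASK]"):
    """Turn an utterance with an explicit connective into (prompt, label).

    The connective is removed from the input and becomes the prediction target.
    """
    return build_connective_prompt(arg1, arg2, mask_token), connective


prompt, label = make_training_example("It rained all day,", "so", "the game was cancelled.")
print(prompt)
print(label)
```

Because the label comes from the connective already present in the text, examples of this kind need no manual annotation, which matches the abstract's motivation for using unannotated utterances.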

Teach model to answer questions after comprehending the document

Jul 18, 2023

Ruiqing Sun, Ping Jian

Abstract: Multi-choice Machine Reading Comprehension (MRC) is a challenging extension of Natural Language Processing (NLP) that requires the ability to comprehend the semantics and logical relationships between entities in a given text. The MRC task has traditionally been viewed as a process of answering questions based on the given text. This single-stage approach has often led the network to concentrate on generating the correct answer, potentially neglecting the comprehension of the text itself. As a result, many prevalent models have struggled on this task when dealing with longer texts. In this paper, we propose a two-stage knowledge distillation method that teaches the model to better comprehend the document by dividing the MRC task into two separate stages. Our experimental results show that the student model achieves significant improvements when equipped with our method.

MS-Ranker: Accumulating Evidence from Potentially Correct Candidates for Answer Selection

Oct 10, 2020

Yingxue Zhang, Fandong Meng, Peng Li, Ping Jian, Jie Zhou

Abstract: As conventional answer selection (AS) methods generally match the question with each candidate answer independently, they suffer from the lack of matching information between the question and the candidate. To address this problem, we propose a novel reinforcement learning (RL) based multi-step ranking model, named MS-Ranker, which accumulates information from potentially correct candidate answers as extra evidence for matching the question with a candidate. Specifically, we explicitly consider the potential correctness of candidates and update the evidence with a gating mechanism. Moreover, as we use a listwise ranking reward, our model learns to pay more attention to the overall performance. Experiments on two benchmarks, namely WikiQA and SemEval-2016 CQA, show that our model significantly outperforms existing methods that do not rely on external resources.
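
The abstract mentions updating accumulated evidence through a gating mechanism that takes candidate correctness into account, without giving the equations. As a minimal sketch of the general idea, the code below blends a candidate vector into an evidence vector with a scalar sigmoid gate. The scalar gate, the `correctness_score` input, and the function names are assumptions for illustration and do not reproduce MS-Ranker's actual parameterisation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_update(evidence, candidate, correctness_score):
    """Blend `candidate` into `evidence` using a gate derived from a raw score.

    A high `correctness_score` (assumed to be an unnormalised logit) opens the
    gate, so a likely-correct candidate contributes more to the evidence.
    """
    gate = sigmoid(correctness_score)
    return [(1.0 - gate) * e + gate * c for e, c in zip(evidence, candidate)]


evidence = [0.0, 0.0]
for candidate, score in [([1.0, 0.0], 4.0), ([0.0, 1.0], -4.0)]:
    evidence = gated_update(evidence, candidate, score)
print(evidence)
```

In this toy run the first candidate has a high score and dominates the evidence, while the second, low-scoring candidate changes it only slightly.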

Tag Recommendation by Word-Level Tag Sequence Modeling

Nov 30, 2019

Xuewen Shi, Heyan Huang, Shuyang Zhao, Ping Jian, Yi-Kun Tang

Abstract: In this paper, we transform tag recommendation into a word-based text generation problem and introduce a sequence-to-sequence model. The model inherits the advantages of an LSTM-based encoder for sequential modeling and an attention-based decoder with local positional encodings for learning relations globally. Experimental results on Zhihu datasets show that the proposed model outperforms other state-of-the-art methods based on text classification.

* This is a full length version of the paper in DASFAA 2019

Neural Chinese Word Segmentation as Sequence to Sequence Translation

Nov 29, 2019

Xuewen Shi, Heyan Huang, Ping Jian, Yuhang Guo, Xiaochi Wei, Yi-Kun Tang

Abstract: Recently, Chinese word segmentation (CWS) methods using neural networks have made impressive progress. Most of them regard CWS as a sequence labeling problem and construct models based on local features rather than considering global information of the input sequence. In this paper, we cast CWS as a sequence translation problem and propose a novel sequence-to-sequence CWS model with an attention-based encoder-decoder framework. The model captures the global information from the input and directly outputs the segmented sequence. It can also tackle other NLP tasks jointly with CWS in an end-to-end mode. Experiments on the Weibo, PKU and MSRA benchmark datasets show that our approach achieves competitive performance compared with state-of-the-art methods. Meanwhile, we successfully applied our proposed model to jointly learning CWS and Chinese spelling correction, which demonstrates its applicability to multi-task fusion.

* In proceedings of SMP 2017 (Chinese National Conference on Social Media Processing)
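
Casting segmentation as translation means the source is the raw character sequence and the target is the same sequence with word boundaries made explicit. The sketch below shows one plausible way to build such training pairs and to read words back from a decoded target. The `<sp>` boundary token and both function names are assumptions for illustration; the abstract only states that the model directly outputs the segmented sequence.

```python
def to_seq2seq_pair(words, sep="<sp>"):
    """Build (source, target) token lists from a gold segmentation.

    source: the unsegmented characters; target: characters with `sep`
    inserted between words (assumed output format).
    """
    source = [ch for word in words for ch in word]
    target = []
    for index, word in enumerate(words):
        if index > 0:
            target.append(sep)
        target.extend(word)
    return source, target


def decode_segmentation(target, sep="<sp>"):
    """Recover the word list from a target-side token sequence."""
    words, current = [], []
    for token in target:
        if token == sep:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(token)
    if current:
        words.append("".join(current))
    return words


source, target = to_seq2seq_pair(["我", "喜欢", "北京"])
print(source)
print(target)
print(decode_segmentation(target))
```

Since the target is free-form text rather than one label per character, the same format could in principle also carry corrected characters, which is consistent with the joint CWS and spelling-correction setting mentioned in the abstract.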

Semantic Graph Convolutional Network for Implicit Discourse Relation Classification

Oct 21, 2019

Yingxue Zhang, Ping Jian, Fandong Meng, Ruiying Geng, Wei Cheng, Jie Zhou

Abstract: Implicit discourse relation classification is of great importance for discourse parsing, but remains a challenging problem due to the absence of explicit discourse connectives communicating these relations. Modeling the semantic interactions between the two arguments of a relation has proven useful for detecting implicit discourse relations. However, most previous approaches model such semantic interactions at a shallow interactive level, which is inadequate for capturing enough semantic information. In this paper, we propose a novel and effective Semantic Graph Convolutional Network (SGCN) to enhance the modeling of inter-argument semantics at a deeper interaction level for implicit discourse relation classification. We first build an interaction graph over representations of the two arguments, and then automatically extract in-depth semantic interactive information through graph convolution. Experimental results on the English corpus PDTB and the Chinese corpus CDTB both demonstrate the superiority of our model to previous state-of-the-art systems.

* 8 pages, 4 figures
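
The two steps named in the abstract, building an interaction graph over the two arguments and then applying graph convolution, can be illustrated with a toy example. The sketch below assumes a graph in which every token of one argument is linked to every token of the other, plus self-loops, and a single mean-aggregating (row-normalised) convolution layer with ReLU. The connectivity pattern, the normalisation, and all names are assumptions for illustration; the abstract does not describe SGCN's actual graph or layer design.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def interaction_adjacency(n_arg1, n_arg2):
    """Adjacency with self-loops and edges between every cross-argument pair."""
    n = n_arg1 + n_arg2
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0
    for i in range(n_arg1):
        for j in range(n_arg1, n):
            adj[i][j] = 1.0
            adj[j][i] = 1.0
    return adj


def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(row_normalise(adj) @ features @ weight)."""
    normalised = [[value / sum(row) for value in row] for row in adj]
    hidden = matmul(matmul(normalised, features), weight)
    return [[max(0.0, value) for value in row] for row in hidden]


adj = interaction_adjacency(2, 1)
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(adj, features, identity))
```

After one layer each token's representation mixes in the tokens of the opposite argument; stacking layers would propagate information over longer paths in the graph.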

Few-Shot Text Classification with Induction Network

Feb 27, 2019

Ruiying Geng, Binhua Li, Yongbin Li, Yuxiao Ye, Ping Jian, Jian Sun

Abstract: Text classification tends to struggle when data is deficient or when it needs to adapt to unseen classes. In such challenging scenarios, recent studies often use meta learning to simulate the few-shot task, in which new queries are compared to a small support set at the sample-wise level. However, this sample-wise comparison may be severely disturbed by the varied expressions within the same class. Therefore, we should be able to learn a general representation of each class in the support set and then compare it to new queries. In this paper, we propose a novel Induction Network to learn such generalized class-wise representations, innovatively combining the dynamic routing algorithm with the typical meta learning framework. In this way, our model is able to induce from particularity to universality, which is a more human-like learning approach. We evaluate our model on a well-studied sentiment classification dataset (English) and a real-world dialogue intent classification dataset (Chinese). Experimental results show that, on both datasets, our model significantly outperforms existing state-of-the-art models and improves the average accuracy by more than 3%, which proves the effectiveness of class-wise generalization in few-shot text classification.

* 7 pages, 3 figures
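
To make the idea of inducing one class-level vector from a few support samples more concrete, the sketch below applies capsule-style dynamic routing to a small support set. It is a simplified illustration under stated assumptions: it omits any learned transformation of the sample vectors, uses three routing iterations by default, and the function names are made up for this example rather than taken from the paper.

```python
import math


def squash(vector):
    """Capsule squashing: keep the direction, map the norm into [0, 1)."""
    squared_norm = sum(x * x for x in vector)
    if squared_norm == 0.0:
        return [0.0] * len(vector)
    scale = squared_norm / (1.0 + squared_norm) / math.sqrt(squared_norm)
    return [scale * x for x in vector]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def induce_class_vector(support, iterations=3):
    """Route support sample vectors into a single class vector."""
    logits = [0.0] * len(support)
    class_vector = [0.0] * len(support[0])
    for _ in range(iterations):
        coupling = softmax(logits)
        pooled = [
            sum(c * sample[d] for c, sample in zip(coupling, support))
            for d in range(len(support[0]))
        ]
        class_vector = squash(pooled)
        # Samples that agree with the current class vector gain routing weight.
        logits = [
            logit + sum(x * y for x, y in zip(sample, class_vector))
            for logit, sample in zip(logits, support)
        ]
    return class_vector


print(induce_class_vector([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```

In the toy support set the third sample is an outlier, so routing shifts weight toward the two agreeing samples and the class vector points mostly along the first axis. A query would then be compared against this single class vector instead of against each support sample.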
