Philippe Muller

Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation

May 10, 2025

Galann Pennec, Zhengyuan Liu, Nicholas Asher, Philippe Muller, Nancy F. Chen

Abstract:Vision-Language Models (VLMs) often struggle to balance visual and textual information when summarizing complex multimodal inputs, such as entire TV show episodes. In this paper, we propose a zero-shot video-to-text summarization approach that builds its own screenplay representation of an episode, integrating key video moments, dialogue, and character information into a unified document. Unlike previous approaches, we generate screenplays and name the characters simultaneously and zero-shot, using only the audio, video, and transcripts as input. Additionally, we highlight that existing summarization metrics can fail to assess the multimodal content of summaries. To address this, we introduce MFactSum, a multimodal metric that evaluates summaries with respect to both the vision and text modalities. Using MFactSum, we evaluate our screenplay summaries on the SummScreen3D dataset and show that they outperform state-of-the-art VLMs such as Gemini 1.5: they contain 20% more relevant visual information while requiring 75% less of the video as input.

In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models

Aug 07, 2024

Ayrton San Joaquin, Bin Wang, Zhengyuan Liu, Nicholas Asher, Brian Lim, Philippe Muller, Nancy Chen

Abstract:Despite recent advances, fine-tuning Large Language Models (LLMs) remains costly due to their large parameter counts and the substantial data required for model generalization. Limited access to computing resources remains a barrier for the open-source community. To address this challenge, we propose the In2Core algorithm, which selects a coreset by analyzing the correlation between training and evaluation samples with a trained model. Specifically, we use the model's internal gradients to estimate this relationship, aiming to rank the contribution of each training point. To improve efficiency, we propose an optimization that computes influence functions over a reduced number of layers while achieving similar accuracy. Applying our algorithm to instruction fine-tuning data for LLMs, we achieve similar performance with just 50% of the training data. Meanwhile, using influence functions to analyze the model's coverage of specific test samples could provide a reliable and interpretable signal of how well the training set covers those test points.
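A minimal Python sketch of influence-based coreset selection, assuming per-example gradient vectors are already available (for instance restricted to a few layers, in the spirit of the layer-reduction optimization mentioned above). Each training point is scored by the sum of dot products between its gradient and the evaluation gradients, a first-order approximation similar in spirit to TracIn. The abstract does not give In2Core's exact influence formula, so `influence_scores` and `select_coreset` are hypothetical illustrations, not the authors' code.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def influence_scores(train_grads: List[Vector], eval_grads: List[Vector]) -> List[float]:
    """Score each training example by how well its gradient aligns with the
    evaluation gradients (first-order, TracIn-style approximation; an assumption)."""
    return [sum(dot(g_tr, g_ev) for g_ev in eval_grads) for g_tr in train_grads]


def select_coreset(train_grads: List[Vector], eval_grads: List[Vector],
                   fraction: float = 0.5) -> List[int]:
    """Return the indices of the top `fraction` of training examples by score."""
    scores = influence_scores(train_grads, eval_grads)
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # restore the original data order


# Toy gradients (hypothetical); in practice these would be per-example gradients,
# possibly restricted to a subset of layers to save compute.
train = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
evals = [[1.0, 0.2]]
print(select_coreset(train, evals, fraction=0.5))  # [0, 3]
```

With `fraction=0.5`, the sketch keeps the top-scoring half of the training set, mirroring the 50% setting reported in the abstract.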

DiscSense: Automated Semantic Analysis of Discourse Markers

Jun 02, 2020

Damien Sileo, Tim Van de Cruys, Camille Pradel, Philippe Muller

Abstract:Discourse markers ("by contrast", "happily", etc.) are words or phrases used to signal semantic and/or pragmatic relationships between clauses or sentences. Recent work has fruitfully explored the prediction of discourse markers between sentence pairs in order to learn accurate sentence representations, which are useful in various classification tasks. In this work, we take another perspective: using a model trained to predict discourse markers between sentence pairs, we predict plausible markers between sentence pairs with a known semantic relation (provided by existing classification datasets). These predictions allow us to study the link between discourse markers and the semantic relations annotated in classification datasets. Handcrafted mappings between markers and discourse relations have been proposed for a limited set of markers and a limited set of categories, but there are hundreds of discourse markers expressing a wide variety of relations, and competing discourse theories (which are largely built in a top-down fashion) have reached no consensus on the taxonomy of relations. By applying an automatic prediction method to existing semantically annotated datasets, we provide a bottom-up characterization of discourse markers in English. The resulting dataset, named DiscSense, is publicly available.

* Accepted at LREC2020
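The aggregation step described in the abstract can be sketched as follows, assuming a trained marker-prediction model has already produced a probability distribution over markers for each labeled sentence pair. Averaging probabilities per relation label and ranking markers is one simple choice made here for illustration; the abstract does not specify how DiscSense scores marker/relation associations, and `top_markers_per_label` and the example predictions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def top_markers_per_label(
    predictions: List[Tuple[str, Dict[str, float]]], k: int = 2
) -> Dict[str, List[str]]:
    """For each gold relation label, rank markers by their average predicted
    probability over the sentence pairs carrying that label."""
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[str, int] = defaultdict(int)
    for label, marker_probs in predictions:
        counts[label] += 1
        for marker, p in marker_probs.items():
            sums[label][marker] += p
    ranking = {}
    for label, marker_sums in sums.items():
        avg = {m: s / counts[label] for m, s in marker_sums.items()}
        # Highest average first; ties broken alphabetically for determinism.
        ranking[label] = sorted(avg, key=lambda m: (-avg[m], m))[:k]
    return ranking


# Hypothetical model outputs for three labeled sentence pairs.
preds = [
    ("contrast", {"however": 0.7, "but": 0.2, "so": 0.1}),
    ("contrast", {"however": 0.5, "but": 0.4, "so": 0.1}),
    ("cause", {"so": 0.8, "however": 0.1, "but": 0.1}),
]
print(top_markers_per_label(preds))
# {'contrast': ['however', 'but'], 'cause': ['so', 'but']}
```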

Discourse-Based Evaluation of Language Understanding

Jul 19, 2019

Damien Sileo, Tim Van-de-Cruys, Camille Pradel, Philippe Muller

Abstract:We introduce DiscEval, a compilation of 11 evaluation datasets with a focus on discourse, which can be used to evaluate English Natural Language Understanding when meaning is viewed as use. We argue that evaluation with discourse tasks is overlooked and that Natural Language Inference (NLI) pretraining may not lead to learning truly universal representations. DiscEval can also be used as supplementary training data for multi-task learning systems, and it is publicly available, along with the code for gathering and preprocessing the datasets.

Composition of Sentence Embeddings: Lessons from Statistical Relational Learning

Apr 04, 2019

Damien Sileo, Tim Van-De-Cruys, Camille Pradel, Philippe Muller

Abstract:Various NLP problems, such as the prediction of sentence similarity, entailment, and discourse relations, are all instances of the same general task: modeling semantic relations between a pair of textual elements. A popular approach to such problems is to embed sentences into fixed-size vectors and to use composition functions (e.g., concatenation or sum) of those vectors as features for the prediction. At the same time, composition of embeddings has been a main focus of the field of Statistical Relational Learning (SRL), whose goal is to predict relations between entities (typically from knowledge base triples). In this article, we show that previous work on relation prediction between texts implicitly uses compositions from baseline SRL models. We show that such compositions are not expressive enough for several tasks (e.g., natural language inference). We build on recent SRL models to address textual relational problems, showing that they are more expressive and can alleviate issues of simpler compositions. The resulting models significantly improve the state of the art in both transferable sentence representation learning and relation prediction.

* Camera-ready for *SEM 2019
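To make the expressiveness argument concrete, here is a small Python sketch contrasting a common baseline composition, `[u, v, |u - v|, u * v]`, with a ComplEx-style composition borrowed from SRL. The choice of ComplEx, and the convention of reading the two halves of an embedding as real and imaginary parts, are assumptions for illustration, not necessarily the exact models from the paper. The point it shows: the terms `|u - v|` and `u * v` are unchanged when the two sentences are swapped, whereas the imaginary part of `u * conj(v)` flips sign, which lets a classifier model directional relations such as entailment.

```python
from typing import List, Sequence

Vec = Sequence[float]


def baseline_features(u: Vec, v: Vec) -> List[float]:
    """Widely used composition: [u, v, |u - v|, u * v]."""
    diff = [abs(a - b) for a, b in zip(u, v)]  # symmetric in (u, v)
    prod = [a * b for a, b in zip(u, v)]       # symmetric in (u, v)
    return list(u) + list(v) + diff + prod


def complex_features(u: Vec, v: Vec) -> List[float]:
    """ComplEx-style composition (illustrative): read the first half of each
    vector as the real part and the second half as the imaginary part, then
    return the real and imaginary parts of the element-wise product u * conj(v)."""
    d = len(u) // 2
    u_re, u_im, v_re, v_im = u[:d], u[d:], v[:d], v[d:]
    # (a + bi)(c - ei) = (ac + be) + (bc - ae)i
    real = [a * c + b * e for a, b, c, e in zip(u_re, u_im, v_re, v_im)]
    imag = [b * c - a * e for a, b, c, e in zip(u_re, u_im, v_re, v_im)]
    return real + imag


u, v = [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]
print(complex_features(u, v))  # [26.0, 44.0, 8.0, 8.0]
print(complex_features(v, u))  # [26.0, 44.0, -8.0, -8.0]: imaginary part flips
```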

Mining Discourse Markers for Unsupervised Sentence Representation Learning

Mar 28, 2019

Damien Sileo, Tim Van-De-Cruys, Camille Pradel, Philippe Muller

Abstract:Current state-of-the-art systems in NLP rely heavily on manually annotated datasets, which are expensive to construct. Very little work adequately exploits unannotated data, such as discourse markers between sentences, mainly because of data sparseness and ineffective extraction methods. In the present work, we propose a method to automatically discover sentence pairs with relevant discourse markers, and apply it to massive amounts of data. Our resulting dataset contains 174 discourse markers with at least 10k examples each, even for rare markers such as "coincidentally" or "amazingly". We use the resulting data as supervision for learning transferable sentence embeddings. In addition, we show that even though sentence representation learning through prediction of discourse markers yields state-of-the-art results across different transfer tasks, it is not clear that our models make use of the semantic relation between sentences, thus leaving room for further improvements. Our datasets are publicly available (https://github.com/synapse-developpement/Discovery).

* Camera-ready for NAACL HLT 2019
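A toy Python sketch of the pair-extraction step, assuming a simple rule: two consecutive sentences are kept when the second one starts with a known marker followed by a comma, and the marker is stripped so that a model has to predict it. The marker list, the rule, and the names `extract_pair` and `mine_pairs` are hypothetical simplifications; the abstract does not describe the paper's actual extraction and filtering heuristics.

```python
import re
from typing import List, Optional, Sequence, Tuple

# Hypothetical marker inventory; the released dataset covers 174 markers.
MARKERS = ["however", "coincidentally", "amazingly", "in other words"]


def extract_pair(s1: str, s2: str,
                 markers: Sequence[str] = MARKERS) -> Optional[Tuple[str, str, str]]:
    """If s2 starts with '<marker>, ...', return (s1, s2 without the marker, marker)."""
    # Try longer markers first so multi-word markers are not shadowed by shorter ones.
    for marker in sorted(markers, key=len, reverse=True):
        pattern = r"^" + re.escape(marker) + r"\s*,\s*(.+)$"
        match = re.match(pattern, s2.strip(), flags=re.IGNORECASE)
        if match:
            return s1, match.group(1), marker
    return None


def mine_pairs(sentences: List[str]) -> List[Tuple[str, str, str]]:
    """Scan consecutive sentence pairs and keep those with a sentence-initial marker."""
    pairs = []
    for s1, s2 in zip(sentences, sentences[1:]):
        found = extract_pair(s1, s2)
        if found is not None:
            pairs.append(found)
    return pairs


text = ["It rained all day.", "Amazingly, the match went ahead.", "Nobody complained."]
print(mine_pairs(text))
# [('It rained all day.', 'the match went ahead.', 'amazingly')]
```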

Synapse at CAp 2017 NER challenge: Fasttext CRF

Sep 14, 2017

Damien Sileo, Camille Pradel, Philippe Muller, Tim Van de Cruys

Abstract:We present our system for the CAp 2017 NER challenge, which addresses named entity recognition on French tweets. Our system leverages unsupervised learning on a larger dataset of French tweets to learn features that feed a CRF model. It was ranked first without using any gazetteer or structured external data, with an F-measure of 58.89%. To the best of our knowledge, it is the first system to use fasttext embeddings (which include subword representations) and an embedding-based sentence representation for NER.

* CAP2017

Evaluating Temporal Graphs Built from Texts via Transitive Reduction

Jan 16, 2014

Xavier Tannier, Philippe Muller

Abstract:Temporal information has recently been a focus of attention in information extraction, leading to some standardization efforts, in particular for the task of relating events in a text. This task raises the problem of comparing two annotations of a given text, because relations between events in a story are intrinsically interdependent and cannot be evaluated separately. A proper evaluation measure is also crucial in the context of a machine learning approach to the problem. Finding a common comparison referent at the text level is not obvious, and we argue here in favor of a shift from event-based measures to measures on a unique textual object, a minimal underlying temporal graph, or more formally the transitive reduction of the graph of relations between event boundaries. We support this proposal with an investigation of its properties on synthetic data and on a well-known temporal corpus.

* Journal Of Artificial Intelligence Research, Volume 40, pages 375-413, 2011
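The transitive reduction at the core of this proposal can be sketched in Python for the simple case of a directed acyclic graph of "before" relations between event boundaries. The sketch assumes that boundaries known to be simultaneous have already been merged into a single node, so the graph has no cycles; for a DAG the transitive reduction is unique, and an edge u → v is redundant exactly when v is still reachable from u through some other successor. The node names are hypothetical.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # node -> set of nodes it precedes (assumed acyclic)


def reachable(graph: Graph, start: str) -> Set[str]:
    """All nodes reachable from `start` through one or more edges (iterative DFS)."""
    seen: Set[str] = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def transitive_reduction(graph: Graph) -> Set[Tuple[str, str]]:
    """Edges of the transitive reduction of a DAG: drop u -> v whenever v is
    reachable from u through some other successor of u."""
    kept: Set[Tuple[str, str]] = set()
    for u, succs in graph.items():
        indirect: Set[str] = set()
        for w in succs:
            indirect |= reachable(graph, w)
        kept |= {(u, v) for v in succs if v not in indirect}
    return kept


# Hypothetical boundaries: e1 ends before e2 starts, e2 starts before it ends,
# and the annotation also states the inferable relation e1_end -> e2_end.
before = {
    "e1_end": {"e2_start", "e2_end"},
    "e2_start": {"e2_end"},
}
print(sorted(transitive_reduction(before)))
# [('e1_end', 'e2_start'), ('e2_start', 'e2_end')]
```

Two annotations that differ only in redundant, inferable relations reduce to the same edge set, which illustrates why a minimal graph can serve as a common referent for comparison.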

Plausible reasoning from spatial observations

Jan 10, 2013

Jerome Lang, Philippe Muller

Abstract:This article deals with plausible reasoning from incomplete knowledge about large-scale spatial properties. The available information, consisting of a set of pointwise observations, is extrapolated to neighbour points. We use belief functions to represent the influence of the knowledge at a given point on another point; the quantitative strength of this influence decreases as the distance between the two points increases. These influences are then aggregated using a variant of Dempster's rule of combination that takes into account the relative dependence between observations.

* Appears in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI2001)
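For reference, here is a Python sketch of the standard Dempster rule of combination, together with Shafer's classical discounting as one way to weaken a source's evidence as distance grows. The paper uses a variant of the rule that accounts for dependence between observations, which is not reproduced here; the frame `{"wet", "dry"}`, the discount factor, and the helper names are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]  # focal set -> mass; masses sum to 1


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Standard Dempster rule: multiply masses of intersecting focal sets and
    renormalize by the non-conflicting mass."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def discount(m: Mass, alpha: float, frame: FrozenSet[str]) -> Mass:
    """Shafer discounting: keep a fraction alpha of each mass and move the rest
    to the whole frame (total ignorance). Smaller alpha means weaker evidence."""
    out = {s: alpha * w for s, w in m.items() if s != frame}
    out[frame] = 1.0 - alpha + alpha * m.get(frame, 0.0)
    return out


WET, DRY = frozenset({"wet"}), frozenset({"dry"})
FRAME = WET | DRY
near = {WET: 0.6, FRAME: 0.4}                       # observation close to the target point
far = discount({DRY: 1.0}, alpha=0.5, frame=FRAME)  # distant observation, weakened
print(dempster_combine(near, far))
```

Here the conflicting mass is 0.3, so after renormalization "wet" receives roughly 0.43, "dry" roughly 0.29, and the whole frame roughly 0.29. In a spatial setting, `alpha` could be chosen as a decreasing function of distance; that mapping is an assumption, not taken from the paper.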

Learning Recursive Segments for Discourse Parsing

Mar 28, 2010

Stergos Afantenos, Pascal Denis, Philippe Muller, Laurence Danlos

Abstract:Automatically detecting discourse segments is an important preliminary step towards full discourse parsing. Previous research on discourse segmentation has relied on the assumption that elementary discourse units (EDUs) in a document always form a linear sequence (i.e., they can never be nested). This assumption turns out to be too strong, since some theories of discourse, such as SDRT, allow for nested discourse units. In this paper, we present a simple approach to discourse segmentation that is able to produce nested EDUs. Our approach builds on standard multi-class classification techniques combined with a simple repair heuristic that enforces global coherence. Our system was developed and evaluated on the first round of annotations provided by the French Annodis project (an ongoing effort to create a discourse bank for French). Cross-validated on only 47 documents (1,445 EDUs), our system achieves encouraging performance, with an F-score of 73% for finding EDUs.

* Published at LREC 2010
