Atsuhiro Takasu

Table-Text Alignment: Explaining Claim Verification Against Tables in Scientific Papers

Jun 12, 2025

Xanh Ho, Sunisth Kumar, Yun-Ang Wu, Florian Boudin, Atsuhiro Takasu, Akiko Aizawa

Abstract: Scientific claim verification against tables typically requires predicting whether a claim is supported or refuted given a table. However, we argue that predicting the final label alone is insufficient: it reveals little about the model's reasoning and offers limited interpretability. To address this, we reframe table-text alignment as an explanation task, requiring models to identify the table cells essential for claim verification. We build a new dataset by extending the SciTab benchmark with human-annotated cell-level rationales. Annotators verify the claim label and highlight the minimal set of cells needed to support their decision. After the annotation process, we use the collected information to propose a taxonomy for handling ambiguous cases. Our experiments show that (i) incorporating table alignment information improves claim verification performance, and (ii) most LLMs, while often predicting correct labels, fail to recover human-aligned rationales, suggesting that their predictions do not stem from faithful reasoning.

* 8 pages; code and data are available at https://github.com/Alab-NII/SciTabAlign
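The abstract contrasts label accuracy with whether a model recovers the cells annotators marked as necessary. One simple way to score that overlap, assumed here for illustration and not necessarily the metric used in the paper, is set precision, recall and F1 over (row, column) cell coordinates. The function name `rationale_prf` is hypothetical.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]  # (row index, column index) of a table cell


def rationale_prf(predicted: Set[Cell], gold: Set[Cell]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted rationale cells vs. human-annotated cells.

    Hypothetical metric for illustration; the paper's own evaluation may differ.
    """
    if not predicted and not gold:
        return 1.0, 1.0, 1.0
    overlap = len(predicted & gold)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# A model that may reach the correct label while highlighting only one of
# the two cells the annotators marked as necessary.
gold_cells = {(2, 3), (4, 3)}
model_cells = {(2, 3), (1, 1)}
print(rationale_prf(model_cells, gold_cells))  # (0.5, 0.5, 0.5)
```

A model can score a correct label here while its rationale F1 stays at 0.5, which is the kind of gap the abstract describes.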

An Encoding–Searching Separation Perspective on Bi-Encoder Neural Search

Aug 02, 2024

Hung-Nghiep Tran, Akiko Aizawa, Atsuhiro Takasu

Abstract: This paper reviews, analyzes, and proposes a new perspective on the bi-encoder architecture for neural search. While the bi-encoder architecture is widely used because of its simplicity and scalability at test time, it has notable issues such as low performance on seen datasets and weak zero-shot performance on new datasets. In this paper, we analyze these issues and summarize two main critiques: the encoding information bottleneck problem and the limitations of the basic assumption of embedding search. We then construct a thought experiment to logically analyze the encoding and searching operations and challenge the basic assumption of embedding search. Building on these observations, we propose a new perspective on the bi-encoder architecture called the encoding–searching separation perspective, which conceptually and practically separates the encoding and searching operations. We apply this perspective to explain the root cause of the identified issues and discuss ways to mitigate them. Finally, we discuss the implications of the ideas underlying the new perspective, the design surface that it exposes, and the potential research directions arising from it.
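A bi-encoder consists of two operations that this abstract treats separately: encoding (each text is mapped to a vector on its own) and searching (the query vector is compared with precomputed document vectors). The sketch below only makes that split visible. The hashed bag-of-words `encode` function is a hypothetical stand-in for a trained neural encoder, and exact top-k search by inner product stands in for an approximate nearest-neighbour index.

```python
import hashlib
import heapq
import math
from typing import List, Tuple

DIM = 256  # embedding size of the toy encoder (an arbitrary choice)


def encode(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for a neural encoder: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def build_index(docs: List[str]) -> List[List[float]]:
    # Encoding operation: every document is embedded once, offline,
    # without seeing any query.
    return [encode(d) for d in docs]


def search(query: str, index: List[List[float]], k: int = 1) -> List[Tuple[float, int]]:
    # Searching operation: the query goes through the same encoder, then is
    # compared with the fixed document vectors by inner product only.
    q = encode(query)
    scored = ((sum(a * b for a, b in zip(q, d)), i) for i, d in enumerate(index))
    return heapq.nlargest(k, scored)


docs = ["table recognition from images",
        "knowledge graph link prediction",
        "lyrics to melody generation"]
index = build_index(docs)
best_score, best_doc = search("link prediction in a knowledge graph", index)[0]
print(docs[best_doc], round(best_score, 3))
```

Because documents never see the query during encoding, everything the search step can use must already be in the fixed vectors; this is the information bottleneck the abstract refers to.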

On the Trade-off between the Number of Nodes and the Number of Trees in a Random Forest

Dec 16, 2023

Tatsuya Akutsu, Avraham A. Melkman, Atsuhiro Takasu

Abstract: In this paper, we focus on the prediction phase of a random forest and study the problem of representing a bag of decision trees using a smaller bag of decision trees, where we consider only binary decision problems on the binary domain and simple decision trees in which each internal node queries the Boolean value of a single variable. As a main result, we show that the majority function of $n$ variables can be represented by a bag of $T$ ($< n$) decision trees, each of polynomial size, if $n-T$ is a constant, where $n$ and $T$ must be odd (to avoid ties). We also show that a bag of $n$ decision trees can be represented by a bag of $T$ decision trees, each of polynomial size, if $n-T$ is a constant and a small classification error is allowed. A related result on $k$-out-of-$n$ functions is also presented.
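To make the trade-off concrete, the sketch below shows only the two extremes for the majority function in the abstract's setting (Boolean inputs, each internal node queries one variable, an odd number of trees voting by majority): $T = n$ one-node trees versus a single tree. It does not implement the paper's construction for $T < n$ with $n - T$ constant; the tuple encoding of trees and the naive single-tree builder `majority_tree` are assumptions made for illustration.

```python
from itertools import product
from typing import Tuple, Union

# A decision tree is a leaf (0 or 1) or a node (var, low, high): query the
# Boolean x[var], go to `low` if it is 0 and to `high` if it is 1.
Tree = Union[int, Tuple[int, "Tree", "Tree"]]


def evaluate(tree: Tree, x: Tuple[int, ...]) -> int:
    while not isinstance(tree, int):
        var, low, high = tree
        tree = high if x[var] else low
    return tree


def bag_vote(trees, x) -> int:
    # Majority vote of an odd number of trees, so no tie-breaking is needed.
    votes = sum(evaluate(t, x) for t in trees)
    return int(2 * votes > len(trees))


def majority_tree(n: int, i: int = 0, ones: int = 0) -> Tree:
    """Naive single tree for MAJ_n that queries x_0, x_1, ... in order."""
    need = n // 2 + 1                      # number of 1s required for output 1
    if ones >= need:
        return 1
    if ones + (n - i) < need:              # even all remaining 1s are not enough
        return 0
    return (i, majority_tree(n, i + 1, ones), majority_tree(n, i + 1, ones + 1))


def size(tree: Tree) -> int:
    # Number of internal (query) nodes.
    return 0 if isinstance(tree, int) else 1 + size(tree[1]) + size(tree[2])


n = 7
stumps = [(i, 0, 1) for i in range(n)]     # T = n trees with one node each
single = [majority_tree(n)]                # T = 1 tree
for x in product((0, 1), repeat=n):
    expected = int(2 * sum(x) > n)
    assert bag_vote(stumps, x) == expected
    assert bag_vote(single, x) == expected
print(sum(map(size, stumps)), size(single[0]))  # prints: 7 69
```

Both bags compute the same function, but this naive single tree needs 69 query nodes where the $n$ stumps need 7 in total, and its size grows quickly with $n$; the abstract's results concern the intermediate regime between these extremes.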

Melody-conditioned lyrics generation via fine-tuning language model and its evaluation with ChatGPT

Oct 02, 2023

Zhe Zhang, Karol Lasocki, Yi Yu, Atsuhiro Takasu

Abstract: We leverage character-level language models for syllable-level lyrics generation from symbolic melody. By fine-tuning a character-level pre-trained model, we integrate language knowledge into the beam search of a syllable-level Transformer generator. Using ChatGPT-based evaluations, we demonstrate enhanced coherence and correctness in the generated lyrics.

Controllable Lyrics-to-Melody Generation

Jun 05, 2023

Zhe Zhang, Yi Yu, Atsuhiro Takasu

Abstract: Lyrics-to-melody generation is an interesting and challenging topic in AI music research. Because learning the correlations between lyrics and melody is difficult, previous methods suffer from low generation quality and a lack of controllability. Controllability of generative models enables humans to interact with the models to generate desired content, which is especially important in music generation for human-centered AI that can assist musicians in creative activities. To address these issues, we propose a controllable lyrics-to-melody generation network, ConL2M, which can generate realistic melodies from lyrics in a user-desired musical style. Our work contains three main novelties: 1) to model the dependencies of music attributes across multiple sequences, inter-branch memory fusion (Memofu) is proposed to enable information flow between the branches of a multi-branch stacked LSTM architecture; 2) reference style embedding (RSE) is proposed to improve generation quality as well as to control the musical style of the generated melodies; 3) sequence-level statistical loss (SeqLoss) is proposed to help the model learn sequence-level features of melodies given lyrics. Evaluated with metrics for music quality and controllability, this initial study of controllable lyrics-to-melody generation shows better generation quality and demonstrates the feasibility of interacting with users to generate melodies in desired musical styles from given lyrics.

An End-to-End Multi-Task Learning Model for Image-based Table Recognition

Mar 29, 2023

Nam Tuan Ly, Atsuhiro Takasu

Abstract: Image-based table recognition is a challenging task due to the diversity of table styles and the complexity of table structures. Most previous methods take a non-end-to-end approach that divides the problem into two separate sub-problems, table structure recognition and cell-content recognition, and then attempts to solve each sub-problem independently using two separate systems. In this paper, we propose an end-to-end multi-task learning model for image-based table recognition. The proposed model consists of one shared encoder, one shared decoder, and three separate decoders, which are used to learn three sub-tasks of table recognition: table structure recognition, cell detection, and cell-content recognition. The whole system can easily be trained and used for inference end-to-end. In the experiments, we evaluate the proposed model on two large-scale datasets: FinTabNet and PubTabNet. The experimental results show that the proposed model outperforms state-of-the-art methods on all benchmark datasets.

* VISIGRAPP2023 - Volume 5: VISAPP, pages 626-634
* 10 pages, VISAPP2023. arXiv admin note: substantial text overlap with arXiv:2303.07641

TabIQA: Table Questions Answering on Business Document Images

Mar 27, 2023

Phuc Nguyen, Nam Tuan Ly, Hideaki Takeda, Atsuhiro Takasu

Abstract: Answering questions about tables in business documents poses many challenges, requiring an understanding of tabular structures, cross-document referencing, and additional numeric computations beyond simple search queries. This paper introduces a novel pipeline, named TabIQA, to answer questions about business document images. TabIQA combines state-of-the-art deep learning techniques 1) to extract table content and structural information from images and 2) to answer various questions involving numerical data, text-based information, and complex queries over structured tables. Evaluation results on the VQAonBD 2023 dataset demonstrate that TabIQA achieves promising performance in answering table-related questions. The TabIQA repository is available at https://github.com/phucty/itabqa.

* First two authors contributed equally

Rethinking Image-based Table Recognition Using Weakly Supervised Methods

Mar 14, 2023

Nam Tuan Ly, Atsuhiro Takasu, Phuc Nguyen, Hideaki Takeda

Abstract: Most previous methods for table recognition rely on training datasets containing many richly annotated table images. Detailed table image annotation, e.g., cell or text bounding box annotation, however, is costly and often subjective. In this paper, we propose a weakly supervised model named WSTabNet for table recognition that relies only on HTML (or LaTeX) code-level annotations of table images. The proposed model consists of three main parts: an encoder for feature extraction, a structure decoder for generating the table structure, and a cell decoder for predicting the content of each cell in the table. Our system is trained end-to-end by stochastic gradient descent, requiring only table images and their ground-truth HTML (or LaTeX) representations. To facilitate table recognition with deep learning, we create and release WikiTableSet, the largest publicly available image-based table recognition dataset, built from Wikipedia. WikiTableSet contains nearly 4 million English table images, 590K Japanese table images, and 640K French table images with corresponding HTML representations and cell bounding boxes. Extensive experiments on WikiTableSet and two large-scale datasets, FinTabNet and PubTabNet, demonstrate that the proposed weakly supervised model achieves better or similar accuracy compared with state-of-the-art models on all benchmark datasets.

* ICPRAM2023, pages 872-880, 2023
* 10 pages, ICPRAM2023
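WSTabNet learns from HTML alone, with a structure decoder that generates the table structure and a cell decoder that predicts each cell's content. The sketch below shows one way an HTML ground truth could be split into those two targets and reassembled. It assumes PubTabNet-style `<td></td>` structure tokens and a simple grid with no row or column spans; the function names are hypothetical, and the tokenization actually used in the paper may differ.

```python
import html
from typing import List, Tuple


def table_to_targets(rows: List[List[str]]) -> Tuple[List[str], List[str]]:
    """Split a simple table (no spanning cells) into structure tokens and cell texts."""
    structure: List[str] = []
    cells: List[str] = []
    for row in rows:
        structure.append("<tr>")
        for text in row:
            structure.append("<td></td>")   # placeholder the cell decoder fills in
            cells.append(text)
        structure.append("</tr>")
    return structure, cells


def targets_to_html(structure: List[str], cells: List[str]) -> str:
    # Reassemble: fill each <td></td> slot, in reading order, with the next cell's escaped text.
    it = iter(cells)
    parts = [f"<td>{html.escape(next(it))}</td>" if tok == "<td></td>" else tok
             for tok in structure]
    return "<table>" + "".join(parts) + "</table>"


rows = [["Year", "Revenue"], ["2022", "1,200"]]
structure, cells = table_to_targets(rows)
print(structure)
print(targets_to_html(structure, cells))
```

Keeping the structure sequence free of cell text keeps its vocabulary small, while the cell texts can be predicted per slot; no bounding boxes appear in either target.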

MEIM: Multi-partition Embedding Interaction Beyond Block Term Format for Efficient and Expressive Link Prediction

Oct 04, 2022

Hung Nghiep Tran, Atsuhiro Takasu

Abstract: Knowledge graph embedding aims to predict missing relations between entities in knowledge graphs. Tensor-decomposition-based models, such as ComplEx, provide a good trade-off between efficiency and expressiveness, which is crucial given the large size of real-world knowledge graphs. The recent multi-partition embedding interaction (MEI) model subsumes these models by using the block term tensor format and provides a systematic solution for the trade-off. However, MEI has several drawbacks, some of which are carried over from the tensor-decomposition-based models it subsumes. In this paper, we address these drawbacks and introduce the Multi-partition Embedding Interaction iMproved beyond block term format (MEIM) model, with independent core tensors for ensemble effects and soft orthogonality for max-rank mapping, in addition to multi-partition embedding. MEIM improves expressiveness while remaining highly efficient, which helps it outperform strong baselines and achieve state-of-the-art results on difficult link prediction benchmarks using fairly small embedding sizes. The source code is released at https://github.com/tranhungnghiep/MEIM-KGE.

* Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2022
* Accepted at the International Joint Conference on Artificial Intelligence (IJCAI), 2022; add appendix with extra experiments

Kernel Clustering with Sigmoid-based Regularization for Efficient Segmentation of Sequential Data

Jun 22, 2021

Tung Doan, Atsuhiro Takasu

Abstract: Kernel segmentation aims to partition a data sequence into several non-overlapping segments that may have nonlinear and complex structures. In general, it is formulated as a discrete optimization problem with combinatorial constraints. A popular algorithm for solving this problem optimally is dynamic programming (DP), which has quadratic computation and memory requirements. Because sequences in practice are often very long, this algorithm is impractical. Although many heuristic algorithms have been proposed to approximate the optimal segmentation, they offer no guarantee on the quality of their solutions. In this paper, we take a differentiable approach to alleviate these issues. First, we introduce a novel sigmoid-based regularization to smoothly approximate the combinatorial constraints. Combining it with the objective of balanced kernel clustering, we formulate a differentiable model termed Kernel clustering with sigmoid-based regularization (KCSR), in which gradient-based algorithms can be exploited to obtain the optimal segmentation. Second, we develop a stochastic variant of the proposed model. By using stochastic gradient descent, which has much lower time and space complexities, for optimization, the second model can segment overlong data sequences. Finally, to segment multiple data sequences simultaneously, we slightly modify the sigmoid-based regularization to introduce an extended variant of the proposed model. Through extensive experiments on various types of data sequences, the performance of our models is evaluated and compared with that of existing methods. The experimental results validate the advantages of the proposed models. Our MATLAB source code is available on GitHub.
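The DP baseline the abstract mentions can be sketched to show where its quadratic cost comes from. This sketch assumes the common within-segment kernel scatter as the segment cost (the kernel k-means objective restricted to contiguous segments) and an RBF kernel on scalar data; it is not KCSR and not the authors' MATLAB code. The $n \times n$ kernel prefix-sum table is the quadratic memory, and the triple loop costs $O(M n^2)$ time for $M$ segments.

```python
import math
from typing import List


def rbf(a: float, b: float, gamma: float = 1.0) -> float:
    return math.exp(-gamma * (a - b) ** 2)


def kernel_segmentation(x: List[float], n_segments: int, gamma: float = 1.0) -> List[int]:
    """Optimal contiguous segmentation by dynamic programming; returns segment start indices."""
    n = len(x)
    # 2-D prefix sums of the n x n kernel matrix: the quadratic memory cost.
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            P[i + 1][j + 1] = rbf(x[i], x[j], gamma) + P[i][j + 1] + P[i + 1][j] - P[i][j]
    diag = [0.0] * (n + 1)  # prefix sums of K(x_i, x_i)
    for i in range(n):
        diag[i + 1] = diag[i] + rbf(x[i], x[i], gamma)

    def cost(s: int, e: int) -> float:
        # Within-segment scatter in feature space for x[s:e].
        block = P[e][e] - P[s][e] - P[e][s] + P[s][s]
        return diag[e] - diag[s] - block / (e - s)

    INF = float("inf")
    D = [[INF] * (n + 1) for _ in range(n_segments + 1)]  # D[m][e]: best cost of x[:e] in m segments
    arg = [[0] * (n + 1) for _ in range(n_segments + 1)]
    D[0][0] = 0.0
    for m in range(1, n_segments + 1):
        for e in range(m, n + 1):
            for s in range(m - 1, e):  # last segment is x[s:e], never empty
                c = D[m - 1][s] + cost(s, e)
                if c < D[m][e]:
                    D[m][e], arg[m][e] = c, s
    # Backtrack from the full sequence to recover the segment starts.
    starts, e = [], n
    for m in range(n_segments, 0, -1):
        e = arg[m][e]
        starts.append(e)
    return starts[::-1]


signal = [0.0] * 5 + [3.0] * 4 + [-2.0] * 6
print(kernel_segmentation(signal, 3))  # [0, 5, 9]
```

Constant runs have zero scatter, so the optimum places boundaries exactly at the value changes. The nested loops over all $(s, e)$ pairs are what make this exact approach impractical for the long sequences the abstract targets.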
