Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hanna Suominen

TULUN: Transparent and Adaptable Low-resource Machine Translation

May 24, 2025

Raphaël Merx, Hanna Suominen, Lois Hong, Nick Thieberger, Trevor Cohn, Ekaterina Vylomova

Abstract:Machine translation (MT) systems that support low-resource languages often struggle on specialized domains. While researchers have proposed various techniques for domain adaptation, these approaches typically require model fine-tuning, making them impractical for non-technical users and small organizations. To address this gap, we propose Tulun, a versatile solution for terminology-aware translation, combining neural MT with large language model (LLM)-based post-editing guided by existing glossaries and translation memories. Our open-source web-based platform enables users to easily create, edit, and leverage terminology resources, fostering a collaborative human-machine translation process that respects and incorporates domain expertise while increasing MT accuracy. Evaluations show effectiveness in both real-world and benchmark scenarios: on medical and disaster relief translation tasks for Tetun and Bislama, our system achieves improvements of 16.90-22.41 ChrF++ points over baseline MT systems. Across six low-resource languages on the FLORES dataset, Tulun outperforms both standalone MT and LLM approaches, achieving an average improvement of 2.8 ChrF points over NLLB-54B.

Via

Access Paper or Ask Questions

Low-resource Machine Translation: what for? who for? An observational study on a dedicated Tetun language translation service

Nov 19, 2024

Raphael Merx, Hanna Suominen, Adérito José Guterres Correia, Trevor Cohn, Ekaterina Vylomova

Abstract:The impact of machine translation (MT) on low-resource languages remains poorly understood. In particular, observational studies of actual usage patterns are scarce. Such studies could provide valuable insights into user needs and behaviours, complementing survey-based methods. Here we present an observational analysis of real-world MT usage for Tetun, the lingua franca of Timor-Leste, using server logs from a widely-used MT service with over $70,000$ monthly active users. Our analysis of $100,000$ translation requests reveals patterns that challenge assumptions based on existing corpora. We find that users, many of them students on mobile devices, typically translate short texts into Tetun across diverse domains including science, healthcare, and daily life. This contrasts sharply with available Tetun corpora, which are dominated by news articles covering government and social issues. Our results suggest that MT systems for languages like Tetun should prioritise translating into the low-resource language, handling brief inputs effectively, and covering a wide range of domains relevant to educational contexts. More broadly, this study demonstrates how observational analysis can inform low-resource language technology development, by grounding research in practical community needs.

Via

Access Paper or Ask Questions

Orientation-conditioned Facial Texture Mapping for Video-based Facial Remote Photoplethysmography Estimation

Apr 16, 2024

Sam Cantrill, David Ahmedt-Aristizabal, Lars Petersson, Hanna Suominen, Mohammad Ali Armin

Abstract:Camera-based remote photoplethysmography (rPPG) enables contactless measurement of important physiological signals such as pulse rate (PR). However, dynamic and unconstrained subject motion introduces significant variability into the facial appearance in video, confounding the ability of video-based methods to accurately extract the rPPG signal. In this study, we leverage the 3D facial surface to construct a novel orientation-conditioned facial texture video representation which improves the motion robustness of existing video-based facial rPPG estimation methods. Our proposed method achieves a significant 18.2% performance improvement in cross-dataset testing on MMPD over our baseline using the PhysNet model trained on PURE, highlighting the efficacy and generalization benefits of our designed video representation. We demonstrate significant performance improvements of up to 29.6% in all tested motion scenarios in cross-dataset testing on MMPD, even in the presence of dynamic and unconstrained subject motion, emphasizing the benefits of disentangling motion through modeling the 3D facial surface for motion robust facial rPPG estimation. We validate the efficacy of our design decisions and the impact of different video processing steps through an ablation study. Our findings illustrate the potential strengths of exploiting the 3D facial surface as a general strategy for addressing dynamic and unconstrained subject motion in videos. The code is available at https://samcantrill.github.io/orientation-uv-rppg/.

* 12 pages, 8 figures, 6 tables; corrected abstract typo

Via

Access Paper or Ask Questions

Zero- and Few-Shots Knowledge Graph Triplet Extraction with Large Language Models

Dec 04, 2023

Andrea Papaluca, Daniel Krefl, Sergio Mendez Rodriguez, Artem Lensky, Hanna Suominen

Abstract:In this work, we tested the Triplet Extraction (TE) capabilities of a variety of Large Language Models (LLMs) of different sizes in the Zero- and Few-Shots settings. In detail, we proposed a pipeline that dynamically gathers contextual information from a Knowledge Base (KB), both in the form of context triplets and of (sentence, triplets) pairs as examples, and provides it to the LLM through a prompt. The additional context allowed the LLMs to be competitive with all the older fully trained baselines based on the Bidirectional Long Short-Term Memory (BiLSTM) Network architecture. We further conducted a detailed analysis of the quality of the gathered KB context, finding it to be strongly correlated with the final TE performance of the model. In contrast, the size of the model appeared to only logarithmically improve the TE capabilities of the LLMs.

Via

Access Paper or Ask Questions

Comparing Deep Learning Models for the Task of Volatility Prediction Using Multivariate Data

Jun 23, 2023

Wenbo Ge, Pooia Lalbakhsh, Leigh Isai, Artem Lensky, Hanna Suominen

Abstract:This study aims to compare multiple deep learning-based forecasters for the task of predicting volatility using multivariate data. The paper evaluates a range of models, starting from simpler and shallower ones and progressing to deeper and more complex architectures. Additionally, the performance of these models is compared against naive predictions and variations of classical GARCH models. The prediction of volatility for five assets, namely S&P500, NASDAQ100, gold, silver, and oil, is specifically addressed using GARCH models, Multi-Layer Perceptrons, Recurrent Neural Networks, Temporal Convolutional Networks, and the Temporal Fusion Transformer. In the majority of cases, the Temporal Fusion Transformer, followed by variants of the Temporal Convolutional Network, outperformed classical approaches and shallow networks. These experiments were repeated, and the differences observed between the competing models were found to be statistically significant, thus providing strong encouragement for their practical application.

Via

Access Paper or Ask Questions

Enhancing Clinical Information Extraction with Transferred Contextual Embeddings

Sep 22, 2021

Zimin Wan, Chenchen Xu, Hanna Suominen

Figure 1 for Enhancing Clinical Information Extraction with Transferred Contextual Embeddings

Figure 2 for Enhancing Clinical Information Extraction with Transferred Contextual Embeddings

Figure 3 for Enhancing Clinical Information Extraction with Transferred Contextual Embeddings

Figure 4 for Enhancing Clinical Information Extraction with Transferred Contextual Embeddings

Abstract:The Bidirectional Encoder Representations from Transformers (BERT) model has achieved the state-of-the-art performance for many natural language processing (NLP) tasks. Yet, limited research has been contributed to studying its effectiveness when the target domain is shifted from the pre-training corpora, for example, for biomedical or clinical NLP applications. In this paper, we applied it to a widely studied a hospital information extraction (IE) task and analyzed its performance under the transfer learning setting. Our application became the new state-of-the-art result by a clear margin, compared with a range of existing IE models. Specifically, on this nursing handover data set, the macro-average F1 score from our model was 0.438, whilst the previous best deep learning models had 0.416. In conclusion, we showed that BERT based pre-training models can be transferred to health-related documents under mild conditions and with a proper fine-tuning process.

* 6 pages, 4 figures

Via

Access Paper or Ask Questions

Analyzing the Granularity and Cost of Annotation in Clinical Sequence Labeling

Aug 23, 2021

Haozhan Sun, Chenchen Xu, Hanna Suominen

Figure 1 for Analyzing the Granularity and Cost of Annotation in Clinical Sequence Labeling

Figure 2 for Analyzing the Granularity and Cost of Annotation in Clinical Sequence Labeling

Figure 3 for Analyzing the Granularity and Cost of Annotation in Clinical Sequence Labeling

Figure 4 for Analyzing the Granularity and Cost of Annotation in Clinical Sequence Labeling

Abstract:Well-annotated datasets, as shown in recent top studies, are becoming more important for researchers than ever before in supervised machine learning (ML). However, the dataset annotation process and its related human labor costs remain overlooked. In this work, we analyze the relationship between the annotation granularity and ML performance in sequence labeling, using clinical records from nursing shift-change handover. We first study a model derived from textual language features alone, without additional information based on nursing knowledge. We find that this sequence tagger performs well in most categories under this granularity. Then, we further include the additional manual annotations by a nurse, and find the sequence tagging performance remaining nearly the same. Finally, we give a guideline and reference to the community arguing it is not necessary and even not recommended to annotate in detailed granularity because of a low Return on Investment. Therefore we recommend emphasizing other features, like textual knowledge, for researchers and practitioners as a cost-effective source for increasing the sequence labeling performance.

Via

Access Paper or Ask Questions

ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring

Mar 07, 2021

Dongxu Li, Chenchen Xu, Kaihao Zhang, Xin Yu, Yiran Zhong, Wenqi Ren, Hanna Suominen, Hongdong Li

Figure 1 for ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring

Figure 2 for ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring

Figure 3 for ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring

Figure 4 for ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring

Abstract:Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions. In order to utilize neighboring sharp patches, typical methods rely mainly on homography or optical flows to spatially align neighboring blurry frames. However, such explicit approaches are less effective in the presence of fast motions with large pixel displacements. In this work, we propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space. To construct distant pixel correspondences, our model builds a correlation volume pyramid among all the pixel-pairs between neighboring frames. To enhance the features of the reference frame, we design a correlative aggregation module that maximizes the pixel-pair correlations with its neighbors based on the volume pyramid. Finally, we feed the aggregated features into a reconstruction module to obtain the restored frame. We design a generative adversarial paradigm to optimize the model progressively. Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) Dataset for Video Deblurring (HFR-DVD). Quantitative and qualitative experiments show that our model performs favorably on both datasets against previous state-of-the-art methods, confirming the benefit of modeling all-range spatial correspondence for video deblurring.

* Preprint for CVPR 2021 Poster

Via

Access Paper or Ask Questions

Learning to Continually Learn Rapidly from Few and Noisy Data

Mar 06, 2021

Nicholas I-Hsien Kuo, Mehrtash Harandi, Nicolas Fourrier, Christian Walder, Gabriela Ferraro, Hanna Suominen

Figure 1 for Learning to Continually Learn Rapidly from Few and Noisy Data

Figure 2 for Learning to Continually Learn Rapidly from Few and Noisy Data

Figure 3 for Learning to Continually Learn Rapidly from Few and Noisy Data

Figure 4 for Learning to Continually Learn Rapidly from Few and Noisy Data

Abstract:Neural networks suffer from catastrophic forgetting and are unable to sequentially learn new tasks without guaranteed stationarity in data distribution. Continual learning could be achieved via replay -- by concurrently training externally stored old data while learning a new task. However, replay becomes less effective when each past task is allocated with less memory. To overcome this difficulty, we supplemented replay mechanics with meta-learning for rapid knowledge acquisition. By employing a meta-learner, which \textit{learns a learning rate per parameter per past task}, we found that base learners produced strong results when less memory was available. Additionally, our approach inherited several meta-learning advantages for continual learning: it demonstrated strong robustness to continually learn under the presence of noises and yielded base learners to higher accuracy in less updates.

* Accepted to the Meta-Learning and Co-Hosted Competition of AAAI 2021. See https://aaai.org/Conferences/AAAI-21/ws21workshops/ and see https://sites.google.com/chalearn.org/metalearning?pli=1#h.kt23ep5wlehv

Via

Access Paper or Ask Questions

TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

Oct 12, 2020

Dongxu Li, Chenchen Xu, Xin Yu, Kaihao Zhang, Ben Swift, Hanna Suominen, Hongdong Li

Figure 1 for TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

Figure 2 for TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

Figure 3 for TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

Figure 4 for TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

Abstract:Sign language translation (SLT) aims to interpret sign video sequences into text-based natural language sentences. Sign videos consist of continuous sequences of sign gestures with no clear boundaries in between. Existing SLT models usually represent sign visual features in a frame-wise manner so as to avoid needing to explicitly segmenting the videos into isolated signs. However, these methods neglect the temporal information of signs and lead to substantial ambiguity in translation. In this paper, we explore the temporal semantic structures of signvideos to learn more discriminative features. To this end, we first present a novel sign video segment representation which takes into account multiple temporal granularities, thus alleviating the need for accurate video segmentation. Taking advantage of the proposed segment representation, we develop a novel hierarchical sign video feature learning method via a temporal semantic pyramid network, called TSPNet. Specifically, TSPNet introduces an inter-scale attention to evaluate and enhance local semantic consistency of sign segments and an intra-scale attention to resolve semantic ambiguity by using non-local video context. Experiments show that our TSPNet outperforms the state-of-the-art with significant improvements on the BLEU score (from 9.58 to 13.41) and ROUGE score (from 31.80 to 34.96)on the largest commonly-used SLT dataset. Our implementation is available at https://github.com/verashira/TSPNet.

* NeurIPS 2020 preprint

Via

Access Paper or Ask Questions