ISSCO, University of Geneva
Abstract:In this era of rapid technological advancements, communication continues to evolve as new linguistic phenomena emerge. Among these is Arabizi, a hybrid form of Arabic that incorporates Latin characters and numbers to represent the spoken dialects of Arab communities. Arabizi is widely used on social media and allows people to communicate in an informal and dynamic way, but it poses significant challenges for machine translation due to its lack of formal structure and deeply embedded cultural nuances. This case study arises from a growing need to translate Arabizi for gisting purposes. It evaluates the capacity of different LLMs to decode and translate Arabizi, focusing on multiple Arabic dialects that have rarely been studied up until now. Using a combination of human evaluators and automatic metrics, this research project investigates the model's performance in translating Arabizi into both Modern Standard Arabic and English. Key questions explored include which dialects are translated most effectively and whether translations into English surpass those into Arabic.
Abstract:We present a new platform, "Regulus Lite", which supports rapid development and web deployment of several types of phrasal speech translation systems using a minimal formalism. A distinguishing feature is that most development work can be performed directly by domain experts. We motivate the need for platforms of this type and discuss three specific cases: medical speech translation, speech-to-sign-language translation and voice questionnaires. We briefly describe initial experiences in developing practical systems.
Abstract:This paper focusses on mental state adjectives and offers a unified analysis in the theory of Generative Lexicon (Pustejovsky, 1991, 1995). We show that, instead of enumerating the various syntactic constructions they enter into, with the different senses which arise, it is possible to give them a rich typed semantic representation which will explain both their semantic and syntactic polymorphism.
Abstract:We describe how substantial domain-independent language-processing systems for French and Spanish were quickly developed by manually adapting an existing English-language system, the SRI Core Language Engine. We explain the adaptation process in detail, and argue that it provides a fairly general recipe for converting a grammar-based system for English into a corresponding one for a Romance language.
Abstract:The paper argues the importance of high-quality translation for spoken language translation systems. It describes an architecture suitable for rapid development of high-quality limited-domain translation systems, which has been implemented within an advanced prototype English to French spoken language translator. The focus of the paper is the hybrid transfer model which combines unification-based rules and a set of trainable statistical preferences; roughly, rules encode domain-independent grammatical information and preferences encode domain-dependent distributional information. The preferences are trained from sets of examples produced by the system, which have been annotated by human judges as correct or incorrect. An experiment is described in which the model was tested on a 2000 utterance sample of previously unseen data.