Abstract:In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
Abstract:The recently introduced independent fluctuating two-ray (IFTR) fading model, consisting of two specular components fluctuating independently plus a diffuse component, has proven to provide an excellent fit to different wireless environments, including the millimeter-wave band. However, the original formulations of the probability density function (PDF) and cumulative distribution function (CDF) of this model are not applicable to all possible values of its defining parameters, and are given in terms of multifold generalized hypergeometric functions, which prevents their widespread use for the derivation of performance metric expressions. In this paper we present a new formulation of the IFTR model as a countable mixture of Gamma distributions which greatly facilitates the performance evaluation for this model in terms of the metrics already known for the much simpler and widely used Nakagami-m fading. Additionally, a closed-form expression is presented for the generalized moment generating function (GMGF), which permits to readily obtain all the moments of the distribution of the model, as well as several relevant performance metrics. Based on these new derivations, the IFTR model is evaluated for the average channel capacity, the outage probability with and without co-channel interference, and the bit error rate (BER), which are verified by Monte Carlo simulations.
Abstract:We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities. When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.