Anand Rao

Gemma 3 Technical Report

Mar 25, 2025

Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière(+202 more)

Abstract: We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, wider coverage of languages, and a longer context of at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory, which tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers and keeping the span of local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction-finetuned versions. In particular, our novel post-training recipe significantly improves math, chat, instruction-following, and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
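Why raising the local-to-global ratio shrinks the KV cache comes down to simple arithmetic: a global layer caches every token in the context, while a sliding-window local layer caches at most its window. The sketch is illustrative only. The layer count, head sizes, 1,024-token window, 5:1 ratio, and bf16 storage (2 bytes per value) are assumptions chosen for the example, not figures stated in the abstract.

```python
def kv_cache_bytes(num_layers, global_every, context_len, window,
                   num_kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV-cache size for a decoder that interleaves local and global layers.

    Every `global_every`-th layer is global and caches all `context_len` tokens;
    the remaining layers use sliding-window attention and cache at most `window` tokens.
    """
    cached_tokens = 0
    for layer in range(num_layers):
        is_global = (layer + 1) % global_every == 0
        cached_tokens += context_len if is_global else min(window, context_len)
    # One key and one value vector per cached token, per KV head, per layer.
    return 2 * cached_tokens * num_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    # Hypothetical configuration chosen for illustration only.
    cfg = dict(num_layers=48, context_len=128 * 1024, window=1024,
               num_kv_heads=8, head_dim=128)
    all_global = kv_cache_bytes(global_every=1, **cfg)
    mixed = kv_cache_bytes(global_every=6, **cfg)  # 5 local layers per global layer
    print(f"all global: {all_global / 2**30:.1f} GiB")
    print(f"5:1 local:global: {mixed / 2**30:.1f} GiB")
```

With this hypothetical configuration the script prints 24.0 GiB for an all-global stack versus 4.2 GiB for the 5:1 mix: the global layers dominate what remains, so shortening the local window matters far less than reducing how many layers are global.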

An AI-Driven Data Mesh Architecture Enhancing Decision-Making in Infrastructure Construction and Public Procurement

Nov 29, 2024

Saurabh Mishra, Mahendra Shinde, Aniket Yadav, Bilal Ayyub, Anand Rao

Abstract: Infrastructure construction, often dubbed an "industry of industries," is closely linked with government spending and public procurement, offering significant opportunities for improved efficiency and productivity through better transparency and information access. By leveraging these opportunities, we can achieve notable gains in productivity, cost savings, and broader economic benefits. Our approach introduces an integrated software ecosystem utilizing Data Mesh and Service Mesh architectures. This system includes the largest training dataset for infrastructure and procurement, encompassing over 100 billion tokens, scientific publications, activities, and risk data, all structured by a systematic AI framework. Supported by a Knowledge Graph linked to domain-specific multi-agent tasks and Q&A capabilities, our platform standardizes and ingests diverse data sources, transforming them into structured knowledge. Leveraging large language models (LLMs) and automation, our system revolutionizes data structuring and knowledge creation, aiding decision-making in early-stage project planning, detailed research, market trend analysis, and qualitative assessments. Its web-scalable architecture delivers domain-curated information, enabling AI agents to facilitate reasoning and manage uncertainties, while preparing for future expansions with specialized agents targeting particular challenges. This integration of AI with domain expertise not only boosts efficiency and decision-making in construction and infrastructure but also establishes a framework for enhancing government efficiency and accelerating the transition of traditional industries to digital workflows. This work is poised to significantly influence AI-driven initiatives in this sector and guide best practices in AI Operations.

Reliability, Resilience and Human Factors Engineering for Trustworthy AI Systems

Nov 13, 2024

Saurabh Mishra, Anand Rao, Ramayya Krishnan, Bilal Ayyub, Amin Aria, Enrico Zio

Abstract: As AI systems become integral to critical operations across industries and services, ensuring their reliability and safety is essential. We offer a framework that integrates established reliability and resilience engineering principles into AI systems. By applying traditional metrics such as failure rate and Mean Time Between Failures (MTBF) along with resilience engineering and human reliability analysis, we propose an integrated framework to manage AI system performance and to prevent, or efficiently recover from, failures. Our work adapts classical engineering methods to AI systems and outlines a research agenda for future technical studies. We apply our framework to a real-world AI system, using system status data from platforms such as OpenAI, to demonstrate its practical applicability. This framework aligns with emerging global standards and regulatory frameworks, providing a methodology to enhance the trustworthiness of AI systems. Our aim is to guide policy, regulation, and the development of reliable, safe, and adaptable AI technologies capable of consistent performance in real-world environments.
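The classical metrics named in the abstract can be computed directly from incident start and end times such as those listed on a service status page. This is a minimal sketch using the textbook definitions (MTBF as operating time divided by number of failures, failure rate as its reciprocal). The incident list is hypothetical, and the paper's exact data processing is not reproduced here.

```python
from datetime import datetime


def reliability_metrics(incidents, window_start, window_end):
    """Classical reliability metrics from incident intervals in an observation window.

    `incidents` is a list of non-overlapping (start, end) datetimes inside the window.
    """
    window_h = (window_end - window_start).total_seconds() / 3600
    down_h = sum((end - start).total_seconds() / 3600 for start, end in incidents)
    up_h = window_h - down_h
    n = len(incidents)
    return {
        # MTBF: operating (up) time divided by the number of failures.
        "mtbf_h": up_h / n if n else float("inf"),
        # Failure rate: failures per operating hour (1 / MTBF).
        "failure_rate_per_h": n / up_h if up_h > 0 else float("inf"),
        # MTTR: mean duration of an outage.
        "mttr_h": down_h / n if n else 0.0,
        # Availability: fraction of the window the service was up.
        "availability": up_h / window_h,
    }


if __name__ == "__main__":
    # Hypothetical incidents, e.g. transcribed by hand from a public status page.
    incidents = [
        (datetime(2024, 1, 5, 10), datetime(2024, 1, 5, 11)),
        (datetime(2024, 1, 12, 0), datetime(2024, 1, 12, 2)),
        (datetime(2024, 1, 20, 6), datetime(2024, 1, 20, 9)),
    ]
    print(reliability_metrics(incidents, datetime(2024, 1, 1), datetime(2024, 1, 31)))
```

For this made-up 30-day window with 6 hours of downtime across 3 incidents, MTBF is 238 hours and MTTR is 2 hours. Tracking both separates how often a service fails from how quickly it recovers, which is the reliability/resilience distinction the abstract draws.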

Gemma 2: Improving Open Language Models at a Practical Size

Aug 02, 2024

Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé(+187 more)

Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and grouped-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next-token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
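The difference between next-token prediction and knowledge distillation is in the training target: the first scores the student only on the single observed token, the second on the teacher's whole next-token distribution. This is a minimal per-position sketch under simplifying assumptions (a made-up 4-token vocabulary, no temperature scaling, plain Python instead of a tensor library); it is not Gemma 2's training code.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(student_logits, target_id):
    """Standard language-modeling loss: -log p_student(observed next token)."""
    return -math.log(softmax(student_logits)[target_id])


def distillation_loss(student_logits, teacher_logits):
    """Cross-entropy of the student against the teacher's full next-token distribution.

    This differs from KL(teacher || student) only by the teacher's entropy,
    which does not depend on the student's parameters.
    """
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_t, p_s))


if __name__ == "__main__":
    # Toy 4-token vocabulary; logits are made up for illustration.
    teacher = [2.0, 1.0, 0.5, -1.0]
    student = [0.5, 0.5, 0.5, 0.5]
    print("next-token loss:", next_token_loss(student, target_id=0))
    print("distillation loss:", distillation_loss(student, teacher))
```

The distillation loss is smallest when the student reproduces the teacher's distribution exactly, so the student receives a graded signal about every plausible continuation rather than a one-hot target.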

Intelligent Systematic Investment Agent: an ensemble of deep learning and evolutionary strategies

Mar 24, 2022

Prasang Gupta, Shaz Hoda, Anand Rao

Abstract: Machine-learning-driven trading strategies have garnered considerable interest over the past few years. There is, however, limited consensus on the ideal approach for developing such strategies. Further, most literature has focused on strategies for short-term trading, with little or no focus on strategies that attempt to build long-term wealth. Our paper proposes a new approach for developing long-term investment strategies using an ensemble of evolutionary algorithms and a deep learning model that makes a series of short-term purchase decisions. Our methodology focuses on building long-term wealth by improving systematic investment planning (SIP) decisions on Exchange-Traded Funds (ETFs) over a period of time. We provide empirical evidence of superior performance (around 1% higher returns) using our ensemble approach compared to the traditional daily systematic investment practice on a given ETF. Our results are based on live trading decisions made by our algorithm and executed on the Robinhood trading platform.

* 19 pages, 10 figures
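The comparison in this abstract is between a fixed daily SIP and a schedule whose per-day purchase amounts come from a model. This is a minimal sketch of the return calculation only, assuming made-up prices and a hypothetical decision vector with the same total budget. The paper's ensemble of evolutionary algorithms and deep learning is not reproduced.

```python
def sip_return_pct(prices, amounts):
    """Percent return of a series of purchases, valued at the final price.

    prices[i] is the ETF price on day i; amounts[i] is the cash invested that day.
    """
    if len(prices) != len(amounts) or not prices:
        raise ValueError("prices and amounts must be non-empty and equal length")
    units = sum(a / p for a, p in zip(amounts, prices))
    invested = sum(amounts)
    final_value = units * prices[-1]
    return 100.0 * (final_value - invested) / invested


if __name__ == "__main__":
    prices = [10.0, 8.0, 10.0]     # hypothetical daily closing prices
    baseline = [10.0, 10.0, 10.0]  # traditional daily SIP: same amount every day
    decisions = [5.0, 20.0, 5.0]   # hypothetical model output, same total budget
    print(f"daily SIP: {sip_return_pct(prices, baseline):.2f}%")
    print(f"model-driven: {sip_return_pct(prices, decisions):.2f}%")
```

On this toy series the baseline returns 8.33% and the reweighted schedule 16.67%, because more cash is deployed on the low-price day. Holding the total invested fixed keeps the two schedules comparable.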

Independent Ethical Assessment of Text Classification Models: A Hate Speech Detection Case Study

Jul 19, 2021

Amitoj Singh, Jingshu Chen, Lihao Zhang, Amin Rasekh, Ilana Golbin, Anand Rao

Abstract: An independent ethical assessment of an artificial intelligence system is an impartial examination of the system's development, deployment, and use in alignment with ethical values. System-level qualitative frameworks that describe high-level requirements and component-level quantitative metrics that measure individual ethical dimensions have been developed over the past few years. However, there exists a gap between the two, which hinders the execution of independent ethical assessments in practice. This study bridges this gap and designs a holistic independent ethical assessment process for a text classification model with a special focus on the task of hate speech detection. The assessment is further augmented with protected attributes mining and counterfactual-based analysis to enhance bias assessment. It covers assessments of technical performance, data bias, embedding bias, classification bias, and interpretability. The proposed process is demonstrated through an assessment of a deep hate speech detection model.

* 27th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021), August 14-18, 2021, Singapore
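One common form of counterfactual bias analysis swaps identity terms in an input and checks whether the classifier's score moves. This is a minimal sketch of that idea, assuming a hypothetical identity-term list and a deliberately biased toy scorer standing in for a real hate speech model. The study's actual attribute mining and counterfactual procedure may differ.

```python
IDENTITY_TERMS = ["women", "men", "muslims", "christians"]  # hypothetical list


def counterfactuals(text, terms=IDENTITY_TERMS):
    """Yield variants of `text` with each identity term swapped for every other term.

    Tokenizes on whitespace only, so punctuation attached to a term blocks the match.
    """
    words = text.split()
    for i, word in enumerate(words):
        if word.lower() in terms:
            for alt in terms:
                if alt != word.lower():
                    yield " ".join(words[:i] + [alt] + words[i + 1:])


def counterfactual_gap(classify, text, terms=IDENTITY_TERMS):
    """Largest absolute change in the classifier's score across all variants."""
    base = classify(text)
    return max((abs(classify(v) - base) for v in counterfactuals(text, terms)), default=0.0)


def toy_classifier(text):
    """Deliberately biased stand-in for a hate speech model (illustration only)."""
    words = text.lower().split()
    score = 0.1
    if "hate" in words:
        score += 0.6
    if "muslims" in words:  # spurious dependence on an identity term
        score += 0.2
    return score


if __name__ == "__main__":
    print(counterfactual_gap(toy_classifier, "I respect women"))
```

The benign sentence yields a gap of about 0.2 because swapping in "muslims" raises the toy score. An unbiased classifier would show a gap near zero for sentences whose meaning does not change under the swap.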

Consumer Demand Modeling During COVID-19 Pandemic

May 03, 2021

Shaz Hoda, Amitoj Singh, Anand Rao, Remzi Ural, Nicholas Hodson

Abstract: The current pandemic has introduced substantial uncertainty to traditional methods for demand planning. These uncertainties stem from the disease progression, government interventions, the economy, and consumer behavior. While most of the emerging literature on the pandemic has focused on disease progression, only a few studies have focused on the resulting regulations and their impact on individual behavior. The contributions of this paper include a quantitative behavior model of fear of COVID-19, the impact of government interventions on consumer behavior, and the impact of consumer behavior on consumer choice and hence on demand for goods. It brings together multiple models for disease progression, consumer behavior, and demand estimation, thus bridging the gap between disease progression and consumer demand. We use panel regression to understand the drivers of demand during the pandemic and Bayesian inference to simplify the regulation landscape, which can help build scenarios for resilient demand planning. We illustrate this resilient demand planning model using a specific example of gas retailing. We find that demand is sensitive to fear of COVID-19: as the number of COVID-19 cases over the previous week increases, the demand for gas decreases, though this effect dissipates over time. Further, government regulations restrict access to different services, thereby reducing mobility, which in itself reduces demand.

* 8 pages, 7 figures
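Panel regression with entity fixed effects estimates how demand responds to a driver such as the previous week's case count while absorbing each site's constant baseline. This is a minimal one-regressor "within" estimator sketch on hypothetical data. The paper's specification (covariates, lag structure, the Bayesian step) is not reproduced.

```python
from collections import defaultdict


def within_estimator(entity_ids, x, y):
    """Fixed-effects (within) slope for a single regressor in panel data.

    Demeaning x and y within each entity removes time-invariant entity effects
    (for example, a retail site's baseline demand); OLS on the demeaned data
    then gives the slope.
    """
    sum_x, sum_y, count = defaultdict(float), defaultdict(float), defaultdict(int)
    for e, xi, yi in zip(entity_ids, x, y):
        sum_x[e] += xi
        sum_y[e] += yi
        count[e] += 1
    num = den = 0.0
    for e, xi, yi in zip(entity_ids, x, y):
        dx = xi - sum_x[e] / count[e]
        dy = yi - sum_y[e] / count[e]
        num += dx * dy
        den += dx * dx
    return num / den


if __name__ == "__main__":
    # Hypothetical panel: two sites with different baseline demand, both
    # losing 2 demand units per unit of last week's case count.
    sites = ["A", "A", "A", "B", "B", "B"]
    cases = [1, 2, 3, 5, 10, 15]
    demand = [98, 96, 94, 40, 30, 20]
    print(within_estimator(sites, cases, demand))  # -2.0
```

The within estimator recovers the true slope of -2.0, whereas pooled OLS on the same six points, ignoring which site each belongs to, gives roughly -6.05 because it confuses the sites' different baselines with the effect of cases.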
