Minh Do

SuoiAI: Building a Dataset for Aquatic Invertebrates in Vietnam

Apr 21, 2025

Tue Vo, Lakshay Sharma, Tuan Dinh, Khuong Dinh, Trang Nguyen, Trung Phan, Minh Do, Duong Vu

Abstract:Understanding and monitoring aquatic biodiversity is critical for ecological health and conservation efforts. This paper proposes SuoiAI, an end-to-end pipeline for building a dataset of aquatic invertebrates in Vietnam and employing machine learning (ML) techniques for species classification. We outline the methods for data collection, annotation, and model training, focusing on reducing annotation effort through semi-supervised learning and leveraging state-of-the-art object detection and classification models. Our approach aims to overcome challenges such as data scarcity, fine-grained classification, and deployment in diverse environmental conditions.

* Published as a workshop paper at "Tackling Climate Change with Machine Learning", ICLR 2025
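The abstract does not say which semi-supervised method SuoiAI uses. As a minimal sketch, assuming confidence-thresholded pseudo-labeling (one common semi-supervised technique), a classifier's predictions on unlabeled specimen images can be filtered so that only confident ones become training labels, while the rest are left for human annotators. The function name, the 0.9 threshold, and the probabilities are hypothetical.

```python
import numpy as np

def select_pseudo_labels(probs: np.ndarray, threshold: float = 0.9):
    """Hypothetical helper: return (indices, labels) of unlabeled samples
    whose highest predicted class probability is at least `threshold`."""
    confidence = probs.max(axis=1)   # top class probability per sample
    labels = probs.argmax(axis=1)    # predicted class per sample
    keep = np.flatnonzero(confidence >= threshold)
    return keep, labels[keep]

# Hypothetical classifier outputs for 4 unlabeled crops over 3 taxa.
probs = np.array([
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.05, 0.92, 0.03],
    [0.30, 0.30, 0.40],
])
idx, pseudo = select_pseudo_labels(probs, threshold=0.9)
print(idx.tolist(), pseudo.tolist())  # [0, 2] [0, 1]
```

Samples 1 and 3 fall below the threshold, so in this sketch they would be the ones routed to an annotator.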

Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data

Mar 17, 2025

Haozhe Si, Yuxuan Wan, Minh Do, Deepak Vasisht, Han Zhao, Hendrik F. Hamann

Abstract:Geospatial raster (imagery) data, such as that collected by satellite-based imaging systems at different times and spectral bands, hold immense potential for enabling a wide range of high-impact applications. This potential stems from the rich information that is spatially and temporally contextualized across multiple channels and sensing modalities. Recent work has adapted existing self-supervised learning approaches for such geospatial data. However, these approaches lack scalable model architectures, which leads to inflexibility and computational inefficiency as the number of channels and modalities grows. To address these limitations, we introduce the Low-rank Efficient Spatial-Spectral Vision Transformer (LESS ViT) with three key innovations: i) the LESS Attention Block, which approximates high-dimensional spatial-spectral attention through the Kronecker product of low-dimensional spatial and spectral attention components; ii) the Continuous Positional-Channel Embedding Layer, which preserves both the spatial and spectral continuity and the physical characteristics of each patch; and iii) the Perception Field Mask, which exploits local spatial dependencies by constraining attention to neighboring patches. To evaluate the proposed innovations, we construct GFM-Bench, a comprehensive benchmark for such geospatial raster data. We pretrain LESS ViT using a Hyperspectral Masked Autoencoder framework with integrated positional and channel masking strategies. Experimental results demonstrate that our proposed method surpasses current state-of-the-art multi-modal geospatial foundation models, achieving superior performance with less computation and fewer parameters. The flexibility and extensibility of our framework make it a promising direction for future geospatial data analysis tasks that involve a wide range of modalities and channels.
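A minimal NumPy sketch of the Kronecker factorization idea, under assumptions the abstract does not state: spatial attention A_s is computed from channel-averaged tokens, spectral attention A_c from patch-averaged tokens, both share the hypothetical projections Wq and Wk, and the tokens themselves serve as values. It is not the LESS Attention Block implementation; it only checks that applying A_c and A_s separately gives the same result as multiplying by the dense matrix A_c ⊗ A_s.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def factorized_attention(X, Wq, Wk):
    """X: (C, N, d) tokens for C spectral channels and N spatial patches.
    Applies the attention matrix kron(A_c, A_s) without materializing it."""
    C, N, d = X.shape
    Xs = X.mean(axis=0)                                   # (N, d) channel-averaged tokens
    A_s = softmax((Xs @ Wq) @ (Xs @ Wk).T / np.sqrt(d))   # (N, N) spatial attention
    Xc = X.mean(axis=1)                                   # (C, d) patch-averaged tokens
    A_c = softmax((Xc @ Wq) @ (Xc @ Wk).T / np.sqrt(d))   # (C, C) spectral attention
    # Y[c, n] = sum over (c', n') of A_c[c, c'] * A_s[n, n'] * X[c', n']
    Y = np.einsum("ab,nm,bmd->and", A_c, A_s, X)
    return Y, A_c, A_s

rng = np.random.default_rng(0)
C, N, d = 4, 6, 8
X = rng.normal(size=(C, N, d))
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))  # hypothetical projections
Y, A_c, A_s = factorized_attention(X, Wq, Wk)

# Reference path: dense (C*N) x (C*N) attention built with the Kronecker product.
A_full = np.kron(A_c, A_s)
Y_dense = (A_full @ X.reshape(C * N, d)).reshape(C, N, d)
print(np.allclose(Y, Y_dense))  # True
```

The factored path stores C² + N² attention weights instead of (CN)², and because both factors are row-stochastic their Kronecker product is too, so it remains a valid attention matrix.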

Transforming the Hybrid Cloud for Emerging AI Workloads

Nov 20, 2024

Deming Chen, Alaa Youssef, Ruchi Pendse, André Schleife, Bryan K. Clark, Hendrik Hamann, Jingrui He, Teodoro Laino, Lav Varshney, Yuxiong Wang(+34 more)

Abstract:This white paper, developed through close collaboration between IBM Research and UIUC researchers within the IIDAI Institute, envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads through innovative, full-stack co-design approaches, emphasizing usability, manageability, affordability, adaptability, efficiency, and scalability. By integrating cutting-edge technologies such as generative and agentic AI, cross-layer automation and optimization, unified control plane, and composable and adaptive system architecture, the proposed framework addresses critical challenges in energy efficiency, performance, and cost-effectiveness. Incorporating quantum computing as it matures will enable quantum-accelerated simulations for materials science, climate modeling, and other high-impact domains. Collaborative efforts between academia and industry are central to this vision, driving advancements in foundation models for material design and climate solutions, scalable multimodal data processing, and enhanced physics-based AI emulators for applications like weather forecasting and carbon sequestration. Research priorities include advancing AI agentic systems, LLM as an Abstraction (LLMaaA), AI model optimization and unified abstractions across heterogeneous infrastructure, end-to-end edge-cloud transformation, efficient programming model, middleware and platform, secure infrastructure, application-adaptive cloud systems, and new quantum-classical collaborative workflows. These ideas and solutions encompass both theoretical and practical research questions, requiring coordinated input and support from the research community. This joint initiative aims to establish hybrid clouds as secure, efficient, and sustainable platforms, fostering breakthroughs in AI-driven applications and scientific discovery across academia, industry, and society.

* 70 pages, 27 figures

Fusion and Cross-Modal Transfer for Zero-Shot Human Action Recognition

Jul 23, 2024

Abhi Kamboj, Anh Duy Nguyen, Minh Do

Abstract:Despite living in a multi-sensory world, most AI models are limited to textual and visual interpretations of human motion and behavior. Inertial measurement units (IMUs) provide a salient signal for understanding human motion; however, they are challenging to use because their signals are hard to interpret and their data are scarce. We investigate a method to transfer knowledge between visual and inertial modalities using the structure of an informative joint representation space designed for human action recognition (HAR). We apply the resulting Fusion and Cross-modal Transfer (FACT) method to a novel setup in which the model has no access to labeled IMU data during training and is able to perform HAR with only IMU data during testing. Extensive experiments on a wide range of RGB-IMU datasets demonstrate that FACT significantly outperforms existing methods in zero-shot cross-modal transfer.

A Survey of IMU Based Cross-Modal Transfer Learning in Human Activity Recognition

Mar 17, 2024

Abhi Kamboj, Minh Do

Abstract:Despite living in a multi-sensory world, most AI models are limited to textual and visual understanding of human motion and behavior. Full situational awareness of human motion could best be achieved through a combination of sensors. In this survey we investigate how knowledge can be transferred and utilized among modalities for Human Activity/Action Recognition (HAR), i.e., cross-modality transfer learning. We motivate the importance and potential of IMU data and its applicability in cross-modality learning, as well as the importance of studying the HAR problem. We categorize HAR-related tasks by time and abstractness and then compare various types of multimodal HAR datasets. We also distinguish and expound on many related but inconsistently used terms in the literature, such as transfer learning, domain adaptation, representation learning, sensor fusion, and multimodal learning, and describe how cross-modal learning fits with all these concepts. We then review the literature on IMU-based cross-modal transfer for HAR. The two main approaches for cross-modal transfer are instance-based transfer, where instances of one modality are mapped to another (i.e., knowledge is transferred in the input space), and feature-based transfer, where the model relates the modalities in an intermediate latent space (i.e., knowledge is transferred in the feature space). Finally, we discuss future research directions and applications in cross-modal HAR.
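To make feature-based transfer concrete, here is a toy sketch on synthetic data, loosely mirroring the zero-shot cross-modal setting described for FACT: labels exist only for RGB features, unlabeled RGB-IMU pairs are used to learn a linear map from IMU features into the RGB feature space, and the RGB classifier is then applied to IMU-only test data. The linear "encoders", least-squares alignment, and nearest-centroid classifier are illustrative assumptions, not a method taken from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, z_dim = 3, 4
centroids_z = 5.0 * rng.normal(size=(n_classes, z_dim))  # class centers in a shared latent space

def sample(n):
    """Draw latent activity states z with class labels y."""
    y = rng.integers(n_classes, size=n)
    z = centroids_z[y] + 0.5 * rng.normal(size=(n, z_dim))
    return z, y

# Hypothetical fixed encoders: each modality observes z through its own linear view.
A_rgb = rng.normal(size=(z_dim, 8))
B_imu = rng.normal(size=(z_dim, 6))
def rgb_feat(z): return z @ A_rgb + 0.05 * rng.normal(size=(len(z), 8))
def imu_feat(z): return z @ B_imu + 0.05 * rng.normal(size=(len(z), 6))

# 1) Labeled RGB data only: nearest-centroid classifier in RGB feature space.
z_lab, y_lab = sample(300)
f_rgb = rgb_feat(z_lab)
rgb_centroids = np.stack([f_rgb[y_lab == k].mean(axis=0) for k in range(n_classes)])

# 2) Unlabeled paired data: least-squares map from IMU features to RGB features.
z_pair, _ = sample(300)  # labels are discarded
W, *_ = np.linalg.lstsq(imu_feat(z_pair), rgb_feat(z_pair), rcond=None)

# 3) IMU-only test data: map into RGB space and reuse the RGB classifier.
z_test, y_test = sample(200)
mapped = imu_feat(z_test) @ W
dists = ((mapped[:, None, :] - rgb_centroids[None, :, :]) ** 2).sum(axis=-1)
pred = dists.argmin(axis=1)
accuracy = (pred == y_test).mean()
print(f"zero-shot IMU accuracy: {accuracy:.2f}")
```

The IMU branch never sees a label; everything it "knows" about the classes comes through the alignment learned in feature space, which is the defining trait of feature-based transfer.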

Multi-stream Fusion for Class Incremental Learning in Pill Image Classification

Oct 05, 2022

Trong-Tung Nguyen, Hieu H. Pham, Phi Le Nguyen, Thanh Hung Nguyen, Minh Do

Abstract:Classifying pill categories from real-world images is crucial for various smart healthcare applications. Although existing image classification approaches can achieve good performance on a fixed set of pill categories, they fail to handle novel pill categories that are frequently presented to the learning algorithm. A naive solution is to retrain the model on the novel classes. However, this may result in a phenomenon known as catastrophic forgetting, in which the system forgets what it learned about previous classes. In this paper, we address this challenge by introducing class incremental learning (CIL) ability to traditional pill image classification systems. Specifically, we propose a novel incremental multi-stream intermediate fusion framework that incorporates an additional guidance information stream, chosen to best match the problem domain, into various state-of-the-art CIL methods. Within this framework, we use color-specific information of pill images as the guidance stream and devise an approach, "Color Guidance with Multi-stream intermediate fusion" (CG-IMIF), for the CIL pill image classification task. We conduct comprehensive experiments on a real-world incremental pill image classification dataset, VAIPE-PCIL, and find that CG-IMIF consistently outperforms several state-of-the-art methods by a large margin in different task settings. Our code, data, and trained model are available at https://github.com/vinuni-vishc/CG-IMIF.

* Accepted for publication in the Asian Conference on Computer Vision (ACCV 2022)
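The abstract does not specify how the color stream is computed or fused. A minimal sketch, assuming per-channel RGB histograms as the color guidance features and concatenation as the intermediate fusion step; the random image, the 128-dimensional "CNN embedding", and the helper names are hypothetical stand-ins.

```python
import numpy as np

def color_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Per-channel intensity histogram of an RGB uint8 image; each channel is
    normalized to sum to 1 and the three are concatenated into (3 * bins,)."""
    feats = []
    for ch in range(3):
        hist, _ = np.histogram(image[..., ch], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)

def fuse(image_feat: np.ndarray, color_feat: np.ndarray) -> np.ndarray:
    """Intermediate fusion by concatenating the two streams' feature vectors."""
    return np.concatenate([image_feat, color_feat])

rng = np.random.default_rng(0)
pill = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # stand-in pill crop
image_feat = rng.normal(size=128)                              # stand-in CNN embedding
fused = fuse(image_feat, color_histogram(pill))
print(fused.shape)  # (152,)
```

A histogram is cheap and independent of the learned backbone, so in a sketch like this the color stream would not itself drift when new pill classes are added, which is one plausible reason such a guidance signal helps under incremental training.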

Planning for Compilation of a Quantum Algorithm for Graph Coloring

Feb 23, 2020

Minh Do, Zhihui Wang, Bryan O'Gorman, Davide Venturelli, Eleanor Rieffel, Jeremy Frank

Abstract:The problem of compiling general quantum algorithms for implementation on near-term quantum processors has been introduced to the AI community. Previous work demonstrated that temporal planning is an attractive approach for part of this compilation task, specifically the routing of circuits that implement the Quantum Alternating Operator Ansatz (QAOA) applied to the MaxCut problem on a quantum processor architecture. In this paper, we extend the earlier work to route circuits that implement QAOA for Graph Coloring problems. QAOA for coloring requires execution of more, and more complex, operations on the chip, which makes routing a more challenging problem. We evaluate the approach on state-of-the-art hardware architectures from leading quantum computing companies. Additionally, we apply a planning approach to qubit initialization. Our empirical evaluation shows that temporal planning compares well to reasonable analytic upper bounds, and that solving qubit initialization with a classical planner generally helps temporal planners find shorter-makespan compilations for QAOA for Graph Coloring. These advances suggest that temporal planning can be an effective approach for more complex quantum computing algorithms and architectures.

* The 24th European Conference on Artificial Intelligence (ECAI 2020)
* 8 pages, 4 tables, 5 figures
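One way to see why coloring circuits are harder to route than MaxCut circuits: in the one-hot encoding commonly used for QAOA graph coloring, each vertex gets k qubits, every edge contributes one ZZ interaction per color, and the mixer couples qubits within each vertex's register. The sketch below only counts two-qubit interactions; the all-pairs (clique) mixer layout is one assumed choice among several, and the function names are hypothetical.

```python
from itertools import combinations

def maxcut_phase_gates(edges):
    """MaxCut phase separator: one ZZ interaction per graph edge."""
    return [(u, v) for u, v in edges]

def coloring_phase_gates(edges, k):
    """One-hot coloring: qubit (v, c) is 1 iff vertex v gets color c.
    Each edge contributes one ZZ interaction per color, penalizing equal colors."""
    return [((u, c), (v, c)) for u, v in edges for c in range(k)]

def coloring_mixer_pairs(vertices, k):
    """Assumed all-pairs XY-mixer layout within each vertex's k color qubits."""
    return [((v, a), (v, b)) for v in vertices for a, b in combinations(range(k), 2)]

triangle = [(0, 1), (1, 2), (0, 2)]
print(len(maxcut_phase_gates(triangle)))         # 3
print(len(coloring_phase_gates(triangle, k=3)))  # 9
print(len(coloring_mixer_pairs(range(3), k=3)))  # 9
```

On this triangle, MaxCut needs 3 qubits and 3 two-qubit interactions per layer, while 3-coloring needs 9 qubits and 18, all of which must be brought onto coupled hardware qubits.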

Beyond Domain Adaptation: Unseen Domain Encapsulation via Universal Non-volume Preserving Models

Dec 09, 2018

Thanh-Dat Truong, Chi Nhan Duong, Khoa Luu, Minh-Triet Tran, Minh Do

Abstract:Recognition across domains has recently become an active research topic. However, recognition in new, unseen domains has been largely overlooked. In this setting, deployed deep network models cannot be updated, adapted, or fine-tuned, so recent deep learning techniques such as domain adaptation, feature transfer, and fine-tuning cannot be applied. This paper presents a novel Universal Non-volume Preserving approach to the problem of domain generalization in the context of deep learning. The proposed method can be easily incorporated into any ConvNet framework within an end-to-end deep network design to improve performance. For digit recognition, we benchmark on four popular databases: MNIST, USPS, SVHN, and MNIST-M. The proposed method is also evaluated on face recognition using the Extended Yale-B, CMU-PIE, and CMU-MPIE databases and compared against other state-of-the-art methods. For pedestrian detection, we empirically observe that the proposed method learns models that improve performance across a priori unknown data distributions.
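The term "non-volume preserving" is the one used for RealNVP-style normalizing flows, whose affine coupling layers are exactly invertible and have a cheap log-determinant. The sketch below shows a single such coupling layer in NumPy; it assumes this building block for illustration only and is not the UNVP architecture itself. The one-layer scale and shift "networks" are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 6
half = D // 2
# Hypothetical tiny networks for the log-scale s(.) and shift t(.).
Ws = 0.1 * rng.normal(size=(half, half))
Wt = rng.normal(size=(half, half))

def s_net(x1): return np.tanh(x1 @ Ws)  # bounded log-scale
def t_net(x1): return x1 @ Wt

def coupling_forward(x):
    """Keep x1 unchanged; scale and shift x2 conditioned on x1."""
    x1, x2 = x[:, :half], x[:, half:]
    s = s_net(x1)
    y2 = x2 * np.exp(s) + t_net(x1)
    log_det = s.sum(axis=1)  # Jacobian is triangular, so log|det J| = sum of s
    return np.concatenate([x1, y2], axis=1), log_det

def coupling_inverse(y):
    """Exact inverse: y1 == x1, so s and t can be recomputed from it."""
    y1, y2 = y[:, :half], y[:, half:]
    x2 = (y2 - t_net(y1)) * np.exp(-s_net(y1))
    return np.concatenate([y1, x2], axis=1)

x = rng.normal(size=(5, D))
y, log_det = coupling_forward(x)
print(np.allclose(coupling_inverse(y), x))  # True
```

Because the log-determinant is nonzero in general, the map changes volume, which is what lets a stack of such layers reshape a feature density while keeping exact likelihoods.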

Comparing and Integrating Constraint Programming and Temporal Planning for Quantum Circuit Compilation

Mar 19, 2018

Kyle E. C. Booth, Minh Do, J. Christopher Beck, Eleanor Rieffel, Davide Venturelli, Jeremy Frank

Abstract:Recently, the makespan-minimization problem of compiling a general class of quantum algorithms onto near-term quantum processors has been introduced to the AI community. That work demonstrated that temporal planning is a strong approach for a class of quantum circuit compilation (QCC) problems. In this paper, we explore the use of constraint programming (CP) as an alternative and complementary approach to temporal planning. We extend previous work by introducing two new problem variations that incorporate important characteristics identified by the quantum computing community. We apply temporal planning and CP to the baseline and extended QCC problems as both stand-alone and hybrid approaches. Our hybrid methods use solutions found by temporal planning to warm start CP, leveraging the former's ability to find satisficing solutions to problems with a high degree of task optionality, an area where CP typically struggles. The CP model, benefiting from the bounds on planning horizon length and task counts inferred from the warm start, is then used to find higher-quality solutions. Our empirical evaluation indicates that while stand-alone CP is competitive only for the smallest problems, CP hybridized with temporal planning outperforms stand-alone temporal planning in the majority of problem classes.

* 9 pages, 2 figures, Proceedings of the 28th International Conference on Automated Planning and Scheduling 2018 (ICAPS-18)
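The warm-start idea can be illustrated on a toy version of the scheduling problem. In this sketch, a greedy schedule in a fixed gate order stands in for the temporal planner, and a small branch-and-bound over gate orders stands in for the CP solver: the greedy makespan is the initial upper bound, and any partial order that already reaches it is pruned. The instance (commuting two-qubit gates on a 4-qubit ring, one time unit each, no SWAPs) and all names are hypothetical.

```python
def makespan(order, duration=1):
    """Greedy schedule: each gate starts as soon as both of its qubits are free."""
    ready = {}
    end = 0
    for a, b in order:
        start = max(ready.get(a, 0), ready.get(b, 0))
        ready[a] = ready[b] = start + duration
        end = max(end, start + duration)
    return end

def branch_and_bound(gates, warm_order):
    """Exhaustive search over gate orders, pruned by the warm-start makespan."""
    best = {"order": list(warm_order), "cost": makespan(warm_order)}

    def extend(prefix, remaining):
        cost = makespan(prefix)
        if cost >= best["cost"]:
            return  # appending gates never shortens a schedule, so prune
        if not remaining:
            best["order"], best["cost"] = prefix, cost
            return
        for i, gate in enumerate(remaining):
            extend(prefix + [gate], remaining[:i] + remaining[i + 1:])

    extend([], list(gates))
    return best["order"], best["cost"]

# Commuting ZZ gates on the edges of a 4-qubit ring (hypothetical instance).
gates = [(0, 1), (1, 2), (2, 3), (3, 0)]
warm_cost = makespan(gates)  # greedy schedule in the listed order
best_order, best_cost = branch_and_bound(gates, warm_order=gates)
print(warm_cost, best_cost)  # 4 2
```

The warm start contributes both a feasible fallback and the bound that prunes every branch at least as long as it; on realistic instances that bound is what keeps the exact search tractable.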

Compiling quantum circuits to realistic hardware architectures using temporal planners

Dec 21, 2017

Davide Venturelli, Minh Do, Eleanor Rieffel, Jeremy Frank

Abstract:To run quantum algorithms on emerging gate-model quantum hardware, quantum circuits must be compiled to take into account constraints on the hardware. For near-term hardware, with only limited means to mitigate decoherence, it is critical to minimize the duration of the circuit. We investigate the application of temporal planners to the problem of compiling quantum circuits to newly emerging quantum hardware. While our approach is general, we focus on compiling to superconducting hardware architectures with nearest-neighbor constraints. Our initial experiments focus on compiling Quantum Alternating Operator Ansatz (QAOA) circuits, whose high number of commuting gates allows great flexibility in the order in which the gates can be applied. This freedom makes optimal compilations harder to find, but it also means that more optimized compilation offers a greater potential gain than it does for less flexible circuits. We map this quantum circuit compilation problem to a temporal planning problem and generate a test suite of compilation problems for QAOA circuits of various sizes on a realistic hardware architecture. We report compilation results from several state-of-the-art temporal planners on this test set. This early empirical evaluation demonstrates that temporal planning is a viable approach to quantum circuit compilation.

* 2017 Quantum Sci. Technol. - also related to proceedings of IJCAI 2017, and ICAPS SPARK Workshop 2017
* updated manuscript, more planners and results
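The nearest-neighbor constraint can be sketched as follows: a two-qubit gate is executable only if its logical qubits sit on coupled physical qubits, and a SWAP (itself a nearest-neighbor gate) moves logical qubits around. The linear 4-qubit coupling map, the placement, and the helper names are hypothetical; a compiler, or a temporal planner as in the paper, must decide which SWAPs and gates to apply and when.

```python
# Physical qubits on a line 0-1-2-3; only neighbors can interact directly.
coupling = {frozenset(p) for p in [(0, 1), (1, 2), (2, 3)]}

def executable(gate, placement):
    """True if the gate's two logical qubits sit on coupled physical qubits."""
    a, b = gate
    return frozenset((placement[a], placement[b])) in coupling

def apply_swap(placement, p, q):
    """SWAP the logical qubits held by physical qubits p and q; returns a new placement."""
    if frozenset((p, q)) not in coupling:
        raise ValueError("SWAP is also a two-qubit gate and needs coupled qubits")
    holder = {phys: log for log, phys in placement.items()}  # physical -> logical
    new = dict(placement)
    if p in holder:
        new[holder[p]] = q
    if q in holder:
        new[holder[q]] = p
    return new

placement = {"a": 0, "b": 1, "c": 2, "d": 3}  # logical -> physical
gate = ("a", "c")                             # one commuting ZZ gate from a QAOA layer
print(executable(gate, placement))            # False: physical 0 and 2 are not coupled
placement = apply_swap(placement, 1, 2)       # move "c" next to "a"
print(executable(gate, placement))            # True
```

Each inserted SWAP lengthens the circuit, which is why the choice and timing of SWAPs, together with the free ordering of commuting gates, drives the makespan the planners minimize.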
