Huong Ha

REMEMBER: Retrieval-based Explainable Multimodal Evidence-guided Modeling for Brain Evaluation and Reasoning in Zero- and Few-shot Neurodegenerative Diagnosis

Apr 12, 2025

Duy-Cat Can, Quang-Huy Tang, Huong Ha, Binh T. Nguyen, Oliver Y. Chén

Abstract: Timely and accurate diagnosis of neurodegenerative disorders, such as Alzheimer's disease, is central to disease management. Existing deep learning models require large-scale annotated datasets and often function as "black boxes". Additionally, datasets in clinical practice are frequently small or unlabeled, restricting the full potential of deep learning methods. Here, we introduce REMEMBER -- Retrieval-based Explainable Multimodal Evidence-guided Modeling for Brain Evaluation and Reasoning -- a new machine learning framework that facilitates zero- and few-shot Alzheimer's diagnosis using brain MRI scans through a reference-based reasoning process. Specifically, REMEMBER first trains a contrastively aligned vision-text model using expert-annotated reference data and extends pseudo-text modalities that encode abnormality types, diagnosis labels, and composite clinical descriptions. Then, at inference time, REMEMBER retrieves similar, human-validated cases from a curated dataset and integrates their contextual information through a dedicated evidence encoding module and attention-based inference head. Such an evidence-guided design enables REMEMBER to imitate the real-world clinical decision-making process by grounding predictions in retrieved imaging and textual context. REMEMBER outputs diagnostic predictions alongside an interpretable report, including reference images and explanations aligned with clinical workflows. Experimental results demonstrate that REMEMBER achieves robust zero- and few-shot performance and offers a powerful and explainable framework for neuroimaging-based diagnosis in the real world, especially under limited data.
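
The retrieve-then-aggregate idea described in the abstract can be illustrated with a minimal sketch. This is not the REMEMBER architecture: the embeddings, the labels ("AD", "CN"), the function names, and the softmax-weighted vote standing in for an attention-based head are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_and_vote(query, references, k=2, temperature=0.1):
    """Return (label scores, retrieved cases) for a query embedding.

    references: list of (embedding, label) pairs standing in for a
    curated, human-validated reference set (hypothetical toy data).
    """
    # Rank reference cases by similarity to the query and keep the top k.
    scored = sorted(
        ((cosine(query, emb), label) for emb, label in references),
        reverse=True,
    )[:k]
    # A softmax over similarities gives attention-like weights.
    exps = [math.exp(sim / temperature) for sim, _ in scored]
    total = sum(exps)
    scores = {}
    for weight, (_, label) in zip(exps, scored):
        scores[label] = scores.get(label, 0.0) + weight / total
    return scores, scored

# Toy 2-D embeddings; real embeddings would come from a vision-text model.
references = [([1.0, 0.0], "AD"), ([0.9, 0.1], "AD"), ([0.0, 1.0], "CN")]
scores, retrieved = retrieve_and_vote([0.95, 0.05], references, k=3)
```

The returned `retrieved` list is what makes the prediction inspectable: each score can be traced back to the reference cases that produced it.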


VisTA: Vision-Text Alignment Model with Contrastive Learning using Multimodal Data for Evidence-Driven, Reliable, and Explainable Alzheimer's Disease Diagnosis

Feb 03, 2025

Duy-Cat Can, Linh D. Dang, Quang-Huy Tang, Dang Minh Ly, Huong Ha, Guillaume Blanc, Oliver Y. Chén, Binh T. Nguyen

Abstract: Objective: Assessing Alzheimer's disease (AD) using high-dimensional radiology images is clinically important but challenging. Although Artificial Intelligence (AI) has advanced AD diagnosis, it remains unclear how to design AI models embracing predictability and explainability. Here, we propose VisTA, a multimodal language-vision model assisted by contrastive learning, to optimize disease prediction and evidence-based, interpretable explanations for clinical decision-making. Methods: We developed VisTA (Vision-Text Alignment Model) for AD diagnosis. Architecturally, we built VisTA from BiomedCLIP and fine-tuned it using contrastive learning to align images with verified abnormalities and their descriptions. To train VisTA, we used a constructed reference dataset containing images, abnormality types, and descriptions verified by medical experts. VisTA produces four outputs: predicted abnormality type, similarity to reference cases, evidence-driven explanation, and final AD diagnoses. To illustrate VisTA's efficacy, we reported accuracy metrics for abnormality retrieval and dementia prediction. To demonstrate VisTA's explainability, we compared its explanations with human experts' explanations. Results: Compared to the 15 million images used for baseline pretraining, VisTA used only 170 samples for fine-tuning and obtained significant improvements in abnormality retrieval and dementia prediction. For abnormality retrieval, VisTA reached 74% accuracy and an AUC of 0.87 (up from 26% and 0.74, respectively, for baseline models). For dementia prediction, VisTA achieved 88% accuracy and an AUC of 0.82 (up from 30% and 0.57, respectively, for baseline models). The generated explanations agreed strongly with human experts' and provided insights into the diagnostic process. Taken together, VisTA optimizes prediction, clinical reasoning, and explanation.
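
As a rough illustration of contrastive image-text alignment, the sketch below computes a symmetric, CLIP-style cross-entropy loss over a small batch of paired embeddings. The abstract does not specify VisTA's exact objective, so the loss form, the temperature value, and all function names here are assumptions.

```python
import math

def _normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def _cross_entropy_diag(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    loss = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        loss += log_sum - row[i]
    return loss / len(logits)

def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; pair i is (image_embs[i], text_embs[i])."""
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    # Cosine similarity of every image with every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
        for i in imgs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))

# Matched pairs give a low loss; swapped pairs give a high one.
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing this loss pulls each image embedding toward its own description and away from the other descriptions in the batch.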


BOIDS: High-dimensional Bayesian Optimization via Incumbent-guided Direction Lines and Subspace Embeddings

Dec 17, 2024

Lam Ngo, Huong Ha, Jeffrey Chan, Hongyu Zhang


Abstract: When it comes to expensive black-box optimization problems, Bayesian Optimization (BO) is a well-known and powerful solution. Many real-world applications involve a large number of dimensions, hence scaling BO to high dimensions is of much interest. However, state-of-the-art high-dimensional BO methods still suffer from the curse of dimensionality, highlighting the need for further improvements. In this work, we introduce BOIDS, a novel high-dimensional BO algorithm that guides optimization by a sequence of one-dimensional direction lines using a novel tailored line-based optimization procedure. To improve efficiency, we also propose an adaptive selection technique to identify the optimal lines for each round of line-based optimization. Additionally, we incorporate a subspace embedding technique for better scaling to high-dimensional spaces. We further provide a theoretical analysis of the convergence properties of our proposed method. Our extensive experimental results show that BOIDS outperforms state-of-the-art baselines on various synthetic and real-world benchmark problems.

* Published at AAAI Conference on Artificial Intelligence, 2025


High-dimensional Bayesian Optimization via Covariance Matrix Adaptation Strategy

Feb 05, 2024

Lam Ngo, Huong Ha, Jeffrey Chan, Vu Nguyen, Hongyu Zhang

Abstract: Bayesian Optimization (BO) is an effective method for finding the global optimum of expensive black-box functions. However, it is well known that applying BO to high-dimensional optimization problems is challenging. To address this issue, a promising solution is to use a local search strategy that partitions the search domain into local regions with high likelihood of containing the global optimum, and then use BO to optimize the objective function within these regions. In this paper, we propose a novel technique for defining the local regions using the Covariance Matrix Adaptation (CMA) strategy. Specifically, we use CMA to learn a search distribution that can estimate the probabilities of data points being the global optimum of the objective function. Based on this search distribution, we then define the local regions consisting of data points with high probabilities of being the global optimum. Our approach serves as a meta-algorithm as it can incorporate existing black-box BO optimizers, such as BO, TuRBO, and BAxUS, to find the global optimum of the objective function within our derived local regions. We evaluate our proposed method on various synthetic and real-world benchmark problems. The results demonstrate that our method outperforms existing state-of-the-art techniques.

* Transactions on Machine Learning Research 2024
* 31 pages, 17 figures


Provably Efficient Bayesian Optimization with Unbiased Gaussian Process Hyperparameter Estimation

Jun 12, 2023

Huong Ha, Vu Nguyen, Hongyu Zhang, Anton van den Hengel

Abstract: Gaussian process (GP) based Bayesian optimization (BO) is a powerful method for optimizing black-box functions efficiently. The practical performance and theoretical guarantees associated with this approach depend on having the correct GP hyperparameter values, which are usually unknown in advance and need to be estimated from the observed data. However, in practice, these estimations could be incorrect due to biased data sampling strategies commonly used in BO. This can lead to degraded performance and break the sub-linear global convergence guarantee of BO. To address this issue, we propose a new BO method that can sub-linearly converge to the global optimum of the objective function even when the true GP hyperparameters are unknown in advance and need to be estimated from the observed data. Our method uses a multi-armed bandit technique (EXP3) to add random data points to the BO process, and employs a novel training loss function for the GP hyperparameter estimation process that ensures unbiased estimation from the observed data. We further provide theoretical analysis of our proposed method. Finally, we demonstrate empirically that our method outperforms existing approaches on various synthetic and real-world problems.

* 23 pages, 5 figures
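
For readers unfamiliar with EXP3, the bandit technique named in the abstract, here is a minimal sketch of the textbook algorithm on a toy two-armed Bernoulli problem. How arms and rewards map onto the BO process is defined in the paper and is not reproduced here; the reward function, arm means, and parameter values below are hypothetical.

```python
import math
import random

def exp3(reward_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Run EXP3 and return how often each arm was pulled.

    reward_fn(arm, rng) must return a reward in [0, 1].
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        total = sum(weights)
        # Mix the weight-based distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm, rng)
        # Importance-weighted estimate keeps the reward estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale to avoid overflow; the probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
        pulls[arm] += 1
    return pulls

MEANS = [0.2, 0.8]  # hypothetical success probabilities per arm

def bernoulli_reward(arm, rng):
    return 1.0 if rng.random() < MEANS[arm] else 0.0

pulls = exp3(bernoulli_reward, n_arms=2, n_rounds=2000)
```

The `gamma / n_arms` term guarantees every arm keeps a minimum selection probability, which is what lets EXP3 keep injecting exploratory choices.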


Uncertainty-Aware Performance Prediction for Highly Configurable Software Systems via Bayesian Neural Networks

Dec 27, 2022

Huong Ha, Zongwen Fan, Hongyu Zhang


Abstract: Configurable software systems are employed in many important application domains. Understanding the performance of the systems under all configurations is critical to prevent potential performance issues caused by misconfiguration. However, as the number of configurations can be prohibitively large, it is not possible to measure the system performance under all configurations. Thus, a common approach is to build a prediction model from limited measurement data to predict the performance of all configurations as scalar values. However, it has been pointed out that there are different sources of uncertainty coming from the data collection or the modeling process, which means the scalar predictions are not necessarily accurate. To address this problem, we propose a Bayesian deep learning based method, namely BDLPerf, that can incorporate uncertainty into the prediction model. BDLPerf can provide both scalar predictions for configurations' performance and the corresponding confidence intervals of these scalar predictions. We also develop a novel uncertainty calibration technique to ensure the reliability of the confidence intervals generated by a Bayesian prediction model. Finally, we suggest an efficient hyperparameter tuning technique so as to train the prediction model within a reasonable amount of time whilst achieving high accuracy. Our experimental results on 10 real-world systems show that BDLPerf achieves higher accuracy than existing approaches, in both scalar performance prediction and confidence interval estimation.
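
To make "reliability of the confidence intervals" concrete, the sketch below measures empirical coverage (the fraction of true values that fall inside their predicted intervals) and shows how widening intervals changes it. This is a generic check, not BDLPerf's calibration technique; the function names and all numbers are made up for illustration.

```python
def interval_coverage(y_true, lower, upper):
    """Fraction of true values that fall inside [lower, upper]."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)

def scale_intervals(mean, lower, upper, factor):
    """Widen (factor > 1) or narrow (factor < 1) intervals around the mean."""
    new_lower = [m - factor * (m - lo) for m, lo in zip(mean, lower)]
    new_upper = [m + factor * (hi - m) for m, hi in zip(mean, upper)]
    return new_lower, new_upper

# Hypothetical measured performance and predicted mean +/- 1.0 intervals.
y_true = [10.0, 12.5, 9.0, 15.5]
mean = [10.5, 11.0, 9.2, 13.0]
lower = [m - 1.0 for m in mean]
upper = [m + 1.0 for m in mean]

coverage_before = interval_coverage(y_true, lower, upper)
wide_lower, wide_upper = scale_intervals(mean, lower, upper, factor=2.0)
coverage_after = interval_coverage(y_true, wide_lower, wide_upper)
```

A nominal 95% interval is well calibrated when its empirical coverage on held-out configurations is close to 0.95; coverage far below that signals overconfident intervals.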


An Efficient Framework for Monitoring Subgroup Performance of Machine Learning Systems

Dec 16, 2022

Huong Ha

Abstract: Monitoring machine learning systems post deployment is critical to ensure the reliability of the systems. Of particular importance is the problem of monitoring the performance of machine learning systems across all the data subgroups (subpopulations). In practice, this process could be prohibitively expensive as the number of data subgroups grows exponentially with the number of input features, and the process of labelling data to evaluate each subgroup's performance is costly. In this paper, we propose an efficient framework for monitoring subgroup performance of machine learning systems. Specifically, we aim to find the data subgroup with the worst performance using a limited amount of labeled data. We mathematically formulate this problem as an optimization problem with an expensive black-box objective function, and then propose using Bayesian optimization to solve this problem. Our experimental results on various real-world datasets and machine learning systems show that our proposed framework can retrieve the worst-performing data subgroup effectively and efficiently.

* Accepted to the ML Safety Workshop at NeurIPS 2022
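
The exponential growth in subgroups can be seen in a brute-force baseline, sketched below. It assumes a subgroup is a conjunction of "feature = value" conditions with wildcards, which the abstract does not spell out, and it assumes every record is already labeled, which is exactly the cost the paper's Bayesian optimization approach aims to avoid. Feature names and records are hypothetical.

```python
from itertools import product

def enumerate_subgroups(feature_values):
    """Yield subgroups as dicts; None for a feature means 'any value'."""
    names = list(feature_values)
    options = [[None] + list(feature_values[n]) for n in names]
    for combo in product(*options):
        yield dict(zip(names, combo))

def worst_subgroup(records, feature_values, min_size=1):
    """Exhaustively find the subgroup with the lowest accuracy."""
    worst, worst_acc = None, 2.0  # 2.0 is above any valid accuracy
    for sg in enumerate_subgroups(feature_values):
        members = [
            r for r in records
            if all(v is None or r[k] == v for k, v in sg.items())
        ]
        if len(members) < min_size:
            continue
        acc = sum(r["correct"] for r in members) / len(members)
        if acc < worst_acc:
            worst, worst_acc = sg, acc
    return worst, worst_acc

feature_values = {"age": ["young", "old"], "device": ["phone", "desktop"]}
records = [
    {"age": "young", "device": "phone", "correct": 1},
    {"age": "young", "device": "desktop", "correct": 1},
    {"age": "old", "device": "phone", "correct": 0},
    {"age": "old", "device": "desktop", "correct": 1},
    {"age": "old", "device": "phone", "correct": 0},
]
n_subgroups = len(list(enumerate_subgroups(feature_values)))
worst, worst_acc = worst_subgroup(records, feature_values)
```

With d features of c values each, this enumeration visits (c + 1)^d subgroups, so two binary features already give 9 and twenty give about 3.5 billion.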


ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms

Apr 11, 2021

Huong Ha, Sunil Gupta, Santu Rana, Svetha Venkatesh


Abstract: Machine learning models are being used extensively in many important areas, but there is no guarantee a model will always perform well or as its developers intended. Understanding the correctness of a model is crucial to prevent potential failures that may have significant detrimental impact in critical application areas. In this paper, we propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data. The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN). We develop a novel data augmentation method that helps train the BNN to achieve high accuracy. We also devise an information-theoretic sampling strategy to sample data points so as to achieve accurate estimations for the metrics of interest. Finally, we conduct an extensive set of experiments to test various machine learning models for different types of metrics. Our experiments show that the metric estimates produced by our method are significantly better than those of existing baselines.

* Accepted to the RobustML workshop at ICLR 2021


Think Global and Act Local: Bayesian Optimisation over High-Dimensional Categorical and Mixed Search Spaces

Feb 14, 2021

Xingchen Wan, Vu Nguyen, Huong Ha, Binxin Ru, Cong Lu, Michael A. Osborne


Abstract: High-dimensional black-box optimisation remains an important yet notoriously challenging problem. Despite the success of Bayesian optimisation methods on continuous domains, domains that are categorical, or that mix continuous and categorical variables, remain challenging. We propose a novel solution -- we combine local optimisation with a tailored kernel design, effectively handling high-dimensional categorical and mixed search spaces, whilst retaining sample efficiency. We further derive a convergence guarantee for the proposed approach. Finally, we demonstrate empirically that our method outperforms the current baselines on a variety of synthetic and real-world tasks in terms of performance, computational costs, or both.

* 9 pages, 6 figures (26 pages, 13 figures, 2 tables including references and appendices)


High Dimensional Level Set Estimation with Bayesian Neural Network

Dec 17, 2020

Huong Ha, Sunil Gupta, Santu Rana, Svetha Venkatesh


Abstract: Level Set Estimation (LSE) is an important problem with applications in various fields such as material design, biotechnology, machine operational testing, etc. Existing techniques suffer from scalability issues: they do not work well with high-dimensional inputs. This paper proposes novel methods to solve high-dimensional LSE problems using Bayesian Neural Networks. In particular, we consider two types of LSE problems: (1) the explicit LSE problem, where the threshold level is a fixed user-specified value, and (2) the implicit LSE problem, where the threshold level is defined as a percentage of the (unknown) maximum of the objective function. For each problem, we derive the corresponding information-theoretic acquisition function to sample the data points so as to maximally increase the level set accuracy. Furthermore, we also analyse the theoretical time complexity of our proposed acquisition functions, and suggest a practical methodology to efficiently tune the network hyper-parameters to achieve high model accuracy. Numerical experiments on both synthetic and real-world datasets show that our proposed method can achieve better results compared to existing state-of-the-art approaches.

* Accepted at AAAI'2021
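
The difference between the explicit and implicit LSE problems comes down to how the threshold is defined, which the sketch below shows on a toy function evaluated on a grid. In a real LSE setting the function is expensive and its maximum unknown, so this exhaustive evaluation is for illustration only; the choice of super-level sets (values at or above the threshold), the toy function, and the function names are assumptions.

```python
def explicit_level_set(points, values, threshold):
    """Points whose value is at or above a fixed, user-specified threshold."""
    return [p for p, v in zip(points, values) if v >= threshold]

def implicit_level_set(points, values, fraction):
    """Points whose value is at or above a fraction of the maximum value."""
    threshold = fraction * max(values)
    return [p for p, v in zip(points, values) if v >= threshold]

# Toy 1-D objective f(x) = 4x(1 - x) on a grid; its maximum is 1.0 at x = 0.5.
points = [i / 10 for i in range(11)]
values = [4 * x * (1 - x) for x in points]

explicit_set = explicit_level_set(points, values, threshold=0.9)
implicit_set = implicit_level_set(points, values, fraction=0.5)
```

In the implicit case the threshold itself must be estimated, because it depends on a maximum that is not known before sampling.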
