
Kento Kodama

Black-box optimization and quantum annealing for filtering out mislabeled training instances

Jan 12, 2025

Makoto Otsuka, Kento Kodama, Keisuke Morita, Masayuki Ohzeki

[Figures 1-4 of the paper are not reproduced here.]

Abstract: This study proposes an approach for removing mislabeled instances from contaminated training datasets by combining surrogate-model-based black-box optimization (BBO) with postprocessing and quantum annealing. Mislabeled training instances, a common issue in real-world datasets, often degrade model generalization, so robust and efficient noise-removal strategies are needed. The proposed method evaluates filtered training subsets by their validation loss, iteratively refines loss estimates through surrogate-model-based BBO with postprocessing, and uses quantum annealing to efficiently sample diverse training subsets with low validation error. Experiments on a noisy majority bit task demonstrate the method's ability to prioritize the removal of high-risk mislabeled instances. Integrating D-Wave's clique sampler running on a physical quantum annealer achieves faster optimization and higher-quality training subsets than OpenJij's simulated quantum annealing sampler or Neal's simulated annealing sampler, offering a scalable framework for enhancing dataset quality. This work highlights the effectiveness of the proposed method for supervised learning tasks, with future directions including its application to unsupervised learning, real-world datasets, and large-scale implementations.
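The loop the abstract describes (score a filtered training subset by validation loss, fit a surrogate to the scores seen so far, sample a new subset by annealing the surrogate, repeat) can be sketched roughly as follows. This is a minimal, stdlib-only illustration under several assumptions that do not come from the paper: the majority-bit data generator, a logistic-regression classifier, a quadratic (QUBO) surrogate fitted by plain SGD, and single-flip simulated annealing standing in for the D-Wave, OpenJij, or Neal samplers. The paper's postprocessing step is omitted, and every function name (`make_data`, `val_loss`, `fit_qubo`, `anneal`, `filter_mislabeled`) is hypothetical.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))

def make_data(n_train, n_val, d, flip_rate, rng):
    """Hypothetical noisy majority-bit task: label is 1 iff most of the d bits are 1."""
    xs = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_train + n_val)]
    data = [(x, int(sum(x) > d // 2)) for x in xs]
    flipped = {i for i in range(n_train) if rng.random() < flip_rate}
    train = [(x, 1 - y if i in flipped else y) for i, (x, y) in enumerate(data[:n_train])]
    return train, data[n_train:], flipped  # noise only in train; validation stays clean

def train_logreg(data, d, epochs=100, lr=0.5):
    """Plain SGD logistic regression standing in for the model being trained."""
    w = [0.0] * (d + 1)  # last entry is the bias
    for _ in range(epochs):
        for x, y in data:
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[d]) - y
            for j in range(d):
                w[j] -= lr * g * x[j]
            w[d] -= lr * g
    return w

def val_loss(mask, train, val, d):
    """Black-box objective: train on the instances kept by mask, return validation log loss."""
    w = train_logreg([train[i] for i in range(len(train)) if mask[i]], d)
    total = 0.0
    for x, y in val:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[d])
        total -= math.log(p) if y else math.log(1.0 - p)
    return total / len(val)

def qubo_energy(h, J, c, m):
    """E(m) = c + sum_i h_i m_i + sum_{i<j} J_ij m_i m_j for a binary mask m."""
    e = c
    for i in range(len(m)):
        if m[i]:
            e += h[i] + sum(J[i][j] for j in range(i + 1, len(m)) if m[j])
    return e

def fit_qubo(masks, losses, n, epochs=50, lr=0.01, l2=1e-3):
    """Quadratic surrogate fitted by SGD on squared error; L2 shrinkage on active terms only."""
    h, J, c = [0.0] * n, [[0.0] * n for _ in range(n)], sum(losses) / len(losses)
    for _ in range(epochs):
        for m, y in zip(masks, losses):
            err = qubo_energy(h, J, c, m) - y
            c -= lr * err
            for i in range(n):
                if m[i]:  # only coefficients of active terms receive a gradient
                    h[i] -= lr * (err + l2 * h[i])
                    for j in range(i + 1, n):
                        if m[j]:
                            J[i][j] -= lr * (err + l2 * J[i][j])
    return h, J, c

def anneal(h, J, c, n, rng, sweeps=100, t0=1.0, t1=0.01):
    """Single-flip simulated annealing on the surrogate (in place of a quantum annealer)."""
    m = [rng.randint(0, 1) for _ in range(n)]
    e = qubo_energy(h, J, c, m)
    best, best_e = m[:], e
    for s in range(sweeps):
        t = t0 * (t1 / t0) ** (s / (sweeps - 1))  # geometric cooling schedule
        for i in range(n):
            m[i] ^= 1
            e_new = qubo_energy(h, J, c, m)
            if e_new <= e or rng.random() < math.exp((e - e_new) / t):
                e = e_new
                if e < best_e:
                    best, best_e = m[:], e
            else:
                m[i] ^= 1  # reject: undo the flip
    return best

def filter_mislabeled(train, val, d, n_init=8, n_iter=15, seed=0):
    """Surrogate-based BBO loop: fit surrogate, sample a subset by annealing, evaluate it."""
    rng = random.Random(seed)
    n = len(train)
    masks = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_init)]
    losses = [val_loss(m, train, val, d) for m in masks]
    for _ in range(n_iter):
        h, J, c = fit_qubo(masks, losses, n)  # refit on all evaluations so far
        cand = anneal(h, J, c, n, rng)
        if cand in masks:  # already evaluated: explore a random subset instead
            cand = [rng.randint(0, 1) for _ in range(n)]
        masks.append(cand)
        losses.append(val_loss(cand, train, val, d))
    k = min(range(len(masks)), key=losses.__getitem__)
    return masks[k], losses[k], losses[0]  # best mask, its loss, loss with no filtering

rng = random.Random(42)
train, val, flipped = make_data(10, 40, 5, 0.2, rng)
mask, best, full = filter_mislabeled(train, val, d=5)
print("removed:", [i for i, keep in enumerate(mask) if not keep], "flipped:", sorted(flipped))
print(f"validation loss: all instances {full:.3f}, filtered {best:.3f}")
```

Because the surrogate is a QUBO over the binary keep/drop mask, the `anneal` step is the only piece a hardware or simulated quantum annealing sampler would need to replace. The all-ones mask (no filtering) is evaluated first, so the returned subset is never worse on validation loss than training on every instance.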
