Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Topic:Improved Decision Tree

An Extended Horizon Tactical Decision-Making for Automated Driving Based on Monte Carlo Tree Search

Apr 22, 2025

Karim Essalmi, Fernando Garrido, Fawzi Nashashibi

Abstract:This paper introduces COR-MCTS (Conservation of Resources - Monte Carlo Tree Search), a novel tactical decision-making approach for automated driving focusing on maneuver planning over extended horizons. Traditional decision-making algorithms are often constrained by fixed planning horizons, typically up to 6 seconds for classical approaches and 3 seconds for learning-based methods limiting their adaptability in particular dynamic driving scenarios. However, planning must be done well in advance in environments such as highways, roundabouts, and exits to ensure safe and efficient maneuvers. To address this challenge, we propose a hybrid method integrating Monte Carlo Tree Search (MCTS) with our prior utility-based framework, COR-MP (Conservation of Resources Model for Maneuver Planning). This combination enables long-term, real-time decision-making, significantly enhancing the ability to plan a sequence of maneuvers over extended horizons. Through simulations across diverse driving scenarios, we demonstrate that COR-MCTS effectively improves planning robustness and decision efficiency over extended horizons.

* 6 pages, 5 figures, submitted and accepted to the IEEE Intelligent Vehicles Symposium Conference (IV 2025)

Via

Access Paper or Ask Questions

Dynamic Regularized CBDT: Variance-Calibrated Causal Boosting for Interpretable Heterogeneous Treatment Effects

Apr 18, 2025

Yichen Liu

Abstract:Heterogeneous treatment effect estimation in high-stakes applications demands models that simultaneously optimize precision, interpretability, and calibration. Many existing tree-based causal inference techniques, however, exhibit high estimation errors when applied to observational data because they struggle to capture complex interactions among factors and rely on static regularization schemes. In this work, we propose Dynamic Regularized Causal Boosted Decision Trees (CBDT), a novel framework that integrates variance regularization and average treatment effect calibration into the loss function of gradient boosted decision trees. Our approach dynamically updates the regularization parameters using gradient statistics to better balance the bias-variance tradeoff. Extensive experiments on standard benchmark datasets and real-world clinical data demonstrate that the proposed method significantly improves estimation accuracy while maintaining reliable coverage of true treatment effects. In an intensive care unit patient triage study, the method successfully identified clinically actionable rules and achieved high accuracy in treatment effect estimation. The results validate that dynamic regularization can effectively tighten error bounds and enhance both predictive performance and model interpretability.

* Preprint version. 13 pages, 4 figures, 3 tables

Via

Access Paper or Ask Questions

Reasoning-Based AI for Startup Evaluation (R.A.I.S.E.): A Memory-Augmented, Multi-Step Decision Framework

Apr 16, 2025

Jack Preuveneers, Joseph Ternasky, Fuat Alican, Yigit Ihlamur

Abstract:We present a novel framework that bridges the gap between the interpretability of decision trees and the advanced reasoning capabilities of large language models (LLMs) to predict startup success. Our approach leverages chain-of-thought prompting to generate detailed reasoning logs, which are subsequently distilled into structured, human-understandable logical rules. The pipeline integrates multiple enhancements - efficient data ingestion, a two-step refinement process, ensemble candidate sampling, simulated reinforcement learning scoring, and persistent memory - to ensure both stable decision-making and transparent output. Experimental evaluations on curated startup datasets demonstrate that our combined pipeline improves precision by 54% from 0.225 to 0.346 and accuracy by 50% from 0.46 to 0.70 compared to a standalone OpenAI o3 model. Notably, our model achieves over 2x the precision of a random classifier (16%). By combining state-of-the-art AI reasoning with explicit rule-based explanations, our method not only augments traditional decision-making processes but also facilitates expert intervention and continuous policy refinement. This work lays the foundation for the implementation of interpretable LLM-powered decision frameworks in high-stakes investment environments and other domains that require transparent and data-driven insights.

Via

Access Paper or Ask Questions

A Fully Automated Pipeline for Conversational Discourse Annotation: Tree Scheme Generation and Labeling with Large Language Models

Apr 11, 2025

Kseniia Petukhova, Ekaterina Kochmar

Abstract:Recent advances in Large Language Models (LLMs) have shown promise in automating discourse annotation for conversations. While manually designing tree annotation schemes significantly improves annotation quality for humans and models, their creation remains time-consuming and requires expert knowledge. We propose a fully automated pipeline that uses LLMs to construct such schemes and perform annotation. We evaluate our approach on speech functions (SFs) and the Switchboard-DAMSL (SWBD-DAMSL) taxonomies. Our experiments compare various design choices, and we show that frequency-guided decision trees, paired with an advanced LLM for annotation, can outperform previously manually designed trees and even match or surpass human annotators while significantly reducing the time required for annotation. We release all code and resultant schemes and annotations to facilitate future research on discourse annotation.

Via

Access Paper or Ask Questions

RO-FIGS: Efficient and Expressive Tree-Based Ensembles for Tabular Data

Apr 09, 2025

Urška Matjašec, Nikola Simidjievski, Mateja Jamnik

Abstract:Tree-based models are often robust to uninformative features and can accurately capture non-smooth, complex decision boundaries. Consequently, they often outperform neural network-based models on tabular datasets at a significantly lower computational cost. Nevertheless, the capability of traditional tree-based ensembles to express complex relationships efficiently is limited by using a single feature to make splits. To improve the efficiency and expressiveness of tree-based methods, we propose Random Oblique Fast Interpretable Greedy-Tree Sums (RO-FIGS). RO-FIGS builds on Fast Interpretable Greedy-Tree Sums, and extends it by learning trees with oblique or multivariate splits, where each split consists of a linear combination learnt from random subsets of features. This helps uncover interactions between features and improves performance. The proposed method is suitable for tabular datasets with both numerical and categorical features. We evaluate RO-FIGS on 22 real-world tabular datasets, demonstrating superior performance and much smaller models over other tree- and neural network-based methods. Additionally, we analyse their splits to reveal valuable insights into feature interactions, enriching the information learnt from SHAP summary plots, and thereby demonstrating the enhanced interpretability of RO-FIGS models. The proposed method is well-suited for applications, where balance between accuracy and interpretability is essential.

Via

Access Paper or Ask Questions

A Customized SAT-based Solver for Graph Coloring

Apr 07, 2025

Timo Brand, Daniel Faber, Stephan Held, Petra Mutzel

Abstract:We introduce ZykovColor, a novel SAT-based algorithm to solve the graph coloring problem working on top of an encoding that mimics the Zykov tree. Our method is based on an approach of H\'ebrard and Katsirelos (2020) that employs a propagator to enforce transitivity constraints, incorporate lower bounds for search tree pruning, and enable inferred propagations. We leverage the recently introduced IPASIR-UP interface for CaDiCal to implement these techniques with a SAT solver. Furthermore, we propose new features that take advantage of the underlying SAT solver. These include modifying the integrated decision strategy with vertex domination hints and using incremental bottom-up search that allows to reuse learned clauses from previous calls. Additionally, we integrate a more efficient clique computation to improve the lower bounds during the search. We validate the effectiveness of each new feature through an experimental analysis. ZykovColor outperforms other state-of-the-art graph coloring implementations on the DIMACS benchmark set. Further experiments on random Erd\H{o}s-R\'enyi graphs show that our new approach dominates state-of-the-art SAT-based methods for both very sparse and highly dense graphs.

* 5 figures, 2 tables, source code published at https://github.com/trewes/ZykovColor

Via

Access Paper or Ask Questions

Predicting Survivability of Cancer Patients with Metastatic Patterns Using Explainable AI

Apr 07, 2025

Polycarp Nalela, Deepthi Rao, Praveen Rao

Abstract:Cancer remains a leading global health challenge and a major cause of mortality. This study leverages machine learning (ML) to predict the survivability of cancer patients with metastatic patterns using the comprehensive MSK-MET dataset, which includes genomic and clinical data from 25,775 patients across 27 cancer types. We evaluated five ML models-XGBoost, Na\"ive Bayes, Decision Tree, Logistic Regression, and Random Fores using hyperparameter tuning and grid search. XGBoost emerged as the best performer with an area under the curve (AUC) of 0.82. To enhance model interpretability, SHapley Additive exPlanations (SHAP) were applied, revealing key predictors such as metastatic site count, tumor mutation burden, fraction of genome altered, and organ-specific metastases. Further survival analysis using Kaplan-Meier curves, Cox Proportional Hazards models, and XGBoost Survival Analysis identified significant predictors of patient outcomes, offering actionable insights for clinicians. These findings could aid in personalized prognosis and treatment planning, ultimately improving patient care.

Via

Access Paper or Ask Questions

Learning and Improving Backgammon Strategy

Apr 03, 2025

Gregory R. Galperin

Abstract:A novel approach to learning is presented, combining features of on-line and off-line methods to achieve considerable performance in the task of learning a backgammon value function in a process that exploits the processing power of parallel supercomputers. The off-line methods comprise a set of techniques for parallelizing neural network training and $TD(\lambda)$ reinforcement learning; here Monte-Carlo ``Rollouts'' are introduced as a massively parallel on-line policy improvement technique which applies resources to the decision points encountered during the search of the game tree to further augment the learned value function estimate. A level of play roughly as good as, or possibly better than, the current champion human and computer backgammon players has been achieved in a short period of learning.

* Proceedings of the CBCL Learning Day (1994)
* Accompanied by oral presentation by Gregory Galperin at the CBCL Learning Day 1994

Via

Access Paper or Ask Questions

Early detection of diabetes through transfer learning-based eye (vision) screening and improvement of machine learning model performance and advanced parameter setting algorithms

Apr 04, 2025

Mohammad Reza Yousefi, Ali Bakrani, Amin Dehghani

Abstract:Diabetic Retinopathy (DR) is a serious and common complication of diabetes, caused by prolonged high blood sugar levels that damage the small retinal blood vessels. If left untreated, DR can progress to retinal vein occlusion and stimulate abnormal blood vessel growth, significantly increasing the risk of blindness. Traditional diabetes diagnosis methods often utilize convolutional neural networks (CNNs) to extract visual features from retinal images, followed by classification algorithms such as decision trees and k-nearest neighbors (KNN) for disease detection. However, these approaches face several challenges, including low accuracy and sensitivity, lengthy machine learning (ML) model training due to high data complexity and volume, and the use of limited datasets for testing and evaluation. This study investigates the application of transfer learning (TL) to enhance ML model performance in DR detection. Key improvements include dimensionality reduction, optimized learning rate adjustments, and advanced parameter tuning algorithms, aimed at increasing efficiency and diagnostic accuracy. The proposed model achieved an overall accuracy of 84% on the testing dataset, outperforming prior studies. The highest class-specific accuracy reached 89%, with a maximum sensitivity of 97% and an F1-score of 92%, demonstrating strong performance in identifying DR cases. These findings suggest that TL-based DR screening is a promising approach for early diagnosis, enabling timely interventions to prevent vision loss and improve patient outcomes.

* 25 pages,12 Figures, 1 Table

Via

Access Paper or Ask Questions

Using ensemble methods of machine learning to predict real estate prices

Apr 05, 2025

Oleh Pastukh, Viktor Khomyshyn

Abstract:In recent years, machine learning (ML) techniques have become a powerful tool for improving the accuracy of predictions and decision-making. Machine learning technologies have begun to penetrate all areas, including the real estate sector. Correct forecasting of real estate value plays an important role in the buyer-seller chain, because it ensures reasonableness of price expectations based on the offers available in the market and helps to avoid financial risks for both parties of the transaction. Accurate forecasting is also important for real estate investors to make an informed decision on a specific property. This study helps to gain a deeper understanding of how effective and accurate ensemble machine learning methods are in predicting real estate values. The results obtained in the work are quite accurate, as can be seen from the coefficient of determination (R^2), root mean square error (RMSE) and mean absolute error (MAE) calculated for each model. The Gradient Boosting Regressor model provides the highest accuracy, the Extra Trees Regressor, Hist Gradient Boosting Regressor and Random Forest Regressor models give good results. In general, ensemble machine learning techniques can be effectively used to solve real estate valuation. This work forms ideas for future research, which consist in the preliminary processing of the data set by searching and extracting anomalous values, as well as the practical implementation of the obtained results.

* 10 pages, 4 figures

Via

Access Paper or Ask Questions

Topic:Improved Decision Tree

Papers and Code