Abstract:Large Language Models (LLMs) have shown exceptional performance in text processing. Notably, LLMs can synthesize information from large datasets and explain their decisions similarly to human reasoning through a chain of thought (CoT). An emerging application of LLMs is the handling and interpreting of numerical data, where fine-tuning enhances their performance over basic inference methods. This paper proposes a novel approach to training LLMs using knowledge transfer from a random forest (RF) ensemble, leveraging its efficiency and accuracy. By converting RF decision paths into natural language statements, we generate outputs for LLM fine-tuning, enhancing the model's ability to classify and explain its decisions. Our method includes verifying these rules through established classification metrics, ensuring their correctness. We also examine the impact of preprocessing techniques on the representation of numerical data and their influence on classification accuracy and rule correctness
Abstract:This paper focuses on the problem of estimating historical traffic volumes between sparsely-located traffic sensors, which transportation agencies need to accurately compute statewide performance measures. To this end, the paper examines applications of vehicle probe data, automatic traffic recorder counts, and neural network models to estimate hourly volumes in the Maryland highway network, and proposes a novel approach that combines neural networks with an existing profiling method. On average, the proposed approach yields 24% more accurate estimates than volume profiles, which are currently used by transportation agencies across the US to compute statewide performance measures. The paper also quantifies the value of using vehicle probe data in estimating hourly traffic volumes, which provides important managerial insights to transportation agencies interested in acquiring this type of data. For example, results show that volumes can be estimated with a mean absolute percent error of about 21% at locations where average number of observed probes is between 30 and 47 vehicles/hr, which provides a useful guideline for assessing the value of probe vehicle data from different vendors.
Abstract:Transportation agencies have an opportunity to leverage increasingly-available trajectory datasets to improve their analyses and decision-making processes. However, this data is typically purchased from vendors, which means agencies must understand its potential benefits beforehand in order to properly assess its value relative to the cost of acquisition. While the literature concerned with trajectory data is rich, it is naturally fragmented and focused on technical contributions in niche areas, which makes it difficult for government agencies to assess its value across different transportation domains. To overcome this issue, the current paper explores trajectory data from the perspective of a road transportation agency interested in acquiring trajectories to enhance its analyses. The paper provides a literature review illustrating applications of trajectory data in six areas of road transportation systems analysis: demand estimation, modeling human behavior, designing public transit, traffic performance measurement and prediction, environment and safety. In addition, it visually explores 20 million GPS traces in Maryland, illustrating existing and suggesting new applications of trajectory data.