Abstract:We present a foundation model (FM) for lossy scientific data compression, combining a variational autoencoder (VAE) with a hyper-prior structure and a super-resolution (SR) module. The VAE framework uses hyper-priors to model latent space dependencies, enhancing compression efficiency. The SR module refines low-resolution representations into high-resolution outputs, improving reconstruction quality. By alternating between 2D and 3D convolutions, the model efficiently captures spatiotemporal correlations in scientific data while maintaining low computational cost. Experimental results demonstrate that the FM generalizes well to unseen domains and varying data shapes, achieving up to 4 times higher compression ratios than state-of-the-art methods after domain-specific fine-tuning. The SR module improves compression ratio by 30 percent compared to simple upsampling techniques. This approach significantly reduces storage and transmission costs for large-scale scientific simulations while preserving data integrity and fidelity.
Abstract:Effective congestion management along signalized corridors is essential for improving productivity and reducing costs, with arterial travel time serving as a key performance metric. Traditional approaches, such as Coordinated Signal Timing and Adaptive Traffic Control Systems, often lack scalability and generalizability across diverse urban layouts. We propose Fusion-based Dynamic Graph Neural Networks (FDGNN), a structured framework for simultaneous modeling of travel time distributions in both directions along arterial corridors. FDGNN utilizes attentional graph convolution on dynamic, bidirectional graphs and integrates fusion techniques to capture evolving spatiotemporal traffic dynamics. The framework is trained on extensive hours of simulation data and utilizes GPU computation to ensure scalability. The results demonstrate that our framework can efficiently and accurately model travel time as a normal distribution on arterial roads leveraging a unique dynamic graph representation of corridor traffic states. This representation integrates sequential traffic signal timing plans, local driving behaviors, temporal turning movement counts, and ingress traffic volumes, even when aggregated over intervals as short as a single cycle length. The results demonstrate resilience to effective traffic variations, including cycle lengths, green time percentages, traffic density, and counterfactual routes. Results further confirm its stability under varying conditions at different intersections. This framework supports dynamic signal timing, enhances congestion management, and improves travel time reliability in real-world applications.
Abstract:Scientific applications in fields such as high energy physics, computational fluid dynamics, and climate science generate vast amounts of data at high velocities. This exponential growth in data production is surpassing the advancements in computing power, network capabilities, and storage capacities. To address this challenge, data compression or reduction techniques are crucial. These scientific datasets have underlying data structures that consist of structured and block structured multidimensional meshes where each grid point corresponds to a tensor. It is important that data reduction techniques leverage strong spatial and temporal correlations that are ubiquitous in these applications. Additionally, applications such as CFD, process tensors comprising hundred plus species and their attributes at each grid point. Reduction techniques should be able to leverage interrelationships between the elements in each tensor. In this paper, we propose an attention-based hierarchical compression method utilizing a block-wise compression setup. We introduce an attention-based hyper-block autoencoder to capture inter-block correlations, followed by a block-wise encoder to capture block-specific information. A PCA-based post-processing step is employed to guarantee error bounds for each data block. Our method effectively captures both spatiotemporal and inter-variable correlations within and between data blocks. Compared to the state-of-the-art SZ3, our method achieves up to 8 times higher compression ratio on the multi-variable S3D dataset. When evaluated on single-variable setups using the E3SM and XGC datasets, our method still achieves up to 3 times and 2 times higher compression ratio, respectively.
Abstract:This paper utilizes video analytics to study pedestrian and vehicle traffic behavior, focusing on analyzing traffic patterns during football gamedays. The University of Florida (UF) hosts six to seven home football games on Saturdays during the college football season, attracting significant pedestrian activity. Through video analytics, this study provides valuable insights into the impact of these events on traffic volumes and safety at intersections. Comparing pedestrian and vehicle activities on gamedays versus non-gamedays reveals differing patterns. For example, pedestrian volume substantially increases during gamedays, which is positively correlated with the probability of the away team winning. This correlation is likely because fans of the home team enjoy watching difficult games. Win probabilities as an early predictor of pedestrian volumes at intersections can be a tool to help traffic professionals anticipate traffic management needs. Pedestrian-to-vehicle (P2V) conflicts notably increase on gamedays, particularly a few hours before games start. Addressing this, a "Barnes Dance" movement phase within the intersection is recommended. Law enforcement presence during high-activity gamedays can help ensure pedestrian compliance and enhance safety. In contrast, we identified that vehicle-to-vehicle (V2V) conflicts generally do not increase on gamedays and may even decrease due to heightened driver caution.
Abstract:Traffic congestion has significant impacts on both the economy and the environment. Measures of Effectiveness (MOEs) have long been the standard for evaluating the level of service and operational efficiency of traffic intersections. However, the scarcity of traditional high-resolution loop detector data (ATSPM) presents challenges in accurately measuring MOEs or capturing the intricate temporospatial characteristics inherent in urban intersection traffic. In response to this challenge, we have introduced the Multi-Task Deep Learning Digital Twin (MTDT) as a solution for multifaceted and precise intersection traffic flow simulation. MTDT enables accurate, fine-grained estimation of loop detector waveform time series for each lane of movement, alongside successful estimation of several MOEs for each lane group associated with a traffic phase concurrently and for all approaches of an arbitrary urban intersection. Unlike existing deep learning methodologies, MTDT distinguishes itself through its adaptability to local temporal and spatial features, such as signal timing plans, intersection topology, driving behaviors, and turning movement counts. While maintaining a straightforward design, our model emphasizes the advantages of multi-task learning in traffic modeling. By consolidating the learning process across multiple tasks, MTDT demonstrates reduced overfitting, increased efficiency, and enhanced effectiveness by sharing representations learned by different tasks. Furthermore, our approach facilitates sequential computation and lends itself to complete parallelization through GPU implementation. This not only streamlines the computational process but also enhances scalability and performance.
Abstract:Scientists conduct large-scale simulations to compute derived quantities-of-interest (QoI) from primary data. Often, QoI are linked to specific features, regions, or time intervals, such that data can be adaptively reduced without compromising the integrity of QoI. For many spatiotemporal applications, these QoI are binary in nature and represent presence or absence of a physical phenomenon. We present a pipelined compression approach that first uses neural-network-based techniques to derive regions where QoI are highly likely to be present. Then, we employ a Guaranteed Autoencoder (GAE) to compress data with differential error bounds. GAE uses QoI information to apply low-error compression to only these regions. This results in overall high compression ratios while still achieving downstream goals of simulation or data collections. Experimental results are presented for climate data generated from the E3SM Simulation model for downstream quantities such as tropical cyclone and atmospheric river detection and tracking. These results show that our approach is superior to comparable methods in the literature.
Abstract:Traffic congestion has significant economic, environmental, and social ramifications. Intersection traffic flow dynamics are influenced by numerous factors. While microscopic traffic simulators are valuable tools, they are computationally intensive and challenging to calibrate. Moreover, existing machine-learning approaches struggle to provide lane-specific waveforms or adapt to intersection topology and traffic patterns. In this study, we propose two efficient and accurate "Digital Twin" models for intersections, leveraging Graph Attention Neural Networks (GAT). These attentional graph auto-encoder digital twins capture temporal, spatial, and contextual aspects of traffic within intersections, incorporating various influential factors such as high-resolution loop detector waveforms, signal state records, driving behaviors, and turning-movement counts. Trained on diverse counterfactual scenarios across multiple intersections, our models generalize well, enabling the estimation of detailed traffic waveforms for any intersection approach and exit lanes. Multi-scale error metrics demonstrate that our models perform comparably to microsimulations. The primary application of our study lies in traffic signal optimization, a pivotal area in transportation systems research. These lightweight digital twins can seamlessly integrate into corridor and network signal timing optimization frameworks. Furthermore, our study's applications extend to lane reconfiguration, driving behavior analysis, and facilitating informed decisions regarding intersection safety and efficiency enhancements. A promising avenue for future research involves extending this approach to urban freeway corridors and integrating it with measures of effectiveness metrics.
Abstract:We describe MGARD, a software providing MultiGrid Adaptive Reduction for floating-point scientific data on structured and unstructured grids. With exceptional data compression capability and precise error control, MGARD addresses a wide range of requirements, including storage reduction, high-performance I/O, and in-situ data analysis. It features a unified application programming interface (API) that seamlessly operates across diverse computing architectures. MGARD has been optimized with highly-tuned GPU kernels and efficient memory and device management mechanisms, ensuring scalable and rapid operations.
Abstract:Scientific discoveries are increasingly constrained by limited storage space and I/O capacities. For time-series simulations and experiments, their data often need to be decimated over timesteps to accommodate storage and I/O limitations. In this paper, we propose a technique that addresses storage costs while improving post-analysis accuracy through spatiotemporal adaptive, error-controlled lossy compression. We investigate the trade-off between data precision and temporal output rates, revealing that reducing data precision and increasing timestep frequency lead to more accurate analysis outcomes. Additionally, we integrate spatiotemporal feature detection with data compression and demonstrate that performing adaptive error-bounded compression in higher dimensional space enables greater compression ratios, leveraging the error propagation theory of a transformation-based compressor. To evaluate our approach, we conduct experiments using the well-known E3SM climate simulation code and apply our method to compress variables used for cyclone tracking. Our results show a significant reduction in storage size while enhancing the quality of cyclone tracking analysis, both quantitatively and qualitatively, in comparison to the prevalent timestep decimation approach. Compared to three state-of-the-art lossy compressors lacking feature preservation capabilities, our adaptive compression framework improves perfectly matched cases in TC tracking by 26.4-51.3% at medium compression ratios and by 77.3-571.1% at large compression ratios, with a merely 5-11% computational overhead.
Abstract:We address the problem of unsupervised semantic segmentation of outdoor LiDAR point clouds in diverse traffic scenarios. The key idea is to leverage the spatiotemporal nature of a dynamic point cloud sequence and introduce drastically stronger augmentation by establishing spatiotemporal correspondences across multiple frames. We dovetail clustering and pseudo-label learning in this work. Essentially, we alternate between clustering points into semantic groups and optimizing models using point-wise pseudo-spatiotemporal labels with a simple learning objective. Therefore, our method can learn discriminative features in an unsupervised learning fashion. We show promising segmentation performance on Semantic-KITTI, SemanticPOSS, and FLORIDA benchmark datasets covering scenarios in autonomous vehicle and intersection infrastructure, which is competitive when compared against many existing fully supervised learning methods. This general framework can lead to a unified representation learning approach for LiDAR point clouds incorporating domain knowledge.