Abstract:Precision pathology relies on detecting fine-grained morphological abnormalities within specific Regions of Interest (ROIs), as these local, texture-rich cues - rather than global slide contexts - drive expert diagnostic reasoning. While Vision-Language (V-L) models promise data efficiency by leveraging semantic priors, adapting them faces a critical Granularity Mismatch, where generic representations fail to resolve such subtle defects. Current adaptation methods often treat modalities as independent streams, failing to ground semantic prompts in ROI-specific visual contexts. To bridge this gap, we propose the Hierarchical Adaptation and Alignment Framework (HAAF). At its core is a novel Cross-Level Scaled Alignment (CLSA) mechanism that enforces a sequential calibration order: visual features first inject context into text prompts to generate content-adaptive descriptors, which then spatially guide the visual encoder to spotlight anomalies. Additionally, a dual-branch inference strategy integrates semantic scores with geometric prototypes to ensure stability in few-shot settings. Experiments on four benchmarks show HAAF significantly outperforms state-of-the-art methods and effectively scales with domain-specific backbones (e.g., CONCH) in low-resource scenarios.




Abstract:Simulation is pivotal in evaluating the performance of autonomous driving systems due to the advantages in efficiency and cost compared to on-road testing. Realistic multi-agent behavior~(e.g., interactive and long-term) is needed to narrow the gap between the simulation and the reality. The existing work has the following shortcomings in achieving this goal:~(1) log replay offers realistic scenarios but leads to unrealistic collisions due to lacking dynamic interactions, and~(2) model-based and learning-based solutions encourage interactions but often deviate from real-world data in long horizons. In this work, we propose LitSim, a long-term interactive simulation approach that maximizes realism while avoiding unrealistic collisions. Specifically, we replay the log for most scenarios and intervene only when LitSim predicts unrealistic conflicts. We then encourage interactions among the agents and resolve the conflicts, thereby reducing the likelihood of unrealistic collisions. We train and validate our model on the real-world dataset NGSIM, and the experimental results demonstrate that LitSim outperforms the current popular approaches in realism and reactivity.