Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zoran Kostic

A Vision-Based Analysis of Congestion Pricing in New York City

Feb 03, 2026

Mehmet Kerem Turkcan, Jhonatan Tavori, Javad Ghaderi, Gil Zussman, Zoran Kostic, Andrew Smyth

Abstract:We examine the impact of New York City's congestion pricing program through automated analysis of traffic camera data. Our computer vision pipeline processes footage from over 900 cameras distributed throughout Manhattan and New York, comparing traffic patterns from November 2024 through the program's implementation in January 2025 until January 2026. We establish baseline traffic patterns and identify systematic changes in vehicle density across the monitored region.

Via

Access Paper or Ask Questions

From Who Said What to Who They Are: Modular Training-free Identity-Aware LLM Refinement of Speaker Diarization

Sep 18, 2025

Yu-Wen Chen, William Ho, Maxim Topaz, Julia Hirschberg, Zoran Kostic

Abstract:Speaker diarization (SD) struggles in real-world scenarios due to dynamic environments and unknown speaker counts. SD is rarely used alone and is often paired with automatic speech recognition (ASR), but non-modular methods that jointly train on domain-specific data have limited flexibility. Moreover, many applications require true speaker identities rather than SD's pseudo labels. We propose a training-free modular pipeline combining off-the-shelf SD, ASR, and a large language model (LLM) to determine who spoke, what was said, and who they are. Using structured LLM prompting on reconciled SD and ASR outputs, our method leverages semantic continuity in conversational context to refine low-confidence speaker labels and assigns role identities while correcting split speakers. On a real-world patient-clinician dataset, our approach achieves a 29.7% relative error reduction over baseline reconciled SD and ASR. It enhances diarization performance without additional training and delivers a complete pipeline for SD, ASR, and speaker identity detection in practical applications.

Via

Access Paper or Ask Questions

Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks

Mar 16, 2025

Mehmet Kerem Turkcan, Mattia Ballo, Filippo Filicori, Zoran Kostic

Figure 1 for Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks

Figure 2 for Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks

Figure 3 for Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks

Figure 4 for Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks

Abstract:We introduce specialized diffusion-based generative models that capture the spatiotemporal dynamics of fine-grained robotic surgical sub-stitch actions through supervised learning on annotated laparoscopic surgery footage. The proposed models form a foundation for data-driven world models capable of simulating the biomechanical interactions and procedural dynamics of surgical suturing with high temporal fidelity. Annotating a dataset of $\sim2K$ clips extracted from simulation videos, we categorize surgical actions into fine-grained sub-stitch classes including ideal and non-ideal executions of needle positioning, targeting, driving, and withdrawal. We fine-tune two state-of-the-art video diffusion models, LTX-Video and HunyuanVideo, to generate high-fidelity surgical action sequences at $\ge$768x512 resolution and $\ge$49 frames. For training our models, we explore both Low-Rank Adaptation (LoRA) and full-model fine-tuning approaches. Our experimental results demonstrate that these world models can effectively capture the dynamics of suturing, potentially enabling improved training simulators, surgical skill assessment tools, and autonomous surgical systems. The models also display the capability to differentiate between ideal and non-ideal technique execution, providing a foundation for building surgical training and evaluation systems. We release our models for testing and as a foundation for future research. Project Page: https://mkturkcan.github.io/suturingmodels/

Via

Access Paper or Ask Questions

The Streetscape Application Services Stack (SASS): Towards a Distributed Sensing Architecture for Urban Applications

Nov 29, 2024

Navid Salami Pargoo, Mahshid Ghasemi, Shuren Xia, Mehmet Kerem Turkcan, Taqiya Ehsan, Chengbo Zang, Yuan Sun, Javad Ghaderi, Gil Zussman, Zoran Kostic(+1 more)

Figure 1 for The Streetscape Application Services Stack (SASS): Towards a Distributed Sensing Architecture for Urban Applications

Figure 2 for The Streetscape Application Services Stack (SASS): Towards a Distributed Sensing Architecture for Urban Applications

Figure 3 for The Streetscape Application Services Stack (SASS): Towards a Distributed Sensing Architecture for Urban Applications

Figure 4 for The Streetscape Application Services Stack (SASS): Towards a Distributed Sensing Architecture for Urban Applications

Abstract:As urban populations grow, cities are becoming more complex, driving the deployment of interconnected sensing systems to realize the vision of smart cities. These systems aim to improve safety, mobility, and quality of life through applications that integrate diverse sensors with real-time decision-making. Streetscape applications-focusing on challenges like pedestrian safety and adaptive traffic management-depend on managing distributed, heterogeneous sensor data, aligning information across time and space, and enabling real-time processing. These tasks are inherently complex and often difficult to scale. The Streetscape Application Services Stack (SASS) addresses these challenges with three core services: multimodal data synchronization, spatiotemporal data fusion, and distributed edge computing. By structuring these capabilities as clear, composable abstractions with clear semantics, SASS allows developers to scale streetscape applications efficiently while minimizing the complexity of multimodal integration. We evaluated SASS in two real-world testbed environments: a controlled parking lot and an urban intersection in a major U.S. city. These testbeds allowed us to test SASS under diverse conditions, demonstrating its practical applicability. The Multimodal Data Synchronization service reduced temporal misalignment errors by 88%, achieving synchronization accuracy within 50 milliseconds. Spatiotemporal Data Fusion service improved detection accuracy for pedestrians and vehicles by over 10%, leveraging multicamera integration. The Distributed Edge Computing service increased system throughput by more than an order of magnitude. Together, these results show how SASS provides the abstractions and performance needed to support real-time, scalable urban applications, bridging the gap between sensing infrastructure and actionable streetscape intelligence.

Via

Access Paper or Ask Questions

Boundless: Generating Photorealistic Synthetic Data for Object Detection in Urban Streetscapes

Sep 04, 2024

Mehmet Kerem Turkcan, Ian Li, Chengbo Zang, Javad Ghaderi, Gil Zussman, Zoran Kostic

Figure 1 for Boundless: Generating Photorealistic Synthetic Data for Object Detection in Urban Streetscapes

Figure 2 for Boundless: Generating Photorealistic Synthetic Data for Object Detection in Urban Streetscapes

Figure 3 for Boundless: Generating Photorealistic Synthetic Data for Object Detection in Urban Streetscapes

Abstract:We introduce Boundless, a photo-realistic synthetic data generation system for enabling highly accurate object detection in dense urban streetscapes. Boundless can replace massive real-world data collection and manual ground-truth object annotation (labeling) with an automated and configurable process. Boundless is based on the Unreal Engine 5 (UE5) City Sample project with improvements enabling accurate collection of 3D bounding boxes across different lighting and scene variability conditions. We evaluate the performance of object detection models trained on the dataset generated by Boundless when used for inference on a real-world dataset acquired from medium-altitude cameras. We compare the performance of the Boundless-trained model against the CARLA-trained model and observe an improvement of 7.8 mAP. The results we achieved support the premise that synthetic data generation is a credible methodology for training/fine-tuning scalable object detection models for urban scenes.

Via

Access Paper or Ask Questions

Data-Driven Traffic Simulation for an Intersection in a Metropolis

Aug 01, 2024

Chengbo Zang, Mehmet Kerem Turkcan, Gil Zussman, Javad Ghaderi, Zoran Kostic

Figure 1 for Data-Driven Traffic Simulation for an Intersection in a Metropolis

Figure 2 for Data-Driven Traffic Simulation for an Intersection in a Metropolis

Figure 3 for Data-Driven Traffic Simulation for an Intersection in a Metropolis

Figure 4 for Data-Driven Traffic Simulation for an Intersection in a Metropolis

Abstract:We present a novel data-driven simulation environment for modeling traffic in metropolitan street intersections. Using real-world tracking data collected over an extended period of time, we train trajectory forecasting models to learn agent interactions and environmental constraints that are difficult to capture conventionally. Trajectories of new agents are first coarsely generated by sampling from the spatial and temporal generative distributions, then refined using state-of-the-art trajectory forecasting models. The simulation can run either autonomously, or under explicit human control conditioned on the generative distributions. We present the experiments for a variety of model configurations. Under an iterative prediction scheme, the way-point-supervised TrajNet++ model obtained 0.36 Final Displacement Error (FDE) in 20 FPS on an NVIDIA A100 GPU.

* CVPR 2024 Workshop POETS Oral

Via

Access Paper or Ask Questions

Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection

Apr 25, 2024

Mehmet Kerem Turkcan, Sanjeev Narasimhan, Chengbo Zang, Gyung Hyun Je, Bo Yu, Mahshid Ghasemi, Javad Ghaderi, Gil Zussman, Zoran Kostic

Figure 1 for Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection

Figure 2 for Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection

Figure 3 for Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection

Figure 4 for Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection

Abstract:We introduce Constellation, a dataset of 13K images suitable for research on detection of objects in dense urban streetscapes observed from high-elevation cameras, collected for a variety of temporal conditions. The dataset addresses the need for curated data to explore problems in small object detection exemplified by the limited pixel footprint of pedestrians observed tens of meters from above. It enables the testing of object detection models for variations in lighting, building shadows, weather, and scene dynamics. We evaluate contemporary object detection architectures on the dataset, observing that state-of-the-art methods have lower performance in detecting small pedestrians compared to vehicles, corresponding to a 10% difference in average precision (AP). Using structurally similar datasets for pretraining the models results in an increase of 1.8% mean AP (mAP). We further find that incorporating domain-specific data augmentations helps improve model performance. Using pseudo-labeled data, obtained from inference outcomes of the best-performing models, improves the performance of the models. Finally, comparing the models trained using the data collected in two different time intervals, we find a performance drift in models due to the changes in intersection conditions over time. The best-performing model achieves a pedestrian AP of 92.0% with 11.5 ms inference time on NVIDIA A100 GPUs, and an mAP of 95.4%.

Via

Access Paper or Ask Questions

Birds Eye View Social Distancing Analysis System

Dec 14, 2021

Zhengye Yang, Mingfei Sun, Hongzhe Ye, Zihao Xiong, Gil Zussman, Zoran Kostic

Figure 1 for Birds Eye View Social Distancing Analysis System

Figure 2 for Birds Eye View Social Distancing Analysis System

Figure 3 for Birds Eye View Social Distancing Analysis System

Figure 4 for Birds Eye View Social Distancing Analysis System

Abstract:Social distancing can reduce the infection rates in respiratory pandemics such as COVID-19. Traffic intersections are particularly suitable for monitoring and evaluation of social distancing behavior in metropolises. We propose and evaluate a privacy-preserving social distancing analysis system (B-SDA), which uses bird's-eye view video recordings of pedestrians who cross traffic intersections. We devise algorithms for video pre-processing, object detection and tracking which are rooted in the known computer-vision and deep learning techniques, but modified to address the problem of detecting very small objects/pedestrians captured by a highly elevated camera. We propose a method for incorporating pedestrian grouping for detection of social distancing violations. B-SDA is used to compare pedestrian behavior based on pre-pandemic and pandemic videos in a major metropolitan area. The accomplished pedestrian detection performance is $63.0\%$ $AP_{50}$ and the tracking performance is $47.6\%$ MOTA. The social distancing violation rate of $15.6\%$ during the pandemic is notably lower than $31.4\%$ pre-pandemic baseline, indicating that pedestrians followed CDC-prescribed social distancing recommendations. The proposed system is suitable for deployment in real-world applications.

Via

Access Paper or Ask Questions