Abstract:Background consistency remains a significant challenge in image editing tasks. Despite extensive developments, existing works still face a trade-off between maintaining similarity to the original image and generating content that aligns with the target. Here, we propose KV-Edit, a training-free approach that uses KV cache in DiTs to maintain background consistency, where background tokens are preserved rather than regenerated, eliminating the need for complex mechanisms or expensive training, ultimately generating new content that seamlessly integrates with the background within user-provided regions. We further explore the memory consumption of the KV cache during editing and optimize the space complexity to $O(1)$ using an inversion-free method. Our approach is compatible with any DiT-based generative model without additional training. Experiments demonstrate that KV-Edit significantly outperforms existing approaches in terms of both background and image quality, even surpassing training-based methods. Project webpage is available at https://xilluill.github.io/projectpages/KV-Edit
Abstract:Since NASA put forward the concept of the digital twin in 2010, many industries have put forward the dynamic goal of digital development, and the transportation industry is also among them. With more and more companies laying out on this virgin land, the digital twin transportation industry has grown rapidly and gradually formed a complete scientific research system. However, under the largely mature framework, there are still many loophole problems that need to be solved. In the process of constructing a road network with point cloud information, we summarize several major features of the point cloud collected by laser scanners and analyze the potential problems of constructing the network, such as misjudging the feature points as ground points and grid voids. On this basis, we reviewed relevant literature and proposed targeted solutions, such as building a point cloud pyramid modeled after the image pyramid, expanding the virtual grid, etc., applying CSF for ground-point cloud extraction, and constructing a road network model using the PTD (progressive density-based filter) algorithm. For the problem of road sign detection, we optimize the remote sensing data in the ground point cloud by enhancing the information density using edge detection, improving the data quality by removing the low intensity points, and achieving 90% accuracy of road text recognition using PaddleOCR and Densenet. As for the real-time digital twin traffic, we design the P2PRN network using the backbone of MPR-GAN for 2D feature generation and SuperGlue for 2D feature matching, rendering the viewpoints according to the matching optimization points, completing the multimodal matching task after several iterations, and successfully calculating the road camera position with 10{\deg} and 15m accuracy.