Oleg Voynov

ATATA: One Algorithm to Align Them All

Jan 16, 2026

Boyi Pang, Savva Ignatyev, Vladimir Ippolitov, Ramil Khafizov, Yurii Melnik, Oleg Voynov, Maksim Nakhodnov, Aibek Alanov, Xiaopeng Fan, Peter Wonka (+1 more)

Abstract: We suggest a new multi-modal algorithm for joint inference of paired structurally aligned samples with Rectified Flow models. While some existing methods propose a codependent generation process, they do not view the problem of joint generation from a structural alignment perspective. Recent work uses Score Distillation Sampling to generate aligned 3D models, but SDS is known to be time-consuming, prone to mode collapse, and often provides cartoonish results. By contrast, our suggested approach relies on the joint transport of a segment in the sample space, yielding faster computation at inference time. Our approach can be built on top of an arbitrary Rectified Flow model operating on the structured latent space. We show the applicability of our method to the domains of image, video, and 3D shape generation using state-of-the-art baselines and evaluate it against both editing-based and joint inference-based competing approaches. We demonstrate a high degree of structural alignment for the sample pairs obtained with our method and a high visual quality of the samples. Our method improves the state-of-the-art for image and video generation pipelines. For 3D generation, it is able to show comparable quality while working orders of magnitude faster.

A3D: Does Diffusion Dream about 3D Alignment?

Jun 21, 2024

Savva Ignatyev, Nina Konovalova, Daniil Selikhanovych, Nikolay Patakin, Oleg Voynov, Dmitry Senushkin, Alexander Filippov, Anton Konushin, Peter Wonka, Evgeny Burnaev

Abstract: We tackle the problem of text-driven 3D generation from a geometry alignment perspective. We aim at the generation of multiple objects which are consistent in terms of semantics and geometry. Recent methods based on Score Distillation have succeeded in distilling the knowledge from 2D diffusion models to high-quality objects represented by 3D neural radiance fields. These methods handle multiple text queries separately, and therefore, the resulting objects have a high variability in object pose and structure. However, in some applications such as geometry editing, it is desirable to obtain aligned objects. In order to achieve alignment, we propose to optimize the continuous trajectories between the aligned objects, by modeling a space of linear pairwise interpolations of the textual embeddings with a single NeRF representation. We demonstrate that similar objects, consisting of semantically corresponding parts, can be well aligned in 3D space without costly modifications to the generation process. We provide several practical scenarios including mesh editing and object hybridization that benefit from geometry alignment and experimentally demonstrate the efficiency of our method. https://voyleg.github.io/a3d/
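
The "linear pairwise interpolations of the textual embeddings" mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `interpolate_embeddings` and the 4-dimensional vectors are assumptions made for illustration, standing in for the outputs of a real text encoder.

```python
import numpy as np

def interpolate_embeddings(emb_a, emb_b, t):
    """Linearly interpolate between two embeddings; t=0 gives emb_a, t=1 gives emb_b."""
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    return (1.0 - t) * emb_a + t * emb_b

# Hypothetical embeddings for two text prompts (real ones would come from a text encoder).
emb_first = np.array([1.0, 0.0, 2.0, -1.0])
emb_second = np.array([3.0, 2.0, 0.0, 1.0])

# A point halfway along the segment between the two prompts.
midpoint = interpolate_embeddings(emb_first, emb_second, 0.5)
```

In a setup like the one the abstract describes, a single conditional model would presumably be queried at many values of `t`, so that the two endpoint objects are tied together through the intermediate points.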

NeuSD: Surface Completion with Multi-View Text-to-Image Diffusion

Dec 07, 2023

Savva Ignatyev, Daniil Selikhanovych, Oleg Voynov, Yiqun Wang, Peter Wonka, Stamatios Lefkimmiatis, Evgeny Burnaev

Abstract: We present a novel method for 3D surface reconstruction from multiple images where only a part of the object of interest is captured. Our approach builds on two recent developments: surface reconstruction using neural radiance fields for the reconstruction of the visible parts of the surface, and guidance of pre-trained 2D diffusion models in the form of Score Distillation Sampling (SDS) to complete the shape in unobserved regions in a plausible manner. We introduce three components. First, we suggest employing normal maps as a pure geometric representation for SDS instead of color renderings which are entangled with the appearance information. Second, we introduce the freezing of the SDS noise during training which results in more coherent gradients and better convergence. Third, we propose Multi-View SDS as a way to condition the generation of the non-observable part of the surface without fine-tuning or making changes to the underlying 2D Stable Diffusion model. We evaluate our approach on the BlendedMVS dataset demonstrating significant qualitative and quantitative improvements over competing methods.

Factored-NeuS: Reconstructing Surfaces, Illumination, and Materials of Possibly Glossy Objects

May 29, 2023

Yue Fan, Ivan Skorokhodov, Oleg Voynov, Savva Ignatyev, Evgeny Burnaev, Peter Wonka, Yiqun Wang

Abstract: We develop a method that recovers the surface, materials, and illumination of a scene from its posed multi-view images. In contrast to prior work, it does not require any additional data and can handle glossy objects or bright lighting. It is a progressive inverse rendering approach, which consists of three stages. First, we reconstruct the scene radiance and signed distance function (SDF) with our novel regularization strategy for specular reflections. Our approach considers both the diffuse and specular colors, which allows for handling complex view-dependent lighting effects for surface reconstruction. Second, we distill light visibility and indirect illumination from the learned SDF and radiance field using learnable mapping functions. Third, we design a method for estimating the ratio of incoming direct light represented via Spherical Gaussians reflected in a specular manner and then reconstruct the materials and direct illumination of the scene. Experimental results demonstrate that the proposed method outperforms the current state-of-the-art in recovering surfaces, materials, and lighting without relying on any additional data.

* 12 pages, 10 figures. Project page: https://authors-hub.github.io/Factored-NeuS

Multi-sensor large-scale dataset for multi-view 3D reconstruction

Mar 11, 2022

Oleg Voynov, Gleb Bobrovskikh, Pavel Karpyshev, Andrei-Timotei Ardelean, Arseniy Bozhenko, Saveliy Galochkin, Ekaterina Karmanova, Pavel Kopanev, Yaroslav Labutin-Rymsho, Ruslan Rakhimov (+6 more)

Abstract: We present a new multi-sensor dataset for 3D surface reconstruction. It includes registered RGB and depth data from sensors of different resolutions and modalities: smartphones, Intel RealSense, Microsoft Kinect, industrial cameras, and a structured-light scanner. The data for each scene is obtained under a large number of lighting conditions, and the scenes are selected to emphasize a diverse set of material properties challenging for existing algorithms. In the acquisition process, we aimed to maximize high-resolution depth data quality for challenging cases, to provide reliable ground truth for learning algorithms. Overall, we provide over 1.4 million images of 110 different scenes acquired under 14 lighting conditions from 100 viewing directions. We expect our dataset will be useful for evaluation and training of 3D reconstruction algorithms of different types and for other related tasks. Our dataset and accompanying software will be available online.

Can We Use Neural Regularization to Solve Depth Super-Resolution?

Dec 21, 2021

Milena Gazdieva, Oleg Voynov, Alexey Artemov, Youyi Zheng, Luiz Velho, Evgeny Burnaev

Abstract: Depth maps captured with commodity sensors often require super-resolution to be used in applications. In this work we study a super-resolution approach based on a variational problem statement with Tikhonov regularization where the regularizer is parametrized with a deep neural network. This approach was previously applied successfully in photoacoustic tomography. We experimentally show that its application to depth map super-resolution is difficult, and suggest possible reasons for this difficulty.

* 9 pages
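
For orientation, a variational problem with a learned regularizer of the kind the abstract describes is commonly written in the following generic form. This is a sketch under stated assumptions, not the formulation from the paper: the symbols are introduced here for illustration, with $y$ the observed low-resolution depth map, $A$ an assumed downsampling operator, $R_\theta$ a regularizer parametrized by a neural network with weights $\theta$, and $\lambda > 0$ a regularization weight.

```latex
\hat{x} = \arg\min_{x} \; \| A x - y \|_2^2 + \lambda \, R_\theta(x)
```

Classical Tikhonov regularization corresponds to the fixed choice $R(x) = \| \Gamma x \|_2^2$ for some linear operator $\Gamma$; the learned variant replaces this hand-designed term with $R_\theta$.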

Towards Unpaired Depth Enhancement and Super-Resolution in the Wild

May 25, 2021

Aleksandr Safin, Maxim Kan, Nikita Drobyshev, Oleg Voynov, Alexey Artemov, Alexander Filippov, Denis Zorin, Evgeny Burnaev

Abstract: Depth maps captured with commodity sensors are often of low quality and resolution; these maps need to be enhanced to be used in many applications. State-of-the-art data-driven methods of depth map super-resolution rely on registered pairs of low- and high-resolution depth maps of the same scenes. Acquisition of real-world paired data requires specialized setups. Another alternative, generating low-resolution maps from high-resolution maps by subsampling, adding noise and other artificial degradation methods, does not fully capture the characteristics of real-world low-resolution images. As a consequence, supervised learning methods trained on such artificial paired data may not perform well on real-world low-resolution inputs. We consider an approach to depth map enhancement based on learning from unpaired data. While many techniques for unpaired image-to-image translation have been proposed, most are not directly applicable to depth maps. We propose an unpaired learning method for simultaneous depth enhancement and super-resolution, which is based on a learnable degradation model and surface normal estimates as features to produce more accurate depth maps. We demonstrate that our method outperforms existing unpaired methods and performs on par with paired methods on a new benchmark for unpaired learning that we developed.

How Good MVSNets Are at Depth Fusion

Nov 30, 2020

Oleg Voynov, Aleksandr Safin, Savva Ignatyev, Evgeny Burnaev

Abstract: We study the effect of providing low-quality sensor depth as an additional input to deep multi-view stereo methods. We modify two state-of-the-art deep multi-view stereo methods to use the input depth. We show that the additional input depth may improve the quality of deep multi-view stereo.

* 7 pages, 6 figures, 1 table. Accepted to ICMV 2020

Deep Vectorization of Technical Drawings

Mar 16, 2020

Vage Egiazarian, Oleg Voynov, Alexey Artemov, Denis Volkhonskiy, Aleksandr Safin, Maria Taktasheva, Denis Zorin, Evgeny Burnaev

Abstract: We present a new method for vectorization of technical line drawings, such as floor plans, architectural drawings, and 2D CAD images. Our method includes (1) a deep learning-based cleaning stage to eliminate the background and imperfections in the image and fill in missing parts, (2) a transformer-based network to estimate vector primitives, and (3) an optimization procedure to obtain the final primitive configurations. We train the networks on synthetic data, renderings of vector line drawings, and manually vectorized scans of line drawings. Our method quantitatively and qualitatively outperforms a number of existing techniques on a collection of representative technical drawings.

Latent-Space Laplacian Pyramids for Adversarial Representation Learning with 3D Point Clouds

Dec 13, 2019

Vage Egiazarian, Savva Ignatyev, Alexey Artemov, Oleg Voynov, Andrey Kravchenko, Youyi Zheng, Luiz Velho, Evgeny Burnaev

Abstract: Constructing high-quality generative models for 3D shapes is a fundamental task in computer vision with diverse applications in geometry processing, engineering, and design. Despite the recent progress in deep generative modelling, synthesis of finely detailed 3D surfaces, such as high-resolution point clouds, from scratch has not been achieved with existing approaches. In this work, we propose to employ the latent-space Laplacian pyramid representation within a hierarchical generative model for 3D point clouds. We combine the recently proposed latent-space GAN and Laplacian GAN architectures to form a multi-scale model capable of generating 3D point clouds at increasing levels of detail. Our evaluation demonstrates that our model outperforms the existing generative models for 3D point clouds.
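
The general idea of a Laplacian pyramid, storing a coarse signal plus per-level residuals that restore detail, can be sketched on a plain 1D vector. This is a generic illustration and not the paper's latent-space architecture: the pair-averaging downsampler, the repeat upsampler, and all function names are assumptions chosen for simplicity.

```python
import numpy as np

def downsample(x):
    """Halve the length by averaging adjacent pairs (length must be even)."""
    return x.reshape(-1, 2).mean(axis=1)

def upsample(x):
    """Double the length by repeating each entry."""
    return np.repeat(x, 2)

def build_pyramid(x, levels):
    """Return [finest residual, ..., coarsest residual, coarse signal]."""
    pyramid = []
    current = np.asarray(x, dtype=float)
    for _ in range(levels):
        coarse = downsample(current)
        # The residual holds the detail lost when going to the coarser level.
        pyramid.append(current - upsample(coarse))
        current = coarse
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Invert build_pyramid by adding residuals back from coarse to fine."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        current = upsample(current) + residual
    return current

signal = np.array([1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 4.0])
pyramid = build_pyramid(signal, levels=2)
restored = reconstruct(pyramid)
```

A multi-scale generative model in this spirit would produce the coarse level first and then generate each residual conditioned on the coarser result, which matches the "increasing levels of detail" wording of the abstract only at a high level.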
