Abstract: In this paper, we address the problem of person re-identification, i.e., retrieving the gallery instances that were generated by the same person as a given probe image. This is very challenging because a person's appearance usually undergoes significant variations across the camera network due to changes in illumination, camera angle and view, background clutter, and occlusion. We assume that the matched gallery images should not only be similar to the probe, but also be similar to each other under a suitable metric. We express this assumption with a fully connected CRF model in which each node corresponds to a gallery image and every pair of nodes is connected by an edge. A label variable is associated with each node to indicate whether the corresponding image is from the target person. We define the unary potential of each node using existing feature computation and matching techniques, so that it reflects the similarity between the probe and the gallery image, and we define the pairwise potential of each edge as a weighted combination of Gaussian kernels that encodes the appearance similarity between a pair of gallery images. This specific form of pairwise potential allows us to exploit an efficient inference algorithm to compute the marginal distribution of each label variable in this densely connected CRF. We show the superiority of our method by applying it to public datasets and comparing it with state-of-the-art methods.
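To make the inference step concrete, the following is a minimal Python sketch of mean-field inference in a fully connected binary CRF of the kind described above. It is not the authors' implementation: the Potts label compatibility, the conversion of probe-gallery similarity scores into unary energies, the helper names (`softmax`, `gaussian_kernel`, `dense_crf_marginals`) and the toy features are all illustrative assumptions. The sketch also builds the full N x N kernel matrix, which costs O(N^2) per iteration; efficient dense-CRF inference algorithms avoid this cost, and the sketch does not attempt to reproduce them.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax, shifted for numerical stability."""
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gaussian_kernel(feats, bandwidth):
    """k(f_i, f_j) = exp(-||f_i - f_j||^2 / (2 * bandwidth^2)), with no self-edges."""
    sq = np.sum((feats[:, None, :] - feats[None, :, :]) ** 2, axis=-1)
    k = np.exp(-sq / (2.0 * bandwidth ** 2))
    np.fill_diagonal(k, 0.0)
    return k

def dense_crf_marginals(unary, feature_sets, weights, bandwidths, n_iters=10):
    """Naive mean-field for a fully connected CRF with labels {0: other, 1: target}.

    unary: (N, 2) unary energies; feature_sets: one (N, d) array per kernel.
    Assumes a Potts compatibility: a pair pays w * k(f_i, f_j) if its labels differ.
    """
    # Pairwise weight matrix: weighted sum of Gaussian kernels (hypothetical weights).
    k = sum(w * gaussian_kernel(f, b)
            for f, w, b in zip(feature_sets, weights, bandwidths))
    q = softmax(-unary)  # initialize marginals from the unary term alone
    for _ in range(n_iters):
        msg = k @ q  # msg[i, l] = sum_j k(i, j) * Q_j(l)
        # Potts penalty for label l: kernel mass of neighbors holding the other label.
        pairwise = msg.sum(axis=1, keepdims=True) - msg
        q = softmax(-unary - pairwise)
    return q

# Toy gallery: hypothetical probe-similarity scores and 2-D appearance features.
sim = np.array([0.9, 0.9, 0.5, 0.1, 0.1])
feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]])
unary = np.stack([-np.log(1 - sim), -np.log(sim)], axis=1)  # energies for labels 0, 1
q = dense_crf_marginals(unary, [feats], weights=[1.0], bandwidths=[1.0])
```

In this toy run, gallery image 2 has only a 0.5 probe similarity, but because it lies close to images 0 and 1 in feature space, the pairwise term raises its target marginal above 0.5, which illustrates the assumption that matched gallery images should also be similar to each other.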
Abstract: One of the fundamental requirements for visual surveillance using smart camera networks is the correct association of each person's observations generated on different cameras. Recently, distributed data association, which involves only local information processing on each camera node and the exchange of information between neighboring cameras, has attracted considerable research interest because of its advantages in large-scale applications. In this paper, we formulate data association in smart camera networks as an integer programming (IP) problem by introducing a set of linking variables, and we propose two distributed algorithms, namely L-DD and Q-DD, that solve this IP problem using the dual decomposition technique. In our algorithms, the original IP problem is decomposed into several sub-problems, each of which can be solved locally and efficiently on a smart camera; the sub-problems then reach consensus on their solutions in a rigorous way by adjusting their parameters via projected sub-gradient optimization. The proposed methods are simple and flexible in that (i) any feature extraction and matching technique can be incorporated into our framework to measure the similarity between two observations, which is used to define the cost of each link, and (ii) the original problem can be decomposed in any way, as long as each resulting sub-problem can be solved independently on an individual camera. We show the competitiveness of our methods in both accuracy and speed through theoretical analysis and experimental comparison with state-of-the-art algorithms on two real datasets collected by camera networks in our campus garden and office building.
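The decomposition idea can be illustrated with a deliberately small Python sketch. It assumes two cameras A and B, a cost matrix `c` whose entry `c[i, j]` is the cost of linking observation i on A to observation j on B (negative meaning a good link, e.g. a hypothetical threshold minus a similarity score), and one-to-one linking constraints. Camera A keeps a copy of the linking variables subject to the row constraints, camera B a copy subject to the column constraints, and Lagrange multipliers on the agreement constraint are updated by subgradient ascent on the dual. Because these multipliers belong to equality constraints they are unconstrained, so no projection step is needed here. This is a generic dual decomposition example under those assumptions, not a reproduction of L-DD or Q-DD; the function names and step-size schedule are hypothetical.

```python
import numpy as np

def solve_rows(cost):
    """Camera-A sub-problem: each row (observation on A) links to at most one column.
    Solved exactly by taking the cheapest entry of each row if that entry is negative."""
    x = np.zeros(cost.shape, dtype=int)
    rows = np.arange(cost.shape[0])
    cols = np.argmin(cost, axis=1)
    keep = cost[rows, cols] < 0
    x[rows[keep], cols[keep]] = 1
    return x

def solve_cols(cost):
    """Camera-B sub-problem: each column (observation on B) links to at most one row."""
    return solve_rows(cost.T).T

def dual_decomposition(c, step0=0.2, max_iters=100):
    """Split each link cost evenly between the two cameras and coordinate them
    with Lagrange multipliers lam on the agreement constraint x_A == x_B."""
    lam = np.zeros(c.shape)
    xa = np.zeros(c.shape, dtype=int)
    for t in range(max_iters):
        xa = solve_rows(c / 2 + lam)
        xb = solve_cols(c / 2 - lam)
        g = xa - xb  # subgradient of the dual function with respect to lam
        if not g.any():
            return xa, t  # consensus reached: the local solutions agree
        lam += (step0 / (t + 1)) * g  # diminishing-step ascent on the dual
    return xa, max_iters  # no consensus within the budget: camera A's last solution

# Hypothetical link costs (negative = good match) between 2 observations per camera.
c = np.array([[-0.9, -0.8],
              [-0.7, 0.1]])
links, iters = dual_decomposition(c)
```

In this toy instance the two cameras disagree at the first iteration (both local solutions link to observation 0), and a single multiplier update steers them to the agreed, cost-optimal assignment {(0, 1), (1, 0)} with total cost -1.5.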
Abstract: One of the fundamental requirements for visual surveillance using non-overlapping camera networks is the correct and consistent labeling of tracked objects on each camera, in the sense that the tracklets of the same object captured at different cameras (called observations in this paper) should be assigned the same label. In this paper, we formulate this task as a Bayesian inference problem and propose a distributed inference framework in which the posterior distribution of the labeling variable of each observation, conditioned on all historical appearance and spatio-temporal evidence gathered across the whole network, is computed solely through local information processing on each camera and the exchange of information between neighboring cameras. In our framework, the number of objects present in the monitored region, i.e., the sample space of the labeling variables, does not need to be specified beforehand; instead, it is determined automatically on the fly. In addition, we make no assumption about the appearance distribution of a single object, but instead use similarity scores between pairs of appearances, given by an advanced object re-identification algorithm, as the appearance likelihood for inference. This makes our method flexible and competitive when observing conditions change substantially across camera views. To cope with missed detections, which are critical for distributed inference, we consider an enlarged neighborhood of each camera during inference and use a mixture model to describe the higher-order spatio-temporal constraints. This improves the robustness of the algorithm against missed detections at the cost of slightly increased computation and communication burden at each camera node. Finally, we demonstrate the effectiveness of our method through experiments on an indoor Office Building dataset and an outdoor Campus Garden dataset.
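The Python sketch below illustrates how such a posterior could combine the two kinds of evidence for a single new observation. The modeling choices are assumptions made for illustration only: the raw re-identification similarity is used directly as the appearance likelihood, transit times follow Gaussian distributions, the missed-detection case is a second mixture component whose route passes through an intermediate camera, and an extra "new" label with a hypothetical constant score lets the label set grow on the fly. Everything runs centrally here; the message passing between cameras is omitted, and the function names and numbers are hypothetical.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def transit_likelihood(dt, routes):
    """Mixture over routes, each given as (weight, mean_transit_s, std_transit_s)."""
    return sum(w * gauss_pdf(dt, mu, sd) for w, mu, sd in routes)

def label_posterior(candidates, new_object_score):
    """candidates: label -> (prior, similarity, dt, routes).
    Returns a normalized posterior over existing labels plus 'new'."""
    scores = {
        label: prior * sim * transit_likelihood(dt, routes)
        for label, (prior, sim, dt, routes) in candidates.items()
    }
    scores["new"] = new_object_score  # hypothetical score for a not-yet-seen object
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

# Assumed route mixture: a direct transition (weight 0.9) and a path through an
# intermediate camera that missed the detection (weight 0.1).
routes = [(0.9, 30.0, 5.0), (0.1, 60.0, 8.0)]
candidates = {
    "p1": (0.5, 0.8, 30.0, routes),  # strong appearance match, expected transit time
    "p2": (0.5, 0.3, 58.0, routes),  # weak match, gap fits only the missed-detection route
}
posterior = label_posterior(candidates, new_object_score=1e-3)
best = max(posterior, key=posterior.get)
```

Here candidate `p2` would be essentially ruled out by the direct-route Gaussian alone (its 58 s gap is 5.6 standard deviations from the 30 s mean), but the missed-detection component keeps its likelihood non-negligible.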