Accurate estimation of the traffic state over a network is essential, since it is the starting point for designing and implementing any traffic management strategy. It allows traffic operators and users of a transportation network to make reliable decisions, such as influencing or changing route or mode choice. However, estimating the traffic state from various sensors within an urban environment is a very complex problem for several reasons, including sensor availability, differing noise levels, differing output quantities, sensor accuracy, and heterogeneous data fusion. To provide a better understanding of this problem, we organized an experimental campaign with video measurements in an area within the urban network of Zurich, Switzerland. We focus on capturing the traffic state in terms of traffic flow and travel times, using measurements from thermal cameras installed by the city authorities, processed video data, and the Google Distance Matrix. We assess the different data sources and propose a simple yet efficient Multiple Linear Regression (MLR) model that estimates travel times by fusing these data sources. Comparison with ground-truth data (derived from the video measurements) shows the efficiency and robustness of the proposed methodology.