We propose GeoFusion, a SLAM-based scene estimation method for building an object-level semantic map in dense clutter. In dense clutter, objects are often in close contact and under severe occlusion, which leads to more false detections and noisier pose estimates from existing perception methods. To address these problems, our key insight is to enforce geometric consistency at the object level within a general SLAM framework. We define geometric consistency in two parts: a geometric consistency score and geometric relations. The geometric consistency score describes the compatibility between an object's geometric model and the observed point cloud, and provides a reliable measure for filtering out false positives during data association. A geometric relation represents a relationship (e.g., contact) between geometric features (e.g., planes) of different objects; these relations make the graph optimization over object poses more robust and accurate. GeoFusion can robustly and efficiently infer object labels, 6D object poses, and spatial relations from continuous, noisy semantic measurements. We quantitatively evaluate our method using observations from a Fetch mobile manipulation robot. Our results demonstrate greater robustness against false estimates than frame-by-frame pose estimation from a state-of-the-art convolutional neural network.
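As a rough illustration only, one plausible way to formalize a geometric consistency score of the kind described above is the fraction of model surface points that, under a hypothesized 6D pose, fall within a small distance of the observed point cloud. The symbols below ($\mathbf{T}$ for the pose, $\mathcal{M}$ for model points, $\mathcal{P}$ for the observed cloud, $\epsilon$ for a distance threshold) are illustrative assumptions, not the paper's exact definition:
\[
  s(\mathbf{T}, \mathcal{M}, \mathcal{P}) \;=\;
  \frac{1}{|\mathcal{M}|}
  \sum_{\mathbf{m} \in \mathcal{M}}
  \mathbb{1}\!\left[
    \min_{\mathbf{p} \in \mathcal{P}}
    \lVert \mathbf{T}\mathbf{m} - \mathbf{p} \rVert < \epsilon
  \right]
\]
A score of this form is high when the transformed model agrees with the observation, so thresholding it offers one natural way to reject false-positive detections during data association.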