Perception is the main bottleneck in performing autonomous mobile manipulation tasks, especially in cluttered and unstructured environments. In this paper, we propose a novel two-stage paradigm that leverages both a CNN object prior and generative sampling to perform object detection and 6D pose estimation. Our two-stage approach combines a CNN with a generative, sampling-based local search to sample the network density, a method we call the SAND filter. Quantitative results show that SAND improves object detection by reducing both false positive and false negative recognitions, and that it further produces accurate pose estimates. We also conduct extensive categorical object sorting experiments showing that our method produces accurate and reliable detections and object poses.