In large-scale sensor networks, operating all sensors simultaneously consumes considerable power and is computationally expensive. It is therefore often necessary to adaptively select or activate only a few sensors at a time. The greedy selection (GS) algorithm is widely used to select sensors in homogeneous sensor networks. It guarantees a worst-case performance of $(1 - 1/e) \approx 63\%$ of the optimal solution when the performance metric is submodular. However, in heterogeneous sensor networks (HSNs), where sensors can differ in precision and operating cost, the sensor selection problem has not been sufficiently explored. In this paper, we propose a joint greedy selection (JGS) algorithm to compute the best possible subset of sensors in HSNs. We derive theoretical guarantees on the worst-case error of JGS under submodular performance metrics for an HSN consisting of two sets of sensors: a set of expensive high-precision sensors and a set of cheap low-precision sensors. A limit on the number of sensors selected from each class is stipulated, and we propose algorithms that solve the sensor selection problem under these limits and assess their theoretical performance guarantees. We show that the worst-case relative error approaches $(1 - 1/e)$ when the stipulated number of high-precision sensors is much smaller than that of low-precision sensors. To compare the JGS algorithm with existing methods, we propose a frame potential-based submodular performance metric that accounts for both the correlation among the measurements and the heterogeneity of the sensors. Experimentally, we show that the JGS algorithm achieves $4$--$10$ dB lower error than existing methods.
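
As a rough illustration of greedy selection under a per-class limit, the following sketch picks, at each step, the sensor with the largest marginal gain among those whose class budget is not yet exhausted. It is a generic greedy routine and not necessarily the JGS algorithm itself; the function name `greedy_select`, the toy coverage metric `coverage_metric`, and the sensor labels and budgets are all assumptions made for this example (the frame potential-based metric is not reproduced here).

```python
from typing import Callable, Dict, FrozenSet, List, Set

def greedy_select(
    candidates: List[str],
    group_of: Dict[str, str],
    budgets: Dict[str, int],
    f: Callable[[FrozenSet[str]], float],
) -> Set[str]:
    """Greedily maximize a set function f subject to a limit per sensor class."""
    selected: Set[str] = set()
    counts = {g: 0 for g in budgets}
    while True:
        base = f(frozenset(selected))
        best, best_gain = None, 0.0
        for s in candidates:
            g = group_of[s]
            # Skip sensors already chosen or whose class budget is used up.
            if s in selected or counts[g] >= budgets[g]:
                continue
            gain = f(frozenset(selected | {s})) - base
            if gain > best_gain:
                best, best_gain = s, gain
        # Stop when no feasible sensor yields a strictly positive gain.
        if best is None:
            break
        selected.add(best)
        counts[group_of[best]] += 1
    return selected

# Toy submodular metric (assumed): number of distinct points covered.
COVERAGE = {
    "h1": {1, 2, 3, 4},
    "h2": {3, 4, 5},
    "l1": {5, 6, 8},
    "l2": {6, 7},
    "l3": {1},
}

def coverage_metric(subset: FrozenSet[str]) -> float:
    covered: Set[int] = set()
    for s in subset:
        covered |= COVERAGE[s]
    return float(len(covered))

group_of = {"h1": "high", "h2": "high", "l1": "low", "l2": "low", "l3": "low"}
budgets = {"high": 1, "low": 2}  # at most 1 high-precision, 2 low-precision
chosen = greedy_select(list(COVERAGE), group_of, budgets, coverage_metric)
```

In this toy instance the routine first takes `h1`, which exhausts the high-precision budget, and then fills the low-precision budget with `l1` and `l2`. A set-coverage count is used only because it is a simple, well-known submodular function.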