Multi-human 3D pose estimation plays a key role in establishing a seamless connection between the real world and the virtual world. Recent efforts adopted a two-stage framework that first builds 2D pose estimations in multiple camera views from different perspectives and then synthesizes them into 3D poses. However, the focus has largely been on developing new computer vision algorithms on the offline video datasets without much consideration on the energy constraints in real-world systems with flexibly-deployed and battery-powered cameras. In this paper, we propose an energy-efficient edge-assisted multiple-camera system, dubbed E$^3$Pose, for real-time multi-human 3D pose estimation, based on the key idea of adaptive camera selection. Instead of always employing all available cameras to perform 2D pose estimations as in the existing works, E$^3$Pose selects only a subset of cameras depending on their camera view qualities in terms of occlusion and energy states in an adaptive manner, thereby reducing the energy consumption (which translates to extended battery lifetime) and improving the estimation accuracy. To achieve this goal, E$^3$Pose incorporates an attention-based LSTM to predict the occlusion information of each camera view and guide camera selection before cameras are selected to process the images of a scene, and runs a camera selection algorithm based on the Lyapunov optimization framework to make long-term adaptive selection decisions. We build a prototype of E$^3$Pose on a 5-camera testbed, demonstrate its feasibility and evaluate its performance. Our results show that a significant energy saving (up to 31.21%) can be achieved while maintaining a high 3D pose estimation accuracy comparable to state-of-the-art methods.