Heterogeneous computing systems provide high performance and energy efficiency. However, to optimally utilize such systems, solutions that distribute the work across host CPUs and accelerating devices are needed. In this paper, we present a performance and energy aware approach that combines AI planning heuristics for parameter space exploration with a machine learning model for performance and energy evaluation to determine a near-optimal system configuration. For data-parallel applications our approach determines a near-optimal host-device distribution of work, number of processing units required and the corresponding scheduling strategy. We evaluate our approach for various heterogeneous systems accelerated with GPU or the Intel Xeon Phi. The experimental results demonstrate that our approach finds a near-optimal system configuration by evaluating only about 7% of reasonable configurations. Furthermore, the performance per Joule estimation of system configurations using our machine learning model is more than 1000x faster compared to the system evaluation by program execution.