Oceans are the essential lifeblood of the Earth: they provide over 70% of the oxygen and over 97% of the water. Plankton and corals are two of the most fundamental components of ocean ecosystems, the former due to their function at many levels of the oceans food chain, the latter because they provide spawning and nursery grounds to many fish populations. Studying and monitoring plankton distribution and coral reefs is vital for environment protection. In the last years there has been a massive proliferation of digital imagery for the monitoring of underwater ecosystems and much research is concentrated on the automated recognition of plankton and corals. In this paper, we present a study about an automated system for monitoring of underwater ecosystems. The system here proposed is based on the fusion of different deep learning methods. We study how to create an ensemble based of different CNN models, fine tuned on several datasets with the aim of exploiting their diversity. The aim of our study is to experiment the possibility of fine-tuning pretrained CNN for underwater imagery analysis, the opportunity of using different datasets for pretraining models, the possibility to design an ensemble using the same architecture with small variations in the training procedure. The experimental results are very encouraging, our experiments performed on 5 well-knowns datasets (3 plankton and 2 coral datasets) show that the fusion of such different CNN models in a heterogeneous ensemble grants a substantial performance improvement with respect to other state-of-the-art approaches in all the tested problems. One of the main contributions of this work is a wide experimental evaluation of famous CNN architectures to report performance of both single CNN and ensemble of CNNs in different problems. Moreover, we show how to create an ensemble which improves the performance of the best single model.