Non-IID data distribution across clients and poisoning attacks are two main challenges in real-world federated learning (FL) systems. While both of them have attracted great research interest with specific strategies developed, no known solution manages to address them in a unified framework. To universally overcome both challenges, we propose SmartFL, a generic approach that optimizes the server-side aggregation process with a small amount of proxy data collected by the service provider itself via a subspace training technique. Specifically, the aggregation weight of each participating client at each round is optimized using the server-collected proxy data, which is essentially the optimization of the global model in the convex hull spanned by client models. Since at each round, the number of tunable parameters optimized on the server side equals the number of participating clients (thus independent of the model size), we are able to train a global model with massive parameters using only a small amount of proxy data (e.g., around one hundred samples). With optimized aggregation, SmartFL ensures robustness against both heterogeneous and malicious clients, which is desirable in real-world FL where either or both problems may occur. We provide theoretical analyses of the convergence and generalization capacity for SmartFL. Empirically, SmartFL achieves state-of-the-art performance on both FL with non-IID data distribution and FL with malicious clients. The source code will be released.