The Internet of Things (IoT) has become one of the most important technologies enabling connected and intelligent applications in smart cities. Smart decision making by IoT devices relies not only on the large volume of data collected from their sensors, but also on optimization theory and machine learning techniques that can process and analyse the collected data within specific network structures. It is therefore of practical importance to investigate how different optimization algorithms and machine learning techniques can be leveraged to improve system performance. Smart transportation is one of the most important vertical domains for IoT applications: it provides real-world information and services to citizens and makes transport facilities easier to access. It is therefore one of the key application areas explored in this thesis. In a nutshell, this thesis covers three topics on applying mathematical optimization and deep learning methods to IoT networks. In the first topic, we propose an optimal transmission frequency management scheme for an IoT network based on a decentralized alternating direction method of multipliers (ADMM), and we introduce a mechanism that identifies anomalies in data transmission frequency using a long short-term memory (LSTM) based architecture. In the second topic, we leverage graph neural networks (GNNs) to predict demand for shared bikes. In particular, we introduce a novel architecture, the attention-based spatial-temporal graph convolutional network (AST-GCN), to improve prediction accuracy on real-world datasets. In the last topic, we consider a highway traffic network scenario in which frequent lane-changing behaviors may occur with some probability. A dedicated GNN-based anomaly detector is devised to reveal this probability from data collected in a dedicated mobility simulator.