We investigate the sample complexity of bounded two-layer neural networks using different activation functions. In particular, we consider the class \[ \mathcal{H} = \left\{\textbf{x}\mapsto \langle \textbf{v}, \sigma \circ W\textbf{x} + \textbf{b} \rangle : \textbf{b}\in\mathbb{R}^{T}, W \in \mathbb{R}^{T\times d}, \textbf{v} \in \mathbb{R}^{T}\right\} \] where the spectral norm of $W$ and the norm of $\textbf{v}$ are bounded by $O(1)$, the Frobenius distance of $W$ from its initialization is bounded by $R > 0$, and $\sigma$ is a Lipschitz activation function. We prove that if $\sigma$ is element-wise, then the sample complexity of $\mathcal{H}$ is width-independent, and that this bound is tight. Moreover, we show that the element-wise property of $\sigma$ is essential for a width-independent bound, in the sense that there exist non-element-wise activation functions whose sample complexity is provably width-dependent. For the upper bound, we use the Approximate Description Length (ADL) approach to norm-based bounds recently introduced in arXiv:1910.05697. We further develop new techniques and tools for this approach, which we hope will inspire future work.
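To make the setting concrete, the following NumPy sketch (an illustration of ours, not taken from the paper) constructs one member of $\mathcal{H}$, assuming $\sigma$ is the element-wise, $1$-Lipschitz ReLU; the values of $d$, $T$, $R$ and the constant $2.0$ standing in for the $O(1)$ bounds are hypothetical choices made only for this example.
\begin{verbatim}
import numpy as np

# Hypothetical member of H, assuming sigma = ReLU (element-wise, 1-Lipschitz)
# and illustrative values of d (input dimension), T (width), and R.
rng = np.random.default_rng(0)
d, T, R = 10, 1000, 2.0

W0 = rng.standard_normal((T, d)) / np.sqrt(T)   # initialization of W
W  = W0 + 1e-3 * rng.standard_normal((T, d))    # weights close to W0
b  = 0.1 * rng.standard_normal(T)               # bias in R^T
v  = rng.standard_normal(T) / np.sqrt(T)        # output weights in R^T

# Constraints defining H (with 2.0 standing in for the O(1) constants):
assert np.linalg.norm(W, 2) <= 2.0              # spectral norm of W
assert np.linalg.norm(v) <= 2.0                 # norm of v
assert np.linalg.norm(W - W0, 'fro') <= R       # Frobenius distance from init

def h(x):
    """Evaluate x -> <v, sigma(W x) + b> with sigma = element-wise ReLU."""
    return v @ (np.maximum(W @ x, 0.0) + b)

print(h(rng.standard_normal(d)))
\end{verbatim}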