This paper studies the problem of how efficiently functions in the Sobolev spaces $\mathcal{W}^{s,q}([0,1]^d)$ and Besov spaces $\mathcal{B}^s_{q,r}([0,1]^d)$ can be approximated by deep ReLU neural networks with width $W$ and depth $L$, when the error is measured in the $L^p([0,1]^d)$ norm. This problem has been studied by several recent works, which obtained the approximation rate $\mathcal{O}((WL)^{-2s/d})$ up to logarithmic factors when $p=q=\infty$, and the rate $\mathcal{O}(L^{-2s/d})$ for networks with fixed width when the Sobolev embedding condition $1/q -1/p<s/d$ holds. We generalize these results by showing that the rate $\mathcal{O}((WL)^{-2s/d})$ indeed holds under the Sobolev embedding condition. It is known that this rate is optimal up to logarithmic factors. The key tool in our proof is a novel encoding of sparse vectors by using deep ReLU neural networks with varied width and depth, which may be of independent interest.