In this paper, we leverage polar codes and the well-established channel polarization to design capacity-achieving codes with a certain constraint on the weights of all the columns in the generator matrix (GM) while having a low-complexity decoding algorithm. We first show that given a binary-input memoryless symmetric (BMS) channel $W$ and a constant $s \in (0, 1]$, there exists a polarization kernel such that the corresponding polar code is capacity-achieving with the \textit{rate of polarization} $s/2$, and the GM column weights being bounded from above by $N^s$. To improve the sparsity versus error rate trade-off, we devise a column-splitting algorithm and two coding schemes for BEC and then for general BMS channels. The \textit{polar-based} codes generated by the two schemes inherit several fundamental properties of polar codes with the original $2 \times 2$ kernel including the decay in error probability, decoding complexity, and the capacity-achieving property. Furthermore, they demonstrate the additional property that their GM column weights are bounded from above sublinearly in $N$, while the original polar codes have some column weights that are linear in $N$. In particular, for any BEC and $\beta <0.5$, the existence of a sequence of capacity-achieving polar-based codes where all the GM column weights are bounded from above by $N^\lambda$ with $\lambda \approx 0.585$, and with the error probability bounded by $O(2^{-N^{\beta}} )$ under a decoder with complexity $O(N\log N)$, is shown. The existence of similar capacity-achieving polar-based codes with the same decoding complexity is shown for any BMS channel and $\beta <0.5$ with $\lambda \approx 0.631$.