We consider a subclass of $n$-player stochastic games in which each player has its own internal state/action space, and players are coupled only through their payoff functions. It is assumed that players' internal chains are driven by independent transition probabilities. Moreover, players can observe only realizations of their payoffs, not the payoff functions themselves, nor can they observe each other's states/actions. Under some assumptions on the structure of the payoff functions, we develop efficient learning algorithms based on Dual Averaging and Dual Mirror Descent that provably converge, almost surely or in expectation, to the set of $\epsilon$-Nash equilibrium policies. In particular, we derive upper bounds on the number of iterations required to reach an $\epsilon$-Nash equilibrium policy that scale polynomially in the game parameters. Besides Markov potential games and linear-quadratic stochastic games, this work identifies another interesting subclass of $n$-player stochastic games that, under some assumptions, provably admits polynomial-time learning algorithms for finding $\epsilon$-Nash equilibrium policies.
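As a rough illustrative sketch only (the symbols below are not defined in this abstract: $\hat g_i^{t}$ denotes an unbiased payoff-gradient estimate built from realized payoffs, $h$ a strongly convex regularizer on player $i$'s policy set $\Delta(\mathcal{A}_i)$, and $\eta_t$ a step size), a generic dual-averaging update for player $i$ takes the form
\[
  y_i^{t+1} = y_i^{t} + \hat g_i^{t},
  \qquad
  \pi_i^{t+1} \in \arg\max_{\pi \in \Delta(\mathcal{A}_i)}
  \Big\{ \big\langle y_i^{t+1}, \pi \big\rangle - \tfrac{1}{\eta_t}\, h(\pi) \Big\},
\]
i.e., gradient feedback is aggregated over time and mapped back to a policy through a regularized maximization; the algorithms developed in this work follow this general template but may differ in their exact form.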