A central goal in many empirical sciences is to discover causal relationships among a set of variables from observational data. The PC algorithm is a promising approach that learns the underlying causal structure by performing a number of conditional independence tests. In this paper, we propose a novel GPU-based parallel algorithm, called cuPC, to accelerate an order-independent version of PC. The cuPC algorithm has two variants, cuPC-E and cuPC-S, which parallelize the conditional independence tests over the pairs of variables under test and over the conditional sets, respectively. In particular, cuPC-E offers two degrees of parallelism: it processes multiple pairs of variables in parallel and also runs the tests of each pair in parallel. On the other hand, cuPC-S reuses the results of computations performed for a given conditional set in other tests on the same conditional set. Experimental results on a GTX 1080 GPU show speedups of two to three orders of magnitude. For instance, in one of the most challenging benchmarks, cuPC-S reduces the runtime from about 73 hours to about one minute, a speedup factor of about 4000×.