Massive machine-type communications (mMTC) features a massive number of low-cost user equipments (UEs) with sparse activity. Tailored to these characteristics, grant-free random access (GF-RA) serves as an efficient access solution for mMTC. However, most existing GF-RA schemes rely on strict synchronization, which imposes an excessive coordination burden on the low-cost UEs. In this work, we propose a receiver design for asynchronous GF-RA and address the joint user activity detection (UAD) and channel estimation (CE) problem in the presence of asynchronization-induced inter-symbol interference. Specifically, the receiver exploits the delay profile to distinguish different UEs. However, a sample correlation problem in this receiver design impedes the factorization of the joint likelihood function, which complicates the UAD and CE problem. To address this correlation problem, we design a partially uni-directional (PUD) factor graph representation of the joint likelihood function. Building on this PUD factor graph, we further propose a PUD message-passing-based sparse Bayesian learning (SBL) algorithm for asynchronous UAD and CE (PUDMP-SBL-aUADCE). Our theoretical analysis shows that the PUDMP-SBL-aUADCE algorithm achieves a higher signal-to-interference-and-noise ratio (SINR) in the asynchronous case than in the synchronous case, i.e., the proposed receiver design can exploit asynchronization to suppress multi-user interference. In addition, considering potential timing errors at the low-cost UEs, we investigate the impact of an imperfect delay profile and reveal the advantages of adopting the SBL method in this case. Finally, extensive simulation results are provided to demonstrate the performance of the PUDMP-SBL-aUADCE algorithm.