Device activity detection in emerging cell-free massive multiple-input multiple-output (MIMO) systems is a crucial task in machine-type communications: multiple access points (APs) jointly identify the active devices among a large number of potential devices from the received signals. Most existing works on this problem rely on the impractical assumption that the active devices transmit their signals synchronously. In practice, however, synchronization cannot be guaranteed because of low-cost oscillators, and the resulting asynchrony introduces additional discontinuous and nonconvex constraints into the detection problem. To address this challenge, this paper presents an equivalent reformulation of the asynchronous activity detection problem. The reformulation facilitates the development of a centralized algorithm and a distributed algorithm that satisfy the highly nonconvex constraints gradually as the iteration number increases, so that the sequences generated by the proposed algorithms can avoid bad stationary points. To reduce the capacity requirements of the fronthaul links, we further design a communication-efficient accelerated distributed algorithm. Simulation results demonstrate that the proposed centralized and distributed algorithms outperform state-of-the-art approaches, and that the proposed accelerated distributed algorithm achieves detection performance close to that of the centralized algorithm while transmitting far fewer bits over the fronthaul links.