While most robotics simulation libraries are built for low-dimensional and intrinsically serial tasks, soft-body and multi-agent robotics have created a demand for simulation environments that can model many interacting bodies in parallel. Despite the increasing interest in these fields, no existing simulation library addresses the challenge of providing a unified, highly-parallelized, GPU-accelerated interface for simulating large robotic systems. Titan is a versatile CUDA-based C++ robotics simulation library that employs a novel asynchronous computing model for GPU-accelerated simulations of robotics primitives. The innovative GPU architecture design permits simultaneous optimization and control on the CPU while the GPU runs asynchronously, enabling rapid topology optimization and reinforcement learning iterations. Kinematics are solved with a massively parallel integration scheme that incorporates constraints and environmental forces. We report dramatically improved performance over CPU-based baselines, simulating as many as 300 million primitive updates per second, while allowing flexibility for a wide range of research applications. We present several applications of Titan to high-performance simulations of soft-body and multi-agent robots.