Positioning accuracy is a critical requirement for vehicle-to-everything (V2X) use cases. This paper therefore derives the theoretical limits on estimating the position and orientation of vehicles in a cooperative vehicle-to-vehicle (V2V) scenario using a lens-based multiple-input multiple-output (lens-MIMO) system. We first analyze the Cram\'er-Rao lower bounds (CRLBs) of position and orientation estimation and explore a received signal model of the lens-MIMO for angle of arrival (AoA) estimation under a V2V geometric model. We then propose a lower-complexity AoA estimation technique that exploits the unique characteristics of the lens-MIMO for a single target vehicle, and extend it to multiple target vehicles using successive interference cancellation (SIC). Given these AoAs, we investigate the capability of the lens-MIMO to estimate the positions and orientations of vehicles. We prove that the lens-MIMO outperforms a conventional uniform linear array (ULA) under a certain configuration of the lens structure. Finally, we confirm that, as the resolution of the lens increases, the proposed localization algorithm outperforms the ULA's CRLB despite its lower complexity.