Some 6G use cases include augmented reality and high-fidelity holograms, whose data will flow through the network. 6G systems are therefore expected to feed machine learning algorithms with this context information to optimize communication performance. This paper focuses on the simulation of 6G MIMO systems that rely on a 3-D representation of the environment, as captured by cameras and possibly other sensors. We present new and improved Raymobtime datasets, which consist of paired MIMO channels and multimodal data. We also discuss tradeoffs between speed and accuracy when generating channels via ray tracing. Finally, we provide beam selection and channel estimation results to assess the impact of the improvements in the ray-tracing simulation methodology.