We present a novel reinforcement learning (RL) environment designed to both optimize industrial sorting systems and study agent behavior in evolving spaces. In simulating material flow within a sorting process our environment follows the idea of a digital twin, with operational parameters like belt speed and occupancy level. To reflect real-world challenges, we integrate common upgrades to industrial setups, like new sensors or advanced machinery. It thus includes two variants: a basic version focusing on discrete belt speed adjustments and an advanced version introducing multiple sorting modes and enhanced material composition observations. We detail the observation spaces, state update mechanisms, and reward functions for both environments. We further evaluate the efficiency of common RL algorithms like Proximal Policy Optimization (PPO), Deep-Q-Networks (DQN), and Advantage Actor Critic (A2C) in comparison to a classical rule-based agent (RBA). This framework not only aids in optimizing industrial processes but also provides a foundation for studying agent behavior and transferability in evolving environments, offering insights into model performance and practical implications for real-world RL applications.