Auditory working memory is essential for various daily activities, such as language acquisition, conversation. It involves the temporary storage and manipulation of information that is no longer present in the environment. While extensively studied in neuroscience and cognitive science, research on its modeling within neural networks remains limited. To address this gap, we propose a general framework based on a close-loop predictive coding paradigm to perform short auditory signal memory tasks. The framework is evaluated on two widely used benchmark datasets for environmental sound and speech, demonstrating high semantic similarity across both datasets.