Abstract: Voice plays an important role in our lives by facilitating communication, conveying emotions, and indicating health. Tracking vocal interactions can therefore provide valuable insight into many aspects of our lives. This paper presents our ongoing efforts to design a new vocal tracking system that we call VoCopilot. VoCopilot is an end-to-end system centered around energy-efficient acoustic hardware and firmware, combined with advanced machine learning models. As a result, VoCopilot can continuously track conversations, record them, transcribe them, and then extract useful insights from them. By utilizing large language models, VoCopilot lets the user extract these insights from recorded interactions without having to learn complex machine learning techniques. To protect the privacy of end users, VoCopilot uses a novel wake-up mechanism that only records conversations of end users. Additionally, the rest of the pipeline can run on a commodity computer (Mac Mini M2). In this work, we show the effectiveness of VoCopilot in a real-world environment for two use cases.