We propose Knowledge-Aware Preprocessing (KAP), a two-stage preprocessing framework tailored for Traditional Chinese non-narrative documents, designed to enhance retrieval accuracy in Hybrid Retrieval systems. Hybrid Retrieval, which integrates Sparse Retrieval (e.g., BM25) and Dense Retrieval (e.g., vector embeddings), has become a widely adopted approach for improving search effectiveness. However, its performance heavily depends on the quality of input text, which is often degraded when dealing with non-narrative documents such as PDFs containing financial statements, contractual clauses, and tables. KAP addresses these challenges by integrating Multimodal Large Language Models (MLLMs) with LLM-driven post-OCR processing, refining extracted text to reduce OCR noise, restore table structures, and optimize text format. By ensuring better compatibility with Hybrid Retrieval, KAP improves the accuracy of both Sparse and Dense Retrieval methods without modifying the retrieval architecture itself.