Enabling robots to work in close proximity with humans necessitates to employ not only multi-sensory information for coordinated and autonomous interactions but also a control framework that ensures adaptive and flexible collaborative behavior. Such a control framework needs to integrate accuracy and repeatability of robots with cognitive ability and adaptability of humans for co-manipulation. In this regard, an intuitive stack of tasks (iSOT) formulation is proposed, that defines the robots actions based on human ergonomics and task progress. The framework is augmented with visuo-tactile perception for flexible interaction and autonomous adaption. The visual information using depth cameras, monitors and estimates the object pose and human arm gesture while the tactile feedback provides exploration skills for maintaining the desired contact to avoid slippage. Experiments conducted on robot system with human partnership for assembly and disassembly tasks confirm the effectiveness and usability of proposed framework.