Beyond diagonal reconfigurable intelligent surface (BD-RIS) is a new advance and generalization of the RIS technique. BD-RIS breaks through the isolation between RIS elements by creatively introducing inter-element connections, thereby enabling smarter wave manipulation and enlarging coverage. However, exploring proper channel estimation schemes suitable for BD-RIS aided communication systems still remains an open problem. In this paper, we study channel estimation and beamforming design for BD-RIS aided multi-antenna systems. We first describe the channel estimation strategy based on the least square (LS) method, derive the mean square error (MSE) of the LS estimation, and formulate the joint pilot sequence and BD-RIS design problem with unique constraints induced by BD-RIS architectures. Specifically, we propose an efficient pilot sequence and BD-RIS design which theoretically guarantees to achieve the minimum MSE. With the estimated channel, we then consider two BD-RIS scenarios and propose beamforming design algorithms. Finally, we provide simulation results to verify the effectiveness of the proposed channel estimation scheme and beamforming design algorithms. We also show that more interelement connections in BD-RIS improves the performance while increasing the training overhead for channel estimation.