In video coding, inter bi-prediction significantly improves coding efficiency by producing a precise fused prediction block. Although block-wise bi-prediction methods such as bi-prediction with CU-level weight (BCW) are adopted in Versatile Video Coding (VVC), such linear fusion strategies still struggle to represent the diverse pixel variations inside a block. In addition, a pixel-wise bi-prediction method called bi-directional optical flow (BDOF) has been introduced to refine the bi-prediction block. However, the non-linear optical flow equation in BDOF mode is applied under restrictive assumptions, so it still cannot accurately compensate for the various kinds of bi-predicted blocks. In this paper, we propose an attention-based bi-prediction network (ABPN) to replace the existing bi-prediction methods entirely. The proposed ABPN is designed to learn efficient representations of the fused features by utilizing an attention mechanism. Furthermore, a knowledge distillation (KD)-based approach is employed to compress the proposed network while keeping its output comparable to that of the large model. The proposed ABPN is integrated into the VTM-11.0 NNVC-1.0 standard reference software. Compared with the VTM anchor, the lightweight ABPN achieves BD-rate reductions of up to 5.89% and 4.91% on the Y component under the random access (RA) and low delay B (LDB) configurations, respectively.
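To make the block-level limitation concrete, BCW fuses the two motion-compensated predictions $P_0$ and $P_1$ with a single CU-level weight $w$ chosen from a small predefined set. A sketch of this VVC-style integer weighted average is shown below (the exact rounding and the allowed weight set follow the VVC specification):

$$
P_{\text{bi}} = \bigl((8 - w)\,P_0 + w\,P_1 + 4\bigr) \gg 3, \qquad w \in \{-2,\ 3,\ 4,\ 5,\ 10\}
$$

Because a single $w$ applies to the whole CU, the fusion is linear and spatially uniform, which is precisely the restriction that motivates a learned, pixel-adaptive fusion such as the proposed ABPN.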