We ask whether adversarial learning on left-right stereo images can estimate an optimal depth map through a consensus-based loss function and ego-motion. We describe the workflow as a merger of a supervised learning (AS) model and an unsupervised learning (U) model. The supervised model optimizes a depth estimation network using prior knowledge of ground-truth depth maps. Inspired by recent work on adversarial neural networks, we formulate the supervised model as an adversarial learning task. We therefore generate two depth maps, one from each image of the left-right stereo pair. Based on their learning behavior, that is, their loss values, we then select the optimal depth map. In contrast, the unsupervised model has no knowledge of the ground-truth depth and instead optimizes the depth estimation network using 3D geometry. Our framework is trained and benchmarked on the KITTI driving dataset.