Deep reinforcement learning (DRL) has been widely studied in single-agent learning but requires further development and understanding in the multi-agent field. As one of the most complex swarming settings, competitive learning evaluates the performance of multiple teams of agents that cooperate to achieve certain goals while outperforming the competing teams. Such dynamical complexity makes the multi-agent problem hard to solve even for specialized DRL methods. Within a competitive framework, we study state-of-the-art actor-critic and Q-based algorithms and analyze in depth their variants (e.g., prioritization, dual networks) in terms of performance and convergence. For completeness, we present an asynchronous and prioritized version of the proximal policy optimization actor-critic technique (P3O) and assess it against the other benchmarks. Results show that Q-based approaches are more robust and reliable than actor-critic configurations for the given setting. In addition, we suggest incorporating local team communication and combining DRL with direct search optimization to improve learning, especially in challenging scenarios with partial observations.