Observational studies are often used to understand relationships between exposures and outcomes. When analyzing such data, propensity score (PS) and balance weighting are commonly used techniques that aim to reduce imbalances between exposure groups by weighting the groups to look alike on the observed confounders. A plethora of methods is now available for estimating PS and balancing weights, along with rich guidance on how to employ them properly in an analysis. Even so, unbiased and robust estimation of the causal treatment effect is not guaranteed unless several conditions hold. The literature has shown that accurate inference requires five key criteria: 1) the treatment allocation mechanism is known (e.g., no unobserved confounders), 2) the relationship between the baseline covariates and the outcome is known (if one is to rely on the outcome model itself), 3) adequate balance (comparability) of baseline covariates is achieved post-weighting, 4) a proper set of covariates to control for confounding bias is known, and 5) a sufficiently large sample size is available. In this article, we use simulated data of various sizes to investigate the influence of these five criteria on statistical inference. Our results yield two notable new findings to help improve practice. First, they provide important new evidence that the maximum Kolmogorov-Smirnov statistic is the proper statistical measure for assessing balance on the baseline covariates, in contrast to the mean standardised mean difference used in many applications, and that 0.1 is a suitable threshold for acceptable balance. Second, we recommend 60 to 80 observations per confounder per treatment group to obtain reliable and unbiased estimation of the causal treatment effect.
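
To make the two balance summaries contrasted above concrete, the following is a minimal Python sketch of how the maximum weighted Kolmogorov-Smirnov statistic and the mean absolute standardised mean difference could be computed across covariates after weighting. It is an illustration under stated assumptions, not the implementation used in this article: the function names (`weighted_smd`, `weighted_ks`, `balance_summary`), the pooled unweighted standard deviation in the SMD denominator, the use of "less than or equal to 0.1" as the balance rule, and the toy data-generating model with a known propensity score are all hypothetical choices made for the example.

```python
import numpy as np

def weighted_smd(x, treat, w):
    """Absolute standardised mean difference of x between the weighted groups."""
    x1, w1 = x[treat == 1], w[treat == 1]
    x0, w0 = x[treat == 0], w[treat == 0]
    m1 = np.average(x1, weights=w1)
    m0 = np.average(x0, weights=w0)
    # Assumption: pooled *unweighted* SD as the denominator (one common convention).
    sd = np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2.0)
    return float(abs(m1 - m0) / sd)

def weighted_ks(x, treat, w):
    """Largest absolute gap between the weighted empirical CDFs of the two groups."""
    grid = np.unique(x)  # evaluate both CDFs at every observed value

    def ecdf(xg, wg):
        order = np.argsort(xg)
        xs = xg[order]
        cum_w = np.cumsum(wg[order]) / wg.sum()
        # Number of group observations <= each grid point.
        idx = np.searchsorted(xs, grid, side="right")
        return np.where(idx > 0, cum_w[np.maximum(idx - 1, 0)], 0.0)

    f1 = ecdf(x[treat == 1], w[treat == 1])
    f0 = ecdf(x[treat == 0], w[treat == 0])
    return float(np.max(np.abs(f1 - f0)))

def balance_summary(X, treat, w, threshold=0.1):
    """Mean absolute SMD and maximum KS across the columns of X."""
    X = np.asarray(X, dtype=float)
    treat = np.asarray(treat)
    w = np.asarray(w, dtype=float)
    smds = [weighted_smd(X[:, j], treat, w) for j in range(X.shape[1])]
    kss = [weighted_ks(X[:, j], treat, w) for j in range(X.shape[1])]
    max_ks = max(kss)
    return {
        "mean_smd": float(np.mean(smds)),
        "max_ks": max_ks,
        # Assumption: balance is declared when the maximum KS is at most the threshold.
        "balanced": bool(max_ks <= threshold),
    }

# Toy example (hypothetical data): three covariates, treatment depends on two of them.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
p = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.25 * X[:, 1])))  # true PS, known only in simulation
treat = rng.binomial(1, p)
ipw = np.where(treat == 1, 1.0 / p, 1.0 / (1.0 - p))  # inverse probability weights

print("unweighted:", balance_summary(X, treat, np.ones(n)))
print("weighted:  ", balance_summary(X, treat, ipw))
```

Because the maximum KS statistic compares entire weighted distributions and takes the worst covariate, it can flag imbalance that an average of mean differences would smooth over, which is the contrast the first finding draws.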