In recent years, attention-based Transformers have proven instrumental in Natural Language Processing (NLP) tasks such as sentence classification and language translation, and their application has recently been extended to large-scale object recognition. In this work, a Vision Transformer with attention is investigated for the detection of human falls and Activities of Daily Living (ADLs) from time-series signals. The model is trained and validated on accelerometer signals from waist-worn Inertial Measurement Unit (IMU) sensors in the SFU-IMU dataset, as well as on the widely used SisFall dataset. Three configurations of patch size and number of attention heads are investigated by training each independently. A larger patch size resulted in significant performance deterioration, while a smaller patch size took longer to train and was computationally more expensive. In the best case, the model achieved an accuracy of 99.9 ± 0.1% and a true positive rate of 99.9 ± 0.1% on the SFU-IMU dataset, and an accuracy of 99.8 ± 0.25% and a true positive rate of 99.87 ± 0.3% on the SisFall dataset. Overall, the results show that Transformers are highly robust in detecting human falls and non-falls/ADLs, provided an appropriate patch size is chosen.
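The patching step whose size trades off accuracy against training cost can be sketched as follows. This is a minimal illustration of splitting a tri-axial accelerometer window into non-overlapping patches and linearly projecting them into transformer tokens; the window length, patch size, and embedding dimension are illustrative assumptions, not the settings used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example window: 450 accelerometer samples across 3 axes (x, y, z).
window = rng.standard_normal((450, 3))
patch_size = 30  # samples per patch (assumed, not the paper's value)

# Split into non-overlapping patches and flatten each: (num_patches, patch_size * axes).
num_patches = window.shape[0] // patch_size
patches = window[: num_patches * patch_size].reshape(num_patches, -1)

# Linear projection to an assumed embedding dimension, as in ViT patch embedding.
embed_dim = 64
W = rng.standard_normal((patches.shape[1], embed_dim)) * 0.02
tokens = patches @ W  # token sequence fed to the transformer encoder

print(patches.shape)  # (15, 90)
print(tokens.shape)   # (15, 64)
```

A larger `patch_size` yields fewer, coarser tokens (cheaper attention, less temporal detail), while a smaller one yields a longer token sequence whose self-attention cost grows quadratically, matching the accuracy/cost trade-off reported above.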