In recent years, attention-based Transformers have proven instrumental in natural language processing (NLP) tasks such as sentence classification and language translation, and their application has recently been extended to large-scale object recognition. In this work, a Vision Transformer (ViT) is investigated for the detection of human falls and Activities of Daily Living (ADLs) from time-series signals. The model is trained and validated on the acceleration signals of waist-worn Inertial Measurement Unit (IMU) sensors from the SFU-IMU dataset, as well as on the popular SisFall dataset. Three configurations of patch size and number of attention heads are also trained independently and compared. A larger patch size resulted in significant performance deterioration, while a smaller patch size took longer to train and was computationally more expensive. In the best case, the model achieved an accuracy of 99.9 ± 0.1% and a true positive rate of 99.9 ± 0.1% on the SFU-IMU dataset, and an accuracy of 99.8 ± 0.25% and a true positive rate of 99.87 ± 0.3% on the SisFall dataset. Overall, the results show that Transformers are highly robust in the detection of human falls and non-falls/ADLs, provided an appropriate patch size is chosen.
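
To make the approach concrete, the sketch below shows one plausible way to apply a ViT-style encoder to windowed tri-axial acceleration signals, with the patch size and number of attention heads exposed as the parameters varied in the study. This is a minimal illustration in PyTorch, not the authors' implementation; the window length, embedding dimension, depth, and default hyperparameter values are assumptions for demonstration only.

```python
# Hypothetical sketch (not the paper's code): a ViT-style encoder over
# windowed tri-axial accelerometer data for fall vs. ADL classification.
# window_len, dim, depth, and the default patch_size/heads are illustrative.
import torch
import torch.nn as nn


class TimeSeriesViT(nn.Module):
    def __init__(self, window_len=512, channels=3, patch_size=16,
                 dim=64, heads=4, depth=4, num_classes=2):
        super().__init__()
        assert window_len % patch_size == 0, "window must split evenly into patches"
        self.patch_size = patch_size
        num_patches = window_len // patch_size

        # Each patch flattens patch_size samples x channels into one token.
        self.patch_embed = nn.Linear(patch_size * channels, dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)  # fall vs. ADL logits

    def forward(self, x):
        # x: (batch, window_len, channels) acceleration window
        b, t, c = x.shape
        # Group consecutive samples into non-overlapping patches and flatten them.
        tokens = self.patch_embed(x.reshape(b, t // self.patch_size,
                                            self.patch_size * c))
        cls = self.cls_token.expand(b, -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        encoded = self.encoder(tokens)
        return self.head(encoded[:, 0])  # classify from the [CLS] token


if __name__ == "__main__":
    model = TimeSeriesViT(patch_size=16, heads=4)
    dummy = torch.randn(8, 512, 3)   # batch of 8 acceleration windows
    logits = model(dummy)            # shape (8, 2)
    print(logits.shape)
```

In this sketch, increasing `patch_size` reduces the number of tokens (coarser temporal resolution), while decreasing it lengthens the sequence the attention layers must process, which mirrors the accuracy-versus-cost trade-off reported above.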