VISION TRANSFORMER-BASED FALL DETECTION: A SPATIAL-TEMPORAL ATTENTION MECHANISM FOR ROBUST VIDEO ANALYSIS
Abstract
Background. Fall detection is a critical challenge in healthcare and elderly care, as delayed response often leads to severe injuries. With ageing populations, fall-related admissions continue to rise, increasing demands on automated monitoring. Approaches based on wearable devices or conventional classifiers produce frequent false alarms and show limited adaptability. Video-based systems offer broader coverage but still require models that capture posture and motion changes without handcrafted features. Vision Transformers, originally developed for image recognition, provide a promising alternative by leveraging self-attention to model complex dependencies across spatial and temporal dimensions.
Materials and Methods. A Vision Transformer framework was applied to model spatial and temporal patterns in human motion. Video frames were divided into patches and projected into token embeddings, and multi-head self-attention tracked posture shifts across frames to form discriminative cues for fall prediction. Training was conducted on multiple public datasets with diverse backgrounds and subject body types. The model was compared with logistic regression, CNN, and LSTM baselines trained on identical data splits.
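The pipeline described above (frame patching, token embedding, self-attention over spatial-temporal tokens) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the patch size, embedding dimension, single attention head, and the final pooled sigmoid score are illustrative assumptions, and a real model would use learned weights and multiple Transformer layers.

```python
import numpy as np

def extract_patches(frames, patch=8):
    """Split each grayscale frame of a (T, H, W) clip into non-overlapping
    patch x patch tiles and flatten each tile into one token vector."""
    T, H, W = frames.shape
    ph, pw = H // patch, W // patch
    tiles = frames.reshape(T, ph, patch, pw, patch).transpose(0, 1, 3, 2, 4)
    return tiles.reshape(T * ph * pw, patch * patch)  # (num_tokens, patch_dim)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    """Single attention head over the joint spatial-temporal token sequence,
    so each patch can attend to patches in every frame of the clip."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return scores @ V

rng = np.random.default_rng(0)
clip = rng.random((4, 32, 32))                 # toy 4-frame clip, 32x32 pixels
embed = rng.standard_normal((64, 16)) * 0.1    # stand-in patch-embedding matrix
tokens = extract_patches(clip) @ embed         # (64 tokens, 16-dim embeddings)
Wq, Wk, Wv = (rng.standard_normal((16, 16)) * 0.1 for _ in range(3))
attended = self_attention(tokens, Wq, Wk, Wv)
score = 1.0 / (1.0 + np.exp(-attended.mean()))  # pooled fall-probability score
print(attended.shape, round(float(score), 3))
```

In the full framework, several such attention layers with learned projections would precede the classification head, and positional encodings would mark each token's frame index and spatial location.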
Results and Discussion. The Vision Transformer achieved 99.1% accuracy on the primary dataset and 97.9% on the UR Fall Detection Dataset, surpassing logistic regression, CNN, and LSTM baselines. It maintained higher precision and recall in indoor and outdoor scenes and reduced false alarm rates. Stable performance under rapid movement and variable lighting demonstrated robustness gains. Cross-dataset evaluation confirmed effective transfer of learned spatial-temporal representations to unseen environments.
Conclusion. Vision Transformers offer an effective approach for real-time, non-invasive fall detection in clinical and home settings. Their capacity to capture spatial-temporal motion patterns through self-attention, without handcrafted features, supports broader deployment in intelligent surveillance systems. The proposed framework demonstrates strong generalization across datasets and recording conditions. Future work will target edge-device optimization and multi-modal data integration.
Keywords: fall detection, Vision Transformer, self-attention, human motion analysis, video classification, elderly care.
DOI: http://dx.doi.org/10.30970/eli.33.12
Electronics and information technologies / Електроніка та інформаційні технології