Violence Recognition on Videos Using Two-stream 3D CNN with Custom Spatiotemporal Crop

doi:10.21203/rs.3.rs-1947129/v2

Research Article

This work is licensed under a CC BY 4.0 License

Journal publication: published 05 Jul, 2023 in Multimedia Tools and Applications. This is Version 2 of the preprint.

Violence can happen anywhere. One way to monitor violence in a given place is to install closed-circuit television (CCTV), and the recorded video can be used as evidence in a court of law. Violence video classification is also an active topic in deep learning. The latest violence video dataset is RWF-2000, which contains 2000 violent and non-violent videos, each 5 seconds long at 30 frames per second. The publication that introduced the dataset reports a best accuracy of 87.25% with its proposed method. In this study, we use a Residual Network, which is known to mitigate the vanishing gradient problem. In addition, we apply transfer learning from models pre-trained on Kinetics and on Kinetics + Moments in Time. We also test the number of frames and the location of the sampled frame range. RGB and optical flow inputs are trained separately with different configurations. The best accuracy for RGB input is 89.25%, with Kinetics + Moments in Time pre-training and frame location 49-149. The best accuracy for optical flow input is 88.5%, with Kinetics pre-training and 74 frames. We also sum the outputs of the two streams, which gives an accuracy of 90.5%.
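The frame-range selection and the summing of the two stream outputs described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `select_frame_range` and `fuse_by_sum`, the 0-based inclusive indexing, the class order, and the score values are all assumptions made for illustration.

```python
from typing import List, Sequence


def select_frame_range(frames: Sequence, start: int, end: int) -> List:
    """Return frames start..end inclusive (0-based indexing is an assumption)."""
    if not 0 <= start <= end < len(frames):
        raise ValueError("frame range lies outside the clip")
    return list(frames[start:end + 1])


def fuse_by_sum(rgb_scores: Sequence[float], flow_scores: Sequence[float]) -> int:
    """Add the per-class scores of both streams and return the winning class index."""
    if len(rgb_scores) != len(flow_scores):
        raise ValueError("both streams must score the same classes")
    summed = [r + f for r, f in zip(rgb_scores, flow_scores)]
    return max(range(len(summed)), key=summed.__getitem__)


# A 5-second clip at 30 frames per second has 150 frames.
# Integers stand in for the actual images here.
clip = list(range(5 * 30))
window = select_frame_range(clip, 49, 149)

# Hypothetical per-class scores in the assumed order [non-violent, violent].
rgb_scores = [0.60, 0.40]
flow_scores = [0.20, 0.80]
predicted = fuse_by_sum(rgb_scores, flow_scores)
```

In this made-up example the RGB stream alone would pick the first class, while the summed scores pick the second, which shows how the optical flow stream can change the fused decision.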

Theoretical Computer Science

Video Classification

Resnet

Transfer Learning

Freeze Layer
