Deepfake Video Detection Using Convolutional Vision Transformer

dc.contributor.advisor: Atnafu, Solomon (PhD)
dc.contributor.author: Wodajo, Deressa
dc.date.accessioned: 2020-12-21T06:46:44Z
dc.date.accessioned: 2023-11-04T12:23:06Z
dc.date.available: 2020-12-21T06:46:44Z
dc.date.available: 2023-11-04T12:23:06Z
dc.date.issued: 11/11/2020
dc.description.abstract: The rapid advancement of deep learning models that can generate and synthesize hyper-realistic videos, known as Deepfakes, and their ease of access to the general public have raised concerns about their possible malicious use. Deep learning techniques can now generate faces, swap faces between two subjects in a video, alter facial expressions, change gender, and alter facial features, to name a few. These powerful video manipulation methods have potential uses in many fields. However, they also pose a looming threat to everyone if used for harmful purposes such as identity theft, phishing, and scams. Therefore, it is important to tell whether a given video is real or manipulated in order to deter and mitigate the risks posed by Deepfakes. Thus, in this thesis work, we present a system that detects whether a given video is real or a Deepfake. The proposed system has two components: the preprocessing component and the detection component. The preprocessing component prepares the video dataset for the detection stage: the face region is extracted as a 224 x 224 RGB image, and data augmentation is applied to enlarge the dataset and improve the accuracy of the model. For the detection component, we use a Convolutional Neural Network (CNN) and a Vision Transformer (ViT). The CNN has only convolutional operations (no fully connected layer), and its purpose is to extract learnable features. The ViT takes the learned features as input and further encodes them for the final detection. The proposed system is implemented using PyTorch, an open-source machine learning library. The DeepFake Detection Challenge (DFDC) dataset was used to train, validate, and test the model. The DFDC dataset contains 119,154 videos created using publicly available deep learning video generation models. Our model was trained on 162,174 face images extracted from the video dataset; ninety percent of the face images were augmented during training and validation. We tested the model on 400 unseen videos and achieved 91.5 percent accuracy, an AUC value of 0.91, and a loss value of 0.32. Our contribution is that we have added a CNN module to the ViT architecture and achieved a competitive result on the DFDC dataset.
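
The abstract describes the detection component as a convolutional feature extractor (no fully connected layer) whose output is encoded by a Vision Transformer for the final real/fake decision. Below is a minimal PyTorch sketch of that idea; the layer widths, encoder depth, and head count are illustrative assumptions, not the thesis's exact configuration.

import torch
import torch.nn as nn

class ConvFeatureExtractor(nn.Module):
    """Stack of strided conv blocks; outputs a spatial feature map, no FC layer."""
    def __init__(self, out_channels: int = 512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, out_channels, 3, stride=2, padding=1), nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)  # (B, C, 14, 14) for a 224 x 224 input

class ConvolutionalViT(nn.Module):
    """CNN feature maps are flattened into tokens and encoded by a transformer."""
    def __init__(self, embed_dim: int = 512, depth: int = 6, heads: int = 8):
        super().__init__()
        self.backbone = ConvFeatureExtractor(embed_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # 224 / 16 = 14, so 14 * 14 = 196 spatial tokens plus one class token
        self.pos_embed = nn.Parameter(torch.zeros(1, 197, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, 1)  # single real-vs-fake logit

    def forward(self, x):                       # x: (B, 3, 224, 224) face crops
        f = self.backbone(x)                    # (B, C, 14, 14)
        tokens = f.flatten(2).transpose(1, 2)   # (B, 196, C)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        z = torch.cat([cls, tokens], dim=1) + self.pos_embed
        z = self.encoder(z)
        return self.head(z[:, 0])               # classify from the class token

model = ConvolutionalViT()
logit = model(torch.randn(2, 3, 224, 224))      # two 224 x 224 RGB face crops
prob_fake = torch.sigmoid(logit)

The key design point is that the transformer attends over learned CNN features rather than raw pixel patches, which is what distinguishes this Convolutional Vision Transformer from a plain ViT.
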
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/24209
dc.language.iso: en
dc.publisher: Addis Ababa University
dc.subject: Deep Learning
dc.subject: Deepfakes
dc.subject: Deepfake Video Detection
dc.subject: CNN
dc.subject: Transformer
dc.subject: Vision Transformer
dc.subject: Convolutional Vision Transformer
dc.subject: GAN
dc.title: Deepfake Video Detection Using Convolutional Vision Transformer
dc.type: Thesis

Files

Original bundle
Name: Deressa Wodajo 2020.pdf
Size: 2.95 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text