Deepfake Video Detection Using Convolutional Vision Transformer

dc.contributor.advisor: Atnafu, Solomon (PhD)
dc.contributor.author: Wodajo, Deressa
dc.date.accessioned: 2020-12-21T06:46:44Z
dc.date.accessioned: 2023-11-04T12:23:06Z
dc.date.available: 2020-12-21T06:46:44Z
dc.date.available: 2023-11-04T12:23:06Z
dc.date.issued: 11/11/2020
dc.description.abstract: The rapid advancement of deep learning models that can generate and synthesize hyper-realistic videos, known as Deepfakes, and their ease of access to the general public have raised concerns about their possible malicious use. Deep learning techniques can now generate faces, swap faces between two subjects in a video, alter facial expressions, change gender, and alter facial features, to name a few. These powerful video manipulation methods have potential uses in many fields. However, they also pose a looming threat to everyone if used for harmful purposes such as identity theft, phishing, and scams. Therefore, it is important to tell whether a given video is real or manipulated in order to deter and mitigate the risks posed by Deepfakes. Thus, in this thesis work, we present a system that detects whether a given video is real or a Deepfake. The proposed system has two components: the preprocessing component and the detection component. The preprocessing component prepares the video dataset for the detection stage: the face region is extracted as a 224 x 224 RGB image, and data augmentation is applied to enlarge the dataset and improve the accuracy of the model. For the detection component, we use a Convolutional Neural Network (CNN) and a Vision Transformer (ViT). The CNN has only convolutional operations (no fully connected layer), and its purpose is to extract learnable features. The ViT takes the learned features as input and further encodes them for the final detection. The proposed system is implemented using PyTorch, an open-source machine learning library. The DeepFake Detection Challenge (DFDC) dataset was used to train, validate, and test the model. The DFDC dataset contains 119,154 videos created using publicly available deep learning video generation models. Our model was trained on 162,174 face images extracted from the video dataset; ninety percent of the face images were augmented during training and validation. We tested the model on 400 unseen videos and achieved 91.5 percent accuracy, an AUC value of 0.91, and a loss value of 0.32. Our contribution is that we have added a CNN module to the ViT architecture and achieved a competitive result on the DFDC dataset.
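
The abstract describes the detection component as a convolutional feature extractor (no fully connected layer) whose output is encoded by a Vision Transformer for the final real/fake decision. Below is a minimal PyTorch sketch of that idea; the layer widths, encoder depth, and head count are illustrative assumptions, not the thesis's exact configuration.

import torch
import torch.nn as nn

class ConvFeatureExtractor(nn.Module):
    """Stack of strided conv blocks; outputs a spatial feature map, no FC layer."""
    def __init__(self, out_channels: int = 512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, out_channels, 3, stride=2, padding=1), nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)  # (B, C, 14, 14) for a 224 x 224 input

class ConvolutionalViT(nn.Module):
    """CNN feature maps are flattened into tokens and encoded by a transformer."""
    def __init__(self, embed_dim: int = 512, depth: int = 6, heads: int = 8):
        super().__init__()
        self.backbone = ConvFeatureExtractor(embed_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # 224 / 16 = 14, so 14 * 14 = 196 spatial tokens plus one class token
        self.pos_embed = nn.Parameter(torch.zeros(1, 197, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, 1)  # single real-vs-fake logit

    def forward(self, x):                       # x: (B, 3, 224, 224) face crops
        f = self.backbone(x)                    # (B, C, 14, 14)
        tokens = f.flatten(2).transpose(1, 2)   # (B, 196, C)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        z = torch.cat([cls, tokens], dim=1) + self.pos_embed
        z = self.encoder(z)
        return self.head(z[:, 0])               # classify from the class token

model = ConvolutionalViT()
logit = model(torch.randn(2, 3, 224, 224))      # two 224 x 224 RGB face crops
prob_fake = torch.sigmoid(logit)

The key design point is that the transformer attends over learned CNN features rather than raw pixel patches, which is what distinguishes this Convolutional Vision Transformer from a plain ViT.
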
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/24209
dc.language.iso: en
dc.publisher: Addis Ababa University
dc.subject: Deep Learning
dc.subject: Deepfakes
dc.subject: Deepfake Video Detection
dc.subject: CNN
dc.subject: Transformer
dc.subject: Vision Transformer
dc.subject: Convolutional Vision Transformer
dc.subject: GAN
dc.title: Deepfake Video Detection Using Convolutional Vision Transformer
dc.type: Thesis

Files

Original bundle
Name: Deressa Wodajo 2020.pdf
Size: 2.95 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text