In an article published in the open-access journal Sensors, researchers introduced a new approach that uses a one-class classification framework to assess the authenticity of facial photos. The approach was designed to remain compatible with new data and other datasets rather than relying heavily on the dataset used for training.
Various filter enhancement techniques were selected as the basic pseudo-image-generating techniques for data augmentation. The primary network was upgraded to a multi-channel convolutional neural network (MCCNN), allowing it to accept differently preprocessed data separately, extract attention maps, and produce feature maps. As a first innovation in training the core network, the authors used weakly supervised learning (WSL) techniques to apply attention cropping and dropping to the input. As a second, they trained the primary network in two steps.
In the first step, a binary classification loss function was employed to screen out fake facial characteristics produced by well-established generative adversarial networks (GANs). In the second step, a one-class classification loss function was used to deal with the various other GAN types and fake face-generating techniques.
The study showed that the suggested approach increased cross-domain detection effectiveness while maintaining source-domain accuracy. These findings offered a feasible way to determine the authenticity of facial images more accurately, a significant theoretical and applied contribution.
Developments in the Generative Adversarial Network (GAN) Domain
Researchers have created numerous effective detection methods in response to quickly evolving fake image production technology, and have explored the various issues that arose during research and development. Digital image forensics aims to spot fake images and prevent the damage they cause.
Most fake face detection algorithms operate at the pixel level. Because the task is essentially one of binary classification, a standard convolutional neural network (CNN) is a viable solution. In one such technique, the image is first pre-processed with a high-pass filter, and a five-layer neural network is then trained on the pre-processed data. Despite being a basic CNN model, it performs well enough to identify homologous fake images. However, the method's fragility is a significant disadvantage.
The generative adversarial network (GAN) has significantly developed due to its use in forensic research. As a result of this ongoing evolution, generated images are now increasingly difficult to distinguish with the human eye alone. The dissemination of fake photographs of celebrities, avatars, and even politicians on multiple media channels, frequently without the audience's knowledge, has recently had an unfavorable impact on society.
Researchers have conducted extensive research based on various contemporary image recognition technologies to build detection tools that can keep up with increasingly potent generative adversarial network (GAN) models. These tools have relied on features such as pixel points, manual feature extraction, frequency maps, neural networks, and co-occurrence matrices. Although numerous effective solutions now exist, there is still room for improvement because of poor precision and difficult generalization.
The objective of this study was to create a technique for accurately determining the authenticity of a facial image, treating the task as a one-class classification problem. Gaussian noise, Gaussian blur, and homomorphic filter enhancement were employed as the basic pseudo-image-generating techniques. The main network, an enhanced multi-channel convolutional neural network, received the differently preprocessed data independently, produced feature maps, and extracted attention maps. Weakly supervised learning (WSL) data augmentation techniques applied attention cropping and dropping to the data.
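The three preprocessing techniques named above can be illustrated with a minimal NumPy sketch. It assumes a grayscale float image in [0, 1]; the function names and all parameter values (`sigma`, `gamma_low`, `gamma_high`, `cutoff`) are illustrative assumptions, not settings reported by the authors.

```python
import numpy as np

def add_gaussian_noise(img, sigma=0.05, seed=0):
    """Add zero-mean Gaussian noise and clip back to [0, 1]."""
    rng = np.random.default_rng(seed)
    return np.clip(img + rng.normal(0.0, sigma, size=img.shape), 0.0, 1.0)

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur with edge padding (output keeps the input shape)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def homomorphic_filter(img, gamma_low=0.5, gamma_high=1.5, cutoff=10.0):
    """Attenuate low frequencies and boost high ones in the log domain."""
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(np.log1p(img)))
    v, u = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2,
                       indexing="ij")
    d2 = u ** 2 + v ** 2
    # Gaussian high-emphasis transfer function (assumed form).
    transfer = (gamma_high - gamma_low) * (
        1.0 - np.exp(-d2 / (2.0 * cutoff ** 2))) + gamma_low
    filtered = np.fft.ifft2(np.fft.ifftshift(transfer * spectrum)).real
    return np.clip(np.expm1(filtered), 0.0, 1.0)
```

Each function returns an image of the same shape as its input, so the three outputs could be fed to separate channels of a multi-channel network.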
The primary network was trained in two phases. First, it was trained using a binary classification loss function to filter out standard fake facial features produced by well-known generative adversarial network (GAN) models. Subsequently, it was trained using a one-class classification technique to deal with various generative adversarial network (GAN) models or unidentified fake face generation techniques.
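The two training objectives can be sketched as follows. The first phase uses ordinary binary cross-entropy. For the second phase this summary does not state the exact one-class loss, so the sketch assumes a simple compactness objective that pulls real-face embeddings toward a fixed center and flags samples far from it; `one_class_loss`, `is_fake`, `center`, and `threshold` are hypothetical names introduced for illustration.

```python
import numpy as np

def binary_cross_entropy(p_fake, labels, eps=1e-7):
    """Phase 1: real (0) vs. known-GAN fake (1) classification loss."""
    p = np.clip(p_fake, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))

def one_class_loss(real_embeddings, center):
    """Phase 2 (assumed form): mean squared distance of real-face
    embeddings to a fixed center, encouraging a compact real cluster."""
    return float(np.mean(np.sum((real_embeddings - center) ** 2, axis=1)))

def is_fake(embedding, center, threshold):
    """Flag a sample whose squared distance to the center exceeds a threshold."""
    return bool(np.sum((embedding - center) ** 2) > threshold)
```

Under this assumption, a fake produced by an unseen generator does not need to resemble the training fakes; it only needs to fall outside the region occupied by real faces.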
Data augmentation is typically used to increase the amount and variety of data in a dataset, which improves the performance of deep learning algorithms and reduces over-fitting. Common data augmentation techniques in the computer vision field include cropping, scaling, flipping, and adding Gaussian noise.
These traditional methods apply augmentation at random. The weakly supervised data augmentation network (WS-DAN) proposed that weakly supervised learning (WSL) could instead be used to produce an attention map showing the salient elements of the target to be detected during training. This generated attention map was then used in the training phase to guide the data augmentation.
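Attention-guided cropping and dropping can be sketched in a few lines. The example assumes the attention map has already been resized to the image resolution and uses an illustrative threshold of 0.5; WS-DAN also resizes the cropped region back to the input size, which is omitted here.

```python
import numpy as np

def _normalize(attn):
    """Scale an attention map to the range [0, 1]."""
    return (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)

def attention_crop(img, attn, threshold=0.5):
    """Crop the image to the bounding box of the high-attention region."""
    rows, cols = np.where(_normalize(attn) >= threshold)
    if rows.size == 0:  # flat attention map: nothing to zoom into
        return img
    return img[rows.min():rows.max() + 1, cols.min():cols.max() + 1]

def attention_drop(img, attn, threshold=0.5):
    """Zero out the high-attention region so other cues must be learned."""
    keep = _normalize(attn) < threshold
    return img * keep
```

Cropping zooms in on the most discriminative region, while dropping hides it, which pushes the network to find additional evidence elsewhere in the face.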
For issues with portrait detection, a unique multi-channel convolutional neural network (MCCNN) architecture based on LightCNN was created. The MCCNN design addressed the difficulty of training a deep architecture from scratch when the dataset was small by building the multi-channel network on a facial recognition model that was already trained.
The multi-channel convolutional neural network (MCCNN) served as a model for the proposed network's fundamental structure. In addition to the essential training steps of the original network, the proposed network contained a data augmentation module based on weakly supervised learning (WSL) and a one-class classification module trained in two phases. The first phase involved binary classification, while the second dealt with one-class classification. The network consisted of four blocks.
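The multi-channel idea, in which each preprocessed version of the input passes through its own branch before the features are merged, can be sketched as a toy forward pass. This is only a structural illustration: plain linear layers stand in for the convolutional LightCNN-based branches, and `multi_channel_forward` and its weight arguments are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def multi_channel_forward(channel_inputs, channel_weights, head_weights):
    """Toy multi-channel forward pass.

    channel_inputs:  list of (batch, d_in) arrays, one per preprocessing
    channel_weights: list of (d_in, d_feat) matrices, one per channel
    head_weights:    (n_channels * d_feat, 1) matrix for the joint head
    """
    # Each channel extracts its own features independently.
    feats = [relu(x @ w) for x, w in zip(channel_inputs, channel_weights)]
    # Features from all channels are concatenated into one joint vector.
    joint = np.concatenate(feats, axis=1)
    # A shared head maps the joint features to a fake probability.
    probs = 1.0 / (1.0 + np.exp(-(joint @ head_weights)))
    return probs, joint
```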
In this paper, the authors addressed the problem of fake faces generated by generative adversarial network (GAN) models. They also introduced a weakly supervised learning (WSL) data augmentation mechanism for fine-grained classification to find the discrepancies between real and fake faces. Finally, a characteristic model of the actual faces produced by the generative adversarial network (GAN) was acquired, and a conventional decision based on probability was used.
The essence of the self-attention mechanism was to give deep learning models a focus akin to that of humans, rather than having them treat every part of the input uniformly. The self-attention layer was initially proposed for natural language processing, to enable a model to consider the complete meaning of a sentence rather than just the links between nearby words.
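A minimal sketch of self-attention is shown below, using the scaled dot-product form common in natural language models. The layer used for images in the paper may differ in detail; the projection matrices `w_q`, `w_k`, and `w_v` are assumptions introduced for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention.

    x: (n_tokens, d_model) sequence of token (or pixel) features
    w_q, w_k, w_v: (d_model, d_k) projection matrices
    Returns the attended features and the (n_tokens, n_tokens) weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights
```

Every output row is a weighted mix of all positions, which is how the layer relates distant words in a sentence or distant pixels in an image.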
Prospects of the Study
This study suggested a one-class classification approach for identifying fake faces produced by a generative adversarial network (GAN). It targeted the issues of challenging cross-domain detection and high vulnerability in fake face detection. The main emphasis of the model was on the fine-grained extraction of existing features while avoiding extracting only local data features.
This study also discussed a technique to tell if a human face image was real, fake, or artificial. All the facial image data utilized here came from available data collections. The authors neither identified people from face images nor verified if an image was of a particular person. As a result, this study faced no ethical problems.
The suggested method augmented the data using basic Gaussian noise, Gaussian blur, and homomorphic filter enhancement techniques to boost the model's resilience. A weakly supervised attention (WSA) technique was employed during the initial phase of feature extraction to augment the data by altering the regions most critical to the model's discrimination.
The resulting attention map was then used to remove the low-impact material. The resulting data was then fed to the channels of the multi-channel convolutional neural network (MCCNN). A self-attention layer was used to relate the pixels to one another, which increased recognition precision.
Li, S., Dutta, V., He, X., Matsumaru, T. (2022) Deep Learning Based One-Class Detection System for Fake Faces Generated by GAN Network. Sensors, 22(20), 7767. https://www.mdpi.com/1424-8220/22/20/7767