In an article published in the journal Remote Sensing, researchers proposed a deep-separation-guided progressive reconstruction network that achieves accurate remote sensing image (RSI) segmentation.
Study: Deep-Separation Guided Progressive Reconstruction Network for Semantic Segmentation of Remote Sensing Images.
The goal of semantic segmentation is to assign a class to every pixel in an image. It is essential for several remote sensing applications, including urban planning, land cover classification, and scene understanding. Semantic segmentation of remote sensing images (RSIs) increasingly relies on deep learning (DL) techniques, driven by the success of DL and its encouraging results on several semantic segmentation benchmarks built from natural images.
However, an RSI is much larger than the natural images typically used in computer vision, contains objects of widely varying sizes, and depicts complex scenes. Additionally, the oblique viewpoint during data collection can aggravate multi-scale problems, because objects captured at different distances appear at different scales.
Convolutional neural networks have ushered in a new age of computer vision with the ongoing development of DL. Most networks use encoder-decoder topologies for tasks like semantic segmentation.
Typical encoders include the transformer, ResNet, and VGG. These encoders can extract features, although their capacity is rather limited. Therefore, the decoder's ability to reconstruct the features is critical for improving network performance. Different architectures have been developed for decoding, although bottom-up approaches are usually applied after feature extraction. For example, UNet is a typical encoder-decoder design and the foundation for many later networks.
Several studies have combined features at various scales after extraction to enhance feature reconstruction. These techniques use ordinary convolutional and upsampling layers in the decoder. Although such a decoder is easy to construct, it is not very effective. Furthermore, the encoder's features carry different degrees of significance at different resolutions, and present techniques are unable to exploit these differences. Therefore, when designing a model, it is essential to work out how to reconstruct the features effectively and make them work together.
Digital Surface Model (DSM)
Several previous studies proposed various methods for semantic segmentation. For instance, one study used a digital surface model (DSM) as auxiliary information to handle quality variations across multimodal RSI datasets and to improve the model's segmentation performance on single-modal data. Another study developed feature separation and aggregation models to combine multimodal information and thoroughly explore the characteristics of the different stages.
Novel Network Architecture DGPRNet
In this study, the researchers propose a deep-separation-guided progressive reconstruction network (DGPRNet), built around a deep separation module (DSEM), for the semantic segmentation of RSIs. To enhance feature reconstruction, they created a progressive reconstruction block (PRB) based on atrous spatial pyramid pooling (ASPP), with multiple convolutional layers that integrate diverse receptive fields to reconstruct the features at each resolution.
In contrast to approaches that rely on upsampling, the PRBs use deconvolution to change the resolution, increasing it block by block until the resolution of the input image is restored. Additionally, the proposed deep separation module (DSEM) processes semantic information so that pixels belonging to the same class are grouped and the separation between pixels of different classes is maximized, which improves the guidance that deep semantic features provide to shallow layers.
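The idea of grouping same-class pixels while separating different classes can be illustrated with a toy measurement. The following is a minimal sketch, not the paper's DSEM (which is a learned network module): it only computes how compact each class is and how far apart the class centroids lie. The function names `class_means` and `separation_scores` and the feature values are hypothetical.

```python
import math
from collections import defaultdict


def class_means(features, labels):
    """Average the feature vectors of all pixels that share a label."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def separation_scores(features, labels):
    """Return (mean intra-class distance, mean inter-class centroid distance).

    Assumes at least two distinct classes are present.
    """
    means = class_means(features, labels)
    intra = sum(
        math.dist(vec, means[lab]) for vec, lab in zip(features, labels)
    ) / len(features)
    labs = sorted(means)
    pairs = [(a, b) for i, a in enumerate(labs) for b in labs[i + 1:]]
    inter = sum(math.dist(means[a], means[b]) for a, b in pairs) / len(pairs)
    return intra, inter


# Four toy "pixels" with 2-D features and two classes (hypothetical values).
features = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
labels = [0, 0, 1, 1]
intra, inter = separation_scores(features, labels)
```

A representation in which the first value is small and the second is large matches the stated goal: pixels of one class cluster tightly while the classes stay far apart.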
How the Study was Conducted
Improving Feature Reconstruction
A PRB based on ASPP was included in the decoder to improve feature reconstruction and lower error rates. Each block produced its decoding output by processing five features in parallel, using several convolution layers with various dilation rates, and then increasing the feature resolution by deconvolution.
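The two operations in this block, atrous convolution and deconvolution, each follow simple size arithmetic. The sketch below uses the standard formulas for a dilated kernel's span and for the output length of a transposed convolution; the dilation rates and layer settings are hypothetical and are not taken from the paper.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated (atrous) kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def deconv_output_size(in_size, kernel_size, stride, padding=0, output_padding=0):
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (in_size - 1) * stride - 2 * padding + kernel_size + output_padding


# Hypothetical dilation rates for parallel 3x3 branches.
rates = [1, 6, 12, 18]
spans = [effective_kernel_size(3, r) for r in rates]

# A 4x4 transposed convolution with stride 2 and padding 1 doubles the resolution.
upsampled = deconv_output_size(32, kernel_size=4, stride=2, padding=1)
```

Larger dilation rates widen the receptive field without adding weights, which is why parallel branches with different rates can capture objects of different sizes at the same resolution.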
To highlight semantic information and leverage deep semantic features, the proposed DSEM processed the last three semantic features from the decoder. Multi-supervision was used to train DGPRNet, which increased each module's capacity for reconstruction.
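Multi-supervision generally means that several outputs of the network are each compared against the ground truth, and the resulting losses are combined. The sketch below assumes a cross-entropy loss per output and a weighted sum, with every output already resized to the label resolution; the article does not state the loss or the weights, so `cross_entropy`, `multi_supervised_loss`, and the toy values are hypothetical.

```python
import math


def cross_entropy(probs, target):
    """Mean negative log-likelihood; probs[i] is the class distribution of pixel i."""
    return -sum(math.log(p[t]) for p, t in zip(probs, target)) / len(target)


def multi_supervised_loss(outputs, target, weights=None):
    """Weighted sum of the cross-entropy losses of several outputs."""
    weights = weights or [1.0] * len(outputs)
    return sum(w * cross_entropy(o, target) for w, o in zip(weights, outputs))


# Two toy pixels, two classes; one uncertain output and one perfect output.
target = [0, 1]
main_output = [[0.5, 0.5], [0.5, 0.5]]
aux_output = [[1.0, 0.0], [0.0, 1.0]]
total = multi_supervised_loss([main_output, aux_output], target)
```

Because every supervised output contributes its own term, each intermediate module receives a direct training signal rather than relying only on gradients from the final prediction.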
Potsdam and Vaihingen RSI Datasets
Semantic segmentation tests were conducted on the Potsdam and Vaihingen RSI datasets to assess the effectiveness of the proposed DGPRNet. The Potsdam dataset comprised 38 patches of 6000 x 6000 pixels; 17 patches were used for training and seven for testing. The Vaihingen dataset included 33 images with a resolution of 2494 x 2064 pixels, divided into 16 training patches and five testing patches.
The researchers used the mean intersection over union and the mean per-class pixel accuracy as performance metrics. The intersection over union, computed between the predicted and target regions, was used to select the best segmentation weights, and it served as the primary indicator when training and testing the various algorithms on the two RSI datasets.
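Both metrics can be derived from a confusion matrix. The following is a minimal sketch using the standard definitions (per-class IoU is true positives divided by the union of predicted and actual pixels; per-class accuracy is true positives divided by actual pixels). The function names and the toy label maps are hypothetical, and the paper's exact evaluation code may differ.

```python
def confusion_matrix(pred, target, num_classes):
    """Count pixels: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        matrix[t][p] += 1
    return matrix


def mean_iou_and_accuracy(matrix):
    """Return (mean IoU, mean per-class pixel accuracy), skipping absent classes."""
    ious, accs = [], []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        actual = sum(matrix[c])                    # pixels whose true class is c
        predicted = sum(row[c] for row in matrix)  # pixels predicted as c
        union = actual + predicted - tp
        if union > 0:
            ious.append(tp / union)
        if actual > 0:
            accs.append(tp / actual)
    return sum(ious) / len(ious), sum(accs) / len(accs)


# Flattened toy label maps with two classes (hypothetical values).
target = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
miou, macc = mean_iou_and_accuracy(confusion_matrix(pred, target, 2))
```

IoU penalizes both missed pixels and false alarms, whereas per-class accuracy only penalizes missed pixels, which is one reason IoU is commonly treated as the stricter, primary indicator.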
To semantically segment remote sensing images, this study developed a novel network architecture called DGPRNet by investigating the relationships between and within classes of deep features and by reducing the feature reconstruction loss in the decoder. Before decoding, adjacent intermediate features were first used to supplement one another, which enhanced the expression of multi-scale characteristics.
Then, the PRB was created and applied at five stages in the decoder to capture specific characteristics from various receptive fields at various resolutions. This reduced error and maintained accuracy throughout reconstruction. Finally, to use deep features in recognizing objects of various sizes, the proposed DSEM discriminated and aggregated interclass and intraclass characteristics based on semantic features. According to experimental results on the two RSI datasets, DGPRNet outperformed 11 state-of-the-art methods, including the most recent semantic segmentation techniques.
Jiabao Ma, Wujie Zhou, Xiaohong Qian and Lu Yu (2022) Deep-Separation Guided Progressive Reconstruction Network for Semantic Segmentation of Remote Sensing Images. Remote Sensing. https://www.mdpi.com/2072-4292/14/21/5510