Zhang, Shuo and Ni, Qiang and Han, Jungong (2023) Deep Neural Network Compression with Filter Pruning. PhD thesis, Lancaster University.
Abstract
The rapid development of convolutional neural networks (CNNs) in computer vision tasks has inspired researchers to apply their potential to embedded or mobile devices. However, CNNs typically require a large amount of computation and memory, which limits their deployment in such resource-limited systems. How to compress complex networks while maintaining competitive performance has therefore become a focus of attention in recent years. Among network compression approaches, filter pruning methods, which achieve a structured compact model by finding and removing redundant filters, have attracted widespread attention. Inspired by previous dedicated works, this thesis focuses on how to obtain a compact model while maximally retaining the original model's performance. In particular, to address the limitations of how existing popular pruning methods choose filters, several novel filter pruning strategies are proposed to find and remove redundant filters more accurately and thus reduce the performance loss caused by pruning: the filter pruning method with an attention mechanism (Chapter 3), data-dependent filter pruning guided by LSTM (Chapter 4), and filter pruning with a uniqueness mechanism in the frequency domain (Chapter 5).

This thesis first addresses the filter pruning issue from a global perspective. To this end, we propose a new scheme, termed Pruning Filter with an Attention Mechanism (PFAM). By establishing the dependencies between the filters at each layer, we explore the long-term dependence between filters via an attention module in order to choose the to-be-pruned filters. Unlike prior approaches that identify the to-be-pruned filters simply from their intrinsic properties, the less correlated filters are first pruned after the pruning step in the current training epoch and are then reconstructed and updated during the subsequent training epoch (sketched below). Thus, the compressed network model can be achieved without requiring a pre-trained model, since the input data can be exploited with maximal information retained while the original training strategy is executed.

Next, we notice that most existing pruning algorithms prune filters layer by layer. Specifically, they guide filter pruning at each layer with a single global pruning rate, which means that every convolutional layer is treated equally, regardless of its depth and width. We argue that the convolutional layers in a network also have varying degrees of significance. Moreover, we propose that selecting the appropriate layers for pruning is more reasonable, since keeping more filters in the critical layers and removing more from the insignificant ones yields a greater complexity reduction with less performance loss. To this end, long short-term memory (LSTM) is employed to learn the hierarchical characteristics of a network and to generate a global network pruning scheme. On top of that, we present a data-dependent soft pruning strategy named Squeeze-Excitation-Pruning (SEP), which does not physically prune any filters but removes the specific kernels from the forward and backward computations according to the pruning scheme (see the second sketch below). Doing so further reduces the model's performance decline while achieving deep model compression.
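As a rough illustration of the PFAM idea, the following PyTorch sketch scores each filter in a convolutional layer by attention-style pairwise similarity and soft-prunes (zeroes) the least correlated filters so that later training epochs can reconstruct them. This is a minimal sketch under our own simplifying assumptions, not the thesis's actual module; the function names and the single-head dot-product attention are illustrative.

```python
import torch
import torch.nn.functional as F

def filter_attention_scores(conv_weight: torch.Tensor) -> torch.Tensor:
    """Score each filter by the attention it receives from all other
    filters in the same layer. conv_weight: (out_ch, in_ch, k, k)."""
    n = conv_weight.size(0)
    flat = conv_weight.reshape(n, -1)                 # one row per filter
    # Dot-product attention over filter pairs (single head, no projections).
    attn = F.softmax(flat @ flat.t() / flat.size(1) ** 0.5, dim=1)
    # A filter that many others attend to is treated as more important.
    return attn.sum(dim=0)

def soft_prune_least_correlated(conv: torch.nn.Conv2d, ratio: float = 0.3):
    """Zero out the lowest-scored filters; because the weights stay in
    the model, subsequent epochs can reconstruct and update them."""
    scores = filter_attention_scores(conv.weight.data)
    k = int(conv.out_channels * ratio)
    _, idx = torch.topk(scores, k, largest=False)
    with torch.no_grad():
        conv.weight[idx] = 0.0
        if conv.bias is not None:
            conv.bias[idx] = 0.0
    return idx
```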
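The SEP strategy, in turn, can be pictured as masking output channels rather than deleting them, so masked kernels drop out of the forward and backward computations but remain available for revival. The sketch below is again illustrative rather than the thesis's implementation; the per-layer rate would, in the thesis's scheme, be produced by the LSTM controller, which we merely assume as an input here.

```python
import torch
import torch.nn as nn

class SoftPrunedConv(nn.Module):
    """Conv2d wrapper that multiplies output channels by a 0/1 mask, so
    masked kernels are excluded from the forward and backward passes
    while their weights remain in the model."""

    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.conv = conv
        self.register_buffer("mask", torch.ones(conv.out_channels))

    def apply_layer_rate(self, rate: float, scores: torch.Tensor) -> None:
        """Mask the lowest-scored filters at this layer's pruning rate
        (per-layer rates come from the LSTM controller in the thesis)."""
        self.mask.fill_(1.0)
        k = int(self.conv.out_channels * rate)
        if k > 0:
            _, idx = torch.topk(scores, k, largest=False)
            self.mask[idx] = 0.0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x) * self.mask.view(1, -1, 1, 1)
```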
Lastly, we transfer the concept of relationships from the filter level to the feature-map level, because feature maps reflect the combined information of both the input data and the filters. Hence, we propose Filter Pruning with Uniqueness Mechanism in the Frequency Domain (FPUM), which guides the filter pruning strategy through the correlations between feature maps. Specifically, we first transfer the features to the frequency domain by the Discrete Cosine Transform (DCT). Then, for each feature map, we compute a uniqueness score, which measures the probability of it being replaced by the others. This allows us to prune the filters corresponding to the low-uniqueness maps without significant performance degradation. In addition, because the critical pruning clues are more concentrated after the DCT, our strategy is more resistant to noise than spatial-domain methods, further enhancing the network's compactness while maintaining performance.
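For FPUM, a minimal sketch of the uniqueness computation might look as follows: each feature map is moved to the frequency domain with a DCT, and a map that is highly similar to some other map receives a low uniqueness score, marking its filter as a pruning candidate. The concrete scoring rule here (one minus the maximum pairwise cosine similarity) is our own illustrative stand-in for the thesis's measure.

```python
import numpy as np
from scipy.fft import dctn

def uniqueness_scores(feature_maps: np.ndarray) -> np.ndarray:
    """feature_maps: (channels, H, W) activations for one input.
    Returns a (channels,) score; a low value means the map is likely
    replaceable by another map, so its filter is a pruning candidate."""
    c = feature_maps.shape[0]
    # Move each map to the frequency domain; the DCT concentrates the
    # salient energy in a few coefficients and suppresses spatial noise.
    freq = np.stack([dctn(fm, norm="ortho") for fm in feature_maps])
    flat = freq.reshape(c, -1)
    flat /= np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8
    sim = np.abs(flat @ flat.T)              # pairwise cosine similarity
    np.fill_diagonal(sim, 0.0)
    # A map that is almost duplicated elsewhere scores close to zero.
    return 1.0 - sim.max(axis=1)
```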