FP-NAS: Fast Probabilistic Neural Architecture Search

Z Yan, X Dai, P Zhang, Y Tian, et al. - Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021 - openaccess.thecvf.com
Abstract
Differential Neural Architecture Search (NAS) requires all layer choices to be held in memory simultaneously; this limits the size of both the search space and the final architecture. In contrast, Probabilistic NAS, such as PARSEC, learns a distribution over high-performing architectures and uses only as much memory as needed to train a single model. Nevertheless, it needs to sample many architectures, making it computationally expensive for searching in an extensive space. To solve these problems, we propose a sampling method adaptive to the distribution entropy, drawing more samples to encourage exploration at the beginning and fewer as learning proceeds. Furthermore, to search fast in the multi-variate space, we propose a coarse-to-fine strategy: a factorized distribution is used at the beginning, which reduces the number of architecture parameters by over an order of magnitude. We call this method Fast Probabilistic NAS (FP-NAS). Compared with PARSEC, it can sample 64% fewer architectures and search 2.1× faster. Compared with FBNetV2, FP-NAS is 1.9×-3.5× faster, and the searched models outperform FBNetV2 models on ImageNet. FP-NAS allows us to expand the giant FBNetV2 space to be wider (i.e., larger channel choices) and deeper (i.e., more blocks), while adding the Split-Attention block and enabling the search over the number of splits. When searching a model of size 0.4G FLOPs, FP-NAS is 132× faster than EfficientNet, and the searched FP-NAS-L0 model outperforms EfficientNet-B0 by 0.7% in accuracy. Without using any architecture surrogate or scaling tricks, we directly search large models up to 1.0G FLOPs. Our FP-NAS-L2 model with simple distillation outperforms BigNAS-XL with advanced in-place distillation by 0.7% in accuracy at similar FLOPs.
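The two ideas in the abstract lend themselves to a short sketch. Below is a minimal, hypothetical Python illustration of entropy-adaptive sampling: the number of sampled architectures tracks the total entropy of the per-layer choice distributions, so exploration is broad early and cheap once the distributions sharpen. The function names, the linear scaling rule, and the constants (`scale`, `min_samples`) are illustrative assumptions, not the paper's exact schedule; a trailing comment sketches why the coarse-stage factorized distribution cuts the architecture-parameter count.

```python
import math
import numpy as np

def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    p = np.asarray(probs, dtype=np.float64)
    p = p[p > 0]  # drop zero entries to avoid log(0)
    return float(-(p * np.log(p)).sum())

def adaptive_sample_count(layer_distributions, scale=0.5, min_samples=2):
    """Number of architectures to sample at the current search step.

    Early on, the per-layer choice distributions are near-uniform (high
    entropy), so many architectures are sampled to encourage exploration;
    as learning sharpens the distributions, the count decays toward
    min_samples. `scale` and `min_samples` are illustrative knobs, not
    values from the paper.
    """
    total_entropy = sum(entropy(p) for p in layer_distributions)
    return max(min_samples, math.ceil(scale * total_entropy))

# Example: a 20-layer space with a uniform 8-way choice per layer.
early = [np.full(8, 1.0 / 8) for _ in range(20)]  # total H = 20 * ln(8)
late = [np.eye(8)[0] for _ in range(20)]          # converged, H near 0
print(adaptive_sample_count(early))  # 21 -- explore broadly
print(adaptive_sample_count(late))   # 2  -- distributions have converged

# Coarse-to-fine note: with V searchable variables per layer, each with
# K choices, a joint per-layer distribution needs K**V parameters, while
# a factorized (product-of-marginals) form needs only V * K -- e.g. 24
# vs 512 for V = 3, K = 8 -- which is how the coarse stage reduces the
# architecture parameters by over an order of magnitude.
```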