Preprint / Version 1

Adaptive Neural Architecture Search for Task-Specific Model Compression in Edge AI Applications

Authors

  • Madhuri Margam, Capital One, United States

Keywords:

Neural Architecture Search, Model Compression, Edge AI, Lightweight Models, Inference Latency, Adaptive Optimization

Abstract

The growing demand for real-time artificial intelligence on edge devices has intensified the need for compact yet high-performing deep learning models. Traditional model compression techniques such as pruning, quantization, and knowledge distillation, while useful, often fail to generalize across diverse tasks and edge hardware configurations. Neural Architecture Search (NAS) has emerged as a promising alternative by automating the design of efficient architectures. In this paper, we propose an Adaptive NAS framework specifically tailored for task-specific model compression in Edge AI applications.

The core contribution of this study lies in integrating task profiling into the NAS process to generate lightweight architectures optimized for specific deployment environments. We evaluate our method on real-world edge devices, such as the Jetson Nano and Raspberry Pi 4, across classification and detection tasks. Experimental results show that our approach consistently outperforms hand-designed lightweight models in accuracy, latency, and energy efficiency. The proposed framework provides a scalable and intelligent path toward edge-centric AI deployment.
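The abstract does not specify the search objective, but hardware-aware NAS frameworks of this kind typically rank candidate architectures with a multi-objective score that trades task accuracy against measured on-device latency. The sketch below illustrates one common formulation (a MnasNet-style soft latency constraint); the `Candidate` class, the target-latency value, and the exponent `w` are illustrative assumptions, not details taken from this paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate architecture, profiled on the target edge device."""
    accuracy: float    # top-1 accuracy on the target task, in [0, 1]
    latency_ms: float  # measured inference latency on-device


def nas_score(c: Candidate, target_ms: float = 50.0, w: float = -0.07) -> float:
    # Soft-constrained objective: accuracy * (latency / target)^w.
    # With w < 0, exceeding the latency budget discounts the score,
    # while faster-than-budget models earn a mild bonus.
    return c.accuracy * (c.latency_ms / target_ms) ** w


# Rank hypothetical candidates under a 50 ms latency budget.
candidates = [
    Candidate(accuracy=0.76, latency_ms=80.0),  # accurate but too slow
    Candidate(accuracy=0.74, latency_ms=40.0),  # balanced
    Candidate(accuracy=0.70, latency_ms=20.0),  # fast but less accurate
]
best = max(candidates, key=nas_score)
```

In this toy ranking, the balanced candidate wins: the score penalizes the over-budget model despite its higher raw accuracy, which is the behavior a task-profiling NAS loop needs when targeting devices like the Jetson Nano or Raspberry Pi 4.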

References

[1] Han, S., Pool, J., Tran, J., & Dally, W. (2015). Learning both weights and connections for efficient neural networks. NIPS, 28, 1135–1143.

[2] Howard, A. G., et al. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

[3] Sandler, M., et al. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. CVPR, 4510–4520.

[4] He, K., et al. (2016). Deep residual learning for image recognition. CVPR, 770–778.

[5] Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. ICML, 6105–6114.

[6] Zoph, B., & Le, Q. V. (2016). Neural architecture search with reinforcement learning. ICLR.

[7] Gajula, S. (2024). Adaptive zero trust architecture for securing financial microservices. Computer Fraud & Security, 2024(12), 643–655. https://doi.org/10.52710/CFS.845

[8] Liu, H., et al. (2018). DARTS: Differentiable architecture search. ICLR.

[9] Iandola, F. N., et al. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters. arXiv preprint arXiv:1602.07360.

[10] Wu, B., et al. (2019). FBNet: Hardware-aware efficient convnet design via differentiable neural architecture search. CVPR, 10734–10742.

[11] Cai, H., et al. (2019). ProxylessNAS: Direct neural architecture search on target task and hardware. ICLR.

[12] Elsken, T., et al. (2019). Neural architecture search: A survey. JMLR, 20(55), 1–21.

[13] Li, H., et al. (2017). Pruning filters for efficient convnets. ICLR.

[14] Yang, T., et al. (2018). NetAdapt: Platform-aware neural network adaptation for mobile applications. ECCV, 285–300.

[15] Gajula, S. (2024). Cybersecurity risk prediction using graph neural networks. Journal of Information Systems Engineering and Management, 9(4), 3301–3315. https://doi.org/10.52783/JISEM.V9I4S.13885

[16] Choi, Y., et al. (2018). Channel pruning for accelerating very deep neural networks. CVPR, 2730–2738.

[17] Zhou, A., et al. (2019). Heuristic NAS: Efficient neural architecture search with deep reinforcement learning. AAAI, 5666–5673.

Posted

2025-06-12