Deploying deep learning models for computer vision (CV) often requires a careful trade-off between performance, efficiency, and cost. For edge inference, this trade-off becomes particularly acute: the computing and memory constraints imposed by the hardware severely limit the size and architecture of deployable models. These constraints are exacerbated when a single edge device must concurrently host several vision models running in series or in parallel. In this work, we benchmark model compression strategies for the joint deployment of multiple CV models in two steps. First, we consider the problem of detecting human faces under unfavorable imaging conditions as a prototypical CV task requiring the concurrent deployment of multiple image restoration and detection models. Second, we evaluate the performance of pruning and quantization techniques for model compression in the context of this prototypical restoration-and-detection multi-model system, and propose Joint Multi-Model Compression (JMMC), an adaptation of Quantization Aware Training (QAT) and pruning in which the multi-model system is fine-tuned as a single unit with an adapted loss function.
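To make the ingredients of the proposed approach concrete, the sketch below illustrates the two compression primitives named above (the quantize-dequantize step used in QAT and a magnitude-pruning mask) together with a joint loss coupling both stages of the pipeline. This is a minimal NumPy illustration, not the paper's implementation: the function names, the uniform affine quantization scheme, and the weighted-sum form of the joint loss are assumptions for exposition.

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    # QAT-style quantize-dequantize: round weights to a uniform
    # affine grid, then map back to floats (assumed 8-bit scheme).
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = qmin - np.round(w.min() / scale)
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

def magnitude_prune_mask(w, sparsity=0.5):
    # Unstructured magnitude pruning: zero out the smallest-magnitude
    # fraction `sparsity` of the weights via a binary mask.
    k = int(sparsity * w.size)
    thresh = np.sort(np.abs(w).ravel())[k]
    return (np.abs(w) >= thresh).astype(w.dtype)

def joint_loss(restoration_loss, detection_loss, alpha=0.5):
    # Hypothetical adapted loss for fine-tuning the multi-model
    # system as a single unit: a weighted sum over both stages.
    return alpha * restoration_loss + (1.0 - alpha) * detection_loss
```

In a full JMMC-style training loop, the fake-quantization and masking would be applied inside the forward pass of both models, and gradients of the joint loss would flow end to end through the restoration and detection stages.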