Trimmer: Cost-Efficient Deep Learning Auto-tuning for Cloud Datacenters

Borowiec, Damian and Yeung, Ging-Fung and Friday, Adrian and Harper, R.H.R. and Garraghan, Peter (2022) Trimmer: Cost-Efficient Deep Learning Auto-tuning for Cloud Datacenters. In: IEEE International Conference on Cloud Computing (CLOUD 22). IEEE. (In Press)

CLOUD22_Trimmer_Author_Accepted_Version.pdf - Accepted Version (1MB)
Available under License Creative Commons Attribution-NonCommercial.

Abstract

Cloud datacenters can provision high-performance Machine Learning-as-a-Service (MLaaS) at reduced resource cost via auto-tuning: the automated tensor program optimization of Deep Learning (DL) models to minimize inference latency on a given hardware device. However, given the extensive heterogeneity of DL models, libraries, and hardware devices, performing auto-tuning within Cloud datacenters incurs a significant time, compute resource, and energy cost, which state-of-the-art auto-tuning is not designed to mitigate. In this paper we propose Trimmer, a high-performance and cost-efficient DL auto-tuning framework for Cloud datacenters. Trimmer maximizes DL model performance and tensor program cost-efficiency by preempting tensor program implementations that exhibit poor optimization improvement, and by applying an ML-based filtering method that replaces expensive, low-performing tensor programs, increasing the likelihood of selecting low-latency tensor programs. Through an empirical study of the cost of DL model optimization techniques, our analysis indicates that 26-43% of total energy is expended on measuring tensor program implementations that do not positively contribute towards auto-tuning. Experiment results show that Trimmer achieves high auto-tuning cost-efficiency across different DL models, and reduces auto-tuning energy use by 21.8-40.9% for Cloud clusters whilst achieving DL model latency equivalent to state-of-the-art techniques.
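To illustrate the two mechanisms the abstract describes, the minimal Python sketch below combines preemption of measurement rounds that stop improving with a learned filter that skips candidates predicted to be slow. It is not taken from the Trimmer paper or any auto-tuning framework; the names (CostModel, measure_latency, auto_tune) and the thresholds (patience, filter_margin) are illustrative assumptions only.

```python
# Illustrative sketch only: a generic auto-tuning loop with (a) preemption of
# tensor-program measurement once improvement stalls and (b) a simple learned
# filter that skips candidates predicted to be slow. All names and thresholds
# are hypothetical and not taken from the Trimmer paper.
import random
from typing import Callable, List, Tuple


class CostModel:
    """Toy latency predictor fit on (feature, latency) pairs already measured."""

    def __init__(self) -> None:
        self.history: List[Tuple[List[float], float]] = []

    def fit(self, features: List[float], latency: float) -> None:
        self.history.append((features, latency))

    def predict(self, features: List[float]) -> float:
        if not self.history:
            return 0.0  # no information yet: never filter
        # Nearest-neighbour guess: latency of the most similar measured candidate.
        def dist(entry: Tuple[List[float], float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(entry[0], features))
        return min(self.history, key=dist)[1]


def auto_tune(
    candidates: List[List[float]],
    measure_latency: Callable[[List[float]], float],
    patience: int = 16,
    filter_margin: float = 1.5,
) -> float:
    """Return the best latency found, spending measurements only where useful."""
    model = CostModel()
    best = float("inf")
    since_improvement = 0
    for features in candidates:
        # ML-based filtering: skip candidates predicted to be much slower
        # than the current best, saving a real hardware measurement.
        if model.predict(features) > filter_margin * best:
            continue
        latency = measure_latency(features)
        model.fit(features, latency)
        if latency < best:
            best = latency
            since_improvement = 0
        else:
            since_improvement += 1
        # Preemption: stop once a run of measurements yields no improvement.
        if since_improvement >= patience:
            break
    return best


if __name__ == "__main__":
    random.seed(0)
    # Synthetic candidates: two features per tensor-program configuration,
    # with a synthetic latency function standing in for hardware measurement.
    cands = [[random.random(), random.random()] for _ in range(200)]
    best = auto_tune(cands, lambda f: 1.0 + 5.0 * f[0] * f[1])
    print(f"best simulated latency: {best:.3f}")
```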

Item Type:
Contribution in Book/Report/Proceedings
Additional Information:
©2022 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Uncontrolled Keywords:
Deep Learning, Cloud Datacenter, MLaaS, Machine Learning Systems, Energy, Sustainable AI
Subjects:
ID Code:
170274
Deposited By:
Deposited On:
12 Oct 2022 14:20
Refereed?:
Yes
Published?:
In Press
Last Modified:
15 Jan 2024 00:27