Off-Chip Memory Allocation for Neural Processing Units

Description

Publication type: journal article

Year of publication: 2024

DOI: 10.1109/access.2024.3352900

Abstract: Many modern Systems-on-Chip (SoCs) are equipped with specialized Machine Learning (ML) accelerators that use both on-chip and off-chip memory to execute neural networks. While on-chip memory usually has a hard limit, off-chip memory is often considered large enough to hold the network's inputs, outputs, weights, and any intermediate results that may occur during model execution. This assumption may not hold for edge devices, such as smartphones, which usually have a limit on the amount of memory a process can use. In this study, we propose a novel approach for minimizing a neural network's off-chip memory usage by introducing a tile-aware allocator capable of reusing memory occupied by parts of a tensor before the entire tensor expires. We describe the necessary conditions for such an off-chip memory allocation approach and provide the results, showing that it can save up to 33% of the peak off-chip memory usage in some common network architectures.
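The paper itself is not reproduced here, so the details of the allocator are unknown; still, the core idea in the abstract (letting individual tiles of a tensor expire before the whole tensor does, so their space can be reused) can be illustrated with a minimal sketch. The greedy first-fit placement below, the `allocate` helper, and the buffer sizes and lifetimes are all illustrative assumptions, not the authors' algorithm.

```python
# Sketch of lifetime-based buffer placement. Each buffer is (size, first_use,
# last_use); two buffers may share address space only if their lifetimes are
# disjoint. A "tile-aware" view models each tile as its own buffer with a
# shorter lifetime than the parent tensor.

def allocate(buffers):
    """Greedy first-fit placement. Returns (offsets, peak_memory)."""
    placed = []   # (offset, size, first_use, last_use)
    offsets = []
    for size, first, last in buffers:
        # Only buffers whose lifetimes overlap this one constrain placement.
        conflicts = sorted(
            (off, sz) for off, sz, f, l in placed
            if not (l < first or last < f)
        )
        offset = 0
        for off, sz in conflicts:
            if offset + size <= off:
                break                      # found a gap below this conflict
            offset = max(offset, off + sz) # otherwise skip past it
        placed.append((offset, size, first, last))
        offsets.append(offset)
    peak = max((off + sz for off, sz, _, _ in placed), default=0)
    return offsets, peak

# Whole-tensor lifetimes: a 4-unit input live over steps [0, 3] and a
# 4-unit output live over [2, 4] cannot overlap in space -> peak 8.
_, peak_whole = allocate([(4, 0, 3), (4, 2, 4)])

# Tile-aware lifetimes: the input is split into four 1-unit tiles, each
# expiring as soon as it is consumed, so output tiles can reuse its space.
in_tiles = [(1, 0, 0), (1, 1, 1), (1, 2, 2), (1, 3, 3)]
out_tiles = [(1, 2, 4), (1, 3, 4), (1, 4, 4), (1, 4, 4)]
_, peak_tiled = allocate(in_tiles + out_tiles)
```

In this toy schedule the tile-aware view halves the peak (4 units instead of 8); the abstract reports savings of up to 33% on real network architectures, where tensor lifetimes overlap less conveniently.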

Full-text links

Publication

Journal: IEEE Access

Volume: 12

Pages: 9931-9939

Journal ISSN: 2169-3536

Publisher: Institute of Electrical and Electronics Engineers Inc.

Authors

  • Kvochko Andrey
  • Maltsev Evgenii
  • Balyshev Artem
  • Malakhov Stanislav
  • Efimov Alexander

Indexed in databases