Authors: Brian Hsiao 1 and Derek Chiou 2,*

JSA-Vol. 4 (2025),

1 University of Illinois at Urbana-Champaign, Urbana, IL, United States.

2 The University of Texas at Austin, Austin, TX, United States.

* Correspondence: derek@utexas.edu

Received: 9 June 2024; Accepted: 13 April 2025; Published: 10 June 2025.

Abstract: Field-programmable gate arrays (FPGAs) offer compelling advantages in performance-per-watt and architectural flexibility, yet their adoption remains limited by long development times and lack of software programmability. GPU overlays have emerged as a promising solution to reduce time-to-solution by enabling software-like programming models on reconfigurable hardware. Recent work on domain-specialized GPU overlays, such as PDL-FGPU, demonstrates that specialization can significantly improve performance for persistent deep learning workloads. However, static specialization introduces performance degradation for non-target workloads and limits portability across application domains. This paper proposes an adaptive multi-domain GPU overlay architecture that supports runtime-configurable macro-functional units, enabling efficient execution across diverse computational domains without sacrificing programmability or development productivity. The proposed architecture integrates domain-aware macro-unit selection, compiler-assisted kernel classification, and lightweight runtime reconfiguration mechanisms. Through architectural analysis, compiler design considerations, and representative case studies, this work demonstrates how adaptive overlays can bridge the gap between performance specialization and general-purpose flexibility. The proposed approach advances FPGA overlay design toward practical, production-ready accelerators for heterogeneous workloads.

Keywords: FPGA overlays, GPU overlay, adaptive accelerators, macro-functional units, runtime reconfiguration, heterogeneous computing
