Design and Emulation Methodology for Atomic-Scale Systolic Arrays: An LLM Accelerator Case Study in Silicon DB Logic

Published in IEEE International Conference on Nanotechnology (IEEE NANO), 2026

Large language models (LLMs) increasingly strain memory bandwidth and compute resources as CMOS scaling plateaus. Emerging technologies such as atomic-scale computing with silicon dangling bonds (DBs) promise ultra-dense, low-power logic, yet application-level validation still lacks an executable, clock-driven hardware emulation framework. To address this gap, this work introduces a cross-layer flow that compiles register-transfer level (RTL) Verilog to a clock-driven, Verilator-based emulator exposed to Python via a co-simulation hardware abstraction layer (HAL). DB-aware RTL rules formalized in this work ensure representative emulation across the full systolic array, while allowing the same RTL to drive logic synthesis through fiction, a technology-specific EDA toolkit, to yield dot-accurate DB layouts. As a representative use case, a ternary DB matrix multiply unit (MXU) is designed in Verilog to target BitNet b1.58 acceleration. The design achieves up to 34× area reduction compared to prior DB MXUs and generates LLM tokens under cycle-accurate software emulation while matching GPU-baseline outputs. This bridges layout-centric studies and workload-driven evaluation, enabling reproducible, cross-layer accelerator design for this emerging technology.
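To make the idea of a clock-driven ternary MXU concrete, the following is a minimal pure-Python sketch of a systolic array whose weights are restricted to {-1, 0, +1}, as in BitNet b1.58. It is not the paper's RTL, HAL, or Verilator model: the output-stationary dataflow, the input skewing scheme, and the names `ternary_mac` and `systolic_matmul` are all assumptions chosen for illustration.

```python
def ternary_mac(acc, a, w):
    """One processing-element update. With w in {-1, 0, +1} the
    multiply collapses to add, subtract, or hold."""
    if w == 1:
        return acc + a
    if w == -1:
        return acc - a
    return acc  # w == 0: accumulator is unchanged


def systolic_matmul(A, W):
    """Emulate C = A @ W cycle by cycle on a hypothetical n x m
    output-stationary array. A is n x k (integers), W is k x m (ternary).
    Returns the accumulators and the number of clock cycles used."""
    n, k, m = len(A), len(A[0]), len(W[0])
    acc = [[0] * m for _ in range(n)]
    a_reg = [[0] * m for _ in range(n)]  # activations shift left to right
    w_reg = [[0] * m for _ in range(n)]  # weights shift top to bottom
    cycles = k + n + m - 2               # fill, stream, and drain
    for c in range(cycles):
        # Compute every register's next value from the current state,
        # so that all elements update on the same clock edge.
        new_a = [[0] * m for _ in range(n)]
        new_w = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    t = c - i  # row i enters i cycles late (skew)
                    new_a[i][j] = A[i][t] if 0 <= t < k else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:
                    t = c - j  # column j enters j cycles late (skew)
                    new_w[i][j] = W[t][j] if 0 <= t < k else 0
                else:
                    new_w[i][j] = w_reg[i - 1][j]
        a_reg, w_reg = new_a, new_w
        for i in range(n):
            for j in range(m):
                acc[i][j] = ternary_mac(acc[i][j], a_reg[i][j], w_reg[i][j])
    return acc, cycles


def reference_matmul(A, W):
    """Plain matrix product used as the software baseline."""
    return [[sum(a * W[t][j] for t, a in enumerate(row))
             for j in range(len(W[0]))] for row in A]


A = [[1, 2, 3], [4, 5, 6]]
W = [[1, 0], [-1, 1], [0, -1]]
result, cycles = systolic_matmul(A, W)
assert result == reference_matmul(A, W)
```

Comparing the emulated accumulators against a plain matrix product mirrors, at toy scale, the kind of check the abstract describes between emulated outputs and a GPU baseline.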
