KuanNet — Multi-Agent RL with Echo State Networks for Chiplet TSV Assignment

KuanNet

Knowledge-Unified Attention Neural Network — Multi-Agent Reinforcement Learning with Echo State Networks for Chiplet TSV Assignment

Published — IEEE TVLSI 2026

Xiaomeng Wang¹,*, Zhen Zhou², Yang Yi¹

¹Bradley Dept. of ECE & Institute for Advanced Computing, Virginia Tech · ²Intel Corporation, Chandler, AZ · *Corresponding author

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2026


Venue: IEEE TVLSI 2026

Accepted: 30 March 2026

DOI: 10.1109/TVLSI.2026.3681746

Code: github.com/xmwa

Abstract

Chiplet-based architectures require efficient Through-Silicon Via (TSV) assignment to optimize interconnect performance and system integration. Unlike traditional 3D integrated circuits, heterogeneous chiplet systems demand coordination across dies with varying sizes and functionalities, creating exponentially complex solution spaces that challenge existing optimization methods.

This paper introduces KuanNet (Knowledge-Unified Attention Neural Network), a multi-agent reinforcement learning framework integrating Echo State Networks (ESN) with attention mechanisms for chiplet TSV assignment. The key innovation is a knowledge-unified architecture with temporal-static decomposition: temporal features shared across agents are processed through both ESN reservoirs and skip connections, while static features remain agent-private, enabling coordinated decisions with temporal memory and spatial awareness.

Building on multi-agent deep deterministic policy gradient (MADDPG) with K-head attention critics, KuanNet demonstrates superior optimization performance over the state-of-the-art baseline across standard benchmark circuits of varying scale and complexity. Ablation studies validate individual component contributions of the KuanNet architecture.

Keywords

Chiplet design · Heterogeneous integration · Through-silicon via (TSV) · TSV assignment · Placement optimization · Multi-agent reinforcement learning · Echo state network · Attention mechanism

Key Results

Evaluated on five industry-standard benchmarks (MCNC ami33, ami49; GSRC n100, n200, n300) across 3-tier and 4-tier configurations for both homogeneous 3D IC and heterogeneous chiplet topologies — 20 benchmark-design combinations in total.
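The count of 20 follows from the full cross product of benchmarks, tier counts, and topologies. A minimal sketch of that enumeration is shown here; the variable names and topology labels are illustrative and are not taken from the released code.

```python
from itertools import product

BENCHMARKS = ["ami33", "ami49", "n100", "n200", "n300"]  # MCNC and GSRC circuits
TIERS = [3, 4]                                            # 3-tier and 4-tier stacks
TOPOLOGIES = ["3d_ic", "chiplet"]                         # labels are hypothetical

# Every benchmark is evaluated at every tier count and for both topologies.
combinations = list(product(BENCHMARKS, TIERS, TOPOLOGIES))
print(len(combinations))  # 5 * 2 * 2 = 20
```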

20×–223× larger wirelength reductions vs. the state-of-the-art ATT-TA baseline on 3-tier 3D IC configurations (average: 76×).

12×–266× larger wirelength reductions on 4-tier configurations (average: 82×).

4–6× fewer trainable temporal parameters than LSTM / GRU alternatives, because fixed ESN reservoirs require only a linear readout.

Rank 1.50: ESN achieves the best overall average rank across four design configurations against LSTM, GRU, state-history concatenation, and EMA baselines.

Why it works — in one paragraph

Off-policy multi-agent RL with a shared replay buffer makes backpropagation-through-time (BPTT) infrastructurally impractical: each agent needs temporal reasoning, but you can't afford to store trajectories and run truncated BPTT across a MADDPG replay batch. KuanNet sidesteps the whole problem by using a fixed Echo State Network reservoir for the temporal pathway — only the linear readout is trained. This drops a temporal module into any feedforward-based multi-agent architecture with zero changes to the training loop, loss functions, or replay buffer, while still providing history-dependent reasoning that plain feedforward networks lack.
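To make the "fixed reservoir" idea concrete, here is a dependency-free sketch of an Echo State Network state update. It is an illustration under stated assumptions, not the paper's implementation: the reservoir size, the uniform weight initialization, the row-sum scaling rule, and the names `make_reservoir` and `step` are all hypothetical. A real implementation would likely use PyTorch tensors.

```python
import math
import random


def make_reservoir(n_in, n_res, rho=0.9, seed=0):
    """Build fixed (never trained) input and recurrent weights.

    Each row of W is scaled so its absolute sum equals rho. The maximum
    absolute row sum upper-bounds the spectral radius, so rho < 1 is a
    simple sufficient condition for fading memory with tanh units.
    """
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_res)]
    w = [[rng.uniform(-1, 1) for _ in range(n_res)] for _ in range(n_res)]
    for row in w:
        scale = rho / sum(abs(v) for v in row)
        row[:] = [v * scale for v in row]
    return w_in, w


def step(w_in, w, state, u):
    """One reservoir update: x' = tanh(W_in u + W x). No gradients involved."""
    return [
        math.tanh(
            sum(a * b for a, b in zip(w_in[i], u))
            + sum(a * b for a, b in zip(w[i], state))
        )
        for i in range(len(state))
    ]


w_in, w = make_reservoir(n_in=2, n_res=16)
inputs = [[math.sin(0.1 * t), math.cos(0.1 * t)] for t in range(200)]

# Drive two different starting states with the same input sequence.
x_a, x_b = [0.5] * 16, [-0.5] * 16
for u in inputs:
    x_a, x_b = step(w_in, w, x_a, u), step(w_in, w, x_b, u)

gap = max(abs(a - b) for a, b in zip(x_a, x_b))
print(gap < 1e-6)  # True: the state depends on input history, not the start
```

Because `w_in` and `w` are never updated, the only trainable part is whatever linear layer reads the state, which is why no backpropagation through time is needed in this setup.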

Method at a glance

Problem — what's being optimized

For each net crossing a die-to-die interface, routing distance is computed via a minimum-spanning-tree (MST) heuristic over the TSV positions and net pins. The objective sums MST wirelength contributions across every net and every interface; each agent perturbs its assigned TSV location to minimize the shared total.
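The objective described above can be sketched with Prim's algorithm. This is a minimal illustration, assuming Manhattan distance between grid points and a hypothetical data layout in which each net is a list of (x, y) points containing its TSV position and pins; the source does not specify the distance metric or the data structures.

```python
def mst_wirelength(points):
    """Total edge length of a minimum spanning tree (Prim's algorithm, O(n^2))."""
    if len(points) < 2:
        return 0

    def dist(a, b):
        # Manhattan distance is an assumption made for this sketch.
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    # best[i] = cheapest known edge from the growing tree to point i
    best = {i: dist(points[0], points[i]) for i in range(1, len(points))}
    total = 0
    while best:
        nxt = min(best, key=best.get)   # closest point not yet in the tree
        total += best.pop(nxt)
        for i in best:
            best[i] = min(best[i], dist(points[nxt], points[i]))
    return total


def total_wirelength(nets):
    """Shared objective: sum of per-net MST lengths."""
    return sum(mst_wirelength(points) for points in nets)


nets = [
    [(0, 0), (0, 3), (4, 0)],  # MST uses the edges of length 3 and 4
    [(1, 1), (2, 2)],          # a single edge of length 2
]
print(total_wirelength(nets))  # 9
```

Moving one TSV changes only the MSTs of the nets that contain it, so an agent's move can in principle be scored by recomputing just those nets.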

Figure 1. Minimum spanning tree (MST) for TSV routing distance computation: 35 TSV locations connected via MST structure (red edges) to minimize total wirelength.

Temporal-static decomposition

Observations are split into two streams:

Temporal features (shared across agents): wirelength evolution, optimization-trajectory history. Processed through both an ESN reservoir and a skip connection, giving coordinated decisions with temporal memory.
Static features (agent-private): local spatial position, die-boundary neighborhood. Keeps each agent's policy focused on its own chiplet interface without conflating spatial context across dissimilar dies.

Figure 2. Knowledge-unified neural network architecture. Temporal features connect to the ESN reservoir via curved pathways (left) while also providing skip connections to the readout layer. The ESN reservoir state, temporal features, and static features are unified through concatenation before the final readout transformation.
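The concatenate-then-readout step in the caption can be written in a few lines. The sketch below is illustrative only: the feature sizes, the random readout weights, and the function names `linear_readout` and `unified_forward` are assumptions, and the real model would train the readout rather than sample it.

```python
import random


def linear_readout(weights, bias, features):
    """Trainable part: one affine map from the unified features to outputs."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def unified_forward(reservoir_state, temporal, static, weights, bias):
    """Concatenate the three streams, then apply the linear readout.

    The temporal features appear twice in the data flow: once inside the
    reservoir state and once directly here (the skip connection).
    """
    features = list(reservoir_state) + list(temporal) + list(static)
    return linear_readout(weights, bias, features)


rng = random.Random(0)
n_res, n_temporal, n_static, n_out = 16, 4, 3, 5   # hypothetical sizes
n_features = n_res + n_temporal + n_static
weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in range(n_out)]
bias = [0.0] * n_out

reservoir_state = [0.0] * n_res      # for example, before any history exists
temporal = [0.2, -0.1, 0.4, 0.0]     # shared across agents
static = [1.0, 0.5, -0.5]            # private to this agent
out = unified_forward(reservoir_state, temporal, static, weights, bias)
print(len(out))  # 5
```

Even with an all-zero reservoir state, the temporal features still reach the readout through the skip path, which is the point of the dual pathway.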

Training backbone

Multi-Agent Deep Deterministic Policy Gradient (MADDPG) with K-head attention critics. Training runs for 20,000 episodes of 50 steps each. Exploration uses Gumbel-Softmax with the temperature annealed from 3 to 0.01. Implemented in PyTorch on Apple Silicon (M4, Metal Performance Shaders).
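A small sketch of Gumbel-Softmax sampling with an annealed temperature follows. The start and end temperatures and the episode count come from the text above; the exponential shape of the schedule is an assumption (the source does not say whether the decay is linear, exponential, or stepped), and the function names are hypothetical.

```python
import math
import random

TAU_START, TAU_END, EPISODES = 3.0, 0.01, 20_000


def temperature(episode):
    """Exponential decay from TAU_START to TAU_END (schedule shape assumed)."""
    frac = episode / (EPISODES - 1)
    return TAU_START * (TAU_END / TAU_START) ** frac


def gumbel_softmax(logits, tau, rng=random):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = rng.uniform(1e-12, 1.0 - 1e-12)        # keep both logs finite
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)                               # subtract max for stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
logits = [0.5, 1.5, -0.3, 0.0]
early = gumbel_softmax(logits, temperature(0), rng)              # smooth, exploratory
late = gumbel_softmax(logits, temperature(EPISODES - 1), rng)    # close to one-hot
print(round(sum(early), 6), round(sum(late), 6))
```

High temperatures spread probability mass across actions, while low temperatures concentrate it on the largest perturbed logit, so annealing moves the agents from exploration toward near-deterministic choices.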

Figure 3. KuanNet multi-agent architecture with knowledge-unified processing. Each agent maintains an actor–critic pair with ESN + MLP dual-pathway input layers. The attention mechanism (zoomed view) processes all agents' actions for coordination.
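The page does not spell out how the K-head attention critics are formulated, so the following is only a generic sketch of one head of standard scaled dot-product attention over per-agent encodings. The function name `attend` and the toy vectors are hypothetical; in a K-head critic, K such heads with separate learned projections would typically run in parallel and be concatenated.

```python
import math


def attend(query, keys, values):
    """Single-head scaled dot-product attention.

    query:  encoding for the agent whose critic is being evaluated
    keys:   one encoding per other agent (used to score relevance)
    values: one encoding per other agent (mixed by the attention weights)
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    peak = max(scores)                          # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return weights, context


# Identical keys give uniform weights, so the context is the mean of the values.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
values = [[3.0, 0.0], [0.0, 3.0], [3.0, 3.0]]
weights, context = attend(query, keys, values)
print(weights, context)
```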

Action space

Each agent chooses between a local 8-neighborhood move or one of 3 randomly-sampled distant candidate locations — combining local refinement with global exploration. Sensitivity sweeps confirm 8-neighborhood × 3 distant candidates as the robust operating point.
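One way to build this candidate set is sketched below. It assumes a grid of (x, y) cells, treats a move as valid when the target cell is in a set of free cells, and samples the distant candidates uniformly from free cells outside the local neighborhood. These rules and the name `candidate_moves` are assumptions for illustration; the source does not describe how validity or distant sampling is implemented.

```python
import random

# The 8 surrounding cells of a grid position.
NEIGHBOR_OFFSETS = [
    (dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)
]


def candidate_moves(pos, free_cells, n_distant=3, rng=random):
    """Return (local, distant) candidate TSV locations for one agent."""
    local = [(pos[0] + dx, pos[1] + dy) for dx, dy in NEIGHBOR_OFFSETS]
    local = [cell for cell in local if cell in free_cells]

    # Distant candidates come from free cells that are not already local.
    excluded = set(local) | {pos}
    pool = sorted(cell for cell in free_cells if cell not in excluded)
    distant = rng.sample(pool, min(n_distant, len(pool)))
    return local, distant


free_cells = {(x, y) for x in range(10) for y in range(10)}
local, distant = candidate_moves((5, 5), free_cells, rng=random.Random(0))
print(len(local), len(distant))  # 8 3
```

With an unobstructed interior position this yields 8 local plus 3 distant options, 11 discrete choices in total; near die edges or occupied cells the local set shrinks.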

Benchmarks & Setup

Benchmark	Source	Blocks	Configurations
ami33	MCNC	33	3-tier / 4-tier 3D IC + heterogeneous chiplet
ami49	MCNC	49	3-tier / 4-tier 3D IC + heterogeneous chiplet
n100	GSRC	100	3-tier / 4-tier 3D IC + heterogeneous chiplet
n200	GSRC	200	3-tier / 4-tier 3D IC + heterogeneous chiplet
n300	GSRC	300	3-tier / 4-tier 3D IC + heterogeneous chiplet

Initial floorplans are generated with FlexPlanner. Initial TSV placement uses greedy centroid-based allocation to the closest valid empty grid location, with a random fallback. Chiplet benchmarks are available at github.com/xmwa/placement_datasets.
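The greedy initial placement could look roughly like the sketch below. Several details are assumptions made for illustration: the centroid is the mean of a net's pin coordinates, closeness is measured with Manhattan distance, and the random fallback fires when a net has no pins to average. The source does not state the fallback condition or the metric, and `initial_tsv` is a hypothetical name.

```python
import random


def initial_tsv(pins, empty_cells, rng=random):
    """Pick a starting TSV cell for one net and mark it as used."""
    if not empty_cells:
        raise ValueError("no empty grid location left for a TSV")
    if pins:
        cx = sum(x for x, _ in pins) / len(pins)
        cy = sum(y for _, y in pins) / len(pins)
        # Greedy choice: the empty cell nearest the pin centroid.
        cell = min(
            sorted(empty_cells),
            key=lambda c: abs(c[0] - cx) + abs(c[1] - cy),
        )
    else:
        # Assumed fallback: no centroid available, so pick any empty cell.
        cell = rng.choice(sorted(empty_cells))
    empty_cells.remove(cell)  # later nets can no longer take this cell
    return cell


empty = {(2, 3), (9, 9), (0, 0)}
first = initial_tsv([(0, 0), (4, 4)], empty)    # centroid (2, 2)
second = initial_tsv([(0, 0), (4, 4)], empty)   # (2, 3) is now taken
print(first, second)  # (2, 3) (0, 0)
```

Because cells are consumed in order, the result depends on the order in which nets are processed, which is typical of greedy allocation.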

Figure 5. Structural comparison of 3D IC (left) vs. Chiplet (right) architectures using the GSRC n300 benchmark. The 3D IC stacks four identical dies vertically, while the Chiplet architecture features a horizontally-split top layer (5 total chiplets). Colored rectangles represent modules; white regions show empty spaces for TSV placement.

Downloads

Paper: IEEE Xplore

Paper PDF: Full text (PDF)

Benchmarks: placement_datasets

Code: Coming soon

Slides: Coming soon

Awesome list: awesome-chiplets

Reproducibility. Experiments were run on an Apple Mac mini M4 (16 GB unified memory) with Python 3.12, PyTorch with the Metal Performance Shaders (MPS) backend, and Hydra for configuration management. Training uses 20,000 episodes of 50 steps each with a 1M-entry replay buffer.
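For readers unfamiliar with replay buffers, a minimal fixed-capacity version is sketched below. It is a generic ring buffer with uniform sampling, not the authors' code; the class name, the transition layout, and the sampling strategy are assumptions. Note that 20,000 episodes of 50 steps give 1,000,000 environment steps, so if one joint transition is stored per step, a 1M-entry buffer would hold the whole run.

```python
import random

EPISODES, STEPS_PER_EPISODE = 20_000, 50
TOTAL_STEPS = EPISODES * STEPS_PER_EPISODE   # 1,000,000 environment steps


class ReplayBuffer:
    """Fixed-capacity ring buffer of single transitions (no trajectories)."""

    def __init__(self, capacity=1_000_000, seed=None):
        self.capacity = capacity
        self._data = []
        self._next = 0                       # index of the next slot to write
        self._rng = random.Random(seed)

    def push(self, transition):
        if len(self._data) < self.capacity:
            self._data.append(transition)
        else:
            self._data[self._next] = transition   # overwrite the oldest entry
        self._next = (self._next + 1) % self.capacity

    def sample(self, batch_size):
        """Uniform sample without replacement."""
        return self._rng.sample(self._data, batch_size)

    def __len__(self):
        return len(self._data)


buffer = ReplayBuffer(capacity=3, seed=0)
for step_id in range(5):
    buffer.push(step_id)                     # stand-in for (obs, act, rew, next_obs)
print(len(buffer), sorted(buffer.sample(3)))  # 3 [2, 3, 4]
```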

How to cite

Plain text

Xiaomeng Wang, Zhen Zhou, and Yang Yi. "Finetune Chiplet Design Floorplan via KuanNet," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2026.

BibTeX

@article{wang2026kuannet,
  title   = {Finetune Chiplet Design Floorplan via {KuanNet}},
  author  = {Wang, Xiaomeng and Zhou, Zhen and Yi, Yang},
  journal = {IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
  year    = {2026},
  doi     = {10.1109/TVLSI.2026.3681746},
  url     = {https://doi.org/10.1109/TVLSI.2026.3681746}
}

Authors

Xiaomeng Wang ORCID 0000-0001-8822-003X

Ph.D. Candidate, Bradley Department of ECE & Institute for Advanced Computing, Virginia Tech. Research focus: ML-driven chiplet / 3D-IC physical design, reservoir computing for EDA. Corresponding author. Contact: scholar@wangxm.com · www.wangxm.com · github.com/xmwa

Zhen Zhou ORCID 0000-0002-3014-8167 · Senior Member, IEEE

Intel Corporation, Chandler, AZ.

Yang Yi ORCID 0000-0002-1354-0204 · Senior Member, IEEE

Professor, Bradley Department of ECE & Institute for Advanced Computing, Virginia Tech. BRICCS Lab.

More from the authors

R2CTA: Reinforcement Learning and Reservoir Computing based Chiplets TSV Assignment · Paper

Xiaomeng Wang and Yang Yi. In Proc. 26th International Symposium on Quality Electronic Design (ISQED), pp. 1–7, IEEE, 2025. Project page →

Transforming AI Landscape with Neuromorphic Computing and Chiplets · Book chapter

Xiaomeng Wang, Zhen Zhou, and Yang Yi. In Energy-Efficient Devices and Circuits for Neuromorphic Computing, pp. 405–428, Elsevier, 2026.

Practical Tips for Machine Learning Research and Development · Blog

X. Wang. (2025). Practical Tips for Machine Learning Research and Development. [Online]. Available: blog.wangxm.com/2025/02/practical-tips-for-machine-learning-research-and-development/

Acknowledgments

This work was supported in part by the U.S. National Science Foundation (NSF) under Grants CCF-1750450, ECCS-1731928, ECCS-2128594, ECCS-2314813, and CCF-1937487.

The authors thank the BRICCS Lab at Virginia Tech for computational resources and technical discussions.

With thanks to lab members for discussions and feedback: Alberta Dadeboe, Meizi Song, Zeyuan Hou, and Md Rubel Sarkar.

This work is dedicated to my mother, YuKuan — the name of this framework carries hers. Thank you, Mom, for everything.