Abstract
Chiplet-based architectures require efficient Through-Silicon Via (TSV) assignment to optimize interconnect performance and system integration. Unlike traditional 3D integrated circuits, heterogeneous chiplet systems demand coordination across dies of varying sizes and functionalities, producing exponentially large solution spaces that challenge existing optimization methods.
This paper introduces KuanNet (Knowledge-Unified Attention Neural Network), a multi-agent reinforcement learning framework integrating Echo State Networks (ESN) with attention mechanisms for chiplet TSV assignment. The key innovation is a knowledge-unified architecture with temporal-static decomposition: temporal features shared across agents are processed through both ESN reservoirs and skip connections, while static features remain agent-private, enabling coordinated decisions with temporal memory and spatial awareness.
Building on multi-agent deep deterministic policy gradient (MADDPG) with K-head attention critics, KuanNet demonstrates superior optimization performance over the state-of-the-art baseline across standard benchmark circuits of varying scale and complexity. Ablation studies validate individual component contributions of the KuanNet architecture.
Key Results
KuanNet is evaluated on five industry-standard benchmarks (MCNC ami33, ami49; GSRC n100, n200, n300) across 3-tier and 4-tier configurations for both homogeneous 3D IC and heterogeneous chiplet topologies, giving 20 benchmark-design combinations in total (5 benchmarks × 2 tier counts × 2 topologies).
Why it works — in one paragraph
Off-policy multi-agent RL with a shared replay buffer makes backpropagation through time (BPTT) infrastructurally impractical. Each agent needs temporal reasoning, but the replay buffer samples individual transitions, so training a recurrent policy would mean storing whole trajectories and running truncated BPTT across every MADDPG replay batch, which you can't afford. KuanNet sidesteps the problem by using a fixed Echo State Network reservoir for the temporal pathway: the reservoir weights never change and only the linear readout is trained. This drops a temporal module into any feedforward-based multi-agent architecture with zero changes to the training loop, loss functions, or replay buffer, while still providing the history-dependent reasoning that plain feedforward networks lack.
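To make the "fixed reservoir, trained readout" idea concrete, here is a minimal stdlib-only sketch of a leaky Echo State Network. It is an illustration, not KuanNet's code: the reservoir size, leak rate, and the choice to scale the recurrent matrix by its infinity norm (a simple sufficient condition for fading memory, used instead of spectral-radius scaling to avoid a linear-algebra dependency) are all assumptions, and the readout is fit with plain squared-error SGD purely for illustration, whereas in KuanNet it would be trained by the MADDPG losses.

```python
import math
import random
from typing import List, Sequence


class EchoStateReservoir:
    """Fixed random reservoir: weights are drawn once and never trained."""

    def __init__(self, n_inputs: int, n_reservoir: int, leak: float = 0.3,
                 contraction: float = 0.9, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.leak = leak
        self.w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                     for _ in range(n_reservoir)]
        w = [[rng.uniform(-1.0, 1.0) for _ in range(n_reservoir)]
             for _ in range(n_reservoir)]
        # Rescale so the largest absolute row sum (infinity norm) equals `contraction` < 1.
        # tanh is 1-Lipschitz, so the update below is then a contraction in the
        # infinity norm: a sufficient condition for the echo state property.
        norm = max(sum(abs(v) for v in row) for row in w)
        self.w = [[v * contraction / norm for v in row] for row in w]
        self.state = [0.0] * n_reservoir

    def step(self, u: Sequence[float]) -> List[float]:
        """Leaky update: x <- (1 - leak) * x + leak * tanh(W_in u + W x)."""
        pre = [sum(a * b for a, b in zip(row_in, u)) +
               sum(a * b for a, b in zip(row_w, self.state))
               for row_in, row_w in zip(self.w_in, self.w)]
        self.state = [(1.0 - self.leak) * x + self.leak * math.tanh(p)
                      for x, p in zip(self.state, pre)]
        return list(self.state)


class LinearReadout:
    """The only trainable part: y = w . x + b."""

    def __init__(self, n_features: int) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def sgd_step(self, x: Sequence[float], target: float, lr: float = 0.05) -> float:
        """One squared-error SGD step; returns the error before the update."""
        err = self.predict(x) - target
        self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= lr * err
        return err
```

Because `sgd_step` only updates `LinearReadout`, no gradient ever has to flow back through the reservoir's recurrence; the reservoir state can be computed on the fly and treated as an ordinary input feature, which is why no BPTT machinery is needed.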
Method at a glance
Problem — what's being optimized
For each net crossing a die-to-die interface, routing distance is computed via a minimum-spanning-tree (MST) heuristic over the TSV positions and net pins. The objective sums MST wirelength contributions across every net and every interface; each agent perturbs its assigned TSV location to minimize the shared total.
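The objective above can be sketched as follows. This is a hedged illustration, not the paper's cost function: it assumes a Manhattan distance metric, one TSV per (net, interface) segment, and a hypothetical dictionary layout in which each segment key maps to that segment's pin coordinates. The MST is built with Prim's algorithm.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mst_length(points: Sequence[Point]) -> float:
    """Total edge length of a minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge from the tree to each vertex
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Add the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], manhattan(points[u], points[v]))
    return total


def total_wirelength(segments: Dict[str, List[Point]], tsv_pos: Dict[str, Point]) -> float:
    """Sum MST lengths over all (net, interface) segments; each tree spans the pins plus the TSV."""
    return sum(mst_length(list(pins) + [tsv_pos[key]]) for key, pins in segments.items())
```

For a segment with pins at (0, 0) and (2, 0), placing its TSV at (2, 3) costs 5 units, while moving it to (1, 0) drops the cost to 2; an agent's move is rewarded by exactly this kind of change in the shared total.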
Temporal-static decomposition
Observations are split into two streams:
- Temporal features (shared across agents): wirelength evolution, optimization-trajectory history. Processed through both an ESN reservoir and a skip connection, giving coordinated decisions with temporal memory.
- Static features (agent-private): local spatial position, die-boundary neighborhood. Keeps each agent's policy focused on its own chiplet interface without conflating spatial context across dissimilar dies.
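A minimal sketch of how the two streams might be assembled into each agent's actor input. The function name, the concatenation order, and the assumption that each agent supplies its own reservoir state are illustrative choices, not taken from the paper; the point is only that the shared temporal features reach every agent twice (via the reservoir and via the identity skip), while each agent's static features stay in its own vector.

```python
from typing import List, Sequence


def build_actor_inputs(
    shared_temporal: Sequence[float],
    reservoir_states: Sequence[Sequence[float]],
    static_per_agent: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return one input vector per agent: [ESN state | temporal skip | private static]."""
    if len(reservoir_states) != len(static_per_agent):
        raise ValueError("need one reservoir state and one static vector per agent")
    inputs = []
    for res_state, static in zip(reservoir_states, static_per_agent):
        # Temporal pathway: reservoir memory plus an identity skip of the raw shared features.
        temporal_part = list(res_state) + list(shared_temporal)
        # Static pathway: private to this agent, never mixed with other agents' spatial context.
        inputs.append(temporal_part + list(static))
    return inputs
```

Keeping the static block agent-private means an agent on a small die never sees the spatial coordinates of an agent on a dissimilar die, while the shared temporal block gives all agents a common view of how the optimization is progressing.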
Training backbone
Multi-Agent Deep Deterministic Policy Gradient (MADDPG) with K-head attention critics. 20,000 episodes × 50 steps per episode. Gumbel-Softmax exploration with temperature annealed from 3 to 0.01. PyTorch on Apple Silicon (M4, Metal Performance Shaders).
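The exploration scheme can be sketched as below. The page gives only the endpoints of the temperature schedule, so the geometric (exponential) decay over the 20,000 episodes is an assumption; a linear schedule would be an equally valid reading. The sampler is the standard Gumbel-Softmax relaxation, written in plain Python rather than with PyTorch's built-in version.

```python
import math
import random
from typing import List, Sequence

TAU_START, TAU_END, N_EPISODES = 3.0, 0.01, 20_000


def temperature(episode: int) -> float:
    """Assumed geometric annealing: TAU_START at episode 0, TAU_END at the final episode."""
    frac = min(max(episode / (N_EPISODES - 1), 0.0), 1.0)
    return TAU_START * (TAU_END / TAU_START) ** frac


def gumbel_softmax(logits: Sequence[float], tau: float, rng: random.Random) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + g) / tau) with g ~ Gumbel(0, 1)."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u in (0, 1) so the logs are finite
        noisy.append((logit - math.log(-math.log(u))) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

Early in training (tau near 3) samples are spread across actions, encouraging exploration; by the end (tau near 0.01) they are effectively one-hot on the highest-scoring perturbed action.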
Action space
Each agent chooses either a local move within its 8-neighborhood or one of 3 randomly sampled distant candidate locations, combining local refinement with global exploration. Sensitivity sweeps confirm 8-neighborhood moves with 3 distant candidates as a robust operating point.
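One way to realize this action space is sketched below, under stated assumptions: "distant" is taken to mean any empty grid cell outside the agent's 3x3 neighborhood, invalid moves are masked rather than removed so the action dimension stays fixed at 8 + 3, and there is no explicit "stay" action. The function name and signature are hypothetical.

```python
import random
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def candidate_actions(pos: Cell, width: int, height: int, occupied: Set[Cell],
                      n_distant: int = 3,
                      rng: Optional[random.Random] = None) -> Tuple[List[Cell], List[bool]]:
    """Return a fixed-size list of 8 + n_distant target cells and a validity mask."""
    rng = rng or random.Random()
    x, y = pos
    local = [(x + dx, y + dy) for dx, dy in NEIGHBOR_OFFSETS]
    # A local move is valid only if it stays on the grid and lands on an empty cell.
    mask = [0 <= cx < width and 0 <= cy < height and (cx, cy) not in occupied
            for cx, cy in local]
    # Assumption: "distant" = any empty cell outside the agent's 3x3 neighborhood.
    far = [(cx, cy) for cx in range(width) for cy in range(height)
           if max(abs(cx - x), abs(cy - y)) > 1 and (cx, cy) not in occupied]
    distant = rng.sample(far, min(n_distant, len(far)))
    mask += [True] * len(distant)
    # Pad with masked-out no-ops so the action dimension stays fixed.
    while len(distant) < n_distant:
        distant.append(pos)
        mask.append(False)
    return local + distant, mask
```

Resampling the distant candidates at every step is what lets a small, fixed action set still reach any free location on the die over the course of an episode.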
Benchmarks & Setup
| Benchmark | Source | Blocks | Configurations |
|---|---|---|---|
| ami33 | MCNC | 33 | 3-tier / 4-tier 3D IC + heterogeneous chiplet |
| ami49 | MCNC | 49 | 3-tier / 4-tier 3D IC + heterogeneous chiplet |
| n100 | GSRC | 100 | 3-tier / 4-tier 3D IC + heterogeneous chiplet |
| n200 | GSRC | 200 | 3-tier / 4-tier 3D IC + heterogeneous chiplet |
| n300 | GSRC | 300 | 3-tier / 4-tier 3D IC + heterogeneous chiplet |
Initial floorplans are generated with FlexPlanner. Initial TSV placement uses greedy centroid-based allocation: each TSV is assigned to the closest valid empty grid location, with random placement as a fallback. Chiplet benchmarks are available at github.com/xmwa/placement_datasets.
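A minimal sketch of such a greedy initializer follows. Several details are assumptions not stated on this page: the centroid is taken over each net's pin coordinates, "closest" uses Euclidean distance, ties are broken by cell coordinates, and the random fallback is triggered by a hypothetical `search_radius` cutoff.

```python
import math
import random
from typing import Dict, Iterable, List, Optional, Tuple

Cell = Tuple[int, int]
Point = Tuple[float, float]


def centroid(pins: List[Point]) -> Point:
    return (sum(p[0] for p in pins) / len(pins), sum(p[1] for p in pins) / len(pins))


def greedy_tsv_init(net_pins: Dict[str, List[Point]], valid_cells: Iterable[Cell],
                    search_radius: float = 3.0,
                    rng: Optional[random.Random] = None) -> Dict[str, Cell]:
    """Assign each net's TSV to the free valid cell nearest its pin centroid."""
    rng = rng or random.Random(0)
    free = set(valid_cells)
    placement: Dict[str, Cell] = {}
    for net, pins in net_pins.items():
        if not free:
            raise ValueError("ran out of valid empty grid locations")
        cx, cy = centroid(pins)
        # Greedy choice; ties broken by cell coordinates for determinism.
        best = min(free, key=lambda c: (math.hypot(c[0] - cx, c[1] - cy), c))
        if math.hypot(best[0] - cx, best[1] - cy) > search_radius:
            # Hypothetical fallback rule: nothing close enough, so pick a random free cell.
            best = rng.choice(sorted(free))
        placement[net] = best
        free.remove(best)
    return placement
```

Because nets are processed in order and each claimed cell is removed from the free set, later nets may be pushed away from their centroids; this is the kind of suboptimal starting point the multi-agent refinement is then meant to improve.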
How to cite
Plain text
Xiaomeng Wang, Zhen Zhou, and Yang Yi. "Finetune Chiplet Design Floorplan via KuanNet," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2026.
BibTeX
@article{wang2026kuannet,
title = {Finetune Chiplet Design Floorplan via {KuanNet}},
author = {Wang, Xiaomeng and Zhou, Zhen and Yi, Yang},
journal = {IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
year = {2026},
doi = {10.1109/TVLSI.2026.3681746},
url = {https://doi.org/10.1109/TVLSI.2026.3681746}
}
More from the authors
R2CTA: Reinforcement Learning and Reservoir Computing based Chiplets TSV Assignment · Paper
Xiaomeng Wang and Yang Yi. In Proc. 26th International Symposium on Quality Electronic Design (ISQED), pp. 1–7, IEEE, 2025.
Transforming AI Landscape with Neuromorphic Computing and Chiplets · Book chapter
Xiaomeng Wang, Zhen Zhou, and Yang Yi. In Energy-Efficient Devices and Circuits for Neuromorphic Computing, pp. 405–428, Elsevier, 2026.
Practical Tips for Machine Learning Research and Development · Blog
X. Wang. Blog post, 2025. Available: blog.wangxm.com/2025/02/practical-tips-for-machine-learning-research-and-development/
Acknowledgments
This work was supported in part by the U.S. National Science Foundation (NSF) under Grants CCF-1750450, ECCS-1731928, ECCS-2128594, ECCS-2314813, and CCF-1937487.
The authors thank the BRICCS Lab at Virginia Tech for computational resources and technical discussions.
With thanks to lab members for their discussions and feedback.
This work is dedicated to my mother, YuKuan — the name of this framework carries hers. Thank you, Mom, for everything.