I am a Postdoctoral Associate in the Department of Electrical and Computer Engineering at Duke University, working with Prof. Yiran Chen and Prof. Hai (Helen) Li on hardware–software co-design and next-generation computing systems. My research focuses on computer architecture and accelerator design for efficient machine learning and data-intensive workloads, including sparsity- and quantization-aware architectures, as well as energy-efficient hardware for deep learning systems. I develop novel architectural techniques to bridge the gap between emerging algorithmic models and scalable hardware, with applications ranging from large-scale model acceleration to neural network processors and high-performance matrix multiplication architectures.
I received my Ph.D. degree in Computer Science and Engineering from Shanghai Jiao Tong University under the supervision of Prof. Jingwen Leng in September 2023. My research has been published at top-tier computer architecture conferences, including ISCA, MICRO, HPCA, and ASPLOS, with an up-to-date publication and citation record available on my Google Scholar profile.
🔥 News
- 2025.11: 🎉🎉 Two papers were accepted to HPCA 2026.
- 2025.10: 🎉 Nominated for the 2025 Outstanding Postdoc Award at Duke University.
- 2025.09: 🎉 One paper was accepted to ASP-DAC 2026.
- 2025.03: 🎉🎉🎉 Three papers were accepted to ISCA 2025.
- 2024.11: 🎉🎉🎉 Three papers were accepted to HPCA 2025.
- 2024.03: 🎉 I received the **2023 Shanghai Jiao Tong University Outstanding Doctoral Dissertation Award** (15 recipients university-wide, <1% per year).
- 2023.11: 🎉🎉 Two papers were accepted to ASPLOS 2024.
📝 Publications
Conference:
- HPCA 2026: Chiyue Wei=, Cong Guo=*, Junyao Zhang, Haoxuan Shan, Yifan Xu, Ziyue Zhang, Yudong Liu, Qinsi Wang, Changchun Zhou, Hai “Helen” Li, Yiran Chen; Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models. (=Equal Contribution, *Corresponding Author)
- HPCA 2026: Yuzhe Fu, Changchun Zhou*, Hancheng Ye, Bowen Duan, Qiyu Huang, Chiyue Wei, Cong Guo*, Hai “Helen” Li, Yiran Chen; FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing. (*Corresponding Author)
- ASP-DAC 2026: Haoxuan Shan, Cong Guo*, Chiyue Wei, Feng Cheng, Junyao Zhang, Hai “Helen” Li, Yiran Chen; Platinum: Path-Adaptable LUT-Based Accelerator Tailored for Low-Bit Weight Matrix Multiplication. (*Corresponding Author)
- ISCA 2025: Cong Guo*, Chiyue Wei, Jiaming Tang, Bowen Duan, Song Han, Hai Li, Yiran Chen; Transitive Array: An Efficient GEMM Accelerator with Result Reuse. (*Corresponding Author)
- ISCA 2025: Chiyue Wei, Bowen Duan, Cong Guo*, Jingyang Zhang, Qingyue Song, Hai Li, Yiran Chen; Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks. (*Corresponding Author)
- ISCA 2025: Feng Cheng, Cong Guo*, Chiyue Wei, Junyao Zhang, Changchun Zhou, Edward Hanson, Jiaqi Zhang, Xiaoxiao Liu, Hai Li, Yiran Chen; Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-aware Cache Compression. (*Corresponding Author)
- HPCA 2025: Chiyue Wei, Cong Guo*, Feng Cheng, Shiyu Li, Hao Yang, Hai Li, Yiran Chen; Prosperity: Accelerating Spiking Neural Networks via Product Sparsity. (*Corresponding Author)
- HPCA 2025: Zihan Liu, Xinhao Luo, Junxian Guo, Wentao Ni, Yangjie Zhou, Yue Guan, Cong Guo, Weihao Cui, Yu Feng, Minyi Guo, Yuhao Zhu, Minjia Zhang, Jingwen Leng, Chen Jin; VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference.
- HPCA 2025: Weiming Hu, Haoyan Zhang, Cong Guo, Yu Feng, Renyang Guan, Zhendong Hua, Zihan Liu, Yue Guan, Minyi Guo, Jingwen Leng; MANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type.
- ASPLOS 2024: Cong Guo, Rui Zhang, Jiale Xu, Jingwen Leng, Zihan Liu, Ziyu Huang, Minyi Guo, Hao Wu, Shouren Zhao, Junping Zhao, Ke Zhang; GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching.
- ASPLOS 2024: Zihan Liu, Wentao Ni, Jingwen Leng, Yu Feng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, Yuhao Zhu; JUNO: Optimizing High-Dimensional Approximate Nearest Neighbour Search with Sparsity-Aware Algorithm and Ray-Tracing Core Mapping.
- ISCA 2023: Cong Guo, Jiaming Tang, Weiming Hu, Jingwen Leng, Chen Zhang, Fan Yang, Yunxin Liu, Minyi Guo, Yuhao Zhu; OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization.
- MICRO 2022: Cong Guo, Chen Zhang, Jingwen Leng, Zihan Liu, Fan Yang, Yunxin Liu, Minyi Guo, Yuhao Zhu; ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization. (2023 IEEE Micro Top Picks Honorable Mention)
- ICLR 2022: Cong Guo, Yuxian Qiu, Jingwen Leng, Xiaotian Gao, Chen Zhang, Yunxin Liu, Fan Yang, Yuhao Zhu, Minyi Guo; SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation.
- ICCD 2022: Cong Guo, Yuxian Qiu, Jingwen Leng, Chen Zhang, Ying Cao, Quanlu Zhang, Yunxin Liu, Fan Yang, Minyi Guo; Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training.
- MSN 2022: Mustafa Tarik Sanic, Cong Guo, Jingwen Leng, Minyi Guo, Weiyin Ma; Towards Reliable AI Applications via Algorithm-Based Fault Tolerance on NVDLA. (Best Paper Award)
- CF 2022: Yangjie Zhou, Yaoxu Song, Jingwen Leng, Zihan Liu, Weihao Cui, Zhendong Zhang, Cong Guo, Quan Chen, Li Li, Minyi Guo; AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs.
- IISWC 2021: Yangjie Zhou, Mengtian Yang, Cong Guo, Jingwen Leng, Yun Liang, Quan Chen, Minyi Guo, Yuhao Zhu; Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators.
- ISCA 2021: Yang Wang, Chen Zhang, Zhiqiang Xie, Cong Guo, Yunxin Liu, Jingwen Leng; Dual-side Sparse Tensor Core.
- SC 2020: Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu; Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity.
- DAC 2020: Cong Guo, Yangjie Zhou, Jingwen Leng, Yuhao Zhu, Zidong Du, Quan Chen, Chao Li, Bin Yao, Minyi Guo; Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration.
- CVPR 2019: Yuxian Qiu, Jingwen Leng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, Yuhao Zhu; Adversarial Defense Through Network Profiling Based Path Extraction.
Journal:
- IEEE Circuits and Systems Magazine 2025: Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, Shiyu Li, Ziru Li, Mingyuan Ma, Tergel Molom-Ochir, Benjamin Morris, Haoxuan Shan, Jingwei Sun, Yitu Wang, Chiyue Wei, Xueying Wu, Yuhao Wu, Hao Frank Yang, Jingyang Zhang, Junyao Zhang, Qilin Zheng, Guanglei Zhou, Hai Li, Yiran Chen; A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models.
- IEEE Transactions on Computers 2025: Chen Zhang, Yang Wang, Zhiqiang Xie, Cong Guo, Yunxin Liu, Jingwen Leng, Guangyu Sun, Zhigang Ji, Runsheng Wang, Yuan Xie, Ru Huang; DSTC: Dual-Side Sparse Tensor Core for DNNs Acceleration on Modern GPU Architectures.
- IEEE Transactions on Computers 2024: Cong Guo, Fengchen Xue, Jingwen Leng, Yuxian Qiu, Yue Guan, Weihao Cui, Quan Chen, Minyi Guo; Accelerating Sparse DNNs Based on Tiled GEMM.
🏆 Honors and Awards
- 2025.10 Nominee for the 2025 Outstanding Postdoctoral Award (24 nominees university-wide), Duke University
- 2024.03 Outstanding Doctoral Dissertation Award (15 recipients university-wide), Shanghai Jiao Tong University
- 2023.07 IEEE Micro Top Picks from 2022 Computer Architecture Conferences Honorable Mention
- 2023.06 Outstanding Doctoral Graduates, Shanghai Jiao Tong University
- 2022.08 Excellent Ph.D. Scholarship of Yang Yuanqing Education Fund (Top-3/500+), Shanghai Jiao Tong University
- 2020.11 Ph.D. National Scholarship (Top-8/500+), Ministry of Education, PRC
- 2020.07 DAC 2020 Richard Newton Young Student Fellow, Design Automation Conference
- 2018.11 VMware Scholarship, Shanghai Jiao Tong University
- 2017.11 National Second Prize, The 14th China Post-Graduate Mathematical Contest in Modeling
👔 Academic Service
Journal Reviewer
- ACM Transactions on Embedded Computing Systems (TECS)
- ACM Transactions on Architecture and Code Optimization (TACO)
- IEEE Computer Architecture Letters (CAL)
- IEEE Transactions on Computers (TC)
- IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
- IEEE Transactions on Circuits and Systems for Artificial Intelligence (TCASAI)
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI)
- Journal of Systems Architecture (JSA)
- Science China Information Sciences (SCIS)
Conference Service
- Technical Program Committee (TPC) Member, DAC 2026
💻 Experience
- 2023.12 - Now, Postdoctoral Associate, Department of ECE, Duke University.
- 2021.06 - 2023.12, Research intern, Shanghai Qi Zhi Institute.
- 2023.04 - 2023.09, Research intern, ANT Group (AliPay).
- 2020.06 - 2021.05, Research intern, Microsoft Research Asia (Beijing).
- 2019.05 - 2019.12, Intern, NVIDIA (Shanghai).
📖 Education
- 2020.09 - 2023.09, Ph.D. in Computer Science, Department of Computer Science and Engineering, Shanghai Jiao Tong University.
- 2017.09 - 2020.03, M.E. in Computer Technology, Department of Computer Science and Engineering, Shanghai Jiao Tong University.
- 2012.09 - 2016.06, B.S. in Computer Science, College of Computer Science and Software Engineering, Shenzhen University.