I am a Postdoctoral Associate in the Department of Electrical and Computer Engineering at Duke University, working with Prof. Yiran Chen and Prof. Hai (Helen) Li. I received my Ph.D. in Computer Science from Shanghai Jiao Tong University in 2023, supervised by Prof. Jingwen Leng.


My research focuses on computer architecture, especially scalable hardware–software co-design for efficient AI systems. I have developed sparsity- and quantization-aware architectures for model compression and overall efficiency. Recently, I have been exploring architectural designs tailored for large language models (LLMs).


Over the past five years, I have published 14 papers at the four flagship computer architecture conferences (ISCA, MICRO, HPCA, and ASPLOS), 9 of which are first- or corresponding-author publications (ISCA ×4, HPCA ×3, ASPLOS ×1, MICRO ×1). My work received an HPCA 2026 Best Paper Nomination and a 2022 IEEE Micro Top Picks Honorable Mention. An up-to-date publication and citation record is available on my Google Scholar profile.

🔥 News

  • 2026.01:  🎉 One paper was accepted to ICLR 2026.
  • 2026.01:  🔥 Our HPCA 2026 paper (Focus) was nominated for Best Paper (one of four nominees among 119 accepted papers).
  • 2025.11:  🎉🎉 Two papers were accepted to HPCA 2026.
  • 2025.10:  🎉 Nominated for the 2025 Outstanding Postdoc Award at Duke University.
  • 2025.09:  🎉 One paper was accepted to ASP-DAC 2026.
  • 2025.03:  🎉🎉🎉 Three papers were accepted to ISCA 2025.
  • 2024.11:  🎉🎉🎉 Three papers were accepted to HPCA 2025.
  • 2024.03:  🎉 I received the 2023 Shanghai Jiao Tong University Outstanding Doctoral Dissertation Award (15 recipients university-wide, <1% per year).
  • 2023.11:  🎉🎉 Two papers were accepted to ASPLOS 2024.

💻 Experience

  • 2023.12 - Now, Postdoctoral Associate, Department of ECE, Duke University.
  • 2023.04 - 2023.09, Research intern, Ant Group (Alipay).
  • 2021.06 - 2023.12, Research intern, Shanghai Qi Zhi Institute.
  • 2020.06 - 2021.05, Research intern, Microsoft Research Asia (Beijing).
  • 2019.05 - 2019.12, Intern, NVIDIA (Shanghai).

📝 Publications

Selected Publications

*: Corresponding Author; =: Equal Contribution

[1] ICLR 2026 Xinhua Chen=, Sitao Huang=, Cong Guo=*, Chiyue Wei, Yintao He, Jianyi Zhang, Hai “Helen” Li, Yiran Chen; DPad: Efficient Diffusion Language Models with Suffix Dropout. In International Conference on Learning Representations (ICLR), 2026.

[2] HPCA 2026 Chiyue Wei=, Cong Guo=*, Junyao Zhang, Haoxuan Shan, Yifan Xu, Ziyue Zhang, Yudong Liu, Qinsi Wang, Changchun Zhou, Hai “Helen” Li, Yiran Chen; Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models. In IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2026. (Best Paper Nomination)

[3] HPCA 2026 Yuzhe Fu, Changchun Zhou*, Hancheng Ye, Bowen Duan, Qiyu Huang, Chiyue Wei, Cong Guo*, Hai “Helen” Li, Yiran Chen; FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing. In IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2026.

[4] ASP-DAC 2026 Haoxuan Shan, Cong Guo*, Chiyue Wei, Feng Cheng, Junyao Zhang, Hai “Helen” Li, Yiran Chen; Platinum: Path-Adaptable LUT-Based Accelerator Tailored for Low-Bit Weight Matrix Multiplication. In Asia and South Pacific Design Automation Conference (ASP-DAC), 2026.

[5] ISCA 2025 Cong Guo*, Chiyue Wei, Jiaming Tang, Bowen Duan, Song Han, Hai Li, Yiran Chen; Transitive Array: An Efficient GEMM Accelerator with Result Reuse. In International Symposium on Computer Architecture (ISCA), 2025.

[6] ISCA 2025 Chiyue Wei, Bowen Duan, Cong Guo*, Jingyang Zhang, Qingyue Song, Hai Li, Yiran Chen; Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks. In International Symposium on Computer Architecture (ISCA), 2025.

[7] ISCA 2025 Feng Cheng, Cong Guo*, Chiyue Wei, Junyao Zhang, Changchun Zhou, Edward Hanson, Jiaqi Zhang, Xiaoxiao Liu, Hai Li, Yiran Chen; Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-aware Cache Compression. In International Symposium on Computer Architecture (ISCA), 2025.

[8] HPCA 2025 Chiyue Wei, Cong Guo*, Feng Cheng, Shiyu Li, Hao Yang, Hai Li, Yiran Chen; Prosperity: Accelerating Spiking Neural Networks via Product Sparsity. In IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2025.

[9] IEEE CAS Mag 2025 Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, Shiyu Li, Ziru Li, Mingyuan Ma, Tergel Molom-Ochir, Benjamin Morris, Haoxuan Shan, Jingwei Sun, Yitu Wang, Chiyue Wei, Xueying Wu, Yuhao Wu, Hao Frank Yang, Jingyang Zhang, Junyao Zhang, Qilin Zheng, Guanglei Zhou, Hai Li, Yiran Chen; A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models. In IEEE Circuits and Systems Magazine (CAS Mag), 2025.

[10] ASPLOS 2024 Cong Guo, Rui Zhang, Jiale Xu, Jingwen Leng, Zihan Liu, Ziyu Huang, Minyi Guo, Hao Wu, Shouren Zhao, Junping Zhao, Ke Zhang; GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching. In Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024.

[11] IEEE TC 2024 Cong Guo, Fengchen Xue, Jingwen Leng, Yuxian Qiu, Yue Guan, Weihao Cui, Quan Chen, Minyi Guo; Accelerating Sparse DNNs Based on Tiled GEMM. In IEEE Transactions on Computers (TC), 2024.

[12] ISCA 2023 Cong Guo=, Jiaming Tang=, Weiming Hu, Jingwen Leng, Chen Zhang, Fan Yang, Yunxin Liu, Minyi Guo, Yuhao Zhu; OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization. In International Symposium on Computer Architecture (ISCA), 2023.

[13] MICRO 2022 Cong Guo, Chen Zhang, Jingwen Leng, Zihan Liu, Fan Yang, Yunxin Liu, Minyi Guo, Yuhao Zhu; ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization. In IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022. (2022 IEEE Micro Top Picks Honorable Mention)

[14] ICLR 2022 Cong Guo, Yuxian Qiu, Jingwen Leng, Xiaotian Gao, Chen Zhang, Yunxin Liu, Fan Yang, Yuhao Zhu, Minyi Guo; SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation. In International Conference on Learning Representations (ICLR), 2022.

[15] ICCD 2022 Cong Guo, Yuxian Qiu, Jingwen Leng, Chen Zhang, Ying Cao, Quanlu Zhang, Yunxin Liu, Fan Yang, Minyi Guo; Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training. In IEEE International Conference on Computer Design (ICCD), 2022.

[16] SC 2020 Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu; Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020.

[17] DAC 2020 Cong Guo, Yangjie Zhou, Jingwen Leng, Yuhao Zhu, Zidong Du, Quan Chen, Chao Li, Bin Yao, Minyi Guo; Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration. In Design Automation Conference (DAC), 2020.

Collaborative Publications

[18] ASPLOS 2026 Weiming Hu, Zihan Zhang, Haoyan Zhang, Chen Zhang, Cong Guo, Yu Feng, Tianchi Hu, Guanglin Li, Guipeng Hu, Junsong Wang, Jingwen Leng; M2XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization. In Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2026.

[19] SC 2025 Yangjie Zhou, Honglin Zhu, Qian Qiu, Weihao Cui, Zihan Liu, Peng Chen, Mohamed Wahib, Cong Guo, Siyuan Feng, Jintao Meng, Haidong Lan, Jingwen Leng, Yun Lin, Jin Song Dong, Wenxi Zhu, Minwen Deng; A Sample-Free Compilation Framework for Efficient Dynamic Tensor Computation. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2025.

[20] IEEE TCAD 2025 Yangjie Zhou, Zhihui Zhang, Shuwen Lu, Cong Guo, Jingwen Leng, Feng Zhang, Yufei Ma, Yun Liang, Minyi Guo; A Full-Stack Framework for GNN Acceleration via Partition-Compiler-Architecture Co-Design. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2025.

[21] HPCA 2025 Zihan Liu, Xinhao Luo, Junxian Guo, Wentao Ni, Yangjie Zhou, Yue Guan, Cong Guo, Weihao Cui, Yu Feng, Minyi Guo, Yuhao Zhu, Minjia Zhang, Jingwen Leng, Chen Jin; VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference. In IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2025.

[22] HPCA 2025 Weiming Hu, Haoyan Zhang, Cong Guo, Yu Feng, Renyang Guan, Zhendong Hua, Zihan Liu, Yue Guan, Minyi Guo, Jingwen Leng; MANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type. In IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2025.

[23] ICCV 2025 Linshen Liu, Boyan Su, Junyue Jiang, Guanlin Wu, Cong Guo, Ceyu Xu, Hao Frank Yang; Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge. In International Conference on Computer Vision (ICCV), 2025.

[24] ASPLOS 2024 Zihan Liu, Wentao Ni, Jingwen Leng, Yu Feng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, Yuhao Zhu; JUNO: Optimizing High-Dimensional Approximate Nearest Neighbour Search with Sparsity-Aware Algorithm and Ray-Tracing Core Mapping. In Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024.

[25] IEEE TC 2024 Chen Zhang, Yang Wang, Zhiqiang Xie, Cong Guo, Yunxin Liu, Jingwen Leng, Guangyu Sun, Zhigang Ji, Runsheng Wang, Yuan Xie, Ru Huang; DSTC: Dual-Side Sparsity Tensor Core for DNNs Acceleration on Modern GPU Architectures. In IEEE Transactions on Computers (TC), 2024.

[26] CF 2023 Yangjie Zhou, Yaoxu Song, Jingwen Leng, Zihan Liu, Weihao Cui, Zhendong Zhang, Cong Guo, Quan Chen, Li Li, Minyi Guo; AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs. In Computing Frontiers (CF), 2023.

[27] MSN 2022 Mustafa Tarik Sanic, Cong Guo, Jingwen Leng, Minyi Guo, Weiyin Ma; Towards Reliable AI Applications via Algorithm-Based Fault Tolerance on NVDLA. In International Conference on Mobility, Sensing and Networking (MSN), 2022. (Best Paper Award)

[28] IISWC 2021 Yangjie Zhou, Mengtian Yang, Cong Guo, Jingwen Leng, Yun Liang, Quan Chen, Minyi Guo, Yuhao Zhu; Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators. In IEEE International Symposium on Workload Characterization (IISWC), 2021.

[29] ISCA 2021 Yang Wang, Chen Zhang, Zhiqiang Xie, Cong Guo, Yunxin Liu, Jingwen Leng; Dual-side Sparse Tensor Core. In International Symposium on Computer Architecture (ISCA), 2021.

[30] CVPR 2019 Yuxian Qiu, Jingwen Leng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, Yuhao Zhu; Adversarial Defense Through Network Profiling Based Path Extraction. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

🏆 Honors and Awards

  • 2026.01 HPCA 2026 Best Paper Nomination
  • 2025.10 Nominee for the 2025 Outstanding Postdoctoral Award (24 nominees university-wide), Duke University
  • 2024.03 Outstanding Doctoral Dissertation Award (15 recipients university-wide), Shanghai Jiao Tong University
  • 2023.07 IEEE Micro Top Picks from 2022 Computer Architecture Conferences Honorable Mention
  • 2023.06 Outstanding Doctoral Graduate, Shanghai Jiao Tong University
  • 2022.08 Excellent Ph.D. Scholarship of Yang Yuanqing Education Fund (Top-3/500+), Shanghai Jiao Tong University
  • 2020.11 Ph.D. National Scholarship (Top-8/500+), Ministry of Education, PRC
  • 2020.07 DAC 2020 Richard Newton Young Student Fellow, Design Automation Conference
  • 2018.11 VMware Scholarship, Shanghai Jiao Tong University
  • 2017.11 National Second Prize, the 14th China Post-Graduate Mathematical Contest in Modeling

👔 Academic Service

Journal Reviewer

  • ACM Transactions on Embedded Computing Systems (TECS)
  • ACM Transactions on Architecture and Code Optimization (TACO)
  • IEEE Computer Architecture Letters (CAL)
  • IEEE Transactions on Computers (TC)
  • IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
  • IEEE Transactions on Circuits and Systems for Artificial Intelligence (TCASAI)
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI)
  • Journal of Systems Architecture (JSA)
  • Science China Information Sciences (SCIS)

Conference Service

  • Technical Program Committee (TPC) Member, DAC 2026

📖 Education

  • 2020.09 - 2023.09, Ph.D. in Computer Science, Department of Computer Science and Engineering, Shanghai Jiao Tong University.
  • 2017.09 - 2020.03, M.E. in Computer Technology, Department of Computer Science and Engineering, Shanghai Jiao Tong University.
  • 2012.09 - 2016.06, B.S. in Computer Science, College of Computer Science and Software Engineering, Shenzhen University.