Publications

(2024). EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture (🔥📣New Paper & Project🔥📣!). International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), held in conjunction with Embedded Systems Week (ESWEEK), Raleigh, NC, USA, Sept. 29-Oct. 4, 2024. Also appears in the ESWEEK-TCAD Special Issue of IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (IEEE TCAD).

(2024). CHEF: A Framework for Deploying Heterogeneous Models on Clusters with Heterogeneous FPGAs (🔥📣New Paper & Project🔥📣!). International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), held in conjunction with Embedded Systems Week (ESWEEK), Raleigh, NC, USA, Sept. 29-Oct. 4, 2024. Also appears in the ESWEEK-TCAD Special Issue of IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (IEEE TCAD).

(2024). Reducing Smart Phone Environmental Footprints with In-Memory Processing (🔥📣New Paper & Project🔥📣!). International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), held in conjunction with Embedded Systems Week (ESWEEK), Raleigh, NC, USA, Sept. 29-Oct. 4, 2024.

(2023). AIM: Accelerating Arbitrary-precision Integer Multiplication on Heterogeneous Reconfigurable Computing Platform Versal ACAP (🔥📣New Paper & Project🔥📣!). Proceedings of the 42nd IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2023), October 29 - November 2, 2023, San Francisco, CA, USA. Full paper, acceptance ratio: 21%.

(2023). High Performance, Low Power Matrix Multiply Design on ACAP: from Architecture, Design Challenges and DSE Perspectives (🔥📣New Paper & Project🔥📣!). Proceedings of the 60th ACM/IEEE Design Automation Conference (DAC ’23), July 9–13, 2023, San Francisco, CA, USA. Full paper, acceptance ratio: 23%.

(2020). Algorithm-Hardware Co-design for BQSR Acceleration in Genome Analysis ToolKit. 2020 IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM 20).

(2018). Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks (🔥Best Paper). IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (Volume 38, Issue 11, Nov. 2019).

(2018). Latte: Locality Aware Transformation for High-Level Synthesis. 2018 IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM 18), short paper acceptance ratio: 7/48 = 14.6%.

(2018). ST-Accel: A High-Level Programming Platform for Streaming Applications on FPGA. 2018 IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM 18), full paper acceptance ratio: 22/106 = 20.8%.

(2017). Bandwidth Optimization Through On-Chip Memory Restructuring for HLS. 54th Annual Design Automation Conference (ACM DAC 17), acceptance rate: 161/676 = 24%.

(2016). Energy Efficiency of Full Pipelining: A Case Study for Matrix Multiplication. 24th IEEE International Symposium on Field-Programmable Custom Computing Machines (IEEE FCCM 16), acceptance rate: 32/133 = 24%.

(2016). ARAPrototyper: Enabling Rapid Prototyping and Evaluation for Accelerator-Rich Architecture. 24th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (ACM/SIGDA FPGA 16).

(2014). A Fully Pipelined and Dynamically Composable Architecture of CGRA. 22nd IEEE International Symposium on Field-Programmable Custom Computing Machines (IEEE FCCM 14).
