ASISys

GitHub Repo: github.com/ASISys

Welcome to ASISys, an open-source organization dedicated to advancing systems research and development for Artificial Super Intelligence (ASI). While ASI has not yet been realized, our vision is to create foundational systems and techniques that push the boundaries of current AI and lay the groundwork for its future emergence.

We focus on scalable, efficient, and adaptive AI systems that evolve over time, improving the efficacy and efficiency of both AI training and serving. Our work includes developing the architectures, systems, algorithms, and tools essential for the transition from narrow AI to superintelligent systems.


Projects


Publications

  1. arXiv
    Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation
    Yunkai Liang, Zhangyu Chen, Pengfei Zuo, Zhi Zhou, Xu Chen, and Zhou Yu
    arXiv preprint arXiv:2503.20552, 2025
  2. arXiv
    Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving
    Qihui Zhou, Peiqi Yin, Pengfei Zuo, and James Cheng
    arXiv preprint arXiv:2503.00392, 2025
  3. AAAI
    AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference
    Zhuomin He, Yizhen Yao, Pengfei Zuo, Bin Gao, Qinya Li, Zhenzhe Zheng, and Fan Wu
    In Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence (AAAI), 2025
  4. USENIX ATC
    Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention
    Bin Gao, Zhuomin He, Puru Sharma, Qingxuan Kang, Djordje Jevdjic, Junbo Deng, Xingkun Yang, Zhou Yu, and Pengfei Zuo
    In Proceedings of the 2024 USENIX Annual Technical Conference (USENIX ATC), 2024