Hi, I’m Qinghui. I’m a Software Engineer based in Sydney, Australia, currently at Microsoft, where I focus on building LLM infrastructure and intelligent agents. My professional passion lies at the intersection of large-scale distributed systems and runtime performance optimization.
Before my current role, I was at Tencent Cloud, where I was a core engineer on the node agents of its Serverless platform. My work centered on extreme performance tuning: managing over 2,000 containers on a single bare-metal node with near-zero overhead and ultra-low latency.
Earlier in my career, I spent 3 years at Alibaba Cloud, where I was responsible for building internal multi-tenant Serverless infrastructure and maintaining the group’s core container engines. My work supported a wide range of internal business units and was instrumental in ensuring the stability and performance of the infrastructure during Double 11 (Global Shopping Festival), handling some of the world’s most intense traffic spikes.
With deep expertise in Kubernetes, containerd, and LLM inference runtimes, I’ve spent much of my career working in the open-source community. I’m a firm believer in the power of open source, in sharing knowledge, and in constantly experimenting with the bleeding edge of technology.
I started this blog to document the practical insights and technical lessons I’ve learned from the trenches of cloud-native engineering and AI infrastructure. I’m always excited to connect and dive deep into system architecture, so feel free to reach out!
Related Projects
- Kaito — Kubernetes AI Toolchain Operator for simplified LLM inference deployment
- VirtualCluster — A Kubernetes multi-tenancy solution enabling each tenant to run in a dedicated control plane
- PouchContainer — An efficient enterprise-grade container engine by Alibaba Cloud
- containerd — An industry-standard container runtime