AI for Infrastructure · AIOps · Failure Diagnosis · Root Cause Analysis · Time Series Analysis
Google Scholar · GitHub · Personal Website · Email: li_zeyan [at] icloud.com
I am currently an R&D engineer at ByteDance, working on AI for Infrastructure and AIOps. My work focuses on intelligent alerting, log intelligence, failure diagnosis, root cause analysis, and large-model-based agents for production infrastructure systems.
I received my Ph.D. in Computer Science from Tsinghua University in 2023, advised by Prof. Dan Pei. My doctoral research focused on AIOps, failure management, anomaly detection, and root cause analysis for large-scale online service systems.
Before my Ph.D. studies, I received my bachelor's degree in Computer Science from Tsinghua University in 2018.
My research has been published in conferences and journals including ESEC/FSE, WWW, KDD, ISSRE, INFOCOM, SIGMOD Conference Companion, PVLDB, and ICSE.
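Algorithm Engineer, AI for Infrastructure / AIOps
June 2023 – Present
Research, development, and production deployment of AIOps algorithms for large-scale production infrastructure systems. Recent work covers intelligent alerting, log parsing, automated failure diagnosis, and the application of large-model agents to infrastructure operations.
Representative directions include: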
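Algorithm Engineer Intern
January 2019 – June 2022
Contributed to research and system development on root-cause service localization and failure diagnosis for banking information systems.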
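Ph.D. in Computer Science and Technology
August 2018 – June 2023
Advisor: Prof. Dan Pei
Research areas: AIOps, failure diagnosis, root cause localization, anomaly detection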
Bachelor's degree in Computer Science and Technology
August 2014 – July 2018
Zeyan Li, Jie Song, Tieying Zhang, Tao Yang, Xiongjun Ou, Yingjie Ye, Pengfei Duan, Muchen Lin, and Jianjun Chen.
Adaptive and Efficient Log Parsing as a Cloud Service.
SIGMOD Conference Companion, 2025.
Zeyan Li, Nengwen Zhao, Mingjie Li, Xianglin Lu, Lixin Wang, Dongdong Chang, Xiaohui Nie, Li Cao, Wenchi Zhang, Kaixin Sui, Yanhua Wang, Xu Du, Guoqiang Duan, and Dan Pei.
Actionable and Interpretable Fault Localization for Recurring Failures in Online Service Systems.
ESEC/FSE, 2022.
Zeyan Li, Junjie Chen, Rui Jiao, Nengwen Zhao, Zhijun Wang, Shuwei Zhang, Yanjun Wu, Long Jiang, Leiqin Yan, Zikai Wang, Zhekang Chen, Wenchi Zhang, Xiaohui Nie, Kaixin Sui, and Dan Pei.
Practical Root Cause Localization for Microservice Systems via Trace Analysis.
IWQoS, 2021.
Zeyan Li, Chengyang Luo, Yiwei Zhao, Yongqian Sun, Kaixin Sui, Xiping Wang, Dapeng Liu, Xing Jin, Qi Wang, and Dan Pei.
Generic and Robust Localization of Multi-dimensional Root Causes.
ISSRE, 2019.
Zeyan Li, Wenxiao Chen, and Dan Pei.
Robust and Unsupervised KPI Anomaly Detection Based on Conditional Variational Autoencoder.
IPCCC, 2018.
Zhe Xie, Zeyan Li, Xiao He, Shenglin Zhang, Longlong Xu, Yuzhuo Yang, Tieying Zhang, Jianjun Chen, Rui Shi, and Dan Pei.
FoundRoot: Towards Foundation Model for Root Cause Analysis via Structured Deep Thinking.
ICSE, 2026.
Zhe Xie, Zeyan Li, Xiao He, Longlong Xu, Xidao Wen, Tieying Zhang, Jianjun Chen, Rui Shi, and Dan Pei.
ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning.
Proceedings of the VLDB Endowment, 2025.
Changhua Pei, Zexin Wang, Fengrui Liu, Zeyan Li, Yang Liu, Xiao He, Rong Kang, Tieying Zhang, Jianjun Chen, Jianhui Li, Gaogang Xie, and Dan Pei.
Flow-of-Action: SOP Enhanced LLM-Based Multi-Agent System for Root Cause Analysis.
WWW Companion, 2025.
Mingjie Li, Zeyan Li, Kanglin Yin, Xiaohui Nie, Wenchi Zhang, Kaixin Sui, and Dan Pei.
Causal Inference-Based Root Cause Analysis for Online Service Systems with Intervention Recognition.
KDD, 2022.
Nengwen Zhao, Honglin Wang, Zeyan Li, Xiao Peng, Gang Wang, Zhu Pan, Yong Wu, Zhen Feng, Xidao Wen, Wenchi Zhang, Kaixin Sui, and Dan Pei.
An Empirical Investigation of Practical Log Anomaly Detection for Online Service Systems.
ESEC/FSE, 2021.
Qingyang Yu, Nengwen Zhao, Mingjie Li, Zeyan Li, Honglin Wang, Wenchi Zhang, Kaixin Sui, and Dan Pei.
A Survey on Intelligent Management of Alerts and Incidents in IT Services.
Journal of Network and Computer Applications, 2024.
Zhenhe Yao, Haowei Ye, Changhua Pei, Guang Cheng, Guangpei Wang, Zhiwei Liu, Hongwei Chen, Hang Cui, Zeyan Li, Jianhui Li, and Gaogang Xie.
SparseRCA: Unsupervised Root Cause Analysis in Sparse Microservice Testing Traces.
ISSRE, 2024.
Haowen Xu, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, Ying Liu, Youjian Zhao, Dan Pei, Yang Feng, Jie Chen, Zhaogang Wang, and Honglin Qiao.
Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications.
WWW, 2018.
Xianglin Lu, Zhe Xie, Zeyan Li, Mingjie Li, Xiaohui Nie, Nengwen Zhao, Qingyang Yu, Shenglin Zhang, Kaixin Sui, Lin Zhu, and Dan Pei.
Generic and Robust Performance Diagnosis via Causal Inference for OLTP Database Systems.
CCGrid, 2022.
Ruming Tang, Zheng Yang, Zeyan Li, Weibin Meng, Haixin Wang, Qi Li, Yongqian Sun, Dan Pei, Tao Wei, Yanfei Xu, and Yan Liu.
ZeroWall: Detecting Zero-Day Web Attacks through Encoder-Decoder Recurrent Neural Networks.
INFOCOM, 2020.
Wenxiao Chen, Haowen Xu, Zeyan Li, Dan Pei, Jie Chen, Honglin Qiao, Yang Feng, and Zhaogang Wang.
Unsupervised Anomaly Detection for Intricate KPIs via Adversarial Training of VAE.
INFOCOM, 2019.
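CN202110622067. Dan Pei and Zeyan Li.
Trace anomaly detection method, computer device, and readable storage medium.
CN202110319752. Dan Pei and Zeyan Li.
KPI anomaly detection method and apparatus based on a conditional variational autoencoder.
CN202010727337. Zeyan Li, Wenchi Zhang, Bo Cheng, Cheng Huang, Zhekang Chen, Mengjia Shen, Kaixin Sui, and Dapeng Liu.
Fault localization method, apparatus, electronic device, and storage medium.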