I am a third-year PhD candidate at the Singapore University of Technology and Design (SUTD), in collaboration with MIT and Zhejiang University, advised by Prof. Roy Ka-Wei Lee. My research focuses on long-form generation and the long-context capabilities of LLMs, spanning data generation, chain-of-thought and planning, RL-based training and alignment, and end-to-end evaluation for long text, code, and reasoning.

I am currently a research intern at ByteDance Seed, working on large-scale pretraining data synthesis and Skill-Augmented Pretraining. Previously, I interned at Kimi (Moonshot AI) on Kimi-K2.5 long-context capabilities, and at Zhipu AI on the GLM-4.x series, focusing on long-form generation with RL.

💼 I am actively seeking full-time positions. Feel free to reach out at mozhu621@gmail.com.

🔥 News

  • 2026.01: 🏆 LongWriter-Zero accepted to ICLR 2026 as an Oral (top ~1.8%): pure RL for ultra-long text generation without SFT!
  • 2025.12: 🚀 Started as an Algorithm Research Intern at ByteDance Seed, working on LLM pretraining data synthesis and Skill-Augmented Pretraining.
  • 2026: 📄 Seed2.0 Model Card released by ByteDance Seed.
  • 2026: 📄 Kimi K2.5: Visual Agentic Intelligence released by Moonshot AI.
  • 2025.06 – 2025.12: 🚀 Algorithm Research Intern at Kimi (Moonshot AI), contributing to Kimi-K2.5 long-context capabilities.
  • 2025.01: 🎉 LongGenBench accepted to the ICLR 2025 main track.
  • 2024.09: 🚀 Joined Zhipu AI as an Algorithm Research Intern, contributing to the GLM-4.1/4.5 series.

💼 Experience

ByteDance Seed, Algorithm Research Intern · Dec. 2025 – Present · Beijing

  • Work on pretraining-level data synthesis for LLMs: design high-quality pretraining data pipelines and study how large-scale synthetic data affects model capabilities.
  • Research Skill-Augmented Pretraining: build structured Skill Libraries and explore integrating skills with data to strengthen the model's knowledge and capabilities.

Kimi (Moonshot AI), Algorithm Research Intern · Jun. – Dec. 2025 · Beijing

  • Worked on iterating the long-context capabilities of Kimi-K2.5, covering synthetic data construction for long text and code, and data pipelines for long code generation.
  • Follow-up work includes Kimi-Linear and Kimi-K2-Thinking.

Zhipu AI, Algorithm Research Intern · Sep. 2024 – Jun. 2025 · Beijing

  • Contributed extensively to the GLM-Zero series underpinning GLM-4.1/4.5; work covered long-chain CoT data construction, reward model design, RLHF alignment, and end-to-end benchmark evaluation.
  • Proposed SuperWriter, an agent-guided framework combining hierarchical SFT and hierarchical DPO for long-form writing.
  • Proposed LongWriter-Zero, a pure-RL strategy for ultra-long text generation (ICLR 2026 Oral).

🎓 Education

  • Sep. 2023 – Present   Ph.D. in Natural Language Processing, Singapore University of Technology and Design (SUTD). Advisor: Prof. Roy Ka-Wei Lee.
  • Sep. 2024 – Jul. 2025   Visiting Ph.D. Student, Tsinghua University (THU), Beijing.
  • Sep. 2018 – Jun. 2022   B.Sc. in Mathematics, Huazhong Agricultural University (HZAU), Wuhan.

๐Ÿ“ Selected Publications

* denotes equal contribution | For a complete list, see Google Scholar.

Technical Reports

  • Co-author – Kimi Linear: An Expressive, Efficient Attention Architecture. Moonshot AI, 2025.
  • Co-author – GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models. Zhipu AI, 2025.
  • Co-author – Kimi K2.5: Visual Agentic Intelligence. Moonshot AI, 2026.
  • Co-author – Seed2.0 Model Card: Towards Intelligence Frontier for Real-World Complexity. ByteDance Seed, 2026.

Papers