Biography

I am Hongyu Wang (王鸿钰 in Chinese), a third-year Ph.D. candidate at the Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS), under the supervision of Professor Xilin Chen. I received my B.Eng. degree from the Department of Computer Science and Technology, University of Science and Technology of China (USTC), where I was advised by Associate Researcher Chao Qian. Since Aug. 2021, I have been a research intern in the General Artificial Intelligence (GenAI) group at MSR-Asia, under the supervision of Dr. Furu Wei and Shuming Ma.

I am greatly interested in the following topics:

  1. Scale efficiently! Efficient architectures for large-scale foundation models
  2. Multimodal reasoning and robotics

Contact: why0711@mail.ustc.edu.cn

News:

  • [04/2025] BitNet v2, native 4-bit activations for 1-bit LLMs.
  • [04/2025] Introducing BitNet b1.58 2B4T, the first native 1-bit LLM trained at scale! Model weights and technical report are public! Cooking larger models now…
  • [11/2024] BitNet a4.8, enabling 4-bit activations for 1-bit LLMs. BitNet a4.8 activates only 55% of its parameters and further supports a 3-bit KV cache without extra training.
  • [10/2024] bitnet.cpp, the official inference framework for BitNet b1.58! Run a 100B BitNet b1.58 model on a single CPU at human reading speed!
  • [07/2024] Q-Sparse, the fully sparsely-activated LLM.
  • [04/2024] DeepNet is accepted as a regular paper by TPAMI 2024.
  • [03/2024] BitNet b1.58: Training Tips, Code and FAQ.
  • [02/2024] BitNet b1.58, the first ternary LLM that matches the performance of FP16 LLMs with a significant reduction in inference cost (latency, memory, throughput, and energy consumption); see the ternary-quantization sketch after this list.
  • [10/2023] BitNet, the first binary LLM, with performance competitive with FP16 LLMs and SoTA 8-bit quantization methods.
  • [05/2023] Magneto is accepted by ICML 2023
  • [11/2022] TorchScale: Transformers at Scale
  • [10/2022] Magneto, a foundation Transformer, outperforms the de facto Transformer variants designed for various applications, including language modeling (e.g., BERT and GPT), machine translation, vision pretraining (e.g., BEiT), speech recognition, and multimodal pretraining (e.g., BEiT-3).
  • [03/2022] DeepNet, Scaling Transformers to 1,000 Layers! It outperforms M2M-100 by 5 BLEU points on massive multilingual benchmarks.
  • [08/2021] Started my internship at MSR-Asia~
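
For readers curious how the ternary weights in BitNet b1.58 work in practice, below is a minimal, unofficial sketch of absmean weight quantization as described in the public technical report: weights are scaled by their mean absolute value, then rounded and clipped to {-1, 0, +1}. Function and variable names are illustrative; this is not the released code.

    # Minimal sketch of absmean ternary weight quantization in the spirit of
    # BitNet b1.58. Illustrative only, not the official implementation.
    import torch

    def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
        """Quantize a weight matrix to {-1, 0, +1} with a per-tensor scale."""
        gamma = w.abs().mean().clamp(min=eps)       # per-tensor scale (mean |W|)
        w_q = (w / gamma).round().clamp_(-1, 1)     # ternary values {-1, 0, +1}
        return w_q, gamma                           # dequantize as w_q * gamma

    if __name__ == "__main__":
        w = torch.randn(4, 8)
        w_q, gamma = absmean_ternary_quantize(w)
        print(w_q)                                  # entries in {-1., 0., 1.}
        print("reconstruction error:", (w - w_q * gamma).abs().mean().item())

With weights restricted to {-1, 0, +1}, matrix multiplication largely reduces to additions and subtractions, which is the source of the latency, memory, and energy savings noted above.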