Biography

I am Hongyu Wang (王鸿钰 in Chinese), a second-year Ph.D. candidate in the VIPL group, Chinese Academy of Sciences (CAS), under the supervision of Professor Xilin Chen and Professor Ruiping Wang. I received my B.Eng. degree from the Department of Computer Science and Technology, University of Science and Technology of China (USTC), where I was advised by associate researcher Chao Qian. I was a research intern under the supervision of Dr. Furu Wei and Shuming Ma in the Natural Language Computing group, MSR-Asia, from Aug. 2021 to Jan. 2023.

I am greatly interested in the following topics:

Scale efficiently! Efficient architectures for large-scale foundation models
Expert-level multimodal reasoning and understanding

Contact: why0711@mail.ustc.edu.cn

News:

[04/2024] DeepNet is accepted as a regular paper by TPAMI 2024.
[03/2024] BitNet b1.58: Training Tips, Code and FAQ.
[02/2024] BitNet b1.58, the first ternary LLM that matches the performance of FP16 LLMs with a significant reduction in inference cost (latency, memory, throughput, and energy consumption)
[10/2023] BitNet, the first binary LLM that is competitive with FP16 LLMs and SoTA 8-bit quantization methods
[05/2023] Magneto is accepted by ICML 2023
[11/2022] TorchScale: Transformers at Scale
[10/2022] Magneto, a foundation Transformer, outperforms the de facto Transformer variants designed for various applications, including language modeling (i.e., BERT and GPT), machine translation, vision pretraining (i.e., BEiT), speech recognition, and multimodal pretraining (i.e., BEiT-3).
[03/2022] DeepNet, Scaling Transformers to 1,000 Layers! Outperforms M2M-100 by 5 BLEU points on massive multilingual benchmarks.
[08/2021] Start my internship at MSR-Asia ~