Digital Event Horizon

Revolutionizing Large Language Model Training: The Power of Hub Buckets and Delta Weight Sync

Hugging Face has developed a method for training large language models when the trainer and the rollout (inference) server share no cluster or VPN. By combining Hub Buckets with the Delta Weight Sync protocol, the two sides exchange only the weights that changed at each step, which makes disaggregated training practical and opens the way for further advances in natural language processing and artificial intelligence.

Researchers at Hugging Face developed a method for shipping weights at trillion-parameter scale between a trainer and a rollout server that share no cluster, RDMA fabric, or VPN.

The approach uses Hugging Face's Hub Buckets to store and transfer weights. Paired with delta encoding, it cuts the weight-transfer payload from 1.2 GB to 20-35 MB per step.

The key to this success lies in the Delta Weight Sync (DWS) protocol, which encodes only changed elements of the weights as a sparse safetensors file.

Because DWS ships only the changed values, the trainer and rollout server exchange far less data at each step.

Results from a fully disaggregated training experiment on three separate machines show the per-step weight payload dropping from 1.2 GB to 20-35 MB.

Researchers at Hugging Face have implemented a method for shipping weights at trillion-parameter scale between a trainer and a rollout server that share no cluster, RDMA fabric, or VPN. This marks a significant milestone for large language model training, making the process faster, more efficient, and more scalable.

The approach uses Hugging Face's Hub Buckets to store and transfer weights between the trainer and the rollout server. Instead of the full 1.2 GB of weights, each step transfers only a 20-35 MB delta. The Hub Bucket provides the shared storage both sides can reach; the size reduction comes from the delta encoding described below. Smaller payloads directly lower the bandwidth, cost, and latency of keeping the rollout server up to date.
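As a quick check on the scale of that reduction, here is the arithmetic on the two reported figures, assuming decimal units (1 GB = 1000 MB). The delta is roughly 1/60 to 1/34 of the dense payload, or about 1.7% to 2.9% of it:

```latex
\frac{1.2\ \text{GB}}{35\ \text{MB}} = \frac{1200}{35} \approx 34
\qquad
\frac{1.2\ \text{GB}}{20\ \text{MB}} = \frac{1200}{20} = 60
```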

The key to this success lies in the Delta Weight Sync (DWS) protocol, which encodes only the changed elements of the weights as a sparse safetensors file. This approach takes advantage of the fact that approximately 98% of weights do not change between consecutive optimizer steps. By shipping only the delta values, the researchers were able to significantly reduce the amount of data transferred and processed during training.
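The core idea of sparse delta encoding can be sketched in a few lines of NumPy. This is a minimal illustration, not the DWS implementation: the function names are hypothetical, it uses float32 because NumPy has no bfloat16, and it keeps the indices and values in a plain dict where the real protocol would write them to a sparse safetensors file. It also stores the *new* values at the changed positions rather than arithmetic differences, so the receiver reconstructs the weights bit-exactly; the article does not say which of the two DWS uses.

```python
import numpy as np


def encode_delta(old: np.ndarray, new: np.ndarray) -> dict[str, np.ndarray]:
    """Hypothetical encoder: keep only the elements of `new` that differ from `old`."""
    old_flat, new_flat = old.ravel(), new.ravel()
    # Flat (row-major) positions whose value changed between the two steps.
    idx = np.flatnonzero(old_flat != new_flat).astype(np.int64)
    return {"indices": idx, "values": new_flat[idx]}


def apply_delta(weights: np.ndarray, delta: dict[str, np.ndarray]) -> None:
    """Hypothetical decoder: overwrite the changed elements of `weights` in place."""
    # np.put indexes the target as if it were flattened in row-major order.
    np.put(weights, delta["indices"], delta["values"])


# Usage: simulate an optimizer step that touches only two of 32 elements.
rng = np.random.default_rng(0)
old = rng.standard_normal((4, 8)).astype(np.float32)
new = old.copy()
new[1, 3] += 0.5
new[2, 7] -= 0.25

delta = encode_delta(old, new)  # flat positions 11 and 23 only
replica = old.copy()            # the rollout server's copy of the previous step
apply_delta(replica, delta)     # replica now matches `new` exactly
```

Only two index/value pairs cross the wire here instead of all 32 elements; with a real model, the saving scales with the fraction of weights that stay unchanged.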

On the receiving side, DWS relies on DeltaWeightTransferEngine, a weight-transfer engine that Hugging Face implemented as an extension class for vLLM. It connects vLLM to the Hub Bucket so the rollout server can pull each step's weight delta and apply it to the model it is serving.
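The article does not describe DeltaWeightTransferEngine's interface, so the class below is purely hypothetical and is not vLLM's or Hugging Face's API. It illustrates one constraint that any receiver of deltas has to enforce: each delta is relative to the previous step, so deltas must be applied strictly in order, and a gap means the replica can no longer be patched and must be resynced from a full copy of the weights.

```python
import numpy as np


class HypotheticalDeltaReceiver:
    """Illustrative only: not vLLM's or Hugging Face's actual API."""

    def __init__(self, weights: dict[str, np.ndarray], step: int) -> None:
        self.weights = weights  # parameter name -> tensor, updated in place
        self.step = step        # optimizer step these weights correspond to

    def apply(self, step: int, delta: dict[str, tuple[np.ndarray, np.ndarray]]) -> None:
        # A delta is only valid on top of the immediately preceding step.
        if step != self.step + 1:
            raise ValueError(f"expected step {self.step + 1}, got {step}; full resync needed")
        for name, (indices, values) in delta.items():
            # Write the changed values at their flat positions in the named tensor.
            np.put(self.weights[name], indices, values)
        self.step = step


# Usage: one in-order delta is applied, an out-of-order one is rejected.
weights = {"layer.w": np.zeros((2, 3), dtype=np.float32)}
rx = HypotheticalDeltaReceiver(weights, step=0)
rx.apply(1, {"layer.w": (np.array([4]), np.array([1.5], dtype=np.float32))})
# weights["layer.w"][1, 1] is now 1.5
try:
    rx.apply(3, {})
except ValueError as err:
    print(err)  # expected step 2, got 3; full resync needed
```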

To test the approach, the researchers ran a fully disaggregated training experiment on three separate machines that shared no cluster network and exchanged weights only through the Hub. The per-step weight payload fell from 1.2 GB to just 20-35 MB.
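The disaggregated setup can be pictured as a producer and a consumer that never talk directly and meet only at shared storage. The sketch below is a local simulation under stated assumptions: a temporary directory stands in for the Hub Bucket, and the `delta-<step>.npz` naming scheme and both function names are hypothetical. No Hub API calls are shown, because the article does not document them.

```python
import tempfile
from pathlib import Path

import numpy as np


def publish_delta(bucket: Path, step: int, old: np.ndarray, new: np.ndarray) -> Path:
    """Trainer side (hypothetical): write only the elements changed at this step."""
    idx = np.flatnonzero(old.ravel() != new.ravel())
    path = bucket / f"delta-{step:08d}.npz"  # zero-padded so names sort by step
    np.savez(path, indices=idx, values=new.ravel()[idx])
    return path


def pull_deltas(bucket: Path, weights: np.ndarray, last_step: int) -> int:
    """Rollout side (hypothetical): apply newer deltas in order, return latest step."""
    for path in sorted(bucket.glob("delta-*.npz")):
        step = int(path.stem.split("-")[1])
        if step <= last_step:
            continue  # already applied
        if step != last_step + 1:
            raise RuntimeError(f"missing delta for step {last_step + 1}")
        with np.load(path) as data:
            np.put(weights, data["indices"], data["values"])
        last_step = step
    return last_step


# Usage: three sparse "optimizer steps" on the trainer, one sync on the rollout side.
rng = np.random.default_rng(0)
trainer_w = rng.standard_normal(1000).astype(np.float32)
rollout_w = trainer_w.copy()
with tempfile.TemporaryDirectory() as tmp:
    bucket = Path(tmp)
    for step in (1, 2, 3):
        old = trainer_w.copy()
        trainer_w[rng.integers(0, trainer_w.size, 20)] += 0.01  # touch a few weights
        publish_delta(bucket, step, old, trainer_w)
    synced = pull_deltas(bucket, rollout_w, last_step=0)  # synced should be 3
```

A real deployment would also need to keep the rollout side from reading a half-written delta and to fall back to a full download when a step is missing; this sketch only raises an error in the latter case.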

The result has significant implications for large language model training. By pairing Hugging Face's Hub Buckets with the Delta Weight Sync protocol, researchers can train larger models without co-locating the trainer and rollout server on one cluster, paving the way for further advances in natural language processing and artificial intelligence.

Related Information:

https://www.digitaleventhorizon.com/articles/Revolutionizing-Large-Language-Model-Training-The-Power-of-Hub-Buckets-and-Delta-Weight-Sync-deh.shtml

https://huggingface.co/blog/delta-weight-sync

Published: Wed May 27 09:13:54 2026 by llama3.2 3B Q4_K_M

