Digital Event Horizon

Revolutionizing AI Inference: Together AI's Groundbreaking Advancements on NVIDIA Blackwell

Together AI reports faster, more efficient inference for DeepSeek-R1-0528 on NVIDIA Blackwell hardware, crediting custom GPU kernels, low-precision quantization, and speculative decoding.

Together AI has improved the efficiency and performance of DeepSeek-R1-0528 inference on NVIDIA Blackwell.

The company's proprietary inference stack runs on NVIDIA HGX B200 GPUs to speed up model inference.

Together AI says its quantization technique preserves model accuracy while using the low-precision NVFP4 and MXFP4/6/8 formats.

The company develops custom GPU kernels to optimize AI computations and reduce costs.

Together AI's open-source approach enables collaboration and community engagement, advancing the field of AI.

Together AI, a company focused on open-source AI, has improved the efficiency and performance of inference for the DeepSeek-R1-0528 model on NVIDIA Blackwell. The company presents the result as evidence of its commitment to pushing the boundaries of artificial intelligence.

At the heart of this work is Together AI's proprietary inference stack, which runs on NVIDIA HGX B200 GPUs. By combining advanced GPU kernels, an optimized software stack, and its own quantization techniques, the company reports substantially faster model inference. A key component is the Together Inference Engine, which includes FlashAttention-3, faster custom GEMM and MHA kernels, quality-preserving quantization, and speculative decoding.
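Of the features listed above, speculative decoding is the easiest to show in a few lines. The sketch below is a minimal greedy variant, not Together AI's implementation: a cheap draft model proposes several tokens, and the expensive target model keeps the longest prefix it agrees with. The two toy models and every function name here are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the output matches greedy_decode(target, ...)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The draft model proposes up to k tokens, one at a time.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks each proposed position. A real engine would
        #    score all positions in a single batched forward pass.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # first mismatch: take the target's token
                break
        else:
            # Every proposal was accepted; the target adds one more token if room remains.
            if len(tokens) + len(accepted) < goal:
                accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens


# Hypothetical toy models: the draft agrees with the target most of the time.
def toy_target(prefix: List[int]) -> int:
    return (prefix[-1] * 3 + 1) % 17


def toy_draft(prefix: List[int]) -> int:
    if prefix[-1] % 5 == 0:
        return 0  # deliberately wrong on some inputs
    return toy_target(prefix)


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [1], 12))
```

Because every accepted token is checked against the target model, the output is identical to plain greedy decoding. The speedup comes from verifying several draft tokens per target pass instead of generating one token per pass.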

One of the most significant pieces of this work is a quantization technique that Together AI describes as lossless. It uses the low-precision NVFP4 and MXFP4/6/8 formats while preserving model accuracy, even in challenging attention layers, which the company says overcomes limitations of other methods. Because lower-precision formats reduce memory traffic and raise arithmetic throughput, accuracy-preserving quantization matters well beyond this one model.
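To make the formats concrete, here is a minimal sketch of block-scaled 4-bit quantization in the style of MXFP4, where each block of 32 values shares one power-of-two scale and each value is stored as a 4-bit E2M1 float. It illustrates the general idea only and is not Together AI's proprietary technique. The scale-selection rule (pick the smallest power of two that avoids clipping) and all function names are assumptions made for this example. NVFP4 is organized similarly but, as generally described, uses smaller blocks and a finer-grained scale.

```python
import math
from typing import List, Tuple

# Magnitudes representable by a 4-bit E2M1 float (the sign is a separate bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32  # MXFP4 groups 32 elements under one shared scale


def quantize_block(block: List[float]) -> Tuple[int, List[float]]:
    """Return (shared exponent, E2M1 code values) for one block."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Assumption: choose the smallest power-of-two scale such that the largest
    # magnitude in the block maps to at most 6.0, so nothing is clipped.
    exp = math.ceil(math.log2(amax / 6.0))
    scale = 2.0 ** exp
    codes = []
    for x in block:
        mag = min(abs(x) / scale, 6.0)
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - mag))  # round to nearest grid point
        codes.append(math.copysign(nearest, x))
    return exp, codes


def dequantize_block(exp: int, codes: List[float]) -> List[float]:
    scale = 2.0 ** exp
    return [c * scale for c in codes]


def round_trip(values: List[float]) -> List[float]:
    """Quantize and dequantize a flat list, block by block."""
    out: List[float] = []
    for i in range(0, len(values), BLOCK_SIZE):
        exp, codes = quantize_block(values[i:i + BLOCK_SIZE])
        out.extend(dequantize_block(exp, codes))
    return out


if __name__ == "__main__":
    values = [math.sin(i) * 10 for i in range(64)]
    restored = round_trip(values)
    print(max(abs(a - b) for a, b in zip(values, restored)))
```

With plain round-to-nearest, as here, values that fall between grid points pick up rounding error. Keeping that error from degrading model output, especially in attention layers, is the problem a quality-preserving scheme has to solve.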

Another key aspect of Together AI's work is its emphasis on customization and optimization. GPU kernels are programs that run on the GPU and perform the core AI computations, such as attention and matrix multiplication. By writing its own optimized kernels for these operations, Together AI says it achieves faster inference, which reduces cost and improves efficiency.
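As an illustration of the kind of computation such a kernel performs, the sketch below computes attention for a single query in two ways: a direct version that materializes all scores, and a tiled version that processes keys and values a few at a time with a running ("online") softmax. The tiled form is the general idea behind FlashAttention-style kernels. This is plain Python for readability, not GPU code and not Together AI's kernels, and the function names and sample data are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_naive(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """softmax(q . K^T / sqrt(d)) . V, computed with all scores held at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def attention_streaming(q: Vector, keys: List[Vector], values: List[Vector],
                        tile: int = 2) -> Vector:
    """Same result, but keys/values are visited tile by tile with an online softmax."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), tile):
        k_tile = keys[start:start + tile]
        v_tile = values[start:start + tile]
        scores = [dot(q, k) * scale for k in k_tile]
        new_max = max(running_max, max(scores))
        # Rescale what has been accumulated so far to the new maximum.
        # On the first tile this factor is exp(-inf) == 0.0.
        correction = math.exp(running_max - new_max)
        denom *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_tile):
            w = math.exp(s - new_max)
            denom += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]


# Hypothetical sample data: one query, five key/value pairs of dimension 2.
q = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0], [2.0, 0.3]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]

if __name__ == "__main__":
    print(attention_naive(q, keys, values))
    print(attention_streaming(q, keys, values, tile=2))
```

The two functions agree up to floating-point rounding. The streaming version never holds the full score vector, which is what lets a GPU kernel keep its working set in fast on-chip memory.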

Beyond the technical work, Together AI also emphasizes collaboration and community engagement. Its open-source approach lets developers and researchers access its technology and contribute to the development of new models and algorithms.

Industry leaders and experts have praised the company's commitment to advancing the field of AI. Its emphasis on innovation, collaboration, and open-source principles has made it a prominent player in the rapidly evolving landscape of artificial intelligence.

Related Information:

https://www.digitaleventhorizon.com/articles/Revolutionizing-AI-Inference-Together-AIs-Groundbreaking-Advancements-on-NVIDIA-Blackwell-deh.shtml

https://www.together.ai/blog/fastest-inference-for-deepseek-r1-0528-with-nvidia-hgx-b200

Published: Thu Jul 17 15:33:40 2025 by llama3.2 3B Q4_K_M

