
Meet mKernel: A Multi-GPU, Multi-Node Fused Kernel Library for GPU-Driven Communication
Asif Razzaq, MarkTechPost
AI Summary
UC Berkeley's UCCL team has released mKernel, a fused kernel library that combines intra-node NVLink communication, inter-node RDMA, and dense compute operations into a single persistent CUDA kernel, with the aim of improving multi-GPU and multi-node performance. The work targets GPU-driven communication in distributed computing systems.
This article was originally published on MarkTechPost. Read the full story at the source.