Published on November 23, 2025
How to run custom CUDA kernels with Torch
Tags: torch, cuda
A showcase of a simple way to run custom CUDA kernels in PyTorch via extensions, along with a quick way to benchmark them against the native functions.
Published on November 15, 2025
Faster LLM inference with KV cache, speculative decoding and torch.compile
Tags: llm, torch
A brief overview of some techniques that can make a model faster without much code change.
Published on November 3, 2025
Home-grown Qwen3 model with a bit of sprinkles on top
Tags: llm, torch, qwen3
An implementation of Qwen3 in PyTorch with some inference-oriented optimizations.