Published on November 23, 2025
How to run custom CUDA kernels with Torch
Tags: torch, cuda
A showcase of a simple way to run custom CUDA kernels in PyTorch via extensions, along with a quick way to benchmark them against the native functions.
Published on November 15, 2025
Faster LLM inference with KV cache, speculative decoding and torch.compile
Tags: llm, torch
A brief overview of some techniques that can make a model faster without much code change.
Published on November 3, 2025
Home-grown Qwen3 model with a bit of sprinkles on top
Tags: llm, torch, qwen3
An implementation of Qwen3 in PyTorch with some inference-oriented optimizations.