Published on November 15, 2025
Faster LLM inference with KV cache, speculative decoding and torch.compile
Tags: llm, torch
A brief overview of some of the techniques that can make a model faster without much code change
Published on November 3, 2025
Home-grown Qwen3 model with a bit of sprinkles on top
Tags: llm, torch, qwen3
Implementation of Qwen3 using PyTorch with some inference-based optimizations