Published on November 15, 2025
Faster LLM inference with KV cache, speculative decoding and torch.compile
Tags: llm, torch
A brief overview of some of the techniques that can make a model faster without much code change
Published on November 3, 2025
Home-grown Qwen3 model with a bit of sprinkles on top
Tags: llm, torch, qwen3
Implementation of Qwen3 using PyTorch with some inference-based optimizations