I came across your video and blog while researching the new nano-vllm project, which uses flash_attn_varlen_func. The project is supposed to be about 1500 lines of code, so I thought it would be easier to understand. I'll follow along as you decode this stuff. Keep up the good work.
@PatternRecognition-s2p