vishal

Week 1 as ColBERT Maintainer: Key Findings from Reviewing Open Issues & PRs (9:10)
Live Coding: Using an LLM for Retrieval in my AgentFastbook Project (53:40)
Error Analysis: Haiku 3.5 and Sonnet 4 Text Decomposition (1:37:12)
Debugging Flash Attention in LLM Foundry: A Surprising 20% Slow Down Using flash_attn_varlen_func (28:01)
portfolio-llm: Building a Professional Portfolio You Can Chat With (25:55)
Can an LLM Curate a Dataset? Live-Evaluating Haiku's Text Decomposition (1:00:00)
Takeaways from Gemini's Deep Research Report for Small Batch Training (34:04)
Finding My Moat: How Boring Tasks, Expert Advice and the AI Evals Course are Shaping My AI Projects (28:53)
Research Paper Summary: Tulu 3 (1:07:03)
Proof, Pricing, and Passion: Finding My Path in Machine Learning (21:11)
Cross-Entropy Loss Explained (40:39)
Research Paper Summary - Group Normalization (45:41)
HuggingFace's Default KV Cache and the flash_attn_varlen_func Docstring (1:07:53)
First Experiments with the fastai Imagenette Dataset (14:36)
Understanding the Mean Shift Clustering Algorithm (and PyTorch Broadcasting) | fastai course Part 2 (44:05)
Exploring Precision in ColBERT Indexing and Retrieval (18:43)
LLM-Foundry uses flash_attn_varlen_func by default. BinPackCollator does naive sequence packing. (21:43)
Understanding Eager Bidirectional Attention via the Attention Mask 🎭 (26:09)
The Evolution of Matrix Multiplication, Part 2: PyTorch and Numba on the GPU | fastai course Part 2 (26:59)
Understanding ColBERT's ivf.pid.pt: Inspecting Intermediate Artifacts from _build_ivf & optimize_ivf (50:20)
Do RAGatouille and ColBERT Produce the Same Index and Retrieval Scores? A Deep Dive Comparison (19:13)
TIL: Understanding LLM Foundry's BinPackCollator (Sequence Packing for 95% Token Efficiency!) (19:12)
Improving LLM Judge Alignment: Enhancing TinyScale Lab Evaluation Agreement to 94% (23:06)
Technical Report Summary: Nomic Embed (26:38)
Understanding Sequence Packing: Initial Musings (21:17)
Building an LLM Judge Agreement App: 7 Iterations from Basic to Full Functionality (15:06)
Evaluating First Attempt LLM Judge Scores: Improving Claude Haiku Alignment for Story Scoring (44:00)
Manual Scoring Results for TinyStories Models: Grammar, Reasoning, and Emergent Capabilities (36:39)
Look at Your Data: Building an LM Scoring App with FastHTML (40:23)
TSL: Curating Evaluation Prompts, Defining Scoring Criteria + Designing LLM Judge Prompt Template (30:15)
TinyScale Lab Update: Setting Eval Targets + Generation Completions for LLM Judge Development (19:41)
TinyScaleLab Project Update: Training Cost Analysis and Evaluation Infrastructure Plans (9:24)
TinyScaleLab: Exploring the Connection Between Training Dynamics and Model Capabilities (26:51)
Research Paper Summary: Small-scale proxies for large-scale Transformer training instabilities (1:04:03)
My Second-Place Winning Tiny Model Hackathon Journey: Pre-Training from Scratch (22:47)
LossInspector: A Deep Dive Into LLM-Foundry's Next-Token Prediction with a Custom Composer Callback (21:19)
The Evolution of Matrix Multiplication: 12,000x Numba Speedup 🚀 | fastai Course Lesson 11 (28:41)
Research Paper Summary: TinyStories (1:23:34)
Look at Your Data: Manual Validation of Retrieval Metrics (47:38)
Creating a Custom Composer Callback to Track Data Types in LLM Training | Mixed Precision Deep Dive (46:10)
Paper Reading: Small-scale proxies for large-scale Transformer training instabilities (1:16:36)
Paper Reading: Overtrained Language Models Are Harder to Fine-Tune (1:36:58)
Paper Reading: SmolLM2 (55:01)
Exploring Sequential and Merged Linear Layer Forward Passes (20:16)
TIL: Using PyTorch's register_forward_hook to Trace Floating Point Errors (9:21)
Debugging Un-Merged and Merged LoRA Model Output Differences (26:21)
LoraModel.merge_and_unload Deep Dive (18:10)
RAGatouille/ColBERT Indexing Deep Dive (1:05:43)
Recreating Plots from Appendix A.4 of the DoRA Paper for LoRA Learns Less and Forgets Less Models (25:43)
TIL: PeftModel Base Model Behavior (17:43)
Research Paper Summary: Hypencoder: Hypernetworks for Information Retrieval (41:51)
Code Walkthrough - peft DoRA Implementation (26:53)
Research Paper Summary: rsLoRA (19:42)
Research Paper Summary: LoRA Learns Less and Forgets Less (36:57)
Recreating the PLAID ColBERTv2 Scoring Pipeline: From Research Code to RAGatouille (1:14:59)
fastbook-benchmark: ColBERT Search (11:48)
fastbook-benchmark: Single Vector Search (18:09)
fastbook-benchmark: Scoring Retrieval Results (20:36)
fastbook-benchmark: Full Text Search Implementation (16:37)
fastbook-benchmark: Document Processing (17:01)
Introducing the fastbook-benchmark Information Retrieval QA Dataset (4:27)
Implementing Image to Image Generation in Stable Diffusion from Scratch | fastai Part 2 (21:27)
Implementing Negative Prompting in Stable Diffusion from Scratch | fastai Part 2 (37:34)
fastai - Chapter 8 - Collaborative Filtering Deep Dive Code Walkthrough (38:55)
Implementing a Custom Test Time Augmentation Method using fastai (25:12)
fastai - Chapter 6 - Building Single, Multi-label Classification and Image Regression Models (2:41:31)