vishal

Week 1 as ColBERT Maintainer: Key Findings from Reviewing Open Issues & PRs (9:10)
Live Coding: Using an LLM for Retrieval in my AgentFastbook Project (53:40)
Error Analysis: Haiku 3.5 and Sonnet 4 Text Decomposition (1:37:12)
Debugging Flash Attention in LLM Foundry: A Surprising 20% Slowdown Using flash_attn_varlen_func (28:01)
portfolio-llm: Building a Professional Portfolio You Can Chat With (25:55)
Can an LLM Curate a Dataset? Live-Evaluating Haiku's Text Decomposition (1:00:00)
Takeaways from Gemini's Deep Research Report for Small Batch Training (34:04)
Finding My Moat: How Boring Tasks, Expert Advice and the AI Evals Course are Shaping My AI Projects (28:53)
Research Paper Summary: Tulu 3 (1:07:03)
Proof, Pricing, and Passion: Finding My Path in Machine Learning (21:11)
Cross-Entropy Loss Explained (40:39)
Research Paper Summary - Group Normalization (45:41)
HuggingFace's Default KV Cache and the flash_attn_varlen_func Docstring (1:07:53)
First Experiments with the fastai Imagenette Dataset (14:36)
Understanding the Mean Shift Clustering Algorithm (and PyTorch Broadcasting) | fastai course Part 2 (44:05)
Exploring Precision in ColBERT Indexing and Retrieval (18:43)
LLM-Foundry uses flash_attn_varlen_func by default. BinPackCollator does naive sequence packing. (21:43)
Understanding Eager Bidirectional Attention via the Attention Mask 🎭 (26:09)
The Evolution of Matrix Multiplication, Part 2: PyTorch and Numba on the GPU | fastai course Part 2 (26:59)
Understanding ColBERT's ivf.pid.pt: Inspecting Intermediate Artifacts from _build_ivf & optimize_ivf (50:20)
Do RAGatouille and ColBERT Produce the Same Index and Retrieval Scores? A Deep Dive Comparison (19:13)
TIL: Understanding LLM Foundry's BinPackCollator (Sequence Packing for 95% Token Efficiency!) (19:12)
Improving LLM Judge Alignment: Enhancing TinyScale Lab Evaluation Agreement to 94% (23:06)
Technical Report Summary: Nomic Embed (26:38)
Understanding Sequence Packing: Initial Musings (21:17)
Building an LLM Judge Agreement App: 7 Iterations from Basic to Full Functionality (15:06)
Evaluating First Attempt LLM Judge Scores: Improving Claude Haiku Alignment for Story Scoring (44:00)
Manual Scoring Results for TinyStories Models: Grammar, Reasoning, and Emergent Capabilities (36:39)
Look at Your Data: Building an LM Scoring App with FastHTML (40:23)
TSL: Curating Evaluation Prompts, Defining Scoring Criteria + Designing LLM Judge Prompt Template (30:15)
TinyScale Lab Update: Setting Eval Targets + Generation Completions for LLM Judge Development (19:41)
TinyScaleLab Project Update: Training Cost Analysis and Evaluation Infrastructure Plans (9:24)
TinyScaleLab: Exploring the Connection Between Training Dynamics and Model Capabilities (26:51)
Research Paper Summary: Small-scale proxies for large-scale Transformer training instabilities (1:04:03)
My Second-Place Winning Tiny Model Hackathon Journey: Pre-Training from Scratch (22:47)
LossInspector: A Deep Dive Into LLM-Foundry's Next-Token Prediction with a Custom Composer Callback (21:19)
The Evolution of Matrix Multiplication: 12,000x Numba Speedup 🚀 | fastai Course Lesson 11 (28:41)
Research Paper Summary: TinyStories (1:23:34)
Look at Your Data: Manual Validation of Retrieval Metrics (47:38)
Creating a Custom Composer Callback to Track Data Types in LLM Training | Mixed Precision Deep Dive (46:10)
Paper Reading: Small-scale proxies for large-scale Transformer training instabilities (1:16:36)
Paper Reading: Overtrained Language Models Are Harder to Fine-Tune (1:36:58)
Paper Reading: SmolLM2 (55:01)
Exploring Sequential and Merged Linear Layer Forward Passes (20:16)
TIL: Using PyTorch's register_forward_hook to Trace Floating Point Errors (9:21)
Debugging Un-Merged and Merged LoRA Model Output Differences (26:21)
LoraModel.merge_and_unload Deep Dive (18:10)
RAGatouille/ColBERT Indexing Deep Dive (1:05:43)
Recreating Plots from Appendix A.4 of the DoRA Paper for LoRA Learns Less and Forgets Less Models (25:43)
TIL: PeftModel Base Model Behavior (17:43)
Research Paper Summary: Hypencoder: Hypernetworks for Information Retrieval (41:51)
Code Walkthrough - peft DoRA Implementation (26:53)
Research Paper Summary: rsLoRA (19:42)
Research Paper Summary: LoRA Learns Less and Forgets Less (36:57)
Recreating the PLAID ColBERTv2 Scoring Pipeline: From Research Code to RAGatouille (1:14:59)
fastbook-benchmark: ColBERT Search (11:48)
fastbook-benchmark: Single Vector Search (18:09)
fastbook-benchmark: Scoring Retrieval Results (20:36)
fastbook-benchmark: Full Text Search Implementation (16:37)
fastbook-benchmark: Document Processing (17:01)
Introducing the fastbook-benchmark Information Retrieval QA Dataset (4:27)
Implementing Image to Image Generation in Stable Diffusion from Scratch | fastai Part 2 (21:27)
Implementing Negative Prompting in Stable Diffusion from Scratch | fastai Part 2 (37:34)
fastai - Chapter 8 - Collaborative Filtering Deep Dive Code Walkthrough (38:55)
Implementing a Custom Test Time Augmentation Method using fastai (25:12)
fastai - Chapter 6 - Building Single, Multi-label Classification and Image Regression Models (2:41:31)