vishal

Week 1 as ColBERT Maintainer: Key Findings from Reviewing Open Issues & PRs (9:10)
Live Coding: Using an LLM for Retrieval in my AgentFastbook Project (53:40)
Error Analysis: Haiku 3.5 and Sonnet 4 Text Decomposition (1:37:12)
Debugging Flash Attention in LLM Foundry: A Surprising 20% Slowdown Using flash_attn_varlen_func (28:01)
portfolio-llm: Building a Professional Portfolio You Can Chat With (25:55)
Can an LLM Curate a Dataset? Live-Evaluating Haiku's Text Decomposition (1:00:00)
Takeaways from Gemini's Deep Research Report for Small Batch Training (34:04)
Finding My Moat: How Boring Tasks, Expert Advice and the AI Evals Course are Shaping My AI Projects (28:53)
Research Paper Summary: Tulu 3 (1:07:03)
Proof, Pricing, and Passion: Finding My Path in Machine Learning (21:11)
Cross-Entropy Loss Explained (40:39)
Research Paper Summary - Group Normalization (45:41)
HuggingFace's Default KV Cache and the flash_attn_varlen_func Docstring (1:07:53)
First Experiments with the fastai Imagenette Dataset (14:36)
Understanding the Mean Shift Clustering Algorithm (and PyTorch Broadcasting) | fastai course Part 2 (44:05)
Exploring Precision in ColBERT Indexing and Retrieval (18:43)
LLM-Foundry uses flash_attn_varlen_func by default. BinPackCollator does naive sequence packing. (21:43)
Understanding Eager Bidirectional Attention via the Attention Mask 🎭 (26:09)
The Evolution of Matrix Multiplication, Part 2: PyTorch and Numba on the GPU | fastai course Part 2 (26:59)
Understanding ColBERT's ivf.pid.pt: Inspecting Intermediate Artifacts from _build_ivf & optimize_ivf (50:20)
Do RAGatouille and ColBERT Produce the Same Index and Retrieval Scores? A Deep Dive Comparison (19:13)
TIL: Understanding LLM Foundry's BinPackCollator (Sequence Packing for 95% Token Efficiency!) (19:12)
Improving LLM Judge Alignment: Enhancing TinyScale Lab Evaluation Agreement to 94% (23:06)
Technical Report Summary: Nomic Embed (26:38)
Understanding Sequence Packing: Initial Musings (21:17)
Building an LLM Judge Agreement App: 7 Iterations from Basic to Full Functionality (15:06)
Evaluating First Attempt LLM Judge Scores: Improving Claude Haiku Alignment for Story Scoring (44:00)
Manual Scoring Results for TinyStories Models: Grammar, Reasoning, and Emergent Capabilities (36:39)
Look at Your Data: Building an LM Scoring App with FastHTML (40:23)
TSL: Curating Evaluation Prompts, Defining Scoring Criteria + Designing LLM Judge Prompt Template (30:15)
TinyScale Lab Update: Setting Eval Targets + Generation Completions for LLM Judge Development (19:41)
TinyScaleLab Project Update: Training Cost Analysis and Evaluation Infrastructure Plans (9:24)
TinyScaleLab: Exploring the Connection Between Training Dynamics and Model Capabilities (26:51)
Research Paper Summary: Small-scale proxies for large-scale Transformer training instabilities (1:04:03)
My Second-Place Winning Tiny Model Hackathon Journey: Pre-Training from Scratch (22:47)
LossInspector: A Deep Dive Into LLM-Foundry's Next-Token Prediction with a Custom Composer Callback (21:19)
The Evolution of Matrix Multiplication: 12,000x Numba Speedup 🚀 | fastai Course Lesson 11 (28:41)
Research Paper Summary: TinyStories (1:23:34)
Look at Your Data: Manual Validation of Retrieval Metrics (47:38)
Creating a Custom Composer Callback to Track Data Types in LLM Training | Mixed Precision Deep Dive (46:10)
Paper Reading: Small-scale proxies for large-scale Transformer training instabilities (1:16:36)
Paper Reading: Overtrained Language Models Are Harder to Fine-Tune (1:36:58)
Paper Reading: SmolLM2 (55:01)
Exploring Sequential and Merged Linear Layer Forward Passes (20:16)
TIL: Using PyTorch's register_forward_hook to Trace Floating Point Errors (9:21)
Debugging Un-Merged and Merged LoRA Model Output Differences (26:21)
LoraModel.merge_and_unload Deep Dive (18:10)
RAGatouille/ColBERT Indexing Deep Dive (1:05:43)
Recreating Plots from Appendix A.4 of the DoRA Paper for LoRA Learns Less and Forgets Less Models (25:43)
TIL: PeftModel Base Model Behavior (17:43)
Research Paper Summary: Hypencoder: Hypernetworks for Information Retrieval (41:51)
Code Walkthrough - peft DoRA Implementation (26:53)
Research Paper Summary: rsLoRA (19:42)
Research Paper Summary: LoRA Learns Less and Forgets Less (36:57)
Recreating the PLAID ColBERTv2 Scoring Pipeline: From Research Code to RAGatouille (1:14:59)
fastbook-benchmark: ColBERT Search (11:48)
fastbook-benchmark: Single Vector Search (18:09)
fastbook-benchmark: Scoring Retrieval Results (20:36)
fastbook-benchmark: Full Text Search Implementation (16:37)
fastbook-benchmark: Document Processing (17:01)
Introducing the fastbook-benchmark Information Retrieval QA Dataset (4:27)
Implementing Image to Image Generation in Stable Diffusion from Scratch | fastai Part 2 (21:27)
Implementing Negative Prompting in Stable Diffusion from Scratch | fastai Part 2 (37:34)
fastai - Chapter 8 - Collaborative Filtering Deep Dive Code Walkthrough (38:55)
Implementing a Custom Test Time Augmentation Method using fastai (25:12)
fastai - Chapter 6 - Building Single, Multi-label Classification and Image Regression Models (2:41:31)