In this video, we discuss the fundamentals of model quantization, the technique that makes it practical to run inference on massive LLMs such as DeepSeek-R1 or Qwen.
Among other things, we'll discuss:
⚆ What quantization really means (hint: it’s more than just rounding)
⚆ Why integers are faster than floats (with a deep dive into their internal structure)
⚆ How quantization preserves model accuracy
⚆ When to quantize: after training vs. during training (PTQ vs. QAT)
⚆ A hands-on explanation of scale, zero point, clipping ranges, and fixed-point math (see the sketch below)
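
As a small taste of the scale / zero-point mechanics covered in the video, here is a minimal, self-contained Python sketch of asymmetric uint8 quantization. The function names and the simple min/max clipping range are illustrative assumptions, not code taken from the video:

```python
import numpy as np

def quantize(x, num_bits=8):
    """Asymmetric affine quantization of a float tensor to unsigned integers.

    Assumes x.max() > x.min(); the clipping range is simply the tensor's min/max.
    """
    qmin, qmax = 0, 2**num_bits - 1
    x_min, x_max = x.min(), x.max()
    # Scale maps the float range onto the integer range;
    # the zero point is the integer that represents the float value 0.0.
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map the integers back to approximate float values."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(5).astype(np.float32)
q, s, z = quantize(x)
print(x)
print(dequantize(q, s, z))  # close to x, up to the quantization error
```

Round-tripping through dequantize recovers the original values only up to the step size given by the scale, which is exactly the accuracy trade-off the video explores.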
If you enjoyed this, consider subscribing for upcoming videos on:
⚆ Post-training quantization (PTQ)
⚆ Quantization-aware training (QAT)
⚆ Training in low precision (e.g., FP4)
⚆ 1-bit LLMs
#Quantization #MachineLearning #AIOptimization #LLM #NeuralNetworks #QAT #PTQ #DeepLearning #EdgeAI #FixedPoint #BFloat16 #TensorRT #ONNX #AIAccelerators
00:00 Intro
00:50 What
02:10 Why
03:50 Integer vs floating point formats
06:45 When
09:21 How
14:40 Fixed point arithmetic
18:00 Matrix multiplications
20:07 Outro