We're living in a world where a world-class expert posts a free 2-hour video on how to build such cutting-edge stuff. I've barely started this tutorial, but first I just wanted to say thank you, mate!
I suggest watching this video multiple times to understand how transformers work. This is by far the best hands-on explanation and example.
I am a college professor, and I am learning GPT from Andrej. Every time I watch this video, I not only learn the content but also how to deliver any topic effectively. I would vote him the "Best AI teacher on YouTube". Salute to Andrej for his outstanding lectures.
Thanks again for the great lecture. I was able to follow along line by line and train it on Lambda Labs with little effort. Hope to buy you a coffee for all this hard work. Off to the next 4-hour GPT-2 repro 🧠🏋
Thank you for taking the time to create these lectures. I am sure it takes a lot of time and effort to record and cut these. Your effort to level up the community is greatly appreciated. Thanks, Andrej.
It is difficult to comprehend how lucky we are to have you teaching us. Thank you, Andrej.
I knew only Python, math, and the definitions of NN, GA, ML, and DNN. In two hours, this lecture has not only given me an understanding of the GPT model but also taught me how to read AI papers and turn them into code, how to use PyTorch, and tons of AI terminology. This is the best lecture and practical application on AI, because it not only gives you an idea of DNNs but also gives you code built directly from research papers, and a final product. Looking forward to more lectures like these. Thanks, Andrej Karpathy.
I found an intuitive explanation for Query/Key/Value in Batool Haider's video: Q x K.T / |Q||K| is basically the cosine similarity between Q and K, which is higher when the vectors point in the same or similar directions, and that is what yields the "affinity". This Q x K.T product then acts as a mask over V, telling us which values to focus on, which is why Q x K.T x V yielding high values for correct predictions becomes the target the network learns toward. And because it (indirectly) pushes the C vectors toward similarity, strongly connected items end up "closer" together in the embedding space. If this intuition is incorrect, I'm happy to hear how, so I can learn.
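For anyone who wants to poke at this intuition in code, here is a minimal PyTorch sketch (the sizes and random weights are purely illustrative, and the causal masking from the lecture is omitted). One caveat: the lecture's attention scales Q x K.T by sqrt(head_size), not by |Q||K|, so the affinities are close to, but not exactly, cosine similarities:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, C = 1, 4, 8          # batch, time, channels (illustrative sizes)
head_size = 8
x = torch.randn(B, T, C)   # stand-in for token embeddings

# linear projections as in the lecture; weights here are random stand-ins
query = torch.nn.Linear(C, head_size, bias=False)
key   = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)
q, k, v = query(x), key(x), value(x)

# scaled dot-product attention: affinities between queries and keys
wei = q @ k.transpose(-2, -1) / head_size**0.5   # (B, T, T)
wei = F.softmax(wei, dim=-1)                     # each row sums to 1
out = wei @ v                                    # weighted average of values

# true cosine similarity between the same q and k vectors, for comparison
cos = F.cosine_similarity(q.unsqueeze(2), k.unsqueeze(1), dim=-1)  # (B, T, T)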
The humility of a genius... he doesn't even assume people know how a dot product is computed.
Wow! Having the ex-lead of ML at Tesla make tutorials on ML is amazing. Thank you for producing these resources!
This lecture answers ALL my questions from the 2017 Attention Is All You Need paper. I was always curious about the code behind the Transformer, and this lecture quenched my curiosity with a Colab to tinker with. Thank you so much for your effort and time in creating the lecture to spread the knowledge!
I was always scared of the Transformer diagram. Honestly, I never understood how such a schematic could make sense until the day Andrej enlightened us with his super teaching power. Thank you so much! Andrej, please save the day again by doing one more class about Stable Diffusion!! Please, you are the best!
I did something like this in 1993. I took a long text and calculated the probability of one word following another by parsing the full text (I worked with words, not tokens). And I successfully created a single-layer perceptron parrot that could spew almost meaningful sentences. My professors told me I should not pursue the neural network path because it was practically abandoned. I never trusted them. I'm glad to see neural networks' glorious comeback. Thank you, Andrej Karpathy, for what you have done for our industry and humanity by popularizing this.
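For the curious, the counting version of that idea fits in a few lines of plain Python. This is a made-up sample text, and it uses raw transition counts rather than a perceptron, but the "parrot" behavior is the same:

import random
from collections import defaultdict

text = "the cat sat on the mat and the cat slept"
words = text.split()

# count how often each word follows another
counts = defaultdict(lambda: defaultdict(int))
for w1, w2 in zip(words, words[1:]):
    counts[w1][w2] += 1

# sample an "almost meaningful" sentence from the transition counts
w = random.choice(words)
sentence = [w]
for _ in range(8):
    followers = counts[w]
    if not followers:
        break
    w = random.choices(list(followers), weights=followers.values())[0]
    sentence.append(w)
print(" ".join(sentence))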
I cannot thank you enough for this material. I've been a spoken language technologist for 20 years, and this plus your micrograd and makemore videos has given me a graduate-level update in less than 10 hours. Astonishingly well-prepared and presented material. Thank you.
Wow! I knew nothing and now I am enlightened! I actually understand how this AI/ML model works now. As a nearly 70-year-old who just started playing with Python, I am a living example of how effective this lecture is. My humble thanks to Andrej Karpathy for allowing me to see into and understand this emerging new world.
Andrej, I cannot comprehend how much effort you have put into making these videos. Humanity is thankful to you for making them publicly available and educating us with your wisdom. It is one thing to know the material and apply it in a corporate setting, and quite another to use it to educate millions for free. This is one of the best kinds of charity a CS major can do. Kudos to you, and thank you so much for doing this.
I think this style of teaching is much better than a lecture with PowerPoint and a whiteboard. This way you can actually see what the code is doing instead of guessing what all the math symbols mean. So thank you very much for this video!
Thanks for this well-explained and wonderful series! Hope you will cover quantization for people with low-power GPUs.
This lecture strikes the perfect balance between being highly educative and accessible. Many thanks, Andrej, for taking the time to walk us through such a lucid construction of Transformer-building-blocks from first principles. Given your responsibilities, you must be swamped, yet you found the time to educate the entire community! Truly inspirational. Cannot thank you enough.