@3blue1brown

A common question I've seen here in the comments about masked self-attention concerns cases where it feels like later words should update the meaning of earlier words. For example, in many languages, adjectives come after nouns.

The model can always put the richest meaning into the last token (e.g., early nouns getting baked into later adjectives). For example, @victorlevoso8984 noted below that empirical evidence suggests the meaning of a sentence often gets baked into the embedding of the punctuation mark at its end. Keep in mind that the model doesn't have to conceptualize things the way we humans do, and in all likelihood it doesn't, so I wouldn't over-index on the motivating example given in this video.
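To make the mechanics concrete, here is a minimal NumPy sketch of single-head masked self-attention. The function and variable names are my own illustrative choices, not anything from the video: the causal mask forces each token to attend only to itself and earlier tokens, which is why information can accumulate in later positions but never flow backward.

```python
import numpy as np

def masked_self_attention(X, Wq, Wk, Wv):
    """Single-head masked (causal) self-attention over a sequence X.

    X has shape (seq_len, d_model); Wq/Wk/Wv project the embeddings
    to queries, keys, and values. Illustrative sketch, not optimized.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # (seq_len, seq_len) logits

    # Causal mask: entry (i, j) is masked when j > i, so token i can
    # attend only to itself and earlier tokens. Later words therefore
    # never update earlier ones; meaning flows strictly forward.
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[future] = -np.inf               # -inf becomes 0 after softmax

    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                     # row i mixes only tokens 0..i

# Tiny demo: 4 tokens with embedding dimension 8.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = masked_self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because row i of the output depends only on tokens 0 through i, the last row is the only one that has "seen" the whole sentence, which matches the observation above about meaning accumulating in the final punctuation token.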

@philrod1

I'm a university lecturer with a PhD in AI, and I cannot compete with the quality of this work. Videos like this put the entire higher education system to shame. Fantastic! ❤️

@brunojl2

The volume of work, attention to detail and clarity we get from Grant is staggering. Bravo sir.

@actualBIAS

Are you kidding me? ONE WEEK FOR 2 MASTERPIECES?!
Thank you so much!

@QuantAI-kp8xt

How I wish this video was available when the "Attention Is All You Need" paper just came out. It was really hard to visualize by simply reading the paper. I read it multiple times but could not figure out what it was trying to do.

Subsequently, Jay Alammar posted a blog post called "The Illustrated Transformer". That was a huge help for me back then. But this video raises the illustration to an entirely different level.

Great job! I'm sure many undergraduates and hobbyists studying machine learning will benefit greatly.

@Steamrick

I've got to say - "Attention Is All You Need" is an incredible title for a research paper.

@BMcLean

This video has probably increased the number of people in the world able to comprehend these topics by 1-2 orders of magnitude!

@hailking5588

As a graduating PhD student working in Natural Language Processing, I still found this video extremely beneficial. Awesome!

@sriramsrinivasan2769

3b1b is the only content producer whose videos I start by first making coffee, then upvoting, then hitting the play button.

@Henry-fv3bc

Attention existed before the 2017 paper "Attention Is All You Need".

The main contribution was that attention was... all you needed for sequence processing (you didn't need recurrence). Self-attention specifically was novel though.

@JonyBetancourt

As director of video content for a major educational publisher, I can say this is some of the best educational content I've ever seen. Your content gives me ideas for how to shape the future of undergraduate-level STEM videos. A true legend and inspiration in this space. Thank you for the meticulously outstanding work that you do.

@DataRae-AIEngineer

Geez, Grant, I spent thousands of dollars on a very good deep learning executive certification from Carnegie Mellon, and your series here is better than their math slides. This series is really turning out great.

@annachester5790

I'm a Computer Science student currently working with a Transformer for my master's thesis, and this video is absolute gold to me. I think this is the best explanation video I've ever seen. Holy shit, it is so clear and insightful. I'm so looking forward to the third video of the series!!!! The first one was absolutely amazing too. Thank you sooo much for this genius piece of work!!!!

@colinolliver8985

Never before have I felt the urge to support a creator for a specific video, but you, sir, have knocked it out of the park. This is by far the best educational video I have seen, not just on YouTube but in my last 20 years of working with data 🎉

@MatheusC1729

I cannot stress enough what a tour de force this is. It's probably one of the best math classes ever made, anywhere, at any time.

You're the best in the game and an inspiration to many. Thank you so, so much, Grant; you're doing God's work here.

@jafetsierra1875

This is pure gold! I've never seen such a good explanation of the attention mechanism before. Thank you for this.

@michaelthompson9862

You not only put out some of the best content on YouTube but also give constant shout-outs to other content creators you admire. You are a GOAT, 3Blue1Brown.

@prateeksarna9225

I really appreciate how hard you work to make these animations that visualize math.

@Otomega1

Just wow, the educational value of this video is incredible.
There are so many highly relevant, original ideas here for explaining abstract concepts and drastically simplifying comprehension.
I'm so thankful that you've made this content available to everyone for free.
I absolutely love it!!

@fluffy7bunny

The fact that this is freely available on YT is insane. Thanks for all the amazing work throughout the years.