@mrdbourke

Wow! Looks like I need to upgrade to an M1 Ultra, turns out 16.44 mins is just enough time to make and eat a Vegemite sandwich! 

Hahaha, excellent video Alex, looking forward to the next one :)

@planetnicky11

That was great Alex! Loved seeing you collab with Daniel Bourke! I got the M1 Max and was planning on utilizing all those GPU cores for PyTorch workflows!

@Killermike2178

I ran the VGG16-CIFAR10 test on my M1 Max MBP with the PyTorch 1.12 stable release. It regularly finishes in just under 24 minutes per run, which is a 40% improvement over the nightly builds.

@mattbosley3531

I ran it on my 2019 laptop with an RTX 2070 and got 11.8 minutes. I'm running Manjaro Linux with the CUDA builds of PyTorch and torchvision installed. I also have a MacBook Pro with an M1 Max, which had a time similar to yours.

@olivierma2691

On my M2 Max machine the evaluation took less than 10 minutes (9.75 min to be precise). Impressive progress!

@javaxerjack

Thank you Alex, this experiment is exactly what I wanted to see.

@nykz8043

The end scene was perfect :D

@saitaro

MacBook Pro 16 (M2 Max, 32 GB, 38 GPU cores) result with PyTorch 2.0.0:

torch 2.0.0
device mps
Epoch: 001/001 | Batch 0000/1406 | Loss: 2.6346
Epoch: 001/001 | Batch 0100/1406 | Loss: 2.2348
Epoch: 001/001 | Batch 0200/1406 | Loss: 2.1773
Epoch: 001/001 | Batch 0300/1406 | Loss: 2.3495
Epoch: 001/001 | Batch 0400/1406 | Loss: 2.3165
Epoch: 001/001 | Batch 0500/1406 | Loss: 2.1477
Epoch: 001/001 | Batch 0600/1406 | Loss: 2.0689
Epoch: 001/001 | Batch 0700/1406 | Loss: 2.0424
Epoch: 001/001 | Batch 0800/1406 | Loss: 1.9650
Epoch: 001/001 | Batch 0900/1406 | Loss: 1.9270
Epoch: 001/001 | Batch 1000/1406 | Loss: 1.8402
Epoch: 001/001 | Batch 1100/1406 | Loss: 1.8375
Epoch: 001/001 | Batch 1200/1406 | Loss: 1.8020
Epoch: 001/001 | Batch 1300/1406 | Loss: 1.9095
Epoch: 001/001 | Batch 1400/1406 | Loss: 2.0477
Time / epoch without evaluation: 9.76 min
Epoch: 001/001 | Train: 25.62% | Validation: 25.72% | Best Validation (Ep. 001): 25.72%
Time elapsed: 12.44 min
Total Training Time: 12.44 min
Test accuracy 26.20%
Total Time: 13.16 min
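
For anyone wanting to reproduce a similar run, the "device mps" line above is just standard PyTorch device selection, roughly like this (a rough sketch, not the exact benchmark script; the MPS backend needs PyTorch 1.12 or newer):

import torch

# Prefer Apple's Metal backend (MPS), then CUDA, then fall back to CPU.
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

print("torch", torch.__version__)
print("device", device)

# The model and each batch are then moved onto that device before training:
# model = model.to(device)
# features, targets = features.to(device), targets.to(device)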

@wyneg.s.rhuntar

Good collab, it adds freshness to the videos ;)

@alfcnz

What an awesome videocasting setup you have right there! I quite envy it! Haha! 😅😅😅

@No1mrnoobplayer

That last part got me weak 😂

@1337CodeMaster

Thanks for putting in the work! These benchmarks are really nice to get an idea of the performance. Some feedback:

- The graph was very hard to read (what do 50 and 100 mean?) and we never got to see the one graph I was here for: M1 CPU and GPU on the same graph. I can't remember these numbers over time.

- VGG is oooold. As was said in the video, it was introduced in 2015 (2014 was the paper, I think). The point is that, unlike conventional software, neural network architectures change heavily over time, and this has a big impact if the hardware does not follow that evolution. That makes the VGG benchmark almost worthless: nobody uses it anymore, and newer models use completely different layouts, which may or may not be able to exploit dedicated hardware instructions. Ideally you would run a suite of different model types (see the sketch at the end of this comment).

- As was mentioned before in the comments, it could be interesting to compare to TensorFlow too :)
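
On the VGG point: swapping a newer torchvision architecture into the same benchmark is basically a one-line change, something like this (a rough sketch, assuming torchvision 0.13 or newer, which ships ConvNeXt and the weights API; the rest of the training loop stays untouched):

import torchvision

# The benchmark's model, VGG16 adapted to CIFAR-10's 10 classes:
vgg16 = torchvision.models.vgg16(weights=None, num_classes=10)

# A more modern architecture for comparison, same interface:
convnext = torchvision.models.convnext_tiny(weights=None, num_classes=10)

# Both are plain nn.Modules, so the DataLoader, optimizer,
# training loop and timing code do not need to change:
# model = convnext.to(device)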

@jocalvo

That was amazing, thanks guys! I definitely need one of those for my job :)

@SunSin91

Haha, I started the CIFAR-10 training with 80% battery on my MacBook Pro (the Max version), and 30 minutes later: 10% battery.

@mlu3D

My man living the life dropping knowledge in his boxers!

@randomhkkid

Deep learning can be heavily bottlenecked by memory bandwidth, and this is especially true for CNN architectures like VGG. That explains why there's not much difference between the Pro and the Max, but then a significant uplift moving to the Ultra.
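
A rough back-of-envelope for why bandwidth matters so much here (my own numbers, ignoring caches and kernel fusion): VGG-style conv layers do relatively few FLOPs per byte moved.

# Crude arithmetic-intensity estimate for VGG16's first conv layer
# on CIFAR-10 (32x32 input), batch size 1, float32.
h = w = 32
c_in, c_out, k = 3, 64, 3
flops = 2 * c_in * k * k * c_out * h * w        # multiply-adds counted as 2 FLOPs
bytes_moved = 4 * (c_in * h * w                 # read input activations
                   + c_out * c_in * k * k       # read weights
                   + c_out * h * w)             # write output activations
print(flops, bytes_moved, round(flops / bytes_moved, 1))
# Roughly 12-13 FLOPs per byte: low enough that memory bandwidth,
# not raw compute, tends to set the pace, which is where the Ultra's
# wider memory system pays off.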

@ravenclawgamer6367

The Titan performed impressively. After all, it's a full generation older and will be two generations old in a few months.

@tpkapp

Happy to know that PyTorch is finally available for M1 GPUs, and that I don't have to throw away my workstation with a 2080 Ti yet.

@crusnic_corp

My Dell Inspiron 16 Plus with the 6 GB Nvidia RTX 3060 took 18 min to run. Note that I had to reduce the batch size to 16 instead of 32, which means that if I could run it with a batch size of 32, it would take around 9 min. And the total price of my laptop is around 1400 EUR. I would suggest ML/data science majors not go with a MacBook.

@karoliinasalminen

Looks like the Mac Studio with M1 Ultra has an edge here. 16 mins is not bad versus a dedicated GPU. Not as "ultra" as Apple says, but it could be good enough for me: an acceptable compromise for everything else the Mac can do (I also create music and videos) that the Linux box can't do adequately.