@PurpleMindCS

To try everything Brilliant has to offer—free—for a full 30 days, visit https://brilliant.org/PurpleMind/ . You’ll also get 20% off an annual premium subscription.

@Lewehot

I figured out the two-digits-at-a-time multiplication, and a few things not mentioned in the video, when I was 14 years old in 1982. I was rotating the points of a TIE fighter on an Apple II+ with a 6502 CPU. I think I got it down to 20 clock cycles for a 16-bit multiplication on an 8-bit CPU that took 1 to 5 clocks per single operation. That reached 50,000 mults per second. Then rotating 3D points required sine and cosine, so I had to optimize those as well, plus square roots. Lots of work and experimenting. I could rotate the 35 or so 3D points of Darth's TIE fighter at 3 frames a second, including the time to clear the screen and draw the lines one pixel at a time in assembly code. That was 1 frame per second faster than Woz, who built a 3D rotation utility program. I didn't realize the importance of what I was doing; I was just trying to rotate a TIE fighter to show my geeky friends.
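For anyone curious what "two digits at a time" looks like, here's a rough Python sketch of the standard decomposition (illustrative only, nothing like the commenter's actual 6502 routine): a 16-bit product assembled from four 8-bit partial products.

```python
# Rough sketch (Python, not 6502 assembly) of a 16-bit multiply built from
# four 8-bit partial products; illustrative only, not the commenter's routine.
def mul16_from_8bit(a, b):
    a_hi, a_lo = a >> 8, a & 0xFF      # split each 16-bit value into two 8-bit "digits"
    b_hi, b_lo = b >> 8, b & 0xFF
    # (a_hi*256 + a_lo) * (b_hi*256 + b_lo)
    return ((a_hi * b_hi) << 16) + ((a_hi * b_lo + a_lo * b_hi) << 8) + (a_lo * b_lo)

assert mul16_from_8bit(54321, 12345) == 54321 * 12345
```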

@donaldhobson8873

The last term in the sum is 
n*1.5^k where k=log_2(n)
=n*(2^log_2(1.5))^k
=n*2^(log_2(1.5)*k)
=n*(2^k)^log_2(1.5)
=n*n^log_2(1.5)
=n^(1+log_2(1.5))
=n^(log_2(1.5*2))
=n^log_2(3)

So the last term in the sum, the bottom row of the recursion, asymptotically takes the same amount of compute as the whole thing (up to a factor of 3: the sum is geometric with ratio 3/2, so the total is at most about 3 times its largest term).
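A quick numerical sanity check of that identity, using plain Python floats (so only approximate for very large n):

```python
# Check that the last term n * 1.5^(log2 n) equals n^(log2 3), and that the
# whole geometric sum is only about 3x that last term.
import math

for n in (2**10, 2**20, 2**30):
    k = math.log2(n)
    last = n * 1.5 ** k
    total = sum(n * 1.5 ** i for i in range(int(k) + 1))
    print(f"n = 2^{int(k)}: last = {last:.4e}, n^log2(3) = {n ** math.log2(3):.4e}, "
          f"sum / last = {total / last:.3f}")
```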

@kumarchandresh0

Ah, so nice to see manim being used to explain programming concepts. A refreshing change from all the 3b1b I've been watching.

@ke9tv

Most Python implementations use Comba multiplication rather than the schoolboy method for numbers too small to resort to the Karatsuba method. It has the same big-O time as the schoolboy method, but takes an aggressive approach to carry propagation that reduces the constant factor significantly. (The recursion in the Karatsuba logic also bottoms out into Comba, rather than going all the way down to single precision.)

Lovely explanation! Subscribed.
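For anyone curious, here's a minimal Python sketch of the Comba idea (illustrative only, not CPython's actual code): accumulate every partial product for one output column, then propagate a single carry per column, instead of carrying after each digit-by-digit multiply.

```python
# Minimal sketch of Comba-style (column-wise) multiplication; illustrative only.
# Limbs are little-endian base-10^9 "digits" here; real implementations use
# machine-word limbs and a double-wide accumulator.
BASE = 10**9

def comba_mul(a, b):
    n, m = len(a), len(b)
    out = [0] * (n + m)
    acc = 0                          # column accumulator (Python ints can't overflow)
    for k in range(n + m - 1):
        # gather every partial product a[i] * b[j] with i + j == k
        for i in range(max(0, k - m + 1), min(k + 1, n)):
            acc += a[i] * b[k - i]
        out[k] = acc % BASE          # low limb of this column
        acc //= BASE                 # carry the rest into the next column
    out[n + m - 1] = acc
    return out

# e.g. (1*BASE + 3) * (2*BASE + 4) -> limbs of the product, least significant first
print(comba_mul([3, 1], [4, 2]))    # [12, 10, 2, 0]
```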

@Blingsss

This video beautifully illustrates how a mathematical breakthrough can change the course of technology. Karatsuba’s Algorithm remains a cornerstone in computational theory. Pairing these insights with SolutionInn’s study tools is a fantastic way to deepen understanding.

@ClownTZ-hg3ym

2:00
Pi (π) and Euler's number (e) mentioned 😮
π ≈ 3.14159265358979
e ≈ 2.71828

@mamiemando

You might be interested in the paper "addition is all you need", which describes an optimized multiplication for floats.

@skepley

This is a beautifully made video and I enjoyed it. I do want to point out, however, that as far as I know, classical computers still do not multiply big numbers this way, because FFT-based multiplication is even faster than Karatsuba's algorithm. Interestingly, this is no longer true on a quantum computer, where FFT-based integer multiplication is seemingly impossible, or at least infeasible. Karatsuba's algorithm, on the other hand, does appear to be implementable on a quantum computer.

@bananacraft69

We weren't taught any algorithm for multiplication; we were told to memorise the result of every multiplication x*y for 1 <= x, y <= 100.

@anjuro

My first thought when I heard this: instead of splitting the numbers in half, split them by powers of 2. For example, 321 x 654 would become (256 + 65) * (512 + 142). The key point here is that finding the largest power of 2 below each number is trivial (you just read the position of the highest non-zero bit), and 3 of the 4 resulting multiplications are also trivial, since they are multiplications by a power of 2, i.e. you just bit-shift left. So the only calculation which matters is 65 * 142, and we can apply the same idea to it recursively. These leftover numbers are always less than the power of 2 that was extracted in the previous step, so in the worst case we would have to perform log n recursions (this is the worst case; in most cases it will be fewer, and if I am thinking about it correctly you need as many steps as the smaller count of 1 bits in either input number). So I guess this is another n log n algorithm (the only issue is that I am not 100% sure there is a fast way in hardware to find the highest 1 bit of a number, even though it ought to be trivial).

If this algorithm works, I think you could improve it even more with a clever preprocessing step. For example, 15 * 15, or in other words 1111 x 1111 in binary, is the worst-case multiplication for a 4-bit input, but if we write it as (16 - 1) * (16 - 1), then once again we have 3 trivial multiplications by a power of 2 and 1 "real" multiplication, which in this case also turns out to be trivial: -1 * -1. If there is an efficient way to choose the preprocessing step, it would reduce the complexity further by making the worst case more favourable. One naive idea: scan the number, and every time we see 2 or more consecutive 1 bits, add a bit to our preprocessing number to make those bits roll over. Maybe it's easier to show with an example. Say we have the number 11011101 in binary; we would preprocess it by adding 01000100 to it, so it becomes 100100001, which is more favourable to multiply. I guess we don't even have to do this: we could simply flip all of the bits in the input number, and if the flipped version has fewer 1 bits, we know we can do the multiplication in fewer steps; otherwise we use the original number. In that case the worst-case input would look like 10101010 x 10101010, which I guess is still n log(n) but with a smaller coefficient. I'm not 100% sure the preprocessing step will always be faster, because in some cases all 4 of the multiplications are "real", but maybe it still works out, who knows. At the very least I think it makes sense to only try to roll over the bit sequence with the most consecutive 1 bits in the input number, because that's when 3 of the 4 multiplications are trivial and you also get the greatest bang for buck in terms of reducing the iterations. To me it feels like there is probably even more you could do, but it probably only makes sense for larger numbers, because the preprocessing also costs computation. Anyway, thanks for coming to my TED talk.
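If anyone wants to play with the first idea, here's a rough Python sketch (my own reading of the comment, not a vetted algorithm): peel off the highest set bit of each factor, so three of the four partial products are just shifts, and recurse only on the leftover-times-leftover product.

```python
# Rough sketch of the idea above (one reading of it, not a vetted algorithm):
# write a = 2^p + ra and b = 2^q + rb, so a*b = 2^(p+q) + (rb << p) + (ra << q) + ra*rb,
# where only ra*rb needs real work and is handled recursively. Each level removes
# one set bit from both factors, so the depth is bounded by the smaller popcount.
def split_mul(a, b):
    if a == 0 or b == 0:
        return 0
    p = a.bit_length() - 1           # position of the highest set bit of a
    q = b.bit_length() - 1
    ra = a - (1 << p)                # leftover after removing that bit
    rb = b - (1 << q)
    return (1 << (p + q)) + (rb << p) + (ra << q) + split_mul(ra, rb)

assert split_mul(321, 654) == 321 * 654   # the (256 + 65) * (512 + 142) example
```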

@MSRomsa

So, I don't know if you know anything about lambda calculus, but there are logical shortcuts that can be used for both multiplication and exponentiation. You can also store data in vector types. It's pretty wonderful when you realize that the parameters of a vector don't have to represent a linear orthogonal plane in 2D or 3D space. A vector can represent complex and imaginary planes, or higher dimensions, and the kicker is that you can just stick a function in as any parameter. Now that parameter isn't just a data point; it's a data curve.

@tahsinabrar-sl7yq

I would love to know what AlphaZero would have to say about the efficiency of these algorithms.

@Nododysgonnaknow

Time complexity isn't a measure of how fast an algorithm is; it's a measure of how efficiently it scales, i.e. how much longer it takes as n approaches infinity.

@frechjo

Nice explanation!
I would assume that the change in performance described at 16:12 happens because that's the length of an XMM register, 128 bits.
So it's probably not implementing any multiplication algorithm up to that point, but just using the CPU's multiplication for "small" numbers. Once the number exceeds that length, it has to switch to bignums, so there has to be some algorithm in place for multiplication.
I would also expect the switch to happen a bit earlier though, maybe at 126 bits, because of tagging. I'm not sure, but I think Python uses tagged pointers as an optimization (it's pretty standard), so that would take up a couple of bits from the total available.
What's a bit weird is seeing the curve rise between 1 and 128 (or 126) bits. I would think you'd see at most one step, when (if) it switches between 64-bit and 128-bit register math. Maybe it's from the cost of moving between memory and registers?
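If anyone wants to see where the step lands on their own machine and Python build, a quick probe along these lines should show it (the bit lengths and repeat count are arbitrary choices, and the lambda call overhead dominates for small operands, so only the relative jumps are meaningful):

```python
# Quick, unscientific probe of multiplication time vs. operand size in Python.
import random, timeit

REPS = 200_000
for bits in (30, 60, 63, 64, 127, 128, 256, 1024, 4096):
    x = random.getrandbits(bits) | (1 << (bits - 1))   # force the exact bit length
    y = random.getrandbits(bits) | (1 << (bits - 1))
    t = timeit.timeit(lambda: x * y, number=REPS)
    print(f"{bits:>5} bits: {t / REPS * 1e9:7.1f} ns per multiply")
```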

@kirillsukhomlin3036

A native implementation might use SIMD parallelization, which saves a few binary orders of magnitude.

@warny1978

For results of less than 128 bits, processors can multiply in one cycle if they are wired to do so. That might explain the flat line for Python, which may be using that capability to improve computing speed. Since it's true for x64, it would be worth a try on ARM64.

@lugokomor

Your videos are awesome dude!  Glad to find you!

@blackhole37

Can you do the same but with division, exponentiation, and other algorithms?

@minebuilder1805

I now use Karatsuba's algorithm in my math classes when the numbers get longer than 3 digits. I also calculated C for the time complexity, and at roughly 5.4 digits of length Karatsuba gets faster; mixing Karatsuba's algorithm with nines' complement makes it way easier to write and calculate in fewer lines. Up to 3 digits I go back to the normal method, as there I can reliably calculate the numbers directly. Thanks for your great explanation once again. (C for me was something around 3; it was a whole number.)
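For reference, here's a compact Python sketch of the textbook Karatsuba recursion over decimal digits (not the video's code, and not tuned at all), handy for checking by-hand work against:

```python
# Compact sketch of the textbook Karatsuba recursion over decimal digits;
# not the video's code, and not optimized in any way.
def karatsuba(x, y):
    if x < 10 or y < 10:                      # single digit: multiply directly
        return x * y
    m = max(len(str(x)), len(str(y))) // 2
    xh, xl = divmod(x, 10 ** m)               # split each number around 10^m
    yh, yl = divmod(y, 10 ** m)
    a = karatsuba(xh, yh)                     # high * high
    c = karatsuba(xl, yl)                     # low * low
    b = karatsuba(xh + xl, yh + yl) - a - c   # middle term from a single multiply
    return a * 10 ** (2 * m) + b * 10 ** m + c

assert karatsuba(1234, 5678) == 1234 * 5678
```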