@hello_interview

Oops - made a mistake at 6:06. The error here is missing a page (with low probability) not re-crawling a page. Thanks @spencer856 and others for calling this out.

@glowfall

I'd add two classic examples of real-world use of Bloom filters:
1. An in-memory cache in front of SSTables stored on disk in Apache Cassandra. This is a clever solution because SSTables are naturally immutable and do not require any synchronization (unlike the previously mentioned example with a generic cache).
2. Malicious URL detection in web browsers. Browsers download a compressed Bloom filter containing known harmful sites and first check the visited URL against it. If the filter indicates a possible match, the browser then performs a secondary verification with the server.

@iChrisBirch

Awesome video, loved the examples and clear explanation.  Mostly the diagrams with speaker in corner.  No unnecessary distracting animations or sound effects.  Great work!

@alby-p8s

Thank you for leaving in the bit with the math, it's nice to know everyone second guesses themselves on it haha

@spencer856

In your web crawler example you say that the bloom filter would make it so you'd re-crawl a page sometimes. The way I was interpreting it was that you'd use the bloom filter to determine if you've already crawled the page, since it gives false positives not false negatives, shouldn't it cause you to skip some pages that you haven't crawled rather than causing you to recrawl pages you have?

@shashwatkumar6965

HyperLogLog keeps count of max "leading zeroes" and not "trailing zeroes". Most hashing algorithms like SHA or Murmur try to do uniform randomization which means every bit has an equal chance of being 0 or 1. This means the chance of hash starting with 1st bit as zero is 1/2, chance of leading 2 zeroes is 1/4, chance of 3 is 1/8. ie chance of k leading zeroes is 1/2^(k+1). 

If we take hashes for n samples, then probability of getting 1 sample's hash with k leading zeroes => n * (1/2^(k+1) = 1. 
Solve for n:
1/n = 1/2^(k+1)   =>   n = 2^(k+1)

so this means if we have max 3 leading zeroes then to get 1 hash with 3 leading zeroes we should have hashed ~= 2^3 unique elements.

@avinashpandey6028

Not sure if cache example is valid. If you have a single server you can keep the bloom filter in memory of the server but if you have many servers(most likely) you may have extract out the bloom filter to an external component which will be same as having to query a cache. So not able to understand how exactly a bloom filter is adding an advantage here.

@souravmondal_

What was that behind the scenes-
 MEGAAAAA!

@Jowettchng

I haven’t watched it, but just wanted to say thank for a new video!! Looking forward to watching it!

@saketp18

I like the approach aim for simplicity and solve for bottleneck, rather then solving imaginary problems.

@AnonCom-g5j

Hey Stefan, do you have any advice for completing a secondary behavioral interview for Meta E4? Should I refine the stories I had rehearsed for the first or should I create entirely new ones? Can you share what improvements interviewers would be looking for?  Any other prep besides watching the conversation with Austen and continuing to practice? Btw, big shoutout to HelloInterview for helping me pass the system design.

@AniketSingh-nx4ds

Please make a system design video on Google docs product. Would be a different architecture from all the existing ones on your channel

@AniketSingh-nx4ds

Can we please get one on database concurreny optimising mechanims. Like discussing single leader replication, multi leader repliaction, plus how write conflicts are handled, abd how to use one in a system design interview

@indrajitvijayakumar6021

Thank you so much for this amazing content! :)

@sunderviswanathan9134

this is the first video I felt I need a 1 hour explanation . Maybe I am slow to process this since I am new but the pace felt too much

@gsriram7

19:20 Fred Again, well played! 😄

@adboateng

Can you please do a system design for a peer-to-peer carsharing/rental marketplace?

@skullTT

hyperloglog only need to keep track of the largest trailing 0s. In terms of space, it needs only one integer for each bucket? Then in real scenario, how many buckets do we use?

@bumbumbole1236

How did Christina caused Index 3  Hash 1 at @10:22 from 1->3

@JenekPES

11:55 if I got this right, hash 1 index 3 should've had a value of 2, as we are incrementing a value each time?