I have given HLD interviews, and they are not just about drawing high-level boxes. What kind of boxes you pick (type of cache, type of DB) and how they actually solve the problem (how the cache and rate limiter interact to decide how many requests a user has made in a window) matter just as much. This definitely needed a more elaborate discussion. Informative otherwise, thanks!
This feels very scripted. It's almost like he is reading out of a reference book.
Straight out of the Alex Xu book. Is this scripted? It doesn't feel like a conversation at all, just rote learning. Just my two cents here, still a good video.
I would want to hear a discussion of whether to lock the high-throughput cache while writing. Great video overall!
I liked how Huzaifa tried to act dumb the whole time, while being an EM.
The sliding window approach as explained seems the same as token bucket. In a true sliding window, each request carries a timestamp, and whatever requests fall within the window are eligible for processing. When a new request arrives, the window slides forward to the new request's timestamp, removing any older requests that now fall outside it, and those removed requests get a 423. Alternatively, the window doesn't slide at all unless it can accommodate the new request; otherwise the new request gets a 423. Either approach is fine depending on the requirements.
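For reference, here's a minimal sketch of a sliding window log limiter in Python (assuming an in-memory store and made-up limit/window values; a real system would keep the log in something like Redis):

import time
from collections import deque

class SlidingWindowLog:
    # Allow at most `limit` requests per `window_seconds` for each key.
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.logs: dict[str, deque] = {}

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        log = self.logs.setdefault(key, deque())
        # Slide the window: drop timestamps that have fallen out of it.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) < self.limit:
            log.append(now)
            return True
        return False  # caller rejects, e.g. with 429 (or 423 as in the video)

Unlike a token bucket, the decision here is based purely on how many timestamps remain inside the window, which is what makes it "sliding".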
Did the candidate even design the rate limiter? The most important part of the design is the actual rate limiter component, and they just put up two boxes called "API Rate Limiter". Maybe the interviewer would get enough signal from this, maybe not. It definitely could have been better.
This is more of a presentation than an interview. An interview should start with the candidate asking the interviewer questions to scope the design down to what can be covered in one session.
Having the rate limiter sit between the load balancer and the web servers doesn't seem like a neat design at all, because it creates endless trouble. How do you decide which web server to send a request to once it passes the rate limit? Does the rate limiter service have to maintain the web server auto-scaling group information? It should be a sit-aside service that the LB (or the web servers) calls to do the check, or it could even be a library inside the web server, but definitely not a pass-through component.
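As a rough illustration of the sit-aside idea (a hypothetical RateLimiterClient, not something from the video), the web server makes a quick check call and routing stays where it belongs:

class RateLimiterClient:
    # fetch_count(key) is assumed to query the shared cache/service for the
    # number of requests this key has made in the current window.
    def __init__(self, fetch_count, limit: int):
        self.fetch_count = fetch_count
        self.limit = limit

    def check(self, key: str) -> bool:
        return self.fetch_count(key) < self.limit

def handle_request(request: dict, limiter: RateLimiterClient) -> dict:
    # Middleware-style check inside the web server, not a pass-through hop.
    if not limiter.check(request["client_ip"]):
        return {"status": 429, "body": "Too Many Requests"}
    return {"status": 200, "body": "OK"}

This way the rate limiter never has to know anything about the web server fleet or its auto-scaling groups.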
Won't hitting the cache on every request to check whether an IP is blocked be detrimental to the system? Under peak load the window will slide in sub-milliseconds, so a request that is in the cache may already be invalid for the new window duration.
I think the part about making the rate limiter distributed could have been explained better. What does "one common cache" mean? The "read cache" and "write cache" were also quite confusing, and the interviewer didn't do her job of digging into it.
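My guess at what "one common cache" means: every rate limiter instance increments the same counter, keyed per client and window, in a shared store such as Redis, so the limit is enforced globally rather than per instance. A minimal fixed-window sketch with redis-py (hypothetical LIMIT/WINDOW values):

import time
import redis  # assumes the redis-py client is installed

r = redis.Redis(host="localhost", port=6379)
LIMIT = 100   # hypothetical max requests per window
WINDOW = 60   # hypothetical window length in seconds

def allow(client_id: str) -> bool:
    window_id = int(time.time()) // WINDOW
    key = f"rl:{client_id}:{window_id}"
    count = r.incr(key)        # atomic across all limiter instances
    if count == 1:
        r.expire(key, WINDOW)  # clean the key up after the window passes
    return count <= LIMIT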
Thanks for the video! Couldn't rate limiting on IP cause problems if multiple users are behind the same IP, like with CGNAT or a VPN?
Wouldn't using VPNs mean the user's IP address isn't consistent anyway? They get rate limited, so they use their VPN to pretend they're somewhere else. I think you would probably need some combination of IP address and user_id.
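For what it's worth, one simple way to express that combination (a hypothetical key scheme, not what the video used) is to key on user_id for authenticated traffic and fall back to IP for anonymous traffic:

def rate_limit_key(user_id: str | None, client_ip: str) -> str:
    # Authenticated users get their own bucket; anonymous users behind the
    # same CGNAT/VPN exit IP still share one, which is the trade-off.
    if user_id:
        return f"user:{user_id}"
    return f"ip:{client_ip}"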
In a distributed environment, I was thinking: what if the load balancer did geolocation-based routing, with a rate limiter in each region backed by its own isolated, region-specific cache? No?
How come this video has only 2k views? Awesome content!!
@tryexponent In the video, for the distributed environment, the request goes first to the load balancer and then to the rate limiter, but shouldn't the request go to the rate limiter first and then to the load balancer?
11:30 I think 429 (Too Many Requests) is the more suitable status code.
What whiteboarding tools do you recommend for system design interviews or brainstorming sessions?
Feedback: there was no discussion between the two of them; they were each just running at their own speed.