One of the defining features of CatBoost is its concerted effort to avoid data leakage at all costs. In this video, we'll see how it eliminates a potential threat in Target Encoding by ordering the data and encoding it sequentially. This ordered approach is central to everything CatBoost does, and we'll see it again in Part 2 when we talk about how it builds trees.
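For anyone who wants to try the idea, here is a minimal Python sketch of encoding rows sequentially so that each row only sees the rows before it. It is an illustration, not CatBoost's actual implementation: the function name, the toy data, the formula (OptionCount + prior) / (n + 1), and the prior of 0.05 are all assumptions made for this example, and the rows are used in the order given.

```python
from collections import defaultdict


def ordered_target_encode(categories, targets, prior=0.05):
    """Encode each row using only the rows that come before it.

    Assumed formula (illustrative): (OptionCount + prior) / (n + 1), where
    n is the number of earlier rows with the same category and OptionCount
    is how many of those earlier rows have target == 1.
    """
    n_seen = defaultdict(int)        # n: earlier rows per category
    option_count = defaultdict(int)  # earlier rows per category with target == 1
    encoded = []
    for category, target in zip(categories, targets):
        # Encode first, using counts from earlier rows only...
        encoded.append((option_count[category] + prior) / (n_seen[category] + 1))
        # ...then update the counts, so a row's own target never leaks
        # into its own encoding.
        n_seen[category] += 1
        option_count[category] += int(target == 1)
    return encoded


# Hypothetical toy data: a categorical feature and a binary target.
favorite_color = ["Blue", "Red", "Green", "Blue", "Green", "Green", "Blue"]
loves_troll_2 = [1, 0, 1, 0, 1, 0, 0]

encoded = ordered_target_encode(favorite_color, loves_troll_2)
for color, value in zip(favorite_color, encoded):
    print(f"{color}: {value:.3f}")
```

The first time a category appears there are no earlier rows to draw on, so its encoding is just the prior. Later rows with the same category get values based on what came before them.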
NOTE: This StatQuest is based on the original CatBoost manuscript... arxiv.org/abs/1706.09516
...and an example provided in the CatBoost documentation...
catboost.ai/en/docs/concepts/algorithm-main-stages…
English
This video has been dubbed using an artificial voice via aloud.area120.google.com/ to increase accessibility. You can change the audio track language in the Settings menu.
Spanish
This video has been dubbed into Spanish using an artificial voice via aloud.area120.google.com/ to increase accessibility. You can change the audio track language in the Settings menu.
Portuguese
This video has been dubbed into Portuguese using an artificial voice via aloud.area120.google.com/ to improve its accessibility. You can change the audio language in the Settings menu.
For a complete index of all the StatQuest videos, check out:
statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Patreon: www.patreon.com/statquest
...or...
YouTube Membership: youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...buying one of my books, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
statquest.org/statquest-store/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
twitter.com/joshuastarmer
0:00 Awesome song and introduction
1:56 A slight problem with k-fold target encoding
3:42 Ordered Target Encoding
Corrections:
4:09 It is also worth noting that if there were more than 2 target values (for example, if Loves Troll 2 could be 0, 1, or 2), then when calculating the OptionCount for a sample with Loves Troll 2 = 1, we would include rows that had Loves Troll 2 = 1 or 2.
#StatQuest #CatBoost #dubbedwithaloud