Soroush Mehraban
TRELLIS: One Latent for Any 3D Asset
14:46
Soroush Mehraban
LightlyTrain - Train Better Models, Faster - No Labels Needed
14:38
Soroush Mehraban
One-step Diffusion with Distribution Matching Distillation
19:30
Soroush Mehraban
Variational Score Distillation (VSD) Helps Create Amazing 3D Scenes From Text Prompts
14:22
Soroush Mehraban
Dream-in-4D: Paper Explained!
10:29
Soroush Mehraban
FreeU - Paper Explained
6:37
Soroush Mehraban
AnimateDiff - Paper explained!
7:23
Soroush Mehraban
DreamFusion: Text-to-3D using 2D Diffusion
14:06
Soroush Mehraban
Null-text Inversion for Editing Real Images using Guided Diffusion Models
11:39
Soroush Mehraban
Prompt-to-Prompt (P2P) Image Editing - Method Explained
10:04
Soroush Mehraban
Denoising Diffusion Null-Space Model (DDNM) - Method Explained
30:57
Soroush Mehraban
Autoregressive Image Generation without Vector Quantization
21:44
Soroush Mehraban
Diffusion Models (DDPM & DDIM) - Easily explained!
18:28
Soroush Mehraban
GLIGEN (CVPR2023): Open-Set Grounded Text-to-Image Generation
10:46
Soroush Mehraban
The Entropy Enigma: Success and Failure of Entropy Minimization
9:09
Soroush Mehraban
Tent: Fully Test-time Adaptation by Entropy Minimization
13:16
Soroush Mehraban
VPD (ICCV2023): Unleashing Text-to-Image Diffusion Models for Visual Perception
9:44
Soroush Mehraban
TokenHMR (CVPR2024): Advancing Human Mesh Recovery with a Tokenized Pose Representation
30:13
Soroush Mehraban
SHViT (CVPR2024): Single-Head Vision Transformer with Memory Efficient Macro Design
22:26
Soroush Mehraban
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation
22:17
Soroush Mehraban
FastV: An Image is Worth 1/2 Tokens After Layer 2
14:10
Soroush Mehraban
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
28:39
Soroush Mehraban
PoseGPT (ChatPose): Chatting about 3D Human Pose
32:22
Soroush Mehraban
MotionAGFormer (WACV2024): Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network
9:13
Soroush Mehraban
HD-GCN (ICCV2023): Skeleton-Based Action Recognition
35:08
Soroush Mehraban
ST-GCN: Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition
8:25
Soroush Mehraban
Graph Convolutional Networks (GCN): From CNN point of view
13:08
Soroush Mehraban
DINO: Self-Supervised Vision Transformers
21:12
Soroush Mehraban
MoCo (+ v2): Unsupervised learning in computer vision
31:03
Soroush Mehraban
ViTPose: 2D Human Pose Estimation
22:30
Soroush Mehraban
TrackFormer: Multi-Object Tracking with Transformers
28:40
Soroush Mehraban
MetaFormer is Actually What You Need for Vision
10:59
Soroush Mehraban
ConvNet beats Vision Transformers (ConvNeXt) Paper explained
21:00
Soroush Mehraban
Swin Transformer V2 - Paper explained
21:32
Soroush Mehraban
Masked Autoencoders (MAE) Paper Explained
15:20
Soroush Mehraban
Relative Position Bias (+ PyTorch Implementation)
23:13
Soroush Mehraban
Swin Transformer - Paper Explained
19:59
Soroush Mehraban
Vision Transformer (ViT) Paper Explained
6:41
Soroush Mehraban
Convolutional Block Attention Module (CBAM) Paper Explained
7:05
Soroush Mehraban
Squeeze-and-Excitation Networks (SENet) paper explained
9:11
Soroush Mehraban
Faster R-CNN: Faster than Fast R-CNN!
12:18
Soroush Mehraban
Receptive Fields: Why 3x3 conv layer is the best?
8:11
Soroush Mehraban
Fast R-CNN: Everything you need to know from the paper
38:37
Soroush Mehraban
R-CNN: Clearly EXPLAINED!
18:32