Soroush Mehraban

TRELLIS: One Latent for Any 3D Asset (14:46)
LightlyTrain - Train Better Models, Faster - No Labels Needed (14:38)
One-step Diffusion with Distribution Matching Distillation (19:30)
Variational Score Distillation (VSD) Helps Create Amazing 3D Scenes From Text Prompts (14:22)
Dream-in-4D: Paper Explained! (10:29)
FreeU - Paper Explained (6:37)
AnimateDiff - Paper explained! (7:23)
DreamFusion: Text-to-3D using 2D Diffusion (14:06)
Null-text Inversion for Editing Real Images using Guided Diffusion Models (11:39)
Prompt-to-Prompt (P2P) Image Editing - Method Explained (10:04)
Denoising Diffusion Null-Space Model (DDNM) - Method Explained (30:57)
Autoregressive Image Generation without Vector Quantization (21:44)
Diffusion Models (DDPM & DDIM) - Easily explained! (18:28)
GLIGEN (CVPR2023): Open-Set Grounded Text-to-Image Generation (10:46)
The Entropy Enigma: Success and Failure of Entropy Minimization (9:09)
Tent: Fully Test-time Adaptation by Entropy Minimization (13:16)
VPD (ICCV2023): Unleashing Text-to-Image Diffusion Models for Visual Perception (9:44)
TokenHMR (CVPR2024): Advancing Human Mesh Recovery with a Tokenized Pose Representation (30:13)
SHViT (CVPR2024): Single-Head Vision Transformer with Memory Efficient Macro Design (22:26)
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation (22:17)
FastV: An Image is Worth 1/2 Tokens After Layer 2 (14:10)
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (28:39)
PoseGPT (ChatPose): Chatting about 3D Human Pose (32:22)
MotionAGFormer (WACV2024): Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network (9:13)
HD-GCN (ICCV2023): Skeleton-Based Action Recognition (35:08)
ST-GCN: Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition (8:25)
Graph Convolutional Networks (GCN): From CNN point of view (13:08)
DINO: Self-Supervised Vision Transformers (21:12)
MoCo (+ v2): Unsupervised learning in computer vision (31:03)
ViTPose: 2D Human Pose Estimation (22:30)
TrackFormer: Multi-Object Tracking with Transformers (28:40)
MetaFormer is Actually What You Need for Vision (10:59)
ConvNet beats Vision Transformers (ConvNeXt) Paper explained (21:00)
Swin Transformer V2 - Paper explained (21:32)
Masked Autoencoders (MAE) Paper Explained (15:20)
Relative Position Bias (+ PyTorch Implementation) (23:13)
Swin Transformer - Paper Explained (19:59)
Vision Transformer (ViT) Paper Explained (6:41)
Convolutional Block Attention Module (CBAM) Paper Explained (7:05)
Squeeze-and-Excitation Networks (SENet) paper explained (9:11)
Faster R-CNN: Faster than Fast R-CNN! (12:18)
Receptive Fields: Why 3x3 conv layer is the best? (8:11)
Fast R-CNN: Everything you need to know from the paper (38:37)
R-CNN: Clearly EXPLAINED! (18:32)