Soroush Mehraban

TRELLIS: One Latent for Any 3D Asset

Soroush Mehraban

14:46

LightlyTrain - Train Better Models, Faster - No Labels Needed

Soroush Mehraban

14:38

One-step Diffusion with Distribution Matching Distillation

Soroush Mehraban

19:30

Variational Score Distillation (VSD) Helps Create Amazing 3D Scenes From Text Prompts

Soroush Mehraban

14:22

Dream-in-4D: Paper Explained!

Soroush Mehraban

10:29

FreeU - Paper Explained

Soroush Mehraban

6:37

AnimateDiff - Paper explained!

Soroush Mehraban

7:23

DreamFusion: Text-to-3D using 2D Diffusion

Soroush Mehraban

14:06

Null-text Inversion for Editing Real Images using Guided Diffusion Models

Soroush Mehraban

11:39

Prompt-to-Prompt (P2P) image Editing - Method Explained

Soroush Mehraban

10:04

Denoising Diffusion Null-Space Model (DDNM) - Method Explained

Soroush Mehraban

30:57

Autoregressive Image Generation without Vector Quantization

Soroush Mehraban

21:44

Diffusion Models (DDPM & DDIM) - Easily explained!

Soroush Mehraban

18:28

GLIGEN (CVPR2023): Open-Set Grounded Text-to-Image Generation

Soroush Mehraban

10:46

The Entropy Enigma: Success and Failure of Entropy Minimization

Soroush Mehraban

9:09

Tent: Fully Test-time Adaptation by Entropy Minimization

Soroush Mehraban

13:16

VPD (ICCV2023): Unleashing Text-to-Image Diffusion Models for Visual Perception

Soroush Mehraban

9:44

TokenHMR (CVPR2024): Advancing Human Mesh Recovery with a Tokenized Pose Representation

Soroush Mehraban

30:13

SHViT (CVPR2024): Single-Head Vision Transformer with Memory Efficient Macro Design

Soroush Mehraban

22:26

InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation

Soroush Mehraban

22:17

FastV: An Image is Worth 1/2 Tokens After Layer 2

Soroush Mehraban

14:10

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Soroush Mehraban

28:39

PoseGPT (ChatPose): Chatting about 3D Human Pose

Soroush Mehraban

32:22

MotionAGFormer (WACV2024): Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network

Soroush Mehraban

9:13

HD-GCN (ICCV2023): Skeleton-Based Action Recognition

Soroush Mehraban

35:08

ST-GCN: Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

Soroush Mehraban

8:25

Graph Convolutional Networks (GCN): From CNN point of view

Soroush Mehraban

13:08

DINO: Self-Supervised Vision Transformers

Soroush Mehraban

21:12

MoCo (+ v2): Unsupervised learning in computer vision

Soroush Mehraban

31:03

ViTPose: 2D Human Pose Estimation

Soroush Mehraban

22:30

TrackFormer: Multi-Object Tracking with Transformers

Soroush Mehraban

28:40

MetaFormer is Actually What You Need for Vision

Soroush Mehraban

10:59

ConvNet beats Vision Transformers (ConvNeXt) Paper explained

Soroush Mehraban

21:00

Swin Transformer V2 - Paper explained

Soroush Mehraban

21:32

Masked Autoencoders (MAE) Paper Explained

Soroush Mehraban

15:20

Relative Position Bias (+ PyTorch Implementation)

Soroush Mehraban

23:13

Swin Transformer - Paper Explained

Soroush Mehraban

19:59

Vision Transformer (ViT) Paper Explained

Soroush Mehraban

6:41

Convolutional Block Attention Module (CBAM) Paper Explained

Soroush Mehraban

7:05

Squeeze-and-Excitation Networks (SENet) paper explained

Soroush Mehraban

9:11

Faster R-CNN: Faster than Fast R-CNN!

Soroush Mehraban

12:18

Receptive Fields: Why 3x3 conv layer is the best?

Soroush Mehraban

8:11

Fast R-CNN: Everything you need to know from the paper

Soroush Mehraban

38:37

R-CNN: Clearly EXPLAINED!

Soroush Mehraban

18:32
