Even if it has reduced hallucinations when dealing with text, I think it might still face major hurdles with them when analyzing images and videos.
@Guyverman01