I think I might find this helpful at this level of capability! I developed my own "computer use" tool shortly after function calling became a thing, just as an experiment. It used screenshots plus OCR and screen-coordinate data to orient the LLM and give it context. But it was pretty limited. Fun watching it move the mouse around, click, search, open apps, open a terminal, type, etc., but it couldn't interact with much beyond basic/core Windows interactions. Plus, it would stop randomly, click the wrong buttons, shut down my computer, etc. This is far better.
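For anyone curious, the screenshot + OCR + screen-coordinate loop that comment describes can be sketched in a few lines. This is a minimal sketch, not the commenter's actual code; it assumes pyautogui and pytesseract (plus the Tesseract binary) are installed, and it stubs out the LLM decision step:

```python
# Minimal sketch of a DIY "computer use" loop: screenshot -> OCR -> pick an
# action -> click. In the real tool, the OCR output would be sent to an LLM,
# which would reply with the action to take.
import pyautogui
import pytesseract

def screen_text_with_coords():
    """OCR the current screen and return visible words with their coordinates."""
    shot = pyautogui.screenshot()
    data = pytesseract.image_to_data(shot, output_type=pytesseract.Output.DICT)
    words = []
    for i, word in enumerate(data["text"]):
        if word.strip():
            # Use the center of the word's bounding box so a click lands on it.
            x = data["left"][i] + data["width"][i] // 2
            y = data["top"][i] + data["height"][i] // 2
            words.append({"word": word, "x": x, "y": y})
    return words

def click_word(target):
    """Click the first on-screen occurrence of `target` found by OCR."""
    for w in screen_text_with_coords():
        if w["word"].lower() == target.lower():
            pyautogui.click(w["x"], w["y"])
            return True
    return False

# Example action; an LLM would choose the target from the OCR context.
click_word("File")
```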
3:21 starts here
This seems like a great tool for accessibility, especially if in the future you could dictate your commands via voice input. One thing that would make it even better is if you could give it commands while having another program/window open, so you keep working in your own program while the agent performs a task for you in the background. That way you aren't just waiting for it to finish before you can do something; you can have it perform a task in parallel. I assume this is difficult because it takes screenshots of your screen to perform tasks, so if you are also using the screen, mouse, and keyboard for something else, it may cause issues. But there should be software-level ways to have two mouse cursors at once and hide your "active" window from what the agent sees. Having two users on the same computer with separate mouse and keyboard input was done many years ago, so this doesn't seem like such a difficult ask; hopefully we see something like this in future versions.
Tiff, your honesty is refreshing, thank you. All those other gurus are talking about how great this works; it does not. As you said, it has a long way to go...
Navigating the setup and rate limitations you mentioned truly highlights the push-pull of AI development. Early adopters like yourself help pave the way for more seamless use in the future. Thanks for sharing the experience!
I tried it tonight following your tutorial. Waste of time: 6 actions cost me €0.29, and the job was simple and easily reproduced with basic scripting, while this was slow and overly expensive. I think they made this out to be big news, but it's actually nothing the current tech can't already do.
GREAT VIDEO! Thank you
Thanks for teaching and sharing!!! Blessings
I'm not surprised you were rate limited. Other sources have told me this flies through tokens quickly. The potential is great, but it really reminds you how much processing power it takes to do things we take for granted.
Your advice is always useful. I had heard about it, but now I've seen what a great tool it is. Cheers, Tiff!
Welcome to Dublin!
That was 1 minute and 42 seconds, not 1 hour 42 minutes, but you are right, it was released into the wild a tad early! Hope you are enjoying Dublin regardless!
It'd be super cool if this could be fine-tuned or given additional functionality/streamlining for working with cumbersome tools like spreadsheets. It is open source, so I guess it's on us to do that!
Thanks Tiff, very cool. I loved the simplicity of the Docker instance the most. Al Sweigart, in his Automate the Boring Stuff with Python, has an interesting Python web automation example, which looks similar to what Anthropic is tapping. Lots of course curation potential in this future application, I think.
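For comparison, here's roughly what that style of Python web automation looks like. A minimal Selenium sketch, assuming Selenium and a Chrome driver are installed; the URL and element choice are placeholders, not from the book or the video:

```python
# Minimal browser automation in the style of Automate the Boring Stuff.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")

# Selenium drives the browser through the DOM, so it "sees" elements directly,
# unlike the computer-use demo, which reads pixels from screenshots and emits
# mouse/keyboard actions.
link = driver.find_element(By.TAG_NAME, "a")
link.click()

driver.quit()
```

That DOM-vs-pixels difference is the trade-off: Selenium only works inside a browser, while the screenshot approach can, in principle, operate any app on screen.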
This seems to be for Linux or Mac. How do you do it on Windows? I have already included my API key in the docker run command, so no need for the set command anymore, but what about the file directory mount?
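The mount works on Windows too; you just swap the Unix-style path on the host side. A sketch based on Anthropic's computer-use quickstart README, run from PowerShell (the image tag, ports, and container path are taken from that repo, so double-check them against the current docs):

```powershell
# Assumes Docker Desktop is installed and ANTHROPIC_API_KEY is set in your environment.
docker run `
    -e ANTHROPIC_API_KEY=$env:ANTHROPIC_API_KEY `
    -v "$env:USERPROFILE\.anthropic:/home/computeruse/.anthropic" `
    -p 5900:5900 -p 8501:8501 -p 6080:6080 -p 8080:8080 `
    -it ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
```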
Hello everyone... I use the AI agent daily and I'm very happy with this tool.
you promised 'links down below' for some things that do not appear 'below'
🔥🔥😮
So are you using the paid version? Have you figured out why it failed so quickly? I saw another video where it worked almost flawlessly.