Chatting with vision - GPT-4V

In the past couple of weeks, OpenAI have been on a roll, releasing and updating features that enhance ChatGPT:

Code Interpreter has been renamed Advanced Data Analysis

Browse with Bing is back, baby

DALL·E 3 integration now allows you to generate images from within ChatGPT

In the ChatGPT mobile app, you can now use your voice to chat

But my favourite is the new GPT-4V(ision), which lets you upload images and ask questions about them.

For software developers, it is reasonably simple to integrate a computer vision AI model, such as Google Cloud Vision, which we used in our I Spy with My AI demo. However, until now there hasn’t been a widely available consumer service for generalised computer vision.

By releasing GPT-4V(ision) within ChatGPT, OpenAI have unlocked a huge number of possibilities.

The research paper The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) explores use cases for consumers and businesses in great depth. Below are some examples from the paper.

Once we got access, we had tremendous fun experimenting with it. One idea was to take photos of post-it notes during workshops and ask ChatGPT to type up the notes. Beyond transcription, ChatGPT does a great job of sorting and categorising the post-its and explaining the themes that emerge.

Transcription of post-it notes using GPT-4V

Currently, GPT-4V(ision) is only available through the ChatGPT mobile app and website, but OpenAI have suggested an API version is coming soon. This will open up a huge number of business uses and make it easy to build generalised computer vision into your own applications. We are already experimenting with GPT-4V(ision) for a client ahead of the API release. If you have any ideas you would like to discuss, please get in touch.
