API Access To GPT-4 With Vision:
Since its launch, OpenAI's ChatGPT has evolved by leaps and bounds, and OpenAI recently announced API access to GPT-4 with Vision.
- GPT-4 with Vision, also referred to as GPT-4V, allows users to instruct GPT-4 to analyse image inputs.
- It is considered OpenAI's step towards making its chatbot multimodal: an AI model that accepts a combination of image, text and audio as inputs.
- It allows users to upload an image as input and ask a question about it.
- This task is known as visual question answering (VQA); a sketch of the corresponding API call appears after this list.
- GPT-4V is a Large Multimodal Model (LMM): a model capable of taking in information across multiple modalities, such as text and images or text and audio, and generating responses based on it.
- It can process visual content such as photographs, screenshots, and documents.
- The latest iteration can perform a slew of tasks, such as identifying objects within images and interpreting and analysing data displayed in graphs, charts, and other visualisations.
- It can also interpret handwritten and printed text contained within images.
- This is a significant leap in AI, as it bridges the gap between visual understanding and textual analysis.
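
As a concrete illustration, the sketch below shows how a visual question answering request might look with OpenAI's Python SDK. This is a minimal sketch, not the definitive integration: the model identifier, image URL, and prompt are placeholders, and you should check OpenAI's current documentation for the vision-capable model names available to your account.

```python
# Minimal visual question answering (VQA) sketch using the OpenAI Python SDK (v1+).
# Assumes the `openai` package is installed and OPENAI_API_KEY is set in the environment.
# The model name and image URL below are placeholders for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # placeholder; substitute a current vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                # The text part carries the question about the image.
                {"type": "text", "text": "What objects are visible in this image?"},
                # The image part points to the picture the model should analyse.
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample-photo.jpg"},
                },
            ],
        }
    ],
    max_tokens=300,
)

# The model's answer to the visual question.
print(response.choices[0].message.content)
```

The same message structure can mix several images and text segments in one request, which is how the document- and chart-reading capabilities described above are typically exercised through the API.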