Recently, I launched a new section on my website to showcase some of my favorite photos taken during my travels, family events, and other occasions.
When posting these photos to Instagram, I found it quite cumbersome to manually extract the EXIF metadata. So, I wanted to automate this process for my own website. I aimed not only to generate good alt text for the images but also to come up with titles and tags for the photos. Let’s let AI do the heavy lifting.
Extracting EXIF Metadata
Why Extract EXIF Data?
EXIF (Exchangeable Image File Format) is a standard that specifies formats for images, audio, and additional tags used by digital cameras (including smartphones), scanners, and other systems handling image files.
I take photos with a Canon R6 Mark II, which stores extensive metadata in the EXIF format. This includes the camera model, the lens used, focal length, aperture, shutter speed, ISO, and so on. Since this data can be quite useful for photography enthusiasts and offer insights into how a picture was taken, I wanted to extract this information and display it on my website.
Python and Pillow to the Rescue
I decided to use Python to extract the EXIF data from my photos, utilizing the Pillow library. Pillow provides many powerful features, including the ability to read and write EXIF data by tag ID.
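A minimal sketch of that extraction could look like the following (tag names are Pillow's `ExifTags` enums; the `shutter_speed` helper is an illustrative reconstruction of the conversion described below, not the script's verbatim code):

```python
from fractions import Fraction

from PIL import ExifTags, Image


def shutter_speed(exposure_time: float) -> str:
    """Convert a decimal exposure time in seconds (e.g. 0.004)
    into the familiar camera notation (e.g. "1/250")."""
    if exposure_time >= 1:
        return f'{exposure_time:g}"'  # whole seconds, e.g. 2"
    frac = Fraction(exposure_time).limit_denominator(8000)
    return f"{frac.numerator}/{frac.denominator}"


def extract_exif(path: str) -> dict:
    """Read the EXIF tags of interest from an image file."""
    exif = Image.open(path).getexif()
    # Exposure data lives in the Exif sub-IFD, not the base IFD.
    ifd = exif.get_ifd(ExifTags.IFD.Exif)
    return {
        "camera": exif.get(ExifTags.Base.Model),
        "lens": ifd.get(ExifTags.Base.LensModel),
        "focal_length": ifd.get(ExifTags.Base.FocalLength),
        "aperture": ifd.get(ExifTags.Base.FNumber),
        "iso": ifd.get(ExifTags.Base.ISOSpeedRatings),
        "shutter_speed": shutter_speed(float(ifd.get(ExifTags.Base.ExposureTime))),
    }
```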
Note the conversion of `exposure_time` to a more camera-like and human-readable `shutter_speed` format (e.g., "1/250").
Generating Alt Text, Title Suggestions, and Tags
Just posting an image without any context is not very helpful, and not only for visually impaired people. So, I wanted to generate good alt text for the images as well as title inspirations and tags for the photos.
This is where GPT-4 Vision comes into play. It is a relatively new AI model from OpenAI that extends GPT-4 with vision capabilities and therefore can “understand” images.
The Prompt
My goal was to receive a response from the OpenAI API in JSON format. Unfortunately, at the time of writing, GPT-4V did not support function calling to easily get back structured response data. And as mentioned before, I wanted to get the following information:
- Alt Text: A textual description of the image, suitable for visually impaired people.
- Title Suggestions: A list of five title suggestions for the image.
- Tags: A list of five tags that best describe the image.
So, with a bit of trial and error, this was my final prompt:
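In essence, the prompt pairs a plain-language instruction with a JSON example of the expected output. The sketch below is illustrative rather than verbatim, and the request payload follows the shape of the OpenAI vision chat API at the time:

```python
import base64

# Illustrative prompt in the spirit of the article, not the exact wording.
# The embedded JSON example nudges the model toward the desired structure,
# and the "plain JSON only" line guards against fenced code-block replies.
PROMPT = """Describe this photo for my website.
Respond with plain JSON only, no Markdown, in exactly this shape:
{
  "altText": "A concise description suitable for visually impaired visitors.",
  "titleSuggestions": ["Five", "short", "title", "ideas", "here"],
  "tags": ["five", "descriptive", "tags", "go", "here"]
}"""


def build_messages(image_path: str) -> list:
    """Build a GPT-4 Vision chat payload with the image inlined as base64."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": PROMPT},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }]
```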
As you can see, providing a JSON example of the expected output helped me get the desired response from the API with the fields of interest. I also had to explicitly request that it not return Markdown, as it sometimes returned Markdown code blocks and other times plain JSON.
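Since the occasional Markdown wrapper still slips through, a small defensive parser helps; this is a sketch with a hypothetical function name, not code from the published script:

```python
import json
import re


def parse_model_json(raw: str) -> dict:
    """Parse the model's reply, tolerating an optional ```json fence.

    GPT-4V sometimes wraps its JSON answer in a Markdown code block even
    when asked not to, so strip leading/trailing fences before parsing.
    """
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    return json.loads(cleaned)
```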
Astro Content Collections
My website is powered by Astro, and the new Grid section is created from an Astro Content Collection in data-only mode with plain JSON files.
The Python script directly creates the JSON files. I just check the AI-generated texts, choose a title from the suggestions or write my own, and then update the final JSON schema.
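The per-photo files might be produced along these lines (the field names and helper below are a hypothetical sketch; the actual schema on the site may differ):

```python
import json
from pathlib import Path


def write_photo_entry(out_dir: str, slug: str, exif: dict, ai: dict) -> Path:
    """Write one data-only collection entry as <slug>.json."""
    entry = {
        # First suggestion as a placeholder; replaced by a hand-picked
        # title during the manual review pass.
        "title": ai["titleSuggestions"][0],
        "altText": ai["altText"],
        "tags": ai["tags"],
        "exif": exif,
    }
    path = Path(out_dir) / f"{slug}.json"
    path.write_text(json.dumps(entry, indent=2))
    return path
```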
The Script on GitHub
I have published the Python script on GitHub. All you need is a Python environment and an OpenAI API key that is enabled for GPT-4 Vision.
Conclusion
I am quite happy with the results. The Python script extracts the EXIF data, and the AI generates the alt text, title suggestions, and tags. This way, I can automate the process of adding new photos to my website, making them more accessible and informative. The whole process actually feels like fun and not like a chore.