On generating alttext for images with ollama

I run a few bots on mastodon.ozioso.online. They are mainly focused on photography, either historical (Photochromprints) or of social relevance (LewisHine, DorotheaLange); have a browse to see all of them.

The images come from public collections, sometimes pulled via an API and sometimes by old-fashioned web page scraping.

The main critique and/or question was about alttext for the images: adding an alttext4me tag to a post was not always appreciated, and this got some of the bots banned or reported.

After testing out various options I came across Ollama, which lets you run LLMs on your own hardware. The first results on a reasonable machine were decent, although it could take a minute or more to generate the text. Geek that I am, I purchased a machine with a GPU to generate the alttexts.

There were still comments about the quality of the alttexts: sometimes the results were kind of ridiculous, sometimes highly inaccurate and filled with hallucinations. The majority of the AI descriptions were appropriate and correct, although recognizable as AI generated.

Ollama

I am running Ollama on a machine with 32GB of memory and an RX 6650M GPU (8GB of VRAM). The Ollama setup needed some tweaking to use the GPU.
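
To check whether a model actually ends up on the GPU, you can ask the Ollama daemon what it has loaded and how much of it sits in VRAM. A minimal sketch in Python, assuming a reasonably recent Ollama that exposes the /api/ps endpoint on the default port 11434, and that a model has just been used so it is still loaded:

```python
import requests

# Ask the local Ollama daemon which models are currently loaded.
# /api/ps is only available in reasonably recent Ollama versions.
resp = requests.get("http://localhost:11434/api/ps", timeout=10)
resp.raise_for_status()

for model in resp.json().get("models", []):
    size = model.get("size", 0)       # total model size in bytes
    vram = model.get("size_vram", 0)  # bytes currently offloaded to the GPU
    share = vram / size * 100 if size else 0
    print(f"{model['name']}: {vram / 1e9:.1f} GB in VRAM ({share:.0f}% on GPU)")
```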

The main factors in getting a decent alttext are the question to ask and the model to use. The main model that can create descriptions is llava; there are two versions, llava:13b (8GB) and the standard llava. Starting from "the bigger the better" I tested llava:13b first. The results were too interpretive rather than a factual description of the image, so I mainly played around with the question to ask.
After many tests I switched to the standard llava model; the descriptions became a little more factual and the results came back quicker.
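
For reference, a minimal sketch of what such a request can look like from Python with the requests library: the image goes in base64-encoded via Ollama's /api/generate endpoint, and the prompt asks for a plain, factual description. The prompt wording below is only an example, not the exact question I ended up with.

```python
import base64
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

# The prompt matters as much as the model: ask for a factual description,
# not an interpretation. This wording is just an example.
PROMPT = (
    "Describe this photograph factually in two or three sentences, "
    "as alt text for a visually impaired reader. Do not interpret or "
    "speculate, only describe what is visible."
)

def generate_alttext(image_path: str, model: str = "llava") -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    payload = {
        "model": model,
        "prompt": PROMPT,
        "images": [image_b64],  # llava takes base64-encoded images
        "stream": False,        # return one JSON object instead of a stream
    }
    # Generation can take a minute or more on modest hardware.
    resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["response"].strip()

if __name__ == "__main__":
    print(generate_alttext("photo.jpg"))
```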

So that's going to be used......
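
To close the loop, the generated text still has to end up as the image description on the post itself. A short sketch of how that could look with the Mastodon.py library (just one way to do it; the access token and caption are placeholders), reusing the generate_alttext helper sketched above:

```python
from mastodon import Mastodon

# Placeholders: use your own instance URL and a token for the bot account.
masto = Mastodon(
    access_token="YOUR-BOT-ACCESS-TOKEN",
    api_base_url="https://mastodon.ozioso.online",
)

image_path = "photo.jpg"
alt_text = generate_alttext(image_path)  # helper from the earlier sketch

# Upload the image with the generated description attached as alt text,
# then publish a status that references the uploaded media.
media = masto.media_post(image_path, description=alt_text)
masto.status_post("Photo of the day", media_ids=[media])
```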