Image SEO used to mean writing alt text and compressing files. Both still matter, but the ground has shifted. AI models now look at images directly, not just the text around them, and that changes what good image practice looks like for a store. Commerce is visual, so this is worth getting right.
Alt text: write it for the person who cannot see the image
Start with the original purpose, because it is still the most important. Alt text describes an image for screen-reader users and for any system that cannot see the pixels. Write it as you would describe the image to someone over the phone who is deciding whether to buy.
"Black merino running crew, front view on a model" is good. "running shirt buy now best merino 2026" is keyword stuffing that helps no one and reads terribly aloud. Describe what is actually in the frame: the product, the color, the angle, anything a shopper would want to know.
A few practical rules. Every meaningful product image needs alt text. Purely decorative images, like a background flourish, should have empty alt text so screen readers skip them rather than announce noise. Keep it concise, a sentence at most. And vary it per image: the front view, the back view, and the detail shot should have different alt text because they show different things.
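Those rules are mechanical enough to check in bulk. Here is a minimal sketch of an audit, not a definitive linter. The dict keys ("id", "alt", "decorative"), the 125-character budget, and the list of stuffing terms are all assumptions made for illustration, so tune them to your own catalog.

```python
import re
from collections import Counter

MAX_ALT_LENGTH = 125  # assumed budget for "a sentence at most", not a standard
STUFFING_TERMS = {"buy", "best", "cheap", "sale", "now"}  # illustrative only


def audit_alt_text(images):
    """Return (image_id, problem) tuples for one product's images.

    Each image is a dict with hypothetical keys:
    "id", "alt" (str or None) and "decorative" (bool).
    """
    problems = []
    # Count how often each alt text appears among the meaningful images.
    alt_counts = Counter(
        (img.get("alt") or "").strip().lower()
        for img in images
        if not img.get("decorative")
    )
    for img in images:
        alt = (img.get("alt") or "").strip()
        if img.get("decorative"):
            # Decorative images should be skipped by screen readers.
            if alt:
                problems.append((img["id"], "decorative image should have empty alt"))
            continue
        if not alt:
            problems.append((img["id"], "missing alt text"))
            continue
        if len(alt) > MAX_ALT_LENGTH:
            problems.append((img["id"], "alt text too long"))
        words = set(re.findall(r"[a-z0-9]+", alt.lower()))
        if len(words & STUFFING_TERMS) >= 2:
            problems.append((img["id"], "possible keyword stuffing"))
        if alt_counts[alt.lower()] > 1:
            problems.append((img["id"], "alt text duplicated across images"))
    return problems
```

Run against the two examples above, the descriptive alt text passes and the stuffed one is flagged. The stuffing check is deliberately crude: it is a prompt for a human to look, not a verdict.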
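On Shopify, alt text lives in the product image settings. Most themes render it into the image's alt attribute automatically, so the work is in writing it, not in wiring it. If you run a heavily customized theme, it is worth checking one product page to confirm.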
File names still count, a little
The image file name is a minor signal, but minor is not zero, and it is free. merino-running-crew-black-front.jpg is better than IMG_4821.jpg. You set this before upload; Shopify keeps the name. It will not move rankings on its own, but combined with good alt text it reinforces what the image is.
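If you rename files in bulk before upload, a small helper keeps the names consistent. This is a sketch under assumptions: the function name and the choice of parts (product, color, angle) are mine, not a Shopify convention.

```python
import re
import unicodedata


def descriptive_filename(parts, extension="jpg"):
    """Join descriptive parts into a lowercase, hyphenated file name."""
    text = " ".join(parts)
    # Fold accented characters to plain ASCII so the name stays URL-safe.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    if not slug:
        raise ValueError("file name needs at least one descriptive part")
    return f"{slug}.{extension.lstrip('.').lower()}"
```

Calling descriptive_filename(["Merino Running Crew", "Black", "front"]) gives merino-running-crew-black-front.jpg, the same shape as the example above.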
Performance is image SEO too
The fastest way to fail Core Web Vitals on a store is heavy images, and slow pages tend to rank worse and convert worse. Serve modern formats such as WebP or AVIF, let Shopify's CDN deliver appropriately sized versions rather than a 4000px original for a thumbnail, and never lazy-load the one hero image that should appear first. I have written more about this in the Core Web Vitals post, but it belongs in any image-SEO conversation because the line between the two has basically dissolved.
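To make the sizing and lazy-loading advice concrete, here is a rough sketch of the attributes an image tag ends up with. It assumes the CDN resizes an image when given a width query parameter, and it uses a placeholder domain; check how your own theme and CDN build image URLs before relying on it. The function names and the width list are illustrative.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_width(url, width):
    """Return url with its width query parameter set (assumed CDN behavior)."""
    parts = urlsplit(url)
    # Drop any existing width value, keep everything else (e.g. a version tag).
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "width"]
    query.append(("width", str(width)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def build_srcset(url, widths=(400, 800, 1200)):
    """Build a srcset string so the browser can pick the smallest adequate size."""
    return ", ".join(f"{with_width(url, w)} {w}w" for w in sorted(widths))


def img_attributes(url, alt, is_hero=False):
    """Attributes for an img tag; the hero image is never lazy-loaded."""
    return {
        "src": with_width(url, 800),
        "srcset": build_srcset(url),
        "alt": alt,
        "loading": "eager" if is_hero else "lazy",
    }
```

The only decision that matters here is the last line: everything below the fold can wait, but the first image a shopper sees should load eagerly.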
The new part: AI vision reads your images
This is what has actually changed. Modern AI models process images directly. When an assistant evaluates your product, it can look at the photo and reason about it, not only read the alt text. That has two implications.
First, your actual photography matters for machine comprehension, not just human appeal. Clear, well-lit images on clean backgrounds are easier for a model to interpret correctly, the same way they are easier for a shopper. A cluttered or ambiguous image is ambiguous to the model too.
Second, alt text and visible captions still matter as the labeled, authoritative description. The model can guess what it sees, but your alt text and structured data state it explicitly. When the image, the alt text, and the structured data all agree that this is a black merino crew, the model has little room to get it wrong. When they conflict or the labels are missing, it falls back to guessing, and guessing is where wrong recommendations come from. The job is to make the image legible and then confirm what it shows in text.
Including image URLs in your Product structured data is part of this. The image field in your Product JSON-LD points search engines and assistants at the canonical photos for the product, which is exactly what they want when assembling a result.
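For reference, this is roughly what that looks like. The sketch below builds a minimal schema.org Product object with an image list; the product name, description, and URLs are placeholders, and many themes already emit their own Product JSON-LD, so look at what yours outputs before adding a second copy.

```python
import json


def product_json_ld(name, description, image_urls):
    """Serialize a minimal schema.org Product with its canonical image URLs."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        # schema.org allows one URL or a list; a list covers multiple angles.
        "image": list(image_urls),
    }
    return json.dumps(data, indent=2)


print(product_json_ld(
    "Merino Running Crew",
    "Black merino running crew.",
    [
        "https://example.com/merino-running-crew-black-front.jpg",
        "https://example.com/merino-running-crew-black-back.jpg",
    ],
))
```

The output goes inside a script tag of type application/ld+json on the product page. A real Product block normally also carries offers and other fields, which are left out here to keep the image field in focus.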
Doing it across a catalog
Writing alt text for one product is a five-minute job. Writing it for five thousand products is the reason most catalogs have either no alt text or autogenerated junk like the file name repeated. Missing alt text on a meaningful image is an accessibility failure and a missed signal; junk alt text is arguably worse, because a screen reader reads it aloud.
This is a strong use case for AI vision in your own workflow. A model can look at each product image and draft accurate, descriptive alt text far faster than a person can, describing what is genuinely in the frame. The rule is the same as everywhere else: it drafts, a human reviews, and nothing publishes silently over a value you set on purpose. A vision model writing "black merino crew, front view" for ten thousand images, with a human spot-checking, is the difference between an accessible, legible catalog and an impossible manual task.
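The review-first rule is easy to state and easy to break, so it helps to see it as code. This is a sketch of the gating logic only, under stated assumptions: the drafts dict stands in for whatever vision model you use, and the AltSuggestion class and both function names are hypothetical, not any product's actual API.

```python
from dataclasses import dataclass


@dataclass
class AltSuggestion:
    image_id: str
    current: str  # alt text already on the image ("" if none)
    draft: str    # model-generated draft awaiting review
    status: str   # "pending", "approved", or "rejected"


def queue_drafts(current_alts, drafts):
    """Pair each draft with the existing value; every suggestion starts pending."""
    return [
        AltSuggestion(image_id, current_alts.get(image_id, ""), draft, "pending")
        for image_id, draft in drafts.items()
    ]


def apply_approved(current_alts, suggestions):
    """Return a new mapping with only human-approved drafts applied."""
    updated = dict(current_alts)  # never mutate the live values in place
    for suggestion in suggestions:
        if suggestion.status == "approved":
            updated[suggestion.image_id] = suggestion.draft
    return updated
```

Nothing changes until a reviewer flips a suggestion to approved, and the reviewer sees the current value next to the draft, so a hand-written alt text is never overwritten without someone choosing to do so.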
We build this review-first workflow into the stores we run, and generating accurate alt text from the actual product images is one of the things AgentReady does for merchants directly, with every suggestion shown before it is applied. However you handle it, the goal holds: describe the image honestly for the person who cannot see it, keep the files fast, shoot clear photos the machines can read, and make sure the labels agree with the picture.

