AI@home: Classifying images with Ollama – part six: More about locations – and even more prompt improvements


In part five, I started injecting information about where a picture was taken into my classifier. I did this by taking the GPS coordinates embedded in an image and translating them into something more meaningful.

I had two ways to translate this:

  1. I had a register of places, where I’d manually create a mapping from a picture’s GPS tag to a place.
  2. I used an OSM-based reverse geocoding service.

For the manual register, I also introduced kinds, mainly to decide what radius a place has. A village is larger than a restaurant, so a village might get a 2 km radius, while a restaurant gets maybe 50 or 100 meters.
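A kind-based radius lookup could work roughly like this minimal sketch. It assumes each registered place has a single centre coordinate and a kind. `RegisteredPlace`, `KIND_RADIUS_M`, `DEFAULT_RADIUS_M` and `match_registry` are hypothetical names, and the radii simply reuse the example values above. The distance is the standard haversine great-circle formula.

```python
import math
from dataclasses import dataclass

# Hypothetical radii per kind, in metres (example values from the text).
KIND_RADIUS_M = {"village": 2000.0, "restaurant": 100.0, "cafe": 50.0}
DEFAULT_RADIUS_M = 200.0  # assumption: fallback for kinds without an explicit radius


@dataclass
class RegisteredPlace:
    place_id: str
    label: str
    kind: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def match_registry(lat: float, lon: float, places: list) -> "RegisteredPlace | None":
    """Return the closest registered place whose kind radius covers the point, or None."""
    best, best_dist = None, float("inf")
    for p in places:
        radius = KIND_RADIUS_M.get(p.kind, DEFAULT_RADIUS_M)
        d = haversine_m(lat, lon, p.lat, p.lon)
        if d <= radius and d < best_dist:
            best, best_dist = p, d
    return best
```

Choosing the closest match means a restaurant inside a village wins over the village when the photo is taken at the restaurant. Preferring the smallest radius would be another reasonable tie-break.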

For the OSM-based service, I basically used the hierarchy and cut off at neighborhood level, so it’d always try to pinpoint «what neighborhood» the place was in.
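The neighborhood cut-off might look something like this minimal sketch. It assumes a Nominatim-style `address` object from the reverse-geocoding response, and that object uses keys such as `neighbourhood`, `suburb`, `city` and `country`. The exact level ordering and the names `LEVELS` and `hierarchy_label` are assumptions for illustration. Street-level keys such as `road` and `house_number` are ignored.

```python
# Hypothetical cut-off hierarchy, most specific level first.
LEVELS = (
    ("neighbourhood", "quarter", "suburb"),                  # neighbourhood level: the cut-off
    ("hamlet", "village", "town", "city", "municipality"),   # locality
    ("country",),
)


def hierarchy_label(address: dict) -> str:
    """Build e.g. 'Neighbourhood, City, Country', ignoring anything finer than neighbourhood."""
    parts = []
    for keys in LEVELS:
        for key in keys:
            value = address.get(key)
            if value:
                if value not in parts:
                    parts.append(value)
                break  # take only the first key found on each level
    return ", ".join(parts)
```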

Issues and solutions

For the manual register, I’m pretty happy so far. But for the OSM-based reverse geocoding service, it didn’t always fit. If a picture is from a mountain top, I’d definitely want the mountain top as the place, and likewise for cafes/restaurants, viewpoints, museums, and a lot of other things.

OSM data is registered by a class and a type. One class is, for example, tourism, for tourism-type features; another is natural, for things like mountains and lakes. For a mountain such as Mount Fuji, the OSM data might be natural=peak, name=«Mount Fuji». Or it might be natural=volcano (in the case of Mount Fuji, there are actually both).

So I needed a way to select certain kinds of points that I specifically wanted as the leaf, instead of the administrative boundaries. I don’t want all point types, only the ones I’m interested in.

Examples:

  • amenity=cafe should get the kind cafe
  • amenity=restaurant should get the kind restaurant
  • natural=peak I gave the kind mountain_top
  • natural=volcano I also map to mountain_top for now
  • natural=cliff I map to viewpoint
  • tourism=viewpoint I also map to viewpoint.
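The selection above comes down to a lookup keyed on (class, type). Here is a minimal sketch. It assumes a Nominatim-style result dict with `class` (called `category` in the `jsonv2` output format), `type` and `name`. `OSM_KIND_MAP` and `pick_leaf` are hypothetical names. Anything not in the map falls back to the administrative label.

```python
# (osm_class, osm_type) -> kind, taken from the examples above.
OSM_KIND_MAP = {
    ("amenity", "cafe"): "cafe",
    ("amenity", "restaurant"): "restaurant",
    ("natural", "peak"): "mountain_top",
    ("natural", "volcano"): "mountain_top",
    ("natural", "cliff"): "viewpoint",
    ("tourism", "viewpoint"): "viewpoint",
}


def pick_leaf(feature: dict, admin_label: str) -> tuple:
    """Return (label, kind): the named feature if its class/type is selected, else the admin label."""
    osm_class = feature.get("class") or feature.get("category")
    kind = OSM_KIND_MAP.get((osm_class, feature.get("type")))
    name = feature.get("name")
    if kind and name:
        return name, kind
    return admin_label, "place"
```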

I also had to classify the mapping into a few different semantic roles.

  • mountain_top generally has a summit and a view
  • viewpoint just has a view by default.
  • structure doesn’t have a summit, but it might have a view (to be decided per type of structure).

This last semantic role part is likely not finished, but I am testing with that kind of mapping for now. Everything that doesn’t get a semantic role is just a place, and it probably doesn’t have a view etc.

I’ll describe the reasoning behind this in more detail in the prompt improvement part of this blog post.

Some mappings

Part of my mapping table is this:

```json
[
  {
    "osm_class": "natural",
    "osm_type": "volcano",
    "place_kind": {
      "place_kind": "mountain_top",
      "semantic_role": "summit",
      "has_view": true
    }
  },
  {
    "osm_class": "natural",
    "osm_type": "cliff",
    "place_kind": {
      "place_kind": "viewpoint",
      "semantic_role": "viewpoint",
      "has_view": true
    }
  },
  {
    "osm_class": "tourism",
    "osm_type": "viewpoint",
    "place_kind": {
      "place_kind": "viewpoint",
      "semantic_role": "viewpoint",
      "has_view": true
    }
  },
  {
    "osm_class": "tourism",
    "osm_type": "alpine_hut",
    "place_kind": {
      "place_kind": "hut",
      "semantic_role": "structure",
      "has_view": true
    }
  },
  {
    "osm_class": "amenity",
    "osm_type": "shelter",
    "place_kind": {
      "place_kind": "shelter",
      "semantic_role": "structure",
      "has_view": true
    }
  },
  {
    "osm_class": "amenity",
    "osm_type": "cafe",
    "place_kind": {
      "place_kind": "cafe",
      "semantic_role": "structure",
      "has_view": true
    }
  },
  {
    "osm_class": "highway",
    "osm_type": "path",
    "place_kind": {
      "place_kind": "trail",
      "semantic_role": "viewpoint",
      "has_view": true
    }
  }
]
```
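If the full table is a JSON array of entries like the ones above, it could be loaded into a lookup keyed on `(osm_class, osm_type)` roughly as follows. `PlaceKind`, `DEFAULT_KIND`, `load_kind_table` and `lookup_kind` are hypothetical names. The default encodes the rule that anything without a semantic role is just a place without a view.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PlaceKind:
    place_kind: str
    semantic_role: Optional[str] = None
    has_view: bool = False


# Fallback for unmapped OSM features: a plain place with no semantic role and no view.
DEFAULT_KIND = PlaceKind("place")


def load_kind_table(raw_json: str) -> Dict[Tuple[str, str], PlaceKind]:
    """Index the mapping entries by (osm_class, osm_type)."""
    table = {}
    for entry in json.loads(raw_json):
        pk = entry["place_kind"]
        table[(entry["osm_class"], entry["osm_type"])] = PlaceKind(
            place_kind=pk["place_kind"],
            semantic_role=pk.get("semantic_role"),
            has_view=bool(pk.get("has_view", False)),
        )
    return table


def lookup_kind(table: Dict[Tuple[str, str], PlaceKind], osm_class: str, osm_type: str) -> PlaceKind:
    return table.get((osm_class, osm_type), DEFAULT_KIND)
```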

The reasoning behind «has_view» is partly experience, partly guesswork. A cafe, for example, is somewhere you might sit and look at something else, like city life, so it sort of has a view. In fact, I tend to conclude that most things might have a view, so the jury is still out on whether it’s a useful distinction at all. But it stays for now!

Prompt improvements

Before all of this, I tended to get a lot of strangely worded descriptions. For example, the model might say «A mountain hut at Galdhøpiggen Turhytte» (Galdhøpiggen Turhytte is itself a mountain hut), or «A view from the summit of Galdhøpiggen Turhytte». This was the reason for the semantic roles, which are still at the testing stage.

So, for reverse geocoding with OSM, I ended up with small differences in the prompt depending on the properties of the mapped kind.

General prompt for places is currently:

Places and known locations
- If the context tells you the exact name of a place that clearly matches the visible scene
  (for example a specific trail, building, or landmark), you MUST:
  - include that place name as a label (primary or secondary), and
  - mention it once in the short_description.
- Do NOT invent specific place names. If you are unsure, use generic words like
  "mountain", "city street", "beach", "living room".
- You will get place names that may be in Norwegian or other local languages. Treat them as literal place names (do not translate or anglicize them), but infer what kind of place they are and describe them in English. If the name suffix indicates something like waterfall, mountain, lake, or farm, use that information. For example, do not write things like "Anita is standing at Kjosfossen viewpoint, overlooking a scenic waterfall", rephrase it to "Anita is standing at a viewpoint, overlooking the scenic Kjosfossen waterfall".

The last rule fixes a few cases where the model would say the photo was taken at a place whose name already contains the kind of place, and then describe it as showing that kind of place again. For example, «foss» means «waterfall» in Norwegian. Interestingly, for the specific case in the prompt, it chose «Anita is standing at Kjosfossen viewpoint, overlooking the scenic waterfall», which I’m pretty happy with.

Varying location prompt based on type of place in my registry

My current code for my registered places hasn’t changed all that much, and is:

```python
if source == "registry":
    # e.g. "Byfjell Trail, Bergen, Norway"
    if label:
        parts.append(
            f"This photo was taken at a specific place we know about: {label}."
        )

    # SPECIAL CASE: mountain top viewpoints
    if kind == "mountain_top":
        parts.append(
            "This place is a mountain top (a summit). "
            "Sometimes the photo shows the summit itself (for example people standing at the cairn, a sign, or a structure on the top), "
            "and sometimes it shows a distant panorama seen FROM the top."
        )
        parts.append(
            "You MUST look carefully at the image and decide whether the main subject is:"
        )
        parts.append(
            "- the mountain top itself (people, cairn, summit marker, building or objects at the top), or"
        )
        parts.append(
            "- a distant landscape or city that is being viewed FROM this mountain top."
        )
        parts.append(
            "If the main subject is the summit itself, describe it as being AT the mountain top and treat the place as the subject."
        )
        parts.append(
            "If the image shows a distant view and any town, fjord, mountain, or other landmark in that view can be clearly and confidently identified from the image and the provided context, you MUST also mention that identifiable place or landmark by name in the primary_label or secondary_labels and in the short_description (for example, 'View from Ulriken towards Bergen city centre')."
        )
        parts.append(
            "In both cases you MUST include the mountain's name in the primary_label or secondary_labels "
            "and mention it once in the short_description, but you MUST NOT write phrases like "
            "'a panoramic view of <mountain>' unless the mountain itself is clearly visible as the subject."
        )
    elif has_view:
        parts.append(
            "This place has a view. "
            "Sometimes the photo shows the place itself (for example people standing at the place, a sign, or a structure on the place), "
            "and sometimes it shows a distant panorama seen FROM the place."
        )
        parts.append(
            "You MUST look carefully at the image and decide whether the main subject is:"
        )
        parts.append(
            "- the place itself (people, structure, building or objects at the place), or"
        )
        parts.append(
            "- a distant landscape or city that is being viewed FROM this place."
        )
        parts.append(
            "If the main subject is the place itself, describe it as being AT the place and treat the place as the subject."
        )
        parts.append(
            "If the image shows a distant view and any town, fjord, mountain, or other landmark in that view can be clearly and confidently identified from the image and the provided context, you MUST also mention that identifiable place or landmark by name in the primary_label or secondary_labels and in the short_description (for example, 'View from Ulriken towards Bergen city centre')."
        )
        parts.append(
            "In both cases you MUST include the place's name in the primary_label or secondary_labels "
            "and mention it once in the short_description, but you MUST NOT write phrases like "
            "'a panoramic view of <place>' unless the place itself is clearly visible as the subject."
        )
    else:
        # Default registry-place instruction
        parts.append(
            "You MUST include this place name in the primary_label or secondary_labels "
            "and also mention it explicitly in the short_description."
        )
```

As you can see, I only distinguish by view for now, but more testing might reveal that I need the semantic role here too.

Varying prompts for the OSM-based reverse geocoding

For registered places, I can choose wordings that help the model understand the place a bit better. For example, I chose «The cabin at Hellesøy», which is a clear English description of the cabin that is easy to incorporate into a text.

For the reverse-geocoding, I have a bit less control over what I get back, which is why I introduced the semantic role.

I only have slight variations per semantic role (and not all of them are well tested):

```python
def _semantic_role_phrase(semantic_role: str | None) -> str:
    if semantic_role == "structure":
        return (
            "This place is a man-made structure (such as a hut, café, shelter, "
            "or building). You MUST treat the place name itself as the structure. "
            "You MUST NOT describe the place itself as a natural feature such as "
            "a mountain, summit, peak, ridge, valley, fjord, lake, river, or beach. "
            "In particular, you MUST NOT write phrases like "
            "'at the summit of [place name]', 'from the summit of [place name]', "
            "'on the peak of [place name]', or similar. "
            "If the surrounding mountains or landscape are important, describe them "
            "as a view FROM the structure (for example 'a view from [place name]') "
            "while keeping the structure as the named place."
        )
    if semantic_role == "summit":
        return (
            "This place is a mountain summit (a high point on a mountain). "
            "The place name refers to the summit itself, not to any building on it."
        )
    if semantic_role == "trail":
        return "This place is a hiking route or trail."
    if semantic_role == "viewpoint":
        return "This place is a viewpoint where people can look out over the landscape."
    return "This is a specific named place in the landscape."
```

As you can see, the default is a generic phrase, while the others are more specific and can be tuned as I find special cases. I can of course easily introduce more semantic roles.
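As the number of roles grows, the if-chain could be replaced by a table, so that adding a role means adding one entry. This is a sketch of one possible alternative, not the code above. `ROLE_PHRASES` and `semantic_role_phrase` are hypothetical names, and the long `structure` text is shortened here.

```python
from typing import Optional

DEFAULT_ROLE_PHRASE = "This is a specific named place in the landscape."

# Role -> prompt phrase. The "structure" text is shortened; the full version is in _semantic_role_phrase.
ROLE_PHRASES = {
    "structure": (
        "This place is a man-made structure (such as a hut, café, shelter, "
        "or building). You MUST treat the place name itself as the structure."
    ),
    "summit": (
        "This place is a mountain summit (a high point on a mountain). "
        "The place name refers to the summit itself, not to any building on it."
    ),
    "trail": "This place is a hiking route or trail.",
    "viewpoint": "This place is a viewpoint where people can look out over the landscape.",
}


def semantic_role_phrase(semantic_role: Optional[str]) -> str:
    """Look up the phrase for a role, falling back to the generic default (also for None)."""
    return ROLE_PHRASES.get(semantic_role, DEFAULT_ROLE_PHRASE)
```

With this layout, a new (hypothetical) role such as `ROLE_PHRASES["lake"] = "This place is a lake."` needs no change to the lookup function.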

I have a pretty specific prompt for structures, which prevented at least the «At the summit of Galdhøpiggen Turhytte» case, but more testing is likely needed. It’s also pretty evident that, for now, this will end up somewhat tailored to my use cases. If I ever get around to releasing this, I’ll need both good documentation about what to change and tune, and probably a more parameterizable prompt.

Improvements to the reverse geocoding

I have based my place logic around there being GPS tags in the image. But that’s not always the case. You still might know what the place is, though, and want to inject that into, say, a batch of pictures taken at that specific place.

My strategy so far is a pre-processing step that adds GPS tags to the image. Again, there are two possible sources: the manual registry and OSM.

Adding GPS tags from the manual registry

For adding a coordinate via a registered place ID, the API endpoint is pretty simple:

```python
from fastapi import HTTPException, Query

# `app` (the FastAPI instance) and `load_places()` are defined elsewhere in the server.
@app.get("/places/by-id")
async def get_place_by_id(placeid: str = Query(...)):
    nodes = [n for n in load_places() if n.place_id == placeid]
    if not nodes:
        raise HTTPException(status_code=404, detail="Place not found")
    node = nodes[0]
    return {
        "place_id": node.place_id,
        "lat": node.lat,
        "lon": node.lon,
    }
```

I read the registered places, find any matches for the place ID, and return the first registered GPS coordinate. There can be several: for a long trail, for example, I usually create several points along the trail, each with a smaller radius, rather than giving the trail a 20 km radius.

The CLI then takes that coordinate and writes it to the picture itself. I’ve added a few safety measures: existing GPS tags are not overwritten unless you pass --force, and --dry-run previews what would happen.

```python
from pathlib import Path

import click

# cli, client, API_BASE, Image, _ensure_xmp_skeleton and _deg_to_dms_rationals
# are defined elsewhere in the CLI module.
@cli.command("geotag-gps-from-place")
@click.argument(
    "images",
    type=click.Path(exists=True, dir_okay=False, path_type=Path),
    nargs=-1,
    required=True,
)
@click.option("--place-id", "-i", required=True, help="Logical place id, e.g. byfjelltrail.")
@click.option(
    "--force/--no-force",
    default=False,
    show_default=True,
    help="Override existing GPS tags.",
)
@click.option(
    "--dry-run/--no-dry-run",
    default=False,
    show_default=True,
    help="Show what would be changed without writing metadata.",
)
def geotag_gps_from_place_cmd(images: list[Path], place_id: str, force: bool, dry_run: bool) -> None:
    """
    Set GPS for one or more IMAGES using a known place id.

    - Writes mc + EXIF GPS.
    - Only overrides existing GPS when --force is given.
    - With --dry-run, prints intended changes but does not modify files.
    """
    # Single place lookup, reused for all images
    url = f"{API_BASE}/places/by-id"
    try:
        resp = client.get(url, params={"placeid": place_id})
        resp.raise_for_status()
    except Exception as e:
        click.echo(f"Error from server: {e}", err=True)
        raise SystemExit(1)

    data = resp.json()
    lat = data.get("lat")
    lon = data.get("lon")
    if lat is None or lon is None:
        click.echo("Server did not return lat/lon", err=True)
        raise SystemExit(1)

    for image in images:
        path = str(image)
        _ensure_xmp_skeleton(path)

        with Image(path) as img:
            md = img.read_exif()

            has_exif_gps = (
                "Exif.GPSInfo.GPSLatitude" in md
                or "Exif.GPSInfo.GPSLongitude" in md
            )

            if has_exif_gps and not force:
                click.echo(
                    f"{image} already has GPS; use --force to override.",
                    err=True,
                )
                continue

            click.echo(
                f"Will set GPS for {image} from place {place_id} "
                f"to lat={lat}, lon={lon}"
                + (" (override existing GPS)" if force and has_exif_gps else "")
            )

            if dry_run:
                continue

            dms_lat = _deg_to_dms_rationals(lat)
            dms_lon = _deg_to_dms_rationals(lon)

            # Standard EXIF GPS: unsigned DMS values, with the sign carried by the Ref tags
            md["Exif.GPSInfo.GPSLatitude"] = " ".join(dms_lat)
            md["Exif.GPSInfo.GPSLongitude"] = " ".join(dms_lon)
            md["Exif.GPSInfo.GPSLatitudeRef"] = "N" if lat >= 0 else "S"
            md["Exif.GPSInfo.GPSLongitudeRef"] = "E" if lon >= 0 else "W"
            img.modify_exif(md)

    if dry_run:
        click.echo("Dry run: no metadata written.")
    else:
        click.echo("Done geotagging images.")
```

As you can see, it calls the API endpoint and adds EXIF tags with the GPS information.
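The `_deg_to_dms_rationals` helper used by both commands isn’t shown. Here is a minimal sketch of what such a helper might do. It assumes the helper returns three rational strings (degrees, minutes, seconds) that are joined with spaces, as in the code above. The sign is dropped because the N/S and E/W reference tags carry it. The name `deg_to_dms_rationals` and the 1/100-second precision are assumptions.

```python
def deg_to_dms_rationals(value: float, sec_denominator: int = 100) -> list:
    """Convert decimal degrees to EXIF-style rationals ['D/1', 'M/1', 'S/den'] (unsigned)."""
    # Work in integer units of 1/sec_denominator seconds, so rounding can never yield 60 seconds.
    total = round(abs(value) * 3600 * sec_denominator)
    degrees, rest = divmod(total, 3600 * sec_denominator)
    minutes, seconds = divmod(rest, 60 * sec_denominator)
    return [f"{degrees}/1", f"{minutes}/1", f"{seconds}/{sec_denominator}"]
```

For example, `" ".join(deg_to_dms_rationals(60.39299))` gives `"60/1 23/1 3476/100"`, that is, 60° 23′ 34.76″.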

Adding GPS coordinates from OSM

For adding coordinates from OSM, I search by name in a Nominatim-compatible service and get the place back. From that place, I get the GPS coordinates, which I can then add to the image exactly the same way as for registry places.

The endpoint:

```python
@app.get("/places/osm-search")
async def osm_search_place(q: str = Query(..., description="Free-text OSM place query")):
    """
    Search an OpenStreetMap place by name / address via Nominatim-compatible service.
    Returns lat/lon and a human-readable label for the best match.
    """
    res = search_place_via_geocode_maps(q)
    if res is None:
        raise HTTPException(status_code=404, detail="No OSM place found")

    return res
```

The actual search function is in placehelper.py:

```python
import httpx

# GEOCODE_BASE_SEARCH, GEOCODE_API_KEY and GEOCODE_TIMEOUT are configuration
# constants defined elsewhere in placehelper.py.
def search_place_via_geocode_maps(query: str, limit: int = 1) -> dict | None:
    """
    Free-text search of OSM via Nominatim-compatible /search.
    Returns a normalized dict: {lat, lon, label, ...} for the best match, or None.
    """
    params = {
        "q": query,
        "format": "json",
        "limit": limit,
        "accept-language": "en",
    }
    if GEOCODE_API_KEY:
        params["api_key"] = GEOCODE_API_KEY
    try:
        with httpx.Client(timeout=GEOCODE_TIMEOUT) as client:
            resp = client.get(GEOCODE_BASE_SEARCH, params=params)
            resp.raise_for_status()
    except Exception:
        return None

    data = resp.json()
    if not isinstance(data, list) or not data:
        return None

    best = data[0]
    lat = float(best.get("lat"))
    lon = float(best.get("lon"))
    display_name = best.get("display_name") or query

    return {
        "lat": lat,
        "lon": lon,
        "label": display_name,
        "source": "geocode-maps",
    }
```

I again use this in the CLI to add GPS tags:

```python
from pathlib import Path

import click

# cli, client, API_BASE, Image and _deg_to_dms_rationals are defined elsewhere in the CLI module.
@cli.command("geotag-gps-from-osm")
@click.argument(
    "images",
    type=click.Path(exists=True, dir_okay=False, path_type=Path),
    nargs=-1,
    required=True,
)
@click.option("--query", "-q", required=True, help="OSM place query, e.g. 'Bergen station' or 'Fløyen, Bergen, Norway'.")
@click.option(
    "--force/--no-force",
    default=False,
    show_default=True,
    help="Override existing GPS tags.",
)
@click.option(
    "--dry-run/--no-dry-run",
    default=False,
    show_default=True,
    help="Show what would be changed without writing metadata.",
)
def geotag_gps_from_osm_cmd(images: list[Path], query: str, force: bool, dry_run: bool) -> None:
    """
    Set GPS for one or more IMAGES using an OpenStreetMap place search.

    - Resolves QUERY via the server's OSM/Nominatim API.
    - Writes EXIF GPS (DMS) + mc XMP GPS.
    - Only overrides existing GPS when --force is given.
    - With --dry-run, prints intended changes but does not modify files.
    """
    url = f"{API_BASE}/places/osm-search"
    try:
        resp = client.get(url, params={"q": query})
        resp.raise_for_status()
    except Exception as e:
        click.echo(f"Error from server: {e}", err=True)
        raise SystemExit(1)

    data = resp.json()
    lat = data.get("lat")
    lon = data.get("lon")
    label = data.get("label") or query
    if lat is None or lon is None:
        click.echo("Server did not return lat/lon", err=True)
        raise SystemExit(1)

    for image in images:
        with Image(str(image)) as img:
            exif = img.read_exif()
            xmp = img.read_xmp()

            has_exif_gps = (
                "Exif.GPSInfo.GPSLatitude" in exif
                or "Exif.GPSInfo.GPSLongitude" in exif
            )

            if has_exif_gps and not force:
                click.echo(
                    f"{image} already has GPS; use --force to override.",
                    err=True,
                )
                continue

            click.echo(
                f"Will set GPS for {image} from OSM '{label}' "
                f"to lat={lat}, lon={lon}"
                # has_mc_gps was never defined here, so only the EXIF check is used
                + (" (override existing GPS)" if force and has_exif_gps else "")
            )

            if dry_run:
                continue

            # EXIF GPS as DMS rationals, the format extract_gps understands
            dms_lat = _deg_to_dms_rationals(lat)
            dms_lon = _deg_to_dms_rationals(lon)

            exif["Exif.GPSInfo.GPSLatitude"] = " ".join(dms_lat)
            exif["Exif.GPSInfo.GPSLongitude"] = " ".join(dms_lon)
            exif["Exif.GPSInfo.GPSLatitudeRef"] = "N" if lat >= 0 else "S"
            exif["Exif.GPSInfo.GPSLongitudeRef"] = "E" if lon >= 0 else "W"

            img.modify_exif(exif)

    if dry_run:
        click.echo("Dry run: no metadata written.")
    else:
        click.echo("Done geotagging images from OSM place.")
```
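The comment about `extract_gps` implies that the reading side turns these rationals back into signed decimal degrees. `extract_gps` itself isn’t shown. Here is a minimal sketch of that conversion, assuming the space-separated `num/den` strings written above plus the reference tag. `dms_to_degrees` is a hypothetical name.

```python
from fractions import Fraction


def dms_to_degrees(dms: str, ref: str) -> float:
    """Convert 'D/1 M/1 S/den' plus an N/S/E/W reference into signed decimal degrees."""
    d, m, s = (Fraction(part) for part in dms.split())
    value = d + m / 60 + s / 3600
    # South latitudes and west longitudes are negative.
    return float(-value if ref in ("S", "W") else value)
```

Using `Fraction` keeps the arithmetic exact until the final conversion to `float`.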

Testing place logic

The «Kjosfossen Waterfall» case

Anita stands at the Kjosfossen viewpoint, overlooking the scenic waterfall and rugged mountain landscape.

Galdhøpiggen turhytte

a view from Galdhøpiggen turhytte, showcasing a scenic mountain landscape with hikers and a stone shelter.

I haven’t quite gotten rid of the «summit of» problem yet; other pictures still get «the summit of Galdhøpiggen turhytte», but at least it has improved.

Trollstigen

a view of the Trollstigen road in Norway, a serpentine mountain road with steep cliffs and lush greenery

Summary

I was able to improve the results a lot with metadata, mostly registered manually in tables, which gives good default results.

I’m quite happy with the results, but I still get odd wording here and there, which suggests the prompt still needs improvement.

Getting good results depends both on good input data for the prompt and on the prompt itself. Precise wording and instructions in the prompt are vitally important.
