
Any Vision

Use Google’s state-of-the-art AI to describe, tag, classify, and search your photos and videos.

With Any Vision’s Prompt command (examples), you can:

  • Generate captions, headlines, and the accessibility fields Alt Text and Extended Description
  • Generate keywords and descriptions for specific uses, e.g. stock agencies
  • Create your own specialized descriptions and classifications, such as the type of room and design style in real-estate photos or the clothing worn by riders in motorcycle photos
  • Extract the numbers from athletes’ bibs and jerseys, motorcycles, and cars into fields or keywords
  • Extract embedded text (OCR)
  • Generate audio transcripts of videos

With Any Vision’s Classify command (examples), formerly called Analyze, you can:

  • Tag your photos with objects, activities, landmarks, logos, facial expressions, and dominant colors
  • Extract embedded text (OCR)
  • Find photos with similar visual content and similar colors

With both commands you can:

  • Store the tags and text in Lightroom’s keywords and metadata fields
  • Recognize and translate from and to over 100 languages
  • Search the tags and fields to make it much easier to find photos
  • Export the tags and text into comma-separated text files

Any Vision uses Google Gemini and Cloud Vision, the AI technology underlying Google image search. Though you’ll have to get a Google Cloud key as well as an Any Vision license, you can use the Prompt command completely free, albeit slowly, or pay pennies per thousand photos to have it go very fast. You can use the Classify command for free on up to 212,000 photos in the first three months and then up to 1,000 photos every month thereafter. See Google Pricing for details.

Consider similar services with different features and pricing.

Try it for free (limited to 50 photos). Buy a license at a price you name.

Prompt Examples

Headline, Alt Text, and Extended Description generated by the Prompt command:

A user-written prompt classifying a real-estate photo as interior/exterior, its room type, and design style:

Keywords generated for a stock-photo agency:

Classify Examples

A Classify example showing labels, landmarks, facial expression, and recognized text; Google has correctly located the photo within a few meters and extracted much of the visible text from the store signs and the granite plaque on the statue:

Example of logo detection, where the logos are partially obscured but still recognized:

Correctly recognized jersey numbers:

Download and Install

Any Vision requires Lightroom 6.14 (CC 2015) or Lightroom Classic. (The newer cloud-focused Lightroom doesn’t support plugins.)

  1. Download anyvision.1.19.zip. (What’s changed in this version)
  2. If you’re upgrading from a previous version of Any Vision, exit Lightroom, delete the existing anyvision.lrplugin folder, and replace it with the new one extracted from the downloaded .zip. Restart Lightroom and you’re done.
  3. If this is a new installation, extract the folder anyvision.lrplugin from the downloaded .zip and move it to a location of your choice.
  4. In Lightroom, do File > Plug-in Manager.
  5. Click Add, browse and select the anyvision.lrplugin folder, and click Select Folder (Windows) or Add Plug-in (Mac OS).
  6. Get a Google Cloud key.

The free trial is limited to analyzing 50 photos—after that, you’ll need to buy an Any Vision license. Note that your Google Cloud key will allow you to process tens of thousands of photos for free (see here for details).

Licensing

To use Any Vision after the free trial ends, you’ll need both an Any Vision plugin license and a Google Cloud key tied to a billing account you set up with Google. Cloud Vision costs little or nothing for most users—see Cloud Vision Pricing for details.

Buy a License  

  1. Buy a license at a price you think is fair:  

    The license includes unlimited upgrades. Make sure you’re satisfied with the free trial before buying.
  2. Copy the license key from the confirmation page or confirmation email.
  3. Do Library > Plug-in Extras > Any Vision > Classify.
  4. Click Buy.
  5. Paste the key into the License key box and click OK.

Get a Google Cloud Key

Setting up a Google billing account and getting a Cloud key is a little tedious but straightforward if you follow these steps exactly. I recommend arranging two browser windows so both are visible. (Unfortunately, Google doesn’t provide any way for an application like Any Vision to make this simpler.)

If you previously obtained a free API key for Prompt and want to add a billing account to enable Prompt to go much faster (at as little as pennies per thousand photos), skip to these steps.

If you’re upgrading from an older version of Any Vision that didn’t have the Prompt command (version 18 or earlier) and had already obtained a Google license key, skip to these last steps.

  1. In a web browser, go to Get API Key.
  2. Sign in with an existing Google account (for example, your Gmail account) or create a new one.
  3. Click Get API Key:
  4. Agree to the Legal Notice.
  5. Click Create API Key:
  6. Click Copy to copy the API key to the clipboard:
  7. In Lightroom, do File > Plug-in Extras > Prompt, click Google Key, and paste the key:
  8. If you just want to use Prompt for free (somewhat slowly), you’re done. But if you want Prompt to go much faster or you want to use the Classify command, add a billing account in the following steps.
  9. In a web browser, go to Get API Key.
  10. Click Set Up Billing:
  11. Click Link a Billing Account:
  12. Click Create Billing Account:
  13. Complete steps 1 and 2, adding your name and payment method.
  14. Click Start Free:
  15. In the web browser, go to Cloud Vision API and click Enable:
  16. If you see Activate in the upper-right corner, click it:
  17. Go to Cloud Translation API and click Enable:
  18. If you’re upgrading from an older version of Any Vision that didn’t have the Prompt command (version 18 or earlier), go to Gemini API and click Enable; otherwise, skip this step:
  19. Go to Credentials and click the underlined API key (it may have a different name):
  20. Select Don’t Restrict Key if it’s not already selected and click Save:

    The change might take up to five minutes to take effect.

Google Pricing

Google’s pricing is pay-as-you-go per photo or video, with no upfront charges. You can disable your billing account at any time.

If you’re concerned about accidentally spending too much on Google services or someone stealing your Google license key, you can create a billing alert that will notify you when monthly spending exceeds a specified amount.

Prompt Pricing (as of December 2024)

Prompt uses Google’s Gemini to execute prompts. If you don’t associate a billing account with the API key you obtained, then your use of Gemini is free but very slow, and Google will use your photos to improve their products. Prompts using the flash model will process up to 15 photos per minute and 1500 photos per day, while prompts using the pro model will process up to 2 photos per minute and 50 photos per day. Once you associate a billing account with the key, prompts will execute about four times faster.
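To put those rates in perspective, at the free tier’s 15 photos per minute (flash model), a batch of 900 photos takes roughly an hour; with a billing account attached and prompts running about four times faster, the same batch takes roughly 15 minutes.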

Gemini’s pricing varies according to the model, the pixel size of the photo, the length of the video, the length of the prompt text, and the length of the result text. Typically, the flash model costs about $0.01–$0.03 per thousand photos, while the pro model costs about $0.40–$0.90 per thousand.

To see the cost of a prompt for a particular photo or video, select it in Library, do the Prompt command, select the prompt, click Edit, and do Preview Result. Videos will cost significantly more than photos, since Gemini examines one frame from each second of video.

Classify Pricing (as of December 2024)

Classify uses Google’s Cloud Vision and charges your billing account monthly for each photo you classify. But Google offers $300 free credit for new billing accounts (good for tens of thousands of photos), and the first 1000 photos classified each month are free.

Each of the seven features (Labels, Landmarks, Logos, Faces, Safety, Text, and Dominant Color) costs $1.50 / 1000 photos, except for Safety, which is free if you also select Labels. For example, selecting Labels and Landmarks costs $3.00 / 1000 photos, and selecting all seven features costs $9.00 / 1000 photos.

Translation of labels and other features to other languages costs $20 per million characters or about 100,000 distinct labels. (My main catalog has 30,000 images with 2500 distinct labels, and translating them to another language cost about $0.50.)

Though this may sound expensive for larger catalogs, Google provides incentives that lower the cost considerably. Most users will pay nothing or very little every month.

For each feature, the first 1000 photos are free each month. For example, if you select all seven features, you can classify 1000 photos monthly for free.

Google also offers $300 free credit for creating your first billing account (the credit must be used within three months). If you select just one feature (e.g. Labels), that’s enough for 200,000 photos, or nearly 29,000 photos for all seven features.

Prompt or Classify?

Should you use Prompt or Classify? There are some things that only one command can do:

Prompt: text descriptions, specialized classifications
Classify: GPS locations

In my (not exhaustive) testing, Prompt does a little better at keywording and extracting text, including athletes’ bibs and race cars. Classify does better at dominant colors. But try it for yourself—your mileage may vary.

Classify is 2-4 times faster.

Google provides free options for both. Classify can do tens of thousands of photos for free initially and 1000 photos per month thereafter, and there is no speed penalty. Prompt doesn’t have any limit on the number of free photos, but it limits processing to 15 photos/minute and 1500 photos/day (flash model) or 2 photos/minute and 50 photos/day (pro model).

Classify uses 2016-era technology, and Google doesn’t appear to have improved it much in recent years. Prompt uses the leading-edge Gemini AI service, and Google is actively improving it every month.

Using Prompt

With the Prompt command, you write text prompts, detailed instructions in your native language, that ask Google’s Gemini AI to describe your photos and videos, and you specify actions that tell Prompt how to process Google’s answers and store them in metadata fields. For example, the built-in prompt IPTC Headline has this text prompt and action, which copies Google’s response to the IPTC Headline field:
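For illustration only (the wording here is hypothetical, not necessarily the exact built-in text), such a prompt and its action might read:

    Prompt: Write a single-sentence headline for this photo, suitable for a news caption.
            Don't mention the camera or the photographer.
    Action: Copy to field: Headline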

Any Vision provides over a dozen built-in prompts, which you can use as-is or modify to suit your own needs.

To apply one or more prompts, select the desired photos and videos and do File > Plug-in Extras > Prompt, displaying all the currently defined prompts:

When you click Process, all the currently enabled prompts will be applied to each of the selected photos.

To enable or disable a prompt, select it and click Enable / Disable. Disabled prompts are indicated with ⛔︎.

I’ve tried to include a range of built-in examples showing how to use the various features. You can safely delete those you don’t use, recovering them later by clicking Add Built-in.

Some of the built-in prompts store their results in standard metadata fields such as Caption and Headline, and some add keywords. Many prompts store their results in custom metadata fields named Prompt 1 through Prompt 10, which you can view using custom Metadata panel tagsets.

Creating a Prompt

Click New to create a new prompt. Use a long, descriptive name to make it easy to remember what it does.

Enter your text instructions in the Prompt box. Crafting a good prompt takes a lot of trial-and-error at first. The built-in prompts give lots of examples, and Google has a detailed introduction suitable for non-programmers.

The result will generally be in the same language as the text prompt. You can include a specific language in the instructions, e.g. “Use English for the description”. Google Gemini knows over 40 languages.

Test out your prompt by clicking Preview Result, which will display Google’s response for the currently selected photo, along with the estimated cost per thousand photos. Use the < and > buttons to move to the previous or next photo and then click Preview Result again.

Any Vision caches the Google result, and Preview Result will resend the photo and prompt to Google only if you change the prompt or a related parameter. On rare occasions Google might hiccup without any indication; if you suspect that, you can force Preview Result to resend to Google by checking Refresh Google response.

You can tell Google how you’d like the output formatted, e.g. as a comma-separated list or one item per line. Google usually follows your instructions but sometimes doesn’t. If you’re technically minded, you can tell Google to generate structured output by checking Use JSON schema for output; see here for details.

After supplying the prompt text, add one or more actions by clicking Add an action. The actions specify how to process the Google result and where to store it in keywords and metadata fields. The Change advanced options action lets you override various Google settings about how it executes prompts.

After Google returns a response, the actions are applied in order. By default, the text results from the previous action are used as input to the next action, though some of the actions let you use the original results from the prompt. Preview Result will show the results of actions that change the text results but won’t execute the actions that store in keywords or metadata fields.

Explanations of the various action types follow:

Action: Copy to field

The Copy to field action copies the prompt results to any of 46 standard metadata fields and the 10 custom Prompt fields. These are all visible in the Library Metadata panel.

Action: Replace pattern

The Replace pattern action transforms the Google response, replacing all occurrences of the pattern with a replacement. The patterns and replacements are a subset of full regular expressions, documented here.

When Include unmatched text is checked, then any text in the input that doesn’t match the pattern is included in the output. When unchecked, only the replacements of text matched by the pattern are included in the output.

See the built-in prompt Runner’s bibs for an example, where numbers are extracted from recognized text and output, one per line.

Use the Preview Result button for troubleshooting pattern/replacements.

Action: Assign to keywords

The Assign to keywords action interprets the response as a list of keywords and assigns them to the photo. The options:

text separated by commas, semicolons, or newlines: The keywords are separated by commas, semicolons, or newlines.

lines with “name: value”: Lines of the form “name: value” create hierarchical keywords name > value. For example, the response lines:

Room Type: Bathroom
Design Style: Contemporary

will create keywords Room Type > Bathroom and Design Style > Contemporary.

JSON output: Interprets the response as JSON created by the prompt option Use JSON schema for output. String values create leaf keywords and property names create parent keywords. For example, this JSON response:

{"What": {
    "Room": "Bathroom",
    "Style": ["Contemporary", "Transitional"]}

will create keywords What > Room > Bathroom, What > Style > Contemporary, and What > Style > Transitional.

Create keywords under parent keyword: If given, all keywords will be created under this parent keyword. It can be a hierarchical keyword, e.g. Home > Style.

Create keywords in subgroups A, B, C, …: If checked, all leaf keywords will be created under a parent keyword whose name is the leaf keyword’s first letter. For example, the keyword mountain would be placed under the parent keyword M. This works around a longstanding (and shameful) Lightroom bug on Windows where it chokes if it tries to display more than about 1500 keywords at once.

Set “Include on Export” attribute of keywords: When set, all created keywords will have that attribute set, allowing them to be included with a photo when it is exported.

Remove previous keywords under that parent: When a parent keyword is specified, then before assigning the new keywords to the photo, all previous keywords under that parent are removed from the photo.

Action: Change advanced options

The Change advanced options action lets you set parameters controlling Google’s execution of the prompt:

Temperature: A number between 0 and 2, defaulting to 0. Lower values create more deterministic responses, while higher values create more diverse or creative results that are more likely to vary each time the prompt is executed on the photo.

Model: The AI model used by Google Gemini to execute the prompt. See here for more details.

Image size: The maximum pixel size of the long edge of images sent to Google. Images larger than this are downscaled. The default is 3072, the largest that Google will accept. Google recommends the largest size, but if your network connection or computer is slow, a smaller image size will export from Lightroom and upload to Google faster.

Video format: The resolution of videos sent to Google—lower resolution videos will upload faster. But Lightroom exports video very slowly, so if you have an average consumer Internet connection, it could be faster to send the original videos to Google rather than wait for Lightroom to export them at smaller size.

Safety category: Specifies whether Google should refuse to analyze photos and videos it thinks belong to various “safety” categories. Regardless of what you specify here, Google will sometimes refuse to process content it (often mistakenly) thinks is sexually explicit.

Action: Transform with Lua code

The Transform with Lua code action is an advanced option for those with programming experience, transforming the response with the executed code. The code should be a block of statements ending with a return. The returned string value will be used by subsequent actions.

The following global variables will be set on entry to the code:

text: The response text.

json: If the prompt option Use JSON schema is checked, the resulting JSON represented as a Lua value.

photo: The current photo (an LrPhoto object).

Debug: The Debug module from the Debugging Toolkit. The functions Debug.logn() and Debug.lognpp() can be used to log debugging output to the file debug.log in the anyvision.lrplugin folder.

In addition, the global namespace contains all the functions available to plugins, including the function require() for importing modules from the Lightroom Plugin SDK.
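As a minimal sketch of what such a transform might look like (the logic is purely illustrative), this code trims each line of the Google response, drops empty lines, prefixes the result with the photo’s filename, and logs it:

    local lines = {}
    for line in string.gmatch (text, "[^\r\n]+") do
        line = line:match ("^%s*(.-)%s*$")   -- trim leading and trailing whitespace
        if line ~= "" then table.insert (lines, line) end
    end
    -- photo:getFormattedMetadata ("fileName") is a standard Lightroom SDK call
    local result = photo:getFormattedMetadata ("fileName") .. ": " .. table.concat (lines, "; ")
    Debug.logn (result)   -- appears in debug.log in the anyvision.lrplugin folder
    return result   -- the returned string is passed to subsequent actions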

JSON Schemas

In the prompt text, you can tell Google how you’d like the output formatted, e.g. as a comma-separated list or one item per line. Google usually follows your instructions but sometimes doesn’t. If you’re technically minded, you can tell Google to generate structured output conforming to your requirements by checking Use JSON schema for output and providing a schema. The response text will be formatted according to that schema.

See the built-in prompt Classify real estate photos for an example. It uses a JSON schema to force the keyword Room Type to be one of a fixed set of values.
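As a rough sketch (not necessarily the exact schema used by that built-in prompt, and with illustrative room-type values), a schema restricting Room Type to a fixed set might look like:

    {
      "type": "object",
      "properties": {
        "Room Type": {
          "type": "string",
          "enum": ["Bathroom", "Bedroom", "Kitchen", "Living Room", "Exterior", "Other"]
        }
      },
      "required": ["Room Type"]
    }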

You can read Google’s sketchy documentation, but I’ve found that, as with many things Gemini, JSON output is temperamental and requires trial-and-error. For several of the built-in prompts, I originally used JSON output but backed off because of Google flakiness.

Gemini Models

Google Gemini currently provides two stable AI models for executing prompts, gemini-1.5-flash-8b and gemini-1.5-pro. You can choose the model used for a prompt with the Change advanced options action.

Google says the flash model is fast and cheap, with great performance for high-frequency, lower-intelligence tasks, while the pro model is slower and more expensive but gives the best performance for a wide variety of reasoning tasks. I haven’t done rigorous testing, but anecdotally I’ve found the flash model gives pretty good results with Any Vision’s built-in prompts, though sometimes the pro model does a little bit better.

The flash model typically costs $0.01 to $0.02 per 1000 photos, while the pro model can cost $0.40 to $0.50 per 1000 photos (or more).

I advise always starting with the flash model (the default) and trying the pro model only when flash’s results aren’t good enough.

Custom Metadata Panel Tagsets

Some of the built-in prompts store their results in standard metadata fields such as Caption and Headline, and some add keywords. Many prompts store their results in custom metadata fields named Prompt 1 through Prompt 10, which you can view using the Any Vision > Prompt tagset in the Metadata panel:

You can make your own custom tagsets showing just the fields you want with more appropriate titles. For example, the built-in prompt Classify real estate photos has a corresponding tagset Real Estate Classification:

Similarly, the built-in prompt Motorcycle description has a corresponding custom tagset.

To make a new custom tagset:

  1. In the anyvision.lrplugin folder, copy the file CustomTagset2.lua to MyTagset.lua (or any other name).
  2. Edit MyTagset.lua in a text editor and change the title, id, and items, whose format is perspicuous.
  3. Edit Info.lua in a text editor and add "MyTagset.lua" to the list of LrMetadataTagsetFactory.
  4. Restart Lightroom.

You can add standard metadata fields to tagsets, and there are more options for formatting fields. See the Metadata-Viewer Preset Editor plugin or the Lightroom Classic SDK Programmers Guide for details (you don’t need to be a programmer).
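As a minimal sketch (the standard Adobe field IDs shown are examples; copy the Any Vision field IDs from CustomTagset2.lua), MyTagset.lua returns a table along these lines:

    -- MyTagset.lua
    return {
        title = "Real Estate Classification",   -- name shown in the Metadata panel menu
        id = "MyRealEstateTagset",               -- must be unique among installed tagsets
        items = {
            "com.adobe.filename",                -- a standard Lightroom field
            "com.adobe.separator",
            -- Any Vision custom fields go here, using the IDs found in CustomTagset2.lua
        },
    }

and the corresponding entry in Info.lua becomes something like:

    LrMetadataTagsetFactory = { "CustomTagset2.lua", "MyTagset.lua" },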

Using Classify

Select the photos to be classified and do Library > Plug-in Extras > Any Vision > Classify. Select the features you want to tag and click OK:

Any Vision exports reduced-size versions of the photos, sends them to Google, and processes the results. Typically it takes about 4 seconds per photo (more if you have a slower Internet connection). See Advanced for how to reduce this to 1 second per photo or less, at the expense of making Lightroom less usable interactively while Classify is running.
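At those rates, classifying 1000 photos takes a little over an hour at 4 seconds per photo, or roughly 17 minutes at 1 second per photo.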

Once you’ve classified a photo, by default Any Vision won’t reanalyze it. So if you change the set of selected features, doing Classify again won’t have any effect. See Advanced for how to force Any Vision to resend photos to Google to be reanalyzed (at additional cost).

You can see the results in the Metadata panel (in the right column of Library) with the Any Vision > Classify tagset:

Following each label and landmark is a numeric score, e.g. “mountain (85)”, indicating Google’s estimate of the likelihood of that label or landmark. Faces and safety terms have bucketed scores ranging from “very unlikely” to “very likely”.

Classify can also assign hierarchical keywords using this hierarchy:
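As a rough sketch, each feature you enable for keywords gets its own parent keyword under the root, for example:

    Any Vision
        Labels
        Landmarks
        Logos
        Text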

For example, if the photo has the label “mountain”, then the keyword Any Vision > Labels > mountain is assigned to the photo. The Advanced tab lets you control whether keywords are added, specify the root keyword, and whether to put all keywords at the top level.

Features

Labels are objects, activities, and qualities, such as mountain, tabby cat, toddler, safari, road cycling, rock climbing, white. In my main catalog of nearly 30,000 photos, Cloud Vision recognized 2500 distinct labels.

Landmarks are specific locations, either where the photo was taken or of objects in the photo. Examples include Paris, Eiffel Tower, Denali National Park, Salt Lake Tabernacle Organ, Squaw Valley Ski Resort, Kearsarge Pass. But landmarks aren’t necessarily famous or well-known—they can be obscure local landmarks, such as statues or waterfalls. Photos may be tagged with more than one landmark. In my catalog, Cloud Vision recognized 500 distinct locations.

Each landmark has a GPS location, and the arrow button to the right of the Map field will open Google Maps on the first (most likely) landmark location in the photo.

Logos are product or service logos, such as Coca-Cola, Office Depot, SpongeBob SquarePants, Tonka. In my catalog, Cloud Vision recognized 110 distinct logos.

Faces: Cloud Vision identifies the “sentiment”, or expression, of each recognized face: Joy, Sorrow, Anger, Surprise. It may also tag a face as Under Exposed, Blurred, or wearing Headwear.

Safety identifies whether the photo is “safe” for Google image search: Adult, Spoof, Medical, and Violence. In my main catalog of 30,000 photos, only 213 received one of these safety tags, and most of them weren’t very accurate. Exposed skin triggers “adult”, regardless of whether it’s a bikini-clad woman, a shirtless teen, or babies in diapers.

Text contains text recognized in photos using optical character recognition (OCR). Cloud Vision appears to do a reasonable job of recognizing text on signs, plaques, athletic jerseys, etc.

Dominant Color. Cloud Vision identifies the ten most “dominant” colors in a photo, using an undocumented algorithm. Use the Sort by Color command to see those colors for a photo and find other photos with similar colors.

Searching

You can search photos’ features using the Library Filter bar, smart collections, or the Keyword List. For example, to find photos assigned the label “mountain”, you could do:

Do Library > Enable Filters if the Library Filter bar isn’t showing, and then click Text.

To search just the Labels field, use this smart-collection criterion:

Alternatively, in the Keyword List panel, type “mountain” in the Filter Keywords box, then click the arrow to the far right of the “mountain” keyword:

Advanced

The Advanced tab provides more flexibility for using Any Vision:

Score Threshold: For each label, landmark, logo, etc. Cloud Vision assigns a score, an estimate of the probability it is correct. You can set a per-feature score threshold, and only those labels, landmarks, etc. that have at least that score will be assigned to the photo.

Labels, landmarks, and logos have scores from 0 to 100, though in practice, Cloud Vision returns only those with a score of at least 50. Faces and Safety values have bucketed scores ranging from “very unlikely” to “very likely”.

Assign Keywords: If checked, Any Vision assigns a keyword for each extracted feature. For example, if the photo has the label “mountain”, then the keyword Any Vision > Labels > mountain is assigned to the photo. You can enable or disable this on a per-feature basis.

Text (OCR) copy: Recognized text can be copied from the Text field in the Metadata panel to any Lightroom-recognized metadata field. copy always copies the text, copy if empty copies the text only if the destination IPTC field is empty, append appends the text to the end of the destination IPTC field, and don’t copy never copies the text.

Text (OCR) pattern replacement: Recognized text can be transformed using patterns whose syntax is documented here. For example, to extract just numbers from the recognized text, placing one number per line:

Replace pattern: [0-9]+ with: %0 separator: \n

To extract the first number only:

Replace pattern: ^.-([0-9]+).*$ with: %1

Also see Recognizing Numbers on Race Bibs.

Text OCR model: Specifies the Google text-recognition algorithm. Documents is optimized for documents dense with text, while Photos is optimized for images with small amounts of text (e.g. on signs). My experience is that Documents performs best not only on documents but also on photos, but Google is constantly changing both algorithms.

Include scores in fields: If checked, then the score will be included with each extracted feature, e.g. “mountain (83)” or “Eiffel Tower (94)”.

Set “Include on Export” attribute of keywords: If checked, then each new keyword created by Any Vision will have the Include on Export attribute set, allowing the keyword to be included in the metadata of exported photos.

If you change this setting and want the change to be applied retroactively to all Any Vision keywords:

  1. Delete the root Any Vision keyword in the Keyword List panel.
  2. Select all the photos you’ve previously classified.
  3. Do Classify, click Advanced, and select the option Reassign metadata fields.

This will recreate all the keywords with the new setting, without actually sending the photos to Cloud Vision for reanalysis.

Create keywords under root keyword: This is the top-level root keyword of all the keywords added by Any Vision; it defaults to “Any Vision”. Uncheck this option to put all keywords at the top level.

Use subgroups (A, B, C, …) for Labels, Landmarks, and Logos keywords: When checked, Any Vision will create subkeywords A, B, C, … under the parent keywords Any Vision > Labels, Any Vision > Landmarks, and Any Vision > Logos:

For example, the keyword for the label “mountain” would be placed under Any Vision > Labels > M.

This works around a longstanding (and shameful) Lightroom bug on Windows where it chokes if it tries to display more than about 1500 keywords at once.

Copy landmark location to GPS field: Each landmark assigned to a photo by Cloud Vision has an associated latitude/longitude, displayed in the Location field in the Metadata panel. Selecting Always or When GPS field is empty copies the latitude/longitude of the first landmark (the one with the highest score) to the EXIF GPS field. Once the GPS field is set, the photo will appear on the map in the Map module, and Lightroom will do address lookup to automatically set the photo’s Sublocation, City, State / Province, and Country.

Previously classified photos: This option tells Any Vision how to handle selected photos that have been previously classified:

Skip ignores such photos.

Reclassify by sending to Google sends the photos to Cloud Vision for reanalysis (and additional cost)—you must choose this if you’ve added a feature to be classified.

Reassign metadata fields reassigns the Any Vision metadata fields and keywords using the previous analysis but the current options. This is useful if you’ve changed any of the options that control how the Any Vision metadata fields are set, such as Assign Keywords or Include scores in fields. This option doesn’t incur additional costs for previously classified photos.

Concurrently processed photos: This is the number of photos that will be processed in parallel by Any Vision and Cloud Vision. The default value of 1 will have the least impact on interactive use of Lightroom (though it could still be a little jerky). The maximum value of 8 processes photos about 4 times faster, though interactive use will likely be very jerky. (In my testing, larger values didn’t provide any more speedup.)

Translation

By default, labels and other features are returned by Classify in English, but the Translation tab lets you translate them to another language. You can specify which features should be translated; features not selected will remain in English.

Any Vision uses Google Cloud Translation, the same technology behind Google Translate. More than 100 languages are supported. Translation costs extra, but it is quite inexpensive. Any Vision remembers previous translations, so you only pay once for each distinct word or phrase.

You can override the translations of specific words and phrases with an overrides dictionary. Click Edit Overrides to open Finder or File Explorer on the dictionary for the current language (e.g. de.csv for German). The dictionary is in UTF-8 CSV format (comma-separated values), and after the header each following line contains a pair of phrases:

word or phrase in English,word or phrase in target language
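For example, a German overrides file (de.csv) might contain entries like these (illustrative; keep whatever header line the file already has):

    mountain,Berg
    sunset,Sonnenuntergang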

The words and phrases are case-sensitive. Make sure you save the file in UTF-8 format:

Excel: Do File > Save As and choose File Format: CSV UTF-8.

TextEdit (Mac): After opening the file, change it to plain text via Format > Make Plain Text. It will save in UTF-8 format.

Notepad (Windows): Do File > Save As and choose Encoding: UTF-8.

Sort by Color

When the Dominant Color feature is selected, Cloud Vision finds the ten most “dominant” colors in a photo, using an undocumented algorithm. You can see those colors by selecting a classified photo and doing Library > Plug-in Extras > Sort by Color:

To find other photos containing a similar dominant color, select all the photos you want to search and invoke Sort by Color. Choose one of the dominant colors of the most-selected photo, or use the color picker at the bottom left to choose another color, and click OK. The current source is changed to the collection Any Vision: Sorted by Color, containing those photos with the most similar dominant colors. Do View > Sort > Custom Order to sort the collection by similarity.

Here are the photos from my main catalog of 30,000 photos with dominant colors closest to the orange-brown chosen in the example photo above:

As another example, here are photos from my catalog labeled “sunset” by Cloud Vision,  sorted by similarity to the yellow-orange from the first photo:

Find with Similar Labels

Find with Similar Labels finds photos with similar visual content by comparing the labels assigned by Classify. This can help find duplicates and near duplicates even if they’re missing metadata, are in different formats (e.g. raw and JPEG), or have been edited or cropped. For example, in a catalog of 32,000 photos, Find discovered these two photos taken three years apart:

Each group of similar photos will be placed in a separate collection in the collection set Any Vision: Similar Labels.

To find near duplicates, set Similarity initially to 95. If that finds too many photos that aren’t near duplicates, try 98 or 99.

If you set Similarity below 95, Find can be much slower when run on tens of thousands of photos. You can speed it up considerably by reducing the Include slider below 100%, at the cost of not finding as many similar photos. (The slider’s percentage is only a crude estimate of how many of the similar photos will be found.)

Find is only as good as the labels assigned by Cloud Vision. It often does an amazing job at finding near duplicates, but sometimes it’s hilariously bad.

Exporting to a File

To export the results from Prompt or Classify to a comma-separated (CSV) text file, one row per photo, select one or more classified photos and do Library > Plug-in Extras > Export Prompt Results to File or Export Classify Results to File. Open the file in Excel or another spreadsheet program.

Remove Fields

The Remove Fields command removes from the selected photos all custom metadata fields added by Any Vision (but not any keywords). If you run this command on a large number of photos, you may wish to do File > Optimize Catalog afterward to shrink the size of the catalog by a modest amount.

Recognizing Numbers on Athletes’ Bibs and Cars

Both Prompt and Classify do a decent job recognizing numbers on athletes’ bibs and cars if they’re unobscured and facing the camera. In my limited testing, I think Prompt does a little better with partially obscured or angled numbers and unusual fonts. Both commands make it easy to store the numbers in metadata fields and keywords.

With Prompt

Use the built-in prompt Athletes’ bibs or Dirt bike numbers, which stores the extracted numbers one per line in the Prompt 4 field and in keywords. If you want the numbers stored in another field or not stored in keywords, edit the prompt and change the actions.

If you’re working with cars or motorcycles, you might get slightly better results if you edit a copy of the prompt and change it to refer to “car” or “motorcycle”.

With Classify

In the Advanced tab, check the option Text (OCR), with these options:

Replace pattern: [0-9]+ with: %0 separator: \n

This will extract just the numbers from all the recognized text in the photos, one per line.

To search for a bib number, do the menu command Library > Find to open the text search in the Library Filter bar and use the criterion Any Searchable Plug-in Field contains number. Or create a smart collection with the criterion Searchable Text contains words number.

To create keywords for the bib numbers, check the option Assign keywords from text separated by , ; newline. If you’ve checked the option Create keywords under root keyword, these bib-number keywords will appear under the parent keyword Any Vision > Text; otherwise, they’ll appear at the top level.

To copy the bib numbers to the Caption field, set the option:

Text (OCR) copy to Caption

You can instead copy to the Title, Headline, or Source fields.

Similar Services

You may wish to consider these alternative products:

Excire is a sophisticated AI application that integrates with Lightroom. It has keywording and search capabilities similar to Classify plus much more. It doesn’t use the cloud and it has a much higher one-time upfront cost with no recurring charges.

lrc-ai-assistant uses Google Gemini to generate keywords, titles, and captions. The plugin is free but you’ll need a free or paid Gemini license.

LrTag  tags your photos like Classify, but it has a monthly charge.

MyKeyworder is intended as an aid to careful keywording of smaller numbers of photos rather than searching large catalogs.  It has recurring charges.

Sport Vision AI recognizes numbers on athletes’ race bibs, race cars, and motorcycles. It has per-photo charges of $2 to $5 per 1000 photos.

Lightroom Cloud (but not Lightroom Classic) uses similar technology to search your photos, but it doesn’t display or assign the keyword terms it’s inferred. Some people sync their Lightroom Classic with the Lightroom Cloud and search for photos there.

Keyboard Shortcuts

Windows: You can use the standard menu keystrokes to invoke the Classify and Prompt commands. ALT+L opens the Library menu, U selects the Plug-in Extras submenu, and then C invokes Any Vision > Classify and P invokes Any Vision > Prompt.

To reassign a different final keystroke to an Any Vision command, edit the file Info.lua in the plugin folder. Move the & in front of the desired letter in the menu command’s name, changing the name itself if necessary.
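For example (the titles here are hypothetical; the actual entries in Info.lua may differ slightly), you might change a menu title from

    title = "Any Vision > &Classify",   -- C is the final keystroke

to

    title = "Any Vision > C&lassify",   -- L is the final keystroke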

To assign a single keystroke as the shortcut, download and install the free, widely used AutoHotkey. Then, in the File Explorer, navigate to the plugin folder anyvision.lrplugin. Double-click Install-Keyboard-Shortcut.bat and restart your computer. This defines the shortcut Alt+A to invoke the Classify command. To change the shortcut, edit the file Any-Vision-Keyboard-Shortcut.ahk in Notepad and follow the instructions in that file.

Mac OS: You can use the standard mechanism for assigning application shortcuts to plugin menu commands. In System Settings > Keyboard > Keyboard Shortcuts > Application Shortcuts, select Adobe Lightroom. Click “+” to add a new shortcut. In Menu Title, type “Classify” (case matters) preceded by three spaces (“<space><space><space>Classify”), without the quotes. In Keyboard Shortcut, type the desired key or key combination.

Support

Please send problems, bugs, suggestions, and feedback to ellis-lightroom@johnrellis.com.

I’ll gladly provide free licenses in exchange for reports of new, reproducible bugs.

Known limitations and issues:

  • Any Vision requires Lightroom 6.14 (CC 2015) or Lightroom Classic—it relies on features missing from earlier versions.
  • As of December 2024, Google’s Gemini (Prompt) is relatively new and still hiccups now and then. If you get a mysterious error, such as an “Internal 500” error or a timeout, it may go away in the next minute, hour, or day.
  • If you’re upgrading from an older version of Any Vision and see the error “Google error: Generative Language API has not been used in project…”, complete these steps.
  • If Prompt gives Gemini “recitation” errors or says the maximum number of tokens was reached, try adding the action Change advanced options to the prompt and increasing Temperature above 0. Even a small value of 0.1 might be enough to avoid the error.
  • Like other such cloud services, including Adobe Firefly, Gemini (Prompt) has overly sensitive filters for prohibited content, including obscenity and pornography. Lots of exposed skin and innocent photos of children are sometimes flagged incorrectly. These services would rather prohibit too much than too little, to avoid political and legal blowback.
  • As of 3/21/23, you may see the error “Google error (13): Internal server error. Unexpected feature response”. This bug has been acknowledged by Google but not yet fixed. On average, it seems to occur in about one of every couple hundred photos. You can usually avoid the bug by unchecking the Landmarks feature. On 8/22/23, Google Cloud Support said that Google engineering is still working on the issue but there is no ETA.
  • If the Any Vision window is too large for your display, with the buttons at the bottom cut off, do File > Plug-in Manager, select Any Vision, and in the Settings panel, check the option Use Any Vision on small displays. Unfortunately, a bug in Lightroom prevents plugins from determining the correct size of the display.
  • Unlike many photo-sharing services, with Google Gemini and Cloud Vision your photos remain exclusively your photos. For free usage of Gemini (Prompt), Google will use your photos for improving their products. But for paid Gemini and for Cloud Vision (Classify), Google promises not to process your photos for any other purpose. See their Data Processing and Security Terms.
  • Classify infrequently returns errors such as “The request timed out” and “Image processing error!”. If this occurs, just rerun Classify. I don’t know why these errors occur—they appear spurious.
  • When your keyword list has multiple keywords of the same name, Lightroom distinguishes them in the Keyword List panel using the notation child-keyword < parent-keyword. For example, if you have an existing keyword “House” and Any Vision adds the label keyword “House”, the latter will be displayed as House < Labels to distinguish it from the top-level House. But these two keywords will get exported in photo metadata simply as “House”.

Version History

1.2

  • Initial release.

1.3

  • The Use subgroups for keywords option now correctly handles keywords starting with non-English characters.
  • Recognized text can optionally be copied to an IPTC field.

1.4

  • Better handling of errors reported by Google, such as an expired credit card.
  • Pattern replacement for transforming recognized text, e.g. to extract jersey numbers.
  • Translation of features to other languages. See Support for how to enable the Google Cloud Translation API.

1.5

  • You can now specify a root keyword other than “Any Vision” on the Advanced tab.
  • Fixed a bug preventing recognized text in other languages being translated to English.
  • Now ignores missing photos instead of giving a cryptic informational message.
  • Works around obscure export bug in Lightroom 5.

1.6

  • The progress bar updates more incrementally with very large selections.

1.7

  • Dense multi-column text is handled better.
  • Classify is a little faster.

1.8

1.9

  • Advanced option OCR model for choosing which Google text-recognition algorithm to use.

1.10

  • Made the default for the advanced option OCR model to be Photos, which experiments confirm performs better than Documents when analyzing photos containing signs, t-shirts, etc.

1.11

1.12

  • Worked around Lightroom bug creating keywords for labels.
  • Provided option for running Any Vision on small displays, necessitated by Lightroom 10’s change of fonts.

1.13

  • Create top-level keywords by unchecking the Advanced option Create keywords under root keyword.
  • Create keywords from recognized text (OCR) with the Advanced option Assign keywords from text.
  • Added guide to recognizing numbers on athletes’ race bibs.
  • Fixed Export to File to export recognized text after applying any pattern replacements, not the original text.
  • Silently skips over missing photos rather than giving an obscure error.

1.14    2023-03-21

  • Correctly handles “Internal server error” responses from Google Cloud Vision.

1.15    2023-11-23

  • Help button works again.
  • Better logging of unexpected responses from Google.

1.16    2024-05-16

  • Better error messages when Lightroom fails to export a photo being analyzed.
  • Shows how many selected files will be skipped because they’re missing or videos.
  • The temporary folder containing the exported reduced-resolution images sent to Google is once again deleted correctly.
  • Correctly handles when the translation-cache file gets deleted from outside Lightroom.

1.17    2024-08-25

  • Smart previews can once again be analyzed.
  • Minor internal code improvements.

1.18    2024-09-05

1.19    2024-12-05

  • Added the Prompt command, which uses Google’s Gemini AI to describe and classify photos and videos using text prompts you write. If you’re upgrading from an older version of Any Vision, you’ll need to complete these steps to upgrade your Google API key.
  • The old Analyze command is now called Classify.
  • Increased the number of photos in the free trial to 200, to accommodate the necessary experimentation required to learn and evaluate Prompt.
  • The old Export to File command is replaced with Export Prompt Results to File and Export Classify Results to File.
  • Added many more metadata fields to which Classify can copy recognized text.