Any Vision uses Google’s state-of-the-art Gemini and Cloud Vision AI to describe, tag, and classify your photos:
- Tag your photos with objects, activities, landmarks, logos, facial expressions, and dominant colors
- Generate captions, headlines, and the accessibility fields Alt Text and Extended Description
- Create your own classifications, such as the type of room and design style in real-estate photos or the clothing worn by riders in motorcycle photos
- Extract embedded text (OCR)
- Extract the numbers from athletes’ bibs and jerseys, motorcycles, and cars into fields or keywords
- Recognize and translate over 100 languages
- Search the tags and fields to make it much easier to find photos
- Find photos with similar visual content and similar colors
- Export the tags and text into comma-separated text files
Any Vision uses Google Gemini and Cloud Vision, the AI technology underlying Google image search. Though you’ll have to get a Google Cloud key as well as an Any Vision license, Google’s pricing lets you analyze for free up to 212,000 photos in the first three months and then up to 1,000 photos every month thereafter.
Consider similar services with different features and pricing.
Try it for free (limited to 50 photos). Buy a license at a price you name.
Examples
Here’s an example showing labels, landmarks, face expression, and recognized text; Cloud Vision has correctly located the photo within a few meters and extracted much of the visible text from the store signs and the granite plaque on the statue:
Here’s an example of logo detection, where the logos are partially obscured but still recognized:
And here’s an example showing correctly recognized jersey numbers:
Download and Install
Any Vision requires Lightroom 5.7 or later, Lightroom CC 2015, or Lightroom Classic. (The newer cloud-focused Lightroom doesn’t support plugins.)
- Download anyvision.1.18.zip. (What’s changed in this version)
- If you’re upgrading from a previous version of Any Vision, exit Lightroom, delete the existing anyvision.lrplugin folder, and replace it with the new one extracted from the downloaded .zip. Restart Lightroom and you’re done.
- If this is a new installation, extract the folder anyvision.lrplugin from the downloaded .zip and move it to a location of your choice.
- In Lightroom, do File > Plug-in Manager.
- Click Add, browse and select the anyvision.lrplugin folder, and click Select Folder (Windows) or Add Plug-in (Mac OS).
- Get a Google Cloud key.
The free trial is limited to analyzing 50 photos—after that, you’ll need to buy an Any Vision license. Note that your Google Cloud key will allow you to process tens of thousands of photos for free (see here for details).
Licensing
To use Any Vision after the free trial ends, you’ll need both an Any Vision plugin license and a Google Cloud key tied to a billing account you set up with Google. Cloud Vision costs little or nothing for most users—see Cloud Vision Pricing for details.
Buy a License
- Buy a license at a price you think is fair:
The license includes unlimited upgrades. Make sure you’re satisfied with the free trial before buying.
- Copy the license key from the confirmation page or confirmation email.
- Do Library > Plug-in Extras > Any Vision > Analyze.
- Click Buy.
- Paste the key into the License key box and click OK.
Get a Google Cloud Key
Setting up a Google billing account and getting a Cloud key is a little tedious but straightforward if you follow these steps exactly. I recommend that you print this or arrange two browser windows to be visible. (Unfortunately, Google doesn’t provide any way for an application like Any Vision to make this simpler.)
- In a browser, go to console.cloud.google.com.
- Create a Google account or sign in with an existing one (for example, your Gmail account).
- Agree to the Terms of Service.
- At the top, click “TRY FOR FREE”:
- Agree to the Google Cloud Platform Free Trial Terms of Service.
- Enter your billing information and click “START MY FREE TRIAL”.
- In “Welcome name!”, click “GOT IT”.
- Click the menu button in the upper left, hover over “APIs & Services”, and click on “Library”:
- In the “Search for APIs & services” box, type “cloud vision”, and then click on “Google Cloud Vision API”:
- Click “ENABLE”:
- Click the menu button in the upper left, hover over “APIs & Services”, and click on “Library”:
- In the “Search for APIs & services” box, type “cloud translation”, and then click on “Google Cloud Translation API”:
- Click “ENABLE”:
- Click the menu button in the upper left, hover over “APIs & Services”, and click on “Credentials”:
- Click “Create credentials”, then “API key”:
- Copy the API key by clicking the copy button:
- In Lightroom, do Library > Plug-in Extras > Any Vision > Analyze and click Google Key at the bottom:
- Paste the key into Google key and click OK:
Google Cloud Vision Pricing (as of February 2022)
Google charges your billing account monthly for each photo you analyze. But Google offers $300 free credit for new billing accounts (good for tens of thousands of photos), and the first 1000 photos analyzed each month are free. There are no upfront charges, and you can disable your account at any time.
Each of the seven features (Labels, Landmarks, Logos, Faces, Safety, Text, and Dominant Color) costs $1.50 / 1000 photos, except for Safety, which is free if you also select Labels. For example, selecting Labels and Landmarks costs $3.00 / 1000 photos, and selecting all seven features costs $9.00 / 1000 photos.
Translation of labels and other features to other languages costs $20 per million characters or about 100,000 distinct labels. (My main catalog has 30,000 images with 2500 distinct labels, and translating them to another language cost about $0.50.)
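The translation arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `translation_cost_usd` and the sample labels are hypothetical, the rate is the $20 per million characters quoted above, and it assumes each distinct phrase is billed once.

```python
def translation_cost_usd(phrases, price_per_million_chars=20.0):
    """Estimate the cost of translating each distinct phrase once."""
    # Duplicates are dropped, on the assumption that a phrase is billed only once.
    distinct = set(phrases)
    total_chars = sum(len(p) for p in distinct)
    return total_chars * price_per_million_chars / 1_000_000


# 2500 distinct made-up labels of 10 characters each (an assumed average length).
labels = [f"label{i:05d}" for i in range(2500)]
print(translation_cost_usd(labels))  # 0.5
```

At an assumed average of 10 characters per label, 2500 labels come to 25,000 characters, which matches the roughly $0.50 reported above.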
Though this may sound expensive for larger catalogs, Google provides incentives that lower the cost considerably. Most users will pay nothing or very little every month.
For each feature, the first 1000 photos are free each month. For example, if you select all seven features, you can analyze 1000 photos monthly for free.
Google also offers $300 free credit for creating your first billing account (the credit must be used within three months). If you select just one feature (e.g. Labels), that’s enough for 200,000 photos, or about 33,000 photos for all seven features (at $9.00 / 1000 photos).
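The pricing rules above can be sketched as a small Python estimator. It is a rough illustration, not Google’s billing logic: the function name and parameters are hypothetical, the prices are the February 2022 figures quoted above, and the $300 new-account credit is ignored.

```python
def monthly_cost_usd(photos, features, price_per_1000=1.50, free_per_month=1000):
    """Estimate one month's Cloud Vision charge, before any free credit."""
    billable = set(features)
    # Safety is free when Labels is also selected.
    if {"Safety", "Labels"} <= billable:
        billable.discard("Safety")
    # For each feature, the first 1000 photos each month are free.
    paid_photos = max(0, photos - free_per_month)
    return len(billable) * paid_photos * price_per_1000 / 1000


ALL_FEATURES = ["Labels", "Landmarks", "Logos", "Faces",
                "Safety", "Text", "Dominant Color"]
print(monthly_cost_usd(1000, ALL_FEATURES))             # 0.0
print(monthly_cost_usd(5000, ["Labels", "Landmarks"]))  # 12.0
```

The second call charges for 4000 photos beyond the free 1000, at $3.00 / 1000 photos for the two features.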
Using Any Vision
Select the photos to be analyzed and do Library > Plug-in Extras > Any Vision > Analyze. Select the features you want to tag and click OK:
Any Vision exports reduced-size versions of the photos, sends them to Google, and processes the results. Typically it takes about 4 seconds per photo (more if you have a slower Internet connection). See Advanced for how to reduce this to 1 second per photo, at the expense of making Lightroom less usable interactively while Analyze is running.
Once you’ve analyzed a photo, by default Any Vision won’t reanalyze it. So if you change the set of selected features, doing Analyze again won’t have any effect. See Advanced for how to force Any Vision to resend photos to Google to be reanalyzed (at additional cost).
You can see the results in the Metadata panel (in the right column of Library) with the Any Vision tagset:
Following each label and landmark is a numeric score, e.g. “mountain (85)”, indicating Google’s estimate of the likelihood of that label or landmark. Faces and safety terms have bucketed scores ranging from “very unlikely” to “very likely”.
Any Vision also assigns hierarchical keywords using this hierarchy:
For example, if the photo has the label “mountain”, then the keyword Any Vision > Labels > mountain is assigned to the photo. The Advanced tab lets you control whether keywords are added, specify the root keyword, and whether to put all keywords at the top level.
Features
Labels are objects, activities, and qualities, such as mountain, tabby cat, toddler, safari, road cycling, rock climbing, white. In my main catalog of nearly 30,000 photos, Cloud Vision recognized 2500 distinct labels.
Landmarks are specific locations of where the photo was taken or of objects in the photo. Examples include Paris, Eiffel Tower, Denali National Park, Salt Lake Tabernacle Organ, Squaw Valley Ski Resort, Kearsarge Pass. But landmarks aren’t necessarily famous or well-known—they can be obscure local landmarks, such as statues or waterfalls. Photos may be tagged with more than one landmark. In my catalog, Cloud Vision recognized 500 distinct locations.
Each landmark has a GPS location, and the arrow button to the right of the Map field will open Google Maps on the first (most likely) landmark location in the photo.
Logos are product or service logos, such as Coca-Cola, Office Depot, SpongeBob SquarePants, Tonka. In my catalog, Cloud Vision recognized 110 distinct logos.
Faces: Cloud Vision identifies the “sentiment”, or expression, of each recognized face: Joy, Sorrow, Anger, Surprise. It may also tag a face as Under Exposed, Blurred, or wearing Headwear.
Safety identifies whether the photo is “safe” for Google image search: Adult, Spoof, Medical, and Violence. In my main catalog of 30,000 photos, only 213 received one of these safety tags, and most of them weren’t very accurate. Exposed skin triggers “adult”, regardless of whether it’s a bikini-clad woman, a shirtless teen, or babies in diapers.
Text contains text recognized in photos using optical character recognition (OCR). Cloud Vision appears to do a reasonable job of recognizing text on signs, plaques, athletic jerseys, etc.
Dominant Color. Cloud Vision identifies the ten most “dominant” colors in a photo, using an undocumented algorithm. Use the Sort by Color command to see those colors for a photo and find other photos with similar colors.
Searching
You can search photos’ features using the Library Filter bar, smart collections, or the Keyword List. For example, to find photos assigned the label “mountain”, you could do:
Do Library > Enable Filters if the Library Filter bar isn’t showing, and then click Text.
To search just the Labels field, use this smart-collection criterion:
Alternatively, in the Keyword List panel, type “mountain” in the Filter Keywords box, then click the arrow to the far right of the “mountain” keyword:
Advanced
The Advanced tab provides more flexibility for using Any Vision:
Score Threshold: For each label, landmark, logo, etc. Cloud Vision assigns a score, an estimate of the probability it is correct. You can set a per-feature score threshold, and only those labels, landmarks, etc. that have at least that score will be assigned to the photo.
Labels, landmarks, and logos have scores from 0 to 100, though in practice, Cloud Vision returns only those with a score of at least 50. Faces and Safety values have bucketed scores ranging from “very unlikely” to “very likely”.
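The effect of a score threshold can be sketched in Python as a simple filter. This is an illustration of the rule described above, not the plugin’s code; the function name and the sample scores are made up.

```python
def apply_threshold(scored_items, threshold):
    """Keep (name, score) pairs whose score is at least the threshold."""
    return [(name, score) for name, score in scored_items if score >= threshold]


labels = [("mountain", 85), ("snow", 72), ("cloud", 55)]
print(apply_threshold(labels, 70))  # [('mountain', 85), ('snow', 72)]
```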
Assign Keywords: If checked, Any Vision assigns a keyword for each extracted feature. For example, if the photo has the label “mountain”, then the keyword Any Vision > Labels > mountain is assigned to the photo. You can enable or disable this on a per-feature basis.
Text (OCR) copy: Recognized text can be copied from the Text field in the Metadata panel to one of the standard IPTC fields Caption, Headline, Title, or Source. copy always copies the text, copy if empty copies the text only if the destination IPTC field is empty, append appends the text to the end of the destination IPTC field, and don’t copy never copies the text.
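The four copy modes above can be sketched as a small Python function. This is a hedged illustration of the described behavior rather than the plugin’s implementation: the function name is hypothetical, and joining appended text with a newline is an assumption, since the separator is not stated here.

```python
def copy_ocr_text(mode, ocr_text, destination):
    """Return the new value of the destination IPTC field for a copy mode."""
    if mode == "copy":
        return ocr_text
    if mode == "copy if empty":
        return destination if destination else ocr_text
    if mode == "append":
        # Assumption: appended text is separated from existing text by a newline.
        return f"{destination}\n{ocr_text}" if destination else ocr_text
    if mode == "don't copy":
        return destination
    raise ValueError(f"unknown mode: {mode!r}")


print(copy_ocr_text("copy if empty", "EXIT 12", "My caption"))  # My caption
```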
Text (OCR) pattern replacement: Recognized text can be transformed using patterns whose syntax is documented here. For example, to extract just numbers from the recognized text, placing one number per line:
Replace pattern: [0-9]+ with: %0 separator: \n
To extract the first number only:
Replace pattern: ^.-([0-9]+).*$ with: %1
Also see Recognizing Numbers on Race Bibs.
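To preview what these two replacements do to a piece of recognized text, here is a rough Python equivalent. Python’s `re` module uses a different pattern syntax from the one the plugin accepts, so treat this as an approximation of the intended result only; the function names and the sample text are made up.

```python
import re


def extract_numbers(text, separator="\n"):
    """Join every run of digits in the text, one per separator."""
    return separator.join(re.findall(r"[0-9]+", text))


def first_number(text):
    """Return the first run of digits in the text, or '' if there is none."""
    match = re.search(r"[0-9]+", text)
    return match.group(0) if match else ""


ocr = "FINISH 2019\nRider 42 - Team 7"
print(extract_numbers(ocr).split("\n"))  # ['2019', '42', '7']
print(first_number(ocr))                 # 2019
```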
Text OCR model: Specifies the Google text-recognition algorithm. Documents is optimized for documents dense with text, while Photos is optimized for images with small amounts of text (e.g. on signs). My experience is that Documents performs best not only on documents but also on photos, but Google is constantly changing both algorithms.
Include scores in fields: If checked, then the score will be included with each extracted feature, e.g. “mountain (83)” or “Eiffel Tower (94)”.
Set “Include on Export” attribute of keywords: If checked, then each new keyword created by Any Vision will have the Include on Export attribute set, allowing the keyword to be included in the metadata of exported photos.
If you change this setting and want the change to be applied retroactively to all Any Vision keywords: Delete the root Any Vision keyword in the Keyword List panel. Select all the photos you’ve previously analyzed. Do Analyze, click Advanced, and select the option Reassign metadata fields. This will recreate all the keywords with the new setting, without actually sending the photos to Cloud Vision for reanalysis.
Create keywords under root keyword: This is the top-level root keyword of all the keywords added by Any Vision; it defaults to “Any Vision”. Uncheck this option to put all keywords at the top level.
Use subgroups (A, B, C, …) for Labels, Landmarks, and Logos keywords: When checked, Any Vision will create subkeywords A, B, C, … under the parent keywords Any Vision > Labels, Any Vision > Landmarks, and Any Vision > Logos:
For example, the keyword for the label “mountain” would be placed under Any Vision > Labels > M.
This works around a longstanding (and shameful) Lightroom bug on Windows where it chokes if it tries to display more than about 1500 keywords at once.
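The keyword placement described by the root-keyword and subgroup options can be sketched in Python as follows. The function name and parameters are hypothetical. Taking the subgroup from the upper-cased first character is a simplification, and treating a missing root as “keyword at the top level” follows the description above.

```python
def keyword_path(feature, name, root="Any Vision", use_subgroups=False):
    """Return the hierarchical keyword path for one extracted feature value."""
    if root is None:
        # No root keyword: the keyword itself sits at the top level.
        return name
    parts = [root, feature]
    # Subgroups apply only to Labels, Landmarks, and Logos.
    if use_subgroups and feature in ("Labels", "Landmarks", "Logos") and name:
        parts.append(name[0].upper())
    parts.append(name)
    return " > ".join(parts)


print(keyword_path("Labels", "mountain"))
# Any Vision > Labels > mountain
print(keyword_path("Labels", "mountain", use_subgroups=True))
# Any Vision > Labels > M > mountain
```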
Copy landmark location to GPS field: Each landmark assigned to a photo by Cloud Vision has an associated latitude/longitude, displayed in the Location field in the Metadata panel. Selecting Always or When GPS field is empty copies the latitude/longitude of the first landmark (the one with the highest score) to the EXIF GPS field. Once the GPS field is set, the photo will appear on the map in the Map module, and Lightroom will do address lookup to automatically set the photo’s Sublocation, City, State / Province, and Country.
Previously analyzed photos: This option tells Any Vision how to handle selected photos that have been previously analyzed:
Skip ignores such photos.
Reanalyze by sending to Google sends the photos to Cloud Vision for reanalysis (and additional cost)—you must choose this if you’ve added a feature to be analyzed.
Reassign metadata fields reassigns the Any Vision metadata fields and keywords using the previous analysis but the current options. This is useful if you’ve changed any of the options that control how the Any Vision metadata fields are set, such as Assign Keywords or Include scores in fields. This option doesn’t incur additional costs for previously analyzed photos.
Concurrently processed photos: This is the number of photos that will be processed in parallel by Any Vision and Cloud Vision. The default value of 1 will have the least impact on interactive use of Lightroom (though it could still be a little jerky). The maximum value of 8 processes photos about 4 times faster, though interactive use will likely be very jerky. (In my testing, larger values didn’t provide any more speedup.)
Translation
By default, labels and other features are returned by Google in English, but the Translation tab lets you translate them to another language. You can specify which features should be translated; features not selected will remain in English.
Any Vision uses Google Cloud Translation, the same technology behind Google Translate. More than 100 languages are supported. Translation costs extra, but it is quite inexpensive. Any Vision remembers previous translations, so you only pay once for each distinct word or phrase.
You can override the translations of specific words and phrases with an overrides dictionary. Click Edit Overrides to open Finder or File Explorer on the dictionary for the current language (e.g. de.csv for German). The dictionary is in UTF-8 CSV format (comma-separated values), and after the header each following line contains a pair of phrases:
word or phrase in English, word or phrase in target language
The words and phrases are case-sensitive. Make sure you save the file in UTF-8 format:
Excel: Do File > Save As, File Format: CSV UTF-8.
TextEdit (Mac): After opening the file, change it to plain text via Format > Make Plain Text. It will save in UTF-8 format.
Notepad (Windows): Do File > Save As, Encoding: UTF-8.
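If you would rather edit the overrides dictionary with a script, the following Python sketch reads and rewrites a two-column UTF-8 CSV while keeping whatever header row the file already has. The function names are hypothetical, and the layout (one header row, then English phrase and translation) is taken from the description above, so check it against your own file first.

```python
import csv


def load_overrides(path):
    """Return (header_row, {english: translation}) from an overrides CSV."""
    # utf-8-sig also accepts files saved with a byte-order mark (e.g. by Excel).
    with open(path, newline="", encoding="utf-8-sig") as f:
        rows = list(csv.reader(f))
    header, body = rows[0], rows[1:]
    return header, {row[0]: row[1] for row in body if len(row) >= 2}


def save_overrides(path, header, overrides):
    """Write the header row and the phrase pairs back as UTF-8 CSV."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(overrides.items())
```

A typical use would be to call `load_overrides` on the file opened by Edit Overrides, add or change entries in the returned dictionary, and pass the same header back to `save_overrides`.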
Sort by Color
When the Dominant Color feature is selected, Cloud Vision finds the ten most “dominant” colors in a photo, using an undocumented algorithm. You can see those colors by selecting an analyzed photo and doing Library > Plug-in Extras > Sort by Color:
To find other photos containing a similar dominant color, select all the photos you want to search and invoke Sort by Color. Choose one of the dominant colors of the most-selected photo, or use the color picker at the bottom left to choose another color, and click OK. The current source is changed to the collection Any Vision: Sorted by Color, containing those photos with the most similar dominant colors. Do View > Sort > Custom Order to sort the collection by similarity.
Here are the photos from my main catalog of 30,000 photos with dominant colors closest to the orange-brown chosen in the example photo above:
As another example, here are photos from my catalog labeled “sunset” by Cloud Vision, sorted by similarity to the yellow-orange from the first photo:
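Neither Google’s dominant-color algorithm nor the way Sort by Color measures similarity is documented here. As a purely illustrative sketch of the general idea, the Python below ranks photos by the straight-line RGB distance between a chosen color and each photo’s closest dominant color. The distance measure, function names, and sample colors are all assumptions.

```python
import math


def color_distance(a, b):
    """Euclidean distance between two (R, G, B) tuples (an assumed measure)."""
    return math.dist(a, b)


def sort_by_color(photo_colors, target):
    """Return photo names ordered by their closest dominant color to target."""
    def closest(name):
        return min(color_distance(c, target) for c in photo_colors[name])
    return sorted(photo_colors, key=closest)


photo_colors = {
    "forest.jpg": [(30, 110, 40)],
    "sand.jpg": [(200, 160, 110)],
    "sunset.jpg": [(230, 120, 40), (20, 20, 60)],
}
print(sort_by_color(photo_colors, (220, 130, 50)))
# ['sunset.jpg', 'sand.jpg', 'forest.jpg']
```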
Find with Similar Labels
Find with Similar Labels finds photos with similar visual content by comparing the labels assigned by Analyze. This can help find duplicates and near duplicates even if they’re missing metadata, are in different formats (e.g. raw and JPEG), or have been edited or cropped. For example, in a catalog of 32,000 photos, Find discovered these two photos taken three years apart:
Each group of similar photos will be placed in a separate collection in the collection set Any Vision: Similar Labels.
To find near duplicates, set Similarity initially to 95. If that finds too many photos that aren’t near duplicates, try 98 or 99.
If you set Similarity to less than 95, Find can go much slower when run on tens of thousands of photos. You can speed it up considerably by reducing the Include slider from 100%, causing Find to go much faster at the cost of not finding as many similar photos. (The percentage of how many similar photos may be found is a crude estimate.)
Find is only as good as the labels assigned by Cloud Vision. It often does an amazing job at finding near duplicates, but sometimes it’s hilariously bad.
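How Find scores similarity is not described here. As a hedged sketch of what comparing labels could look like, the Python below uses the Jaccard overlap of two label sets, scaled to 0–100, and lists the pairs of photos at or above a chosen value. The metric, function names, and sample data are assumptions, not the plugin’s method.

```python
from itertools import combinations


def label_similarity(labels_a, labels_b):
    """Jaccard overlap of two label collections, scaled to 0-100."""
    a, b = set(labels_a), set(labels_b)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)


def similar_pairs(photo_labels, min_similarity=95):
    """Return pairs of photo names whose labels are at least that similar."""
    return [
        (p, q)
        for p, q in combinations(sorted(photo_labels), 2)
        if label_similarity(photo_labels[p], photo_labels[q]) >= min_similarity
    ]


photo_labels = {
    "a.dng": ["sky", "snow", "mountain"],
    "a.jpg": ["mountain", "snow", "sky"],
    "b.jpg": ["tabby cat", "sofa"],
}
print(similar_pairs(photo_labels))  # [('a.dng', 'a.jpg')]
```

In this sketch every pair of photos is compared, so the work grows quadratically with the number of photos selected.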
Export to File
To export the analyzed metadata fields to a comma-separated (CSV) text file, one row per photo, select one or more analyzed photos and do Library > Plug-in Extras > Export to File. Open the file in Excel or another spreadsheet program.
Remove Fields
The Remove Fields command removes from the selected photos all custom metadata fields added by Any Vision (but not any keywords). If you run this command on a large number of photos, you may wish to do File > Optimize Catalog afterward to shrink the size of the catalog by a modest amount.
Recognizing Numbers on Race Bibs
Cloud Vision does a decent job at recognizing the numbers on athletes’ race bibs, and Any Vision can make it easy to extract those numbers into metadata fields and keywords.
In the Advanced tab, check the option Text (OCR), with these options:
Replace pattern: [0-9]+ with: %0 separator: \n
This will extract just the numbers from all the recognized text in the photos, one per line.
To search for a bib number, do the menu command Library > Find to open the text search in the Library Filter bar and use the criterion Any Searchable Plug-in Field contains number. Or create a smart collection with the criterion Searchable Text contains words number.
To create keywords for the bib numbers, check the option Assign keywords from text separated by , ; newline. If you’ve checked the option Create keywords under root keyword, these bib-number keywords will appear under the parent keyword Any Vision > Text; otherwise, they’ll appear at the top level.
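The splitting step can be pictured with this short Python sketch, which breaks text on commas, semicolons, and newlines. Trimming whitespace and dropping duplicates are assumptions about reasonable behavior, and the function name is made up.

```python
import re


def keywords_from_text(text):
    """Split text on , ; and newline into a list of distinct keywords."""
    keywords = []
    for part in re.split(r"[,;\n]", text):
        part = part.strip()
        # Skip empty pieces and repeats (assumed behavior).
        if part and part not in keywords:
            keywords.append(part)
    return keywords


print(keywords_from_text("123\n45; 678,45"))  # ['123', '45', '678']
```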
To copy the bib numbers to the Caption field, set the option:
Text (OCR) copy to Caption
You can instead copy to the Title, Headline, or Source fields.
Similar Services
You may wish to consider these alternative products:
Excire is a Lightroom plugin with capabilities similar to Cloud Vision. But it doesn’t use the cloud and it has a higher one-time upfront cost with no recurring charges.
LrTag is similar to Excire, but it has a monthly charge.
Wordroom is intended for careful keywording of individual stock photos. It’s currently in beta and pricing hasn’t been announced.
MyKeyworder is intended as an aid to careful keywording of smaller numbers of photos rather than searching large catalogs. It has recurring charges.
Sport Vision AI recognizes numbers on athletes’ race bibs, race cars, and motorcycles. It has per-photo charges from $2 to $5 per 1000 photos.
Lightroom Cloud (but not Lightroom Classic) uses similar technology to search your photos, but it doesn’t display or assign the keyword terms it’s inferred. Some people sync their Lightroom Classic with the Lightroom Cloud and search for photos there.
Keyboard Shortcuts
Windows: You can use the standard menu keystrokes to invoke Any Vision > Analyze. ALT+L opens the Library menu, U selects the Plug-in Extras submenu, and A invokes the Any Vision > Analyze command.
To reassign a different final keystroke to the Analyze command, edit the file Info.lua in the plugin folder. Move the & in front of the desired letter in the menu command’s name, changing the name itself if necessary.
To assign a single keystroke as the shortcut, download and install the free, widely used AutoHotkey. Then, in the File Explorer, navigate to the plugin folder anyvision.lrplugin. Double-click Install-Keyboard-Shortcut.bat and restart your computer. This defines the shortcut Alt+A to invoke the Analyze command. To change the shortcut, edit the file Any-Vision-Keyboard-Shortcut.ahk in Notepad and follow the instructions in that file.
Mac OS: You can use the standard mechanism for assigning application shortcuts to plugin menu commands. In System Settings > Keyboard > Keyboard Shortcuts > Application Shortcuts, select Adobe Lightroom. Click “+” to add a new shortcut. In Menu Title, type “Analyze” (case matters) preceded by three spaces (“<space><space><space>Analyze”). In Keyboard Shortcut, type the desired key or key combination.
Support
Please send problems, bugs, suggestions, and feedback to ellis-lightroom@johnrellis.com.
I’ll gladly provide free licenses in exchange for reports of new, reproducible bugs.
Known limitations and issues:
- Any Vision requires Lightroom 5.7 or later, Lightroom CC 2015, or Lightroom Classic—it relies on features missing from earlier versions.
- As of 3/21/23, you may see the error “Google error (13): Internal server error. Unexpected feature response”. This bug has been acknowledged by Google but not yet fixed. On average, it seems to occur in about one of every couple hundred photos. You can usually avoid the bug by unchecking the Landmarks feature. On 8/22/23, Google Cloud Support said that Google engineering is still working on the issue but there is no ETA.
- If the Any Vision window is too large for your display, with the buttons at the bottom cut off, do File > Plug-in Manager, select Any Vision, and in the Settings panel, check the option Use Any Vision on small displays. Unfortunately, a bug in Lightroom prevents plugins from determining the correct size of the display.
- If you’re concerned about accidentally spending too much on Google Cloud Vision or someone stealing your Google license key, you can create a billing alert that will notify you when monthly spending exceeds a specified amount.
- Unlike many photo-sharing services, with Google Cloud Vision your photos remain exclusively your photos. Google promises not to process your photos for any other purpose. See their Data Processing and Security Terms.
- Cloud Vision infrequently returns errors such as “The request timed out” and “Image processing error!”. If this occurs, just rerun Analyze. I don’t know why these errors occur—they appear spurious.
- When your keyword list has multiple keywords of the same name, Lightroom distinguishes them in the Keyword List panel using the notation child-keyword < parent-keyword. For example, if you have an existing keyword “House” and Any Vision adds the label keyword “House”, the latter will be displayed as House < Labels to distinguish it from the top-level House. But these two keywords will get exported in photo metadata simply as “House”.
- If you upgrade from versions 1.2 or 1.3 and get the error, “Problem getting supported languages. The Google Cloud Translation API…”, you’ll need to enable the Translation API. In your browser, go to console.cloud.google.com, log in if necessary, and follow steps 12–14 of Get a Google Cloud Key.
Version History
1.2
- Initial release.
1.3
- The Use subgroups for keywords option now correctly handles keywords starting with non-English characters.
- Recognized text can optionally be copied to an IPTC field.
1.4
- Better handling of errors reported by Google, such as an expired credit card.
- Pattern replacement for transforming recognized text, e.g. to extract jersey numbers.
- Translation of features to other languages. See Support for how to enable the Google Cloud Translation API.
1.5
- You can now specify a root keyword other than “Any Vision” on the Advanced tab.
- Fixed a bug preventing recognized text in other languages being translated to English.
- Now ignores missing photos instead of giving a cryptic informational message.
- Works around obscure export bug in Lightroom 5.
1.6
- The progress bar updates more incrementally with very large selections.
1.7
- Dense multi-column text is handled better.
- Analyze is a little faster.
1.8
- Remove Fields command removes Any Vision custom metadata from catalog.
- Better handling of spurious errors from Google Cloud Vision.
- Increased maximum value of Concurrently processed photos to 8.
1.9
- Advanced option OCR model for choosing which Google text-recognition algorithm to use.
1.10
- Changed the default of the advanced option OCR model to Photos, which experiments confirm performs better than Documents when analyzing photos containing signs, t-shirts, etc.
1.11
- Find with Similar Labels finds photos with similar visual content by comparing their labels.
1.12
- Worked around Lightroom bug creating keywords for labels.
- Provided option for running Any Vision on small displays, necessitated by Lightroom 10’s change of fonts.
1.13
- Create top-level keywords by unchecking the Advanced option Create keywords under root keyword.
- Create keywords from recognized text (OCR) with the Advanced option Assign keywords from text.
- Added guide to recognizing numbers on athletes’ race bibs.
- Fixed Export to File to export recognized text after applying any pattern replacements, not the original text.
- Silently skips over missing photos rather than giving an obscure error.
1.14 2023-03-21
- Correctly handles “Internal server error” responses from Google Cloud Vision.
1.15 2023-11-23
- Help button works again.
- Better logging of unexpected responses from Google.
1.16 2024-05-16
- Better error messages when Lightroom fails to export a photo being analyzed.
- Shows how many selected files will be skipped because they’re missing or videos.
- The temporary folder containing the exported reduced-resolution images sent to Google is once again deleted correctly.
- Correctly handles when the translation-cache file gets deleted from outside Lightroom.
1.17 2024-08-25
- Smart previews can once again be analyzed.
- Minor internal code improvements.
1.18 2024-09-05
- When you install the plugin, you must now get a Google Cloud key to use the free trial of 50 photos. The Cloud key will let you analyze tens of thousands of photos for free (after buying an Any Vision plugin license). Previous versions provided a free-trial Google Cloud key, but hackers abused that (costing me money).