Any Vision uses Google AI to automatically tag your photos with objects, activities, landmarks, logos, facial expressions, and dominant colors, and to extract embedded text (OCR). You can search these tags in Lightroom, making it much easier to find photos in large catalogs. You can export the tags in photo metadata as keywords and GPS locations or in comma-separated text files. And you can translate the tags into more than 100 languages.
Find photos with similar visual content; for example, find duplicates and near duplicates even if they’re missing metadata, are in different formats, or have been cropped or edited.
Any Vision uses Google Cloud Vision, the state-of-the-art machine-learning AI technology underlying Google image search. Though you’ll need a Google Cloud key as well as an Any Vision license, Google’s pricing lets you analyze up to 212,000 photos for free in the first 12 months and then up to 1,000 photos every month thereafter.
Consider similar services with different features and pricing.
Here’s an example showing labels, landmarks, face expression, and recognized text; Cloud Vision has correctly located the photo within a few meters and extracted much of the visible text from the store signs and the granite plaque on the statue:
Here’s an example of logo detection, where the logos are partially obscured but still recognized:
And here’s an example showing correctly recognized jersey numbers:
Download and Install
Any Vision requires Lightroom 5.7 or later, Lightroom CC 2015, or Lightroom Classic. (The newer cloud-focused Lightroom doesn’t support plugins.)
- Download anyvision.1.12.zip. (What’s changed in this version)
- If you’re upgrading from a previous version of Any Vision, exit Lightroom and replace the existing anyvision.lrplugin folder with the new one extracted from the downloaded .zip. Restart Lightroom and you’re done.
- If this is a new installation, extract the folder anyvision.lrplugin from the downloaded .zip and move it to a location of your choice.
- In Lightroom, do File > Plug-in Manager.
- Click Add, browse and select the anyvision.lrplugin folder, and click Select Folder (Windows) or Add Plug-in (Mac OS).
The free trial is limited to analyzing 50 photos. To use Any Vision after the free trial ends, you’ll need both an Any Vision plugin license and a Google Cloud key tied to a billing account you set up with Google. Cloud Vision costs little or nothing for most users; see Cloud Vision Pricing for details.
Buy a License
- Buy a license at a price you think is fair:
The license includes unlimited upgrades. Make sure you’re satisfied with the free trial before buying.
- Copy the license key from the confirmation page or confirmation email.
- Do Library > Plug-in Extras > Any Vision > Analyze.
- Click Buy.
- Paste the key into the License key box and click OK.
Get a Google Cloud Key
Setting up a Google billing account and getting a Cloud key is a little tedious but straightforward if you follow these steps exactly. I recommend that you print this page or arrange two browser windows so that both are visible. (Unfortunately, Google doesn’t provide any way for an application like Any Vision to make this simpler.)
- In a browser, go to console.cloud.google.com.
- Create a Google account or sign in with an existing one (for example, your Gmail account).
- Agree to the Terms of Service.
- At the top, click “TRY FOR FREE”:
- Agree to the Google Cloud Platform Free Trial Terms of Service.
- Enter your billing information and click “START MY FREE TRIAL”.
- In “Welcome name!”, click “GOT IT”.
- Click the menu button in the upper left, hover over “APIs & Services”, and click on “Library”:
- In the “Search for APIs & services” box, type “cloud vision”, and then click on “Google Cloud Vision API”:
- Click “ENABLE”:
- Click the menu button in the upper left, hover over “APIs & Services”, and click on “Library”:
- In the “Search for APIs & services” box, type “cloud translation”, and then click on “Google Cloud Translation API”:
- Click “ENABLE”:
- Click the menu button in the upper left, hover over “APIs & Services”, and click on “Credentials”:
- Click “Create credentials”, then “API key”:
- Copy the API key by clicking the copy button:
- In Lightroom, do Library > Plug-in Extras > Any Vision > Analyze and click Google Key at the bottom:
- Paste the key into Google key and click OK:
Google Cloud Vision Pricing (as of March 2018)
Google charges your billing account monthly for each photo you analyze. There are no upfront charges, and you can disable your account at any time.
Each of the seven features (Labels, Landmarks, Logos, Faces, Safety, Text, and Dominant Color) costs $1.50 / 1000 photos, except for Safety, which is free if you also select Labels. For example, selecting Labels and Landmarks costs $3.00 / 1000 photos, and selecting all seven features costs $9.00 / 1000 photos.
Translation of labels and other features to other languages costs $20 per million characters, which covers about 100,000 distinct labels. (My main catalog has 30,000 images with 2500 distinct labels, and translating them to another language cost about $0.50.)
Though this may sound expensive for larger catalogs, Google provides incentives that lower the cost considerably. Most users will pay nothing or very little every month.
For each feature, the first 1000 photos are free each month. For example, if you select all seven features, you can analyze 1000 photos monthly for free.
Google also offers $300 free credit for creating your first billing account (the credit must be used within 12 months). If you select just one feature (e.g. Labels), that’s enough for 200,000 photos; at $9.00 / 1000 photos for all seven features, it’s enough for about 33,000 photos.
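The pricing rules above can be turned into a small estimator. This is only a sketch based on the March 2018 figures quoted in this section ($1.50 per feature per 1000 photos, Safety free alongside Labels, 1000 free photos per feature per month, $20 per million translated characters); the function names are made up for illustration, and Google’s current prices may differ.

```python
# Figures quoted above (March 2018); treat them as a snapshot, not current prices.
PRICE_PER_1000 = 1.50   # USD per feature per 1000 photos
FREE_PER_MONTH = 1000   # free photos per feature each month
FEATURES = {"Labels", "Landmarks", "Logos", "Faces", "Safety", "Text", "Dominant Color"}


def billable_features(selected):
    """Return the selected features that are charged (Safety is free with Labels)."""
    selected = set(selected)
    unknown = selected - FEATURES
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    if "Labels" in selected:
        selected.discard("Safety")
    return selected


def monthly_cost(selected, photos):
    """Estimated USD charge for analyzing `photos` photos in one month."""
    paid_photos = max(0, photos - FREE_PER_MONTH)
    return len(billable_features(selected)) * paid_photos * PRICE_PER_1000 / 1000


def photos_covered_by_credit(selected, credit=300.0):
    """Photos a one-time credit pays for, ignoring the monthly free allowance."""
    count = len(billable_features(selected))
    return int(credit * 1000 / (count * PRICE_PER_1000)) if count else None


def translation_cost(characters):
    """Estimated USD charge for translating `characters` characters."""
    return characters * 20 / 1_000_000


print(monthly_cost({"Labels", "Landmarks"}, 2000))  # second thousand photos, two features
print(photos_covered_by_credit({"Labels"}))
print(translation_cost(2500 * 10))                   # 2500 labels at roughly 10 characters each
```

The estimate of roughly 10 characters per label is inferred from the figures above (a million characters for about 100,000 labels), not measured.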
Using Any Vision
Select the photos to be analyzed and do Library > Plug-in Extras > Any Vision > Analyze. Select the features you want to tag and click OK:
Any Vision exports reduced-size versions of the photos, sends them to Google, and processes the results. Typically it takes about 4 seconds per photo (more if you have a slower Internet connection). See Advanced for how to reduce this to 1 second per photo, at the expense of making Lightroom less usable interactively while Analyze is running.
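For readers curious what a Cloud Vision request looks like, here is a minimal sketch that builds the JSON body for Google’s public REST method `images:annotate`. It is not taken from Any Vision itself: the mapping from Any Vision’s feature names to Cloud Vision feature types is an assumption, `build_annotate_request` is a made-up helper, and nothing is actually sent.

```python
import base64
import json

# Assumed mapping from the feature names used on this page to Cloud Vision
# feature types; Any Vision's real mapping is not documented here.
FEATURE_TYPES = {
    "Labels": "LABEL_DETECTION",
    "Landmarks": "LANDMARK_DETECTION",
    "Logos": "LOGO_DETECTION",
    "Faces": "FACE_DETECTION",
    "Safety": "SAFE_SEARCH_DETECTION",
    "Text": "TEXT_DETECTION",
    "Dominant Color": "IMAGE_PROPERTIES",
}


def build_annotate_request(image_bytes, features):
    """Build the request body for one image and the chosen features."""
    return {
        "requests": [
            {
                # The image is embedded as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": FEATURE_TYPES[name]} for name in features],
            }
        ]
    }


# Placeholder bytes stand in for a reduced-size JPEG export.
body = build_annotate_request(b"<jpeg bytes>", ["Labels", "Landmarks"])
print(json.dumps(body, indent=2))
```

A body like this would be sent with an HTTP POST to `https://vision.googleapis.com/v1/images:annotate`, with the API key supplied in the `key` query parameter.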
Once you’ve analyzed a photo, by default Any Vision won’t reanalyze it. So if you change the set of selected features, doing Analyze again won’t have any effect. See Advanced for how to force Any Vision to resend photos to Google to be reanalyzed (at additional cost).
You can see the results in the Metadata panel (in the right column of Library) with the Any Vision tagset:
Following each label and landmark is a numeric score, e.g. “mountain (85)”, indicating Google’s estimate of the likelihood of that label or landmark. Faces and safety terms have bucketed scores ranging from “very unlikely” to “very likely”.
Any Vision also assigns hierarchical keywords using this hierarchy:
For example, if the photo has the label “mountain”, then the keyword Any Vision > Labels > mountain is assigned to the photo.
Labels are objects, activities, and qualities, such as mountain, tabby cat, toddler, safari, road cycling, rock climbing, white. In my main catalog of nearly 30,000 photos, Cloud Vision recognized 2500 distinct labels.
Landmarks are specific locations, either where the photo was taken or of objects in the photo. Examples include Paris, Eiffel Tower, Denali National Park, Salt Lake Tabernacle Organ, Squaw Valley Ski Resort, Kearsarge Pass. But landmarks aren’t necessarily famous or well known; they can be obscure local landmarks, such as statues or waterfalls. Photos may be tagged with more than one landmark. In my catalog, Cloud Vision recognized 500 distinct locations.
Each landmark has a GPS location, and the arrow button to the right of the Map field will open Google Maps on the first (most likely) landmark location in the photo.
Logos are product or service logos, such as Coca-Cola, Office Depot, SpongeBob SquarePants, Tonka. In my catalog, Cloud Vision recognized 110 distinct logos.
Faces: Cloud Vision identifies the “sentiment”, or expression, of each recognized face: Joy, Sorrow, Anger, Surprise. It may also tag a face as Under Exposed, Blurred, or wearing Headwear.
Safety identifies whether the photo is “safe” for Google image search: Adult, Spoof, Medical, and Violence. In my main catalog of 30,000 photos, only 213 received one of these safety tags, and most of them weren’t very accurate. Exposed skin triggers “adult”, regardless of whether it’s a bikini-clad woman, a shirtless teen, or babies in diapers.
Text contains text recognized in photos using optical character recognition (OCR). Cloud Vision appears to do a reasonable job of recognizing text on signs, plaques, athletic jerseys, etc.
Dominant Color. Cloud Vision identifies the ten most “dominant” colors in a photo, using an undocumented algorithm. Use the Sort by Color command to see those colors for a photo and find other photos with similar colors.
You can search photos’ features using the Library Filter bar, smart collections, or the Keyword List. For example, to find photos assigned the label “mountain”, you could do:
Do Library > Enable Filters if the Library Filter bar isn’t showing, and then click Text.
To search just the Labels field, use this smart-collection criterion:
Alternatively, in the Keyword List panel, type “mountain” in the Filter Keywords box, then click the arrow to the far right of the “mountain” keyword:
Advanced
The Advanced tab provides more flexibility for using Any Vision:
Score Threshold: For each label, landmark, logo, etc., Cloud Vision assigns a score, an estimate of the probability that it is correct. You can set a per-feature score threshold, and only those labels, landmarks, etc. that have at least that score will be assigned to the photo.
Labels, landmarks, and logos have scores from 0 to 100, though in practice, Cloud Vision returns only those with a score of at least 50. Faces and Safety values have bucketed scores ranging from “very unlikely” to “very likely”.
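The effect of a score threshold can be sketched as a simple filter. The bucket names below follow Cloud Vision’s likelihood scale; how Any Vision stores and compares them internally is an assumption, and both function names are hypothetical.

```python
# Likelihood buckets from least to most likely. The text above names only the
# two endpoints; the intermediate names follow Cloud Vision's likelihood scale.
BUCKETS = ["very unlikely", "unlikely", "possible", "likely", "very likely"]


def keep_numeric(items, threshold):
    """Keep (name, score) pairs whose 0-100 score is at least `threshold`."""
    return [(name, score) for name, score in items if score >= threshold]


def keep_bucketed(items, threshold):
    """Keep (name, bucket) pairs at least as likely as the `threshold` bucket."""
    floor = BUCKETS.index(threshold)
    return [(name, bucket) for name, bucket in items if BUCKETS.index(bucket) >= floor]


labels = [("mountain", 85), ("sky", 72), ("fog", 55)]
faces = [("Joy", "very likely"), ("Surprise", "possible"), ("Anger", "very unlikely")]

print(keep_numeric(labels, 70))
print(keep_bucketed(faces, "likely"))
```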
Assign Keywords: If checked, Any Vision assigns a keyword for each extracted feature. For example, if the photo has the label “mountain”, then the keyword Any Vision > Labels > mountain is assigned to the photo. You can enable or disable this on a per-feature basis.
Text (OCR) copy: Recognized text can be copied from the Text field in the Metadata panel to one of the standard IPTC fields Caption, Headline, Title, or Source. The choices are: “copy” always copies the text; “copy if empty” copies the text only if the destination IPTC field is empty; “append” appends the text to the end of the destination IPTC field; and “don’t copy” never copies the text.
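The four copy choices amount to a small decision table, sketched below. The mode identifiers are hypothetical stand-ins for the menu choices, and the newline used when appending is an assumption, since the separator Any Vision uses is not stated here.

```python
def copy_ocr_text(mode, recognized, current):
    """Return the new value of the destination IPTC field.

    mode       -- "copy", "copy_if_empty", "append", or "dont_copy"
    recognized -- text recognized by OCR
    current    -- existing contents of the destination field
    """
    if mode == "copy":
        return recognized
    if mode == "copy_if_empty":
        return current if current else recognized
    if mode == "append":
        # Assumed separator: a newline between the old and new text.
        return f"{current}\n{recognized}" if current else recognized
    if mode == "dont_copy":
        return current
    raise ValueError(f"unknown mode: {mode}")


print(copy_ocr_text("copy_if_empty", "MAIN ST", "Street corner"))
```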
Text (OCR) pattern replacement: Recognized text can be transformed using patterns whose syntax is documented here. For example, to extract just numbers from the recognized text, placing one number per line:
Replace pattern: [0-9]+ with: %0 separator: \n
To extract the first number only:
Replace pattern: ^.-([0-9]+).*$ with: %1
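To experiment with the two transformations outside Lightroom, here is an approximate analogue using Python’s `re` module. Any Vision’s pattern syntax is not Python’s (the `.-` and `%1` notation is rendered here as `.*?` and `\1`), and the exact way the plugin joins matches with the separator is an assumption, so treat this as a way to test the idea rather than as the plugin’s behavior.

```python
import re


def extract_numbers(text, separator="\n"):
    """Collect every run of digits and join the runs with `separator`."""
    return separator.join(re.findall(r"[0-9]+", text))


def first_number(text):
    """Replace the whole text with its first run of digits.

    If the text contains no digits, it is returned unchanged.
    """
    return re.sub(r"^.*?([0-9]+).*$", r"\1", text, count=1, flags=re.DOTALL)


sample = "SMITH 23 passes to JONES 7"
print(extract_numbers(sample))
print(first_number(sample))
```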
Text OCR model: Specifies the Google text-recognition algorithm. Documents is optimized for documents dense with text, while Photos is optimized for images with small amounts of text (e.g. on signs). My experience is that Documents performs best not only on documents but also on photos, but Google is constantly changing both algorithms.
Include scores in fields: If checked, then the score will be included with each extracted feature, e.g. “mountain (83)” or “Eiffel Tower (94)”.
Set “Include on Export” attribute of keywords: If checked, then each new keyword created by Any Vision will have the Include on Export attribute set, allowing the keyword to be included in the metadata of exported photos.
If you change this setting and want the change to be applied retroactively to all Any Vision keywords: Delete the root Any Vision keyword in the Keyword List panel. Select all the photos you’ve previously analyzed. Do Analyze, click Advanced, and select the option Reassign metadata fields. This will recreate all the keywords with the new setting, without actually sending the photos to Cloud Vision for reanalysis.
Use subgroups (A, B, C, …) for Labels, Landmarks, and Logos keywords: When checked, Any Vision will create subkeywords A, B, C, … under the parent keywords Any Vision > Labels, Any Vision > Landmarks, and Any Vision > Logos:
For example, the keyword for the label “mountain” would be placed under Any Vision > Labels > M.
This works around a longstanding (and shameful) Lightroom bug on Windows where it chokes if it tries to display more than about 1500 keywords at once.
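The resulting keyword hierarchy can be sketched as a path-building function. The handling of accented first letters and the “#” bucket for terms that do not start with a letter are assumptions made for this illustration; Any Vision’s actual rules may differ.

```python
import unicodedata


def subgroup(term):
    """Return the A-Z subgroup for a term.

    Accents are stripped from the first character; anything that is still not
    a letter A-Z goes into "#". Both choices are assumptions.
    """
    first = unicodedata.normalize("NFD", term[:1])[:1].upper()
    return first if "A" <= first <= "Z" else "#"


def keyword_path(feature, term, root="Any Vision", use_subgroups=True):
    """Return the keyword hierarchy for a term as a list of components."""
    path = [root, feature]
    if use_subgroups:
        path.append(subgroup(term))
    path.append(term)
    return path


print(" > ".join(keyword_path("Labels", "mountain")))
print(" > ".join(keyword_path("Labels", "mountain", use_subgroups=False)))
```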
Root keyword: This is the top-level root keyword of all the keywords added by Any Vision; it defaults to “Any Vision”.
Copy landmark location to GPS field: Each landmark assigned to a photo by Cloud Vision has an associated latitude/longitude, displayed in the Location field in the Metadata panel. Selecting Always or When GPS field is empty copies the latitude/longitude of the first landmark (the one with the highest score) to the EXIF GPS field. Once the GPS field is set, the photo will appear on the map in the Map module, and Lightroom will do address lookup to automatically set the photo’s Sublocation, City, State / Province, and Country.
Previously analyzed photos: This option tells Any Vision how to handle selected photos that have been previously analyzed:
Skip ignores such photos.
Reanalyze by sending to Google sends the photos to Cloud Vision for reanalysis (and additional cost)—you must choose this if you’ve added a feature to be analyzed.
Reassign metadata fields reassigns the Any Vision metadata fields and keywords using the previous analysis but the current options. This is useful if you’ve changed any of the options that control how the Any Vision metadata fields are set, such as Assign Keywords or Include scores in fields. This option doesn’t incur additional costs for previously analyzed photos.
Concurrently processed photos: This is the number of photos that will be processed in parallel by Any Vision and Cloud Vision. The default value of 1 will have the least impact on interactive use of Lightroom (though it could still be a little jerky). The maximum value of 8 processes photos about 4 times faster, though interactive use will likely be very jerky. (In my testing, larger values didn’t provide any more speedup.)
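As a generic illustration of what a concurrency setting like this does, the sketch below runs a per-photo function on a bounded pool of worker threads. It is not Any Vision’s implementation; `analyze_all` and the stand-in `analyze` function are hypothetical, and only the 1 to 8 range comes from the text above.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_all(photos, analyze, concurrency=1):
    """Apply `analyze` to each photo using at most `concurrency` workers.

    The value is clamped to the 1-8 range described above. Results come back
    in the same order as `photos`.
    """
    workers = max(1, min(8, concurrency))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze, photos))


# Stand-in for the real per-photo work (export, upload, process results).
results = analyze_all(["a.jpg", "b.jpg", "c.jpg"], str.upper, concurrency=4)
print(results)
```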
By default, labels and other features are returned by Google in English, but the Translation tab lets you translate them to another language. You can specify which features should be translated; features not selected will remain in English.
Any Vision uses Google Cloud Translation, the same technology behind Google Translate. More than 100 languages are supported. Translation costs extra, but it is quite inexpensive. Any Vision remembers previous translations, so you only pay once for each distinct word or phrase.
You can override the translations of specific words and phrases with an overrides dictionary. Click Edit Overrides to open Finder or File Explorer on the dictionary for the current language (e.g. de.csv for German). The dictionary is in UTF-8 CSV format (comma-separated values), and after the header each following line contains a pair of phrases:
word or phrase in English, word or phrase in target language
The words and phrases are case-sensitive. Make sure you save the file in UTF-8 format:
Excel: Do File > Save As, File Format: CSV UTF-8.
TextEdit (Mac): After opening the file, change it to plain text via Format > Make Plain Text. It will save in UTF-8 format.
Notepad (Windows): Do File > Save As, Encoding: UTF-8.
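If you would rather edit an overrides dictionary with a script, a sketch like the following keeps the file in UTF-8 and leaves the existing header untouched. The helper names are hypothetical, and the path `de.csv` is only an example; it is a good idea to work on a copy first.

```python
import csv


def read_overrides(path):
    """Return {english: translation}, skipping the header row.

    "utf-8-sig" also accepts files saved with a byte-order mark.
    """
    with open(path, encoding="utf-8-sig", newline="") as f:
        rows = list(csv.reader(f))
    return {row[0].strip(): row[1].strip() for row in rows[1:] if len(row) >= 2}


def add_override(path, english, translation):
    """Append one pair; csv.writer quotes phrases that contain commas."""
    with open(path, "r+", encoding="utf-8", newline="") as f:
        content = f.read()
        f.seek(0, 2)  # make sure writing starts at the end of the file
        if content and not content.endswith("\n"):
            f.write("\r\n")  # finish the last line before appending
        csv.writer(f).writerow([english, translation])
```

For example, `add_override("de.csv", "mountain", "Berg")` appends one pair, and `read_overrides("de.csv")` returns the current pairs as a dictionary for checking.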
Sort by Color
When the Dominant Color feature is selected, Cloud Vision finds the ten most “dominant” colors in a photo, using an undocumented algorithm. You can see those colors by selecting an analyzed photo and doing Library > Plug-in Extras > Sort by Color:
To find other photos containing a similar dominant color, select all the photos you want to search and invoke Sort by Color. Choose one of the dominant colors of the most-selected photo, or use the color picker at the bottom left to choose another color, and click OK. The current source is changed to the collection Any Vision: Sorted by Color, containing those photos with the most similar dominant colors. Do View > Sort > Custom Order to sort the collection by similarity.
Here are the photos from my main catalog of 30,000 photos with dominant colors closest to the orange-brown chosen in the example photo above:
As another example, here are photos from my catalog labeled “sunset” by Cloud Vision, sorted by similarity to the yellow-orange from the first photo:
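The idea of sorting by color similarity can be sketched as follows. Cloud Vision’s dominant-color algorithm is undocumented and Any Vision’s similarity measure is not described here, so this example substitutes plain Euclidean distance in RGB and ranks each photo by its closest dominant color; the file names and color values are invented sample data.

```python
def color_distance(a, b):
    """Squared Euclidean distance between two (r, g, b) tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sort_by_color(photos, target):
    """Sort photo names by how close their nearest dominant color is to `target`.

    photos -- dict mapping a photo name to a list of (r, g, b) dominant colors
    """
    def best(name):
        return min(color_distance(color, target) for color in photos[name])

    return sorted(photos, key=best)


photos = {
    "forest.jpg": [(30, 90, 40), (80, 60, 30)],
    "sunset.jpg": [(230, 120, 40), (20, 20, 60)],
    "beach.jpg": [(200, 180, 140), (60, 130, 200)],
}
orange_brown = (204, 102, 51)
print(sort_by_color(photos, orange_brown))
```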
Find with Similar Labels
Find with Similar Labels finds photos with similar visual content by comparing the labels assigned by Analyze. This can help find duplicates and near duplicates even if they’re missing metadata, are in different formats (e.g. raw and JPEG), or have been edited or cropped. For example, in a catalog of 32,000 photos, Find discovered these two photos taken three years apart:
Each group of similar photos will be placed in a separate collection in the collection set Any Vision: Similar Labels.
To find near duplicates, set Similarity initially to 95. If that finds too many photos that aren’t near duplicates, try 98 or 99.
If you set Similarity to less than 95, Find can run much more slowly on tens of thousands of photos. You can speed it up considerably by reducing the Include slider from 100%, at the cost of not finding as many similar photos. (The percentage of similar photos that may be found is a crude estimate.)
Find is only as good as the labels assigned by Cloud Vision. It often does an amazing job at finding near duplicates, but sometimes it’s hilariously bad.
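To illustrate how comparing labels can surface near duplicates, here is a sketch that scores two photos by the overlap of their label sets (Jaccard similarity, scaled to 0-100) and reports pairs above a threshold. This measure is a stand-in chosen for the example; the similarity computation Find actually uses is not described here, and the sample photos are invented.

```python
from itertools import combinations


def similarity(a, b):
    """Jaccard similarity of two label collections, scaled to 0-100."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)


def similar_pairs(photos, threshold=95):
    """Return pairs of photo names whose similarity is at least `threshold`.

    photos -- dict mapping a photo name to its labels
    """
    return [
        (p, q)
        for p, q in combinations(sorted(photos), 2)
        if similarity(photos[p], photos[q]) >= threshold
    ]


photos = {
    "IMG_0001.cr2": {"mountain", "snow", "sky", "ridge"},
    "IMG_0001.jpg": {"mountain", "snow", "sky", "ridge"},
    "IMG_0417.jpg": {"tabby cat", "whiskers"},
}
print(similar_pairs(photos))
```

Comparing every pair, as this sketch does, takes time proportional to the square of the number of photos, so it is only practical for small selections.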
Export to File
To export the analyzed metadata fields to a comma-separated (CSV) text file, one row per photo, select one or more analyzed photos and do Library > Plug-in Extras > Export to File. Open the file in Excel or another spreadsheet program.
The Remove Fields command removes from the selected photos all custom metadata fields added by Any Vision (but not any keywords). If you run this command on a large number of photos, you may wish to do File > Optimize Catalog afterward to shrink the size of the catalog by a modest amount.
You may wish to consider these alternative products:
Cloud Tagger also uses Google Cloud Vision, but it is intended as an aid to careful keywording of smaller numbers of photos rather than searching large catalogs. It currently has no recurring charges.
Excire is a Lightroom plugin with capabilities similar to Cloud Vision. But it doesn’t use the cloud and it has a higher one-time upfront cost with no recurring charges.
LrTag is similar to Excire, but it has a monthly charge.
Lightroom Keywords is intended for careful keywording of smaller numbers of photos rather than searching large catalogs. It has per-photo charges.
Wordroom is intended for careful keywording of individual stock photos. It’s currently in beta and pricing hasn’t been announced.
MyKeyworder is intended as an aid to careful keywording of smaller numbers of photos rather than searching large catalogs. It has recurring charges.
Lightroom CC (but not Lightroom Classic CC) uses similar technology to search your photos, but it doesn’t display or assign the keyword terms it has inferred.
Windows: You can use the standard menu keystrokes to invoke Any Vision > Analyze. ALT+L opens the Library menu, U selects the Plug-in Extras submenu, and A invokes the Any Vision > Analyze command.
To assign a single keystroke as the shortcut, download and install the free, widely used AutoHotkey. Then, in the File Explorer, navigate to the plugin folder anyvision.lrplugin. Double-click Install-Keyboard-Shortcut.bat and restart your computer. This defines the shortcut Alt+A to invoke the Analyze command. To change the shortcut, edit the file Any-Vision-Keyboard-Shortcut.ahk in Notepad and follow the instructions in that file.
Mac OS: You can use the standard mechanism for assigning application shortcuts to plugin menu commands. In System Preferences > Keyboard > Keyboard Shortcuts > Application Shortcuts, select Adobe Lightroom. Click “+” to add a new shortcut, in Menu Title type “Analyze” (case matters) preceded by three spaces (“<space><space><space>Analyze”). In Keyboard Shortcut type the desired key or key combination.
Support
Please send problems, bugs, suggestions, and feedback to email@example.com.
I’ll gladly provide free licenses in exchange for reports of new, reproducible bugs.
Known limitations and issues:
- Any Vision requires Lightroom 5.7 or later, Lightroom CC 2015, or Lightroom Classic—it relies on features missing from earlier versions.
- If the Any Vision window is too large for your display, with the buttons at the bottom cut off, do File > Plug-in Manager, select Any Vision, and in the Settings panel, check the option Use Any Vision on small displays. Unfortunately, a bug in Lightroom prevents plugins from determining the correct size of the display.
- If you’re concerned about accidentally spending too much on Google Cloud Vision or someone stealing your Google Cloud key, you can create a billing alert that will notify you when monthly spending exceeds a specified amount.
- Unlike many photo-sharing services, with Google Cloud Vision your photos remain exclusively your photos. Google promises not to process your photos for any other purpose. See their Data Processing and Security Terms.
- Cloud Vision infrequently returns errors such as “The request timed out” and “Image processing error!”. If this occurs, just rerun Analyze. I don’t know why these errors occur—they appear spurious.
- If you upgrade from versions 1.2 or 1.3 and get the error, “Problem getting supported languages. The Google Cloud Translation API…”, you’ll need to enable the Translation API. In your browser, go to console.cloud.google.com, log in if necessary, and follow the steps in Get a Google Cloud Key that enable the Google Cloud Translation API.
- Initial release.
- The Use subgroups for keywords option now correctly handles keywords starting with non-English characters.
- Recognized text can optionally be copied to an IPTC field.
- Better handling of errors reported by Google, such as an expired credit card.
- Pattern replacement for transforming recognized text, e.g. to extract jersey numbers.
- Translation of features to other languages. See Support for how to enable the Google Cloud Translation API.
- You can now specify a root keyword other than “Any Vision” on the Advanced tab.
- Fixed a bug that prevented recognized text in other languages from being translated to English.
- Now ignores missing photos instead of giving a cryptic informational message.
- Works around obscure export bug in Lightroom 5.
- The progress bar updates more incrementally with very large selections.
- Dense multi-column text is handled better.
- Analyze is a little faster.
- Remove Fields command removes Any Vision custom metadata from catalog.
- Better handling of spurious errors from Google Cloud Vision.
- Increased maximum value of Concurrently processed photos to 8.
- Advanced option OCR model for choosing which Google text-recognition algorithm to use.
- Changed the default for the advanced option OCR model to Photos, which experiments confirm performs better than Documents when analyzing photos containing signs, t-shirts, etc.
- Find with Similar Labels finds photos with similar visual content by comparing their labels.
- Worked around Lightroom bug creating keywords for labels.
- Provided option for running Any Vision on small displays, necessitated by Lightroom 10’s change of fonts.