The Beautiful Machine: EyeEm's Lorenz Aschoff on Using Computer Vision to Recognize Aesthetics in Images

Among the many industries that the rise of the visual web has disrupted is stock photography. With billions of images captured, uploaded and shared online every day, it's become increasingly difficult for editorial and marketing departments alike to find the perfect images at scale and at speed. Sure, plenty of off-the-shelf image recognition tools excel at finding objects, scenes and faces, everything from cats and street signs to forests and celebrities, but an artsy portrait with a sharply focused face against a blurry background? Not so much.


Founded in 2011 in Berlin, EyeEm is a next-generation stock photography house that uses computer vision and machine learning to identify images by their aesthetic qualities. And by aesthetic, we mean the pleasing variety, for the most part. EyeEm offers a mobile app that photographers can use to identify the most aesthetically pleasing and commercially viable images on their smartphones, then upload them to the company's online marketplace, where, thanks to computer vision, stock photo shoppers can efficiently search for and purchase them. How does this lofty, artful approach to computer vision work, and why should every photographer or photo editor use it to be discovered as, or to discover, the next Vivian Maier? We asked Lorenz Aschoff, the Berlin-based founder and CEO of EyeEm, to walk us through exactly how EyeEm works.


What does EyeEm do?


We build AI that understands aesthetics and beauty. Here's the problem we're trying to solve: since the invention of digital capture, people have taken trillions of images, and this massive amount of data keeps piling up. It's increasingly hard to find relevance among all these pictures, and we're actually lacking a way to review and curate them effectively. This is as much an issue for individual consumers as it is for professionals. So we've developed an Android and iOS app with a network of approximately 22 million photographers, both amateur and professional, who upload their images there. Our proprietary computer vision and machine learning technology then reviews the submitted images and finds the best ones, not only in aesthetic terms but also in terms of commercial value. The product is aimed not only at photographers, amateur and professional, who want to monetize the images they upload, but also at photo editors at media companies and in marketing departments who need to find relevant imagery. The app also helps you find the photos you're looking for on your phone faster.


That last feature sounds a lot like Google Photos. How is EyeEm different?


Well, the main thing is that we're focused on stock photography; that's our business model, whereas Google Photos is aimed at consumers. But a key difference in how our technology works is that we're focused on aesthetics, on understanding the beauty in photos rather than just their content. We do a lot of keywording in the same way Google Photos does, labeling images that contain mountains, faces, animals, objects and so on, but our keywords often describe aesthetics and image composition rather than basic objects. To simplify, we take images by the best photographers in the world and feed them to the machine, which uses deep learning to find commonalities among these aesthetically strong images.


We also have humans review some of these images, so that we're actually benchmarking. In other words, we use the technology to identify the gems, and it's systematically learning to pick out the highlights, but then we benchmark what humans like against what the machine likes, almost as a sort of reinforcement. That's the key to success, especially with deep learning. Then we can throw in any other image you just captured and essentially say, “Okay, speaking from an aesthetic standpoint, how likely is it that this image is going to resonate with your brain, that you might find it beautiful?” This basic technology can then be applied to, for example, scanning the images on your phone and surfacing the most beautiful ones, or telling you which version of a shot you may have taken five or six times is the best one.
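Aschoff doesn't describe EyeEm's pipeline in detail, and the model itself is proprietary, so the following is only a minimal Python sketch of the two uses described above, assuming a model has already assigned each image an aesthetic score. Benchmarking human taste against the machine is illustrated with Spearman rank correlation (a standard measure, swapped in here as an assumption, not EyeEm's stated method), and picking the best of several near-identical shots is simply taking the highest score. All function names, ratings and scores are hypothetical.

```python
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(human: Sequence[float], machine: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the two rank lists.

    1.0 means the model orders images exactly as the human reviewers do.
    """
    if len(human) != len(machine) or len(human) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rh, rm = ranks(human), ranks(machine)
    n = len(rh)
    mean_h, mean_m = sum(rh) / n, sum(rm) / n
    cov = sum((a - mean_h) * (b - mean_m) for a, b in zip(rh, rm))
    var_h = sum((a - mean_h) ** 2 for a in rh)
    var_m = sum((b - mean_m) ** 2 for b in rm)
    if var_h == 0 or var_m == 0:
        raise ValueError("correlation is undefined when all ratings are identical")
    return cov / (var_h * var_m) ** 0.5


def pick_best(burst: Dict[str, float]) -> str:
    """Given {filename: aesthetic score} for near-duplicate shots, return the top one."""
    return max(burst, key=burst.get)


# Hypothetical data: editor ratings (1-5) and model scores (0-1) for four images.
human_ratings = [4.5, 2.0, 3.5, 1.0]
model_scores = [0.91, 0.40, 0.75, 0.12]
print(spearman(human_ratings, model_scores))  # 1.0, since both orderings agree

# Hypothetical scores for several takes of the same scene.
print(pick_best({"IMG_01.jpg": 0.62, "IMG_02.jpg": 0.81, "IMG_03.jpg": 0.57}))  # IMG_02.jpg
```

In a real benchmark, the interesting signal would be where the correlation falls below 1.0, since those disagreements show where the machine's taste diverges from the reviewers'.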


Does the machine actually come up with aesthetic rules?


Not in a classic sense, and it's always learning. This is a good example of how the human understanding of aesthetics clashes with the way a machine works and understands them. Unlike humans, machines don't follow rules or think in aesthetic concepts like “symmetry is beautiful” or “the golden ratio is beautiful.” It's much more abstract than that. We as humans may not be able to explain why something is beautiful, we just know it, but the machine is able to identify patterns that correlate with what would be considered aesthetic qualities. Eventually, it will find symmetrical and golden-ratio-based images, and it will do so quickly and at scale, but exactly how it works can't be put into words, since machines learn in a non-exact, heuristic way, which is frustrating to a lot of people.


It might be frustrating to some photo editors at magazines, websites and marketing departments, too.


Well, some people are definitely irritated by the idea that there's a technology that can actually help them amplify what they do, and unfortunately, that's because a big part of the debate around AI and the workplace is oversimplified. But, as I mentioned, one of the key challenges in both an editorial and a marketing context is that if you're searching for photos or videos in the stock space, the aesthetics can be all over the map. Sometimes the image search results are really inauthentic, and it takes a massive amount of time to find what's actually relevant to you. Our technology ensures that you see only the content that's aesthetically in line with what you're looking for. It allows you to work with more input at once and cut down the time you spend searching and dismissing what's not right for you, so you can focus on other tasks and really get down to the business of getting the actual curation right.


And in terms of calming the nerves of photo professionals who might feel threatened by AI tools such as EyeEm, we think a lot of that has to do with the interface. You just have to make suggestions that people can take or leave. We never say, “This is the best image right now,” but instead, “How about this?” or “How do you like this?” so that it's a very gentle prompt.


So how does it filter down search results? After all, beauty is in the eye of the beholder.


I think one of the key messages we need to get across about our tool is that photo editors, photographers, brands and so on can train and control it based on the content they put in. You can train it on any aesthetic you want. So, for example, if you only provide portraits, you'll only get portraits back. Or you can narrow it down by depth of field, to close-up portraits, or to black-and-white photography only. It can be a huge time-saver for brands that have a specific aesthetic look.


One of our customers is the Boston Consulting Group. It has around 8,000 consultants around the world, and the company needs them to understand the visual language that they should be using in presentations, brochures or any interactions they have with clients from a visual standpoint. And, as smart as consultants may be in terms of consulting and business, they may not have an instinctual understanding of aesthetics, especially aesthetics that will be aligned with the brand.


So BCG's marketing team provided us with about 30 images across different topics and scenes that are in line with a recent rebranding the firm did. We then used our aesthetics technology to build a personalized search filter from that input, so that whenever BCG consultants search for photography via our platform, they get back content that is in line with the brand's new aesthetics.
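EyeEm hasn't said how its personalized filters are built. One common way to turn a small set of reference images into a filter, shown here as a minimal Python sketch under stated assumptions, is to average embedding vectors of the reference images into a single “style profile” and keep search results whose cosine similarity to that profile clears a threshold. The embeddings, the 0.8 threshold and the function names below are hypothetical; in practice the vectors would come from some image model the interview doesn't specify, and all vectors are assumed to share one dimension.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so cosine similarity reduces to a dot product."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def build_style_profile(reference_embeddings: List[Vector]) -> List[float]:
    """Average the unit-length reference embeddings into one normalized centroid."""
    if not reference_embeddings:
        raise ValueError("need at least one reference image")
    unit = [normalize(v) for v in reference_embeddings]
    dim = len(unit[0])
    centroid = [sum(v[d] for v in unit) / len(unit) for d in range(dim)]
    return normalize(centroid)


def filter_results(
    profile: Vector,
    candidates: Dict[str, Vector],
    threshold: float = 0.8,  # hypothetical cut-off, would need tuning
) -> List[Tuple[str, float]]:
    """Keep candidates whose cosine similarity to the profile meets the threshold,
    sorted from most to least on-brand."""
    scored = []
    for image_id, emb in candidates.items():
        sim = sum(a * b for a, b in zip(profile, normalize(emb)))
        if sim >= threshold:
            scored.append((image_id, sim))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-dimensional embeddings standing in for real image features.
refs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.1]]
profile = build_style_profile(refs)
results = filter_results(
    profile, {"on_brand.jpg": [0.95, 0.1, 0.05], "off_brand.jpg": [0.0, 0.2, 1.0]}
)
print([name for name, _ in results])  # ['on_brand.jpg']
```

The centroid treats every reference image equally; with only about 30 examples spread across different topics, a real system might instead compare each candidate to its nearest reference image, a design choice this sketch leaves open.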


You mentioned earlier that EyeEm can also search for images with commercial value. How does this work?


We're currently working on a number of approaches. We just built a tool called IM Social, for example, which visually analyzes a brand's Instagram account: what the brand has posted previously and how those posts resonated. Based on this, we can understand the aesthetic style that resonates with its followers, then start to predict or suggest images that those followers are likely to engage with in the future. We're also working on using this technology to provide intelligent recommendations for ad campaigns, whether on Instagram, Facebook or elsewhere online. Following the same idea, you could also enrich the aesthetics data with conversion data from actual ads, which would let you predict not only which images are in line with your brand but also which ones will perform well for conversion.


What's next for EyeEm?


We just started working on video, but it's much more complex because videos contain multiple scenes and perspectives, so there are a lot of challenges to solve there. In the end, we treat a video as a sequence of photos. We're also prototyping machine-based aesthetic critiques, in other words, suggestions for capturing and adjusting the composition of photos in certain ways, or for which filters and color edits you can apply to raise the aesthetic quality of an image. This isn't really our main goal right now, but it will certainly be an interesting challenge to translate the abstract way in which a machine understands aesthetics into human terms and actionable words that make sense. Finding the right interface between AI and humans is very complex, and no one has quite perfected it yet.


by Marina Esmeraldo
