Google is using deep learning and data analysis to curate the Play Store

Google's Security and Privacy team has shared some details about how apps in Google Play are curated, and machine learning plays a big part.

Google has two basic goals for applications in the Play Store: safety and exposure. The Security and Privacy team wants to weed out apps with malware, but they're also concerned about applications that ask for broad permissions that might not be needed. In turn, when good apps that follow good practices are found, the team wants them to be featured in the Play Store.

Machines build peer groups to study what apps can do and if they should be doing it.

One of the ways they do this is with what Google calls "peer groups": applications with similar capabilities are grouped together. Apps like Spotify and Pandora, for example, are different from each other, but they have the same basic function: both stream music to your Android device using details from your account with each service. The same goes for Twitter and Facebook, or for coloring book apps. Apps that do the same basic things get lumped together, which makes it easier to study what the apps are doing, how they are doing it, and whether they should be doing it at all.

The apps in each group are then analyzed to see what personal data they request from your device. Ideally, every app in a peer group will request the same types of information and have a good reason to do so, but sometimes one will be an outlier. Google gives the example of a coloring book app that requests fine (GPS) location data. Other coloring book apps don't do this, so the one that does would be flagged for further review by the Security and Privacy team.
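To make the outlier idea concrete, here is a minimal sketch, assuming each app in a peer group is represented only by the set of permissions it requests. It compares every app against the *other* members of its group and flags any permission that fewer than a hypothetical 20% of its peers request. The app names, the `flag_outlier_permissions` function, and the threshold are all illustrative assumptions; Google hasn't published its actual scoring method, and its real system is far more sophisticated than a simple frequency count.

```python
from typing import Dict, List, Set

# Hypothetical rarity threshold (an assumption, not Google's value):
# a permission requested by fewer than 20% of an app's peers is "unusual".
RARITY_THRESHOLD = 0.2


def flag_outlier_permissions(
    peer_group: Dict[str, Set[str]], threshold: float = RARITY_THRESHOLD
) -> Dict[str, List[str]]:
    """Return, for each app, the permissions that few of its peers request."""
    flagged: Dict[str, List[str]] = {}
    for app, permissions in peer_group.items():
        # Compare each app only against the other members of its group.
        peers = [perms for other, perms in peer_group.items() if other != app]
        if not peers:
            continue
        unusual = []
        for permission in sorted(permissions):
            # Fraction of peers that also request this permission.
            share = sum(permission in perms for perms in peers) / len(peers)
            if share < threshold:
                unusual.append(permission)
        if unusual:
            flagged[app] = unusual
    return flagged


# Hypothetical coloring book peer group; one app asks for GPS location.
coloring_books = {
    "ColorFun": {"INTERNET", "WRITE_EXTERNAL_STORAGE"},
    "KidsPaint": {"INTERNET"},
    "ArtBook": {"INTERNET", "WRITE_EXTERNAL_STORAGE"},
    "DoodleMap": {"INTERNET", "ACCESS_FINE_LOCATION"},
}

print(flag_outlier_permissions(coloring_books))
# {'DoodleMap': ['ACCESS_FINE_LOCATION']}
```

Only the app requesting `ACCESS_FINE_LOCATION` stands out, because none of its peers ask for it; a permission shared by a third of the group, like storage access here, passes. In a real pipeline a flag like this would only send the app to human review, not reject it outright.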

There are too many apps in Google Play to do this by hand.

There are too many apps in Google Play for humans to do this effectively, so Google uses machine learning to automate much of the process. Deep learning algorithms study the language in each app, computers analyze data about what the app does and how it does it, and the peer groups themselves are built by these systems based on app metadata and text descriptions as well as metrics like user installs.
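Google says it uses deep learning for this, but the basic idea of grouping apps by what their descriptions say can be shown with something much simpler. The sketch below is a stand-in, not Google's method: it turns each description into a set of words, measures overlap with Jaccard similarity, and greedily adds each app to the most similar existing group if the score clears a hypothetical 0.3 cutoff. The app names, descriptions, stopword list, and `build_peer_groups` function are all illustrative assumptions, and it ignores the other signals Google mentions, such as install metrics.

```python
import re
from typing import Dict, List, Set

# Hypothetical cutoff for joining an existing peer group (an assumption).
SIMILARITY_THRESHOLD = 0.3

# Tiny illustrative stopword list; common words carry no signal here.
STOPWORDS = {"a", "an", "and", "the", "to", "your", "for", "with", "of", "from"}


def tokens(description: str) -> Set[str]:
    """Lowercase the description and keep its non-stopword words."""
    words = re.findall(r"[a-z]+", description.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of words two sets have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_peer_groups(
    descriptions: Dict[str, str], threshold: float = SIMILARITY_THRESHOLD
) -> List[List[str]]:
    """Greedily assign each app to its most similar group, or start a new one."""
    groups: List[List[str]] = []
    group_vocab: List[Set[str]] = []
    for app, text in descriptions.items():
        words = tokens(text)
        best_index, best_score = -1, 0.0
        for i, vocab in enumerate(group_vocab):
            score = jaccard(words, vocab)
            if score > best_score:
                best_index, best_score = i, score
        if best_index >= 0 and best_score >= threshold:
            groups[best_index].append(app)
            group_vocab[best_index] |= words  # grow the group's vocabulary
        else:
            groups.append([app])
            group_vocab.append(set(words))
    return groups


apps = {
    "MusicStream": "Stream music and playlists from your account",
    "TuneRadio": "Stream music radio stations from your account",
    "ColorFun": "Coloring book pages for kids",
    "KidsPaint": "Coloring pages and paint tools for kids",
}

print(build_peer_groups(apps))
# [['MusicStream', 'TuneRadio'], ['ColorFun', 'KidsPaint']]
```

Word overlap is crude (it can't tell that "songs" and "music" mean the same thing), which is exactly the kind of gap learned text representations are meant to close.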

Google does plenty to keep malware from getting onto your phone through Google Play, but this effort is also meant to educate developers about the (very) complex permission model Android uses. This is a pretty cool way to use computers to help both users and developers, and it's great that Google is willing to share some information about how it's done.

Jerry Hildenbrand
Senior Editor — Google Ecosystem

Jerry is an amateur woodworker and struggling shade tree mechanic. There's nothing he can't take apart, but many things he can't reassemble. You'll find him writing and speaking his loud opinion on Android Central and occasionally on Twitter.