Industry perspective: How LinkedIn developed a differentially private data analytics API at scale

Andrew Moon
June 11th, 2021 · 2 min read

Utilizing data while protecting the privacy of individuals is a common challenge across organizations. Privacy engineering is an increasingly popular approach to balancing data collection with respectful data use. Within the discipline, differential privacy is one potential technique for analyzing large datasets while preserving the privacy of individuals within the dataset.

Ryan Rogers, Staff Software Engineer at LinkedIn, joined our Privacy_Infra() event to talk about how they built an audience engagement API that leverages differential privacy to protect user information while providing data insights that enable marketing analytics-related applications.

You can watch a recording of Ryan’s talk below, starting at 38:25.

According to Ryan, the project was a collaborative effort among multiple teams including data science applied research, backend infrastructure, and marketing solutions.

He noted that compliance was only one reason they chose to invest in this privacy system. Other reasons included a commitment to prioritizing the experience of LinkedIn members and to defending their privacy against attacks such as reconstruction, differencing, and inference.
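To make the differencing attack concrete, here is a minimal sketch in Python. The data, the member names, and the `count_clicks` helper are all hypothetical and were not part of Ryan’s talk. It shows how two exact aggregate queries that differ by one person can reveal that person’s record.

```python
# Hypothetical toy data: (member_id, clicked_ad). Not real LinkedIn data.
members = [
    ("alice", True),
    ("bob", False),
    ("carol", True),
    ("dave", True),
]


def count_clicks(rows, exclude=None):
    """Exact (non-private) count of members who clicked, optionally excluding one member."""
    return sum(1 for member_id, clicked in rows if clicked and member_id != exclude)


# Two innocent-looking aggregate queries...
total = count_clicks(members)
without_dave = count_clicks(members, exclude="dave")

# ...whose difference exposes one individual's value.
dave_clicked = (total - without_dave) == 1
```

Because both answers are exact, subtracting them isolates a single member. Adding noise to each answer is what makes this subtraction unreliable.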

“Differential privacy introduces noise or randomness into the computation so that the result that you get of your algorithm or of your computation is randomized,” Ryan explained. “So you get an actual distribution of all possible outcomes.”

The goal of differential privacy is to protect individuals within the dataset by ensuring that computational results don’t depend heavily on whether any specific individual’s data is included. To do so, it measures the distance between the output distribution computed with a given person’s record and the one computed without it.

“In mathematics we say that a randomized algorithm, one that introduces noise, is differentially private if for any two neighboring datasets, differing in [at most] one person’s record, the output distributions are close to one another,” Ryan continued.

Two parameters determine closeness. The first is epsilon, commonly referred to as the “privacy loss” parameter. The smaller the epsilon, the closer together the distributions are and the smaller the privacy loss. A large epsilon means the distributions can be far apart, so there is less privacy for the individuals in the dataset. The second is delta, a small additive slack: informally, the epsilon bound on privacy loss holds except with probability of roughly delta, so the loss is bounded most of the time.
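For reference, the closeness condition is usually written as the inequality below. This is the standard textbook formulation of (epsilon, delta)-differential privacy rather than a formula quoted from the talk, and the symbols M, D, D', and S are introduced here for illustration: M is the randomized algorithm, D and D' are any two neighboring datasets, and S is any set of possible outputs.

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta
```

With delta equal to zero, the probability of any outcome can change by a factor of at most e^epsilon when one person’s record changes, so a small epsilon forces the two distributions to be nearly identical.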

LinkedIn’s use case includes their audience engagement API, which provides insights on content and audience data to external marketing partners. It’s built on top of Pinot, an open source project for fast, real-time data analytics. According to Ryan, the questions they needed to answer for this project included how much a single user can impact the outcome of analytics queries and how many queries an advertiser should be allowed to ask.
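The first question, how much one user can change a query result, is usually called the sensitivity of the query, and it determines how much noise is needed. The sketch below uses the Laplace mechanism, a common textbook technique that gives pure epsilon-differential privacy (delta of zero). It is an illustration under stated assumptions, not LinkedIn’s implementation: the function names are hypothetical, and the sensitivity of 1 assumes each member contributes at most one unit to the count.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Return a noisy count using the Laplace mechanism.

    sensitivity: the most one member can change the count (assumed 1 here).
    epsilon: the privacy loss parameter; smaller epsilon means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    # Noise scale grows with sensitivity and shrinks as epsilon grows.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Example: the same true count released at two privacy levels.
rng = random.Random(0)
strong_privacy = private_count(1000, epsilon=0.1, rng=rng)  # noisier
weak_privacy = private_count(1000, epsilon=2.0, rng=rng)    # closer to 1000 on average
```

The trade-off is visible in the noise scale, `sensitivity / epsilon`: halving epsilon doubles the typical error in the released count.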

To address these questions, Ryan explained, LinkedIn built a privacy system with a budget management service to enforce a differential privacy budget on the returned results. This prevents analysts from being able to reconstruct the dataset by running a large number of queries on the same dataset.
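A budget management service of this kind can be sketched as a small accounting class. This is a hypothetical illustration, not LinkedIn’s design: the class name, method names, and per-analyst bookkeeping are assumptions. It relies on basic composition, under which the epsilons of successive queries add up. A production system may use tighter accounting.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon spent per analyst under basic composition."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = {}  # analyst_id -> epsilon used so far

    def try_spend(self, analyst_id, epsilon):
        """Charge epsilon to the analyst; return False if it would exceed the budget."""
        used = self.spent.get(analyst_id, 0.0)
        # Small tolerance guards against floating-point rounding at the limit.
        if used + epsilon > self.total_epsilon + 1e-12:
            return False
        self.spent[analyst_id] = used + epsilon
        return True

    def remaining(self, analyst_id):
        """Epsilon the analyst can still spend."""
        return max(0.0, self.total_epsilon - self.spent.get(analyst_id, 0.0))


# Example: an analyst with a total budget of 1.0 asks queries costing 0.5 each.
budget = PrivacyBudget(total_epsilon=1.0)
first = budget.try_spend("advertiser-1", 0.5)   # allowed
second = budget.try_spend("advertiser-1", 0.5)  # allowed, budget now exhausted
third = budget.try_spend("advertiser-1", 0.5)   # refused
```

Once the budget is exhausted, further queries are refused rather than answered with fresh noise, which is what stops an analyst from averaging away the noise over many repeated queries.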

Watch Ryan’s full talk from Privacy_Infra() below to learn more about LinkedIn’s differentially private data analytics API.

Note: This post reflects information and opinions shared by speakers at Transcend’s ongoing Privacy_Infra() event series, which features industry-wide tech talks highlighting new thinking in data privacy engineering every other month. If you’re working on solving universal privacy challenges and interested in speaking about it, submit a proposal here.
