Industry perspective: Engineering with a Lean Data Diet in mind

February 19, 2021


The Lean Data Diet didn’t have a catchy name until recently, but it’s a methodology that Nuria Ruiz and other engineers at the Wikimedia Foundation and Wikipedia have been practicing for a long time.

“This idea of the Lean Data Diet is…that you should be very purposeful with the data you gather and make sure that it truly enhances the product and the value that you’re trying to provide to your customers,” she explained at a recent privacy_infra() event.

Wikipedia is built on the idea of a Free Knowledge Movement, and they believe “there cannot be access to free knowledge without a strong guarantee of privacy.” According to Ruiz, a volunteer engineer with the Wikimedia Foundation, they achieve privacy through three engineering measures: deleting data at scale, sanitizing data, and building a privacy culture.

For example, deleting data at scale is traditionally a dangerous undertaking that cannot be undone, but Wikipedia has developed a system with a built-in safety net to prevent unintended data loss.

They rely on a final checksum argument determined early in the deletion process to verify that the final data set is correct. “That way, if you want to schedule the deletion…and instead of saying older than 90, you say older than nine…nothing will happen because the arguments do not match the checksum.”
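To make the idea concrete, here is a minimal Python sketch of a checksum-guarded deletion. It is an illustration under stated assumptions, not Wikimedia's actual tooling: the function names (`args_checksum`, `drop_older_than`), the path `/data/events`, and the choice of SHA-256 are all hypothetical. A dry run prints a checksum derived from the arguments, and the real run only proceeds if the checksum passed in matches the arguments it was given.

```python
import hashlib
import json
from typing import Optional


def args_checksum(path: str, older_than_days: int) -> str:
    """Derive a short checksum from the deletion arguments (hypothetical scheme)."""
    payload = json.dumps(
        {"path": path, "older_than_days": older_than_days}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def drop_older_than(
    path: str, older_than_days: int, expected_checksum: Optional[str] = None
) -> str:
    """Refuse to delete unless the arguments match a previously reviewed checksum."""
    actual = args_checksum(path, older_than_days)
    if expected_checksum is None:
        # Dry run: report the checksum so it can be reviewed and saved.
        return f"dry run: re-run with checksum {actual} to delete"
    if expected_checksum != actual:
        # Arguments changed since review (for example 90 became 9): do nothing.
        return "refused: arguments do not match checksum; nothing deleted"
    # The actual deletion would happen here; this sketch only reports it.
    return f"deleting data under {path} older than {older_than_days} days"


# Checksum produced during a reviewed dry run with the intended arguments.
reviewed = args_checksum("/data/events", 90)
print(drop_older_than("/data/events", 90, reviewed))  # proceeds
print(drop_older_than("/data/events", 9, reviewed))   # refused: typo caught
```

The design choice is deliberately low tech: the checksum carries no secret, it only ties the scheduled command to the exact arguments that were reviewed, so a mistyped threshold fails closed instead of deleting extra data.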

“[In my] experience as an engineer, when you do simple solutions, very low tech, they are alive for a long time.”

She further expands on how the team of volunteer software engineers sanitizes data and has successfully built a privacy culture using the Lean Data Diet methodology in her full talk from privacy_infra(), which you can watch below.

“Privacy is itself a feature. It is not something that we do, but it is part of our product offering.”

Watch the full talk here.

Note: This post reflects information and opinions shared by speakers at Transcend’s ongoing privacy_infra() event series, which features industry-wide tech talks highlighting new thinking in data privacy engineering every other month.

If you’re working on solving universal privacy challenges and are interested in speaking about it, submit a proposal to speak at an upcoming event here.
