Open Sourcing Penumbra: A Leap Towards Transparent Data Privacy

July 23, 2020 · 3 min read

We aren’t interested in seeing user data. As the company that powers data rights requests between businesses and their end users, we facilitate thousands of personal data transfers. When building our data privacy infrastructure, we realized it would be painfully ironic if we had the ability to look at user data while transferring it to its owners. For us, end-to-end encryption (E2EE) of user data is a nonnegotiable requirement (see endnote). But between browser support, large data exports, endpoint hardware limitations, and more, we were led to a core engineering problem: how can we decrypt gigabytes of personal data on every end user’s device?

Our solution was to build Penumbra—a way of implementing end-to-end encryption in the browser on files that may not fit into the memory of the consumer’s machine. Today, we’d like to introduce you to Penumbra’s open-source library and explain why we made these technologies readily available for anyone seeking to build privacy-respecting solutions.

Penumbra takes advantage of a new writable stream feature in Chrome and Edge in order to decrypt files in chunks, streaming the decrypted copy directly to disk without needing to buffer the entire file into memory. There is virtually no cap on file size, and Penumbra works with web workers to prevent the main thread from blocking during file decryption. Immediate use cases include end-to-end encryption of video and audio streams, secure broadcasting of information, and trustless file systems in the browser.

As engineers know, there’s a gigantic chasm between a partner with end-to-end encryption and a partner that uses encryption in transit and at rest. Unfortunately, the latter partner sees all user data, which means they can also lose it to hackers.

Why Penumbra

First, it’s important to understand that whether we’re exporting your videos or your genome, Transcend might be working with gigabytes of personal data. Decrypting small pieces of data in your browser is easy with the existing Web Crypto API:

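// Inside an async function: `key` is an AES-GCM CryptoKey, `iv` is the nonce
// used at encryption time, and `ciphertext` is an ArrayBuffer or typed array.
const plaintext = await window.crypto.subtle.decrypt(
  {
    name: "AES-GCM",
    iv: iv
  },
  key,
  ciphertext
);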

This is great if you only need to decrypt text snippets of a few kilobytes (as in an E2EE chat app). The payload size is limited, though, because the ciphertext parameter must be a buffer: you must first fetch the file and buffer it fully into memory before passing it to crypto.subtle.decrypt. That means you have a (theoretical) maximum file size limit of about 512MB on 64-bit.
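
For small payloads, the full round trip can be sketched as follows. This is a minimal illustration using only standard Web Crypto calls; the helper names (encryptSmall, decryptSmall, demo) are made up for this example and are not part of Penumbra. Note that both helpers take the whole payload as a single in-memory buffer, which is exactly the constraint described above.

```javascript
// Encrypt helper, included only so the example is self-contained.
async function encryptSmall(key, plaintext) {
  // AES-GCM needs a fresh nonce for every message; 96 bits is the usual size.
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    plaintext
  );
  return { iv, ciphertext };
}

// Decrypt a payload that is already fully buffered in memory.
// Rejects if the ciphertext or its authentication tag was tampered with.
async function decryptSmall(key, iv, ciphertext) {
  return crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, ciphertext);
}

// Round trip: generate a 256-bit key, encrypt a short message, decrypt it.
async function demo() {
  const key = await crypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 },
    false, // the key is not extractable
    ["encrypt", "decrypt"]
  );
  const message = new TextEncoder().encode("hello, privacy");
  const { iv, ciphertext } = await encryptSmall(key, message);
  const plaintext = await decryptSmall(key, iv, ciphertext);
  return new TextDecoder().decode(plaintext);
}
```

Calling demo() resolves to the original message, but only because the entire ciphertext was handed to decrypt in one piece.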

In practice, the maximum file size is even lower. Say we want to decrypt a 50MB file. When you pass your 50MB buffer into the decryption function, it will block the main thread—so, the event listeners will stop firing and the end user’s browser will freeze.

At Transcend, not only do we regularly encounter files far larger than the memory limits of your computer, but we also process hundreds of them in one user export. It’s important to us that we don’t freeze the UI of an end user’s Privacy Center while downloading and decrypting their files.

To solve the max file size limitation, we built a stream pipeline that would fetch and decrypt all files in parallel, zip them, and then download them to disk without ever buffering an entire file into memory.
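
The shape of such a pipeline can be sketched with the standard Streams API. This is a simplified illustration, not Penumbra's implementation: processChunk is a hypothetical stand-in for a streaming decipher's per-chunk step, the zip stage is omitted, and demoSource and countingSink exist only to make the example runnable.

```javascript
// Build a TransformStream that applies `processChunk` to each chunk as it
// arrives. `processChunk` is a hypothetical per-chunk step (for example, the
// update call of a streaming decipher); it may be sync or async.
function chunkTransform(processChunk) {
  return new TransformStream({
    async transform(chunk, controller) {
      controller.enqueue(await processChunk(chunk));
    },
  });
}

// Pipe source -> transform -> sink. The returned promise resolves once the
// sink has consumed every chunk. Only individual chunks are held in memory.
function runPipeline(source, processChunk, sink) {
  return source.pipeThrough(chunkTransform(processChunk)).pipeTo(sink);
}

// Demo source: lazily emits `count` chunks of `size` bytes, one per pull.
function demoSource(count, size) {
  let sent = 0;
  return new ReadableStream({
    pull(controller) {
      if (sent === count) {
        controller.close();
        return;
      }
      controller.enqueue(new Uint8Array(size).fill(sent));
      sent += 1;
    },
  });
}

// Demo sink: records how much data passed through without keeping any of it.
function countingSink(stats) {
  return new WritableStream({
    write(chunk) {
      stats.chunks += 1;
      stats.bytes += chunk.byteLength;
    },
  });
}
```

In a browser, the source would typically be the body of a fetch response, which is already a ReadableStream, and the sink would write to disk rather than count bytes.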

To solve the UI breakage problem, we needed to do all of that heavy data processing on a separate background thread. Taken together, this ensures our team can safely fulfill data rights requests and provide a great user experience on behalf of our customers, and our customers’ customers.
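
One way to keep that background work manageable is to wrap the worker's message channel in promises. The sketch below assumes a hypothetical { id, payload } message protocol and hypothetical helper names (createClient, serve); it is not Penumbra's actual API. It works with anything that exposes postMessage and onmessage, such as a Worker on the main thread or self inside the worker.

```javascript
// Main-thread side: turn a message port into a promise-returning function.
// Each request gets an id so replies can be matched to their callers.
function createClient(port) {
  let nextId = 0;
  const pending = new Map();

  port.onmessage = (event) => {
    const { id, result, error } = event.data;
    const entry = pending.get(id);
    if (!entry) return; // reply for an unknown request; ignore it
    pending.delete(id);
    if (error) entry.reject(new Error(error));
    else entry.resolve(result);
  };

  // `transfer` may list ArrayBuffers to move instead of copy.
  return function request(payload, transfer = []) {
    return new Promise((resolve, reject) => {
      const id = nextId++;
      pending.set(id, { resolve, reject });
      port.postMessage({ id, payload }, transfer);
    });
  };
}

// Worker side: run `handler` for each request and post back the outcome.
function serve(port, handler) {
  port.onmessage = async (event) => {
    const { id, payload } = event.data;
    try {
      const result = await handler(payload);
      port.postMessage({ id, result });
    } catch (err) {
      port.postMessage({ id, error: String((err && err.message) || err) });
    }
  };
}
```

In a browser, the main thread would call createClient(new Worker("worker.js")) and the worker script would call serve(self, handler); the file name worker.js is a placeholder. The heavy work inside handler then runs off the main thread, so event listeners keep firing.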

Note how, even in this Penumbra demo with multiple files processing in parallel, the browser does not freeze (the event updates stay live). The Earth image is 80MB.

Open-sourcing Penumbra

Today, there are still several areas where the community could contribute to Penumbra:

  • Improve browser support by adding fallbacks when streams are not supported.
  • Improve the communication channel between the main thread and the web worker to simplify Penumbra’s API for developers.
  • Support more encryption algorithms (Penumbra currently uses AES-256 in the GCM mode of operation).
  • Upgrade to the fastest underlying crypto library on an ongoing basis (there are some promising developments in native and WebAssembly implementations).
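
The first item, for example, usually starts with feature detection. Here is a minimal sketch; the function names and the "stream" and "buffer" strategy labels are hypothetical and not part of Penumbra.

```javascript
// Report which Streams API constructors exist in the given global scope.
// Passing a scope explicitly makes the check easy to test.
function detectStreamSupport(scope = globalThis) {
  const has = (name) => typeof scope[name] === "function";
  return {
    readable: has("ReadableStream"),
    writable: has("WritableStream"),
    transform: has("TransformStream"),
  };
}

// Pick a decryption strategy: stream chunk by chunk when every piece is
// available, otherwise fall back to buffering the whole file in memory.
function chooseStrategy(support) {
  const canStream = support.readable && support.writable && support.transform;
  return canStream ? "stream" : "buffer";
}
```

A caller could run chooseStrategy(detectStreamSupport()) once at startup and warn the user about file size limits when the result is "buffer".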

In addition, we would love to see Safari and Firefox introduce TransformStreams, and believe the Web Crypto spec should support streaming (and we’re not alone, according to GitHub).

We decided to open-source Penumbra because we believe it can lower the barrier to entry for developers who also want to build trustless platforms. Implementations could include a better cloud storage platform (a storage company shouldn’t need access to your data, only the person uploading it and the person receiving it) or a more secure video chat platform. Penumbra could also be used for media or human rights platforms that need secure ways for civil activists to share sensitive information between groups.

Overall, we believe more end-to-end encryption across industries creates a more secure online world. Check out Penumbra on GitHub and let us know what you think.


Endnote: In addition to E2EE, there are two other security properties we chose to encode into Transcend’s data privacy infrastructure architecture: we would not hold the keys to our customers’ data systems and Transcend would not have the authority to create arbitrary new data requests (we can’t just erase any—or every—user). As a team of security engineers, we knew that building data privacy infrastructure in any other way would just be plain irresponsible. But we’ll save those details for a future post.

