By David Mattia
February 17, 2021•5 min read
Logs are inherently risky: small mistakes can lead to sensitive data appearing in plaintext in places it shouldn’t. No organization can eliminate the possibility that secrets will be logged, but they can better manage the risk by adding safeguards into their systems.
This post details one approach we took at Transcend to help attack this problem, by declaring fields as sensitive or not in our type system.
We found that many avenues for making mistakes with handling secrets have disappeared, and our confidence in the safety of our logs has gone up considerably.
Building in protections like these can add extra work for developers, but they can be powerful tools for increasing developer confidence that when they make mistakes the damage can be minimized.
In the past few years, Twitter logged unhashed user passwords, Facebook logged tens of millions of unhashed user passwords, Google logged unhashed GSuite user passwords, Ubuntu’s server installer logged passwords, and many similar incidents occurred elsewhere. These incidents are extremely damaging to user trust and can require extensive cleanup to prevent identity theft.
Despite the severity, it seems that whenever these incidents occur, online forums are filled with comments from developers who couldn’t possibly let this sort of mistake happen in their systems, and that Twitter, Facebook, Google, Ubuntu, and others must be worse at architecting systems than they are.
This could be true in some instances, but brushing off these cases as elementary feels disingenuous to the brilliant DevOps work each of these companies has contributed to the open source community.
If we can accept that developers occasionally make mistakes while also accepting that these breaches are unacceptable, we can conclude that safeguards that would protect against similar mistakes while simultaneously minimizing the damage would be beneficial.
The above diagram shows how many production systems manage logs at a high level, across a variety of stacks and languages.
Application code contains log statements; a daemon service picks up and enhances those logs, then forwards them to a log collection service, which lets you interact with them.
Throughout this pipeline, redaction can happen at a number of stages:
All of these levels offer a filtering system where logs that make it to the collection services shouldn’t contain any sensitive data, but holes are still possible in each step:
As you move to the right in the diagram, getting closer to the log collection services, the filtering becomes more of a last resort. These filters are still good to have, but the most effective best practices live to the left, in application code. Your application code is where you likely have the most robust code reviews and can most easily enforce best practices like only logging the minimum number of fields.
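To make the "last resort" point concrete, here is a minimal sketch of a pattern-based filter of the kind that might run late in the pipeline. The pattern and the `redactSsns` helper are hypothetical, written for illustration only; they are not taken from any particular log daemon.

```typescript
// Hypothetical last-resort filter: mask anything shaped like a dashed US SSN.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;

function redactSsns(line: string): string {
  return line.replace(SSN_PATTERN, '[redacted]');
}

// The dashed form is caught by the pattern...
const caught = redactSsns('user ssn=123-45-6789');
// ...but the same number written without dashes passes through unchanged.
const missed = redactSsns('user ssn=123456789');

console.log(caught);
console.log(missed);
```

A filter like this only knows the formats someone thought to encode, which is why it works better as a backstop than as the primary defense.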
By carefully picking what fields we want to log in structured log formats, we can have quite strong confidence that we won’t log sensitive data.
After all, the easiest way to ensure your fluentd regex that redacts social security numbers doesn’t miss a value is to just not log social security numbers!
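As a rough illustration of picking fields deliberately, the sketch below builds a structured log line from an explicit allowlist. The `User` shape, the `toLogFields` helper, and the event name are all hypothetical and exist only for this example.

```typescript
// Hypothetical user record; only some of these fields are safe to log.
interface User {
  id: string;
  plan: string;
  email: string;
  ssn: string;
}

// Explicit allowlist: a field reaches the logs only if it is named here.
function toLogFields(user: User): { id: string; plan: string } {
  return { id: user.id, plan: user.plan };
}

const user: User = {
  id: 'u_1',
  plan: 'pro',
  email: 'a@example.com',
  ssn: '123-45-6789',
};

// The sensitive fields never enter the payload, so no later filter has to catch them.
const logLine = JSON.stringify({ event: 'user.updated', ...toLogFields(user) });
console.log(logLine);
```

Adding a new field to `User` does not change what gets logged; someone has to opt the field in, and that change shows up in code review.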
But we’d be foolish to think that Google, Facebook, Twitter, and others don’t use structured logs with typed fields. There’s still one problem left: many structured logs have fields typed as string, and strings can contain sensitive data.
This last step, and the one the rest of the article will focus on, is adding secret metadata to our type system so that these freeform string fields cannot expose any data we haven’t explicitly marked as safe to show.
A quick note: In this post, we’ll be focusing on TypeScript and our open-source @transcend-io/secret-value library available on npm/GitHub Packages, but the concepts can be used across a variety of languages.
In order to help ensure that sensitive values are not included in log statements, we’ve created a new type Secret that prevents its values from being added to logs. The easiest way to show how is with this example:
import { Secret } from '@transcend-io/secret-value';
const secret = new Secret('some secret value');
console.log(secret);
We wrap the string some secret value into a Secret<string> type that will appear as [redacted] whether you console.log it, JSON.stringify it, call secret.valueOf() on it, interpolate it into a string, or do just about anything else to it you could imagine in JavaScript.
If you want to modify the value of the secret without unwrapping it, you can use the map function common to many functional wrappers, like const secretLength = secret.map(rawValue => rawValue.length).
When you want to use the value stored inside the secret, you can use secret.release() to get the value some secret value back out.
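To show how a wrapper with this behavior could work, here is a minimal sketch. It is not the implementation of @transcend-io/secret-value; the class name `SketchSecret` is hypothetical, and it only covers string coercion, JSON serialization, `map`, and `release`. Customizing what `console.log` prints in Node would need an additional inspect hook, which is left out here.

```typescript
// Illustrative wrapper only; NOT the library's actual implementation.
class SketchSecret<T> {
  // release is a closure, so the raw value is never stored as an object property.
  readonly release: () => T;

  constructor(value: T) {
    this.release = () => value;
  }

  // Transform the wrapped value without exposing it; the result stays wrapped.
  map<U>(fn: (raw: T) => U): SketchSecret<U> {
    return new SketchSecret(fn(this.release()));
  }

  // Every implicit conversion path yields the placeholder instead of the value.
  toString(): string {
    return '[redacted]';
  }

  valueOf(): string {
    return '[redacted]';
  }

  toJSON(): string {
    return '[redacted]';
  }
}

const token = new SketchSecret('some secret value');

const interpolated = `token=${token}`;
const serialized = JSON.stringify({ token });
const tokenLength = token.map((raw) => raw.length);

console.log(interpolated);
console.log(serialized);
console.log(tokenLength.release());
```

The design choice worth noting is that leaking requires an explicit `release()` call, which is easy to search for and easy to flag in code review.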
The API is minimal, but it’s meant to be this way. This wrapper provides a few benefits:
This paradigm can also be extended in a couple of different ways:
At Transcend, our business is protecting users’ personal data. This requires careful thought to go into our application code, infrastructure, and end-to-end encryption pipelines. Keeping sensitive information out of logs is just one part of this process, but using Secret<T> has made this one part much easier.