From a leaked internal document at Facebook, we see the clear struggle with governing personal data:
We do not have an adequate level of control and explainability over how our systems use data, and thus we can’t confidently make controlled policy changes or external commitments such as “we will not use X data for Y purpose.” And yet, this is exactly what regulators expect us to do, increasing our risk of mistakes and misrepresentation.
As Facebook faces years of legal proceedings stemming from its use of personal data, it becomes increasingly important for all companies to be able to accurately identify and govern all personal data in their systems.
At Transcend, we’ve recently open sourced two infrastructure as code tools that make both identification and governance of personal data in your engineering systems much easier.
About the new tools
The Transcend Terraform Provider
We recently announced the release of our official Terraform Provider. This provider lets you declaratively create and update:
Data silos (integrations with third parties like Stripe/Datadog/Salesforce or internal databases),
Datapoints (classifications of a set of personal data that exists under some data silo),
API keys (that can be scoped to individual data silos if needed),
Enrichers (a concept for connecting user identifiers like phone numbers, user IDs, and email addresses)
And more!
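To give a taste of the declarative style, here is a minimal sketch pairing a data silo with an API key scoped to it. The resource names follow the provider, but the exact field names here are assumptions for illustration; verify them against the provider documentation before use.

```hcl
# Sketch only: a Stripe data silo plus an API key scoped to just that silo.
# Field names are assumptions; check the Transcend provider docs.
resource "transcend_data_silo" "stripe" {
  type  = "stripe"
  title = "Stripe"
}

resource "transcend_api_key" "stripe_key" {
  title      = "stripe-scoped-key"
  data_silos = [transcend_data_silo.stripe.id]
}
```

Scoping the key to a single silo means that if the credential ever leaks, its blast radius is limited to that one integration.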
One of the great things about the Terraform provider is that it allows you to integrate Transcend alongside any other tools that have Terraform providers.
Here’s an example snippet that creates an IAM Role in an AWS account, giving Transcend access to scan the account for the personal data it might contain, and then uses the Transcend provider to create and connect an AWS data silo.
```hcl
resource "transcend_data_silo" "aws" {
  type        = "amazonWebServices"
  description = "Amazon Web Services (AWS) provides information technology infrastructure services to businesses in the form of web services."

  # Normally, Data Silos are connected in this resource. But for AWS, we want to delay connecting until after
  # we create the IAM Role, which must use the `aws_external_id` output from this resource. So instead, we set
  # `skip_connecting` to `true` here and use a `transcend_data_silo_connection` resource below
  skip_connecting = true
  lifecycle { ignore_changes = [plaintext_context] }
}

resource "aws_iam_role" "iam_role" {
  name        = "TranscendAWSIntegrationRole2"
  description = "Policy to allow Transcend access to this AWS Account"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        // 829095311197 is the AWS Organization for Transcend that will try to assume role into your organization
        Principal = { AWS = "arn:aws:iam::829095311197:root" }
        Condition = { StringEquals = { "sts:ExternalId" : transcend_data_silo.aws.aws_external_id } }
      },
    ]
  })

  inline_policy {
    name = "TranscendPermissions"
    policy = jsonencode({
      Version = "2012-10-17"
      Statement = [
        {
          Action = [
            "dynamodb:ListTables",
            "dynamodb:DescribeTable",
            "rds:DescribeDBInstances",
            "s3:ListAllMyBuckets"
          ]
          Effect   = "Allow"
          Resource = "*"
        },
      ]
    })
  }
}

# Give AWS time to become consistent with the new IAM Role permissions
resource "time_sleep" "pause" {
  depends_on      = [aws_iam_role.iam_role]
  create_duration = "10s"
}

data "aws_caller_identity" "current" {}

resource "transcend_data_silo_connection" "connection" {
  data_silo_id = transcend_data_silo.aws.id

  plaintext_context {
    name  = "role"
    value = aws_iam_role.iam_role.name
  }

  plaintext_context {
    name  = "accountId"
    value = data.aws_caller_identity.current.account_id
  }

  depends_on = [time_sleep.pause]
}
```
This can enable all sorts of cool integrations like connecting Transcend and Datadog to remove logs relating to a particular user, securely connecting with clouds like AWS, Google Cloud, or Azure, or creating a database and then immediately connecting it to Transcend.
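As an illustration of the Datadog case, a connection might look something like the sketch below. The `type` value and context field name are assumptions made for illustration; the real integration may expect different fields, so check the integration docs before relying on this.

```hcl
# Illustrative only: connect a Datadog data silo so Transcend can erase
# logs relating to a data subject. The `type` and `secret_context` field
# names here are assumptions, not the confirmed integration schema.
variable "datadog_api_key" {
  sensitive = true
}

resource "transcend_data_silo" "datadog" {
  type        = "datadog"
  description = "Log management, used to erase logs relating to a data subject"

  secret_context {
    name  = "apiKey"
    value = var.datadog_api_key
  }
}
```

Because the API key lives in Terraform state and variables rather than in a dashboard, it can be sourced from the same secrets machinery you already use for other providers.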
The CLI tool
Our second infrastructure as code tool is the `@transcend-io/cli` NPM package that can be used as a standalone binary.
Similar to the Terraform provider, the Command Line Interface (CLI) is an infrastructure as code tool aimed at making discovering and governing data easier. Its schema is very similar to the Terraform API, but there are three major reasons why you might want to use the CLI instead:
If your organization does not have established practices around Terraform and how to deploy it on CI, the CLI provides a lower barrier of entry.
If you are planning on auto-generating the config to upload to Transcend (which we’ll show an example of later), then it may be more natural to output YAML (which the CLI ingests) than HCL from Terraform.
The CLI comes with options for generating configuration from your existing Transcend account that are not present in the Terraform provider. The Terraform provider would require using `terraform import` to bring already existing infrastructure into your code.
You are also welcome to use both tools in conjunction with one another. We have seen success when using Terraform to manually configure systems and to securely specify API credentials while using the CLI to upload auto-generated datapoint schemas. But feel free to use whichever tool or combination fits your business needs best.
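For a sense of what the CLI consumes, here is a sketch of a minimal YAML file. The datapoint and field shape mirrors the real example shown later in this post, but the top-level keys are assumptions, so check the `@transcend-io/cli` README for the exact schema.

```yaml
# A minimal transcend.yml sketch. The datapoint/field shape follows the
# example later in this post; the top-level keys are assumptions.
data-silos:
  - title: Internal PostgreSQL Database
    integrationName: database
    datapoints:
      - title: Users
        key: users
        fields:
          - key: email
            title: email
            categories:
              - name: CONTACT
                category: CONTACT
```

A file like this, whether hand-written or generated during a build, is what the CLI syncs up to your Transcend account with an API key.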
Data Mapping with Transcend
You can’t manage what you can’t see, so step one in setting up a privacy program is often determining where personal data lives inside your systems. If, like at most companies, the personal data you collect changes over time, data mapping is not an exercise you can complete once and then forget about.
Robust data mapping systems must enable you to retroactively find personal data in your legacy and existing systems, proactively manage new sources of personal data as you add them, and continuously scan for personal data to prevent missing personal data that may be added in the future.
Retroactively finding and classifying personal data
At Transcend, we understand that not every software project starts out designing for privacy. Companies that have been around for a while likely have systems containing personal data that predate the current laws and regulations around how personal data must be handled. And as new laws come into effect in the future, those systems may need updates again.
It’s often infeasible to ask a company to go back and hand label where all of their personal data lives.
Did the person who created a central system leave your company and nobody is quite sure how “that server over there” works or what data it contains?
Do you have thousands of databases with millions of tables and tens of engineers?
Is there a disconnect between the people you want to be responsible for labeling data (such as legal) and the people who know how to find that data (such as engineers)?
Enter our Data Silo Plugins. These come in a variety of forms to help you sort through your old systems.
Silo Discovery Plugins
We have Silo Discovery Plugins that can find and suggest data silos your org uses. Examples include scanning an AWS account for the databases it uses, an SSO tool such as Okta for all the applications your employees can access, or Salesforce for places you might keep personal data on prospects and leads.

Using our Terraform provider, adding a data silo plugin is as easy as defining when you want the scans to start and how often you want them to occur going forward.
```hcl
resource "transcend_data_silo" "aws" {
  type        = "amazonWebServices"
  description = "Amazon Web Services (AWS) provides information technology infrastructure services to businesses in the form of web services."

  plugin_configuration {
    enabled                    = true
    type                       = "DATA_SILO_DISCOVERY"
    schedule_frequency_minutes = 1440 # 1 day
    schedule_start_at          = "2022-09-06T17:51:13.000Z"
    schedule_now               = false
  }

  # ...other fields...
}

# ...other resources...
```
In this example, we set up an AWS data silo to scan for databases, S3 buckets, and other resources that often contain personal data.
Silo Discovery via dependency files
Another way to discover data silos to connect is by scanning your codebase for external SDKs. Transcend can then map those SDKs to data silos and suggest them to you to add. Currently we support scanning for new data silos in JavaScript, Python, Gradle, and CocoaPods projects.
To get started, you'll need to add a data silo for the corresponding project type with the Silo Discovery Plugin enabled. For example, if you want to scan a JavaScript project, add a “JavaScript package.json” data silo. You can do this in the Transcend admin dashboard (or via the CLI tooling or Terraform).
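In Terraform, that might look like the following sketch, mirroring the plugin configuration shown earlier. The `type` value for a package.json silo is an assumption; check the integration catalog for the exact integration name.

```hcl
# Hypothetical: a JavaScript package.json silo with Silo Discovery enabled.
# The `type` value below is an assumed integration name.
resource "transcend_data_silo" "js_sdk_scanner" {
  type = "packageJson"

  plugin_configuration {
    enabled                    = true
    type                       = "DATA_SILO_DISCOVERY"
    schedule_frequency_minutes = 1440 # 1 day
    schedule_start_at          = "2022-09-06T17:51:13.000Z"
    schedule_now               = false
  }
}
```

This resource's `id` attribute is the dataSiloId that the scan command needs.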
Then, you'll need to grab that dataSiloId and a Transcend API key and pass them to the CLI. Using JavaScript package.json as an example:
```sh
# Scan a javascript project (package.json files) to look for new data silos
yarn tr-discover-silos --scanPath=./myJavascriptProject --auth={{api_key}} --dataSiloId={{dataSiloId}}
```
This call will look for all the package.json files in the scan path ./myJavascriptProject, parse each of the dependencies into their individual package names, and send them to our Transcend backend for classification.
These classifications can then be viewed in the data silo recommendations triage tab, just as with other Silo Discovery mechanisms. The process is the same for scanning requirements.txt, Podfile, and build.gradle files.
Datapoint Discovery Plugins
We also have Datapoint Discovery Plugins that can go into your data stores and extract your schemas. This works with databases like BigQuery, MongoDB, DynamoDB, Snowflake, PostgreSQL, MySQL, Redshift, and many more, as well as data stores such as Google Forms, Amazon S3, and Salesforce.

Adding a Datapoint Discovery Plugin is very similar to adding a Silo Plugin, just using a type of `DATA_POINT_DISCOVERY` instead:
```hcl
resource "transcend_data_silo" "aws" {
  type = "amazonS3"

  plugin_configuration {
    enabled                    = true
    type                       = "DATA_POINT_DISCOVERY"
    schedule_frequency_minutes = 1440 # 1 day
    schedule_start_at          = "2022-09-06T17:51:13.000Z"
    schedule_now               = false
  }

  # ...other fields...
}

# ...other resources...
```
In this example, we set up AWS to scan for personal data in Amazon S3.
Datapoint Classification Plugins
Lastly, we support Datapoint Classification Plugins that can sample the data in your datastores. This is especially powerful when combined with the Datapoint Discovery Plugins that find the schemas of your internal systems.

The results of the classification of datapoints in a Redshift database
In the above example, a Redshift database is being scanned. Each column under each table is sampled and our classifier attempts to classify the data category that each column belongs to. Each classification comes with a confidence rating to make triaging the findings easier.
Classification with complete security
One great part of this classification process is the security model. By using our end-to-end encryption gateway, Sombra, Transcend never needs to see the sample data in any of your systems. Likewise, Transcend never has direct access to your databases, nor any means of connecting to them.
All communication from Transcend to your database or other internal systems happens through Sombra, and all data that flows from Sombra back to Transcend will not contain any personal data that we would have access to. If personal data is ever returned, it is encrypted by the encryption gateway with keys from your Key Management System, which Transcend does not have access to.
Here’s a complete example, using Terraform, of setting up a PostgreSQL database using Amazon RDS in a private subnet of a VPC and connecting it to Transcend:
```hcl
locals {
  subdomain = "https-test"
  # You should pick a hosted zone that is in your AWS Account
  parent_domain = "sombra.dev.trancsend.com"
  # Org URI found on https://app.transcend.io/infrastructure/sombra
  organization_uri = "wizard"
}

######################################################################################
# Create a private network to put our database in with the sombra encryption gateway #
######################################################################################

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 2.18.0"

  name = "sombra-example-https-test-vpc"
  cidr = "10.0.0.0/16"
  azs  = ["us-east-1a", "us-east-1b"]

  private_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]
  public_subnets   = ["10.0.201.0/24", "10.0.202.0/24"]
  database_subnets = ["10.0.103.0/24", "10.0.104.0/24"]

  enable_nat_gateway                 = true
  enable_dns_hostnames               = true
  enable_dns_support                 = true
  create_database_subnet_group       = true
  create_database_subnet_route_table = true
}

#######################################################################
# Deploy a Sombra encryption gateway and register it to a domain name #
#######################################################################

data "aws_route53_zone" "this" {
  name = local.parent_domain
}

module "acm" {
  source      = "terraform-aws-modules/acm/aws"
  version     = "~> 2.0"
  zone_id     = data.aws_route53_zone.this.id
  domain_name = "${local.subdomain}.${local.parent_domain}"
}

variable "tls_cert" {}
variable "tls_key" {}
variable "jwt_ecdsa_key" {}
variable "internal_key_hash" {}

module "sombra" {
  source  = "transcend-io/sombra/aws"
  version = "1.4.1"

  # General Settings
  deploy_env       = "example"
  project_id       = "example-https"
  organization_uri = local.organization_uri

  # This should not be done in production, but allows testing the external endpoints during development
  transcend_backend_ips = ["0.0.0.0/0"]

  # VPC settings
  vpc_id                      = module.vpc.vpc_id
  public_subnet_ids           = module.vpc.public_subnets
  private_subnet_ids          = module.vpc.private_subnets
  private_subnets_cidr_blocks = module.vpc.private_subnets_cidr_blocks
  aws_region                  = "us-east-1"
  use_private_load_balancer   = false

  # DNS Settings
  subdomain       = local.subdomain
  root_domain     = local.parent_domain
  zone_id         = data.aws_route53_zone.this.id
  certificate_arn = module.acm.this_acm_certificate_arn

  # App settings
  data_subject_auth_methods = ["transcend", "session"]
  employee_auth_methods     = ["transcend", "session"]

  # HTTPS Configuration
  desired_count = 1
  tls_config = {
    passphrase = "unsecurePasswordAsAnExample"
    cert       = var.tls_cert
    key        = var.tls_key
  }
  transcend_backend_url = "https://api.dev.trancsend.com:443"

  # The root secrets that you should generate yourself and keep secret
  # See https://docs.transcend.io/docs/security/end-to-end-encryption/deploying-sombra#6.-cycle-your-keys for information on how to generate these values
  jwt_ecdsa_key     = var.jwt_ecdsa_key
  internal_key_hash = var.internal_key_hash

  tags = {}
}

######################################################################
# Create a security group that allows Sombra to talk to the database #
######################################################################

module "security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "~> 4.0"

  name   = "database-ingress"
  vpc_id = module.vpc.vpc_id

  # ingress
  ingress_with_cidr_blocks = [
    {
      from_port   = 5432
      to_port     = 5432
      protocol    = "tcp"
      description = "PostgreSQL access from private subnets within VPC (which includes sombra)"
      cidr_blocks = join(",", module.vpc.private_subnets_cidr_blocks)
    },
  ]
}

###################################################
# Create a sample postgres database using AWS RDS #
###################################################

module "postgresDb" {
  source  = "terraform-aws-modules/rds/aws"
  version = "~> 5.0"

  allocated_storage    = 5
  engine               = "postgres"
  engine_version       = "11.14"
  family               = "postgres11"
  major_engine_version = "11"
  instance_class       = "db.t3.micro"

  multi_az               = true
  db_subnet_group_name   = module.vpc.database_subnet_group
  vpc_security_group_ids = [module.security_group.security_group_id]
  skip_final_snapshot    = true
  deletion_protection    = false
  apply_immediately      = true

  identifier = "some-postgres-db"
  username   = "someUsername"
  db_name    = "somePostgresDb"
}

#######################################################
# As Sombra can talk to the database, we can create a #
# data silo using the private connection information. #
#######################################################

resource "transcend_data_silo" "database" {
  type = "database"

  plugin_configuration {
    enabled                    = true
    type                       = "DATA_POINT_DISCOVERY"
    schedule_frequency_minutes = 1440 # 1 day
    schedule_start_at          = "2022-09-06T17:51:13.000Z"
    schedule_now               = false
  }

  secret_context {
    name  = "driver"
    value = "PostgreSQL Unicode"
  }
  secret_context {
    name  = "connectionString"
    value = join(";", [
      "Server=${module.postgresDb.db_instance_address}",
      "Database=${module.postgresDb.db_instance_name}",
      "UID=${module.postgresDb.db_instance_username}",
      "PWD=${module.postgresDb.db_instance_password}",
      "Port=${module.postgresDb.db_instance_port}"
    ])
  }
}
```
Notice that the database's security group is set up such that it can only be talked to from within the Virtual Private Cloud. Also note that the Sombra encryption gateway is given permission to talk to the database, but no external Transcend system is.
Proactively managing new sources of personal data
Finding user data in existing systems is cool. But do ya know what’s even cooler? Proactively labeling your data classifications and purposes as you add new features and syncing that data to Transcend. You can eliminate the need to triage our classifications by just telling us what the classifications are.
This can be done with both Terraform and the CLI, but this is where the CLI really shines. Our customers like Clubhouse have even created database client libraries where they can encode privacy information directly into their schema definitions. During their deploys, they extract this data into a YAML file that the CLI syncs to Transcend.

Here’s an example of a change in our codebase where we use an extension of Sequelize to define some fields on an email-related model. As the `from` and `to` fields of an email may contain personal email addresses, we labeled this data directly in our schema.
During a deploy, we extract the metadata about each database model and create a `transcend.yml` file containing a data point declared like:
```yaml
- title: Communication
  key: dsr-models.communication
  description: A communication message sent between the organization and data subject
  fields:
    # ...other fields...
    - key: from
      title: from
      categories:
        - name: CONTACT
          category: CONTACT
    - key: to
      title: to
      categories:
        - name: CONTACT
          category: CONTACT
```
The CI job then syncs this data to Transcend, where there will be a data silo for the database with datapoints for the `from` and `to` columns listed as contact information.
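As a sketch of that CI job, the sync step could look like the fragment below. GitHub Actions syntax is used here purely as an example, and the `tr-push` command name and flag should be verified against the `@transcend-io/cli` README.

```yaml
# Hypothetical CI step: push the generated transcend.yml to Transcend.
# The command name and flag are assumptions; check the CLI's README.
- name: Sync data map to Transcend
  run: yarn tr-push --auth=$TRANSCEND_API_KEY
  env:
    TRANSCEND_API_KEY: ${{ secrets.TRANSCEND_API_KEY }}
```

Keeping the API key in your CI secret store means the data map stays in sync on every deploy without credentials ever living in the repository.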
Continuously scanning for personal data sources

In the “Retroactively Finding and Classifying Personal Data” section above, we showed how Transcend makes it easy to find and classify personal data in your existing systems. But what about going into the future?
What about when your Go To Market team adds a new tracking tool “just to test it out” and forgets to let your security owners know?
Or what about if an engineering project slips through the cracks and doesn’t proactively note all personal data it stores?
Or what if your ideal process is that engineering builds the tooling and that legal uses the scanning/classification tooling to label the data before the project fully launches (as opposed to engineering labeling the data as described in the previous section)?
Because our scanners and classifiers can run on a schedule, it’s easy to stay continuously compliant. Let the Okta plugin discover that new tracking tool. Let the AWS plugin notify you when a new database is created. Let a database plugin scan the databases for any new personal data that might appear.
Labeling data shouldn’t be a once-per-year affair, and it definitely shouldn’t be an “I’ll do it when we’re getting audited for privacy violations” affair. With Transcend, you can rest easy knowing that your data map will always be up to date.
About Transcend
Transcend is the company that makes it easy to encode privacy across your entire tech stack. Our mission is to make it simple for companies to give users control of their data.
Automate data subject request workflows with Privacy Requests, ensure nothing is tracked without user consent using Transcend Consent, or discover data silos and auto-generate reports with Data Mapping.