At a glance
- If you’ve landed on this page, you’re probably asking: what is data mapping? This guide is here to help.
- Data mapping is the process of matching data fields in one database to corresponding data fields in another—helping the two databases communicate and share data more effectively.
- Data mapping has many applications, but it is most commonly used for data integration, transformation, or migration. It’s also a foundational part of complying with modern privacy laws such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).
- You can use a template to manually create a data map (we’ll cover those steps, plus best practices below). However, if you’re dealing with large data sets or multiple systems, using an automated tool is recommended.
Table of contents
- Data mapping definition
- What’s the purpose of data mapping?
- How to create a data mapping template
- Data mapping techniques
- How to manually create a data map
- Data mapping best practices
Data mapping definition
Data mapping is the process of matching data fields in one database to corresponding data fields in another—helping the two databases communicate and share data more effectively.
Imagine a customer profile for Jane Elliot appears in two different databases, but the data analyst wants her profile counted only once. With data mapping, the analyst can link the two Jane Elliot records, ensuring she isn’t counted twice in an analysis or query.
Though this example is fairly simple, it reflects data mapping’s basic purpose: building connections to improve data quality, standardize data across different systems, and support better analysis down the line.
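The Jane Elliot scenario can be sketched in a few lines of Python. This is a minimal illustration, not a production matching routine: the two row lists, their field names (`email`, `Email`, `full_name`, `name`), and the choice of email as the linking key are all assumptions made for the example.

```python
# Hypothetical records from two systems; field names are illustrative only.
crm_rows = [
    {"email": "jane.elliot@example.com", "full_name": "Jane Elliot"},
    {"email": "sam.ortiz@example.com", "full_name": "Sam Ortiz"},
]
billing_rows = [
    {"Email": "Jane.Elliot@Example.com ", "name": "Jane Elliot"},
    {"Email": "li.wei@example.com", "name": "Li Wei"},
]


def normalize_email(value: str) -> str:
    """Trim whitespace and lowercase so equivalent emails compare equal."""
    return value.strip().lower()


def unique_customers(crm, billing):
    """Map both sources onto one shared key and keep one profile per key."""
    profiles = {}
    for row in crm:
        profiles[normalize_email(row["email"])] = row["full_name"]
    for row in billing:
        # setdefault keeps the profile already stored for a key, if any.
        profiles.setdefault(normalize_email(row["Email"]), row["name"])
    return profiles


customers = unique_customers(crm_rows, billing_rows)
print(len(customers))  # 3 unique customers, even though the sources hold 4 rows
```

The mapping here is the pairing of `email` in one source with `Email` in the other, plus the normalization rule that makes the two values comparable.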
According to a 2020 SaaS Trends report, mid-size companies were using an average of 288 different software-as-a-service (SaaS) apps that year.
At that scale, an automated data mapping tool is the recommended choice for most large applications.
It is theoretically possible to use a data mapping template or other manual means to create a full data map, but that exercise would get complicated quickly.
Imagine trying to translate constantly between 288 different languages, with thousands of people speaking at once and no tools or automated processes to back you up. Overwhelming at best, impossible at worst.
Good data mapping software supplies that automation, which is why it is recommended in most cases. That said, your purpose or end goal will ultimately determine which data mapping tools you actually need.
What’s the purpose of data mapping?
The overarching goal of data mapping is to bring more structure and cohesion to your company’s data.
In a more focused context, such as data privacy, data mapping is foundational to complying with modern privacy laws like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).
By creating and maintaining a unified data inventory, including the flow and structure between different databases, systems, and SaaS tools, a company can:
- Track down data for data subject access requests (DSARs), i.e., when a consumer asks to see all the data a company has collected about them.
- Identify high-risk data processing activities, which under GDPR Article 35 require a data protection impact assessment.
- Create and maintain records of processing activities, as required by GDPR Article 30.
However, data mapping does have a wide range of applications outside of data privacy compliance. Remember, data mapping helps different data systems communicate, passing data back and forth in a shared language created through the data mapping process.
In practical terms, this shared understanding can take many forms; the most common applications for data mapping are data transformation, migration, and integration.
Data transformation is the process of converting data from one format or structure to another, often from an unstructured or loosely structured form (like text or media files) into a more usable format (like CSV). It is the ‘transform’ step of extract/transform/load (ETL) pipelines and usually involves some degree of data cleaning, validation, or enrichment.
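As a rough sketch of data transformation, the snippet below turns loosely structured text lines into CSV and cleans up a date along the way. The input layout (`name=...; signup=MM/DD/YYYY`) and the output column names are invented for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw input: one loosely structured text line per customer.
raw_lines = [
    "name=Jane Elliot; signup=03/15/2021",
    "name=Sam Ortiz; signup=11/02/2020",
]


def parse_line(line: str) -> dict:
    """Split 'key=value' pairs and convert the signup date to ISO 8601."""
    pairs = (part.split("=", 1) for part in line.split(";"))
    record = {key.strip(): value.strip() for key, value in pairs}
    signup = datetime.strptime(record["signup"], "%m/%d/%Y")
    return {"name": record["name"], "signup_date": signup.date().isoformat()}


def to_csv(lines) -> str:
    """Write the parsed records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["name", "signup_date"], lineterminator="\n"
    )
    writer.writeheader()
    for line in lines:
        writer.writerow(parse_line(line))
    return buffer.getvalue()


csv_text = to_csv(raw_lines)
print(csv_text)
```

A line that is missing a field or carries a malformed date raises an exception here, which doubles as a basic validation step.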
Data migration, moving data from one location to another, is simple in principle but often complex in practice. The complexity stems from the fact that moving data between locations often involves a change in data format. Examples of data migration include moving from an on-premises data center to the cloud, moving data between hardware systems, combining data systems, and migrating data to a new software or SaaS tool.
Data integration is the act of unifying different data sources into one central location. This process usually involves some degree of data transformation, as creating a unified view of data often requires data cleaning, duplicate removal, and/or putting all data into a cohesive, shared format. Similar to data transformation, one of the main goals of data integration is to enable sound analysis for informed decision making.
Keep in mind that, though these data mapping use cases are defined individually, they have a lot of overlap when actually applied. Effective data migration requires some level of data integration and, as mentioned, effective data integration often requires some degree of data transformation.
Remember, data mapping is about building effective communication channels between different data sets, and that communication can take many forms and serve many purposes.
How to create a data mapping template
Reading about data mapping isn’t necessarily the best way to understand the concept. That’s why we included a few things to consider when creating your own data mapping template.
While you’re creating your template, keep in mind its potential long-term use and document accordingly. Include data management policies (guidelines for tracking the data lifecycle across ingestion, transformation, analysis, and so on) within your template, as this will help you and your team maintain a healthy data map in the future.
With this in mind, your data mapping template should include:
- Name of the source database (where the data is coming from)
- Name of the target database (where the data is going)
- Which columns or values you’re mapping
- The intended format for the data post-transformation
- Triggers for the data integration or transfer
- Documentation on any automation (when and how it will run, intended outcome, possible failure points)
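One way to make the checklist above concrete is to hold the template in a small data structure. The sketch below is an assumption about how such a template could be modeled, not a standard format; every class and field name is hypothetical. It requires Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class FieldMapping:
    """One column-level mapping and the format it should end up in."""
    source_column: str
    target_column: str
    target_format: str


@dataclass
class MappingTemplate:
    """Hypothetical template holding the items from the checklist."""
    source_database: str
    target_database: str
    fields: list[FieldMapping] = field(default_factory=list)
    trigger: str = ""
    automation_notes: str = ""

    def missing_items(self) -> list[str]:
        """Return the names of checklist items that are still empty."""
        checks = {
            "source_database": self.source_database,
            "target_database": self.target_database,
            "fields": self.fields,
            "trigger": self.trigger,
            "automation_notes": self.automation_notes,
        }
        return [name for name, value in checks.items() if not value]


template = MappingTemplate(
    source_database="crm",
    target_database="warehouse",
    fields=[FieldMapping("FName", "first_name", "text")],
    trigger="nightly batch",
)
print(template.missing_items())  # ['automation_notes']
```

The `missing_items` check is a lightweight reminder that the documentation part of the template is as much a requirement as the column mappings.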
For data mapping at scale, using an automated tool or platform is still recommended. However, creating your own template can be a great starting point if you’re looking to get a better grasp of the concept, create a data management framework for your team, or map smaller quantities of data.
Data mapping techniques
Manual data mapping
- The good: Fully customizable
- The bad: Time and resource intensive, relies entirely on code, often requires advanced skills and specialized knowledge
Manual data mapping means using code (and a talented developer) to connect the data fields between different sources. The process often involves hand-written ETL scripts in languages like C++, SQL, or Java.
Though this approach gives the data mapper ultimate control over the process and final product, it can be quite unwieldy given the quantity of data and data systems used by most businesses. Manual data mapping is really only a good option for smaller databases or one-time processes.
Semi-automated data mapping
- The good: Balances scalability, efficiency, and flexibility
- The bad: Manual parts of the process can be time and resource intensive, requires coding knowledge and an understanding of how to navigate between the automated and manual components
Sometimes called schema mapping, semi-automated data mapping blends automation processes with manual intervention. In this data mapping approach, a developer will use software to define relationships between data fields that are similar, but not the same—eventually creating schemas.
For example, they might match ‘Social’ with ‘SSN,’ or ‘FirstName’ with ‘FName’ and ‘first_name.’ Depending on the tool they’re using, there are various methods to complete this process including drag-and-drop, drawing lines, or smart clustering.
After defining the schemas, a script or other code is run to complete the actual data conversion. This portion of the process usually requires coding knowledge, as the script is generally built with C#, Java, or C++.
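The two stages described above, defining the schema and then running a conversion script, might look roughly like this in Python. The `SCHEMA` dictionary stands in for whatever a mapping tool would export. Its structure and the case-insensitive matching are assumptions for illustration. The sample record uses placeholder values.

```python
# Stage 1 (normally done in a mapping tool): canonical name -> known variants.
SCHEMA = {
    "ssn": {"Social", "SSN"},
    "first_name": {"FirstName", "FName", "first_name"},
}


def build_lookup(schema):
    """Invert the schema into a case-insensitive variant -> canonical lookup."""
    lookup = {}
    for canonical, variants in schema.items():
        for variant in variants:
            lookup[variant.lower()] = canonical
    return lookup


def convert(record, lookup):
    """Stage 2: rename known columns and report the ones with no mapping."""
    converted, unmapped = {}, []
    for column, value in record.items():
        canonical = lookup.get(column.lower())
        if canonical is None:
            unmapped.append(column)
        else:
            converted[canonical] = value
    return converted, unmapped


lookup = build_lookup(SCHEMA)
converted, unmapped = convert(
    {"FName": "Jane", "Social": "000-00-0000", "Nickname": "JE"}, lookup
)
print(converted)  # {'first_name': 'Jane', 'ssn': '000-00-0000'}
print(unmapped)   # ['Nickname']
```

Returning the unmapped columns instead of silently dropping them gives the developer a list of fields that still need a manual decision.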
Automated data mapping
- The good: Faster outcomes, least coding knowledge required
- The bad: Some automated tools can be costly and often require platform-specific training
Automated data mapping removes the need for code, allowing more people to engage with and update a company’s data map. Depending on the solution, automated data mapping allows for drag-and-drop mapping, pre-built data transformation, and even representations of data flow.
Because automated platforms don’t require coding knowledge, your no-code team members can manipulate the data as necessary and steward their own analysis without making constant requests to your data analysts.
Though these tools have their own challenges, the time saved and the lower barrier to entry for non-technical folks make them a worthwhile investment for many companies.
How to manually create a data map
Though the process itself is more involved in practice, there are five basic data mapping steps.
1. Determine which data fields to include in your map
What you’re trying to achieve with your data map will be the best guide for deciding what data fields need to be included. Questions you might ask include:
- What data needs to be combined?
- How many data sources are there?
- How often will this process need to be repeated?
- What is your target location?
- What format(s) need to be accounted for?
2. Determine standard naming conventions
Once you’ve determined which data fields you plan to map, identify and document the data format for each. Then, determine the target data format. For instance, if you’re integrating a list of clients and your source data has a ‘First Last’ format, but your target database takes a ‘Last, First’ format—you’ll need to identify this upfront so as to set conditions for the data’s final format.
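The ‘First Last’ to ‘Last, First’ rule can be written down as a small function. This sketch assumes the last whitespace-separated token is the surname, which will not hold for every name; treat it as a starting point for your own rule.

```python
def to_last_first(name: str) -> str:
    """Convert 'First Last' to 'Last, First'.

    Assumption: the final token is the surname. Multi-word surnames
    would need a different rule.
    """
    parts = name.strip().split()
    if len(parts) < 2:
        raise ValueError(f"expected 'First Last', got {name!r}")
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


print(to_last_first("Jane Elliot"))  # Elliot, Jane
```

Raising an error on names that don’t fit the expected format surfaces the exceptions early, rather than writing malformed values into the target database.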
3. Define schema logic or transformation rules
How this step plays out depends almost entirely on your data mapping approach. If you’re mapping data manually, these rules will be created by the developer writing the code. With a semi-automated approach, you’ll define the schema and data connections, and a developer may write code for any transformations. With automated platforms, the software will do much of the work for you.
4. Test the logic on a small sample
Before you go all in, it’s a good idea to test your logic on a smaller data set to ensure everything is working the way you intended. Mistakes in your logic or transformation rules could introduce errors into the final data map or the dataset itself. Especially if you’re dealing with a particularly large dataset, it never hurts to err on the side of caution: measure twice, cut once isn’t just for DIY.
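A dry run on a sample could be sketched as follows. The rule under test, the validity check, and the sample rows are all hypothetical; the fixed `seed` is only there so a failing sample can be reproduced.

```python
import random


def to_last_first(name: str) -> str:
    """Example rule under test: 'First Last' -> 'Last, First'."""
    parts = name.split()
    if len(parts) < 2:
        raise ValueError(f"cannot split {name!r} into first and last name")
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def looks_valid(result: str) -> bool:
    """Cheap sanity check: exactly one 'Last, First' separator."""
    return result.count(", ") == 1


def dry_run(rows, rule, check, sample_size=3, seed=0):
    """Apply the rule to a random sample and collect every failure."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    failures = []
    for row in sample:
        try:
            result = rule(row)
        except ValueError as exc:
            failures.append((row, str(exc)))
            continue
        if not check(result):
            failures.append((row, f"unexpected output {result!r}"))
    return sample, failures


rows = ["Jane Elliot", "Sam Ortiz", "Cher", "Li Wei", "Ana Maria Silva"]
sample, failures = dry_run(rows, to_last_first, looks_valid, sample_size=3)
for row, reason in failures:
    print(f"{row!r}: {reason}")
```

A sample can miss rare edge cases, so an empty failure list is encouraging rather than conclusive. Increasing `sample_size` trades run time for coverage.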
5. It’s go time—complete your data map
Once you’ve defined your data fields, determined naming conventions, defined your logic or rules, and checked for bugs, you’re ready to go. Migrate, integrate, or transform your data to your heart’s content.
Data mapping best practices
Though automation can do most of the heavy lifting, even someone using the best data mapping software available should consider evangelizing data mapping best practices throughout their organization. Working towards standard operating procedures and strict data hygiene will make your day-to-day work easier and benefit your organization in the long term.
Document, document, document
Though it can feel tedious to document your tools and procedures, it will pay significant dividends in the long run. Small changes in a tool’s configuration, how a data field is named, or an updated automation schedule can cause big issues if everyone isn’t on the same page. Document your tools and processes and make sure to regularly socialize any changes with relevant teams.
Standardize naming conventions
This practice overlaps with documentation, admittedly, but it’s important enough to merit its own callout. Create a readily accessible document outlining a clear approach to naming conventions and remind people of its existence whenever you can. The ideal is that eventually these conventions become rote, but it never hurts to have a universal reference point.
Shore up data security
As high-profile data breaches continue to make headlines, ensuring data security is a best practice for any organization—regardless of the data mapping activities you engage in. That said, data integration in particular—as it pulls data from multiple sources into a single master location and often involves a greater level of access for more people—can create vulnerabilities.
Taking a least privilege approach, limiting access to only those who need it and only to the level they need, is a best practice no matter the context. For a stronger security posture, the specifics depend on context, but strategies to consider include data encryption, tokenization, and masking.
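Two of these strategies are easy to illustrate with Python’s standard library. The sketch below shows simple masking and a keyed hash (HMAC-SHA256) used as a stand-in for tokenization. Production tokenization is often handled by a dedicated service or token vault, so the `tokenize` helper and the hard-coded demo key are assumptions for illustration only.

```python
import hashlib
import hmac


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide all but the last `visible` characters of a value."""
    hidden = max(len(value) - max(visible, 0), 0)
    return fill * hidden + value[hidden:]


def tokenize(value: str, secret_key: bytes) -> str:
    """Replace a value with a keyed hash so records can still be joined.

    The same input and key always give the same token, but the original
    value cannot be read back from the token.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


demo_key = b"demo-key-do-not-use-in-production"  # placeholder, not a real secret
print(mask("123-45-6789"))  # *******6789
print(tokenize("123-45-6789", demo_key) == tokenize("123-45-6789", demo_key))  # True
```

Because the token is deterministic, two systems that share the key can still match records on the tokenized field without either one storing the raw value.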
Conduct regular maintenance
Data mapping, like any machine with many moving parts, requires regularly scheduled maintenance. This maintenance can take the form of debugging, simplifying your automation, or making code tweaks to better suit your business needs.
As a dynamic, ongoing process, data integration in particular can require regular updates to ensure everything runs as it should. Data sources can undergo changes that disrupt the defined mapping paths, so it’s critical that the data administrator monitor the system in order to identify and repair any broken pathways.
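A basic monitoring check for broken pathways compares the columns a mapping expects with the columns a source currently exposes. The mapping and column names below are hypothetical. In practice the current columns would be read from the live system rather than hard-coded. The type hints require Python 3.9 or later.

```python
def check_mapping(mapping: dict[str, str], source_columns: set[str]) -> dict[str, list[str]]:
    """Compare a source-to-target mapping against the source's current columns."""
    # Mapped columns that no longer exist in the source: broken pathways.
    broken = sorted(col for col in mapping if col not in source_columns)
    # Source columns with no mapping yet: candidates for a new pathway.
    unmapped = sorted(col for col in source_columns if col not in mapping)
    return {"broken": broken, "unmapped": unmapped}


mapping = {"FName": "first_name", "Social": "ssn"}
current_columns = {"FirstName", "Social"}  # 'FName' was renamed upstream

report = check_mapping(mapping, current_columns)
print(report)  # {'broken': ['FName'], 'unmapped': ['FirstName']}
```

Running a check like this on a schedule turns a silent failure, such as a renamed column, into a report the data administrator can act on.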
Depending on the size of the system in question, this process can become time consuming—quickly overwhelming a single person or even an entire team. This brings us to our final best practice: automation.
Automate what you can
At a very small scale, automating your data mapping may not be necessary. It is possible to complete many data mapping tasks without it. However, as the number of data fields, systems, and databases grows, so does the need for automation.
Past a certain size and level of complexity, data mapping tools like spreadsheets simply won’t work. Beyond pure efficacy, automation provides other significant benefits: eliminating manual errors, saving you or your team hours or even days of work, and as a result, speeding up the process overall.
As we covered in earlier sections though, automation doesn’t need to be an all or nothing exercise. For data mapping, automation occurs on a spectrum, so if you don’t have the resources or need for a fully automated tool right now, explore options that would allow you to automate parts of the process (ideally those that are taking you the most time).
Any level of automation should save you time in the end, so if data mapping is something you do on a regular basis, it’s definitely worth exploring.
Transcend can help your organization automate data mapping for privacy law compliance. Use Transcend Data Mapping to discover your company’s data silos, classify personal data, and auto-generate reports – all in an easy-to-use, collaborative platform.
Power your company’s regulatory compliance with actionable data governance suggestions based on your real-time data map. Transcend is the first and only data mapping tool that ensures the systems discovered in your data map are seamlessly included in user deletion, access or modification privacy request workflows.