Frequently asked questions: Data Lineage in Data Governance
Q1: What is data lineage in data governance?
Data lineage is the visual representation and documentation of a data asset's entire lifecycle. It maps out the data's journey over time, showing exactly where it originated (the source), how it was modified or transformed across different systems, and where it ultimately flows to be consumed (such as a dashboard or report).
Q2: Why is data lineage important for data governance?
If data governance provides the laws and policies for your data, data lineage provides the visibility needed to enforce them. It is a critical component of governance because it allows organizations to:
- Verify Data Trustworthiness: Users can trace a number on an executive dashboard all the way back to its raw source to confirm its accuracy.
- Simplify Root Cause Analysis: If a financial report suddenly breaks, data lineage allows IT teams to look upstream and pinpoint which database or transformation script introduced the error.
- Conduct Impact Analysis: Before a developer changes a column name in a CRM database, they can check downstream lineage to see which marketing reports or APIs will break as a result.
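Root cause analysis and impact analysis are both traversals of the same lineage graph, one walking upstream and the other downstream. The following is a minimal sketch, assuming lineage is stored as simple (source, consumer) pairs; the asset names and the `reachable` helper are hypothetical and do not come from any particular governance tool.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source asset, asset that reads from it).
EDGES = [
    ("crm.customers", "etl.clean_customers"),
    ("etl.clean_customers", "warehouse.dim_customer"),
    ("warehouse.dim_customer", "report.marketing_funnel"),
    ("warehouse.dim_customer", "api.customer_lookup"),
    ("erp.invoices", "warehouse.fact_revenue"),
    ("warehouse.fact_revenue", "report.quarterly_revenue"),
]


def build_graph(edges):
    """Index the edges in both directions so either traversal is cheap."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return upstream, downstream


def reachable(start, neighbors):
    """Breadth-first search: every asset reachable from `start`, excluding `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


upstream, downstream = build_graph(EDGES)

# Impact analysis: what could break if crm.customers changes?
impact = reachable("crm.customers", downstream)

# Root cause analysis: what feeds the quarterly revenue report?
root_causes = reachable("report.quarterly_revenue", upstream)
```

Keeping both an upstream and a downstream index means the same function answers both questions; only the direction of the lookup changes.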
Q3: What is the difference between business lineage and technical lineage?
A comprehensive data governance platform typically tracks two types of data lineage to serve different audiences:
- Business Lineage: A high-level, simplified view designed for business users. It shows how data flows across major concepts and systems (e.g., Sales CRM → Financial Ledger → Quarterly Revenue Report) without exposing the underlying code.
- Technical Lineage: A deep-dive, granular map designed for engineers and DBA teams. It tracks column-level transformations, specific SQL queries, ETL jobs, and API integrations that alter the data along the way.
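One way to think about the relationship between the two views is that business lineage can be derived by collapsing technical lineage to the system level. The sketch below assumes column-level edges are named `system.table.column`; that naming scheme, the sample edges, and the `business_view` helper are illustrative assumptions, not the behavior of a specific platform.

```python
# Hypothetical column-level (technical) lineage edges, named "system.table.column".
COLUMN_EDGES = [
    ("sales_crm.opportunities.amount", "ledger.journal_entries.credit"),
    ("sales_crm.opportunities.close_date", "ledger.journal_entries.posted_on"),
    ("ledger.journal_entries.credit", "revenue_report.summary.total_revenue"),
    ("ledger.journal_entries.posted_on", "revenue_report.summary.quarter"),
    # A transformation that stays inside one system.
    ("ledger.journal_entries.credit", "ledger.journal_entries.credit_usd"),
]


def system_of(column_path):
    """Return the system prefix of a 'system.table.column' path."""
    return column_path.split(".", 1)[0]


def business_view(column_edges):
    """Collapse column-level edges into de-duplicated system-to-system edges."""
    systems = set()
    for src, dst in column_edges:
        src_system, dst_system = system_of(src), system_of(dst)
        # Transformations inside a single system are hidden at this level.
        if src_system != dst_system:
            systems.add((src_system, dst_system))
    return sorted(systems)


BUSINESS_EDGES = business_view(COLUMN_EDGES)
for src, dst in BUSINESS_EDGES:
    print(f"{src} -> {dst}")
```

Five column-level edges collapse into two system-level edges here, which is the kind of simplification a business user sees.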
Q4: How does data lineage assist with regulatory compliance (GDPR, CCPA)?
Data privacy regulations such as GDPR and CCPA give individuals the right to have their personal data deleted (GDPR calls this the right to erasure, often described as the "right to be forgotten"), which in practice requires companies to know where personal information is stored. When a customer requests deletion, data lineage acts as a map, showing compliance officers every database, data lake, and backup server that customer's data has reached. It also provides an audit trail that helps demonstrate to regulators that data handling conforms to compliance standards.
Q5: How do organizations automate data lineage tracking?
Manually maintaining data maps in Excel or Visio is impractical in modern enterprise environments because systems change faster than the diagrams can be updated. Instead, organizations use data governance and data catalog tools (such as Collibra, Alation, or Informatica) that rely on metadata scanners. These tools connect to your tech stack, read system logs and SQL scripts, and generate an interactive lineage map that is refreshed as the underlying systems change.
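To make the idea of scanning SQL scripts concrete, here is a deliberately naive sketch that pulls table-level lineage out of a script with regular expressions. It is not how any of the tools named above work internally; production scanners typically use full SQL parsers and handle CTEs, subqueries, views, and dialect differences that this example ignores. The table names and the `extract_edges` helper are hypothetical.

```python
import re

# Naive patterns: a statement's write target and the tables it reads from.
TARGET_RE = re.compile(r"\b(?:insert\s+into|create\s+table)\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_edges(sql):
    """Return sorted (source_table, target_table) pairs found in a SQL script."""
    edges = set()
    for statement in sql.split(";"):
        target = TARGET_RE.search(statement)
        if not target:
            continue  # Statement does not write to a table; nothing to record.
        for source in SOURCE_RE.findall(statement):
            if source != target.group(1):
                edges.add((source, target.group(1)))
    return sorted(edges)


# Hypothetical ETL script of the kind a scanner might read.
SQL_SCRIPT = """
INSERT INTO warehouse.dim_customer
SELECT c.id, c.email, r.region_name
FROM crm.customers c
JOIN ref.regions r ON c.region_id = r.id;

CREATE TABLE report.revenue_by_region AS
SELECT d.region_name, SUM(i.amount) AS revenue
FROM erp.invoices i
JOIN warehouse.dim_customer d ON i.customer_id = d.id
GROUP BY d.region_name;
"""

SCANNED_EDGES = extract_edges(SQL_SCRIPT)
for source, target in SCANNED_EDGES:
    print(f"{source} -> {target}")
```

Re-running a scan like this whenever scripts change is what keeps the resulting map current, rather than relying on someone to redraw a diagram.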