Data governance is becoming an increasingly critical aspect of modern enterprise operations. In fact, 64% of data leaders surveyed say governance is a priority in 2024.
This makes sense. GPTs, LLMs, machine learning, and the adoption of cloud services and cloud computing are becoming table stakes for the average organization (if they aren’t already).
It logically follows that organizations should do more than simply adopt and refine data governance practices; they should look to implement data governance as code (DGaaC).
Because business stakeholders increasingly rely on real-time, high-quality data in their daily operations (not just in key strategic moments), the stakes of data management keep rising.
This means organizational data lifecycles need more than just run-of-the-mill, human policing. With organizational success on the line, entrusting enforcement to machines is becoming the best way forward. When it comes to data governance, every organization needs a RoboCop.
Using data governance as code to automate data quality enforcement
Reporter: Robo, excuse me, Robo—any special message for all the kids watching at home?
Robo: Stay out of trouble.
(RoboCop, 1987)
In theory, governance as code builds on traditional data governance, the mission-critical approach to managing and enforcing data policies. In practice, it borrows the principles and practices of software engineering, particularly infrastructure as code (IaC), and applies them to enforcing data quality.
This gives data governance as code a sturdy initial framework, one designed to enhance what human professionals can do within an organization rather than replace them outright.
Automation of policies
First, and true to its name, governance as code will leverage code to automate the monitoring and enforcement of data governance policies and compliance. Doing so allows for real-time policy enforcement, which reduces reliance on manual processes.
Additionally, this policy automation enables organizations to maintain compliance more efficiently in increasingly dynamic environments.
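As a minimal sketch of what automating a policy can look like, the Python below expresses two illustrative rules as plain functions and runs them against every dataset description. The rule names, fields, and sample records are hypothetical and not taken from any particular governance tool.

```python
from typing import Callable, Optional

Record = dict
# A policy inspects one record and returns an error message, or None if it passes.
Policy = Callable[[Record], Optional[str]]

def require_owner(record: Record) -> Optional[str]:
    """Hypothetical rule: every dataset must name an accountable owner."""
    return None if record.get("owner") else "missing owner"

def require_classification(record: Record) -> Optional[str]:
    """Hypothetical rule: every dataset must carry a known sensitivity label."""
    allowed = {"public", "internal", "restricted"}
    return None if record.get("classification") in allowed else "invalid classification"

def enforce(records: list[Record], policies: list[Policy]) -> list[tuple[str, str]]:
    """Run every policy against every record and collect (dataset, problem) pairs."""
    violations = []
    for record in records:
        for policy in policies:
            problem = policy(record)
            if problem:
                violations.append((record.get("name", "<unnamed>"), problem))
    return violations

datasets = [
    {"name": "orders", "owner": "sales-eng", "classification": "internal"},
    {"name": "customers", "owner": "", "classification": "secret"},
]
violations = enforce(datasets, [require_owner, require_classification])
print(violations)
```

Because each rule is an ordinary function, the same checks can run on a schedule, on every write, or inside a pipeline without anyone applying them by hand.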
Version control and modularity
Influenced by its IaC pedigree, DGaaC will emphasize the use of version control systems to manage changes to governance policies, meaning all modifications are tracked and previous versions can be restored if necessary.
Moreover, modularizing governance policies makes it easier for teams to change individual components within a data environment without affecting the entire system. This results in environments that are easier to maintain and scale.
Continuous integration and deployment
By leveraging continuous integration and deployment (CI/CD) practices, governance as code also enables rapid and reliable deployment of the governance policies themselves. This ensures that policies are consistently applied across all environments, reducing the risk of security breaches and non-compliance.
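One common way to wire policy checks into a pipeline is a gate that turns violations into a failing exit code. The sketch below assumes a hypothetical `ci_gate` helper and a list of violation messages produced earlier in the job; it is not tied to any specific CI product.

```python
import sys

def ci_gate(violations: list[str], max_allowed: int = 0) -> int:
    """Return a process exit code: 0 lets the pipeline continue, 1 blocks it."""
    for violation in violations:
        print(f"POLICY VIOLATION: {violation}", file=sys.stderr)
    return 0 if len(violations) <= max_allowed else 1

# In a real pipeline the list would come from the policy checks run earlier in the job.
exit_code_clean = ci_gate([])
exit_code_dirty = ci_gate(["customers: missing owner"])
print(exit_code_clean, exit_code_dirty)

# A CI step would typically end with: sys.exit(ci_gate(found_violations))
```

Most CI systems treat a non-zero exit code as a failed step, so a policy change that breaks the rules never reaches the next environment.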
Collaboration and agility
Data governance as code will foster collaboration between different teams (e.g., operations, development, security) as it establishes a common framework for defining and enforcing policies.
Such collaborative approaches enhance organizational agility, enabling departmental leaders to quickly pivot and adapt to new governance requirements without hampering developmental velocity.
Declarative policy definition
DGaaC will also employ simple declarative language to define governance policies. For data governance, this is an operational balm: it makes these critical policies easier for various stakeholders to understand and implement, much as IaC defines key infrastructure in a form that is as human-readable as possible.
Taken together, this foundation yields prime directives of sorts, not unlike those that guide RoboCop in his attempts to improve the film’s fictional take on Detroit.
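To illustrate the declarative idea, the sketch below states a policy as data (what must hold) and keeps the checking logic in one small interpreter (how it is verified). The policy layout, column names, and limits are assumptions made up for this example; real tools usually express the same thing in YAML or a dedicated policy language.

```python
# Declarative policy: describes WHAT must hold, not HOW to check it.
POLICY = {
    "dataset": "customers",
    "columns": {
        "customer_id": {"type": "int", "required": True},
        "email": {"type": "str", "required": True},
        "age": {"type": "int", "required": False, "min": 0, "max": 130},
    },
}

TYPES = {"int": int, "str": str}

def check_row(row: dict, policy: dict) -> list[str]:
    """Interpret the declarative policy against a single row."""
    errors = []
    for column, rule in policy["columns"].items():
        value = row.get(column)
        if value is None:
            if rule.get("required"):
                errors.append(f"{column}: required value is missing")
            continue
        if not isinstance(value, TYPES[rule["type"]]):
            errors.append(f"{column}: expected {rule['type']}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{column}: below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{column}: above maximum {rule['max']}")
    return errors

good_row = {"customer_id": 1, "email": "a@example.com", "age": 34}
bad_row = {"customer_id": "1", "age": 200}
print(check_row(good_row, POLICY))
print(check_row(bad_row, POLICY))
```

A stakeholder who cannot read the interpreter can still read and review the `POLICY` block, which is the point of keeping the two apart.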
3 Principles of data governance as code
While they may not read quite as punchy as the film’s 1. Serve the public trust, 2. Protect the innocent, 3. Uphold the law, they can prove just as instrumental for data teams charged with delivering quality across the data lifecycle (especially if voiced by actor Peter Weller):
1. Ensure data integrity
DGaaC will work to ensure that data remains accurate, reliable, and consistent throughout the organization.
This principle emphasizes the importance of automated validation, transformation, and monitoring processes to preserve data integrity.
Any code that implements data governance must prioritize the maintenance of data quality and integrity above all else.
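A small sketch of automated integrity validation follows, covering three checks that map onto "accurate, reliable, and consistent": unique keys, valid references, and sane values. The table shape and field names are hypothetical.

```python
from collections import Counter

def integrity_report(orders: list[dict], known_customer_ids: set[int]) -> dict:
    """Summarize three basic integrity checks over a batch of order rows."""
    id_counts = Counter(o["order_id"] for o in orders)
    return {
        # Uniqueness: the same order_id must not appear twice.
        "duplicate_order_ids": sorted(i for i, n in id_counts.items() if n > 1),
        # Referential consistency: every order must point at a known customer.
        "orphaned_orders": sorted(
            o["order_id"] for o in orders if o["customer_id"] not in known_customer_ids
        ),
        # Accuracy: amounts must be non-negative.
        "negative_amounts": sorted(o["order_id"] for o in orders if o["amount"] < 0),
    }

orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 99, "amount": 12.5},
    {"order_id": 3, "customer_id": 10, "amount": -3.0},
    {"order_id": 3, "customer_id": 11, "amount": 8.0},
]
report = integrity_report(orders, known_customer_ids={10, 11})
print(report)
```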
2. Safeguard data privacy and security
DGaaC will protect the innocent (and the organization’s reputation) by prioritizing the protection of sensitive and personal data.
This principle enforces strict access controls, encryption, and anonymization practices within the governance code, ensuring that data is securely handled and that privacy regulations (such as GDPR or CCPA) are upheld.
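As a rough sketch of access control combined with pseudonymization, the code below returns a row as a given role is allowed to see it. The role names, column classification, and demo key are assumptions for illustration; keyed hashing is pseudonymization rather than full anonymization, and whether it satisfies a given regulation is a question for legal review.

```python
import hashlib
import hmac

# Hypothetical column classification and role permissions.
SENSITIVE_COLUMNS = {"email", "ssn"}
ROLES_WITH_PII_ACCESS = {"privacy_officer"}

def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a value with a keyed SHA-256 digest so it stays joinable but unreadable."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def read_row(row: dict, role: str, secret_key: bytes) -> dict:
    """Return the row as the given role is allowed to see it."""
    if role in ROLES_WITH_PII_ACCESS:
        return dict(row)
    return {
        col: pseudonymize(str(val), secret_key) if col in SENSITIVE_COLUMNS else val
        for col, val in row.items()
    }

# Demo key only; a real key would be loaded from a secret store.
key = b"demo-key-not-for-production"
row = {"customer_id": 7, "email": "pat@example.com", "plan": "pro"}
analyst_view = read_row(row, "analyst", key)
officer_view = read_row(row, "privacy_officer", key)
print(analyst_view["email"] != row["email"], officer_view == row)
```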
3. Enforce compliance
DGaaC will uphold the law by ensuring all data governance policies and regulatory requirements are strictly enforced through code.
Ultimately, any governance through code must be capable of automatically enforcing data compliance rules, detecting violations, and ensuring that data handling practices align with both internal policies and external regulatory requirements.
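Detecting violations can be as simple as comparing records against a rule table. The sketch below checks a hypothetical internal retention policy; the categories and day counts are invented for the example and are not legal requirements.

```python
from datetime import date, timedelta

# Hypothetical internal rule: records of each category may be kept this many days.
RETENTION_DAYS = {"support_ticket": 365, "marketing_lead": 180}

def retention_violations(records: list[dict], today: date) -> list[str]:
    """Return the ids of records held longer than their category allows."""
    overdue = []
    for record in records:
        limit = RETENTION_DAYS.get(record["category"])
        if limit is None:
            # Unknown categories are flagged rather than silently accepted.
            overdue.append(record["id"])
            continue
        if today - record["created"] > timedelta(days=limit):
            overdue.append(record["id"])
    return overdue

today = date(2024, 6, 1)
records = [
    {"id": "t-1", "category": "support_ticket", "created": date(2024, 1, 15)},
    {"id": "m-1", "category": "marketing_lead", "created": date(2023, 9, 1)},
    {"id": "x-1", "category": "unclassified", "created": date(2024, 5, 1)},
]
flagged = retention_violations(records, today)
print(flagged)
```

Passing `today` in as a parameter keeps the check deterministic, which makes it easy to test and to replay during an audit.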
Together, these principles give data professionals and stakeholders a clear, directive-driven framework for implementing data governance as code, one that emphasizes integrity, security, and compliance as they relate to the organization's goals and needs.
The distinct advantages of embracing data governance as code
Thank you for your cooperation.
(RoboCop, 1987)
The foundational aspects of DGaaC we’ve mentioned here certainly benefit organizations. But the advantages of employing a RoboCop also extend to the teams it works on behalf of.
In this sense, both Dev and DataOps teams enjoy additional and specific benefits when their data governance is policed through code. When taken together, these benefits can significantly improve their ability to manage and improve data utilization for critical stakeholders and data consumers:
Automation enablement
As noted, DGaaC supports the implementation and optimization of data governance policies and procedures through automated scripts and tools. When automation can handle what well-meaning human hands once had to, human-related errors naturally drop while efficiency increases.
This also frees up valuable time and energy for data teams, since they no longer need to manually apply and enforce processes across the organization.
Increased scalability
Businesses need to grow, which is why organizations always need to be prepared to handle growth, even when it isn’t yet happening. This gets trickier for data teams handling governance manually, as the size and complexity of organizational data ebb and flow.
However, the automated monitoring and enforcement DGaaC provides is a strong counter to surging waves of ever more elaborate and intricate data. Data governance as code helps governance policies remain effective and manageable at scale, without the need to increase manual workloads.
All-encompassing data consistency
The understated beauty of putting compliance RoboCops to work is how consistently they deliver data quality, as automatic enforcement of governance policies greatly reduces the problems that chip away at data reliability and accuracy (e.g., data discrepancies, redundancies, formatting issues).
Moreover, DGaaC can also be empowered to automate critical aspects of data remediation within an organization. By doing so, teams further governance consistency by enabling continuous data validation and data cleaning.
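A minimal sketch of automated remediation, assuming a hypothetical customer table with `name` and `email` fields: it normalizes formatting first and then removes redundant rows.

```python
def clean_customers(rows: list[dict]) -> list[dict]:
    """Normalize formatting, then drop rows that duplicate an earlier email."""
    seen_emails = set()
    cleaned = []
    for row in rows:
        email = row["email"].strip().lower()
        name = " ".join(row["name"].split()).title()
        if email in seen_emails:
            continue  # redundant record: keep only the first occurrence
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned

raw = [
    {"name": "  ada   lovelace", "email": "Ada@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "grace hopper", "email": "grace@example.com"},
]
cleaned = clean_customers(raw)
print(cleaned)
```

Normalizing before deduplicating matters here: the first two rows only look like the same customer once case and whitespace are cleaned up.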
Enhanced agility
Business leaders often need to adjust and adapt to volatile business landscapes in order to guide their organizations. In these instances, data teams are most likely already considered secret weapons. But among these teams, those who handle data governance through code can make large-scale changes to procedures and policies whenever leadership deems it necessary, and then deploy those changes across the entire organization in real time.
This capability alone creates exceptional governance guardrails, enabling organizations to respond promptly to new data privacy laws and internal policy updates while reducing penalty risks and potential reputational damage.
Comprehensive compliance
Finally, with DGaaC, this agility doesn’t come at a cost. Organizations with automated data governance processes can trust that their data handling practices stay aligned with regulatory standards, even as those requirements shift and change.
This can substantially reduce the risk of non-compliance and associated fines. DGaaC also provides clear audit trails of all governance activities, so data teams can save valuable time and effort in demonstrating compliance if and when audits occur.
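One way to make an audit trail trustworthy is to chain entries together with hashes so that later edits are detectable. The sketch below is an illustrative in-memory version; the function names and entry fields are assumptions, and a production log would also need durable, access-controlled storage.

```python
import hashlib
import json

FIELDS = ("actor", "action", "timestamp", "prev")

def _digest(body: dict) -> str:
    """Hash the entry body in a stable key order."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

def append_entry(log: list[dict], actor: str, action: str, timestamp: str) -> None:
    """Append an entry whose hash covers its content and the previous entry's hash."""
    previous_hash = log[-1]["hash"] if log else ""
    body = {"actor": actor, "action": action, "timestamp": timestamp, "prev": previous_hash}
    log.append({**body, "hash": _digest(body)})

def verify(log: list[dict]) -> bool:
    """Recompute every hash; editing an earlier entry breaks the chain."""
    previous_hash = ""
    for entry in log:
        body = {k: entry[k] for k in FIELDS}
        if entry["prev"] != previous_hash or entry["hash"] != _digest(body):
            return False
        previous_hash = entry["hash"]
    return True

audit_log: list[dict] = []
append_entry(audit_log, "ci-bot", "deployed policy retention-v2", "2024-06-01T09:00:00Z")
append_entry(audit_log, "ci-bot", "blocked write to customers", "2024-06-01T09:05:00Z")
intact = verify(audit_log)

audit_log[0]["action"] = "nothing happened"  # simulate tampering
tampered_ok = verify(audit_log)
print(intact, tampered_ok)
```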
A note on sequels: Similarities and differences between data and cloud governance as code
Excuse me, I have to go. Somewhere there is a crime happening.
(RoboCop, 1987)
Success begets success, much as RoboCop paved the way for RoboCop 2 and...well, let’s just cut our movie analogy losses there.
More importantly (and relevantly), cloud governance as code (CGaC) is now taking compliance excellence a step further, in much the same way DGaaC built on IaC.
Although cloud governance as code and data governance as code share some similarities, they are distinct solutions. The overlap can lead to confusion, as it’s tempting to use the terms interchangeably. In practice, the two are complementary approaches to implementing the key directives mentioned above.
1. Shared foundations: Automation and compliance
Both CGaC and DGaaC build on the principles of Governance as Code, leveraging automation to ensure consistent enforcement of policies across the data lifecycle. These frameworks are grounded in the idea that governance should be an integrated, continuous process rather than a series of manual checks and balances. This shared foundation leads to the following common goals:
- Ensuring data integrity: Both approaches prioritize the accuracy and reliability of data, utilizing automated processes to validate and monitor data quality.
- Safeguarding privacy and security: Whether in the cloud or across broader data environments, both CGaC and DGaaC enforce strict security measures, such as access controls and encryption, to protect sensitive data.
- Enforcing compliance: Automated enforcement of regulatory requirements is central to both frameworks, reducing the risk of non-compliance and ensuring that data handling practices align with legal standards.
2. Context and scope: Different operating environments
The primary distinction between CGaC and DGaaC lies in the contexts within which they operate.
Data Governance as Code
Broad application: DGaaC is applied across all data environments, whether on-premises, in hybrid setups, or across multiple cloud platforms. It encompasses the entire data lifecycle, managing everything from data quality and lineage to access control and compliance across various systems.
Comprehensive coverage: DGaaC is designed to handle governance across diverse data systems, making it essential for organizations with complex, multi-layered data infrastructures. It addresses challenges related to data consistency, lineage tracking, and comprehensive policy enforcement across all data assets, regardless of their location.
Cloud Governance as Code
Cloud-specific focus: CGaC is tailored to the unique challenges of cloud environments. It emphasizes the governance of cloud resources, such as virtual machines, storage, databases, and networking components, while addressing cloud-specific concerns like scalability, cost management, and the shared responsibility model between cloud providers and customers.
Dynamic and real-time enforcement: Given the elastic and scalable nature of cloud environments, CGaC often requires real-time policy enforcement and continuous monitoring, ensuring that governance adapts to the dynamic cloud landscape. This makes it particularly relevant for organizations operating in or migrating to cloud-based infrastructures.
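On the cloud side, a policy check typically runs against a description of each resource rather than against rows of data. The sketch below uses a made-up inventory format and made-up rules (required tags, encrypted and non-public storage); it does not call any cloud provider's API.

```python
REQUIRED_TAGS = {"owner", "cost_center"}

def check_resource(resource: dict) -> list[str]:
    """Return policy findings for one cloud resource description."""
    findings = []
    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing:
        findings.append(f"missing tags: {', '.join(sorted(missing))}")
    if resource["type"] == "storage_bucket":
        if not resource.get("encrypted", False):
            findings.append("storage must be encrypted at rest")
        if resource.get("public", False):
            findings.append("storage must not be publicly readable")
    return findings

inventory = [
    {"id": "vm-1", "type": "virtual_machine",
     "tags": {"owner": "data-eng", "cost_center": "42"}},
    {"id": "bucket-1", "type": "storage_bucket",
     "tags": {"owner": "ml"}, "public": True},
]
findings = {r["id"]: check_resource(r) for r in inventory}
print(findings)
```

Requiring a `cost_center` tag is one small example of how the same check can serve both compliance and cost management.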
3. Challenges addressed: Unique vs. overlapping issues
While DGaaC and CGaC share some common challenges, they also address distinct issues unique to their respective domains:
DGaaC challenges
Data consistency across platforms: Ensuring consistent governance policies across varied and often siloed data systems.
Complexity of data lineage: Tracking data as it moves through multiple systems and processes, often in diverse environments, to maintain accurate data lineage.
Scalability of governance practices: Scaling governance policies as data volumes grow exponentially, without a proportional increase in manual effort.
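The lineage challenge above can be pictured as a graph problem. As a sketch, assuming lineage is recorded as a simple mapping from each dataset to the datasets it is built from (the dataset names are hypothetical), finding everything upstream of a report is a graph traversal:

```python
# Hypothetical lineage edges: each dataset maps to the datasets it is built from.
LINEAGE = {
    "revenue_dashboard": ["daily_revenue"],
    "daily_revenue": ["orders_clean", "fx_rates"],
    "orders_clean": ["orders_raw"],
    "orders_raw": [],
    "fx_rates": [],
}

def upstream(dataset: str, lineage: dict[str, list[str]]) -> set[str]:
    """Collect every dataset that feeds into the given one, directly or indirectly."""
    found: set[str] = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        current = stack.pop()
        if current in found:
            continue  # already visited; also guards against cycles
        found.add(current)
        stack.extend(lineage.get(current, []))
    return found

sources = upstream("revenue_dashboard", LINEAGE)
print(sorted(sources))
```

With this in place, a governance rule such as "a dashboard inherits the strictest classification of anything upstream of it" becomes a short function over `upstream(...)` rather than a manual review.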
CGaC challenges
Cloud resource management: Managing the governance of dynamic, often transient cloud resources, such as instances that can be spun up or down rapidly.
Cost and resource optimization: Implementing governance policies that not only ensure compliance but also optimize cloud costs and resource utilization.
Security in a shared environment: Addressing the specific security challenges posed by the cloud's shared responsibility model, where both the cloud provider and the customer have roles in maintaining security and compliance.
Complementary approaches: Building a holistic approach to data policing
While CGaC and DGaaC are distinct, they are also complementary. Together, they form a holistic governance strategy that ensures robust, end-to-end governance across an organization's entire data ecosystem:
Integration of policies: Organizations can integrate CGaC and DGaaC to create seamless governance policies that apply both to their cloud resources and to their broader data environments. This integration ensures that as data moves between on-premises systems and cloud environments, governance remains consistent and effective.
Unified compliance framework: By leveraging both CGaC and DGaaC, organizations can establish a unified compliance framework that adapts to different environments while maintaining a consistent approach to data protection and privacy.
Laying down the law: Expanding on compliance excellence
The premise of RoboCop hinges on a simple truth: without “the law,” the entire story falls apart. There is a clear distinction between right and wrong, and recognizing that distinction is crucial. RoboCop (Alex Murphy) embodies this understanding, making him effective both as a movie character and as an analogy for data governance as code.
Similarly, there are clear rights and wrongs when it comes to data quality—a truth well known to data professionals. They also understand that the effectiveness of data governance is directly tied to the quality of the data being governed.
In our world of data producers, consumers, stakeholders, and high stakes, data contracts have become increasingly vital. They ensure that the promise of high data quality is upheld and enforceable by data teams and governance policies within organizations.
Just as RoboCop stands out by working with the best, it’s important for data professionals to do the same. To delve deeper into why this matters, download our ultimate guide on data contracts.