Data Management for Software Engineering Teams
Hello everyone, my name is Chad Sanderson. I am the author of the blog Data Products and the CEO/Co-Founder of Gable.ai. Over the past few years, I’ve written quite a bit on data management, data quality, and data contracts. In 2024 I spent a little bit less time writing, and more time implementing. I’ve worked with dozens of enterprises during that time span and watched the evolution of data contracts from a nascent idea stemming from a few LinkedIn posts to driving real change for some of the largest companies in the world.
The result of that experience is that I have become a bit of a data extremist. I believe there is a completely new domain of data management over the horizon, one that will altogether change how we think about the discipline, rewrite most (if not all) of our common best practices, and bring the various stakeholders into a cohesive lifecycle of data management. What follows is a mix of opinions combined with a description of the cutting edge: truly game-changing companies that are pioneering how data management is done, often from unexpected places.
Over the course of this manifesto, I will try to convince you of a few things I strongly believe:
- Most engineering teams are federated or becoming so
- The way we manage data is designed for centralized environments
- Data strategies will almost always fail, due to points 1 and 2
- Federated data management is possible, but requires a different approach
- That approach has been historically successful in other engineering disciplines
Is this manifesto about data contracts? No. But they do feature prominently. Data contracts are a component of shifting left, among dozens of other components. These components work together to create an entirely new dynamic of data management, completely inverting the processes, tools, strategies, and adoption rates of data quality and data governance. There is so much content to cover, across such a wide variety of topics, that I’m splitting the manifesto into two parts for my own sanity. This first part covers where we are today, why I believe the state of data management is fundamentally flawed, why “culture shift” is almost always impossible without technology and working solutions, and what we can learn from other industries that have solved the same types of problems. Let’s jump into it.
Conway’s Law
Conway’s Law is the observation that organizations tend to design systems that mirror their communication structure. A product designed by a three-person organization will likely have three components. If it is designed by a single team, it will likely all be built within a single large service.
A media company with separate teams for video encoding, recommendation algorithms, and user interfaces might build a streaming platform where these components are loosely coupled, reflecting the team's structure. Hospitals and insurance companies have separate IT systems due to distinct legal and compliance teams. This results in fragmented medical records across different providers, forcing patients to manually transfer records or redo tests.
There are three primary stakeholders in the data management value chain:
1. Producers: The teams generating the data
2. Platforms: The teams maintaining the data infrastructure
3. Consumers: The teams leveraging the data to accomplish tasks
Conway's Law would dictate that the data management, governance, and quality systems implemented in a company will reflect how these various groups work together.
In most businesses, data producers have no idea who their consumers are or why they need the data in the first place. They are unaware of which data is important for AI/BI, nor do they understand what it should look like. Platform teams are rarely informed about how their infrastructure is being leveraged and have little knowledge of the business context surrounding data, while consumers have business context but don't know where the data is coming from or whether it can be trusted.
Is it any wonder that data management programs are a complete, disjointed mess?
The other side of the Conway’s Law coin is a kind of law of unintended consequences for systems, best summarized as "the purpose of a system is what it does" (POSIWID), a phrase coined by Stafford Beer, a cybernetics researcher. The rule means that what a system actually does is more illustrative of its real goal than any stated intent.
For example, suppose data pipelines are consistently breaking and the data is always low quality. In that case, it means the point of your data ecosystem is not to produce high quality trusted data - it is actually to enable teams to move fast, ship without accountability, and tolerate breakages as an acceptable trade-off.
Your data ecosystem is optimized for speed over reliability, manual firefighting over prevention, and short-term fixes over long-term quality. If high-quality data were truly the goal, the system would have built-in schema enforcement, automated validation, and clear ownership—but since those don’t exist (or are routinely bypassed), the real function of the system is to allow chaotic, ad-hoc data handling that prioritizes short-term delivery over long-term trust.
If Conway’s Law helps explain how data got into such a sorry state, POSIWID explains why - the broader organization optimizes for manual effort over proactive, automated, comprehensive solutions.
Federation Ate the World
The early 2000s marked a fundamental shift in how software engineering teams were structured. As technology companies scaled, they recognized that high-quality software required rapid iteration and continuous delivery. Research on software development lifecycles—such as the work popularized in the "Accelerate" book by Forsgren, Humble, and Kim—demonstrated that teams capable of shipping frequently could identify and fix defects faster. The more frequently features were pushed into production, the sooner feedback loops closed, leading to better user experiences and stronger business outcomes.
To facilitate this velocity, companies embraced Agile methodologies, which dismantled the traditional, slow-moving hierarchical structures and replaced them with autonomous, cross-functional teams. Rather than requiring months or years of deliberation, these teams operated with localized decision-making authority, allowing them to experiment, iterate, and ship software much faster. This shift not only optimized for speed but also reduced the coordination overhead that had historically slowed large engineering organizations.
Out of this decentralized model emerged federated engineering structures and the adoption of microservices architectures. Instead of monolithic applications where multiple teams shared responsibility for a single massive codebase, companies transitioned to a world where individual teams owned their services, databases, infrastructure, and deployment pipelines. Each team was empowered to make locally optimal decisions—choosing their own programming languages, data models, and release schedules, all in the name of speed.
The trade-off, however, was that many centralized cost centers—teams and functions designed for a monolithic, tightly controlled architecture—struggled to adapt. Operations teams, for example, were historically responsible for managing deployments in a centralized, controlled manner. With a monolithic system, they could plan releases and enforce best practices through well-established governance processes. But in a federated world, visibility disappeared. Now, hundreds of teams were shipping thousands of changes independently, overwhelming centralized ops teams who could no longer track, validate, or mitigate risk effectively.
This same dynamic played out in the world of data. Historically, data teams had ownership over the organization’s entire data architecture—curating data models, defining schema governance, and managing a centralized data warehouse. But as engineering teams began making independent decisions about which events to log, what databases to use, and how to structure data, the once-cohesive data ecosystem fragmented overnight.
Without centralized oversight, engineering teams optimized for their immediate needs rather than long-term data quality. Events were collected inconsistently, naming conventions varied wildly, and different teams structured their data models based on what was most convenient for their service, rather than what was best for the organization as a whole. This led to massive data silos and duplicated effort.
The response from data teams? The Data Lake.
Instead of trying to force governance onto hundreds of independent teams, companies adopted a "dump now, analyze later" approach, instructing engineering teams to send all their raw data into a centralized cold storage repository. This led to the rise of data engineering, a discipline that emerged to turn the messy unstructured data into something usable. Data engineers became reactive firefighters, constantly wrangling broken schemas, cleaning up unexpected transformations, and trying to reconstruct meaning from fragmented event logs.
This model was deemed acceptable by business leaders because it allowed engineering teams to move quickly, even though it meant the data team was perpetually stuck in a reactive mode.
In the early days of the cloud, this reactive data engineering model was sufficient. Most organizations primarily used data for dashboarding and reporting, where occasional inconsistencies could be tolerated. But as the industry evolved, the stakes for data reliability grew exponentially in multiple directions:
- Machine Learning & AI: With AI-driven decision-making, poor data quality no longer just caused bad reports—it directly impacted product functionality and user experience. A mislabeled dataset could lead to a faulty recommendation algorithm or an inaccurate pricing model.
- Data as a Revenue-Generating Product: Companies began monetizing data directly—either by selling insights, building customer-facing analytics, or enabling real-time personalization. Inaccurate data now had a direct impact on revenue.
- Regulatory Compliance & Risk: As GDPR, CCPA, and other data protection regulations took effect, bad data practices became a legal liability. A single oversight—such as failing to properly delete user data upon request—could result in multimillion-dollar fines.
With these shifts, the consequences of reactive data engineering became untenable. Data teams could no longer afford to be downstream janitors, constantly cleaning up after engineering decisions made without governance. Instead, something fundamental had to change.
The federated model of software engineering isn’t going away—if anything, it has only expanded. However, as we’ve seen, the decentralization of engineering cannot come at the cost of quality. Organizations now face a critical inflection point:
- How do we reintroduce governance without reintroducing bottlenecks?
- How do we enable engineering speed while ensuring data correctness and compliance?
- How do we prevent reactive firefighting and create proactive, self-service data management?
Just as DevOps introduced infrastructure as code to solve the challenges of federated operations, the next era of data engineering must be deeply integrated into the software development lifecycle. Federation ate the world. But now we must decide how to rebuild it—this time, with accountability and resilience at its core.
Shifting Left
As the cloud, microservices, and decoupled engineering teams grew, the centralized model of cost center management became harder to maintain and justify. The concept of Shifting Left emerged as a mechanism for driving ownership across a decoupled engineering organization and ultimately became the go-to solution for developers. In our context, shifting left is designed to help data teams overcome the people, process, and cultural challenges created by gaps in communication around data management. Instead of data management being solely the responsibility of the downstream data organizations, the treatment of data becomes a shared responsibility across producers, data platform teams, and consumers.
Simply put: Shifting Left means moving ownership, accountability, quality, and governance from reactive downstream teams to proactive upstream teams.
While Shifting Left may sound too good to be true, this pattern has happened on three notable occasions in software engineering. The first is DevOps, second is DevSecOps, and third is Feature Management.
DevOps first emerged as a concept between 2007 and 2008, meant to address the growing gap between development teams (Dev) and operations teams (Ops). Before DevOps, software development and IT operations worked in silos. Developers would write code and pass it to operations teams, who were responsible for deployment and maintenance. This led to:
- Slow releases due to hand-offs and bottlenecks.
- Frequent deployment failures caused by differences between development and production environments.
- Blame culture, where development blamed operations for slow deployments, and operations blamed development for unstable code.
With DevOps, IT teams have become more collaborative. Development and operations now work closely together, using continuous integration and continuous deployment pipelines to automate testing and deployment, reducing human errors and accelerating release cycles. Infrastructure as Code (IaC) allows teams to manage infrastructure programmatically, ensuring consistency and scalability. Monitoring, logging, and observability tools provide real-time insights, enabling proactive issue resolution rather than reactive firefighting.
These days it is incredibly rare to see an engineering organization operating at meaningful scale without a DevOps function. Most developers in a company are responsible for writing their own unit and integration tests. Teams rally around version control platforms like GitHub and GitLab for collaboration, code review, auditing, and more.
Roughly ten years later, in 2015, we saw a similar pattern with DevSecOps. Security teams were reactive, dealing with fraud and hacking after the fact rather than taking proactive, preventative steps to ensure software was designed with security in mind. Like Ops teams, the security organization was siloed and disconnected from value and, as a cost center, suffered from the same problems as operations.
DevSecOps is more complex than simply integrating security into existing DevOps workflows because it requires security to be automated, continuous, and developer-friendly—something traditional security practices were not designed for. Unlike traditional security, which was often applied as a final step before deployment, DevSecOps embeds security checks throughout the entire software development lifecycle. This shift introduces several challenges:
- Shift-Left Security Requires Developer Buy-In: Security teams normally operated as gatekeepers, reviewing code and infrastructure late in the process. DevSecOps requires developers to take ownership of security much earlier, meaning they need security tools that are easy to use and integrated into their existing workflows. However, many security tools were designed for security experts, not developers.
- Balancing Security and Speed: DevOps emphasizes fast, frequent releases, while security traditionally slows things down with rigorous reviews and manual testing. DevSecOps must balance both, requiring automation that can enforce security without blocking deployments. Achieving this means integrating automated security best practices into CI/CD pipelines without causing excessive friction.
- Automated Security Testing at Scale: Traditional security relied on periodic manual testing (e.g., penetration testing, compliance audits). DevSecOps requires continuous security testing (a minimal sketch of one such check follows the list), including:
- Static Application Security Testing for code vulnerabilities.
- Software Composition Analysis for third-party dependencies.
- Dynamic Application Security Testing for runtime security risks.
- Infrastructure as Code Scanning to prevent misconfigurations.
- Secrets and Credential Scanning to detect exposed sensitive data.
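To make one of these concrete, here is a minimal sketch, in Python, of the kind of secrets-and-credential scan that might run as a pre-merge check. The regex patterns and the convention of passing changed file paths on the command line are illustrative assumptions, not any particular tool's rule set.

```python
import re
import sys
from pathlib import Path

# Illustrative patterns only; real scanners ship far more complete rule sets.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----"),
    "generic_api_key": re.compile(r"(?i)(api[_-]?key|secret)\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]"),
}

def scan_file(path: Path) -> list[str]:
    """Return human-readable findings for a single file."""
    try:
        text = path.read_text(errors="ignore")
    except OSError:
        return []
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            line_no = text.count("\n", 0, match.start()) + 1
            findings.append(f"{path}:{line_no}: possible {name}")
    return findings

if __name__ == "__main__":
    # In CI this would typically receive only the files changed in the pull request.
    findings = [f for p in sys.argv[1:] for f in scan_file(Path(p))]
    print("\n".join(findings))
    sys.exit(1 if findings else 0)  # a non-zero exit code blocks the merge
```

Real scanners are far more sophisticated, but the shape is the same: run automatically on every change, fail fast, and report inside the developer's own workflow.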
The overall takeaway? The more complex or multi-component a cost center’s workflows, the more sophistication is required to effectively shift left while managing the delicate balance of developer expectations, speed and scale.
And finally, the Shift Left has happened with Feature Management. Traditionally, feature rollouts, experiments, and instrumentation were handled late in the development cycle—often by downstream teams like product or growth. This led to:
- Engineers shipping features without proper instrumentation or user tracking.
- Product teams struggling to get clean data on feature performance post-launch.
- A/B testing requiring significant engineering support, slowing experimentation velocity.
- Limited ability to control or roll back features without a full redeploy.
By shifting Feature Management into the software development lifecycle, teams can build observability, experimentation, and rollout controls directly into the feature itself. Feature Flags allow engineers to ship code behind toggles, enabling controlled rollouts and fast reversions without redeployments. Instrumentation and product analytics are now added as part of the development process, not as a follow-up task. And experimentation frameworks are increasingly embedded into the codebase, letting product teams test and iterate without waiting on engineering.
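To make the toggle idea concrete, here is a minimal feature-flag sketch in Python. The flag name, the in-memory flag store, and the percentage-based bucketing are illustrative assumptions; in practice flags are served by a feature-management service with remote configuration.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class FeatureFlag:
    name: str
    enabled: bool = False          # kill switch: flip to False to revert instantly
    rollout_percent: int = 0       # gradual rollout, 0-100

    def is_on(self, user_id: str) -> bool:
        """Deterministically bucket a user so rollouts stay stable across requests."""
        if not self.enabled:
            return False
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        bucket = int(digest[:8], 16) % 100
        return bucket < self.rollout_percent

# Hypothetical in-memory flag store; real systems fetch this from a flag service.
FLAGS = {
    "new_checkout_flow": FeatureFlag("new_checkout_flow", enabled=True, rollout_percent=10),
}

def render_checkout(user_id: str) -> str:
    if FLAGS["new_checkout_flow"].is_on(user_id):
        return "new checkout"    # instrumented feature, shipped behind the toggle
    return "legacy checkout"     # rollback path requires no redeploy
```

The key property is that turning a feature off is a configuration change, not a redeploy, which is what makes fast reversion and gradual rollout possible.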
Just as DevOps brought deployment and infrastructure closer to development, Feature Management brings experimentation, along with the most important workflows from product analytics and A/B testing, into the software engineer's core development path.
All three of these disciplines follow the same pattern: a critical business function is siloed downstream (Ops, Security, Product/Growth). Pushing tools and methodologies to the left is not an incremental change in value but an inversion of how the job is done. Experimentation becomes something that happens for every new feature deployment by default. Systems are secure by default. While every team is at its own point of maturity along these paths, and some are far more sophisticated in shifting left than others, this is not just theory. It has already happened, and we in the data space should spend less time talking about what might happen and more time talking about what already can.
Shifting Data Left
Unlike engineering and security, data was the last frontier in cloud migration. While cloud-native infrastructure transformed application development and security in the early 2010s, data teams lagged behind, facing unique and more complex challenges that made cloud adoption far more difficult.
The primary reason for this delay lies in the inherent complexity and multi-faceted nature of data, which makes the shift left difficult to manage. Security, despite its wide operational scope, primarily deals with enforcement within a codebase or infrastructure environment. Engineering, too, could migrate by lifting application workloads into cloud-hosted services. The product discipline had the easiest transition, given that front-end and full-stack engineers were already adding monitoring and instrumentation to their services with or without product managers asking for it. However, data does not exist in a single place, nor does it follow a single lifecycle. It moves across many different technologies and codebases, often passing through multiple transformations before it can be used.
A typical data workflow spans:
- Source data ingestion (from application databases, logs, APIs, and event streams).
- Storage across multiple environments (operational databases, data lakes, warehouses, object storage).
- ETL (Extract, Transform, Load) and ELT processes that modify and refine data.
- Aggregation into analytical databases or data warehouses.
- Further transformations inside the warehouse to clean and normalize the data.
- Downstream consumption by dashboards, machine learning models, or data products.
This multi-stage pipeline meant that migrating data to the cloud required far more than simply moving databases—it demanded rebuilding the entire data infrastructure stack from ingestion to transformation, as well as the data management systems layered on top. The large number of dependencies across teams and the hefty requirements to manage the complexity of those dependencies slowed down cloud adoption significantly.
Now, 20 years into the cloud era, data teams are encountering the same organizational and technical bottlenecks that operations teams faced in the mid-2000s. Back then, software engineering moved to a decentralized, service-based model, which broke traditional operations workflows and required a complete rethink of deployment and monitoring strategies. Today, data teams face the same disaggregation problem—modern software development creates silos that fragment data, leading to inefficiencies and bottlenecks that restrict its flow and value within the organization.
In a highly federated engineering environment, individual teams often manage their own databases without centralized coordination, emit event streams without consistent schema or semantic governance, and make changes to their software without thinking of how the broader engineering ecosystem might be interconnected. These teams also tend to create ad hoc transformations that may duplicate or overwrite critical business logic.
The result? Data quality suffers. Much like operations before the rise of DevOps, data engineering has become a reactive cost center—constantly fixing inconsistent schemas and data drift, resolving duplicate or contradictory transformations, debugging downstream breakages caused by upstream changes, and responding to compliance incidents like untracked PII exposure.
Just as DevOps emerged to address the chaos of decentralized operations, we now need a similar movement for data—one that rethinks how we approach governance, engineering, and automation from first principles. A Shift Left approach to data requires embedding data management best practices at the source, not just patching issues downstream.
This new data paradigm must deliver on several fronts:
- Schema and contract enforcement at ingestion, to prevent breakages by validating structure at the point of creation (see the sketch after this list).
- Versioning and change management, applying DevOps principles to schema evolution and business logic to ensure traceability and control.
- End-to-end lineage tracking, giving teams visibility into how data transforms across systems, helping them understand and reduce the blast radius of change.
- Automated compliance enforcement, detecting and tagging sensitive data like PII or financial records at the source.
- Observability and real-time monitoring, to catch anomalies and schema drift before they impact analytics or AI.
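To make the first item above concrete, here is a minimal sketch of schema and contract enforcement at ingestion, written in Python. The event shape, the contract definition, and the accept/reject behavior are illustrative assumptions, not a reference implementation of any particular tool.

```python
from datetime import datetime
from typing import Any

# Hypothetical contract for an "order_created" event, agreed on by producer and consumers.
ORDER_CREATED_CONTRACT = {
    "order_id": str,
    "user_id": str,
    "amount_cents": int,
    "currency": str,
    "created_at": str,  # ISO 8601 timestamp
}

def validate_event(event: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return a list of contract violations; an empty list means the event is accepted."""
    violations = []
    for field, expected_type in contract.items():
        if field not in event:
            violations.append(f"missing required field: {field}")
        elif not isinstance(event[field], expected_type):
            violations.append(
                f"{field}: expected {expected_type.__name__}, got {type(event[field]).__name__}"
            )
    # A simple semantic check layered on top of the structural one.
    if isinstance(event.get("created_at"), str):
        try:
            datetime.fromisoformat(event["created_at"])
        except ValueError:
            violations.append("created_at: not a valid ISO 8601 timestamp")
    return violations

def ingest(event: dict[str, Any]) -> bool:
    violations = validate_event(event, ORDER_CREATED_CONTRACT)
    if violations:
        print("rejected:", violations)  # stand-in for dead-letter routing and alerting
        return False
    print("accepted")
    return True

# Example: a producer change that drops user_id is caught at ingestion, not in a dashboard.
ingest({"order_id": "o-1", "amount_cents": 995, "currency": "USD",
        "created_at": "2025-01-01T00:00:00"})
```

In a real pipeline the rejected event would be routed to a dead-letter queue and the violation surfaced to the producing team, but the principle is the same: structure is validated at the point of creation rather than discovered broken downstream.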
Operationally, shifting data left also changes how teams work:
- Engineering teams must own data quality, just as they own application reliability.
- Governance must become proactive, driven by automation and policy enforcement, not just documentation.
- Data contracts should be standardized to reduce fragmentation and ensure consistent expectations across teams.
- Compliance checks must be embedded into CI/CD pipelines, mirroring how DevSecOps integrates security into the development lifecycle.
The lessons from the other shift-left movements are clear: embedding the techniques of business-critical cost centers at the source is the only scalable approach. The same applies to data. The future of data engineering is not about building bigger, more sophisticated reactive teams—it is about pushing responsibility upstream and empowering producer teams to own quality at the moment of creation.
Much like operations and security before it, data must shift left. Organizations that fail to adapt will face the same problems they always have, except this time the excuse of "it's someone else's problem" won't cut it when the success of a company's AI initiative is on the line. Those that succeed will transform data into a true production-grade asset, allowing data management initiatives to scale to their full potential in the AI era.
Data as Code
A core idea behind shifting Data Left is simple but often overlooked: data is code. Or more accurately—data is produced by code. It’s not just some downstream artifact that lives in tables and gets piped into dashboards and spreadsheets. Every record, event, or log starts somewhere—created, updated, or deleted by a line of code. And just like DevOps demonstrated, if you want to manage something well, you start at the point of creation.
Imagine you're in charge of a machine that produces a high-value product every day. Your job is to ensure quality. If something starts breaking, you don’t sit around analyzing boxes of defective products—you inspect the machine. Code is the machine. It runs on constraints that result in some form of data being produced—a CRM entry, an API payload, a Kafka event, a database write. Managing that data means managing the system that generates it.
This is where it’s useful to separate data management into three different but interconnected layers:
- DevOps is focused on the software development lifecycle—code.
- Observability is about the data that’s already been produced—records and aggregates.
- Business glossaries operate at a higher level—domains / policies / workflows.
Each layer has value, but they serve different personas and purposes. DevOps is the proactive software engineering layer; observability is the reactive data team layer; business glossaries are organizational scaffolding. If one of these layers is missing, it becomes incredibly difficult to connect the dots to the higher order data management workflows most teams wish they had.
For example: without data lineage, business processes can’t be tied back to any actual systems or datasets. Without code lineage, your data lineage is blind outside the warehouse—you have no idea where upstream data is coming from or what’s generating it.
This is why data management patterns—catalogs, contracts, lineage, and monitors—shouldn’t be thought of as individual tools. They’re cross-cutting patterns that apply across personas:
- Engineers need catalogs of code assets, source systems, and event producers.
- Data teams need catalogs of tables, metrics, and dashboards.
- Business teams need catalogs of domains, data products, and process workflows.
Same goes for lineage, contracts, and monitors. It’s not enough to do these things in isolation—we need contract enforcement in CI/CD, not just in downstream pipelines. We need code-level lineage, not just column-level lineage. We need contract monitors that can detect schema and semantic breakage as early as possible.
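As a sketch of what a contract monitor for schema breakage could look like, the snippet below compares a proposed schema against the current contract and flags backwards-incompatible changes. The compatibility rules shown (no removed fields, no type changes, new fields must be optional) are common conventions assumed here for illustration.

```python
# Each schema maps field name -> (type name, required?)
Schema = dict[str, tuple[str, bool]]

def breaking_changes(current: Schema, proposed: Schema) -> list[str]:
    """Flag backwards-incompatible differences between two schema versions."""
    problems = []
    for field, (ftype, _) in current.items():
        if field not in proposed:
            problems.append(f"removed field: {field}")
        elif proposed[field][0] != ftype:
            problems.append(f"type change on {field}: {ftype} -> {proposed[field][0]}")
    for field, (_, required) in proposed.items():
        if field not in current and required:
            problems.append(f"new required field without default: {field}")
    return problems

current = {"order_id": ("string", True), "amount_cents": ("int", True)}
proposed = {"order_id": ("string", True), "amount": ("int", True)}  # renamed field

for problem in breaking_changes(current, proposed):
    print(problem)  # run in CI: a non-empty result fails the producer's pull request
```

Run against a producer's pull request, a non-empty result becomes a failing check, which is exactly the kind of early, code-level signal warehouse-first tooling cannot give.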
But it’s not just about capabilities—it’s about making sure those capabilities work for the right people. Effective data management requires alignment across three key groups: producers, consumers, and business teams. And each of these groups has very different needs.
Before the rise of the modern data stack, we had legacy catalogs—manual systems designed for centralized data stewardship. These catalogs were maintained by data stewards who filled in process docs, definitions, and ownership by hand. That model worked when data governance was owned by a few people and everything moved slowly.
Then modern data catalogs showed up—tools that connected directly to warehouses like Snowflake and Databricks and scanned data assets automatically. They gave data teams a lot more visibility into the artifacts they worked with day-to-day. But they also ran into friction with software engineering teams. The problem? These catalogs weren’t built for engineers. They didn’t expose anything about the code that produces data, the services that emit events, or the systems that control data generation. And when a tool doesn’t map to your responsibilities, it doesn’t get adopted.
Engineers want to understand how the code they own creates data, where that data goes, and what depends on it. They care about understanding who is impacted by a change to their system and why it matters—but none of that is visible in warehouse-first tooling. A table in Snowflake doesn’t tell you what GitHub repo created it or what line of code owns the transformation logic. So engineers are left in the dark, and downstream teams are left managing the fallout.
To bridge that gap, we need to bring in techniques that have worked elsewhere—DevOps, CI/CD, and even security. Software composition analysis gives us a model for understanding dependencies. Dataflow analysis gives us insight into how code transforms data. CI pipelines can block bad changes before they reach production. Contract tests can catch mismatches between producers and consumers early. The point is: we already know how to solve this in software—we just need to apply those lessons to data.
Until we do, data will remain something engineers generate but don’t own—leaving the rest of the organization to clean up the mess downstream.
The number one question I get is: “Chad, this all sounds great—and conceptually we’re on board—but how? Where do we start?”
The answer: find allies on the software engineering team—specifically, those who already think in terms of validating their code against some predefined expectation (AKA - a contract). One of the best places to start? QA engineers and automated testing teams. They’re on a shift-left journey of their own, working to push testing and validation closer to the source of truth: the code. They already use a familiar concept—contract testing—to enforce expectations between APIs. This process defines the structure and behavior of communication between systems before runtime. Sound familiar? That’s a data contract.
This concept can—and should—be extended to cover all data ingress and egress points: where data enters a system, and where it leaves. Just like APIs, data systems need contracts at the edges. These contracts should validate everything from schema shape to semantic meaning. But more importantly, they shouldn’t just observe the data after it's been produced—they should observe the code that produces it. We need systems that catch breaking changes at the source, during development, not days later in a downstream dashboard.
A system like that doesn’t just help QA—it scales to every engineer. It defines clear ownership for producers and gives data teams hooks into the creation process instead of chasing quality issues downstream. It embeds data quality inside code quality.
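Here is a hedged sketch of what such a producer-side contract test could look like, written as an ordinary pytest test so it fails in the producer's CI long before anything reaches a downstream dashboard. The build_payment_event function, the field names, and the semantic rules are hypothetical stand-ins for whatever the producing service actually emits.

```python
# test_payment_event_contract.py -- a hypothetical producer-side contract test.

ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}   # semantic expectation, not just shape

def build_payment_event(order_id: str, amount_cents: int, currency: str) -> dict:
    """Stand-in for the real producer code under test."""
    return {"order_id": order_id, "amount_cents": amount_cents, "currency": currency}

def test_payment_event_matches_contract():
    event = build_payment_event("o-42", 1250, "USD")

    # Schema shape: the fields downstream consumers depend on must exist with the right types.
    assert isinstance(event["order_id"], str)
    assert isinstance(event["amount_cents"], int)
    assert isinstance(event["currency"], str)

    # Semantic meaning: values must stay within the agreed business rules.
    assert event["amount_cents"] > 0
    assert event["currency"] in ALLOWED_CURRENCIES
```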
And here’s the mindset shift: most data quality problems are code quality problems. There are really two types of root causes for production-impacting bugs:
- A lapse in judgment—an engineer skips writing a test and a bug slips through.
- A broken dependency—an engineer unknowingly changes something a downstream team relied on.
The second category is where things get interesting. Think of a backend engineer changing an API that silently breaks the frontend. That’s not just a code quality issue—it’s a data quality issue. A schema changed. Expectations weren’t communicated. If engineers adopted data contracts to protect themselves, they’d also protect everyone else: analysts, ML teams, finance reporting, compliance—you name it.
And this approach isn’t limited to traditional pipelines. AI is about to turn this problem up to eleven.
Autonomous agents making code changes might work when isolated to a single codebase. But LLMs struggle with system-wide context—how one service depends on another, how a change in a producer might cascade across APIs, databases, and pipelines. As AI starts making changes across systems, data becomes the medium of communication—not just between humans and machines, but between AIs themselves. And with it comes a combinatorial explosion of data dependencies and breakages. Without contracts and shift-left enforcement, we’ll be flying blind into that complexity.
What Happens When We Get This Right
1. Data teams move upstream
When data quality enforcement happens downstream, data teams are left cleaning up issues they didn’t create and don’t control. But when contracts, lineage, and validation happen at the code level, data teams become active participants in the creation of data—not just its consumers. Instead of writing Slack messages about broken dashboards, they’re writing rules that prevent them from breaking in the first place.
2. Compliance becomes code
Policies don’t scale when they live in wikis or checklists. But they do scale when they’re encoded directly into CI/CD. Contracts as code let organizations define standards—PII tagging, schema validation, retention rules—and then automatically enforce them wherever data is produced. This moves compliance from something reactive and manual to something continuous and automated. No more chasing teams down before audits—violations get caught at the pull request.
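As one minimal sketch of "violations get caught at the pull request," the check below walks a schema definition and fails if a field that looks like PII carries no classification tag. The field-name heuristics and the tag format are assumptions for illustration; a real policy engine would be driven by the organization's own taxonomy.

```python
import sys

# Heuristic list of field names that usually indicate PII -- an assumption for this sketch.
LIKELY_PII = {"email", "phone", "ssn", "date_of_birth", "full_name", "ip_address"}

def untagged_pii(schema: dict[str, dict]) -> list[str]:
    """Return fields that look like PII but carry no 'classification' tag."""
    return [
        name
        for name, spec in schema.items()
        if name.lower() in LIKELY_PII and "classification" not in spec.get("tags", {})
    ]

# Hypothetical schema definition as it might appear in a pull request.
user_profile = {
    "user_id": {"type": "string", "tags": {}},
    "email":   {"type": "string", "tags": {}},                     # missing classification
    "ssn":     {"type": "string", "tags": {"classification": "pii/restricted"}},
}

violations = untagged_pii(user_profile)
for field in violations:
    print(f"policy violation: '{field}' looks like PII but has no classification tag")
sys.exit(1 if violations else 0)  # non-zero exit fails the CI check on the pull request
```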
3. Engineers get early feedback
Software engineers are used to fast feedback loops. They expect test failures to be caught before they merge code—not after something explodes in production. Data should work the same way. If a change to a schema or data payload is going to break an ML model or a reporting pipeline, the engineer should know before they hit merge. That kind of signal creates trust—and makes it easier for engineering to take ownership of data quality.
4. Quality becomes a shared responsibility
Right now, data quality is everyone’s problem and no one’s responsibility. But when data contracts are treated like API contracts, and quality issues are caught in dev, the responsibility naturally shifts to the teams closest to the cause. Engineers own their breakages, data teams own their transformations, and business teams finally get transparency into what’s reliable. Everyone’s incentives align. Quality isn’t something you inspect later—it’s something you build from the start.
5. The language changes
When data issues are framed as code issues, the conversation changes. Engineers don’t have to learn new tools or vocabulary—they just see tests, contracts, and CI checks like they do in the rest of their codebase. And when the language is familiar, adoption skyrockets. Suddenly, “data quality” isn’t a data team problem—it’s a software engineering best practice. This is how data quality becomes something everyone owns—because it's finally framed in a language software teams understand.
Conclusion
You made it to the end. Thanks for the read, I know it was long. This is a subject I’m passionate about, and I believe that shifting left is the key to unlocking a range of solutions and utility for data teams so wide that it is hard to describe succinctly. The impact of this movement, in my opinion, will be as significant for software development as the advent of DevOps. And in the age of AI, where data matters more than ever, developing the systems and culture to manage data as code has become critical.
You may have noticed this article was light on details. That was intentional, for my own sanity. In the next article, The Engineering Guide to Shifting Data Left, we’ll get more hands-on with real-world examples, use cases, and implementations. Take care until then, and good luck.
-Chad