Imagine you were a railway executive with a contract to transport valuable cargo across the country. You wouldn’t have a different engine pulling each individual car of cargo. It would be much more efficient and cost-effective to hitch as many cargo cars as possible to the same engine. In fact, you would want a standard set of engines and couplers that would let you haul different kinds of cargo anywhere.
This analogy is particularly germane to the world of data products. Scale and value come from treating a data product like an engine that can support a large number of high-value use cases (or cars). Unfortunately, when it comes to data products, companies are operating much more along the single engine–single car model. The result is fragmented data programs that fail to scale or generate the value that many had expected.
In some ways, this is a glass-half-full problem. When we wrote about data products in 2022, we detailed the advantages of managing data like a product. A data product delivers a high-quality, ready-to-use set of data that people across an organization can easily access and reuse for a variety of business opportunities (see sidebar, “What is a data product?”). Since then, organizations across sectors have started to adopt data products as core elements of their data and business strategies. The wave of enthusiasm surrounding gen AI has driven a wider appreciation in the boardroom of the importance of data and the need to better harness it.
That enthusiasm, however, has produced mixed results. Confusion about how data products deliver value, governance practices that favor the individual use case over larger ROI benefits, and institutional incentives that reward building data products over scaling them all have a role in choking value. With companies increasingly relying on data—from harnessing gen AI to developing digital twins—to innovate and expand the business, ineffective or nonexistent data product practices are becoming a top strategic issue.
Our experience working with dozens of companies in the past few years has shown that building valuable data products is much less of a technical challenge than a strategic and operational one. That experience can be boiled down to five key lessons:
- It’s about more value, not better data. The goal of developing data products isn’t to generate better data; it’s to generate value. No data product program should begin until leadership has a firm grasp of the value that each use case can generate and has prioritized the biggest opportunities.
- Understand the economics of data products. A data product’s effectiveness is based on the “flywheel effect” of accelerating value capture and reducing costs with each additional business case that it enables.
- Build data products that can power the flywheel effect. Harnessing the flywheel effect of ever-lower costs and ever-rising value requires building a capability that maximizes reuse and reduces rework.
- Find people who can run data products like a business. Put in place empowered data product owners (DPOs) and senior data leaders who understand what matters to the business, from articulating the value in business terms to building support.
- Integrate gen AI into the data product program. Gen AI is already proving that it can help develop better data products faster (as much as three times faster) and cheaper than other methods.
It’s about more value, not better data
Companies that are disciplined in developing a thoughtful data product program can target high-value cases to reap benefits quickly while putting in place the right foundations to continue to build incremental value over time. Delivering on this aspiration requires both a more targeted and more expansive approach to developing data products than is often the case.
In our experience, the vast majority of executive focus and energy goes to a specific use case or two because doing so allows leadership to show activity and celebrate impact. Or CIOs get pulled in many directions with requests to create particular data products without any effective way to weigh their costs and benefits to the enterprise as a whole.
Leaders instead need to create a clear view of where the greatest value to the business is. That starts with having the discipline to analyze the value potential of each use case in a business’s program (at least over the upcoming 12 to 24 months), then clustering the ones that rely on similar types of data. If there are no other relevant business cases, then building a data product isn’t necessary. If, however, there are multiple high-value use cases that rely on similar data sets, that’s a strong argument to develop a data product. The more business cases that a data product can address (the larger the size of that cluster), the greater the value that it can generate.
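The clustering discipline described above can be sketched in a few lines of code. The sketch below groups a hypothetical use-case inventory by the data sets each use case relies on and ranks the resulting clusters by combined value; all names and dollar figures are illustrative, not drawn from any company in this article.

```python
from collections import defaultdict

# Hypothetical use-case inventory: the data sets each use case relies on
# and its estimated annual value (all names and figures are illustrative).
use_cases = {
    "churn_prediction":     {"data": {"customer", "billing"}, "value_musd": 12},
    "credit_scoring":       {"data": {"customer", "billing"}, "value_musd": 15},
    "cross_sell_targeting": {"data": {"customer", "orders"},  "value_musd": 8},
    "route_optimization":   {"data": {"telemetry"},           "value_musd": 3},
}

# Cluster use cases that rely on the same data sets; a cluster's combined
# value is the case for building a shared data product for it.
clusters = defaultdict(list)
for name, uc in use_cases.items():
    clusters[frozenset(uc["data"])].append(name)

def cluster_value(names):
    return sum(use_cases[n]["value_musd"] for n in names)

# Largest clusters first: these are the strongest data product candidates.
for data_sets, names in sorted(clusters.items(),
                               key=lambda kv: -cluster_value(kv[1])):
    print(f"{sorted(data_sets)}: {names} -> ${cluster_value(names)}M")
```

A real analysis would cluster on partially overlapping data sets rather than exact matches, but the principle is the same: a single-use-case cluster argues against building a data product, while a high-value cluster argues for one.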
This analysis should result in the creation of a map of use cases and corresponding data products, with value measurements for each (Exhibit 1). This plan sets the expectation of value, is an asset that leadership aligns on, and becomes a practical tool to guide decision-making.
It’s important to note that delivering on high-value business cases often requires multiple data products. One telco, for example, wanted to optimize its network deployment, which required it to figure out which people and equipment were available at any given location. To run the relevant analysis, it needed to develop two data products: one focused on technicians’ skills and locations (among other things) and the other on towers to capture the SKUs for the parts used, configurations, traffic data, and each tower’s performance.
Understand the economics of data products
Companies often don’t have a clear understanding of the economics of data products, which leads to misleading business cases, uninformed decision-making, and ineffective resourcing. The value of a data product comes from the steady reduction in incremental costs achieved from reusing it and the acceleration in capturing the value of each additional use case.
Many of the costs from developing a data product are one-time investments. At one telco, for example, an estimated 60 to 80 percent of a data team’s time spent finding, preparing, and performing quality assurance on data in setting up an initial data product was for one-time efforts that didn’t need to be repeated for each new business case. Those one-time costs are in effect amortized as the data product is reused for other use cases, resulting in steadily lower costs for each use.
This flywheel effect of lowering costs was clear at one international consumer company. At the point when a data product enabled five use cases, its projected cost was about 30 percent less than building individual data pipelines for five analytical solutions (Exhibit 2). When that data product was then scaled to another market, projected costs were about 40 percent lower when addressing five analytical solutions (versus building individual data pipelines). This cost reduction stemmed not only from the reuse of the standardized data product but also from the experience that the data product team had accrued.
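The amortization logic behind that flywheel can be made concrete with a toy cost model. The figures below are hypothetical (they are not the telco's or the consumer company's numbers); the point is the shape of the curve: a data product front-loads one-time work, so its average cost per use case falls with every reuse, while bespoke pipelines cost the same every time.

```python
# Illustrative cost model for a data product, in $K. A data product pays a
# one-time cost to find, prepare, and QA the data, then a small incremental
# cost per additional use case. All figures are hypothetical.
ONE_TIME_COST = 700      # build the data product once
INCREMENTAL_COST = 100   # serve each additional use case
PIPELINE_COST = 300      # a bespoke one-off pipeline per use case instead

def product_cost_per_use_case(n_use_cases: int) -> float:
    """Average cost per use case once the one-time build is amortized."""
    return (ONE_TIME_COST + INCREMENTAL_COST * n_use_cases) / n_use_cases

for n in (1, 2, 5, 10):
    avg = product_cost_per_use_case(n)
    print(f"{n:>2} use cases: ${avg:,.0f}K each vs ${PIPELINE_COST}K bespoke")
```

With these assumed numbers, the data product is more expensive than a bespoke pipeline for a single use case and cheaper from about the fourth use case on, which is why the value-mapping exercise earlier in this article matters: the break-even point depends entirely on how many use cases the product will serve.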
That flywheel effect also plays out on the value creation side of the equation in two ways. The first is the acceleration in capturing a use case’s value—in some cases, by as much as 90 percent. The more business cases that a data product supports, the faster a company can book the value of each of them. One insurance company, for example, used data products to capture $210 million of value from a set of use cases in its plan. It could have captured that amount through standard data programs, but they would have cost an estimated 50 percent more and taken up to twice as long to complete.
The second is the reduction of costs from the data quality and reliability issues that arise when data products aren’t used. The standardization, structuring, and automation built into data products significantly reduce failure rates from poor data management and cut product defects.
This understanding of data economics has three key implications:
- Companies can build fewer but better data products. The lion’s share of the potential value to a company generally comes from five to 15 data products.
- CIOs can convincingly articulate the value of data products to the business. They should be able to communicate the value when developing business plans and making investment requests.
- Clarity on data product value can help the effective alignment of resources and incentives. Too much energy and focus are spent on building data products rather than maintaining and evolving them over time. This includes, for example, ensuring that people with relevant skill sets support a data product’s growth over time and aligning incentives to meaningful KPIs (such as data-product-reuse velocity, stakeholder trust and satisfaction rates, efficiency of data product maintenance, and time to value). Budgets for use cases should also build in data product usage, limiting the opportunity for teams to go off and create their own data solutions.
Build data products that can power the flywheel effect
Data engineering is the unsexy but critical blocking and tackling of the data product development that powers the flywheel effect of lowering costs and accelerating value. That effect depends on how well data products and systems are built to scale. Too often companies will shortcut this work, leading to predictable delays and cost overruns.
There are a number of key considerations when it comes to engineering data products so that they can scale:
- Build the data product to evolve easily. One of the most important steps in developing a data product is understanding how it will need to evolve to accommodate future use cases. That’s because it can be difficult and costly to change the data product once built. In practice, this means modeling the data product so that new data sources and types can be added without changing its core. Maintaining data simplicity is particularly important. That might mean, for example, capturing a customer’s date of birth (which is relevant for many business cases) rather than performing any calculation on it, such as their age at the time of their first purchase (which isn’t useful for many additional use cases).
- Develop assets that easily tie into existing systems. Data products can’t scale on their own. Data leaders need to invest in support mechanisms to enable scaling. While many are aware of best practices (such as establishing libraries of approved code), breakdowns often occur with how well data products connect to existing systems. Putting in place standardized connection technologies (such as APIs and database connectors) is critical, as is thinking through technology choices that might affect data product performance. If a company has developed an enterprise data warehouse, for example, building data products within that environment makes it much easier to access the data.
- Make access to data products simple. A great data product is meaningless if no one uses it. Avoiding this issue entails putting in place easy access to data products through a searchable marketplace (think Apple’s App Store) and an easy way to contact the data product team for support when developing consumption pipelines (Exhibit 3).
- Build a DataOps (data management and operations) capability to automate as much as possible. Processes to rationalize data and route it to the right data product tend to be manual and thus arduous and time-consuming. That’s why developing a mature DataOps capability to improve the integration and automation of data flows from source to product is critical for scale. This requires automating as many processes as possible (through security as code and automated data lineage documentation, for example). Because data products are organic and change as new use cases emerge, data leaders need to revisit and adjust these automations often.
- Organize groupings of reliable data. Good data products need good data. That’s a massive challenge at most companies, where a lack of standards, conflicting data sources, and the fast-changing nature of data in the world of gen AI create significant roadblocks. One company, for example, had 400 columns in its data tables just for different ways to capture customers’ dates of birth. Addressing this issue requires teams to create logical groupings of related data, sometimes called “data domains.” In the realm of supply chain management, for example, a data domain might include the order, finished-goods inventory, term discount, shipping documentation, transportation data, distribution center information, and demand-planning forecast data.
This curation effort includes consolidating copies of data, creating data standards, and establishing which data is the definitive source. In this way, data products can access the trusted sources they need to function reliably and effectively. Creating reliable domains relies on having strong data engineers, but the gap between good and mediocre performers is often vast (and dragooning software engineers into the role rarely works out well). Great data engineers not only have strong technical skills but also are good at asking second- and third-order questions (such as what the data will be used for) and are creative problem solvers.
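The “build to evolve” point above — store the raw attribute and let each use case derive what it needs — can be sketched as follows. The record, field names, and helper function are hypothetical, chosen to mirror the date-of-birth example.

```python
from datetime import date

# The data product stores the raw attribute (date of birth); each use case
# derives what it needs: age today, age at first purchase, birth cohort.
# Storing a pre-computed "age at first purchase" instead would serve one
# use case and go stale. All records here are made up.
customer = {
    "id": "C-001",
    "date_of_birth": date(1990, 6, 15),
    "first_purchase": date(2020, 1, 10),
}

def age_on(dob: date, on: date) -> int:
    """Whole years between dob and the reference date."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))

# Derived on demand by the consuming use case, never stored in the product.
age_at_first_purchase = age_on(customer["date_of_birth"],
                               customer["first_purchase"])
print(age_at_first_purchase)
```

Keeping the stored model this simple is what lets the fifth and tenth use cases reuse the same product without a schema change.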
Find people who can run data products like a business
A data product program viewed as “just an IT project” won’t succeed in creating value. Successful data product efforts require broad cooperation with people on the business side of the house and data leaders who understand how to run data products like a business, not a project (Exhibit 4).
Two needs in particular stand out:
- Strong DPOs should lead the programs. Many companies understand the need to put a DPO or data manager in charge, but they don’t act on it sufficiently. Too often they will give project managers the lead role, resulting in a focus on delivering requirements rather than building value.
Strong DPOs are most effective when they run data product programs like a business and go way beyond just delivering the data product itself. They work closely with business sponsors, actively look across the business for new use cases for the data product, closely track KPIs and how much value is being generated, and identify ways to reduce costs (such as by shutting down databases that aren’t needed). They are directly accountable for generating value from the data product. What that means in practice can vary. In some cases, the DPO will have profit-and-loss responsibility, while in others, the DPO may be directly compensated for the value that their data product creates.
DPOs should not manage data domains. That’s the purview of data stewards, who focus on meeting standards, lowering risk, and ensuring quality of the data in domains.
- Businesses should lead development. Businesses need to be closely involved from the beginning of the data-product-development effort. Leaving product design decisions to a data engineer alone, for example, often leads to a data product that functions well technically but doesn’t address the business’s true needs. Instead, the best companies bring in subject matter experts and colleagues from the business side of the house to collectively decide which data is most important to deliver a given use case.
This collaboration applies at a broad organizational level, since data product programs require significant orchestration across the business to share data and update processes. Such an effort has a much higher chance of success when led by a senior business executive rather than a data leader. At one insurance company, for example, almost 80 percent of the contact information of beneficiaries was out of date. Updating that data required operations teams to prioritize the effort, customer-relationship-management teams to update data feeds, and sales teams to make the calls and input the updated information into the system. A senior business leader not only actively campaigned to build support among leaders in the various functions to make this happen but also developed a report card to track the completeness and quality of data. Some companies take the additional recommended step of creating a team responsible for data performance management and measurement to ensure that data product teams are delivering on their KPIs.
Integrate gen AI into the data product program
Gen AI tools and capabilities are having a profound effect on data product development, accelerating the process by as much as three times over traditional methods. Many companies, however, struggle to capture that efficiency. In our experience, the issue is that companies focus on only a relatively small part of the data-product development and deployment cycle. They need to instead dissect the development steps to identify repeatable processes that are well suited for gen AI (Exhibit 5).
Tasks for which we have seen gen AI be particularly effective include creating features and user stories with acceptance criteria, generating requirements based on business goals, automating the generation of data relationships, generating transformation code for migrating data from its source to the target system, and testing for data quality and privacy.
The access to entirely new types of unstructured data (such as images, user reviews, and videos) that gen AI affords can enhance a data product’s effectiveness—by, for example, incorporating sentiment analysis and historical behavior. But companies need to organize and treat their unstructured data to make it usable. That means, for example, tagging the data, determining its importance, and creating shortcuts to access the most used data to control for costs.
To shift analogies, data without data products is like oil without refineries: There is little value in the raw form. Data products are the key to enabling the data-driven decisions and actions that generate value. But that value becomes meaningful only when leaders are ready not just to build data products but to scale them as well.