From Data Warehouse to Digital Business

by Barry Devlin

June 2019

 

If you already have a comprehensive and well-architected data warehouse, you may be closer to delivering a digital business than you might think.

Digital transformation! Digital business! From analysts to vendors, from consultants to CEOs, the cry is the same: transform yourself into a digital business if you want to succeed in your industry. And do so quickly, because companies that are far down that road already have an enormous variety and volume of data that will enable them to displace you and other incumbents. According to McKinsey’s December 2016 report, The Age of Analytics: Competing in a Data-Driven World: “The network effects of digital platforms are creating a winner-take-most situation. The leading firms … are actively looking for ways to enter other industries… can take advantage of their scale and data insights to add new business lines… are blurring traditional sector boundaries.”

How far this prediction will play out remains to be seen. Nonetheless, the search for competitive advantage in every industry is now firmly aligned with digital transformation: the ability of businesses to take advantage of the information they already have and all additional data they may access or acquire. For businesses that have deep experience in the digital world, especially new Internet-based businesses, being data-driven is almost second nature. However, most traditional businesses still struggle to progress down the road of digital transformation. Indeed, some wonder where to even begin.

If you are one of those companies, the starting point may be nearer than you imagine: Digital transformation can begin with your existing data warehouse, but only if you are willing to look at that infrastructure in a different light. Let’s start with the original purpose of the data warehouse.

Thirty years ago, I defined the first data warehouse architecture. The business driver was to provide a reliable and consistent set of data from all available sources for any reporting or analysis need. In the intervening years, much of the focus has been on the reporting and analysis needs. However, when considering the value of data warehousing in digital transformation, we need to look at the underpinning idea of how and when to create a reliable and consistent data set from multiple sources. In its essence, a digital business is one that can use data from every imaginable source for business advantage and balances how far such data should be made consistent and reliable and when that data can—or must—be used in its raw state in order to limit complexity and cost.

This, of course, leads us to the topic of the appropriate information/data architecture for a digital business.

In contrast to a traditional business, a digital business uses externally sourced data (from social media and the Internet of Things, or IoT) as a basis for both operations and management. These new sources are characterised not only by the three Vs of volume, velocity and variety. More importantly, they are often of questionable quality and reliability. Furthermore, this new data must also be integrated with data originating from the traditional, internal operational systems. As a result, and bearing in mind thirty years of data warehouse experience and success, it makes sense to ask: What can we keep and what must change in a data warehouse architecture in order to build a digital business? We can see three key aspects.

First and foremost is data reconciliation and consistency. In a data warehouse, data is reconciled by bringing it all to a single store: the enterprise data warehouse (EDW) built in a relational database. Big data proponents have long claimed that a relational database cannot handle the volumes, velocity and variety of big data. However, hardware advances (such as in-memory databases, parallel processing and GPUs), coupled with software that takes advantage of this new hardware, have weakened these claims significantly. In addition, advances in data virtualization technology allow data to be logically reconciled without the need to bring it all to the same platform. In effect, we can use an enhanced EDW for data that must be physically reconciled, while data virtualization allows us to reconcile data across relational and non-relational platforms. This concept is key to the pillared logical information architecture that I devised in my book, Business unIntelligence.
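To make the idea of logical reconciliation a little more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any data virtualization product: an in-memory SQLite table stands in for the EDW, a plain list of dictionaries stands in for raw events in a non-relational store, and the table, field and function names are all hypothetical.

```python
import sqlite3

# Assumption: a tiny in-memory relational table stands in for the EDW.
edw = sqlite3.connect(":memory:")
edw.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
edw.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "Acme Ltd"), (2, "Globex")],
)

# Assumption: raw sensor events as they might sit, unreconciled, on another platform.
lake_events = [
    {"customer_id": 1, "sensor": "t-01", "reading": 21.5},
    {"customer_id": 1, "sensor": "t-01", "reading": 22.1},
    {"customer_id": 3, "sensor": "t-09", "reading": 19.0},  # no matching customer
]


def virtual_customer_readings(conn, events):
    """Join EDW customers to raw events at query time.

    Neither store is copied into the other; the reconciled view exists
    only while the caller iterates over it.
    """
    names = dict(conn.execute("SELECT customer_id, name FROM customer"))
    for event in events:
        name = names.get(event["customer_id"])
        if name is not None:  # events with no known customer stay raw
            yield {
                "customer": name,
                "sensor": event["sensor"],
                "reading": event["reading"],
            }


rows = list(virtual_customer_readings(edw, lake_events))
```

The reconciled customer names stay in the relational store, the raw readings stay where they landed, and the combined view is produced only on request. Real data virtualization layers add query pushdown, caching and security, none of which this sketch attempts.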

Second is enterprise information modelling. We must recognise that it is unnecessary—and impossible—to reconcile all data. Some data (individual data points from IoT sensors, for example) is too fleeting to reconcile. Other data, such as that from social media, is too unmanaged and ill-defined to make consistent. An enterprise information model (EIM) is necessary to decide the uses and value of different types of information to the business and when it is necessary to reconcile it or not. Most data warehouse implementations have already created EIMs of varying degrees of completeness and complexity. Such EIMs must be extended to cover external information sources. The core entities and relationships of the existing EIM will certainly remain at the heart of the digital business and allow new information sources to be linked as appropriate to traditional entities. It is through the EIM, for example, that social media user IDs can be matched to internal customer IDs (within the bounds of GDPR legislation, of course).
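As a rough illustration of the role described above, the following Python sketch treats a fragment of an EIM as a lookup of rules: which core entity a source relates to, and whether its data should be reconciled or used raw. The source names, the link table and the consent flag are all hypothetical, and the consent check is a deliberately simplified stand-in for privacy constraints, not a statement of what GDPR requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRule:
    """One hypothetical EIM entry: how a data source relates to a core entity."""
    source: str
    core_entity: str
    reconcile: bool  # True: make consistent in the EDW; False: use in raw state


# Assumption: three illustrative sources, one internal and two external.
EIM_RULES = {
    "crm_orders": SourceRule("crm_orders", "Customer", reconcile=True),
    "social_posts": SourceRule("social_posts", "Customer", reconcile=False),
    "iot_readings": SourceRule("iot_readings", "Device", reconcile=False),
}

# Assumption: a link table from social media handles to internal customer IDs.
IDENTITY_LINKS = {
    "@acme_ltd": {"customer_id": 1, "consent": True},
    "@globex": {"customer_id": 2, "consent": False},
}


def resolve_customer(handle):
    """Return the internal customer ID for a handle, or None.

    A link is used only when it exists and the customer has consented
    to the matching (a simplified placeholder for privacy rules).
    """
    link = IDENTITY_LINKS.get(handle)
    if link is None or not link["consent"]:
        return None
    return link["customer_id"]
```

Here resolve_customer("@acme_ltd") yields the internal ID 1, while an unknown handle or one without consent yields None, so the external data simply remains unlinked.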

Third is metadata or, as I prefer to call it, context-setting information (CSI). The concept of metadata was central to the data warehouse, both for technical management of the data stored there and for enabling businesspeople to understand and benefit from the data warehouse. Unfortunately, many earlier implementations have focused almost exclusively on technically oriented metadata. This is now changing with an industry emphasis on “data catalogs”, particularly those that incorporate some elements of machine learning. The inclusion of machine learning, as well as collaboration, in these modern tools is important because it addresses long-standing challenges of gathering information about how data is really used and understood by businesspeople. These challenges have prevented data warehouse developers from offering a full set of CSI in the past.

In summary, the major change to a data warehouse architecture to enable support for digital business is in how data is stored and reconciled. In a digital business, we no longer channel all data through a relationally based EDW. Rather, we limit the EDW to the data that must be consistent across the enterprise and allow other data to be stored on other platforms (a data lake) and reconciled through data virtualization only to the extent necessary. This change leads to extended requirements on (but no fundamental change in) information modelling and business-focused context-setting information as defined in traditional data warehousing. So, if you are planning to implement a digital business, dust off your old data warehouse architecture and see how it can help!