What is an enterprise data warehouse?

An enterprise data warehouse is a master dataset or a centralized repository which stores data from all areas of a business enterprise. It is vital for enterprises today to be agile while making data-driven business decisions, rather than intuition driven. Data is considered one of the most valuable assets to any business, as data, collected over time, reflects the entire knowledge of the company—pertaining to its customers, employees, investors, and other facets of its existence.

Unfortunately, business data is often stored in varied departments, datasets, platforms, and can be managed by different teams. This often leads to less than optimum analysis, which prevents businesses from getting an honest, holistic view of the business.

An example of this disparity and lack of cohesiveness is when key metrics related to supply chain management are handled by the procurement department, while customer relationship management data including sales conversions are managed by the sales team. How can accurate and relevant reports or KPIs be assessed with such disparate information?

The ultimate goal here is to obtain a single point of truth, so a clear, 360–degree view about the business is available to those making important decisions. This is where a solution is needed that enables data-driven decision making in a simple, easy, and quick way. The enterprise data warehouse not only integrates data from various sources but enables centralized analysis across the enterprise from multiple angles for more efficient business intelligence. It can store data from all departments and from multiple sources, such as (but not limited to) system information from applications, physical records, customer relationship management systems, and enterprise resource planning systems.

However, the goal of an enterprise data warehouse is seldom attained. While enterprise data warehouses are great at storing historical information, they are seldom able to load current transactions. This means that often, data virtualization is required to fulfill the 360 degree goal and attain real-time data.

It is also important here to make the distinction between a data warehouse and an enterprise data warehouse. While all data warehouses stay connected to raw data sources via integration tools on one end and an analytical interface on the other, an enterprise data warehouse has a much wider functionality and architectural diversity.

Functions and concepts of an enterprise data warehouse

Ultimate storage

An enterprise data warehouse serves as the ultimate unified storage or repository for all business and corporate data ever to occur during the existence of the business.

Reflects source data

As the enterprise data warehouse sources data from its original storage, such as Google Analytics or IoT devices, it must provide a uniformed likeness while reflecting original source. An enterprise data warehouse has the infrastructure to uniformize the data before storing it.

Structured data storage

The data stored in an enterprise data warehouse is always structured and standardized, enabling end users to query it via business intelligence interfaces and form reports.

Subject-oriented data

The main purpose of the enterprise data warehouse is to enable correlation of business data from different domains. Therefore, all data is structured around a data model, which is basically a specific subject, for example, region-wide sales numbers. Metadata is also added in detail to explain the origin of each and every piece of data.

Dependent on time

The data collected in the enterprise data warehouse is usually historical data as it refers to past business events. This is why most stored data is generally divided into time periods, to enable understanding of trends and patterns.

Non-volatile

Data is never deleted from the enterprise data warehouse once it is placed there. It can be modified, manipulated, or updated due to change in sources, but never erased, at least by end users. Of course, general revisions do occur once in a while to get rid of old and irrelevant data, but the general rule of no deletion applies. This is because, as stated above, the data is dependent on time for analytical purposes.

Types of data warehouses

What sets apart an enterprise data warehouse from other forms of warehouses is its array of functions, which in turn leaves a lot of room for discussion on its technical design. In terms of data storage and processing, the warehouses are specific and distinct for different kinds of businesses. How the system is set up in an enterprise will be dependent on factors such as amount of data, budget, analytical complexity required, and security. While the advent of cloud computing has enabled the creation of serverless, cloud-based data warehouses, traditionally, data warehouses were hosted on-premises.

Classic data warehouse

A classic enterprise data warehouse has its own dedicated hardware, software, and physical storage. In the case of physical storage there is no need to set up data integration tools for multiple datasets. The enterprise data warehouse negates this by being connected to data sources via application programming interfaces and constantly sources information, transforming it in the process. This means that all the work is either done when it is being transformed before being loaded into the warehouse via the staging area, or in the warehouse itself.

Benefits of the classic data warehouse include:

  • Processing of data and reporting is easier as it is on premises
  • Organization retains total control of the data warehouse
  • Data security is controlled in-house
  • Quick access to hardware and software layers, making it easier to troubleshoot issues
  • Warehouse is custom built to suit organizational requirements

Challenges of the classic data warehouse include:

  • Expensive installation costs for both hardware and software
  • Requires dedicated team to manage
  • Need to buy more capacity if hardware or server capacity is exceeded

Virtual data warehouse

Think of these as multiple datasets connected virtually, which can be queried as a single system for business intelligence. This approach allows the organization to manage its data in a simple manner, allowing the data to stay at its source but available to be pulled with the assistance of analytical tools.

Cloud data warehouse

A cloud-based data warehouse is a dataset stored as a managed service on cloud servers. For today’s agile and ‘at the fingertips’ business intelligence needs, cloud-based data warehouses are the norm, and have over the last decade become almost the standard for setting up technologies in an organization.

Enterprises today no longer have to purchase hardware, set aside space for server rooms, or maintain a dedicated team to run the data warehouse. Now, with a few clicks and a credit card, businesses can access unlimited computing power as well as storage space.

Advantages include:

  • No need to purchase hardware or server rooms specifically to manage the data warehouse
  • Easier to connect to other cloud-based tools with the same security requirements within the same cloud vendor
  • Cloud data warehouses have a dedicated, skilled team to ensure security of their product, although this may need to be augmented with internal enterprise security measures
  • Top cloud-based service providers are already compliant with data privacy requirements, ensuring the organization is compliant
  • Optimized for speed and performance, as they are agile and offer less time-to-solution
  • Offers the flexibility to meet the current and future needs of the organization

Challenges include:

  • Data security is managed by an external provider
  • Reliance on a service provider for fixing issues
  • Changing to a different platform can be challenging

Enterprise data warehouse architecture

There are many architectural approaches which may extend the capabilities of an enterprise data warehouse in various ways. There are three essential layers:

  1. Data sources - raw data layer
  2. The ecosystem of the warehouse
  3. The analytical tools or the user interfaces

Data extraction, transformation, and loading (more often referred to as ETL) into a data warehouse involves a separate set of tools. Under these different tools come the data integration tools which manipulate the data before it is placed in the data warehouse.

Because data will generally undergo some transformation when loaded, the enterprise data warehouse requires some functionalities for performing cleaning, standardization, and dimensionalization. These plus other salient factors determine the complexity of architecture required.

One-tier architecture

One-tier architecture is data warehousing in its most basic, primitive form. The dataset is directly connected to the analytical interface and the end user can directly query it. This direct, one-dimensional connection between the enterprise data warehouse and analytical tools brings some challenges with limited flexibility and capability. Additionally, the querying must be specific and precise as the reporting complexity is limited. Such architecture is slow and unpredictable thus rarely used for large-scale data platforms.

Two-tier architecture or a data mart layer

A data mart is a low-level data repository which contains information specific to the domain in question. Put more simply, it is a smaller data base included to extend the enterprise data warehouse with dedicated information related to specific areas. For example, the data mart layers could be the marketing or operational departments. The dart mart layer is added between the user interface and enterprise data warehouse in a two-tier architecture.

Such an approach makes querying easier as each department will be able to access data easily via its specified data mart. It also makes the enterprise data warehouse more secure as it limits the access for end-users to data only relevant to them. But this approach will also require additional resources for hardware and dataset integration.

Three-tier architecture – online analytical processing

This architecture comes into play when an online analytical processing cube (OLAP) is added on top of the data mart layer. An online analytical processing cube is a type of dataset which specifically represents data from multiple dimensions.

Components of an enterprise data warehouse

There are six main components to an enterprise data warehouse:

  1. The source, or the datasets where the raw data is stored.
  2. Extract, load and transform layer (ELT). As mentioned above, these tools perform the actual connections between the source data, its extraction and loading it to the place where transformation takes place. It is important to differentiate between the extraction, transformation and loading tool and the extract, load and transform layer. In the ELT approach, transformation of data happens in a staging area before being placed in the enterprise data warehouse, while the ELT is a more modern approach which handles transformation within the data warehouse itself.
  3. The staging area is where data is cleaned, transformed and given a data model. It may also include tools for quality management of the data.
  4. The data warehouse is where the data is finally loaded into the storage space, usually in its final model. The data warehouse will also include a data management system as well as storage for metadata.
  5. Metadata is data about data. These are explanations which give a clear picture and information to users about what the data pertains to. It could be technical metadata (data source) or business metadata (sales region).
  6. Reporting layers are tools that give access to end users. These are also called the business intelligence interfaces and serve as dashboards to visualize the data and form reports.

Benefits of a robust enterprise data warehouse

Robust, accurate analytics

Enterprise data warehouses enable more robust analytics by seamlessly integrating with analytics software, making dashboarding and reporting on an enterprise’s key performance indicators more specific due to the availability of a full range of data.

Easier predictions and comparisons

An enterprise data warehouse enables clarity and definition between various data points, enabling information to be analyzed via context, but affecting the business as whole. Simply put, it is simpler to predict how even the most minor of adjustments can affect the entire operation.

Enterprise data warehouses collect and store massive data from all areas of business, sorting and standardizing it for accurate comparison. This can enable the identification of key connections between projects.

Highly flexible, scalable structure

The structure of an enterprise data warehouse is flexible, and adjustments do not require complete overhauling of the system. Quality improvements can be implemented easily, and the data warehouse gets refined as the business grows.

A versatile, agile data storage solution

The planning to set up an enterprise data warehouse may take years of planning or testing. Despite this, understanding the chain of how the data passes along can be a big step toward figuring out what actually fits the data platforming requirements of your enterprise. The right enterprise data warehouse helps make sense of random data which come through various inputs into the organization and saves valuable time by automatically aggregating that information. Enterprises are better placed to grow when their data is organized automatically and in a systematic manner.

Despite these strengths, enterprise data warehouses do not contain or cannot load the data in a timely manner. Data virtualization is often implemented alongside enterprise data warehousing to help with this challenge.

Enterprise data warehouse diagram