What is data virtualization?

Data virtualization software acts as a bridge across multiple, diverse data sources, bringing critical decision-making data together in one virtual place to fuel analytics.

Data virtualization provides a modern data layer that enables users to access, combine, transform, and deliver datasets with breakthrough speed and cost-effectiveness. Data virtualization technology gives users fast access to data housed throughout the enterprise—including in traditional databases, big data sources, and cloud and IoT systems—at a fraction of physical warehousing and extract/transform/load (ETL) time and cost.

With data virtualization, users can apply a range of analytics—including visualized, predictive, and streaming analytics—on fresh, up-to-the-minute data updates. Through integrated governance and security, data virtualization users are assured their data is consistent, high quality, and protected. Additionally, data virtualization allows for more business-friendly data, transforming native IT structures and syntax into easy-to-understand, IT-curated data services that are easy to find and use via a self-service business directory.

Data virtualization supports multiple lines of business, hundreds of projects, and thousands of users, and can grow from project to enterprise scale.
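The core idea above — one virtual place that joins data from several physical sources without first copying it into a warehouse — can be sketched in a few lines. This is a minimal illustration, not a real product: the "warehouse" is an in-memory SQLite table and the "flat file" is an inline CSV, and all table, column, and customer names are invented.

```python
import csv
import io
import sqlite3

# Hypothetical "warehouse" source: order totals in a relational database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 120.0), (2, 75.5)])

# Hypothetical flat-file source: customer names from a CSV export.
flat_file = io.StringIO("customer_id,name\n1,Acme\n2,Globex\n")
customers = {int(r["customer_id"]): r["name"] for r in csv.DictReader(flat_file)}

def virtual_view():
    """The 'virtual layer': join both sources on demand, leaving each
    dataset where it lives instead of copying it into a new store."""
    for customer_id, total in db.execute(
            "SELECT customer_id, total FROM orders ORDER BY customer_id"):
        yield {"name": customers[customer_id], "total": total}

print(list(virtual_view()))
```

The point of the sketch is that consumers see one combined dataset while each source keeps its native format and location.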

Common data sources accessed through data virtualization software

  • Packaged apps
  • Excel & flat files
  • Data warehouses
  • Data lakes
  • Big data
  • XML docs
  • Cloud data
  • Web services
  • IoT data

Common systems used with data virtualization

  • Oracle
  • SQL Server
  • Teradata
  • Netezza
  • DB2
  • Hive
  • Impala
  • SharePoint
  • Excel
  • Flat files
  • Amazon Redshift
  • Google BigQuery
  • Spark
  • Drill
  • REST
  • OData
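A virtualization layer presents systems as different as Oracle, REST services, and flat files behind one uniform interface. One common way to do that is an adapter pattern: each source type gets a small wrapper with a shared contract, and consumers address datasets by name. The sketch below is a toy version of that idea; the source classes, dataset names, and rows are all hypothetical stand-ins, not any vendor's API.

```python
# Toy adapter pattern: heterogeneous systems behind one interface.

class Source:
    """Shared contract every adapter implements."""
    def rows(self):
        raise NotImplementedError

class SqlSource(Source):
    def __init__(self, rows):
        self._rows = rows          # stand-in for a relational cursor
    def rows(self):
        return iter(self._rows)

class RestSource(Source):
    def __init__(self, payload):
        self._payload = payload    # stand-in for a parsed REST/OData response
    def rows(self):
        return iter(self._payload["items"])

# Consumers see dataset names, not the systems behind them.
registry = {
    "orders": SqlSource([{"id": 1, "total": 99.0}]),
    "shipments": RestSource({"items": [{"id": 1, "status": "shipped"}]}),
}

def query(name):
    """Look up a dataset by name; the registry hides which system backs it."""
    return list(registry[name].rows())

print(query("shipments"))
```

Swapping a dataset's backing system (say, from a database to a web service) then changes only the registry entry, not the consumers.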

Top 4 capabilities that a data virtualization system should have

Four capabilities are needed to meet urgent business needs with data virtualization

Agile design and development: You need to be able to introspect available data, discover hidden relationships, model individual views/services, validate views/services, and modify as required. These capabilities automate difficult work, improve time to solution, and increase object reuse.
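Introspecting available data and discovering hidden relationships can be sketched with stdlib tools: the snippet below reads a SQLite catalog to list tables and columns, then guesses join relationships by matching column names across tables. This is a crude stand-in for the relationship discovery a real design tool performs, and the schema is invented for the example.

```python
import sqlite3

# A small hypothetical schema to introspect.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
db.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")

def introspect(db):
    """Map each table name to its list of column names."""
    schema = {}
    for (table,) in db.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"):
        schema[table] = [row[1] for row in db.execute(f"PRAGMA table_info({table})")]
    return schema

def discover_joins(schema):
    """Guess relationships: any column name shared by two tables."""
    tables = list(schema)
    return [(a, b, col)
            for i, a in enumerate(tables)
            for b in tables[i + 1:]
            for col in set(schema[a]) & set(schema[b])]

schema = introspect(db)
print(discover_joins(schema))   # finds the shared customer_id column
```

Discovered relationships like this become candidate joins when modeling views, which is where the time-to-solution gains come from.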

High-performance runtime: When the application issues a request, the engine executes a single optimized query against the sources and delivers the result in the proper form. This capability provides up-to-the-minute data, optimized performance, and less replication.
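One technique behind that "single optimized statement" is predicate pushdown: instead of pulling every row across the network and filtering in the virtual layer, the layer rewrites the request so the source itself filters and aggregates. A minimal sketch, with an invented sales table standing in for a remote source:

```python
import sqlite3

# Hypothetical remote source, simulated with an in-memory SQLite table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)",
               [("east", 10.0), ("west", 20.0), ("east", 5.0)])

def regional_total(region):
    """One statement: the filter and the SUM both run at the source,
    so only a single number crosses the wire."""
    (total,) = db.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)).fetchone()
    return total

print(regional_total("east"))   # 15.0
```

The alternative (fetching all rows, then summing in the application) returns the same answer but moves the whole table, which is exactly the replication cost virtualization tries to avoid.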

Use of caching when appropriate: When essential data is cached, the application's request executes an optimized query that leverages the cache, and data is delivered in the proper form. This capability boosts performance, avoids network constraints, and allows 24x7 availability.
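A minimal version of such a cache is a time-to-live (TTL) wrapper: serve the stored result while it is fresh, and refetch from the slow source only when it goes stale. The fetcher, TTL, and cached rows below are illustrative choices, not a specific product's behavior.

```python
import time

class TTLCache:
    """Serve a cached value until it is older than ttl_seconds."""
    def __init__(self, fetcher, ttl_seconds):
        self.fetcher = fetcher
        self.ttl = ttl_seconds
        self._value = None
        self._stamp = None

    def get(self):
        now = time.monotonic()
        if self._stamp is None or now - self._stamp > self.ttl:
            self._value = self.fetcher()   # hit the slow source
            self._stamp = now
        return self._value                 # otherwise serve from cache

calls = 0
def slow_source():
    """Stand-in for an expensive query against a remote system."""
    global calls
    calls += 1
    return [("east", 15.0)]

cache = TTLCache(slow_source, ttl_seconds=60)
cache.get()
cache.get()      # second call within the TTL is served from cache
print(calls)     # 1
```

If the source becomes unreachable, requests inside the TTL window still succeed from the cache, which is one reason caching supports 24x7 availability.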

Business directory/catalog to make data easy to find: This capability includes features for search and data categorization, browsing all available data, selecting from a directory of views, and collaborating with IT to improve data quality and usefulness. This capability empowers business users with more data, improves IT/business user effectiveness, and enables data virtualization to be more widely adopted.
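The searchable directory described above reduces, at its simplest, to views registered with descriptions and tags that business users can query by keyword. The catalog entries below are invented examples:

```python
# Toy self-service directory: views with tags and business descriptions.
catalog = [
    {"view": "customer_360", "tags": {"customer", "sales"},
     "description": "Unified customer profile across CRM and billing"},
    {"view": "daily_shipments", "tags": {"logistics"},
     "description": "Shipment counts by region, refreshed hourly"},
]

def search(term):
    """Return view names whose tags or description match the keyword."""
    term = term.lower()
    return [entry["view"] for entry in catalog
            if term in entry["tags"] or term in entry["description"].lower()]

print(search("customer"))   # ['customer_360']
```

In a real catalog the entries would also carry ownership, lineage, and quality annotations, which is where the IT/business collaboration happens.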

What are some data virtualization use cases?

  • Analytics use cases
    • Physical data integration prototyping
    • Data access/semantic layer for analytics
    • Logical data warehouse
    • Data preparation
    • Regulatory constraints on moving data
  • Operational use cases
    • Abstract data access layer/virtual operational data store (ODS)
    • Registry-style master data management
    • Legacy system migration
    • Application data access
    • Regulatory constraints on moving data
  • Emerging use cases
    • Cloud data sharing
    • Edge data access in IoT integration
    • Data hub enablement
    • Data and content integration
    • Regulatory constraints on moving data

The benefits of data virtualization

Business value acceleration: Analytics applications can be deployed sooner, and greater value can be attained faster as changes occur

Business insight improvement: More complete, up-to-the-minute data that is easy to access and understand, requiring less effort than ETL

Development cost avoidance: Reusable data services and interactive development and validation improve quality and avoid rework for new projects

Data management infrastructure cost reduction: Lower infrastructure costs, and fewer licenses to buy and depreciate, result in lower support and maintenance costs

How do various industry sectors use data virtualization?

  • Communications & Technology
    • Differentiating market research services
    • Increasing revenue per customer
    • Building a virtual customer data lake
    • Enabling leading-edge innovation
    • Creating a real-time ODS for billing and marketing
    • Optimizing customer care
    • Managing customer entitlements
    • Improving customer insights
  • Energy
    • Optimizing upstream energy production
    • Improving well maintenance and repair
    • Analyzing offshore platform data
    • Optimizing cross-refinery processes
    • Providing SAP master data quality
  • Financial Services
    • Managing fixed-income risk
    • Improving trade reconciliation
    • Accelerating new client onboarding
    • Addressing mortgage data complexity
    • Enriching cash management clients
    • Empowering data democracy
  • Government
    • Protecting the environment
  • Healthcare
    • Driving new product innovation
    • Accelerating M&A synergies
    • Providing more efficient claims analysis
    • Improving patient care
  • Manufacturing
    • Optimizing a global supply chain
    • Optimizing factories and logistics
    • Differentiating via digitization
    • Improving IT asset utilization

Getting started with data virtualization

The highest value implementation of data virtualization is a high-speed, virtualized data layer. Such a layer allows for robust management and governance, while also delivering self-service access to critical data, organizing it for scale, and making it available in a cost-effective manner to applications and analytics systems.

However, most data virtualization implementations start small and expand. A common way of starting is with a small and focused team charged with one or more projects. A small team can be versatile while also accepting some uncertainty. (Teams must be agile to move fast and complete several iterations of data projects.)

The next step is to deliver project datasets as the data layer is being built. This step addresses several data challenges including evolving requirements, multiple sources, mixed data types, up-to-the-minute data, data outside of the data warehouse, data too large to physically integrate, and data outside the firewall.

Teams also need to prioritize their data virtualization projects based on business value and ease of data virtualization implementation. The greater the business value and implementation ease, the higher the project’s priority. Data virtualization, and the people who implement it, also need to evolve to reuse various data services in the application layer, business layer, and source layer.
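The prioritization rule above — greater business value and greater implementation ease mean higher priority — can be made concrete with a simple scoring pass. The projects and the 1-to-5 scales below are invented for illustration; real teams would calibrate their own criteria.

```python
# Toy prioritization: score candidate projects by value x ease, highest first.
projects = [
    {"name": "logical data warehouse", "value": 5, "ease": 3},
    {"name": "virtual ODS",            "value": 4, "ease": 4},
    {"name": "data prep sandbox",      "value": 2, "ease": 5},
]

def prioritize(projects):
    """Order projects by business value times implementation ease."""
    return sorted(projects, key=lambda p: p["value"] * p["ease"], reverse=True)

for p in prioritize(projects):
    print(p["name"], p["value"] * p["ease"])
# virtual ODS 16
# logical data warehouse 15
# data prep sandbox 10
```

A multiplicative score like this naturally demotes projects that are valuable but very hard, matching the guidance that early wins should be both valuable and achievable.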