Rethinking Data Quality for the Internet Age
As published at ITtoolbox by Paul Keiser. Copyright © 2000.
Y2K
is quickly becoming a fading institutional memory. IT budgets
already have been redeployed and boosted to support new
initiatives designed to catapult the enterprise into the
brave new worlds of Internet e-commerce, enterprise resource
planning (ERP), enterprise application integration (EAI),
supply chain management and customer relationship management
(CRM) – and brave new worlds they are indeed.
Analysts
predict explosive growth in these segments. International
Data Corporation predicts e-commerce applications will leap
from $1.7 billion in 1999 to $4.2 billion in 2000. By 2003
they foresee a $13.1 billion market. AMR Research forecasts
the ERP/EAI markets will grow from $20.2 billion in 1999
to $27.7 billion in 2000; the CRM market will climb from
$3.7 billion in 1999 to $5.4 billion in 2000; while supply
chain management revenue will rise from $3.9 billion to
$5.8 billion.
However,
many of these new initiatives will cost too much, not meet
their lofty objectives or completely fail if the enterprise
employs traditional approaches to data quality. The Internet
age requires nothing less than a fundamental rethinking
of what data quality comprises and how it should be applied
within an interconnected enterprise and among its trading
partners.
Data Quality and Data Integration Must Merge
Currently, data quality and data integration are handled as two
separate disciplines, and each is treated as a one-time event. There
is little in the way of common "glue" that holds
the two together, but that is exactly what they need if
the burgeoning use of the Internet to facilitate e-commerce
and trading partner relationships is to thrive. A new framework
for data quality and data integration is needed within the
IT architectures of budding dot-com companies and established
brick-and-mortar enterprises – one that is cheaper
to deploy, easier to implement and less costly to maintain.
A
data quality framework for the Internet age should encompass
the active control and management of both data definition
(format and business rules) and integration/migration behavior
(where the data comes from and where it needs to go), both
within the enterprise and among its disparate trading partners
regardless of data source or content.
Traditional
approaches and implementations of data quality are inadequate
to meet the growing needs of the Internet age enterprise.
A quick review of how we got where we are today can serve
as a springboard to a new framework for data quality.
Many
pioneering data warehouse projects failed to meet expectations
because the importance of data quality was treated as an
afterthought or, even worse, not understood. When data quality
was better understood, various first-generation point solutions
were integrated into the warehouse architecture to cleanse
and household customer-centric data.
The
next series of milestones transpired when the enterprise
began to build subject-specific data marts at significantly
lower cost. A proliferation of data mart projects ensued.
However, users quickly realized that data quality varied
in each, data could not be easily shared and, even more
dangerous, the business rules applied to define and transform
the data were different.
Experts
now believe the answer lies in opening disparate meta data
repository silos so interconnected systems can "talk"
to each other. But talking and understanding are two different
things. Open meta data standards represent a significant
first step in allowing multiple systems to share data and
begin making it understandable to users.
But
that is not enough in the Internet era. The enterprise requires
a new framework for implementing data quality in a distributed
environment where data is both internal and external to
the enterprise and must move seamlessly back and forth among
disparate systems.
As
e-commerce applications proliferate, the enterprise and
data users require a data quality framework that can:
Treat data quality as a process, not an event.
Link business rules with the data itself.
Transform data – along with applied business rules – directly on multiple source platforms prior to extraction.
Link and integrate data from disparate source systems on the fly with no loss of data quality.
Incorporate trading or supply chain partner data without major data quality reengineering on either the source or target systems.
Plug in valuable internally developed or third-party data quality, data transformation or data integration/migration applications to execute a fully integrated process.
An
Internet age data quality framework must address these six
elements. All are required to handle complex e-commerce,
EAI, supply chain and CRM applications. The sections that
follow lay out a cost-effective data quality framework for
the Internet age and examine each of these elements in
more detail.
Treat Data Quality as a Process, not an Event
Today, users
must put together a series of point products to achieve
their data quality goals. Most often these solutions are
not integrated. That is because data quality is usually
treated as an event. It is a box on a logical model. Users
are forced to daisy chain together various software tools
and applications to achieve their desired outcome. This
often requires complex integration, which drives up costs
and delays delivery of a solution.
The
next generation data quality framework treats data quality
as part of a seamless process, not an event. A piece of
data may need to be treated differently at various points
in an e-commerce or supply chain process, and it should be
transformed at each point according to the business rules
that apply there. Instead of supporting only one-to-one or
bi-directional exchanges, Internet age data quality must be
flexible and accurate enough to fulfill the needs of
one-to-many or many-to-many e-business processes.
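As a rough illustration of treating the same data differently at different points in a process, the sketch below attaches a list of rules to each stage. The stage names, field names and rule functions are all hypothetical and chosen only for illustration; they do not describe any particular product.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[Record], Record]


def trim_whitespace(record: Record) -> Record:
    """Strip leading and trailing spaces from every field."""
    return {key: value.strip() for key, value in record.items()}


def uppercase_country(record: Record) -> Record:
    """Normalize the (hypothetical) country field to upper case."""
    out = dict(record)
    out["country"] = out.get("country", "").upper()
    return out


def mask_card_number(record: Record) -> Record:
    """Hide all but the last four characters of the card field."""
    out = dict(record)
    card = out.get("card", "")
    out["card"] = "*" * max(len(card) - 4, 0) + card[-4:]
    return out


# Hypothetical stages of an order process; each stage applies its own rules,
# in order, so one record can be handled differently at each point.
STAGE_RULES: Dict[str, List[Rule]] = {
    "order_entry": [trim_whitespace],
    "fulfillment": [trim_whitespace, uppercase_country],
    "partner_feed": [trim_whitespace, uppercase_country, mask_card_number],
}


def apply_stage(stage: str, record: Record) -> Record:
    """Run every rule registered for the given stage against the record."""
    for rule in STAGE_RULES[stage]:
        record = rule(record)
    return record


order = {"country": " us ", "card": "4111111111111111"}
entered = apply_stage("order_entry", order)   # whitespace trimmed only
shared = apply_stage("partner_feed", order)   # trimmed, normalized and masked
```

Because the rules live in a table keyed by stage rather than inside one monolithic cleansing step, adding a new stage or a new partner feed means adding an entry, not rebuilding the process.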
Link Business Rules with the Data Itself
Today,
valuable business intelligence is missing because business
rules are either lost or not easily accessible to users.
Issues surrounding meta data repository silos are well documented
and standards bodies are working on various open solutions.
XML holds great promise in breaking down these silos. Without
this common linkage and access to knowledge, users remain
hamstrung in their efforts to put together comprehensive
data quality and business intelligence solutions.
The
next generation data quality framework will contain a meta
data repository that links data formats with business rules.
The repository should be XML-ready and open and accessible
to authorized users so the knowledge it contains can be
shared throughout the enterprise. Authorized trading and
supply chain partners should also be able to easily share
data through their virtual private networks or extranet
communications infrastructures.
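One way to picture a repository that links data formats with business rules is a small set of field definitions that can be written to XML and read back by another system. The sketch below is only an assumption about what such an entry might look like: the element names (`repository`, `field`, `format`, `rule`) and the `FieldDefinition` class are invented for illustration and do not follow any published meta data standard.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldDefinition:
    """A data format and the business rules that travel with it."""
    name: str
    pattern: str                                    # format as a regular expression
    rules: List[str] = field(default_factory=list)  # rules in plain language

    def matches(self, value: str) -> bool:
        """Check a value against the stored format."""
        return re.fullmatch(self.pattern, value) is not None


def to_xml(definitions: List[FieldDefinition]) -> str:
    """Serialize the definitions so another system can read them."""
    root = ET.Element("repository")
    for definition in definitions:
        node = ET.SubElement(root, "field", name=definition.name)
        ET.SubElement(node, "format").text = definition.pattern
        for rule in definition.rules:
            ET.SubElement(node, "rule").text = rule
    return ET.tostring(root, encoding="unicode")


def from_xml(document: str) -> List[FieldDefinition]:
    """Rebuild the definitions from their XML form."""
    root = ET.fromstring(document)
    return [
        FieldDefinition(
            name=node.get("name"),
            pattern=node.findtext("format"),
            rules=[rule.text for rule in node.findall("rule")],
        )
        for node in root.findall("field")
    ]


repository = [
    FieldDefinition("postal_code", r"\d{5}", ["Required for domestic shipments"]),
]
# A partner that receives the XML ends up with the same format and rules.
shared = from_xml(to_xml(repository))
```

The point of the round trip is that the format and its rules are exchanged together, so the receiving side does not have to rediscover how the field is supposed to behave.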
Transform Data – Along with Business Rules – for Data Quality Directly on Multiple Source Platforms Prior to Extraction
First- and second-generation data quality solutions primarily rely
on ETL tools to bring data from disparate source systems
to a single server for processing. That environment requires
sophisticated ETL functionality to pull data from multiple
operating environments, apply transformation rules and then
feed the data into the data quality application for additional
correction and transformation. The application or ETL tool
then loads the data into the target system.
The
next generation data quality framework will deal with the
data directly on the source system itself. In e-commerce
and supply chain applications, a dot-com or brick-and-mortar
company may have dozens, or even hundreds, of partners participating
in a process. Business rules and data formatting concerns
must be dealt with on the source system itself, or the production
environment will have to be over-engineered to avoid slow
response times.
Link and Integrate Data from Disparate Source Systems with No Loss of Data Quality
It
is difficult to bring data together from disparate systems
and apply universal data quality business rules without
bringing the data to a single platform. This often creates
situations where trading partners are asked – or even
forced – to adhere to rigid data standards such as
EDI. The expense incurred to adopt these standards is often
considerable and beyond the reach of many smaller trading
partners.
The
next generation data quality framework will bring data together
from any source platform, be it UNIX, NT or mainframe legacy
environments such as MVS or OS/390. Using a common messaging
infrastructure, the data needed to drive various e-commerce,
EAI, supply chain and CRM applications will be seamlessly
transported from its originating source system and integrated
into the appropriate target system. Trading partners of
any size can easily participate in trading communities since
communications and data quality between them are based on
open standards, and data is automatically transformed into
the appropriate format for each trading partner.
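The idea of automatically transforming data into the format each trading partner expects can be sketched as a lookup from partner to field mappings. The partner names, field names and formatting choices below are hypothetical; a real deployment would draw them from whatever the partners actually agree on.

```python
from typing import Callable, Dict

Record = Dict[str, str]

# Hypothetical partner profiles. Each maps a target field name to a function
# that derives its value from the enterprise's own (canonical) record.
PARTNER_FORMATS: Dict[str, Dict[str, Callable[[Record], str]]] = {
    "partner_a": {
        "SKU": lambda r: r["sku"].upper(),
        "QTY": lambda r: r["quantity"],
    },
    "partner_b": {
        "item": lambda r: r["sku"].lower(),
        "count": lambda r: r["quantity"].zfill(4),  # fixed-width, zero-padded
    },
}


def to_partner_format(partner: str, record: Record) -> Record:
    """Build the record a given partner expects from the canonical record."""
    mapping = PARTNER_FORMATS[partner]
    return {target: derive(record) for target, derive in mapping.items()}


canonical = {"sku": "Ab-100", "quantity": "12"}
for_a = to_partner_format("partner_a", canonical)
for_b = to_partner_format("partner_b", canonical)
```

With the mappings held centrally, a small partner keeps its own format and the sender adapts, rather than the partner being asked to adopt a rigid standard.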
Incorporate Trading or Supply Chain Partner Data Without Major Data Quality Reengineering on Either the Source or Target Systems
E-commerce,
EAI, CRM and supply chain applications require extensive
data reengineering to work effectively. The enterprise goes
through a great deal of time and expense to develop a viable
architecture. Typically these solutions are not flexible
and don’t adapt well to a changing environment.
A
next generation data quality framework will provide a data
discovery toolbox for users that will make it significantly
easier to understand, manage and control both their own
and trading partner data. With that knowledge, business
rules can be written that transform the data and make it
behave in the ways demanded by the application. These data
discovery toolboxes will analyze data patterns and use
fuzzy logic or artificial intelligence to prepare data prior
to its use in a data quality, e-commerce, EAI, supply chain
or CRM application.
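To make "analyzing data patterns" concrete, the sketch below uses a simple profiling technique: reduce each value to a shape (digits become 9, letters become A) and count how often each shape occurs. This is plain frequency counting, not fuzzy logic or artificial intelligence, and the function names and sample phone numbers are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List


def shape(value: str) -> str:
    """Reduce a value to its pattern: digits -> 9, letters -> A, rest kept."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def profile(values: Iterable[str]) -> Counter:
    """Count how many values share each pattern."""
    return Counter(shape(value) for value in values)


def outliers(values: Iterable[str]) -> List[str]:
    """Return the values whose pattern differs from the most common one."""
    values = list(values)
    if not values:
        return []
    dominant, _ = profile(values).most_common(1)[0]
    return [value for value in values if shape(value) != dominant]


# Sample partner data; the last value contains the letter O instead of a zero.
phones = ["407-555-0101", "407-555-0199", "4075550123", "407-555-01O7"]
patterns = profile(phones)
suspect = outliers(phones)
```

A profile like this shows which formats a partner's data actually uses before any business rules are written against it, and the outliers point to the values those rules will need to repair or reject.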
Plug in Valuable Internally Developed or Third-Party Data Quality, Data Transformation or Data Integration/Migration Applications to Run a Fully Integrated Process
Organizations
have had to develop workaround solutions to achieve their
data quality goals. These processes are difficult to maintain,
difficult to manage and take valuable IT time away from
more important projects.
The
next generation data quality framework will be easier to
use and require virtually no programming expertise. It will
be based on object-oriented technology allowing users to
drag and drop icons representing various data elements or
processes they wish to use in their application. The framework
will be open and permit users to integrate their own internally
generated processes or plug in third-party applications
for which they are already trained and fully invested.
The
Internet is changing the rules for everyone. Vendors are
scrambling to redefine and reposition their products for
the Internet age. In this dynamic environment, a renewed
look at data quality and its importance and relationship
to data integration is warranted. The need to move data
with high integrity between and among various trading partners
demands a fundamentally new approach – one that seamlessly
links data quality and data integration, costs less to deploy,
takes less time to implement, is easy to change on the fly and
costs less to maintain.
Paul
D. Keiser is chief marketing officer for Paladyne Corporation.
Based in Orlando, Florida, Paladyne has introduced its breakthrough
software, the Datagration e-Business Suite. Datagration
is a next generation seamless framework that addresses enterprise
needs for data quality and data integration between and
among disparate systems to enable the rapid implementation
of e-commerce, enterprise application integration, supply
chain management and customer relationship management applications.
Keiser can be reached at pkeiser@paladyne.com.