Data quality management, though perceived as relevant and important at the enterprise level, is in practice mostly implemented in silos. Data quality rules are defined and executed at the application level against the corresponding database schemas and file structures, and the results are presented and analysed only in the context of that specific application.
This siloed approach prevents organizations from getting an enterprise view of data quality for business-critical data elements. It also means they miss the opportunity to meet regulatory requirements for certifying data as it flows through various systems.
Described below is an approach that better integrates the data quality initiative with enterprise metadata management to enable an enterprise-wide view and control of data quality.
Metadata is commonly defined as data about data. Enterprise metadata management, however, provides a framework to acquire and integrate metadata about a variety of domains of information in an enterprise – including business, semantic, logical, physical, and organizational – to enable effective understanding and governance of business-critical data. Two vital aspects of this framework are traceability and lineage.
Traceability, in simple terms, is the linking of business metadata to technical metadata. Business metadata can be represented at enterprise, business unit or application level. It describes information about data that is important to the organization’s business. Compared with technical metadata, it is more descriptive, less structured and less detailed. The most important criterion for business metadata is that it should be understandable by business people and hence devoid of technical jargon and details. Typically, it comprises the business-friendly name, the definition, who owns it, who maintains it, and which business processes use it and how.
Technical metadata, on the other hand, describes data that is defined, stored and operated by the actual technical implementations of the various business processes. Technical metadata is a broad category that includes logical, design and physical levels of information. It describes data structures as well as their relationships. This could be at a logical or design level in terms of data models (entity relationship/object modeling) as well as actual implementation level, as in database catalog information and record structures.
Adaptive Metadata Manager, now in its Version 8.0 form, enables bridging and linking this technical metadata with the corresponding business metadata through various levels of traceability linking. A business term could be linked to the corresponding logical model entity or attribute, which in turn will be linked to the physical model table or column that finally is linked to the actual database table or column. Thereby, a business person can start at the business term and traverse this traceability link to find out all the physical implementations of that business term in various systems/applications.
Similarly, a technical person can start at the technical column definition and traverse this traceability link all the way to the business definition to understand the business impact and relevance of that data item.
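The two directions of traversal described above can be sketched as a walk over a small graph of traceability links. This is only an illustrative model in plain Python, not Adaptive Metadata Manager's API; the object names, the `term:`/`logical:`/`physical:`/`db:` prefixes and the function names are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical traceability links: (upper-level object, lower-level object).
# All names are illustrative and not taken from any real repository.
LINKS = [
    ("term:Customer Birth Date", "logical:Customer.birthDate"),
    ("logical:Customer.birthDate", "physical:CUSTOMER.BIRTH_DT"),
    ("physical:CUSTOMER.BIRTH_DT", "db:crm.CUSTOMER.BIRTH_DT"),
    ("physical:CUSTOMER.BIRTH_DT", "db:billing.CUST.DOB"),
]

down = defaultdict(list)  # upper-level object -> lower-level objects
up = defaultdict(list)    # lower-level object -> upper-level objects
for upper, lower in LINKS:
    down[upper].append(lower)
    up[lower].append(upper)


def traverse(start, edges):
    """Return every object reachable from start by following edges."""
    seen, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                stack.append(nxt)
    return order


def implementations(term):
    """Top-down: the database columns that implement a business term."""
    return sorted(n for n in traverse(term, down) if n.startswith("db:"))


def business_terms(column):
    """Bottom-up: the business terms a database column traces back to."""
    return sorted(n for n in traverse(column, up) if n.startswith("term:"))


print(implementations("term:Customer Birth Date"))
print(business_terms("db:billing.CUST.DOB"))
```

Keeping the links in both directions means the business user and the technical user share one set of relationships and simply walk it from opposite ends.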
Traceability can be viewed like a pyramid where an enterprise-scoped business term at the top can be linked to its various technical implementations in different IT applications. Establishing such traceability enables implementing many metadata solutions such as data governance, data quality management, data conformance, change impact analysis, etc.
Data Lineage describes the journey of data through various stages of processing. Although the most common representation of data lineage is at the technical implementation level, it also applies at the business level.
At the technical level, it allows users to understand how a particular field on a report was derived, whilst also providing information about the origin of that data in the transaction system where it may have been created by a POS (point of sale) system, an operator or customer using a front end application, as well as any further processing it underwent to finally appear on this report. That processing could be within a single application or across various applications.
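As a rough illustration of technical lineage, the sketch below records, for each field, the source fields it was derived from and the processing step involved, then walks upstream from a report field to its origin. The field names, the processing steps and the `LINEAGE` structure are hypothetical, and the walk assumes the lineage contains no cycles.

```python
# Hypothetical lineage: target field -> list of (source field, processing step).
LINEAGE = {
    "report:sales_summary.net_revenue": [
        ("dw:fact_sales.net_amount", "aggregate by month"),
    ],
    "dw:fact_sales.net_amount": [
        ("staging:sales.gross_amount", "subtract discount"),
        ("staging:sales.discount", "subtract discount"),
    ],
    "staging:sales.gross_amount": [("pos:txn.amount", "nightly load")],
    "staging:sales.discount": [("pos:txn.discount", "nightly load")],
}


def upstream(field, depth=0):
    """Yield (depth, source, step) for every upstream hop of a field."""
    for source, step in LINEAGE.get(field, []):
        yield depth + 1, source, step
        yield from upstream(source, depth + 1)


def origins(field):
    """Fields with no recorded upstream source, i.e. where the data entered."""
    sources = {source for _, source, _ in upstream(field)}
    return sorted(s for s in sources if s not in LINEAGE)


for depth, source, step in upstream("report:sales_summary.net_revenue"):
    print("  " * depth + f"{source}  (via: {step})")
print(origins("report:sales_summary.net_revenue"))
```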
Together, traceability and lineage paint the big picture of the information present throughout the organization.
Integration of the data quality initiative with this framework of traceability and lineage can enable organizations to progress from siloed implementations to true enterprise-wide data quality management.
Data quality management involves defining data quality rules for the critical data elements in the organization, executing these rules to measure the quality of data in different applications, and cleansing and correcting bad data.
Data quality rule definition is largely based on seven key dimensions:
Additionally, specific rules can be defined for data elements based on the business requirements or constraints around those elements. These rules can be defined at a business level by associating them with a business term at the appropriate level. Depending on that level, the rules can be generic and broadly defined (enterprise level) or more specific (BU or application level).
Adaptive, through its extensive metamodel coverage, enables these rules to be defined either descriptively or in a more structured way using DMN (Decision Model and Notation), and to be associated with business terms as well as with technical metadata objects such as data model attributes or database columns.
The traceability links can then be used to apply (directly and indirectly) and enforce these rules at the corresponding technical implementations of those business terms.
The technical teams can further refine the rules to make them more specific at the technical level where appropriate. This allows critical data elements to be governed and checked for conformance against data quality rules all the way from the top business level down to the technical implementations.
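The idea of defining rules at the business term and applying them at each technical implementation can be sketched as follows. This is a simplified model under stated assumptions: the term, the columns, the rule texts and the `effective_rules` function are all hypothetical, and the term-to-column mapping stands in for the result of traversing real traceability links.

```python
# Hypothetical rules attached to business terms.
TERM_RULES = {
    "Customer Birth Date": ["must not be null", "must not be in the future"],
}

# Hypothetical traceability result: business term -> implementing columns.
TERM_COLUMNS = {
    "Customer Birth Date": ["crm.CUSTOMER.BIRTH_DT", "billing.CUST.DOB"],
}

# Hypothetical refinements added by technical teams for specific columns.
COLUMN_RULES = {
    "billing.CUST.DOB": ["must match format YYYYMMDD"],
}


def effective_rules(column):
    """Return (rule, origin) pairs that apply to a column.

    Rules inherited from linked business terms come first, followed by
    rules defined directly on the column.
    """
    rules = []
    for term, columns in TERM_COLUMNS.items():
        if column in columns:
            for rule in TERM_RULES.get(term, []):
                rules.append((rule, f"inherited from term '{term}'"))
    for rule in COLUMN_RULES.get(column, []):
        rules.append((rule, "defined on column"))
    return rules


for rule, origin in effective_rules("billing.CUST.DOB"):
    print(f"{rule}  [{origin}]")
```

Recording the origin of each rule keeps it visible which constraints come from the business definition and which were added by an application team.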
The next level of integration is with the data quality product itself, by importing the results of data quality rule execution into Adaptive Metadata Manager.
Adaptive Metadata Manager’s extensive metamodel coverage, flexible standards-based import mechanism and configurable presentation layer can be used to capture the rule execution results and associate them with the technical metadata (database columns or file structure data elements) that the rules were executed against.
The assessment metamodel in Adaptive provides structures for representing rule results and associating them with the corresponding database and file structure metadata. It also enables storing and presenting historical data for easy trend analysis and comparisons.
Enriching the technical metadata with the actual rule execution results provides a powerful framework for assessing the overall data quality of critical data elements.
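To make the notion of stored rule results and trend analysis concrete, here is a minimal sketch of one possible record shape. It is not Adaptive's assessment metamodel; the `RuleResult` fields, the sample figures and the `trend` function are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RuleResult:
    """One execution of one rule against one column (hypothetical shape)."""
    column: str
    rule: str
    run_date: date
    rows_checked: int
    rows_failed: int

    @property
    def pass_rate(self):
        if self.rows_checked == 0:
            return 0.0
        return (self.rows_checked - self.rows_failed) / self.rows_checked


def history(results, column, rule):
    """All results for a column and rule, oldest first."""
    matching = [r for r in results if r.column == column and r.rule == rule]
    return sorted(matching, key=lambda r: r.run_date)


def trend(results, column, rule):
    """Change in pass rate between the two most recent runs, or None."""
    runs = history(results, column, rule)
    if len(runs) < 2:
        return None
    return runs[-1].pass_rate - runs[-2].pass_rate


# Illustrative sample data only.
RESULTS = [
    RuleResult("crm.CUSTOMER.BIRTH_DT", "not null", date(2024, 2, 1), 1000, 20),
    RuleResult("crm.CUSTOMER.BIRTH_DT", "not null", date(2024, 1, 1), 1000, 50),
]

change = trend(RESULTS, "crm.CUSTOMER.BIRTH_DT", "not null")
print(f"pass rate change since previous run: {change:+.1%}")
```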
With the traceability links established and the rule execution results associated at the technical level, any stakeholder can monitor the quality of a critical data element in every application that implements or references it. All that is required is to start at the corresponding business term and traverse the traceability links down to the technical implementations to find the most recent as well as the historical data quality measures.
This same navigation (Adaptive’s term for traversing metamodel relationships) can be used to generate a report or to present the data quality measures at the business term level. Adaptive also provides a dashboard-style homepage that can be easily configured to monitor data quality for highly critical data elements.
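Presenting measures at the business term level amounts to rolling up the column-level results reached through traceability. The sketch below shows one simple way to do that, reporting the weakest implementation of each term; the mapping, the pass rates and the choice of "worst column" as the summary are assumptions for illustration, not a description of how Adaptive computes its reports.

```python
# Hypothetical traceability result: business term -> implementing columns.
TERM_COLUMNS = {
    "Customer Birth Date": ["crm.CUSTOMER.BIRTH_DT", "billing.CUST.DOB"],
}

# Hypothetical most recent pass rate per column (fraction of rows passing).
LATEST_PASS_RATE = {
    "crm.CUSTOMER.BIRTH_DT": 0.98,
    "billing.CUST.DOB": 0.91,
}


def term_quality(term):
    """Summarise data quality for a business term across its columns."""
    columns = TERM_COLUMNS.get(term, [])
    rates = {c: LATEST_PASS_RATE[c] for c in columns if c in LATEST_PASS_RATE}
    if not rates:
        return None  # term unknown or none of its columns measured yet
    worst = min(rates, key=rates.get)
    return {
        "term": term,
        "columns_measured": len(rates),
        "worst_column": worst,
        "worst_pass_rate": rates[worst],
    }


print(term_quality("Customer Birth Date"))
```

Reporting the worst implementation rather than an average is a design choice: a critical data element is only as trustworthy as its weakest copy.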
Another powerful utilization of this integrated data quality measurement is to superimpose it on a typical data lineage report or diagram. This will allow the users to not only understand how the data flows through the various systems but also to find out the data quality measurements at all of those stages in the lineage path. This can be used for regulatory reporting to show the correctness of data appearing on the reports.
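Superimposing quality measures on lineage can be pictured as annotating each stage of a lineage path with its latest measurement. The following is a minimal sketch with hypothetical stage names, pass rates and threshold; a real lineage report would be a graph rather than a single path.

```python
# Hypothetical lineage path, from origin to report field.
LINEAGE_PATH = [
    "pos:txn.amount",
    "staging:sales.gross_amount",
    "dw:fact_sales.net_amount",
    "report:sales_summary.net_revenue",
]

# Hypothetical latest pass rate per stage; the report field is not measured.
PASS_RATE = {
    "pos:txn.amount": 0.999,
    "staging:sales.gross_amount": 0.97,
    "dw:fact_sales.net_amount": 0.995,
}


def annotate(path, rates, threshold=0.98):
    """Return (stage, pass_rate, status) for each stage in the path."""
    rows = []
    for stage in path:
        rate = rates.get(stage)
        if rate is None:
            status = "not measured"
        elif rate >= threshold:
            status = "ok"
        else:
            status = "below threshold"
        rows.append((stage, rate, status))
    return rows


for stage, rate, status in annotate(LINEAGE_PATH, PASS_RATE):
    shown = "n/a" if rate is None else f"{rate:.1%}"
    print(f"{stage:40} {shown:>7}  {status}")
```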
Adaptive Metadata Manager also enables data quality rules to be defined in a more structured way using the DMN metamodel. Structured modeling of the rules not only improves the documentation and understanding of the business and technical metadata, but also makes it possible to generate executable rule syntax. Because the rules are represented with standard operators and constructs, executable code can be generated from them, either in open standards such as SQL or SPARQL queries or in the proprietary formats defined by the corresponding data quality products.
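As a rough illustration of generating executable syntax from a structured rule, the sketch below turns a small rule record into a SQL query that counts violating rows. The `Rule` shape and the operator names are hypothetical and far simpler than DMN; table and column names are assumed to come from trusted metadata, since they are inserted into the SQL text unescaped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A structured rule (hypothetical shape, not the DMN metamodel)."""
    table: str
    column: str
    operator: str          # one of the keys of SQL_TEMPLATES
    operand: object = None


# Each template is the condition that a valid row satisfies.
SQL_TEMPLATES = {
    "not_null": "{col} IS NOT NULL",
    "min": "{col} >= {val}",
    "max": "{col} <= {val}",
    "in": "{col} IN ({val})",
}


def literal(value):
    """Render a Python value as a SQL literal or comma-separated list."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    if isinstance(value, (list, tuple)):
        return ", ".join(literal(v) for v in value)
    return str(value)


def to_sql(rule):
    """Build a query that counts rows violating the rule.

    Under SQL three-valued logic, NULL values are not counted by the
    comparison operators; pair them with a separate not_null rule.
    """
    condition = SQL_TEMPLATES[rule.operator].format(
        col=rule.column, val=literal(rule.operand)
    )
    return (
        f"SELECT COUNT(*) AS failed_rows FROM {rule.table} "
        f"WHERE NOT ({condition})"
    )


print(to_sql(Rule("CUSTOMER", "BIRTH_DT", "not_null")))
print(to_sql(Rule("CUSTOMER", "STATUS", "in", ["A", "I"])))
```

The same rule records could feed a different generator for SPARQL or for a data quality product's own format, which is the benefit of keeping the rule definition separate from its executable form.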
To summarize, Adaptive provides a powerful framework for effective integration of data quality with the metadata management solution enabling enterprise-wide data quality monitoring and control.