藏器于身，待时而动: Designing the Optimal Metadata Tool

Published: October 1, 2004

Published in TDAN.com October 2004

Many government agencies and corporations are currently examining the meta data tools in the marketplace to decide which of them, if any, meet the requirements of their meta data management solutions. Often these same organizations want to know what functionality and features they should be looking for in this tool category. Unfortunately, this question becomes very complicated, as each tool vendor has its own personalized “marketing spin” as to which functions and features are really the most advantageous. This leaves the consumer with a very difficult task indeed, especially when it seems like none of the vendors’ tools fully meets the requirements of your meta data management solution. At EWSolutions we have several clients that have these exact same concerns about the tools in the market.

Although I have no plans to start a software company, I would like to take this opportunity to play software designer and present the key functionality of my optimal meta data tool.

One of the challenges with this exercise is that meta data functionality has a great deal of depth and breadth. Therefore, in order to properly categorize our tool’s functionality, I will use the six major components of a managed meta data environment (MME):

Meta Data Sourcing & Meta Data Integration Layers
Meta Data Repository
Meta Data Management Layer
Meta Data Marts
Meta Data Delivery Layer

I will now walk through each of these MME components and describe the key functionality that my optimal meta data tool would contain.

Meta Data Sourcing & Meta Data Integration Layers

For simplicity’s sake I will discuss this “dream” tool’s functionality for the meta data sourcing and meta data integration layers together. The goal of these layers is to extract the meta data from its source, integrate it where necessary, and bring it into the meta data repository.

Platform Flexibility

It is important for the meta data sourcing technology to be able to work against mainframe applications, distributed systems and sources on a network (databases, files, spreadsheets, etc.). These functions would have to run in each of these environments so that the meta data could be brought into the repository. I did not include AS/400 environments in my list of platforms because of their fairly sparse use; however, if your information technology (IT) shop’s preferred application platform is AS/400, clearly your optimal meta data tool would work on that platform.

Prebuilt Bridges

Many of the current meta data integration tools come with a series of prebuilt meta data integration bridges. The optimal meta data tool would also have these prebuilt bridges. Where our optimal tool would differ from the vendor tools is that it would have bridges to all of the major relational database management systems (e.g. Oracle, DB2, SQL Server, Informix, Sybase and Teradata), the most common vendor packages (e.g. Siebel, SAP, PeopleSoft, Oracle, etc.), several code parsers (COBOL, JCL, C++, SQL, XML, etc.), key data modeling tools (ERwin, Designer, Rational Rose, etc.), top ETL (extraction, transformation and load) tools (e.g. Informatica, Ascential) and the major front-end tools (e.g. Business Objects, Cognos, Hyperion, etc.).

As much as possible, I would want my meta data tool to use XML (extensible markup language) as the transport mechanism for the meta data. While XML cannot directly interface with all meta data sources, it would cover a great number of them.
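As a rough illustration of XML as a transport format, the sketch below serializes one column's meta data to XML and reads it back using Python's standard library. The element and attribute names (`column`, `dataType`, `description`) are assumptions made for this example; they do not come from any vendor format or industry standard.

```python
import xml.etree.ElementTree as ET


def column_to_xml(table, column, data_type, description):
    """Serialize one column's meta data as XML (hypothetical layout)."""
    elem = ET.Element("column", attrib={"table": table, "name": column})
    ET.SubElement(elem, "dataType").text = data_type
    ET.SubElement(elem, "description").text = description
    return ET.tostring(elem, encoding="unicode")


def xml_to_column(payload):
    """Parse the XML produced by column_to_xml back into a dictionary."""
    elem = ET.fromstring(payload)
    return {
        "table": elem.get("table"),
        "column": elem.get("name"),
        "data_type": elem.findtext("dataType"),
        "description": elem.findtext("description"),
    }


# Round trip: source -> XML payload -> repository-side record
payload = column_to_xml("CUSTOMER", "CUST_ID", "INTEGER", "Unique customer identifier")
record = xml_to_column(payload)
```

Because the same payload can be written and read by either side, a format like this could serve a bridge in both directions.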

These meta data bridges would not just bring meta data from its source and load it into the repository. These bridges would be bi-directional and allow meta data to be extracted from the meta data repository and brought back into the tool.

Lastly, these meta data bridges wouldn’t just be extraction processes; they would also have the ability to act as “pointers” to where the meta data is located. This distributed meta data capability is very important for a repository to have.

Error Checking & Restart

Any high-quality meta data tool would have extensive error checking built into the sourcing and integration layers. Meta data in an MME, like data in a data warehouse, must be of high quality or it will have little value. This error checking facility would check the meta data it is reading for errors and capture statistics on the errors that the process is experiencing (meta meta data). In addition, the tool would support error levels for the meta data; that is, it would give the tool administrator the ability to configure the action taken based on the error that occurred in the process. For example, should the meta data be 1) flagged with an informational/error message; 2) flagged as an error and then not loaded into the repository; or 3) flagged as a critical error, with the entire meta data integration process stopped?
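The three error levels described above could be sketched as an administrator-configurable policy that maps each error to an action. The error codes, the validation rules in `check`, and the sample records are all hypothetical, chosen only to exercise each level.

```python
from collections import Counter
from enum import Enum


class Action(Enum):
    FLAG = 1    # load the record, but attach an informational/error message
    REJECT = 2  # flag as an error and do not load the record
    ABORT = 3   # critical error: stop the entire integration run


# Hypothetical administrator configuration: error code -> action
ERROR_POLICY = {
    "MISSING_DESCRIPTION": Action.FLAG,
    "UNKNOWN_DATA_TYPE": Action.REJECT,
    "SOURCE_UNREADABLE": Action.ABORT,
}


def check(record):
    """Return an error code for a record, or None if it passes (illustrative rules)."""
    if record.get("unreadable"):
        return "SOURCE_UNREADABLE"
    if record.get("data_type") not in {"INTEGER", "VARCHAR", "DATE"}:
        return "UNKNOWN_DATA_TYPE"
    if not record.get("description"):
        return "MISSING_DESCRIPTION"
    return None


def integrate(records, policy=ERROR_POLICY):
    """Apply the policy to each record; return loaded records and error statistics."""
    loaded, stats = [], Counter()
    for record in records:
        error = check(record)
        if error is None:
            loaded.append(record)
            continue
        stats[error] += 1  # the "meta meta data" about the run
        action = policy[error]
        if action is Action.ABORT:
            break
        if action is Action.FLAG:
            loaded.append({**record, "message": error})
        # Action.REJECT: the error is counted but the record is not loaded
    return loaded, stats


records = [
    {"name": "CUST_ID", "data_type": "INTEGER", "description": "Customer key"},
    {"name": "CUST_NM", "data_type": "VARCHAR", "description": ""},
    {"name": "CUST_XY", "data_type": "BLOB", "description": "Unknown"},
    {"name": "BROKEN", "unreadable": True},
    {"name": "NEVER_REACHED", "data_type": "DATE", "description": "After abort"},
]
loaded, stats = integrate(records)
```

Keeping the policy in a separate table, rather than in the checking logic, is what lets the administrator change how a given error is handled without touching the integration process itself.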

Also, this process would have “check points” that would allow the tool administrator to restart the process. These check points would be placed in the proper locations to ensure that the process could be restarted with the least degree of impact on the meta data itself and on its sourcing locations.
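One simple way to picture check points is a progress file written after each source loads successfully, so that a restarted run skips what is already done. This is only a sketch under that assumption; the source names, the `load` callback, and the JSON checkpoint file are invented for the example.

```python
import json
import os
import tempfile


def run_with_checkpoints(sources, load, checkpoint_path):
    """Load each source in order, recording progress after every success."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)
    for source in sources:
        if source in done:
            continue  # already loaded in an earlier run
        load(source)  # may raise; the checkpoint still names the last good source
        done.append(source)
        with open(checkpoint_path, "w") as fh:
            json.dump(done, fh)
    return done


# Demonstration: the second source fails once, then succeeds on restart.
loaded = []
fail_once = {"etl_tool"}


def load(source):
    if source in fail_once:
        fail_once.discard(source)
        raise RuntimeError(f"lost connection to {source}")
    loaded.append(source)


checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
sources = ["data_model", "etl_tool", "report_tool"]
try:
    run_with_checkpoints(sources, load, checkpoint)
except RuntimeError:
    pass  # the first run stops at the failing source
run_with_checkpoints(sources, load, checkpoint)  # the restart skips "data_model"
```

The restart touches the first source only once, which is the point of placing check points: less rework, and less load on the sourcing locations.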

Meta Data Repository

The meta data repository component is the physical database that persistently catalogs and stores the actual meta data. The repository and its corresponding meta model comprise the backbone of the MME. Therefore, in listing the optimal meta data tool’s functionality, I will pay special attention to the design and implementation of the meta model.

Meta Model Design

A meta model is a physical database schema for meta data. Any time an MME is being implemented, there are integration processes that need to be custom built in order to bring meta data into the repository. Therefore, a good meta model needs to be understandable to the repository developers working with it. As a result, the meta model should not be designed in a highly abstracted, object-oriented manner. Instead, mixing classic relational modeling with structured object-oriented design is the preferable approach to designing a meta model. When highly cryptic (abstracted) object-oriented design is used for the construction of the meta model, it becomes unwieldy and difficult for the IT developers to work with.

The possible exception to this guideline would be if the abstracted object-oriented model has relational views built on the model that would allow for read/write/update capabilities. These views must be understandable and fully extendible.

Meta Model Implementation

The meta data repository must not be housed in a proprietary database management system. Instead it should be stored on any of the major open relational database platforms (e.g. SQL Server, Oracle, DB2, Informix, Teradata, Sybase) so that standard SQL can be used with the repository.
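To make the point concrete, here is a minimal sketch of a readable, relational meta model queried with standard SQL. SQLite (via Python's standard library) is used purely so the example is self-contained; it stands in for the platforms named above. The table and column names (`meta_table`, `meta_column`, and so on) are assumptions, not part of any published meta model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two plainly named tables: easy for a repository developer to read and extend.
conn.executescript("""
    CREATE TABLE meta_table (
        table_id      INTEGER PRIMARY KEY,
        table_name    TEXT NOT NULL,
        source_system TEXT NOT NULL
    );
    CREATE TABLE meta_column (
        column_id           INTEGER PRIMARY KEY,
        table_id            INTEGER NOT NULL REFERENCES meta_table(table_id),
        column_name         TEXT NOT NULL,
        data_type           TEXT NOT NULL,
        business_definition TEXT
    );
""")

conn.execute("INSERT INTO meta_table VALUES (1, 'CUSTOMER', 'CRM')")
conn.executemany(
    "INSERT INTO meta_column VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "CUST_ID", "INTEGER", "Unique customer identifier"),
        (2, 1, "CUST_NM", "VARCHAR(80)", "Customer legal name"),
    ],
)

# A standard SQL join answers "which columns come from the CRM system?"
rows = conn.execute("""
    SELECT t.table_name, c.column_name, c.data_type
    FROM meta_column c
    JOIN meta_table t ON t.table_id = c.table_id
    WHERE t.source_system = 'CRM'
    ORDER BY c.column_name
""").fetchall()
```

Nothing in the query depends on a proprietary interface, which is the practical benefit of keeping the repository on an open relational platform.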

Semantic Taxonomy

Many government agencies and large corporations’ IT departments are looking to define an enterprise-level classification/definition scheme for their data. This semantic taxonomy would then provide these organizations with the ability to classify their data in order to identify data and process redundancies in their IT environment. Therefore, the optimal meta data tool would provide the capabilities to capture, maintain and publish a semantic taxonomy for the meta data in the repository.

Meta Data Management Layer

The purpose of the meta data management layer is to provide the systematic management of the meta data repository and the other MME components. This layer includes many functions, including (see Figure 1: Meta Data Management Layer):

Archiving - of the meta data within the repository
Backup - of the meta data on a scheduled basis
Database Modifications - allow for the extending of the repository
Database Tuning - the classic tuning of the database for the meta model
Environment Management - the processes that allow the repository administrator to manage and migrate between the different versions/installs of the meta data repository
Job Scheduling - would manage both the event-based and trigger-based meta data integration processes
Purging - would handle the definition of the criteria for the MME purging requirements
Recovery - this process would be tightly tied into the backup and archiving facilities of the repository
Security Processes - would provide the functionality to define security restrictions from an individual and group perspective
Versioning - meta data is historical, so this tool would need to version the meta data by date/time of entry into the MME

Figure 1: Meta Data Management Layer
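Of the functions listed above, versioning by date/time of entry is perhaps the easiest to sketch. The class below keeps every version of an item and answers "what was current at this moment?" The class name, the key format, and the sample values are hypothetical.

```python
from datetime import datetime


class VersionedStore:
    """Keeps every version of a meta data item, stamped with its entry time."""

    def __init__(self):
        # item key -> list of (entered_at, value) pairs, oldest first
        self._history = {}

    def put(self, key, value, entered_at):
        versions = self._history.setdefault(key, [])
        versions.append((entered_at, value))
        versions.sort(key=lambda pair: pair[0])

    def as_of(self, key, moment):
        """Return the value that was current at `moment`, or None if none existed."""
        current = None
        for entered_at, value in self._history.get(key, []):
            if entered_at <= moment:
                current = value
        return current


store = VersionedStore()
store.put("CUSTOMER.CUST_ID", "INTEGER", datetime(2004, 1, 5))
store.put("CUSTOMER.CUST_ID", "BIGINT", datetime(2004, 9, 1))

mid_year = store.as_of("CUSTOMER.CUST_ID", datetime(2004, 6, 1))
latest = store.as_of("CUSTOMER.CUST_ID", datetime(2004, 10, 1))
```

Nothing is ever overwritten here, so the same history could also feed the archiving and purging functions.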

The optimal meta data tool would also have very good documentation on all of its components, processes and functions. Interestingly enough, too many of the current meta data vendors neglect to provide good documentation with their tools. If a company wants to be taken seriously in the meta data arena, it must "eat its own dog food".

Meta Data Delivery Layer

The meta data delivery layer is responsible for the delivery of the meta data from the repository to the end users and to any applications or tools that require meta data feeds to them.

Web Enabled

A Java-based, web-enabled, thin-client front-end has become a standard in the industry for presenting information to the end user, and it certainly is the best approach for an MME. This architecture provides the greatest degree of flexibility and a lower TCO (total cost of ownership) for implementation, and the web browser paradigm is widely understood by most end users within an organization.

This web enabled front-end would be fully and completely configurable. For example, I may want options that my users could select or I may want to put my company's logo in the upper right hand corner of the end user screen.

Pre-Built Reports

Impact analysis reports are technical meta data driven reports that help an IT department assess the impact of a potential change to their IT applications (see Figure 2: "Impact Analysis: Column Analysis for a Bank" for an example). Impact analysis can come in an almost infinite number of variations; certainly the optimal meta data tool would provide dozens of these types of reports pre-built and completely configurable. Also, the tool would be able to "push" these pre-built reports and any custom-built reports to specific users' or groups of users' desktops, or even to their email addresses. These pushed reports could be configured to be released based on an event trigger or on a scheduled basis.

Figure 2: Impact Analysis: Column Analysis for a Bank
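At its core, an impact analysis report is a walk over lineage meta data: start from the item that might change and follow every "feeds" relationship downstream. The sketch below does this with a breadth-first traversal. The lineage entries (a CRM column, an ETL job, a warehouse column, two reports) are invented for illustration and are not taken from the figure.

```python
from collections import deque

# Hypothetical lineage: each item maps to the items that read from it.
FEEDS = {
    "CRM.CUSTOMER.CUST_ID": ["ETL.load_customer"],
    "CRM.CUSTOMER.CUST_NM": ["ETL.load_customer"],
    "ETL.load_customer": ["DW.DIM_CUSTOMER.CUSTOMER_KEY"],
    "DW.DIM_CUSTOMER.CUSTOMER_KEY": [
        "RPT.customer_profitability",
        "RPT.branch_summary",
    ],
}


def impacted_by(item, feeds=FEEDS):
    """Return everything downstream of `item`, in breadth-first order."""
    seen, order, queue = {item}, [], deque([item])
    while queue:
        for dependant in feeds.get(queue.popleft(), []):
            if dependant not in seen:  # guard against cycles and repeats
                seen.add(dependant)
                order.append(dependant)
                queue.append(dependant)
    return order


impact = impacted_by("CRM.CUSTOMER.CUST_ID")
```

A pre-built report would layer formatting and filtering on top of a traversal like this; the many variations of impact analysis mostly differ in which relationships are followed and in which direction.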

Website Meta Data Entry

Most enterprise meta data repositories provide their business users a web-based front-end so that the data stewards can enter meta data directly into the repository. This front-end capability would be fully integrated into the MME and would be able to write back to the meta data repository. Not only would this entry point allow meta data to be written to the repository, it would also allow for relationship constraints and drop-down boxes to be fully integrated into the end user front-end. Moreover, many of these business meta data related entry/update screens would be pre-built and fully configurable to allow the repository administrator to modify them as required. The ability to use the web front-end to write back to the repository is a feature that is lacking in many of today's meta data tools.

Publish Graphics

The optimal meta data tool would also have the ability to publish graphics to its web front-end. The users would then be able to click on the meta data attributes within these graphics for meta data drill-down, drill-up, drill-through and drill-across. For example, a physical data model could be published to the website. As an IT developer looks at this data model they would have the ability to click on any of the columns within the physical model to look at the meta data associated with it. This is another weakness in many of the major meta data tools on the market.

Meta Data Marts

A meta data mart is a database structure, usually sourced from a meta data repository, that is designed for a homogeneous meta data user group (see Figure 3: "Meta Data Marts"). "Homogeneous meta data user group" is a fancy term for a group of users with like needs.

Figure 3: Meta Data Marts

This tool would come with pre-built meta data marts for a few of the more complex and resource-intensive impact analyses. In addition, we would have meta data marts for each of the significant industry standards, like Common Warehouse Metamodel (CWM), Dublin Core and ISO 11179.

Thursday, October 13, 2011