Building the Textual Data Warehouse

by Bill Inmon

Description

For years, corporate decisions have been made on the basis of data found in transaction-based systems. Transaction-oriented data fits well with standard database management systems because those systems structure data in a repetitive manner: every occurrence of data in a table has the same structure as every other occurrence. But there is another viable and important source of data in the corporation, and that is the information found in the form of text. Text takes many forms in the corporation: emails, spreadsheets, contracts, warranties, medical and healthcare information, and so forth. Because text is not repetitive, it does not fit easily into standard database management systems. Now, however, there is textual ETL, and with it the ability to build databases and data warehouses that contain textual information. When textual data can be transformed so that it fits inside a standard database management system, whole new opportunities for analysis and decision making open up.
This two-day lecture and workshop covers what is required to create a textual, unstructured data warehouse. The first day is a lecture and the second day is a hands-on workshop.

Main Topics

An Introduction To Unstructured Data
Issues of Textual Integration
Forms of Text
Spreadsheets
Diverse Indexes

On day 2, Textual ETL will be run to produce a wide variety of databases and data warehouses, using many of its features. Attendees will observe and participate in the transformation of text into a database ready for analytic processing.
The workshop begins by examining a body of textual data and discussing a strategy for capturing and organizing that text. It then continues with several types of processing, performed dynamically under the direction of the attendees. These include:

document metadata capture
document fracturing
named value indexing
simple indexing
semistructured indexing
merged indexing

Depending on the textual data that has been selected, some or all of these kinds of indexes will be chosen and created.