Smile Iberia, O... / Libros blancos / Business Intell... / ETL principles
We use a range of ETL “Extract, Transform, Load” tools to feed the datawarehouse. As the name indicates, these tools allow to extract data from different sources, to transform this data (format, name), and to load this data on the target database, here the datawarehouse.
The transformations entrusted to an ETL are often basic, but can in some cases include the procedural processes of specific programs.
With an ETL we avoid creating repetitive batch programs, often similar, the maintenance of which must also be ensured. The principle is that the integration of a new data flow does not require any development, and is operated by simple interactive configuration: we choose the data elements in the source repository, we specify the basic transformations to be carried out, and we indicate the data‟s destination in the datawarehouse.
The ETL can manage different types of data sources, both at entry and exit point, the main ones being RDBMS and XML flows, but fixed format files or with comma-separated value (CSV) files are also managed.
Once an ETL flow has been defined, it is generally triggered on a regular basis; this is controlled by a task planning or scheduling tool.
An ETL generally treats point-to-point flows, i.e. from a single source to a single destination.
ETLs are designed to function in deferred-processing mode, most often at night. A datawarehouse saves a succession of “photos” of the activity of the company; the ETL allows to feed the datawarehouse with an additional “photo”. Once the feed is complete, the loaded data becomes statistics which are available to users.