Data Warehouse – Effective Practices



Effective Practices

Effective practices are enablers, which can improve performance, data availability, environment stability, resource consumption, and data accuracy.

Use of an Enterprise Scheduler

The scheduling service in InfoSphere information Server (IIS) leverages the operating system (OS) scheduler, the common enterprise scheduler can provide these capabilities beyond those of a common OS scheduler:

  • Centralized control, monitoring, and maintenance of job stream processes
  • Improved insight into and control of cycle processes
  • Improved intervention capabilities, including alerts, job stream suspension, auto-restarts, and upstream/downstream dependency monitoring
  • Reduced time-to-recovery and increased flexibility in recovery options
  • Improved ability to monitor and alert for mission critical process that may be delayed or failing
  • Improved ability to automate disparate process requirements within and across systems
  • Improved load balancing to optimize use of resources or to compensate for loss of a given resource
  • Improved scalability and adaptability to infrastructure or application environment changes

Use of data Source Timestamps

When they exist or can be added to data, ‘created’ and ‘last updated’ timestamps can greatly reduce the impact of Change Data Capture (CDC) operations.  Especially, if the data warehouse, data model and load process store that last success run time of CDC jobs. This reduces the number of rows required to be processed and reduces the load on the RDBMS and/or ETL application server.  Leveraging ‘created’ and ‘last updated’ can, also, greatly reduce processing time required to perform the same CDC processes.

Event Based Scheduling

Event based scheduling, when coupled with an Enterprise scheduler, can increase data availability, distribute work opportunistically. Event based scheduling can allow all or part of a process stream to begin as soon as predecessor data sources have completed the requisite processes.  This can allow processes to begin soon as possible, which can reduce resource bottlenecks and contention. This, potentially, allows data to be made available earlier than a static time based schedule.  Event based scheduling can also delay processing, should the source system requisite processing completion be delayed; thereby, improving data accuracy in the receiving system.

Integrated RDBMS Maintenance

Integrating RDBMS Maintenance into the process job stream can perform on demand optimization as the processes move through their flow, improving performance.  Items such as indexing, distribution, and grooming, maintenance at key points ensures that the data structures are optimized for follow on processes to consume.

Application Server and Storage  Space Monitoring and Maintenance

Monitoring and actively clearing disk space can not only improves overall performance, and reduce costs, but it also improves application stability.

Data Retention Strategies

Data Retention strategies, an often overlooked form of data maintenance, which deals with establishing policies ensure only truly necessary data is kept and that information by essential category, which is no longer necessary is purged to limit legal liability, limit data growth, storage costs, and improve RDBMS performance.

Use Standard Practices

Use of standard practices both, application and industry, allows experienced resources to more readily understand the major application activities, their relationships, dependency, design and code.  This facilitates resourcing and support over the life cycle of the application.