Data Modeling – Fact Table Effective Practices

Database Table
Database Table

Here are a few guidelines for modeling and designing fact tables.

Fact Table Effective Practices

  • The table naming convention should identify it as a fact table. For example:
    • Suffix Pattern:
      • <<TableName>>_Fact
      • <<TableName>>_F
    • Prefix Pattern:
      • FACT_<TableName>>
      • F_<TableName>>
    • Must contain a temporal dimension surrogate key (e.g. date dimension)
    • Measures should be nullable – this has an impact on aggregate functions (SUM, COUNT, MIN, MAX, and AVG, etc.)
    • Dimension Surrogate keys (srky) should have a foreign key (FK) constraint
    • Do not place the dimension processing in the fact jobs

Related References

Data Modeling – Dimension Table Effective Practices

Database Table
Database Table

I’ve had these notes laying around for a while, so, I thought I consolidate them here.   So, here are few guidelines to ensure the quality of your dimension table structures.

Dimension Table Effective Practices

  • The table naming convention should identify it as a dimension table. For example:
    • Suffix Pattern:
      • <<TableName>>_Dim
      • <<TableName>>_D
    • Prefix Pattern:
      • Dim_<TableName>>
      • D_<TableName>>
  • Have Primary Key (PK) assigned on table surrogate Key
  • Audit fields – Type 1 dimensions should:
    • Have a Created Date timestamp – When the record was initially created
    • have a Last Update Timestamp – When was the record last updated
  • Job Flow: Do not place the dimension processing in the fact jobs.
  • Every Dimension should have a Zero (0), Unknown, row
  • Fields should be ‘NOT NULL’ replacing nulls with a zero (0) numeric and integer type fields or space ( ‘ ‘ ) for Character type files.
  • Keep dimension processing outside of the fact jobs

Related References

Software Development Life Cycle – What is RAD?

Acronyms, Abbreviations, Terms, And Definitions
Acronyms, Abbreviations, Terms, And Definitions

 

What is RAD?

Rapid Application Development (RAD) is a type of incremental software development methodology, which emphasizes rapid prototyping and iterative delivery, rather than planning. In RAD model the components or major functions are developed in parallel as if they were small relatively independent projects, until integration.

RAD projects are iterative and incremental

RAD projects follow the SDLC iterative and incremental model:

  • During which more than one iteration of the software development cycle may be in progress at the same time
  • In RAD model the functional application modules are developed in parallel, as prototypes, and are integrated to complete the product for faster product delivery.
  • RAD teams are small and comprised of developers, domain experts, customer representatives and other information technology resources working progressively on their component and/or prototype.

InfoSphere DataStage – DataStage Parallel Job Peer Code Review Checklist Template

SDLC Development Phase
SDLC Development Phase

Peer code review happens during the development phase and focuses on the overall quality and compliance to standards of code and configuration artifacts. However, the hard part of performing a Peer code review isn’t, performing the review, but rather to achieving consistency and thoroughness in the review.   This is where a checklist can contribute significantly, providing a list of things to check and providing a relative weight for the findings.  I hope this template assists with your DataStage job review process.

What is a Peer Review?

Acronyms, Abbreviations, Terms, And Definitions
Acronyms, Abbreviations, Terms, And Definitions

A peer review is an examination of a Software Development Life Cycle (SDLC) work product by team members, other than the work Product’s author, to identify defects, omissions, and compliance to standards.  This process provides an opportunity for quality assurance, knowledge sharing, and product improvement early during the SDLC life cycle.

What is a software development baseline?

Acronyms, Abbreviations, Terms, And Definitions
Acronyms, Abbreviations, Terms, And Definitions

What, exactly, the definition of a baseline it depends on your role and perspective on the SDLC (Software Development Life Cycle) process.   The baseline concept plays in many aspects of SDLC execution, including project management, configuration management, and others.  Additionally, the baseline concept and practice is applicable to all the SLDC methodologies, including, but not limited to the Agile Model, Waterfall Model, Iterative Model, Spiral Model, and V-Model.

Baseline Definition

A baseline is a reference point in the software development life cycle marked by the completion and formal approval of a set of predefined work products for phase completion. The objective of a baseline is to reduce a project’s vulnerability to uncontrolled change and to provide a point in time set of artifacts for references and recovery, if necessary.    Baselining an artifact (requirements specification matrix, design, code, data model, etc.) moves it into formal change control (usually, in one or more repository tools) at milestone achievement points in the development life cycle. Baselines are also used to identify the essential software, hardware, and configuration assembly components that make up a specific release of a system.

Common Information Integration Testing Phases

Over the years I have seen a lot of patterns for Information integration testing process and these patterns will not be an exhaustive list of patterns a consultant will encounter over the course of a career.

However, the two most common patterns in the testing process are:

The Three Test Phase Pattern

In the three test phase pattern, normally, the environment and testing activities of SIT and SWIT are combined.

The Three Test Phase Pattern
The Three Test Phase Pattern

The Four Test Phase Pattern

In the four test phase pattern, normally, the environment and testing activities of SIT and SWIT are performed separately and, frequently, will have separate environments in the migration path.

The Four Test Phase Pattern
The Four Test Phase Pattern

Testing Phases

Unit Testing:

Testing of individual software components or modules. Typically done by the programmer and not by testers, as it requires detailed knowledge of the internal program design and code. may require developing test driver modules or test harnesses.

 System Integration Testing (SIT):

Integration testing – Testing of integrated modules to verify combined functionality after integration. Modules are typically code modules, individual applications, client and server applications on a network, etc. This type of testing is especially relevant to client/server and distributed systems. Testing performed to expose defects in the interfaces and in the interactions between integrated components or systems. See also component integration testing, system integration testing.

 Software Integration Test (SWIT)

Similar to system testing, involves testing of a complete application environment, including scheduling, in a situation that mimics real-world use, such as interacting with a database, using network communications, or interacting with other hardware, applications, or systems if appropriate.

 User Acceptance Testing (UAT):

Normally, this type of testing is done to verify if the system meets the customer specified requirements. Users or customers do this testing to determine whether to accept the application.  Formal testing with respect to user needs, requirements, and business processes conducted to determine whether or not a system satisfies the acceptance criteria and to enable the user, customers or other authorized entity to determine whether or not to accept the system.

Related References