Data Modeling – Fact Table Effective Practices

Here are a few guidelines for modeling and designing fact tables; a sketch DDL example follows the list.

Fact Table Effective Practices

  • The table naming convention should identify it as a fact table. For example:
    • Suffix Pattern:
      • <<TableName>>_Fact
      • <<TableName>>_F
    • Prefix Pattern:
      • FACT_<<TableName>>
      • F_<<TableName>>
  • Must contain a temporal dimension surrogate key (e.g., a date dimension)
  • Measures should be nullable – this has an impact on aggregate functions (e.g., SUM, COUNT, MIN, MAX, and AVG)
  • Dimension surrogate keys (SRKY) should have a foreign key (FK) constraint
  • Do not place the dimension processing in the fact jobs
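
As an illustration of these practices, here is a minimal sketch of what a fact table might look like as DDL, assuming an ANSI-style SQL dialect; the SALES_FACT, DATE_DIM, and PRODUCT_DIM names are hypothetical and used only for illustration.

    -- Hypothetical fact table following the practices above (names are illustrative only)
    CREATE TABLE SALES_FACT (
        DATE_DIM_SRKY      INTEGER        NOT NULL,  -- temporal dimension surrogate key
        PRODUCT_DIM_SRKY   INTEGER        NOT NULL,  -- dimension surrogate key
        SALES_AMOUNT       DECIMAL(18,2)  NULL,      -- nullable measure
        SALES_QUANTITY     INTEGER        NULL,      -- nullable measure
        CONSTRAINT SALES_FACT_DATE_FK FOREIGN KEY (DATE_DIM_SRKY) REFERENCES DATE_DIM (DATE_DIM_SRKY),
        CONSTRAINT SALES_FACT_PRODUCT_FK FOREIGN KEY (PRODUCT_DIM_SRKY) REFERENCES PRODUCT_DIM (PRODUCT_DIM_SRKY)
    );

Note how nullable measures interact with aggregate functions: SUM, MIN, MAX, and AVG ignore NULL values, and COUNT(column_name) excludes them, which can produce different results than COUNT(*).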

Data Modeling – Dimension Table Effective Practices

I’ve had these notes lying around for a while, so I thought I would consolidate them here. Here are a few guidelines to ensure the quality of your dimension table structures; a sketch DDL example follows the list.

Dimension Table Effective Practices

  • The table naming convention should identify it as a dimension table. For example:
    • Suffix Pattern:
      • <<TableName>>_Dim
      • <<TableName>>_D
    • Prefix Pattern:
      • Dim_<<TableName>>
      • D_<<TableName>>
  • Have a primary key (PK) assigned on the table surrogate key
  • Audit fields – Type 1 dimensions should:
    • Have a created date timestamp – when the record was initially created
    • Have a last update timestamp – when the record was last updated
  • Job flow: Keep dimension processing outside of the fact jobs; do not place the dimension processing in the fact jobs
  • Every dimension should have a zero (0) ‘Unknown’ row
  • Fields should be ‘NOT NULL’, replacing nulls with a zero (0) for numeric and integer type fields or a space (‘ ‘) for character type fields
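
As an illustration of these practices, here is a minimal sketch of what a Type 1 dimension table and its zero (0) ‘Unknown’ row might look like as DDL, assuming an ANSI-style SQL dialect; the PRODUCT_DIM name and its columns are hypothetical and used only for illustration.

    -- Hypothetical Type 1 dimension following the practices above (names are illustrative only)
    CREATE TABLE PRODUCT_DIM (
        PRODUCT_DIM_SRKY   INTEGER        NOT NULL,             -- surrogate key (PK)
        PRODUCT_CODE       CHAR(10)       NOT NULL DEFAULT ' ',  -- character fields default to space, not NULL
        PRODUCT_NAME       VARCHAR(100)   NOT NULL DEFAULT ' ',
        PRODUCT_PRICE      DECIMAL(18,2)  NOT NULL DEFAULT 0,    -- numeric fields default to zero, not NULL
        CREATED_DATE       TIMESTAMP      NOT NULL,              -- audit field: when the record was initially created
        LAST_UPDATE_DATE   TIMESTAMP      NOT NULL,              -- audit field: when the record was last updated
        CONSTRAINT PRODUCT_DIM_PK PRIMARY KEY (PRODUCT_DIM_SRKY)
    );

    -- Zero (0) 'Unknown' row
    INSERT INTO PRODUCT_DIM
        (PRODUCT_DIM_SRKY, PRODUCT_CODE, PRODUCT_NAME, PRODUCT_PRICE, CREATED_DATE, LAST_UPDATE_DATE)
    VALUES
        (0, ' ', 'Unknown', 0, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP);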

Software Development Life Cycle – What is RAD?

What is RAD?

Rapid Application Development (RAD) is a type of incremental software development methodology that emphasizes rapid prototyping and iterative delivery rather than up-front planning. In the RAD model, the components or major functions are developed in parallel, as if they were small, relatively independent projects, until integration.

RAD projects are iterative and incremental

RAD projects follow the SDLC iterative and incremental model:

  • More than one iteration of the software development cycle may be in progress at the same time.
  • In the RAD model, the functional application modules are developed in parallel, as prototypes, and are integrated to complete the product for faster product delivery.
  • RAD teams are small and composed of developers, domain experts, customer representatives, and other information technology resources working progressively on their component and/or prototype.

InfoSphere DataStage – DataStage Parallel Job Peer Code Review Checklist Template

Peer code review happens during the development phase and focuses on the overall quality of, and compliance to standards in, code and configuration artifacts. However, the hard part of performing a peer code review isn’t performing the review, but rather achieving consistency and thoroughness in the review. This is where a checklist can contribute significantly, providing a list of things to check and a relative weight for the findings. I hope this template assists with your DataStage job review process.

 

What is a Peer Review?

A peer review is an examination of a Software Development Life Cycle (SDLC) work product by team members other than the work product’s author, to identify defects, omissions, and compliance to standards. This process provides an opportunity for quality assurance, knowledge sharing, and product improvement early in the SDLC.

What is a software development baseline?

What, exactly, the definition of a baseline is depends on your role and perspective on the Software Development Life Cycle (SDLC) process. The baseline concept plays into many aspects of SDLC execution, including project management, configuration management, and others. Additionally, the baseline concept and practice are applicable to all the SDLC methodologies, including, but not limited to, the Agile Model, Waterfall Model, Iterative Model, Spiral Model, and V-Model.

Baseline Definition

A baseline is a reference point in the software development life cycle marked by the completion and formal approval of a set of predefined work products for phase completion. The objective of a baseline is to reduce a project’s vulnerability to uncontrolled change and to provide a point-in-time set of artifacts for reference and recovery, if necessary. Baselining an artifact (requirements specification matrix, design, code, data model, etc.) moves it into formal change control (usually, in one or more repository tools) at milestone achievement points in the development life cycle. Baselines are also used to identify the essential software, hardware, and configuration assembly components that make up a specific release of a system.

Common Information Integration Testing Phases

Over the years I have seen a lot of patterns for the information integration testing process, and the patterns below are not an exhaustive list of what a consultant will encounter over the course of a career.

However, the two most common patterns in the testing process are:

The Three Test Phase Pattern

In the three test phase pattern, normally, the environment and testing activities of SIT and SWIT are combined.

The Four Test Phase Pattern

In the four test phase pattern, normally, the environment and testing activities of SIT and SWIT are performed separately and, frequently, will have separate environments in the migration path.

Testing Phases

Unit Testing:

Testing of individual software components or modules. It is typically done by the programmer and not by testers, as it requires detailed knowledge of the internal program design and code, and it may require developing test driver modules or test harnesses.

 System Integration Testing (SIT):

Integration testing – Testing of integrated modules to verify combined functionality after integration. Modules are typically code modules, individual applications, client and server applications on a network, etc. This type of testing is especially relevant to client/server and distributed systems. Testing performed to expose defects in the interfaces and in the interactions between integrated components or systems. See also component integration testing, system integration testing.

Software Integration Test (SWIT):

Similar to system testing, SWIT involves testing of a complete application environment, including scheduling, in a situation that mimics real-world use, such as interacting with a database, using network communications, or interacting with other hardware, applications, or systems, as appropriate.

 User Acceptance Testing (UAT):

Normally, this type of testing is done to verify that the system meets the customer-specified requirements. Users or customers do this testing to determine whether to accept the application. It is formal testing with respect to user needs, requirements, and business processes, conducted to determine whether or not a system satisfies the acceptance criteria and to enable the users, customers, or other authorized entity to determine whether or not to accept the system.


Data Modeling – What is Data Modeling?

Data Models

Data modeling is the documenting of data relationships, characteristics, and standards based on the intended use of the data. Data modeling techniques and tools capture and translate complex system designs into easily understood representations of the data, creating a blueprint and foundation for information technology development and reengineering.

A data model can be thought of as a diagram that illustrates the relationships between data. Although capturing all the possible relationships in a data model can be very time-intensive, a well-documented model allows stakeholders to identify errors and make changes before any programming code has been written.

Data modelers often use multiple models to view the same data and ensure that all processes, entities, relationships and data flows have been identified.

There are several different approaches to data modeling, including:

Concept Data Model (CDM)

  • The Concept Data Model (CDM) identifies the high-level information entities and their relationships, which are organized in the Entity Relationship Diagram (ERD).

Logical Data Model (LDM)

  • The Logical Data Model (LDM) defines detailed business information (in business terms) within each entity of the Concept Data Model and is a refinement of the information entities of the Concept Data Model. Logical data models are non-RDBMS-specific business definitions of the tables, fields, and attributes contained within each information entity, from which the Physical Data Model (PDM) and Entity Relationship Diagram (ERD) are produced.

Physical Data Model (PDM)

  • The Physical Data Model (PDM) provides the actual technical details of the model and database objects (e.g., table names, field names, etc.) to facilitate the creation of accurate, detailed technical designs and the actual database. Physical Data Models are RDBMS-specific definitions of the logical model, used to build the database, create deployable DDL statements, and produce the Entity Relationship Diagram (ERD); a sketch of such DDL appears after this list.
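
To make the relationship between the logical and physical models concrete, here is a minimal sketch of how a single logical entity (say, a ‘Customer’ entity with a name and a phone number) might be rendered in a physical data model as deployable DDL, assuming an ANSI-style SQL dialect; the table and column names are hypothetical and would normally follow the naming standards of the target RDBMS.

    -- Hypothetical physical rendering of a logical 'Customer' entity (names are illustrative only)
    CREATE TABLE CUSTOMER (
        CUSTOMER_ID    INTEGER       NOT NULL,   -- physical column for the entity's logical identifier
        CUSTOMER_NAME  VARCHAR(100)  NOT NULL,   -- logical attribute 'Customer Name'
        PHONE_NUMBER   VARCHAR(20),              -- logical attribute 'Phone Number'
        CONSTRAINT CUSTOMER_PK PRIMARY KEY (CUSTOMER_ID)
    );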


Where do data models fit in the Software Development Life Cycle (SDLC) Process?

Data Model SDLC Relationship Diagram

In the classic Software Development Life Cycle (SDLC) process, Data Models are typically initiated, by model type, at key process steps and are maintained as data model detail is added and refinement occurs.

The Concept Data Model (CDM) is usually created in the Planning phase. However, creation of the Concept Data Model can slide forward or backward, somewhat, within the System Concept Development, Planning, and Requirements Analysis phases, depending upon whether the application being modeled is a custom development effort or a modification of a Commercial-Off-The-Shelf (COTS) application. The CDM is maintained, as necessary, through the remainder of the SDLC process.

The Logical Data Model (LDM) is created in the Requirement Analysis phase and is a refinement of the information entities of the Concept Data Model. The LDM is maintained, as necessary, through the remainder of the SDLC process.

The Physical Data Model (PDM) is created in the Design phase to facilitate creation of accurate detail technical designs and actual database creation. The PDM is maintained, as necessary, through the remainder of the SDLC process.

What is an information technology artifact?

An artifact is software, metadata, configuration values, or documentation whose purpose is to collect, organize, and document information designs, specifications, and other key facts needed to implement and provide for the future support and maintenance of an information technology system.