What is Source Control?

Acronyms, Abbreviations, Terms, And Definitions

Source control is an information technology environment management system for storing, tracking, and managing changes to software. This is commonly done through a process of creating branches (copies for safely creating new features) off of the stable master version of the software, then merging stable feature branches back into the master version. This is also known as version control or revision control.

Netezza / PureData – How To Get A List Of When A Store Procedure Was Last Changed Or Created

Netezza / PureData – SQL (Structured Query Language)

In the continuing journey to track down impacted objects and to determine when the code in a database was last changed or added, here is another quick SQL, which can be used in Aginity Workbench for Netezza to retrieve a list of when stored procedures were last updated or created.

SQL List of When A Stored Procedure was Last Changed or Created

select t.database -- Database
, t.OWNER -- Object Owner
, t.PROCEDURE -- Procedure Name
, o.objmodified -- The Last Modified Datetime
, o.objcreated -- Created Datetime

from _V_OBJECT o
, _v_procedure t
where
o.objid = t.objid
and t.DATABASE = '<<Database Name>>'
order by o.objmodified Desc, o.objcreated Desc;
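
If you need to locate a specific procedure, the same system-view join can be filtered by name. A minimal sketch, with the procedure-name pattern left as a placeholder:

select t.database
, t.PROCEDURE
, o.objmodified
, o.objcreated
from _V_OBJECT o
, _v_procedure t
where o.objid = t.objid
and upper(t.PROCEDURE) like upper('%<<Procedure Name>>%')
order by o.objmodified Desc;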

 


What is a Common Data Model (CDM)?

Data Model

A Common Data Model (CDM) is a shared data structure designed to provide well-formed and standardized data structures within an industry (e.g. medical, insurance, etc.) or business channel (e.g. human resource management, asset management, etc.), which can be applied to provide organizations with a consistent, unified view of business information.  These common models can be leveraged as accelerators by organizations to form the foundation for their information, including SOA interchanges, mashups, data virtualization, Enterprise Data Models (EDM), business intelligence (BI), and/or to standardize their data models to improve metadata management and data integration practices.

Related references

IBM, IBM Analytics

IBM Analytics, Technology, Database Management, Data Warehousing, Industry Models

github.com

Observational Health Data Sciences and Informatics (OHDSI)/Common Data Model

Oracle

Oracle Technology Network, Database, More Key Features, Utilities Data Model

Oracle

Industries, Communications, Service Providers, Products, Data Model, Oracle Communications Data Model

Oracle

Oracle Technology Network, Database, More Key Features, Airline Data Model

 

Netezza / PureData – How to add a Foreign Key

DDL (Data Definition Language)

Adding a foreign key to tables in Netezza / PureData is a best practice, especially when working with dimensionally modeled data warehouse structures and with modern governance, integration (including virtualization), and presentation semantics (including reporting, business intelligence, and analytics).

Foreign Key (FK) Guidelines

  • A primary key must be defined on the table and field (or fields) to which you intend to link the foreign key
  • Avoid using distribution keys as foreign keys
  • Foreign key fields should not be nullable
  • Your foreign key link field(s) must be of the same format(s) (e.g. integer to integer, etc.)
  • Apply standard naming conventions to the constraint name:
    • FK_<<Constraint_Name>>_<<Number>>
    • <<Constraint_Name>>_FK<<Number>>
  • Please note that foreign key constraints are not enforced in Netezza

Steps to add a Foreign Key

The process for adding foreign keys involves just a few steps:

  • Verify the guidelines above
  • Run the ALTER TABLE ... ADD CONSTRAINT SQL command
  • Generate statistics, which is optional, but strongly recommended (an example follows the foreign key example below)

Basic Foreign Key SQL Command Structure

Here is the basic syntax for adding a foreign key:

ALTER TABLE <<Owner>>.<<NAME_OF_TABLE_BEING_ALTERED>>

ADD CONSTRAINT <<Constraint_Name>>_FK<<Number>>

FOREIGN KEY (<<Field_Name or Field_Name List>>) REFERENCES <<Owner>>.<<Target_FK_Table_Name>> (<<Field_Name or Field_Name List>>) <<ON UPDATE | ON DELETE>> <<action>>;

Example Foreign Key SQL Command

This is a simple one-field example of a foreign key (FK):

 

ALTER TABLE Blog.job_stage_fact

ADD CONSTRAINT job_stage_fact_host_dim_fk1

FOREIGN KEY (hostid) REFERENCES Blog.host_dim(hostid) ON DELETE cascade ON UPDATE no action;
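
To cover the statistics step above, a minimal follow-up, assuming the Blog.job_stage_fact table from the example:

GENERATE STATISTICS ON Blog.job_stage_fact;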

Related References

Alter Table

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL command reference, Alter Table, constraints

 

 

Database – What is a foreign key?

Acronyms, Abbreviations, Terms, And Definitions

Definition of a Foreign Key

  • A foreign key (FK) is a constraint that references the unique primary key (PK) of another table.

Facts About Foreign Keys

  • Foreign keys act as a cross-reference between tables, linking the foreign key (child record) to the primary key (parent record) of another table, establishing a link/relationship between the table keys
  • Foreign keys are not enforced by all RDBMS
  • The concept of referential integrity is derived from foreign key theory
  • Because foreign keys involve more than one table relationship, their implementation can be more complex than primary keys
  • A foreign-key constraint implicitly defines an index on the foreign-key column(s) in the child table; however, manually defining a matching index may improve join performance in some databases
  • SQL normally provides the following referential integrity actions for deletions, when enforcing foreign keys:

Cascade

  • The deletion of a parent (primary key) record may cause the deletion of corresponding foreign-key records.

No Action

  • Forbids the deletion of a parent (primary key) record, if there are dependent foreign-key records.  No Action does not mean the foreign-key constraint is suppressed.

Set null

  • The deletion of a parent (primary key) record causes the corresponding foreign-key to be set to null.

Set default

  • The deletion of a parent (primary key) record causes the corresponding foreign keys to be set to a default value instead of null.
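
As an illustration, here is a minimal sketch of declaring one of these actions in SQL, using hypothetical orders and order_line tables:

-- Each order_line row references its parent order; deleting the
-- parent order also deletes its order lines.
ALTER TABLE order_line
ADD CONSTRAINT order_line_orders_fk1
FOREIGN KEY (order_id) REFERENCES orders (order_id)
ON DELETE CASCADE;

Swapping ON DELETE CASCADE for ON DELETE SET NULL would instead clear the link (and requires the foreign-key column to be nullable); ON DELETE SET DEFAULT would fall back to the column default.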


What is Process Asset Library?

Documentation

 

What is Process Asset Library (PAL)?

A Process Asset Library (PAL) is a centralized repository, within an organization, which contains essential artifacts that document processes or are process assets (e.g. configuration items and designs) used by an organization, project, team, and/or work group.  The assets may also be leveraged to achieve process improvement, which is the intent of the lessons learned document, for example.

What is in the Process Asset Library (PAL)?

A Process Asset Library (PAL) usually houses the following types of artifacts:

  • Organizational policies
  • Process descriptions
  • Procedures
  • Development plans
  • Acquisition plans
  • Quality assurance plans
  • Training materials
  • Process aids (e.g. templates, checklists, job aids and forms)
  • Lessons learned reports

 

Related References

CMMI Institute

What Is Capability Maturity Model Integration (CMMI)?

Building Organizational Capability

 

Database – What is a Composite Primary Key?

Database Table

What is a Composite Primary Key?

A composite primary key is a primary key defined by multiple fields (columns).  As with a single-field primary key, exactly how a composite primary key is implemented depends on the database.  Essentially, a composite primary key:

  • Is a combination of fields (columns) which uniquely identifies every row
  • Is an index in database systems which use indexes for optimization
  • Is a type of table constraint
  • Is applied with a data definition language (DDL) alter command
  • And may define a parent-child relationship between tables
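
For example, a minimal sketch of applying a composite primary key with a DDL alter command, using a hypothetical order_line_fact table:

-- The combination of order_id and line_number uniquely identifies a row
ALTER TABLE blog.order_line_fact
ADD CONSTRAINT order_line_fact_pk
PRIMARY KEY (order_id, line_number);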


Data Modeling – Fact Table Effective Practices

Database Table

Here are a few guidelines for modeling and designing fact tables.

Fact Table Effective Practices

  • The table naming convention should identify it as a fact table. For example:
    • Suffix Pattern:
      • <<TableName>>_Fact
      • <<TableName>>_F
    • Prefix Pattern:
      • FACT_<<TableName>>
      • F_<<TableName>>
  • Must contain a temporal dimension surrogate key (e.g. date dimension)
  • Measures should be nullable – this has an impact on aggregate functions (SUM, COUNT, MIN, MAX, AVG, etc.)
  • Dimension surrogate keys (srky) should have a foreign key (FK) constraint
  • Do not place the dimension processing in the fact jobs
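
Pulling these practices together, here is a minimal sketch of a fact table; the blog.sales_fact and blog.date_dim tables and their fields are hypothetical:

CREATE TABLE blog.sales_fact
(
date_srky INTEGER NOT NULL,    -- temporal dimension surrogate key
product_srky INTEGER NOT NULL, -- dimension surrogate key
sale_amount NUMERIC(18,2)      -- measure, left nullable
);

ALTER TABLE blog.sales_fact
ADD CONSTRAINT sales_fact_date_dim_fk1
FOREIGN KEY (date_srky) REFERENCES blog.date_dim (date_srky);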


Data Modeling – Dimension Table Effective Practices

Database Table

I’ve had these notes lying around for a while, so I thought I’d consolidate them here.  So, here are a few guidelines to ensure the quality of your dimension table structures.

Dimension Table Effective Practices

  • The table naming convention should identify it as a dimension table. For example:
    • Suffix Pattern:
      • <<TableName>>_Dim
      • <<TableName>>_D
    • Prefix Pattern:
      • Dim_<<TableName>>
      • D_<<TableName>>
  • Have a primary key (PK) assigned on the table surrogate key
  • Audit fields – Type 1 dimensions should:
    • Have a Created Date timestamp – when the record was initially created
    • Have a Last Update timestamp – when the record was last updated
  • Job Flow: Do not place the dimension processing in the fact jobs.
  • Every dimension should have a zero (0), Unknown, row
  • Fields should be ‘NOT NULL’, replacing nulls with a zero (0) for numeric and integer type fields or a space (‘ ’) for character type fields
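
Putting a few of these practices together, here is a minimal sketch; the blog.product_dim table and its fields are hypothetical:

CREATE TABLE blog.product_dim
(
product_srky INTEGER NOT NULL,
product_code VARCHAR(20) NOT NULL,
created_date TIMESTAMP NOT NULL,     -- when the record was initially created
last_update_date TIMESTAMP NOT NULL  -- when the record was last updated
);

ALTER TABLE blog.product_dim
ADD CONSTRAINT product_dim_pk PRIMARY KEY (product_srky);

-- Zero (0), Unknown, row
INSERT INTO blog.product_dim
VALUES (0, 'Unknown', CURRENT_TIMESTAMP, CURRENT_TIMESTAMP);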


Netezza / PureData – How to add a Primary Key

Netezza / PureData Create Primary Key (PK)

While primary keys (PK) are not enforced within Netezza, they still provide significant value and should be added.  Among the benefits that adding a primary key provides are:

  • Informs tools which have metadata import capabilities; for example, Aginity, ETL tools, data modeling tools, InfoSphere Data Architect, DataStage and QualityStage, and the InfoSphere Information Server suite of tools (e.g. Governance Console, Information Analyzer, etc.)
  • Visually helps developers, data modelers, and users to know what the primary keys of the table are, which may not be obvious from the table structure. This is especially true for tables utilizing compound keys as the primary key.
  • The query optimizer will use these definitions to define efficient query execution plans
  • Identifying primary key information provides migratable, self-documenting metadata
  • Aids in the facilitation of future data and application enrichment projects

 

Basic Primary Key syntax

ALTER TABLE <<schema.tablename>> ADD CONSTRAINT <<ConstraintName>> PRIMARY KEY (<<FieldNames>>);

 

Example Primary Key syntax

ALTER TABLE time_dim ADD CONSTRAINT time_dim_pk PRIMARY KEY (time_srky);

 


Netezza / PureData – How to add comments on a table

Comment On A Netezza / PureData Table

Using the ‘COMMENT ON’ capabilities of Netezza is an easy way to provide self-documentation within the DDL structure of a table.  This can be very helpful to developers and users, who can see the DDL of the table.  Personally, I think of it as a way to document the purpose and/or business meaning of a table.  This can be especially helpful when differing teams may be working with the tables.  It also reduces the reliance on developer and/or user memory, and the need to search for technical documentation, which may or may not be readily available.

Basic ‘COMMENT ON Table’ Syntax

The basic syntax to add a comment to a table is:

COMMENT ON TABLE <<Schema.TableName>> IS '<<Descriptive Comment>>';

 

Example ‘COMMENT ON Table’ Syntax

COMMENT ON TABLE date_dim IS 'Date dimension table, which provides basic calendar date information';


Surrogate Key File Effective Practices

Database Table

Here are a few thoughts on effectively working with IBM InfoSphere Information Server DataStage surrogate key files, which may prove useful for developers.

 

Placement

  • The main thing about placement is that it be in a consistent location. Developers and production support teams should not need to guess or look up where it is for every DataStage project. So, it is best to put the surrogate keys in the same base path and to give each project its own subfolder to facilitate migrations and to reduce the possibility of human error. Here is the path structure, which is commonly used:

Path

  • /data/SRKY/<<Project Name>>

Parameter Sets

  • As a best practice, the surrogate key file path should be in a parameter set and the parameter used in the jobs, as needed.  This simplifies maintenance, if and when changes to the path are required, and during migrations.

[Screenshot: Surrogate Key Parameter Set – Example Parameter Tab]

[Screenshot: Surrogate Key Parameter Set – Example Values Tab]

[Screenshot: Surrogate Key Job Parameter – Example Path Using Parameter]

Permissions

  • DataStage must have permissions to:
    • The entire parent path
    • The project folder, and
    • The surrogate key files themselves.

To ensure that DataStage has access to the path and surrogate key files, ensure:

  • The ‘dsadm’ (owner) and ‘dstage’ (group) have access to folders in the path, with at least a “-rw-r--r--” (644) permissions level. Keeping the permissions to a minimum can, for obvious reasons, prevent inadvertent overwrites of the surrogate key files; thus avoiding some potentially serious cleanup.
  • The ‘dsadm’ (owner) and ‘dstage’ (group) have access to the surrogate key files

[Screenshot: Surrogate Key File Minimum Permissions]

Netezza/PureData – Table Distribution Key Practices

Database Table

For those of us new to Netezza and coming from other database systems, which use indexes, the optimization of the distribution key in Netezza can be counterintuitive.  The distribution key is not about grouping like an index, but rather about spreading the data in a way which lets the zone map functionality improve performance.  So, here are a few effective practices to help decision making when deciding on a distribution key for a table (a sketch follows the list):

  1. Distribute on one key only
  2. Distribute on the lowest/finest possible grain of the data to improve your zone map balance
  3. Avoid distributing on foreign keys and/or foreign surrogate keys
  4. Avoid distributing on fields, which may need to be updated
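
For illustration, a minimal sketch of setting a distribution key at table creation; the blog.transaction_fact table and its fields are hypothetical, with transaction_id standing in for the lowest grain of the data:

CREATE TABLE blog.transaction_fact
(
transaction_id BIGINT NOT NULL, -- lowest/finest grain of the data
hostid INTEGER,                 -- foreign surrogate key: not distributed on
sale_amount NUMERIC(18,2)
)
DISTRIBUTE ON (transaction_id); -- one key only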

ETL Error Handling Effective Practices

Error

ETL (extract, transform, and load) error handling practices can vary, but three basic approaches can significantly assist in having effective ETL error handling practices.  Effective error handling practices begin in the requirements and design phases.  All too often, error handling practices are left to the build phase and fall to individual developer practices.  This is an area where standard practices are not well defined or adopted by the ETL developer community.  So, here are a few effective error handling practices which will contribute to process stability, information timeliness, information accuracy, and reduce the level of effort required to support the application once in operation.

Anticipating ETL Errors

In the requirements and design phases, with proper consideration, many errors can be avoided altogether in the ETL process.  When discussing requirements and preparing designs, consideration should be given to error handling, especially the treatment of common errors.  As an effective practice, anticipated errors should be treated within the ETL process.  Some examples to consider are:

  • Replacement of special characters: do any special characters need to be removed, if found? This is generally determined at the field level and should be considered in the source to target mapping (STTM) and business rules. Also, when passing data between differing systems and working with VARCHAR fields, consider whether the ‘unicode’ extended property should be set.
  • Removal of leading and trailing spaces: removal of unnecessary leading and trailing spaces should be considered when changing fields from CHAR to VARCHAR and when working with keys used as primary keys, join keys, and/or lookup keys.
  • Deduplication of data: duplicate data prevention practices and business rules should always be considered. These can be of a few types:
    • First, file processing conventions, such as assigning timestamps to files and removal or movement of processed files to prevent reprocessing.
    • Second, rules for deduplication of duplicate rows, including the designation of the appropriate keys for determining duplicate rows.
    • Third, if duplicate rows are being produced as a result of more than one input source system, identification of the authoritative source should be considered to resolve conflicts.
  • Null Value Treatment: null value treatment can be extraordinarily important, especially when working with keys and traditional data warehouse models. It is important to be mindful of the fact that, to the database and the ETL, nulls and spaces are not the same thing. They may or may not be the same thing in the mind of the consumer of the information. So, business rules should indicate the treatment of both spaces and nulls. In some circumstances, especially when using surrogate keys in data warehousing, business processes sometimes need to know the difference between a null, a space, and an unknown value. So these three scenarios should be considered when forming business rules and treating them in the ETL (see the sketch after this list). Here are a few questions that could be asked to inform your solution:
    • Do nulls and spaces mean the same thing to the business community?
    • Is a space considered an unknown value?
    • Does a null need to be uniquely identified as different from a space and/or an unknown lookup value?
    • If surrogate keys are in use for the field in question, which of these scenarios require a unique surrogate key, other than the unknown unique surrogate key?
  • Missing or Invalid Value Replacement or Defaults: having replacement values or defaults is especially important for any fields which are not nullable and/or require a surrogate key for data warehouse dimensions. Also, for reporting to be meaningful, replacement or default value assignments can be important as well (e.g. for cubes and statistical calculations).
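
As a minimal sketch of null, space, and default treatment in SQL (the stage.customer_load table and the 'Unknown' default are hypothetical):

select
-- Trim spaces, treat the resulting empty string like a null,
-- and default both to 'Unknown'
coalesce(nullif(trim(customer_name), ''), 'Unknown') as customer_name
from stage.customer_load;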

Rejecting Rows

Rows should not be rejected unless there is a specific business requirement and/or need to do so.  Rejecting rows causes data inaccuracies by omission and undermines the consumer’s confidence in the accuracy of the information being delivered.  This can be especially problematic for accounting and other activities which must balance across information sets.

  • If value lookups are in use:
    • Unknown and null values need to have a treatment rule to prevent errors.
    • Two surrogate key or transformation default values may be necessary, if the ability to distinguish between an unknown/invalid value and a null value is required (see the sketch after this list).
    • Make sure the lookup ‘Key Type’ settings are aligned (e.g. equality, caseless equality) with the formatting of both inputs to the lookup.
    • Make sure the complete unique key is in use.
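
Here is a minimal sketch of distinguishing a null from an unknown/invalid value after a lookup; the tables and the default surrogate key values of -1 (unknown) and -2 (null) are hypothetical conventions:

select
case
when s.product_code is null then -2  -- null source value
when d.product_srky is null then -1  -- lookup failed: unknown/invalid value
else d.product_srky
end as product_srky
from stage.sales_load s
left join blog.product_dim d
on s.product_code = d.product_code;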

Information Consistency Practices

Information consistency practices allow the information to be transformed and enriched to make the information more consistent for ‘like to like’ comparisons, usability, and/or readability.  As an effective practice, consider these standard formatting recommendations, which can be good requirements questions and should be included in the STTM:

  • Making descriptive and/or text fields consistent in their format (e.g. mixed case, proper case, upper case).
  • Use consistent date formatting when converting dates to text fields.
  • When dealing with currency, convert the currency to consistent ISO currency codes (e.g. USD, CAD, EUR) and decimal precision (e.g. two decimal places).
  • Identification of financial records into categories (e.g. credit and debit), with a default group behavior included (e.g. N/A or Unknown).
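
A minimal sketch of a few of these formatting practices in SQL; the stage.sales_load table and its fields are hypothetical:

select
upper(trim(description)) as description,               -- consistent case, no stray spaces
to_char(order_date, 'YYYY-MM-DD') as order_date_text,  -- consistent date-to-text format
cast(sale_amount as numeric(18,2)) as sale_amount_usd  -- consistent decimal precision
from stage.sales_load;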