Netezza / PureData – How to add multiple columns to a Netezza table in one SQL

add multiple columns to a Netezza table , alter table

SQL (Structured Query Language)

 

I had this example floating around in a notepad for a while, but I find myself coming back it occasionally.  So, I thought I would add it to this blog for future reference.

The Table Alter Process

This is an outline of the Alter table process I follow, for reference, in case it is helpful.

  • Generate DDL in Aginity and make backup original table structure
  • Perform Insert into backup table from original table
  • Create Alter SQL
  • Execute Alter SQL
  • Refresh Aginity table columns
  • Generate new DDL
  • visually validate DDL Structure
  • If correct, archive copy of DDL to version control system
  • Preform any update commands, if required, required to populate the new columns.
  • Execute post alter table cleanup
    • Groom Versions
    • Groom table
    • Generate statistics
  • Once the any required processes and the data have been validated, drop the backup table.

 

Basic Alter SQL Command Structure

Here is the basic syntax for adding multiple columns:

ALTER TABLE <<OWNER>>.<<TABLENAME>>

ADD COLUMN <<FieldName1>> <<Field Type>> <<Constraint, if applicable>>

, <<FieldName2>> <<Field Type>> <<Constraint, if applicable>>;

 

Example Alter SQL Command to a Multiple Columns

Here is a quick example, which is adding four columns:

Example SQL Adding Multiple Columns

ALTER TABLE BLOG.PRODUCT_DIM

ADD COLUMN MANUFACTURING_PLANT_KEY NUMERIC(6,0) NOT NULL DEFAULT 0

, LEAD_TIME_PRODUCTION NUMERIC(2,0)  NOT NULL DEFAULT 0

, PRODUCT_CYCLE CHARACTER VARYING(15)  NOT NULL DEFAULT ‘ ‘::”NVARCHAR”

, PRODUCT_CLASS CHARACTER VARYING(2)  NOT NULL  DEFAULT ‘ ‘::”NVARCHAR” ;

 

Cleanup Table SQL Statements

GROOM TABLE BLOG.PRODUCT_DIM VERSIONS;

GROOM TABLE BLOG.PRODUCT_DIM;

GENERATE STATISTICS ON BLOG.PRODUCT_DIM;

 

Related References

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1

IBM Netezza database user documentation, Netezza SQL command reference, ALTER TABLE

Netezza / PureData – How to add a Foreign Key

DDL (Data Definition Language), Netezza PureData How to add a Foreign Key

DDL (Data Definition Language)

Adding a forging key to tables in Netezza / PureData is a best practice; especially, when working with dimensionally modeled data warehouse structures and with modern governance, integration (including virtualization), presentation semantics (including reporting, business intelligence and analytics).

Foreign Key (FK) Guidelines

  • A primary key must be defined on the table and fields (or fields) to which you intend to link the foreign key
  • Avoid using distribution keys as foreign keys
  • Foreign Key field should not be nullable
  • Your foreign key link field(s) must be of the same format(s) (e.g. integer to integer, etc.)
  • Apply standard naming conventions to constraint name:
    • FK_<<Constraint_Name>>_<<Number>>
    • <<Constraint_Name>>_FK<<Number>>
  • Please note that foreign key constraints are not enforced in Netezza

Steps to add a Foreign Key

The process for adding foreign keys involves just a few steps:

  • Verify guidelines above
  • Alter table add constraint SQL command
  • Run statistics, which is optional, but strongly recommended

Basic Foreign Key SQL Command Structure

Here is the basic syntax for adding Foreign key:

ALTER TABLE <<Owner>>.<<NAME_OF_TABLE_BEING_ALTERED>>

ADD CONSTRAINT <<Constraint_Name>>_fk<Number>>

FOREIGN KEY (<<Field_Name or Field_Name List>>) REFERENCES <<Owner>>.<<target_FK_Table_Name>.(<<Field_Name or Field_Name List>>) <<On Update | On Delete>> action;

Example Foreign Key SQL Command

This is a simple one field example of the foreign key (FK)

 

ALTER TABLE Blog.job_stage_fact

ADD CONSTRAINT job_stage_fact_host_dim_fk1

FOREIGN KEY (hostid) REFERENCES Blog.host_dim(hostid) ON DELETE cascade ON UPDATE no action;

Related References

Alter Table

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL command reference, Alter Table, constraints

 

 

Database – What is a foreign key?

Acronyms, Abbreviations, Terms, And Definitions, DDL (Data Definition Language), What is a foreign key

Acronyms, Abbreviations, Terms, And Definitions

Definition of a Foreign Key

  • A foreign Key (FK) is a constraint that references the unique primary key (PK) of another table.

Facts About Foreign Keys

  • Foreign Keys act as a cross-reference between tables linking the foreign key (Child record) to the Primary key (parent record) of another table, which establishing a link/relationship between the table keys
  • Foreign keys are not enforced by all RDBMS
  • The concept of referential integrity is derived from foreign key theory
  • Because Foreign keys involve more than one table relationship, their implementation can be more complex than primary keys
  • A foreign-key constraint implicitly defines an index on the foreign-key column(s) in the child table, however, manually defining a matching index may improve join performance in some database
  • The SQL, normally, provides the following referential integrity actions for deletions, when enforcing foreign-keys

Cascade

  • The deletion of a parent (primary key) record may cause the deletion of corresponding foreign-key records.

No Action

  • Forbids the deletion of a parent (primary key) record, if there are dependent foreign-key records.   No Action does not mean to suppress the foreign-key constraint.

Set null

  • The deletion of a parent (primary key) record causes the corresponding foreign-key to be set to null.

Set default

  • The deletion of a record causes the corresponding foreign-keys be set to a default value instead of null upon deletion of a parent (primary key) record

Related References

 

Database – What is a Composite Primary Key?

Database Table

Database Table

What is a Composite Primary Key?

A Composite Primary key is Primary key What a primary key, which is defined by having multiple fields (columns) in it.  Like a Primary Key what a composite Primary Key is depends on the database.  Essentially a Composite Primary Key:

  • Is a combination of Fields (columns) which uniquely identifies every row.
  • Is an index in database systems which use indexes for optimization
  • Is a type of table constraint
  • Is applied with a data definition language (DDL) alter command
  • And may define parent-Child relationship between tables

Related References

Database – What is a Primary Key?

Database Table

Database Table

What is a primary Key?

What a primary key is depends, somewhat, on the database.  However, in its simplest form a primary key:

  • Is a field (Column) or combination of Fields (columns) which uniquely identifies every row.
  • Is an index in database systems which use indexes for optimization
  • Is a type of table constraint
  • Is applied with a data definition language (DDL) alter command
  • And, depending on the data model can, define parent-Child relationship between tables

Related References

Netezza/ PureData – how to add a primary key

Netezza / PureData Create Primary Key (PK)

Netezza / PureData Create Primary Key (PK)

While primary keys (PK) are not enforced within Netezza, they still provide significant value and should be added.  Among the values that adding primary key provides are:

  • Inform tools, which have meta Data import capabilities; for example, Aginity, ETL tools, data modeling tools, Infosphere Data Architect, DataStage and DataQuality, Infosphere Information Server suite of tools (e.g. Governance Console, Information analyzer, etc.).
  • Visually helps developers, data modelers, and users to know what the keys primary keys of the table are, which may not be obvious with the table structure. This is especially true for table utilizing compound keys as the primary key.
  • The query optimizer will use these definitions define efficient query execution plans
  • Identifying Primary Keys information provides migratable self-documenting Meta Data
  • Aides in the facilitation of future data and application enrichment projects

 

Basic Primary Key syntax

ALTER TABLE <<schema.tablename>> ADD CONSTRAINT <<ConstraintName>> PRIMARY KEY (FieldNames);

 

Example Primary Key syntax

ALTER TABLEtime_dim ADD CONSTRAINT time_dim_pk PRIMARY KEY (time_srky);

 

Related References

 

Surrogate Key File Effective Practices

Database Table, Surrogate Key File Effective Practices

Surrogate Key File Effective Practices

Here are a few thoughts on effectively working with IBM Infosphere, Information Server, DataStage surrogate key files, which may prove useful for developers.

 

Placement

  • The main thing about placement is that it be in a consistent location. Developers and production support teams should need to guess or look up where it is for every DataStage project. So, it is best to put the surrogate keys in same base path and that each project has its own subfolder to facilitate migrations and to reduce the possibility of human error. Here Is the patch structure, which is commonly use:

Path

  • /data/SRKY/<<Project Name>>

Parameter Sets

  • As a best practice, the surrogate key file path should be in a parameter set and the parameter used in the jobs, as needed.  This simplifies maintenance, if and when changes to the path are required, and during migrations.

Surrogate Key Parameter Set Screenshot – Example Parameter Tab

Surrogate Key Parameter Set Screen Screen

Surrogate Key Parameter Set Screenshot

Surrogate Key Parameter Set Screenshot – Example Values Tab

Surrogate Key Parameter Set Screenshot – Example Values Tab

Surrogate Key Parameter Set Screenshot – Example Values Tab

Surrogate Key Job Parameter Example Path using Parameter

Surrogate Key Job Parameter Example Path using Parameter

Surrogate Key Job Parameter Example Path using Parameter

Permissions

  • DataStage must have permissions to:
    • The entire parent path
    • The project folder, and
    • The surrogate key files themselves.

To ensure the DataStage has access to the path and Surrogate files, ensure:

  • The ‘dsadm’ (owner) and ‘dstage’ (group) have access to folders in the path, with at least a “-rw-r–r–“ (644) permissions level. Keeping the permissions to a minimum can, for obvious reasons,  prevent inadvertent overwrites of the surrogate key files; thus, avoiding some, potentially, serious cleanup.
  • The ‘dsadm’ (owner) and ‘dstage’ (group) have access to the surrogate key files

Surrogate Key File Minimum Permissions

Surrogate Key File Minimum Permissions

Surrogate Key File Minimum Permissions

Netezza/PureData – Table Distribution Key Practices

Database Table, Table Distribution Key Practices, netezza, puredata

Database Table

For those of us new to Netezza and coming from other database system, which use index, the optimization of the distribution key in Netezza can be counter initiative.  The distribution key is not about group like an index, but rather spreading the data in a which let the zone map functionality improve performance.  So, here are a few effective practices to help decision making when deciding on a distribution key for a table:

  1. Distribute on one key only
  2. Distribute on the lowest/finest possible grain of the data to improve your zone map balance
  3. Avoid distributing foreign keys and/or foreign Surrogate key
  4. Avoid distributing on fields, which may need to be updated