Adding a forging key to tables in Netezza / PureData is a best practice; especially, when working with dimensionally modeled data warehouse structures and with modern governance, integration (including virtualization), presentation semantics (including reporting, business intelligence and analytics).
Foreign Key (FK) Guidelines
A primary key must be defined on the table and fields (or fields) to which you intend to link the foreign key
Avoid using distribution keys as foreign keys
Foreign Key field should not be nullable
Your foreign key link field(s) must be of the same format(s) (e.g. integer to integer, etc.)
Apply standard naming conventions to constraint name:
Please note that foreign key constraints are not enforced in Netezza
Steps to add a Foreign Key
The process for adding foreign keys involves just a few steps:
Verify guidelines above
Alter table add constraint SQL command
Run statistics, which is optional, but strongly recommended
Basic Foreign Key SQL Command Structure
Here is the basic syntax for adding Foreign key:
ALTER TABLE <<Owner>>.<<NAME_OF_TABLE_BEING_ALTERED>>
ADD CONSTRAINT <<Constraint_Name>>_fk<Number>>
FOREIGN KEY (<<Field_Name or Field_Name List>>) REFERENCES <<Owner>>.<<target_FK_Table_Name>.(<<Field_Name or Field_Name List>>) <<On Update | On Delete>> action;
Example Foreign Key SQL Command
This is a simple one field example of the foreign key (FK)
ALTER TABLE Blog.job_stage_fact
ADD CONSTRAINT job_stage_fact_host_dim_fk1
FOREIGN KEY (hostid) REFERENCES Blog.host_dim(hostid) ON DELETE cascade ON UPDATE no action;
PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL command reference, Alter Table, constraints
A foreign Key (FK) is a constraint that references the unique primary key (PK) of another table.
Facts About Foreign Keys
Foreign Keys act as a cross-reference between tables linking the foreign key (Child record) to the Primary key (parent record) of another table, which establishing a link/relationship between the table keys
Foreign keys are not enforced by all RDBMS
The concept of referential integrity is derived from foreign key theory
Because Foreign keys involve more than one table relationship, their implementation can be more complex than primary keys
A foreign-key constraint implicitly defines an index on the foreign-key column(s) in the child table, however, manually defining a matching index may improve join performance in some database
The SQL, normally, provides the following referential integrity actions for deletions, when enforcing foreign-keys
The deletion of a parent (primary key) record may cause the deletion of corresponding foreign-key records.
Forbids the deletion of a parent (primary key) record, if there are dependent foreign-key records. No Action does not mean to suppress the foreign-key constraint.
The deletion of a parent (primary key) record causes the corresponding foreign-key to be set to null.
The deletion of a record causes the corresponding foreign-keys be set to a default value instead of null upon deletion of a parent (primary key) record
A Composite Primary key is Primary key What a primary key, which is defined by having multiple fields (columns) in it. Like a Primary Key what a composite Primary Key is depends on the database. Essentially a Composite Primary Key:
Is a combination of Fields (columns) which uniquely identifies every row.
Is an index in database systems which use indexes for optimization
Is a type of table constraint
Is applied with a data definition language (DDL) alter command
And may define parent-Child relationship between tables
While primary keys (PK) are not enforced within Netezza, they still provide significant value and should be added. Among the values that adding primary key provides are:
Inform tools, which have meta Data import capabilities; for example, Aginity, ETL tools, data modeling tools, Infosphere Data Architect, DataStage and DataQuality, Infosphere Information Server suite of tools (e.g. Governance Console, Information analyzer, etc.).
Visually helps developers, data modelers, and users to know what the keys primary keys of the table are, which may not be obvious with the table structure. This is especially true for table utilizing compound keys as the primary key.
The query optimizer will use these definitions define efficient query execution plans
Identifying Primary Keys information provides migratable self-documenting Meta Data
Aides in the facilitation of future data and application enrichment projects
Basic Primary Key syntax
ALTER TABLE <<schema.tablename>> ADD CONSTRAINT <<ConstraintName>> PRIMARY KEY (FieldNames);
Example Primary Key syntax
ALTER TABLEtime_dim ADD CONSTRAINT time_dim_pk PRIMARY KEY (time_srky);
Here are a few thoughts on effectively working with IBM Infosphere, Information Server, DataStage surrogate key files, which may prove useful for developers.
The main thing about placement is that it be in a consistent location. Developers and production support teams should need to guess or look up where it is for every DataStage project. So, it is best to put the surrogate keys in same base path and that each project has its own subfolder to facilitate migrations and to reduce the possibility of human error. Here Is the patch structure, which is commonly use:
As a best practice, the surrogate key file path should be in a parameter set and the parameter used in the jobs, as needed. This simplifies maintenance, if and when changes to the path are required, and during migrations.
Surrogate Key Parameter Set Screenshot – Example Parameter Tab
Surrogate Key Parameter Set Screenshot – Example Values Tab
Surrogate Key Job Parameter Example Path using Parameter
DataStage must have permissions to:
The entire parent path
The project folder, and
The surrogate key files themselves.
To ensure the DataStage has access to the path and Surrogate files, ensure:
The ‘dsadm’ (owner) and ‘dstage’ (group) have access to folders in the path, with at least a “-rw-r–r–“ (644) permissions level. Keeping the permissions to a minimum can, for obvious reasons, prevent inadvertent overwrites of the surrogate key files; thus, avoiding some, potentially, serious cleanup.
The ‘dsadm’ (owner) and ‘dstage’ (group) have access to the surrogate key files