Netezza / PureData – How to Add a Primary Key (PK)


While primary keys (PK) are not enforced within Netezza, they still provide significant value and should be added.  Among the benefits that adding a primary key provides are:

  • Informs tools that have metadata import capabilities; for example, Aginity, ETL tools, data modeling tools, InfoSphere Data Architect, DataStage and QualityStage, and the InfoSphere Information Server suite of tools (e.g., Governance Console, Information Analyzer, etc.).
  • Visually helps developers, data modelers, and users know what the primary keys of a table are, which may not be obvious from the table structure alone. This is especially true for tables using compound keys as the primary key.
  • The query optimizer will use these definitions to build more efficient query execution plans.
  • Primary key definitions provide migratable, self-documenting metadata.
  • Aids future data and application enhancement projects.

 

Basic Primary Key syntax

ALTER TABLE <<schema.tablename>> ADD CONSTRAINT <<ConstraintName>> PRIMARY KEY (<<FieldNames>>);

 

Example Primary Key syntax

ALTER TABLE time_dim ADD CONSTRAINT time_dim_pk PRIMARY KEY (time_srky);
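
Because compound keys are common in warehouse tables (as noted above), here is a minimal sketch of adding, and then backing out, a compound primary key; the sales_fact table and its column names are hypothetical:

-- Hypothetical fact table with a compound primary key
-- (table and column names are illustrative assumptions).
ALTER TABLE sales_fact
    ADD CONSTRAINT sales_fact_pk
    PRIMARY KEY (time_srky, store_srky, product_srky);

-- To back the change out, drop the constraint by name.
ALTER TABLE sales_fact
    DROP CONSTRAINT sales_fact_pk;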

 


What are the Factors Affecting the Selection of a Data Warehouse Naming Convention?


The primary factors affecting the choices in the creation of data warehouse (DW) naming convention standards are the type of implementation, the pattern of the implementation, and any preexisting conventions.

Type of implementation

The type of implementation will affect your naming convention choices. Basically, this boils down to whether you are working with a Commercial-Off-The-Shelf (COTS) data warehouse or doing a custom build.

Commercial-Off-The-Shelf (COTS)

If it is a Commercial-Off-The-Shelf (COTS) warehouse, which you are modifying and/or enhancing, then it is strongly recommended that you conform to the naming conventions of the COTS product.  However, you may want to add an identifier to the conventions to mark your custom objects.

Using this information as an exemplar:

  • FAV = Favinger, Inc. (Company Name – Custom Identifier)
  • GlobalSales = Global Sales (Subject)
  • MV = Materialized View (Object Type)

 

Suffix Pattern Naming Convention

<<Custom Identifier>>_<<Object Subject Name>>_<<Object Type>>

Example:  FAV_GlobalSales_MV

Prefix Pattern Naming Convention

<<Object Type>>_<<Custom Identifier>>_<<Object Subject Name>>

Example: MV_FAV_GlobalSales
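
As a sketch of the suffix pattern in use, the statement below creates the materialized view itself; the global_sales table and its columns are illustrative assumptions, and note that Netezza materialized views are limited to a simple projection of a single base table:

-- Hypothetical materialized view named per the suffix pattern:
-- FAV (custom identifier) + GlobalSales (subject) + MV (object type).
CREATE MATERIALIZED VIEW FAV_GlobalSales_MV AS
    SELECT order_id, region, sale_amount
    FROM global_sales;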

Custom Data Warehouse Build

If you are creating a custom data warehouse from scratch, then you have more flexibility in choosing your naming convention.  However, you will still need to take a few factors into account to achieve the maximum benefit from your naming conventions:

  • What is the high-level pattern of your design?
  • Are there any preexisting naming conventions?

Data Warehouse Patterns

Your naming convention will need to take into account the overall intent and design pattern of the data warehouse. The objects and naming conventions of each pattern will vary, if for no other reason than the differences in the objects, their purpose, and the depth of their relationships.

High level Pattern of the Data Warehouse Implementation

The high-level pattern of your design, whether an Operational Data Store (ODS), an Enterprise Data Warehouse (EDW), a Data Mart (DM), or something else, will need to guide your naming convention, as the logical and/or processing zones of each pattern vary and have some generally accepted industry conventions.

Structural Pattern of the Data Warehouse Implementation

The structural pattern of your data warehouse design, whether Snowflake, Third Normal Form (3NF), or Star Schema, will also need to guide your naming convention, as the depth of relationships in each pattern varies, each pattern has some generally accepted industry conventions, and the structural pattern relates directly to your high-level data warehouse pattern.

Preexisting Conventions

An often omitted factor in data warehouse naming conventions is the set of preexisting conventions, which can have significant impacts from both an engineering and a political point of view. The sources of these conventions can vary and may or may not be formally documented.

A common source of naming convention conflict is preexisting implementations, which may not even be documented.  However, consumers are familiar with these conventions and system objects expose them, so they will need to be taken into account when assessing impacts to systems, political culture, user training, and the creation of a standard convention for your data warehouse.

The Relational Database Management System (RDBMS) in which you intend to build the data warehouse may also have generally accepted conventions, with which consumers may be familiar and about which they may have preconceived expectations (whether expressed or implied).

Change Management

Whatever data warehouse naming convention you choose, the naming conventions, along with the data warehouse design pattern assumptions, should be well documented and placed in a managed and readily accessible change management (CM) repository.

 

 

IIS DataStage Naming Conventions and Standards


Standards

Standardized naming conventions ease the burden on developers switching from one project to another.  Knowing what things are named and where they are located makes jobs and sequences easier to understand and shortens the time to productivity for new team members.  This is also true for augmentation resources, who may need to be brought in to assist your team.

The following tables identify DataStage elements and their standard naming convention.

Job Naming Conventions

Entity Convention
Parallel Job <<Application>>_<<job_Name>>_Prl
Server Job <<Application>>_<<job_Name>>_Svr
Sequence <<Application>>_<<job_Name>>_Seq
Extract Job (Multipart processes only) <<Application>>_Src_<<job_Name>>_<<Job Type (Prl or Svr)>>
Load (Multipart processes only) <<Application>>_Load_<<job_Name>>_<<Job Type (Prl or Svr)>>
File Generation/Output (Multipart processes only) <<Application>>_FG_<<job_Name>>_<<Job Type (Prl or Svr)>>
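
As a worked example, for a hypothetical "Sales" application with a customer dimension load, these conventions would yield names such as Sales_CustomerDim_Prl (parallel job), Sales_CustomerDim_Seq (sequence), Sales_Src_CustomerDim_Prl (extract job), and Sales_Load_CustomerDim_Prl (load job); all names here are illustrative.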

 

Properties Naming Conventions

Entity Convention
Parallel Shared Container <<Application>>_<<job_Name>>_Psc
Parameter <<Application>>_<<job_Name>>_Parm
Parameter Set <<Application>>_<<job_Name>>_ParmSet
Server Shared Container <<Application>>_<<job_Name>>_Ssc

Job Processing Stage Naming Conventions

Entity Convention
Aggregator Agg_<<PrimaryFunction>>
Change Apply ChAp_<<PrimaryFunction>>
Change Capture  ChCp_<<PrimaryFunction>>
Copy Cp_<<PrimaryFunction>>
Funnel Funl_<<PrimaryFunction>>
Join (Inner) InJn_<<PrimaryFunction>>
Join (Left Outer) LOJn_<<PrimaryFunction>>
Join (Right Outer) ROJn_<<PrimaryFunction>>
Join (Full Outer) FOJn_<<PrimaryFunction>>
FTP Enterprise Ftp_<<PrimaryFunction>>
Lookup Lkp_<<Value Name or Table Name>>
Merge Mrg_<<PrimaryFunction>>
Column Export CExp_<<PrimaryFunction>>
Column Import CImp_<<PrimaryFunction>>
Pivot Pivt_<<PrimaryFunction>>
Remove Duplicates RmDp_<<PrimaryFunction>>
Row Generator RGen_<<Job_Name>>
Slowly Changing Dimension SCD_<<PrimaryFunction>>
Sort Srt_<<PrimaryFunction>>
Surrogate Key Generator SKey_<<PrimaryFunction>>

 

Links Naming Conventions

Entity Convention
Reference (Lookup) Ln_Ref_<<Number or Additional descriptor, if needed to form a unique object name>>
Reject (Lookup, File, DB) Ln_Rej_<<Number or Additional descriptor, if needed to form a unique object name>>
Get (Shared Container) Ln_Get_<<Number or Additional descriptor, if needed to form a unique object name>>
Put (Shared Container) Ln_Put_<<Number or Additional descriptor, if needed to form a unique object name>>
Input Ln_In_<<Number or Additional descriptor, if needed to form a unique object name>>
Output Ln_Out_<<Number or Additional descriptor, if needed to form a unique object name>>
Delete Ln_Del_<<Number or Additional descriptor, if needed to form a unique object name>>
Insert Ln_Ins_<<Number or Additional descriptor, if needed to form a unique object name>>
Update Ln_Upd_<<Number or Additional descriptor, if needed to form a unique object name>>

 

Database Stage Naming Conventions

Entity Convention
Reject Rej_<<Table or Object Name>>
Lookup Lkp_<<Table or Object Name>>
Source Src_<<Table or Object Name>>
Target Tgt_<<Table or Object Name>>
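
Putting the database stage and link conventions together with the transformer convention in the next table, a simple job that reads a hypothetical CUSTOMER table, transforms it, and writes a customer dimension might be named as follows (all names illustrative):

Src_Customer  --Ln_In_1-->  Tfm_CustomerDim  --Ln_Out_1-->  Tgt_Customer_Dim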

Transformer Stage Naming Conventions

Entity Convention
Loop Variable Lv_<<objectName>>
Stage Variable Sv_<<objectName>>
Transformer (native parallel) Tfm_<<objectName>>

File Stage Naming Conventions

Entity Convention
Sequential File SF_<<objectName>>
Complex Flat File CFF_<<objectName>>
File Set FS_<<objectName>>
Parallel Dataset DS_<<objectName>>
Lookup File Set LFS_<<objectName>>
External Source XSrc_<<objectName>>
External Target XTgt_<<objectName>>

XML & Real-Time Stages Naming Conventions

Entity Convention
RTI Input RTIi_<<objectName>>
RTI Output RTIo_<<objectName>>
XML Input XMLi_<<objectName>>
XML Output XMLo_<<objectName>>
XML Transformer XMLt_<<objectName>>

Sequence Object Naming Conventions

Entity Convention
Sequence <<Application>>_SEQ_<<Application job Stream Name>>
Master Control Sequence (parent) <<Application>>_SEQ_Master_<<Application job Stream Name>>
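
For example, for a hypothetical "Sales" application with a daily customer job stream, the sequence would be named Sales_SEQ_DailyCustomerLoad and its parent master control sequence Sales_SEQ_Master_DailyCustomerLoad (names illustrative).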

 

Sequence Stage Naming Conventions

Entity Convention
Error Handler EH_<<PrimaryFunction>>
Job Activity Job_<<PrimaryFunction>>
Nested Condition NC_<<PrimaryFunction>>
Routine Activity Rtn_<<PrimaryFunction>>
Sequencer (All) SeqAll_<<Identifier>>
Sequencer (Any) SeqAny_<<Identifier>>
Notify Notify_<<PrimaryFunction>>
Terminator Activity TA_<<PrimaryFunction>>
Sequence Links (Non-messages) Lnk_<<Number or Description>>
Sequence Links (messages) Msg_<<Number or Description>>


IBM InfoSphere DataStage Migration Checklist


Assuming that your InfoSphere instance has been installed and configured, here is a quick migration checklist to assist in making sure that you have performed the essential tasks.

 

Track the completion status of each task as you work through the list.

Major Task: Create Migration Package
  • Create database scripts
  • Export DataStage components
  • Gather support files
  • Compress migration package
  • Baseline migration package in CM tool
  • Upload package to target environment

Major Task: Deploy Database Components
  • Backup target databases
  • Deploy database components
  • Resolve script errors
  • Create JDBC, ODBC, and/or TNSNAMES entries
  • Install and configure the RDBMS client on the InfoSphere server
  • Load configuration and conversion data (if not loaded by ETL)

Major Task: Deploy Support Files
  • Create file structures
  • WSDLs
  • Certificates
  • Surrogate key files
  • System administration scripts
  • Job scripts
  • Node configuration files

Major Task: Deploy DataStage Components
  • Create project (if required)
  • Configure project and/or project parameters (if required)
  • Import ETLs into DataStage
  • Update parameters and parameter sets (if required): file paths, database names, and database credentials
  • Update job properties: file paths
  • Compile ETLs using Multiple Job Compile
  • Resolve compilation errors
  • Smoke test
  • Finalize CM baseline