Netezza / PureData – How to add a Foreign Key


Adding a foreign key to tables in Netezza / PureData is a best practice, especially when working with dimensionally modeled data warehouse structures and with modern governance, integration (including virtualization), and presentation semantics (including reporting, business intelligence, and analytics).

Foreign Key (FK) Guidelines

  • A primary key must be defined on the table and field (or fields) to which you intend to link the foreign key
  • Avoid using distribution keys as foreign keys
  • Foreign Key field should not be nullable
  • Your foreign key link field(s) must be of the same format(s) (e.g. integer to integer, etc.)
  • Apply standard naming conventions to constraint name:
    • FK_<<Constraint_Name>>_<<Number>>
    • <<Constraint_Name>>_FK<<Number>>
  • Please note that foreign key constraints are not enforced in Netezza

Steps to add a Foreign Key

The process for adding foreign keys involves just a few steps:

  • Verify guidelines above
  • Alter table add constraint SQL command
  • Generate statistics, which is optional, but strongly recommended (an example follows the foreign key example below)

Basic Foreign Key SQL Command Structure

Here is the basic syntax for adding a foreign key:

ALTER TABLE <<Owner>>.<<NAME_OF_TABLE_BEING_ALTERED>>

ADD CONSTRAINT <<Constraint_Name>>_fk<<Number>>

FOREIGN KEY (<<Field_Name or Field_Name List>>) REFERENCES <<Owner>>.<<Target_FK_Table_Name>> (<<Field_Name or Field_Name List>>) <<ON UPDATE | ON DELETE>> <<Action>>;

Example Foreign Key SQL Command

This is a simple, one-field example of a foreign key (FK):

 

ALTER TABLE Blog.job_stage_fact

ADD CONSTRAINT job_stage_fact_host_dim_fk1

FOREIGN KEY (hostid) REFERENCES Blog.host_dim(hostid) ON DELETE cascade ON UPDATE no action;
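
For the optional statistics step noted in the list above, a minimal sketch follows, reusing the table from the example; GENERATE STATISTICS refreshes the information the optimizer uses to plan queries:

GENERATE STATISTICS ON Blog.job_stage_fact;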

Related References

Alter Table

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL command reference, Alter Table, constraints

 

 


Database – What is a foreign key?


Definition of a Foreign Key

  • A foreign key (FK) is a constraint that references the unique primary key (PK) of another table.

Facts About Foreign Keys

  • Foreign keys act as a cross-reference between tables, linking the foreign key (child record) to the primary key (parent record) of another table, establishing a link/relationship between the table keys
  • Foreign keys are not enforced by all RDBMS
  • The concept of referential integrity is derived from foreign key theory
  • Because Foreign keys involve more than one table relationship, their implementation can be more complex than primary keys
  • In some database systems, a foreign-key constraint implicitly defines an index on the foreign-key column(s) in the child table; however, manually defining a matching index may improve join performance in some databases
  • SQL normally provides the following referential integrity actions for deletions, when enforcing foreign keys:

Cascade

  • The deletion of a parent (primary key) record may cause the deletion of corresponding foreign-key records.

No Action

  • Forbids the deletion of a parent (primary key) record, if there are dependent foreign-key records.  No Action does not mean the foreign-key constraint is suppressed.

Set null

  • The deletion of a parent (primary key) record causes the corresponding foreign-key to be set to null.

Set default

  • The deletion of a parent (primary key) record causes the corresponding foreign-keys to be set to a default value instead of null
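
As an illustration of these actions, here is a minimal, hypothetical sketch; the table and column names are examples only, and the ON DELETE clause would be swapped for whichever action the design calls for:

CREATE TABLE order_header
(
    order_id INTEGER NOT NULL,
    CONSTRAINT order_header_pk PRIMARY KEY (order_id)
);

CREATE TABLE order_line
(
    order_line_id INTEGER NOT NULL,
    order_id      INTEGER NOT NULL,
    CONSTRAINT order_line_order_fk1 FOREIGN KEY (order_id)
        REFERENCES order_header (order_id)
        ON DELETE CASCADE  -- alternatives: NO ACTION, SET NULL, SET DEFAULT
);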


Database – What is a Composite Primary Key?


What is a Composite Primary Key?

A composite primary key is a primary key defined by having multiple fields (columns) in it.  As with a primary key, exactly what a composite primary key is depends on the database.  Essentially, a composite primary key:

  • Is a combination of fields (columns) which uniquely identifies every row.
  • Is an index in database systems which use indexes for optimization
  • Is a type of table constraint
  • Is applied with a data definition language (DDL) alter command (see the example below)
  • May define a parent-child relationship between tables
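
A minimal sketch of the alter command, with hypothetical table and column names:

ALTER TABLE sales_fact ADD CONSTRAINT sales_fact_pk PRIMARY KEY (store_id, product_id, sale_date);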


Database – What is a Primary Key?


What is a Primary Key?

What a primary key is depends, somewhat, on the database.  However, in its simplest form a primary key:

  • Is a field (column) or combination of fields (columns) which uniquely identifies every row.
  • Is an index in database systems which use indexes for optimization
  • Is a type of table constraint
  • Is applied with a data definition language (DDL) alter command (see the example below)
  • Depending on the data model, may define a parent-child relationship between tables
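
A minimal sketch of the alter command, with a hypothetical table and column name:

ALTER TABLE customer_dim ADD CONSTRAINT customer_dim_pk PRIMARY KEY (customer_id);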


Data Modeling – Dimension Table Effective Practices


I’ve had these notes lying around for a while, so I thought I would consolidate them here.  So, here are a few guidelines to ensure the quality of your dimension table structures.

Dimension Table Effective Practices

  • The table naming convention should identify it as a dimension table. For example:
    • Suffix Pattern:
      • <<TableName>>_Dim
      • <<TableName>>_D
    • Prefix Pattern:
      • Dim_<<TableName>>
      • D_<<TableName>>
  • Have a Primary Key (PK) assigned on the table surrogate key (see the sketch after this list)
  • Audit fields – Type 1 dimensions should:
    • Have a Created Date timestamp – When the record was initially created
    • Have a Last Update timestamp – When the record was last updated
  • Job Flow: Do not place the dimension processing in the fact jobs; keep dimension processing outside of the fact jobs.
  • Every dimension should have a Zero (0), ‘Unknown’, row
  • Fields should be ‘NOT NULL’, replacing nulls with a zero (0) for numeric and integer type fields or a space (‘ ‘) for character type fields.
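
A minimal sketch of a Type 1 dimension reflecting these practices; the table, field names, and lengths are hypothetical and would be adapted to your model:

CREATE TABLE product_dim
(
    product_srky     INTEGER      NOT NULL,  -- surrogate key (PK)
    product_cd       VARCHAR(50)  NOT NULL,  -- business/natural key
    product_nm       VARCHAR(100) NOT NULL,
    created_dttm     TIMESTAMP    NOT NULL,  -- when the record was initially created
    last_update_dttm TIMESTAMP    NOT NULL   -- when the record was last updated
);

ALTER TABLE product_dim ADD CONSTRAINT product_dim_pk PRIMARY KEY (product_srky);

-- Zero (0), 'Unknown', row
INSERT INTO product_dim VALUES (0, 'UNKNOWN', 'Unknown', CURRENT_TIMESTAMP, CURRENT_TIMESTAMP);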


Infosphere DataStage – Designer Client Repository Structure

Default Repository Structure

When a project is created, there is a default repository structure created for use in the DataStage designer client.

Default DataStage Repository Structure

However, some additional organization will be required for most DataStage projects.  Usually, this organization occurs in these areas:

  • Addition of structure within the “Jobs” folder
  • Addition of a “Parameter Sets” folder
  • Addition of structure within the “Table Definitions” folder
  • Addition of a “Developer Work Area” folder

Repository Structure within the “Jobs” folder

Below is a sample of a folder structure for multiple applications that share a common repository.  The pattern includes, but does not illustrate, all the other delivered folders. In addition to the core folder structure, developers can create individual working, test, and in-progress folders, which do not migrate, but keep work segregated.

Jobs Folder Pattern Datastage Repository Structure

Parameter Sets Folders

The parameter set folders are for two sets of information:

  • First are the database parameters, which include data connections and the attached parameter sets.
  • The second is for job parameters, which may include parameter sets for things like e-mail parameters, surrogate key file paths, etc.; using parameter sets for these is a best practice, rather than creating them as project-level parameters.

Parameter Sets Folder Pattern Datastage Repository Structure

Table Definitions

The Table Definitions folder has folders added to segregate the imported metadata for source and target systems and, in some cases, may need folders to logically organize imported metadata which may reside within the same database and/or schema, but belong to different logical layers.

Table Definitions Folder Pattern DataStage Repository Structure

Infosphere Datastage – Data type conversions from Netezza to DataStage


 

Some recent research to eliminate some ETL data conversion issues made me want to make an enhancement to the documentation provided by IBM, to prevent repeating the research.

Netezza data types and their equivalent InfoSphere DataStage data types

Netezza data types | InfoSphere DataStage data types (SQL types) | Notes

BYTEINT | TinyInt |
SMALLINT | SmallInt |
INT | Integer |
BIGINT | BigInt |
NUMERIC(p, s) | Numeric, Decimal, Double | Decimal and Double are aliases of Numeric
FLOAT(p) | Float |
REAL | Real |
DOUBLE PRECISION | Double |
CHAR(n) | Char |
VARCHAR(n) | VarChar |
NCHAR(n) | NChar |
NVARCHAR(n) | NVarChar |
BOOLEAN | Bit |
DATE | Date |
TIME | Time |
TIME WITH TIME ZONE | VarChar | The value of time with time zone will be returned without the time zone information.
TIMESTAMP | TimeStamp |
INTERVAL | VarChar | You cannot load the interval data type from an external table.
ROWID | BigInt |
TRANSACTION ID | BigInt |
DATASLICE | Integer |

Related References

Data type conversions from Netezza to DataStage

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from Netezza to DataStage

Netezza / PureData – What is the maximum length of CHAR, VARCHAR, and NVARCHAR fields


 

 

During a recent project, I had reason to investigate the maximum character field lengths in Netezza, to be able to fit the data ETL’ed from the source.  The constraints on Netezza character fields are not as long as in some other databases; therefore, I thought I would document them in a quick post for future reference.   Knowing these limits will help with your ETL design and construction, so you know where to truncate your field input data.

 

Data type | Alias names | InfoSphere DataStage data types (SQL type) | Maximum Length | Notes
CHAR | CHARACTER, CHAR(n), CHARACTER(n) | Char | 64,000 | Fixed-length character string, blank padded to length n. If you do not specify n, the default is an unsized CHAR value.
VARCHAR | CHARACTER VARYING, VARCHAR(n), CHARACTER VARYING(n), CHAR VARYING(n) | VarChar, LongVarChar | 64,000 | Variable-length character string to a maximum length of n. If you do not specify n, the default is an unsized VARCHAR value. There is no blank padding, and the value is stored as entered.
NCHAR | NATIONAL CHARACTER, NATIONAL CHAR(n), NCHAR(size) | NChar, Char | 16,000 | Fixed-length character string, blank padded to length n. If you do not specify n, the default is an unsized NCHAR value.
NVARCHAR | NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, NATIONAL CHAR VARYING(n), NATIONAL CHARACTER VARYING(n), NVARCHAR(n) | NVarChar, VarChar, LongVarChar, LongNVarChar | 16,000 | Variable-length character string to a maximum length of n. If you do not specify n, the default is an unsized NVARCHAR value.

Related References

Data types and aliases

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza stored procedures, NZPLSQL statements and grammar, Variables and constants, Data types and aliases

Data type conversions from DataStage to Netezza

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from DataStage to Netezza

Netezza / PureData – How to add comments on a field


The ‘Comment On Column’ statement provides the same self-documentation capability as ‘Comment On Table’, but drives the capability down to the column (field) level.  This provides an opportunity to describe the purpose, business meaning, and/or source of a field to other developers and users.  The comment code is part of the DDL and can be migrated with the table structure DDL.  The statements can be run independently, or, working with Aginity for PureData System for Analytics, they can be run as a group with the table DDL, using the ‘Execute as a Single Batch’ (Ctrl+F5) command.

Basic ‘COMMENT ON field’ Syntax

  • The basic syntax to add a comment to a column is:

COMMENT ON COLUMN <<Schema.TableName.ColumnName>> IS '<<Descriptive Comment>>';

Example ‘COMMENT ON Field’ Syntax

  • This is example syntax, which would need to be changed and applied to each column (field):

COMMENT ON COLUMN time_dim.time_srky IS 'time_srky is the primary key and is a surrogate key derived from the date business/natural key';

 


Netezza / PureData – How to add a Primary Key


While primary keys (PK) are not enforced within Netezza, they still provide significant value and should be added.  Among the values that adding a primary key provides are:

  • Informs tools which have metadata import capabilities; for example, Aginity, ETL tools, data modeling tools, InfoSphere Data Architect, DataStage and QualityStage, and the InfoSphere Information Server suite of tools (e.g. Governance Console, Information Analyzer, etc.).
  • Visually helps developers, data modelers, and users to know what the primary keys of the table are, which may not be obvious from the table structure. This is especially true for tables utilizing compound keys as the primary key.
  • The query optimizer will use these definitions to define efficient query execution plans
  • Identifying primary keys provides migratable, self-documenting metadata
  • Aids in facilitating future data and application enrichment projects

 

Basic Primary Key syntax

ALTER TABLE <<schema.tablename>> ADD CONSTRAINT <<ConstraintName>> PRIMARY KEY (FieldNames);

 

Example Primary Key syntax

ALTER TABLE time_dim ADD CONSTRAINT time_dim_pk PRIMARY KEY (time_srky);

 


Netezza / PureData – How to add comments on a table


Using the ‘Comment On’ capabilities of Netezza is an easy way to provide self-documentation within the DDL structure of a table.  This can be very helpful to developers and users, who can see the DDL of the table.  Personally, I think of it as a way to document the purpose and/or business meaning of a table.  This can be especially helpful when differing teams may be working with the tables.  It also reduces the reliance on developer and/or user memory, or on searching the technical documentation, which may or may not be readily available.

Basic ‘COMMENT ON Table’ Syntax

The basic syntax to add a comment to a table is:

COMMENT ON TABLE <<Schema.TableName>> IS '<<Descriptive Comment>>';

 

Example ‘COMMENT ON Table’ Syntax

COMMENT ON TABLE date_dim IS 'Date dimension table, which provides basic calendar date information';


InfoSphere Datastage – How to Improve Sequential File Performance Using Parallel Environment Variables

APT_FILE_EXPORT_BUFFER_SIZE and APT_FILE_IMPORT_BUFFER_SIZE in DataStage Administrator

While extensive use of sequential files is not a best practice, sometimes there is no way around it, due to legacy systems and/or existing processes. However, recently, I have encountered a number of customers who are seeing significant performance issues with sequential-file-intensive processes. Sometimes it is the job design, but often, when you look at the project configuration, they still have the default values. This is a quick and easy thing to check and to adjust to get a quick performance win, if they have not already been adjusted.

These are delivered variables, but they should seriously be considered for adjustment in nearly all DataStage ETL projects. The adjustment must be based on the amount of available memory, the volume of workload that is sequential-file intensive, and the environment you are working in. Some experiential adjustment may be required, but I have provided a few recommendations below.

 

Environment Variable Properties

 

Category | Type | Parameter Name | Prompt | Size | Default Value
Parallel > Operator Specific | String | APT_FILE_EXPORT_BUFFER_SIZE | Sequential write buffer size | Adjustable in 8 KB units.  Recommended values for Dev: 2048; Test & Prod: 4096. | 128
Parallel > Operator Specific | String | APT_FILE_IMPORT_BUFFER_SIZE | Sequential read buffer size | Adjustable in 8 KB units.  Recommended values for Dev: 2048; Test & Prod: 4096. | 128


InfoSphere DataStage – How to calculate age in a transformer


Occasionally, there is a need to calculate the age between two dates for any number of reasons: for example, the age of a person, of an asset, or of an event.  So, having recently had to think about how to do this in a DataStage transformer, rather than in SQL, I thought it might be good to document a couple of approaches which can provide the age.  This code does it at the year level; however, if you need the decimal digits or other handling, then the rounding within the DecimalToDecimal function can be changed accordingly.

Age Calculation using Julian Date

DecimalToDecimal((JulianDayFromDate(<>) - JulianDayFromDate(Lnk_In_Tfm.PROCESSING_DT)) / 365.25, 'trunc_zero')

Age Calculation using Julian Date with Null Handling

If a date can be missing from your source input data, then null handling is recommended to prevent job failure.  This code uses 1901-01-01 as the null replacement value, but it can be any date your business requirement stipulates.

DecimalToDecimal((JulianDayFromDate(NullToValue(<>, StringToDate('1901-01-01', "%yyyy-%mm-%dd"))) - JulianDayFromDate(Lnk_In_Tfm.PROCESSING_DT)) / 365.25, 'trunc_zero')

Calculate Age Using DaysSinceFromDate

DecimalToDecimal(DaysSinceFromDate(<>, <>) / 365.25, 'trunc_zero')

Calculate Age Using DaysSinceFromDate with Null Handling

Here is a second example of null handling being applied to the input data.

DecimalToDecimal(DaysSinceFromDate(<>, NullToValue(<<Input date (e.g. Date of Birth)>>, StringToDate('1901-01-01', "%yyyy-%mm-%dd"))) / 365.25, 'trunc_zero')

 

 

 

 

Surrogate Key File Effective Practices


Here are a few thoughts on effectively working with IBM InfoSphere Information Server DataStage surrogate key files, which may prove useful for developers.

 

Placement

  • The main thing about placement is that it be in a consistent location. Developers and production support teams should not need to guess or look up where it is for every DataStage project. So, it is best to put the surrogate keys in the same base path and give each project its own subfolder, to facilitate migrations and to reduce the possibility of human error. Here is the path structure which is commonly used:

Path

  • /data/SRKY/<<Project Name>>

Parameter Sets

  • As a best practice, the surrogate key file path should be in a parameter set and the parameter used in the jobs, as needed.  This simplifies maintenance, if and when changes to the path are required, and during migrations.

Surrogate Key Parameter Set Screenshot – Example Parameter Tab

Surrogate Key Parameter Set Screenshot – Example Values Tab

Surrogate Key Job Parameter Example Path using Parameter

Permissions

  • DataStage must have permissions to:
    • The entire parent path
    • The project folder, and
    • The surrogate key files themselves.

To ensure that DataStage has access to the path and surrogate key files, ensure:

  • The ‘dsadm’ (owner) and ‘dstage’ (group) have access to folders in the path, with at least a “-rw-r--r--” (644) permissions level. Keeping the permissions to a minimum can, for obvious reasons, prevent inadvertent overwrites of the surrogate key files; thus avoiding some, potentially, serious cleanup.
  • The ‘dsadm’ (owner) and ‘dstage’ (group) have access to the surrogate key files

Surrogate Key File Minimum Permissions

What are the Factors Affecting the Selection of a Data Warehouse Naming Convention?


The primary factors affecting the choices in the creation of Data Warehouse (DW) naming convention policy standards are the type of implementation, pattern of the implementation, and any preexisting conventions.

Type of implementation

The type of implementation will affect your naming convention choices. Basically, this boils down to, are you working with a Commercial-Off-The-Shelf (COTS) data warehouse or doing a custom build?

Commercial-Off-The-Shelf (COTS)

If it is a Commercial-Off-The-Shelf (COTS) warehouse, which you are modifying and/or enhancing, then it is very strongly recommended that you conform to the naming conventions of the COTS product.  However, you may want to add an identifier to the conventions to identify your custom objects.

Using this information as an exemplar:

  • FAV = Favinger, Inc. (Company Name – Custom Identifier)
  • GlobalSales = Global Sales (Subject)
  • MV = Materialized View (Object Type)

 

Suffix Pattern Naming Convention

<<Custom Identifier>>_<<Object Subject Name>>_<<Object Type>>

Example:  FAV_GlobalSales_MV

Prefix Pattern Naming Convention

<<Object Type>>_<<Custom Identifier>>_<<Object Subject Name>>

Example: MV_FAV_GlobalSales

Custom Data Warehouse Build

If you are creating a custom data warehouse from scratch, then you have more flexibility to choose your naming convention.  However, you will still need to take into account a few factors to achieve the maximum benefit from your naming conventions.

  • What is the high-level pattern of your design?
  • Are there any preexisting naming conventions?

Data Warehouse Patterns

Your naming convention will need to take into account the overall intent and design pattern of the data warehouse; the objects and naming conventions of each pattern will vary, if for no other reason than the differences in the objects, their purposes, and the depth of their relationships.

High level Pattern of the Data Warehouse Implementation

The high-level pattern of your design, whether an Operational Data Store (ODS), Enterprise Data Warehouse (EDW), Data Mart (DM), or something else, will need to guide your naming convention, as the depth of logical and/or processing zones of each pattern will vary and have some industry generally accepted conventions.

Structural Pattern of the Data Warehouse Implementation

The structural pattern of your data warehouse design, whether Snowflake, 3rd Normal Form, or Star Schema, will need to guide your naming convention, as the depth of relationships of each pattern will vary, have some industry generally accepted conventions, and will relate directly to your high-level data warehouse pattern.

Preexisting Conventions

An often omitted factor of data warehouse naming conventions is the source of preexisting conventions, which can have significant impacts, both from an engineering and a political point of view. The sources of these conventions can vary and may or may not be formally documented.

A common source of naming convention conflict is with preexisting implementations, which may not even be documented.  However, the conventions that system objects and consumers are already familiar with will need to be taken into account when assessing impacts to systems, political culture, user training, and the creation of a standard convention for your data warehouse.

The Relational Database Management System (RDBMS) in which you intend to build the data warehouse may have generally accepted conventions, with which consumers may be familiar and have preconceived expectations (whether expressed or implied).

Change Management

Whatever data warehouse naming convention you choose, the naming conventions, along with the data warehouse design pattern assumptions, should be well documented and placed in a managed and readily accessible change management (CM) repository.

 

 

PureData – Table Effective Practices


Here are a few tips which can make a significant difference to the efficiency and effectiveness of developers and users, by making information available to them when developing and creating analytic objects.  This information can, also, be very helpful to data modelers.  While some of these recommendations are not enforced by Netezza/PureData, this fact makes them no less helpful to your community.

Alter table to Identify Primary Keys (PK)

  • Visually helps developers and users to know what the primary keys of the table are
  • Primary key information can, also, be imported as metadata by other IBM tools (e.g. InfoSphere DataStage, Data Architect, Governance Catalog, Aginity, etc.)
  • The query optimizer will use these definitions to define efficient query execution plans

Alter table to Identify Foreign Keys (FK)

  • Illustrates table relationships for developers and users
  • Foreign key information can, also, be imported as metadata by other IBM tools (e.g. InfoSphere DataStage, Data Architect, Governance Catalog, Aginity, etc.)
  • The query optimizer will use these definitions to define efficient query execution plans

Limit Distribution Key to Non-Updatable Fields

  • This one seems obvious, but this problem occurs regularly if tables and optimizations are not properly planned; an error will be generated if an update is attempted against a field contained in the distribution key of a table (see the sketch below).
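
A minimal sketch of the idea, with hypothetical table and column names; the table is distributed on a stable surrogate key rather than on an updatable attribute:

CREATE TABLE order_fact
(
    order_srky   BIGINT        NOT NULL,  -- stable surrogate key; never updated
    order_status VARCHAR(20)   NOT NULL,  -- updatable attribute; not in the distribution key
    order_amt    NUMERIC(18,2)
)
DISTRIBUTE ON (order_srky);

-- UPDATE order_fact SET order_status = 'CLOSED' WHERE order_srky = 1;  -- allowed
-- An UPDATE that changed order_srky would fail, because it is the distribution key.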

Use ‘Not Null’ on Fields

  • Using ‘Not Null’ whenever the field data and ETL transformation rules can enforce it helps improve performance by reducing the number of null condition checks performed, and reduces storage.

Use Consistent Field Properties

  • Using the same data type and field length in all tables with the same field name reduces the amount of interpretation/conversion required by the system, developers, and report SQL.

Schedule Table Optimizations

  • Work with your DBAs to determine the best scheduling time, system user, and priority of groom and generate statistics operations. Keep in mind when the optimizations occur in relation to when users need to consume the data. All too often, this operation is not performed before users need the performance, and/or is driven by DBA choice without proper consideration of other processing performance needs.  This has proven especially true in data warehousing, when the DBA does not have data warehousing experience and/or does not understand the load patterns of the ETL/ELT process.


The Value of Data Indicator Flags


I cannot count the times when using a flag (also called an indicator) was described as a nice-to-have in database table design, at least until the code ran into complexity and/or performance challenges.

When designing your data models, ETLs, and reports, it is useful to consider how indicator flags can help.  Indicator flags are normally binary in nature, such as True/False or Yes/No, but they don’t always need to be binary.

How indicator flags can help your processes and reporting:

  • Provide an equijoin for complex business rules, which can otherwise result in ‘Not In List’, ‘In List’, ‘Not Exists’, ‘Exists’, ‘Not Equal To’, and sub-select SQL statements (see the sketch after this list)
  • Provide a processing marker to prevent lookups to other tables to determine an attribute. For example, a snapshot type (daily, weekly, monthly) flag, which can be used to apply data retention rules.
  • Provide a marker for special circumstances. For example, a Legal Hold flag to mark records to be exempted from removal and/or change to meet legal proceeding requirements.
  • The judicious planning and use of flags can reduce the number of full table scans required against large tables.
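
As a simple, hypothetical sketch of the equijoin point above (table, column, and flag names are examples only), a snapshot-type flag lets a retention rule be expressed as a plain equality predicate instead of a sub-select:

-- Without a flag: a sub-select against a calendar table to find month-end snapshots
SELECT f.*
FROM account_snapshot_fact f
WHERE f.snapshot_dt IN (SELECT c.calendar_dt FROM calendar_dim c WHERE c.month_end_ind = 'Y');

-- With a flag carried on the fact table: a simple equality predicate
SELECT f.*
FROM account_snapshot_fact f
WHERE f.monthly_snapshot_flag = 'Y';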


DB2 to Netezza data type conversions


 

This is an extract table I created from the IBM source a while back, when investigating what formats to convert data fields into for IBM InfoSphere DataStage.  I have had it floating around in my notes, but lately I have found myself referencing it to help other team members, so it seems useful to include it here.   The notes column is just a few snippets of information which I have found useful to reference when planning data field conversions.

 

Transformer data type | DB2 data type | Netezza data type | Notes
BIGINT | BIGINT | BIGINT |
BINARY | CHARACTER | Not supported |
BIT | SMALLINT | BOOLEAN |
CHAR | CHARACTER | CHAR |
DATE | DATE | DATE |
DECIMAL | DECIMAL | NUMERIC or DECIMAL |
DOUBLE | DOUBLE | DOUBLE PRECISION or FLOAT(15) |
FLOAT | DOUBLE | FLOAT |
INTEGER | INTEGER | INTEGER |
LONGVARBINARY | BLOB | Not supported |
LONGVARCHAR | CLOB | VARCHAR | The maximum character string size is 64,000.
NUMERIC | DECIMAL | NUMERIC |
REAL | REAL | REAL or FLOAT(6) |
SMALLINT | SMALLINT | SMALLINT |
TIME | TIME | TIME | DB2 9.1 TIME data type does not support fractional digits or microseconds.
TIMESTAMP | TIMESTAMP | TIMESTAMP | DB2 9.1 TIME data type does not support fractional digits or microseconds.
TINYINT | SMALLINT | BYTEINT |
VARBINARY | VARCHAR | Not supported |
VARCHAR | VARCHAR | VARCHAR | The maximum character string size is 64,000.
WCHAR | GRAPHIC | Not supported |
WLONGVARCHAR | LONG VARGRAPHIC | Not supported |
WVARCHAR | VARGRAPHIC | Not supported |

 

Related References

InfoSphere Information Server InfoSphere Information Server 11.5.0 – Data types

IBM PureData System for Analytics, Version 7.1 – Data types and aliases