Netezza / PureData – How to add comments on a field

Netezza / PureData Table Documentation Practices

The ‘COMMENT ON COLUMN’ statement provides the same self-documentation capability as ‘COMMENT ON TABLE’, but drives the capability down to the column (field) level.  This provides an opportunity to describe the purpose, business meaning, and/or source of a field to other developers and users.  The comment is part of the DDL and can be migrated with the table structure DDL.  The statement can be run independently, or, when working in Aginity for PureData System for Analytics, the statements can be run as a group with the table DDL using the ‘Execute as a Single Batch’ (Ctrl+F5) command.

Basic ‘COMMENT ON field’ Syntax

  • The basic syntax to add a comment to a column is:

COMMENT ON COLUMN <<Schema.TableName.ColumnName>> IS '<<Descriptive Comment>>';

Example ‘COMMENT ON Field’ Syntax

  • This example syntax would need to be adjusted and applied to each column:

COMMENT ON COLUMN time_dim.time_srky IS 'time_srky is the primary key and is a surrogate key derived from the date business/natural key';
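
For example, when deploying a table, the column comments can be grouped with the table DDL and run as one script (the single-batch approach mentioned above). This is a minimal sketch reusing the time_dim example; the cal_dt column and the distribution key are purely illustrative:

-- Illustrative table DDL; only time_srky comes from the example above
CREATE TABLE time_dim
(
    time_srky INTEGER NOT NULL,
    cal_dt    DATE
)
DISTRIBUTE ON (time_srky);

-- Comments batched with the DDL so they migrate together
COMMENT ON TABLE time_dim IS 'Time dimension table keyed by a date-derived surrogate key';
COMMENT ON COLUMN time_dim.time_srky IS 'time_srky is the primary key and is a surrogate key derived from the date business/natural key';
COMMENT ON COLUMN time_dim.cal_dt IS 'Calendar date business/natural key';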

What is the URL for InfoSphere Information Analyzer

Information Analyzer Icon

This is one of those things that usually comes up during new installs.  There are two approaches to accessing the InfoSphere Information Analyzer thin client: the first is via the InfoSphere launchpad page, and the second is to go directly to the Information Analyzer login page. Both URLs are provided below.

InfoSphere LaunchPad URL

https://<<Host_Server>>:9443/ibm/iis/launchpad/

Information Analyzer URL 

https://<<Host_Server>>:9443/ibm/iis/dq/da/login.jsp

InfoSphere Datastage – How to Improve Sequential File Performance Using Parallel Environment Variables

APT_FILE_EXPORT_BUFFER_SIZE and APT_FILE_IMPORT_BUFFER_SIZE in DataStage Administrator

While extensive use of sequential files is not a best practice, sometimes there is no way around it, due to legacy systems and/or existing processes. Recently, however, I have encountered a number of customers who are seeing significant performance issues with sequential-file-intensive processes. Sometimes it is the job design, but often, when you look at the project configuration, the default values are still in place. This is a quick and easy thing to check and adjust for a quick performance win, if it has not already been done.

These are delivered variables, but they should seriously be considered for adjustment in nearly all DataStage ETL projects. The adjustment should be based on the amount of available memory, the volume of workload that is sequential-file intensive, and the environment you are working in. Some experiential adjustment may be required, but I have provided a few recommendations below.

Environment Variable Properties

APT_FILE_EXPORT_BUFFER_SIZE
  • Category Name: Parallel > Operator Specific
  • Type: String
  • Prompt: Sequential write buffer size
  • Size: Adjustable in 8 KB units; recommended values for Dev: 2048; Test & Prod: 4096
  • Default Value: 128

APT_FILE_IMPORT_BUFFER_SIZE
  • Category Name: Parallel > Operator Specific
  • Type: String
  • Prompt: Sequential read buffer size
  • Size: Adjustable in 8 KB units; recommended values for Dev: 2048; Test & Prod: 4096
  • Default Value: 128
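
As a quick sketch, assuming a Test or Production project, the adjusted values would look like the following name/value pairs once set at the project level in DataStage Administrator:

APT_FILE_EXPORT_BUFFER_SIZE=4096
APT_FILE_IMPORT_BUFFER_SIZE=4096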

InfoSphere DataStage – How to calculate age in a transformer

Occasionally, there is a need to calculate the time between two dates for any number of reasons: for example, the age of a person, of an asset, or of an event.  So, having recently had to think about how to do this in a DataStage transformer, rather than in SQL, I thought it might be good to document a couple of approaches which can provide the age.  This code does it at the year level; however, if you need decimal digits or other handling, the rounding within the DecimalToDecimal function can be changed accordingly.

Age Calculation using Julian Date

DecimalToDecimal((JulianDayFromDate(Lnk_In_Tfm.PROCESSING_DT) - JulianDayFromDate(<<Input Date (e.g. Date of Birth)>>)) / 365.25, 'trunc_zero')
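
For illustration, here is the same expression with the placeholder resolved to a hypothetical input column, Lnk_In_Tfm.BIRTH_DT:

DecimalToDecimal((JulianDayFromDate(Lnk_In_Tfm.PROCESSING_DT) - JulianDayFromDate(Lnk_In_Tfm.BIRTH_DT)) / 365.25, 'trunc_zero')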

Age Calculation using Julian Date with Null Handling

If a date can be missing from your source input data, then null handling is recommended to prevent job failure.  This code uses 1901-01-01 as the null replacement value, but it can be any date your business requirement stipulates.

DecimalToDecimal((JulianDayFromDate(Lnk_In_Tfm.PROCESSING_DT) - JulianDayFromDate(NullToValue(<<Input Date (e.g. Date of Birth)>>, StringToDate('1901-01-01', "%yyyy-%mm-%dd")))) / 365.25, 'trunc_zero')

Calculate Age Using DaysSinceFromDate

DecimalToDecimal(DaysSinceFromDate(<<Processing/As-of Date>>, <<Input Date (e.g. Date of Birth)>>) / 365.25, 'trunc_zero')

Calculate Age Using DaysSinceFromDate with Null Handling

Here is a second example of null handling being applied to the input data.

DecimalToDecimal(DaysSinceFromDate(<<Processing/As-of Date>>, NullToValue(<<Input Date (e.g. Date of Birth)>>, StringToDate('1901-01-01', "%yyyy-%mm-%dd"))) / 365.25, 'trunc_zero')
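
For illustration, here is the DaysSinceFromDate version with both placeholders resolved, again using the hypothetical Lnk_In_Tfm.BIRTH_DT column:

DecimalToDecimal(DaysSinceFromDate(Lnk_In_Tfm.PROCESSING_DT, NullToValue(Lnk_In_Tfm.BIRTH_DT, StringToDate('1901-01-01', "%yyyy-%mm-%dd"))) / 365.25, 'trunc_zero')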

Netezza Connector Stage, Table name required warning for User-defined SQL Write mode

Recently, while working at a customer site, I encountered an anomaly in the Netezza Connector stage: when choosing the ‘User-defined SQL’ write mode, the ‘Table name’ property displays a caution/warning even though a table name should not be required.  If you are using a user-defined SQL statement and/or have parameterized your SQL scripts to make the job reusable, each SQL statement and/or SQL script would have its own schema and table name being passed in.  After some investigation, a workaround was found, which allows you both to populate the table name and to leverage different schema and table names within your SQL statement and/or SQL script.

Table Name, User-defined SQL, Warning

You will notice in the screenshot below that the ‘User-defined SQL’ write mode has been chosen, a parameter has been placed in the ‘User-defined SQL’ property, and the ‘Read user-defined SQL from a file’ property has been set to ‘Yes’.  However, the yellow warning triangle displays on the ‘Table name’ property, marking it as a required item.  This also occurs when placing SQL statements directly in the ‘User-defined SQL’ property, whether reading from a file or not.

Netezza Connector User-Defined SQL, Table Name Required Warning

Table Name, User-defined SQL, Warning Workaround

After some experimentation, the workaround is straightforward enough.  Basically, give the ‘Table name’ property something it can read successfully, so the stage can move on to the user-defined SQL statement and/or SQL script file, which the process actually needs to execute. In the screenshot below, the SYSTEM.DEFINITION_SCHEMA._V_DUAL view was used; since it could be found, the script file passed in by the parameter runs fine.  Another view or table to which the DataStage user has access should work just as well.

Netezza Connector, User-Defined SQL, Table Name Required Fix
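
For illustration, a parameterized user-defined SQL statement of the kind described above might look like the following, where pSchema, pTargetTable, and pSourceTable are hypothetical job parameters resolved at run time, while the ‘Table name’ property itself simply points at SYSTEM.DEFINITION_SCHEMA._V_DUAL:

-- Hypothetical parameterized statement; #...# marks DataStage job parameters
INSERT INTO #pSchema#.#pTargetTable#
SELECT *
FROM #pSchema#.#pSourceTable#;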

Surrogate Key File Effective Practices

Surrogate Key File Effective Practices

Here are a few thoughts on effectively working with IBM InfoSphere Information Server DataStage surrogate key files, which may prove useful for developers.

Placement

  • The main thing about placement is that it be in a consistent location. Developers and production support teams should not need to guess or look up where it is for every DataStage project. So, it is best to put the surrogate key files in the same base path, with each project having its own subfolder, to facilitate migrations and to reduce the possibility of human error. Here is the path structure, which is commonly used:

Path

  • /data/SRKY/<<Project Name>>

Parameter Sets

  • As a best practice, the surrogate key file path should be in a parameter set and the parameter used in the jobs, as needed.  This simplifies maintenance, if and when changes to the path are required, and during migrations.
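
As a sketch, assuming a parameter set named psSrky with a parameter SrkyFilePath (both names hypothetical) whose value is the base path above, the surrogate key file path in a job property could then be entered as follows (the file name and extension are illustrative):

#psSrky.SrkyFilePath#/CUSTOMER_DIM.sky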

Surrogate Key Parameter Set Screenshot – Example Parameter Tab

Surrogate Key Parameter Set Screenshot

Surrogate Key Parameter Set Screenshot – Example Values Tab

Surrogate Key Parameter Set Screenshot – Example Values Tab

Surrogate Key Job Parameter Example Path using Parameter

Surrogate Key Job Parameter Example Path using Parameter

Permissions

  • DataStage must have permissions to:
    • The entire parent path
    • The project folder, and
    • The surrogate key files themselves.

To ensure that DataStage has access to the path and surrogate key files, verify that:

  • The ‘dsadm’ (owner) and ‘dstage’ (group) have access to the folders in the path; note that directories need the execute bit set (for example, 755) to be traversable.
  • The ‘dsadm’ (owner) and ‘dstage’ (group) have at least a “-rw-r--r--” (644) permissions level on the surrogate key files themselves. Keeping the permissions to a minimum can, for obvious reasons, prevent inadvertent overwrites of the surrogate key files; thus avoiding some potentially serious cleanup.
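
As a minimal sketch, assuming the path convention above and a hypothetical project folder named MyProject, the ownership and permissions could be set as follows:

chown -R dsadm:dstage /data/SRKY/MyProject     # owner dsadm, group dstage
chmod 755 /data/SRKY /data/SRKY/MyProject      # directories need the execute bit to be traversable
chmod 644 /data/SRKY/MyProject/*               # minimum level for the surrogate key files themselves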

Surrogate Key File Minimum Permissions

Surrogate Key File Minimum Permissions