DataStage – Netezza Connector Action Column

Over the years, I have occasionally used the action column feature; however, over the last month or so I have found myself using it quite a lot. This is especially true with the Netezza connector, and not just in combination with the Change Capture stage.

The first thing you need to know: to avoid getting the 'no action column found' notice on the target stage, make sure the action column is defined as a single-character field, Char(1). Otherwise, the Netezza connector stage will not recognize your field as an action column.

While most developers use the action column feature with the Change Capture stage, it can also be very useful when you derive a field from one or more inputs that indicates what action each row requires. I have found this approach to be useful and efficient under the right circumstances.

Example Pattern for Action Column Using Multiple Source Selects

Action column configuration example

Action Column Field Type

Change Code Values Mapping to Action Column

  • Here's a quick reference table showing how each change type code translates to the one-character action column value.

Change Type | Change Type Code | Action Column Value
Copy (Data Without Changes) | 0 | No value for this change type
Insert | 1 | I
Delete | 2 | D
Update | 3 | U

Example Transformer Stage Derivation

  • Here is a quick Transformer stage derivation example that takes advantage of the action column capabilities. If you haven't already removed the copy rows, you may also want to add a constraint.
  • The combination I most frequently find myself using is insert and update.

If Lnk_Out_To_Tfm.change_code = 1 Then 'I'
Else If Lnk_Out_To_Tfm.change_code = 2 Then 'D'
Else If Lnk_Out_To_Tfm.change_code = 3 Then 'U'
Else ''

The final Else is required by the Transformer's If...Then...Else syntax; the copy rows that reach it are the ones the constraint should remove.
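To make the mapping concrete outside of DataStage, here is a minimal Python sketch of the same logic: it translates change codes into the one-character action value and drops copy rows, as the optional constraint would. The row dictionaries and the change_code key are assumptions for illustration; inside a job, this logic lives in the Transformer derivation and constraint.

```python
from typing import Iterable, Iterator, Optional

# Change Capture change codes mapped to the one-character (Char(1))
# action column values; copy rows (code 0) have no action value.
ACTION_BY_CHANGE_CODE = {1: "I", 2: "D", 3: "U"}


def action_for(change_code: int) -> Optional[str]:
    """Return the action column value, or None for copy rows."""
    return ACTION_BY_CHANGE_CODE.get(change_code)


def with_action_column(rows: Iterable[dict]) -> Iterator[dict]:
    """Add an 'action' field to each row and drop copy rows.

    Dropping copy rows mirrors the optional Transformer constraint.
    Unknown codes are treated like copy rows (an assumption of this sketch).
    """
    for row in rows:
        action = action_for(row["change_code"])
        if action is None:
            continue  # copy row: nothing to apply on the target table
        yield {**row, "action": action}


if __name__ == "__main__":
    sample = [
        {"id": 10, "change_code": 0},
        {"id": 11, "change_code": 1},
        {"id": 12, "change_code": 3},
    ]
    for out in with_action_column(sample):
        print(out["id"], out["action"])  # prints "11 I", then "12 U"
```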

Related References

Home > InfoSphere Information Server 11.7.0 > InfoSphere DataStage and QualityStage > Developing parallel jobs > Introduction to InfoSphere DataStage Balanced Optimization > Job design considerations  > Specific considerations for the Netezza connector

DataStage – How to use single quoted parameter list in an Oracle Connector

Data Integration

While working with a client's DataStage 9.1 installation, I ran into a situation where they wanted to parameterize the lists in SQL WHERE clauses in an Oracle Connector stage, which honestly was not very straightforward to figure out. First, if the APT_OSL_PARAM_ESC_SQUOTE environment variable is not set and single quotes are used in the parameter, the job creates unquoted, invalid SQL when the parameter is populated. Second, I found much of the available information confusing and/or incomplete. After some research and some trial and error, here is how I resolved the issue. I'll endeavor to be concise, but complete, in my explanation.

When this variable applies

These are the circumstances where I know this process applies. There may be others where it is applicable, but I am listing only the ones with which I have recent experience.

InfoSphere Information Server DataStage

  • Versions 9.1, 11.3, and 11.5

Oracle RDBMS

  • Versions 11g and 12c

Configuration process

Here is a brief explanation of the steps I used to implement the WHERE clause as a parameter. Please note that in this example I am using a job parameter to populate only a portion of the WHERE clause; you can certainly pass the entire WHERE clause as a parameter, if it is not too long.

Configure Project Variable in Administrator

  • Add APT_OSL_PARAM_ESC_SQUOTE to the project in Administrator
  • Populate the APT_OSL_PARAM_ESC_SQUOTE variable

APT_OSL_PARAM_ESC_SQUOTE Project Variable

Create job parameter

Create the job parameter in the job, following your project's naming convention, or standard practice if your customer and/or project does not have established naming conventions. See the jp_ItemSource parameter in the image below.

Job Parameter In Oracle Connector

Add the job parameter to the custom SQL in the Oracle Connector select stage

Once the job parameter has been created, add it to the SQL statement in the Oracle Connector stage.

Job Parameter In SQL
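To see why the quoting matters, here is a small Python simulation of substituting a job parameter into custom SQL. DataStage references job parameters in SQL as #ParameterName#; the table ITEMS, the column ITEM_SOURCE, and the parameter value are hypothetical, and the strip_quotes flag only models the unquoted, invalid SQL that results when APT_OSL_PARAM_ESC_SQUOTE is not set. It is an illustration, not DataStage's actual parser.

```python
import re

# Matches DataStage-style parameter references such as #jp_ItemSource#.
# Dots are allowed so parameter-set references (#ps.name#) also match.
PARAM_TOKEN = re.compile(r"#([\w.]+)#")


def substitute_params(sql: str, params: dict, strip_quotes: bool = False) -> str:
    """Replace #name# tokens in sql with the matching parameter values.

    strip_quotes=True models the failure mode: single quotes in the
    parameter value are lost, leaving unquoted, invalid SQL.
    """
    def replace(match: re.Match) -> str:
        value = params[match.group(1)]
        return value.replace("'", "") if strip_quotes else value

    return PARAM_TOKEN.sub(replace, sql)


if __name__ == "__main__":
    # Hypothetical table, column, and parameter value.
    sql = "SELECT * FROM ITEMS WHERE ITEM_SOURCE IN (#jp_ItemSource#)"
    params = {"jp_ItemSource": "'WEB','STORE'"}
    print(substitute_params(sql, params))
    # SELECT * FROM ITEMS WHERE ITEM_SOURCE IN ('WEB','STORE')
    print(substitute_params(sql, params, strip_quotes=True))
    # SELECT * FROM ITEMS WHERE ITEM_SOURCE IN (WEB,STORE)
```

The second output is what the database would reject: WEB and STORE are now parsed as identifiers rather than string literals.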

Related References

IBM Knowledge Center > InfoSphere Information Server 11.5.0

Connecting to data sources > Databases > Oracle databases > Oracle connector

IBM Support > Limitation of the Parameter APT_OSL_PARAM_ESC_SQUOTE on Plugins on Parallel Canvas

IBM Knowledge Center > InfoSphere Information Server 11.5.0

InfoSphere DataStage and QualityStage > Reference > Parallel Job Reference > Environment Variables > Miscellaneous > APT_OSL_PARAM_ESC_SQUOTE

How to know if your Oracle Client install is 32 Bit or 64 Bit

Sometimes you just need to know whether your Oracle Client install is 32-bit or 64-bit. But how do you figure that out? Here are two methods you can try.

The first method

Go to the %ORACLE_HOME%\inventory\ContentsXML folder and open the comps.xml file.
Look for <DEP_LIST>, roughly on the second screen of the file.

If you see PLAT="NT_AMD64", your Oracle Home is 64-bit.
If you see PLAT="NT_X86", your Oracle Home is 32-bit.

It is possible to have both the 32-bit and the 64-bit Oracle Homes installed.
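If you need to check several machines, the first method can be scripted. This Python sketch simply searches the text of comps.xml for the PLAT values listed above instead of relying on the file's exact XML structure (a simplifying assumption); the inventory path follows the steps above, and the function names are hypothetical.

```python
import os
import re
from pathlib import Path

# PLAT values named in the first method and the bitness they indicate.
PLATFORMS = {"NT_AMD64": "64-bit", "NT_X86": "32-bit"}
PLAT_ATTR = re.compile(r"""PLAT=["']([^"']+)["']""")


def bitness_from_comps(text: str) -> set:
    """Return the bitness labels for every recognized PLAT value in the text."""
    return {PLATFORMS[p] for p in PLAT_ATTR.findall(text) if p in PLATFORMS}


def check_oracle_home(oracle_home: str) -> set:
    """Read <oracle_home>/inventory/ContentsXML/comps.xml and classify it."""
    comps = Path(oracle_home) / "inventory" / "ContentsXML" / "comps.xml"
    return bitness_from_comps(comps.read_text(encoding="utf-8", errors="replace"))


if __name__ == "__main__":
    home = os.environ.get("ORACLE_HOME")
    if home:
        # An empty set means no recognized PLAT value was found.
        print(check_oracle_home(home) or "no recognized PLAT value")
```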

The second method

This method is a bit faster. 32-bit and 64-bit libraries are kept in different lib directories. Look under the ORACLE_HOME folder: if you see both a "lib" and a "lib32" folder, you have a 64-bit Oracle Client. If you see only the "lib" folder, you have a 32-bit Oracle Client.
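The lib/lib32 check is just as easy to script. This Python sketch applies exactly the rule of thumb above to a directory path; the function name is hypothetical, and it returns "unknown" rather than guessing when no lib folder exists.

```python
from pathlib import Path


def client_bitness(oracle_home: str) -> str:
    """Apply the lib/lib32 rule of thumb to an Oracle home directory."""
    home = Path(oracle_home)
    has_lib = (home / "lib").is_dir()
    has_lib32 = (home / "lib32").is_dir()
    if has_lib and has_lib32:
        return "64-bit"
    if has_lib:
        return "32-bit"
    # No lib folder: not an Oracle home, or a layout this rule does not cover.
    return "unknown"
```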

Oracle – How to get a list of user permission grants

Since the InfoSphere Information Server repository has to be installed manually with the scripts provided with the IBM software, you sometimes run into difficulties. So, here is a quick script that I have found useful in the past to identify the permissions granted to IAUSER on Oracle databases, which helps run down discrepancies in user permissions. Note that Oracle stores unquoted user names in upper case, so the grantee value must be upper case to match.

SELECT *
FROM ALL_TAB_PRIVS
WHERE GRANTEE = 'IAUSER';

If we cannot run against the ALL_TAB_PRIVS view, then we can try the USER_TAB_PRIVS view:

SELECT *
FROM USER_TAB_PRIVS
WHERE GRANTEE = 'IAUSER';
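The fallback from ALL_TAB_PRIVS to USER_TAB_PRIVS can also be scripted. This is a minimal Python sketch that assumes a DB-API 2.0 connection whose driver accepts named (:name) bind variables, such as python-oracledb; the function name fetch_grants is hypothetical. It upper-cases the grantee for the reason given above.

```python
from typing import Any, List, Tuple

# Views to try, in order: ALL_TAB_PRIVS first, USER_TAB_PRIVS as the fallback.
PRIVILEGE_VIEWS = ("ALL_TAB_PRIVS", "USER_TAB_PRIVS")


def fetch_grants(conn: Any, grantee: str) -> Tuple[str, List[tuple]]:
    """Return (view_used, rows) for the first privilege view that can be queried.

    conn is any DB-API 2.0 connection whose driver accepts :name binds.
    """
    last_error = None
    for view in PRIVILEGE_VIEWS:
        cur = conn.cursor()
        try:
            # View names come from the constant tuple above, never from user input;
            # the grantee is a bind variable, upper-cased to match Oracle's storage.
            cur.execute(
                f"SELECT * FROM {view} WHERE GRANTEE = :grantee",
                {"grantee": grantee.upper()},
            )
            return view, cur.fetchall()
        except Exception as exc:  # error classes differ per driver
            last_error = exc
        finally:
            cur.close()
    raise RuntimeError("none of the privilege views could be queried") from last_error
```

For a quick offline check, an in-memory sqlite3 database that has only a USER_TAB_PRIVS table exercises the fallback path, because querying the missing ALL_TAB_PRIVS table raises an error.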

 

Related References

Oracle Help Center > Database Reference > ALL_TAB_PRIVS view

InfoSphere / DataStage – What are the supported connector stages for dashDB?

dashDB

In a recent discussion, the question came up concerning which InfoSphere DataStage connectors and/or stages IBM supports for dashDB. So, it seems appropriate to share what I learned from the answer.

What DataStage connectors and/or stages are supported for dashDB

You have three choices of connector; which one best meets your needs depends on the nature of your environment and the configuration choices that have been applied:

  1. The DB2 Connector stage
  2. The JDBC Connector stage
  3. The ODBC stage

Related References

Connecting to IBM dashDB

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Information Server on Cloud offerings, Connecting to other systems, Connecting to IBM dashDB

DB2 connector

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, IBM DB2 databases, DB2 connector

ODBC stage

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Older stages for connectivity, ODBC stage

JDBC data sources

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Multiple data sources, JDBC data sources

What is the Convert function in DataStage?

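In its simplest form, the Convert function in InfoSphere DataStage is a character replacement operation. Convert can be used to replace a specific character, a list of characters, or a Unicode character (e.g., Thumbs Up Sign or Grinning Face). It works character by character rather than on substrings: each character in the first argument is replaced by the character in the same position of the second argument, and characters with no counterpart are removed, which is why an empty replacement string deletes the listed characters.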

Convert Syntax

Convert('<<Characters to be replaced>>', '<<Replacement characters>>', <<Input field>>)
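Because Convert works on individual characters, it can help to see the rule spelled out. This Python function is a hypothetical emulation, not DataStage code: each character in the first argument maps to the character at the same position in the second, and characters with no counterpart are deleted. How DataStage treats a character listed twice is not covered here; the sketch assumes the first occurrence wins.

```python
from typing import Dict, Optional


def ds_convert(from_list: str, to_list: str, expression: str) -> str:
    """Emulate Convert(fromlist, tolist, expression) character by character.

    Each character in from_list becomes the character at the same position
    in to_list; characters past the end of to_list are deleted.
    """
    table: Dict[int, Optional[str]] = {}
    for i, ch in enumerate(from_list):
        if ord(ch) in table:
            continue  # assumption: the first listing of a repeated character wins
        table[ord(ch)] = to_list[i] if i < len(to_list) else None  # None deletes
    return expression.translate(table)


print(ds_convert("abc", "x", "aabbcc"))             # xx
print(ds_convert('".', "", 'Size 2.5" pipe'))       # Size 25 pipe
print(repr(ds_convert("\x00", "", "ABC\x00\x00")))  # 'ABC'
```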

Using the Convert Function to Remove a List of Characters

Convert can handle special characters in a Transformer stage, removing characters that can cause issues in XML processing and in certain databases.

Convert a List of General Characters

Convert(";:?\+&,*`#'$()|^~@{}[]%!", "", TrimLeadingTrailing(Lnk_In.Description))

Convert Decimal Points and Double Quotes

Convert('".', '', Lnk_In.Description)

Convert Char(0)

This example replaces Char(0) with nothing, essentially removing it as padding.

Convert(Char(0), '', Lnk_In.Description)

Related References

String functions

InfoSphere Information Server, InfoSphere Information Server 11.5.0, InfoSphere DataStage and QualityStage, Developing parallel jobs, Parallel transform functions, String functions

Data Modeling – Fact Table Effective Practices

Database Table

Here are a few guidelines for modeling and designing fact tables.

Fact Table Effective Practices

  • The table naming convention should identify it as a fact table. For example:
    • Suffix Pattern:
      • <<TableName>>_Fact
      • <<TableName>>_F
    • Prefix Pattern:
      • FACT_<<TableName>>
      • F_<<TableName>>
  • Must contain a temporal dimension surrogate key (e.g., date dimension)
  • Measures should be nullable, because aggregate functions (SUM, COUNT, MIN, MAX, AVG, etc.) skip nulls, whereas a default such as zero is included in the results
  • Dimension surrogate keys (srky) should have a foreign key (FK) constraint
  • Do not place the dimension processing in the fact jobs
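A quick way to see why nullable measures matter is to run the aggregates over the same values stored once with a null and once with a zero default. This Python sketch uses an in-memory SQLite database purely as a stand-in for the warehouse; the sales_fact table and amount column are made up, and the behavior shown (column aggregates skip nulls) is standard SQL.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def aggregates(values: Iterable[Optional[float]]) -> Tuple:
    """Load values into a throwaway fact table and return SUM, COUNT, MIN, AVG."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales_fact (amount REAL)")
        conn.executemany(
            "INSERT INTO sales_fact (amount) VALUES (?)", [(v,) for v in values]
        )
        return conn.execute(
            "SELECT SUM(amount), COUNT(amount), MIN(amount), AVG(amount) FROM sales_fact"
        ).fetchone()
    finally:
        conn.close()


print(aggregates([10.0, 20.0, None]))  # (30.0, 2, 10.0, 15.0)
print(aggregates([10.0, 20.0, 0.0]))   # (30.0, 3, 0.0, 10.0)
```

SUM is the same either way, but COUNT, MIN, and AVG change once the missing value is stored as zero.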

Data Modeling – Dimension Table Effective Practices

Database Table

I've had these notes lying around for a while, so I thought I would consolidate them here. Here are a few guidelines to help ensure the quality of your dimension table structures.

Dimension Table Effective Practices

  • The table naming convention should identify it as a dimension table. For example:
    • Suffix Pattern:
      • <<TableName>>_Dim
      • <<TableName>>_D
    • Prefix Pattern:
      • Dim_<<TableName>>
      • D_<<TableName>>
  • Assign the primary key (PK) to the table's surrogate key
  • Audit fields – Type 1 dimensions should:
    • Have a Created Date timestamp – When the record was initially created
    • Have a Last Update timestamp – When the record was last updated
  • Job Flow: Keep dimension processing outside of the fact jobs.
  • Every dimension should have a zero (0) 'Unknown' row
  • Fields should be NOT NULL, replacing nulls with zero (0) for numeric and integer fields or a space (' ') for character fields
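The null-replacement and zero-key rules above can be expressed as a small function. This Python sketch is illustrative only: the column list, the product_srky key name, and the 'Unknown' label are hypothetical, and in a real job the replacement would typically happen in a Transformer stage.

```python
from typing import Any, Dict

# Hypothetical dimension layout: column name -> "numeric" or "char".
COLUMNS = {"product_srky": "numeric", "product_name": "char", "list_price": "numeric"}
# Null replacements from the guideline: zero for numeric, a space for character.
DEFAULTS = {"numeric": 0, "char": " "}


def replace_nulls(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of row, limited to COLUMNS, with nulls replaced by defaults."""
    return {
        col: DEFAULTS[kind] if row.get(col) is None else row[col]
        for col, kind in COLUMNS.items()
    }


def unknown_row() -> Dict[str, Any]:
    """Build the zero-key 'Unknown' member (surrogate key 0)."""
    row = replace_nulls({})          # every column gets its default; the key becomes 0
    row["product_name"] = "Unknown"  # label the member so reports show 'Unknown'
    return row


print(replace_nulls({"product_srky": 5, "product_name": None, "list_price": None}))
# {'product_srky': 5, 'product_name': ' ', 'list_price': 0}
print(unknown_row())
# {'product_srky': 0, 'product_name': 'Unknown', 'list_price': 0}
```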

DataStage – When checking operator: Operator of type “APT_TSortOperator”: will partition despite the preserve-partitioning flag on the data set on input port 0

APT_TSortOperator Warning

The APT_TSortOperator warning happens when there is a conflict in partitioning behavior between stages, usually because the successor (downstream) stage has its ‘Partitioning / Collecting’ and ‘Sorting’ properties set in a way that conflicts with the properties of the predecessor (upstream) stage, which is set to preserve partitioning. This can occur when the predecessor stage has the “Preserve Partitioning” property set to:

  • ‘Default (Propagate)’
  • ‘Propagate’, or
  • ‘Set’
Preserve Partitioning Property - list

Message ID

  • IIS-DSEE-TFOR-00074

Message Text

  • <<Link Name Where Warning Occurred>>: When checking operator: Operator of type “APT_TSortOperator”: will partition despite the preserve-partitioning flag on the data set on input port 0.

Warning Fixes

  • First, verify that the partitioning behaviors of both stages are correct.
  • If they are, set the predecessor's ‘Preserve Partitioning’ property to “Clear”.
  • If not, correct the partitioning behavior of the stage that is in error.

Clear Partitioning Property Screenshot

Preserve Partitioning Property - Set To Clear

InfoSphere DataStage – Boolean Handling for Netezza

Datastage Director Message - Numeric string expected

Beware of this message when working with Boolean fields in DataStage: it displays as informational (at least it did for me), not as a warning or an error. Even though it seems innocuous, for my job it meant that the Boolean values (‘true’ / ‘false’) were not being interpreted, and everything posted as ‘false’.

In DataStage, the Netezza BOOLEAN data type maps to the Bit SQL type, which expects a numeric input of zero (0) or one (1). So, my solution (once I detected the problem during unit testing) was to put Transformer stage logic in place to convert the Boolean input to the expected numeric value.

 

Netezza to Datastage Data Type Mapping

Netezza Data Type | InfoSphere DataStage Data Type (SQL Type) | Expected Input Value
BOOLEAN | Bit | 0 or 1 (1 = true, 0 = false)

Transformer Stage Boolean Handling Logic

A Netezza Boolean field can store true values, false values, and nulls, so some thought should be given to your desired data outcome for nulls.

This first example sets nulls to a specific value, which can support a specific business rule for null handling and also provides null handling for non-nullable fields. Here, nulls are set to the numeric value for ‘true’, and all other non-true inputs are set to ‘false’.

If IsNull(Lnk_Src_In.USER_ACTIVE) Then 1 Else If Lnk_Src_In.USER_ACTIVE = 'true' Then 1 Else 0

The second set of examples lets the Else value determine how nulls are set, so check that the direction of your logic yields the correct value; these still provide null handling for non-nullable fields.

  • If Lnk_Src_In.USER_ACTIVE = 'true' Then 1 Else 0

  • If Lnk_Src_In.USER_ACTIVE = 'false' Then 0 Else 1
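Before wiring the derivations into a job, it can help to check their outcomes against every possible input. The Python functions below are hypothetical stand-ins for the three derivations, with None playing the role of a null USER_ACTIVE value; they assume, as the text does, that a null falls through to the Else branch in the second and third derivations. The string comparisons here are case-sensitive.

```python
from typing import Optional


def null_as_true(value: Optional[str]) -> int:
    """If IsNull(x) Then 1 Else If x = 'true' Then 1 Else 0"""
    if value is None:
        return 1
    return 1 if value == "true" else 0


def true_check(value: Optional[str]) -> int:
    """If x = 'true' Then 1 Else 0 (a null falls through to Else: 0)"""
    return 1 if value == "true" else 0


def false_check(value: Optional[str]) -> int:
    """If x = 'false' Then 0 Else 1 (a null falls through to Else: 1)"""
    return 0 if value == "false" else 1


if __name__ == "__main__":
    for v in ("true", "false", None):
        print(repr(v), null_as_true(v), true_check(v), false_check(v))
    # 'true' 1 1 1
    # 'false' 0 0 0
    # None 1 0 1
```

The None row shows the practical difference: only the second derivation maps nulls to 0.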

Director Log Message

Message ID

  • IIS-DSEE-TBLD-00008

Message Text

  • <<Link Name Where Message Occurred>>: Numeric string expected. Use default value.

Or something like this:

  • <<Link Name Where Message Occurred>>: Numeric string expected for input column ‘<<Field Name Here>>’. Use default value.

Related References

Boolean

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza user-defined functions, UDX data types reference information, Supported data types, Boolean

https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.udf.doc/r_udf_boolean_datatype.html

Data types and aliases

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza stored procedures, NZPLSQL statements and grammar, Variables and constants, Data types and aliases

https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.sproc.doc/c_sproc_data_types_aliases.html

Logical data types

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Data types, Logical data types

https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_data_types_logical.html

Data type conversions from Netezza to DataStage

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from Netezza to DataStage

https://www.ibm.com/support/knowledgecenter/en/SSZJPZ_11.5.0/com.ibm.swg.im.iis.conn.netezza.use.doc/topics/nzcc_mappingdatatypes.html

InfoSphere DataStage – Operations Manual Template

Documentation

When projects transition to production, the receiving team needs some transition documentation to help with knowledge transfer and to guide them while they get accustomed to operating the application and develop their own documentation. This template is a shell which, with some updating, usually provides enough detail to serve as a quick reference for the application. Also, having a template can be a real time saver compared to writing the document from scratch.

Netezza JDBC Error – Unterminated quoted string

The ‘Unterminated quoted string’ error occurs from time to time when working with the InfoSphere DataStage Netezza JDBC Connector stage, and it is nebulous, at best. However, the solution is normally straightforward enough once you understand it. Usually, this error results from one or more target table fields being shorter than the input data. The fix is normally to compare your input field lengths (or the composite field length, if you are consolidating several fields into one) with the target field lengths and increase the target field length. In some cases, if business rules allow, you may be able to substring or truncate the input data, but this is not a recommended approach, because information can be lost.
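The length comparison described above can be automated before the load. This Python sketch assumes you can pull the target column lengths (for example, from the table DDL) into a dictionary and that input rows arrive as dictionaries; the column names and lengths are hypothetical. It reports every column whose longest input value exceeds the target length.

```python
from typing import Dict, Iterable, Mapping


def oversized_columns(
    rows: Iterable[Mapping[str, object]], target_lengths: Mapping[str, int]
) -> Dict[str, int]:
    """Return {column: longest_input_length} for columns that exceed the target."""
    longest: Dict[str, int] = {}
    for row in rows:
        for col, value in row.items():
            if value is None or col not in target_lengths:
                continue  # nulls have no length; unknown columns are skipped
            # len() counts characters; adjust if your target length is in bytes.
            longest[col] = max(longest.get(col, 0), len(str(value)))
    return {col: n for col, n in longest.items() if n > target_lengths[col]}


targets = {"JOB_NAME": 10, "JOB_DESC": 20}
rows = [
    {"JOB_NAME": "LoadDim", "JOB_DESC": "Loads the job dimension table"},
    {"JOB_NAME": "LoadFactSales", "JOB_DESC": None},
]
print(oversized_columns(rows, targets))  # {'JOB_NAME': 13, 'JOB_DESC': 29}
```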

Error

org.netezza.error.NzSQLException: ERROR:  Unterminated quoted string

Example Error Message

Tgt_IIS_Job_Dim,0: The connector encountered a Java exception:  org.netezza.error.NzSQLException: ERROR:  Unterminated quoted string
    at org.netezza.internal.QueryExecutor.getNextResult(QueryExecutor.java:287)
    at org.netezza.internal.QueryExecutor.execute(QueryExecutor.java:76)
    at org.netezza.sql.NzConnection.execute(NzConnection.java:2904)
    at org.netezza.sql.NzStatement._execute(NzStatement.java:885)
    at org.netezza.sql.NzPreparedStatament.executeUpdate(NzPreparedStatament.java:229)
    at com.ibm.is.cc.jdbc.CC_JDBCRecordDataSetConsumer.executeStatements(CC_JDBCRecordDataSetConsumer.java:2846)
    at com.ibm.is.cc.jdbc.CC_JDBCBigBufferRecordDataSetConsumer.consumeBigBuffer(CC_JDBCBigBufferRecordDataSetConsumer.java:712)

InfoSphere Information Server (IIS) – Metadata Repository Databases

Databases & Schemas

This is a snippet of very useful information that is buried in the IBM documentation, so I thought I would bring it up a level for those of you who may be making decisions regarding InfoSphere Information Server (IIS) installations. The information below is an excerpt of IBM's guidance on the placement of the IIS repository databases. Please keep in mind that the databases required for your installation may be a subset of these, depending upon your IIS product offering license.

Metadata Repository Databases

  • Active InfoSphere Information Server metadata repository
    • Description: Stores the metadata about external data sources that are governed, managed, and analyzed by InfoSphere Information Server components. Normally referred to as the metadata repository.
    • Default database and schema: Database XMETA; schema XMETA
    • Placement: Must be the same database that is used for the staging metadata repository.
  • InfoSphere Information Server staging area
    • Description: Stores metadata that is imported from external data sources so that it can be examined before it is moved to the active metadata repository.
    • Default database and schema: Database XMETA; schema XMETASR
    • Placement: Must be the same database that is used for the active metadata repository.
  • Analysis database
    • Description: Stores results of information analysis by InfoSphere Information Analyzer.
    • Default database and schema: Database IADB; schema IAUSER
    • Placement: Cannot be the same database that is used for the active metadata repository or staging area.
  • Operations database
    • Description: Stores monitoring data that is displayed by the InfoSphere DataStage® and QualityStage® Operations Console.
    • Default database and schema: Database XMETA; schema is the user-defined repository user name, typically DSODB
    • Placement: Can be the same as or different from the database that is used for the metadata repository.
  • Standardization Rules Designer database
    • Description: Stores a copy of revisions to InfoSphere QualityStage rule sets that have been made in the IBM InfoSphere QualityStage Standardization Rules Designer.
    • Default database and schema: Database XMETA; schema is the user-defined data store user name, typically SRDUSER
    • Placement: Can be the same as or different from the database that is used for the metadata repository.
  • Exceptions database
    • Description: Stores exceptions that are generated by InfoSphere Information Server products and components.
    • Default database and schema: Database XMETA; schema is the user-defined repository user name, typically ESDB
    • Placement: Can be the same as or different from the database that is used for the metadata repository.
  • Match Designer database
    • Description: Stores the results of match test passes by InfoSphere QualityStage Match Designer, a component of InfoSphere QualityStage. This data store is an ODBC data source that is used as a staging area before match designs are checked in to the active metadata repository.
    • Default database and schema: Database MDDB; user-defined database name and schema name, with no default, but typically MDDB
    • Placement: Cannot be the same database as that used for the metadata repository.
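If you are planning an installation, the placement rules above can be checked mechanically. This Python sketch encodes only the must-share and must-not-share rules listed; the plan format (a repository label mapped to a database name) and the labels themselves are assumptions for illustration.

```python
from typing import Dict, List

# Placement rules from the list above; labels are illustrative.
MUST_SHARE = [("metadata repository", "staging area")]
MUST_NOT_SHARE = [
    ("analysis", "metadata repository"),
    ("analysis", "staging area"),
    ("match designer", "metadata repository"),
]


def placement_violations(plan: Dict[str, str]) -> List[str]:
    """Return human-readable rule violations for a {repository: database} plan."""
    problems = []
    for a, b in MUST_SHARE:
        if a in plan and b in plan and plan[a] != plan[b]:
            problems.append(f"{a} and {b} must use the same database")
    for a, b in MUST_NOT_SHARE:
        if a in plan and b in plan and plan[a] == plan[b]:
            problems.append(f"{a} and {b} must use different databases")
    return problems


default_plan = {
    "metadata repository": "XMETA",
    "staging area": "XMETA",
    "analysis": "IADB",
    "operations": "XMETA",
    "match designer": "MDDB",
}
print(placement_violations(default_plan))  # []
print(placement_violations({**default_plan, "analysis": "XMETA"}))
```

The second call reports two problems, because putting the analysis database in XMETA collides with both the metadata repository and the staging area.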

IBM InfoSphere DataStage Migration Checklist

Assuming that your InfoSphere instance has been installed and configured, here is a quick migration checklist to help make sure that you have performed the essential tasks. It is organized as major tasks, parent tasks, and child tasks; track the completion status of each item.

  • Create Migration Package
    • Create database scripts
    • Export DataStage components
    • Gather support files
    • Compress migration package
    • Baseline migration package in CM tool
    • Upload package to target environment
  • Deploy Database Components
    • Back up target databases
    • Deploy database components
    • Resolve script errors
    • Create JDBC, ODBC, and/or TNSNAMES entries
    • Install and configure RDBMS client on InfoSphere server
    • Load configuration and conversion data (if not loaded by ETL)
  • Deploy Support Files
    • Create file structures
    • WSDLs
    • Certificates
    • Surrogate key files
    • System administration scripts
    • Job scripts
    • Node configuration files
  • Deploy DataStage Components
    • Create project (if required)
    • Configure project and/or project parameters (if required)
    • Import ETLs into DataStage
    • Update parameters and parameter sets (if required)
      • File paths
      • Database names
      • Database credentials
    • Update job properties
      • File paths
    • Compile ETL using Multiple Job Compile
    • Resolve compilation errors
  • Smoke Test
  • Finalize CM Baseline