DataStage – Netezza Connector Action Column

Over the years I have occasionally used the action column feature; however, over the last month or so I have found myself using it quite a lot, and not just in relation to the Change Capture stage.

The first thing you need to know is that, if you want to avoid the ‘no action column found’ notice on the target stage, you need to ensure that the action column is defined as a single-character field, Char(1). Otherwise, the Netezza Connector stage will not recognize your field as an action column.

While most developers will commonly work with the action column feature in combination with the Change Capture stage, it can also be very useful if you have derived a field from one or more inputs to tell you what behavior the row requires. I have found this approach to be very useful and efficient under the right circumstances.

Example Pattern for Action Column Using Multiple Source Selects

Action Column Field Type

Change Code Values Mapping To Action Column

  • Here’s a quick reference table mapping each change type code to the single-character action column value it needs to be translated to.

Change Code Type | Change Type Code | Action Column Value
Copy (Data Without Changes) | 0 | No value for this change type
Insert | 1 | I
Delete | 2 | D
Update | 3 | U

Example Transformer Stage Derivation

  • Here is a quick transformer stage derivation coding example to take advantage of the action column capabilities. If you haven’t already handled the removal of the copy rows, you may also want to add a constraint (see the constraint sketch after the derivation below).
  • The combination I most frequently find myself using is the insert and update combination.
If Lnk_Out_To_Tfm.change_code = 1 Then 'I'
Else If Lnk_Out_To_Tfm.change_code = 2 Then 'D'
Else If Lnk_Out_To_Tfm.change_code = 3 Then 'U'
Else ''
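
If the copy rows (change_code = 0) have not already been filtered out upstream, a minimal sketch of the constraint mentioned above, assuming the same link name, could be:

Lnk_Out_To_Tfm.change_code <> 0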

Related References

Home > InfoSphere Information Server 11.7.0 > InfoSphere DataStage and QualityStage > Developing parallel jobs > Introduction to InfoSphere DataStage Balanced Optimization > Job design considerations  > Specific considerations for the Netezza connector

Infosphere Information Server (IIS) – Where Can You View DataStage and QualityStage Logs?

During the course of the week, a discussion came up regarding the different places where a person might read the DataStage and QualityStage logs in InfoSphere. I hadn’t really thought about it before, but here are a few places that come to mind:

  • IBM InfoSphere DataStage and QualityStage Operations Console
  • IBM InfoSphere DataStage and QualityStage Director client
  • IBM InfoSphere DataStage and QualityStage Designer client by pressing Ctrl+L

Related References

IBM Knowledge Center> InfoSphere Information Server 11.7.0 > InfoSphere DataStage and QualityStage > Monitoring jobs

IBM Knowledge Center > InfoSphere Information Server 11.7.0 > Installing > Troubleshooting software installation > Log files

What Are The DataStage / QualityStage Join Stages?

Three Stages Which Join Records

While chasing an error which applied only to join-type stages, I thought it might be nice to identify what the InfoSphere Information Server DataStage / QualityStage join stages are.  There are three of them, as you can see from the picture above:

  • Join Stage,
  • Lookup Stage,
  • And, Merge Stage.

All three stages join data based on the values of identified key columns.

Related References

IBM Knowledge Center, InfoSphere Information Server 11.7.0, InfoSphere DataStage and QualityStage, Developing parallel jobs, Processing Data, Lookup Stage

IBM Knowledge Center, InfoSphere Information Server 11.7.0, InfoSphere DataStage and QualityStage, Developing parallel jobs, Processing Data, Join Stage

IBM Knowledge Center, InfoSphere Information Server 11.7.0, InfoSphere DataStage and QualityStage, Developing parallel jobs, Processing Data, Merge Stage

https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.7.0/com.ibm.swg.im.iis.ds.parjob.dev.doc/topics/c_deeref_Merge_Stage.html

Parallel jobs on Windows fail with APT_IOPort::readBlkVirt; error

APT_IOPort::readBlkVirt Error Screenshot

This is a known error on Windows systems, and it applies to DataStage and QualityStage jobs using any or all of the three join-type stages (Join, Merge, and Lookup).

Error Message

  • <<Link name>>,0: APT_IOPort::readBlkVirt: read for block header, partition 0, [fd 4], returned -1 with errno 10,054 (Unknown error)

Message ID

  • IIS-DSEE-TFIO-00223

Applies To

  • Windows systems only
  • Parallel engine jobs using the three join-type stages (Join, Merge, and Lookup); it does not apply to server engine jobs.
  • Infosphere Information Server (IIS), DataStage and QualityStage 9.1 and higher

The Fix

  • Add the APT_NO_IOCOMM_OPTIMIZATION environment variable at the project level in the Administrator client and set it to blank or 0. I left it blank so it would not impact other jobs.
  • Add the environment variable to the job producing the error and set it to 1.

What APT_NO_IOCOMM_OPTIMIZATION Does

  • It sets shared memory as the transport type, rather than the default sockets transport type.
  • Note that in most cases the sockets transport type is faster, so you likely will not want to set this across the project as the default for all jobs. It is best to apply it as needed for problematic jobs.

Related References

InfoSphere DataStage and QualityStage, Version 9.1 Job Compatibility

IBM Support, JR54078: PARALLEL JOBS ON WINDOWS FAIL WITH APT_IOPORT::READBLKVIRT; ERROR

IBM Support, Information Server DataStage job fails with unknown error 10,054.

 

SFDC – Using a timestamp literal in a where clause

Salesforce Connector

Working with timestamp literals in the Infosphere SFDC Connector SOQL is much like working with date literals.  So, here is a quick example which may save you some time.

SOQL Timestamp String Literals Where Clause Rules

Basically, the timestamp pattern is straightforward and similar to the process for dates, but there are some differences. The basic rules for a SOQL where clause are:

  • No quotes
  • No functions
  • No casting function, or casting, is needed for the SOQL where clause to read it
  • It only applies to datetime fields
  • A timestamp identifier ‘T’ separates the date and time portions
  • And the ISO 8601 time notation

Example SOQL Timestamp String Literals

So, here are a couple of timestamp string literal examples in SOQL:

  • 1901-01-01T00:00:00-00:00
  • 2016-01-31T00:00:00-00:00
  • 9999-10-31T00:00:00-00:00

Example SOQL with Timestamp String Literal Where Clause

Select e.Id,
       e.AccountId,
       e.StartDateTime
From Event e
WHERE e.StartDateTime > 2014-10-31T00:00:00-00:00

 

Related References

Salesforce Developer Documentation

Home, Developer Documentation, Force.com SOQL and SOSL Reference

https://developer.salesforce.com/docs/atlas.en-us.soql_sosl.meta/soql_sosl/sforce_api_calls_soql_select_dateformats.htm

Salesforce Workbench

Home, Technical Library, Workbench

W3C

Date Time Formats

 

SFDC Salesforce Connector – Column Returns Null Values When SOQL Returns Data in Workbench

Salesforce Connector

Recently, I encountered a scenario which is a little out of the norm while using the SFDC Connector.  Once the issue is understood, it is easily remedied.

The problem / Error

  • The SOQL run in Salesforce Workbench returns data for the column
  • The DataStage job/ETL runs without errors or warnings
  • The target column output returns only null values

The Cause

In short, the cause is a misalignment between the SOQL field name and the column name on the Columns tab of the connector.

The Solution

The fix is simply to convert the dots in the field name to underscores.   Basically, a SOQL field name of Account.RecordType.Name becomes Account_RecordType_Name.

Example Field / Column Name Fix

Example SOQL

Select c.Id,
       c.AccountId,
       c.CV_Account_Number__c,
       c.Name,
       c.Role__c,
       c.Status__c,
       c.Account.RecordType.Name
From Contact c

Columns Tab With Correct Naming Alignment

Please note that the qualifying dots have been converted to underscores.
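
Using the example SOQL above, the field names on the Columns tab would be entered as: Id, AccountId, CV_Account_Number__c, Name, Role__c, Status__c, and Account_RecordType_Name.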

SFDC Connector Columns Tab

Related References

 

InfoSphere / DataStage – What Are the Supported Connector Stages for dashDB?

dashDB

In a recent discussion, the question came up concerning which InfoSphere DataStage connectors and/or stages are supported by IBM for dashDB.  So, it seems appropriate to share the insight gained from having the question answered.

What DataStage Connectors and/or Stages Are Supported for dashDB?

You have three choices of connector; which one best meets your needs depends on the nature of your environment and the configuration choices which have been applied:

  1. The DB2 Connector Stage
  2. The JDBC Connector stage
  3. The ODBC Stage

Related References

Connecting to IBM dashDB

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Information Server on Cloud offerings, Connecting to other systems, Connecting to IBM dashDB

DB2 connector

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, IBM DB2 databases, DB2 connector

ODBC stage

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Older stages for connectivity, ODBC stage

JDBC data sources

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Multiple data sources, JDBC data sources

What is the convert function in Datastage?

Algorithm

In its simplest form, the Convert function in InfoSphere DataStage is a string replacement operation.  Convert can be used to replace a specific character, a list of characters, or a Unicode character (e.g., the Thumbs Up Sign or Grinning Face emoji).

Convert Syntax

Convert('<<Value to be replaced>>', '<<Replacement value>>', <<Input field>>)
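
As a quick illustration with hypothetical values: Convert('abc', 'xyz', 'cab') returns 'zxy', because each character in the first list is replaced by the corresponding character in the second; and when the replacement value is shorter than the list being replaced, the unmatched characters are simply removed, so Convert(';:', '', 'a;b:c') returns 'abc'.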

Using the Convert Function to Remove a List of Characters

This example handles/converts special characters in a transformer stage, which can cause issues in XML processing and certain databases.

Convert a list of General Characters

Convert(";:?\+&,*`#'$()|^~@{}[]%!", "", TrimLeadingTrailing(Lnk_In.Description))

Convert Decimal and Double Quotes

Convert('."', '', Lnk_In.Description)

Convert Char(0)

This example replaces Char(0) with nothing essentially removing it as padding and/or space.

Convert(Char(0), '', Lnk_In.Description)

 

Related References

String functions

InfoSphere Information Server, InfoSphere Information Server 11.5.0, InfoSphere DataStage and QualityStage, Developing parallel jobs, Parallel transform functions, String functions

Data Modeling – Dimension Table Effective Practices

Database Table

I’ve had these notes lying around for a while, so I thought I would consolidate them here.   So, here are a few guidelines to ensure the quality of your dimension table structures; a brief example DDL sketch follows the list.

Dimension Table Effective Practices

  • The table naming convention should identify it as a dimension table. For example:
    • Suffix Pattern:
      • <<TableName>>_Dim
      • <<TableName>>_D
    • Prefix Pattern:
      • Dim_<<TableName>>
      • D_<<TableName>>
  • Have a Primary Key (PK) assigned on the table surrogate key
  • Audit fields – Type 1 dimensions should:
    • Have a Created Date timestamp – when the record was initially created
    • Have a Last Update timestamp – when the record was last updated
  • Job flow: do not place the dimension processing in the fact jobs; keep dimension processing outside of the fact jobs
  • Every dimension should have a zero (0), ‘Unknown’, row
  • Fields should be ‘NOT NULL’, replacing nulls with a zero (0) for numeric and integer type fields or a space (‘ ‘) for character type fields
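
As a minimal sketch of these practices (hypothetical table and column names; adjust the data types and syntax for your target database):

CREATE TABLE Customer_Dim (
    Customer_Dim_Key   BIGINT        NOT NULL,              -- surrogate key
    Customer_Nbr       INTEGER       NOT NULL DEFAULT 0,    -- business key; nulls replaced with zero (0)
    Customer_Name      VARCHAR(100)  NOT NULL DEFAULT ' ',  -- character fields default to space
    Created_Dt         TIMESTAMP     NOT NULL,              -- audit: when the record was initially created
    Last_Update_Dt     TIMESTAMP     NOT NULL,              -- audit: when the record was last updated
    CONSTRAINT Customer_Dim_PK PRIMARY KEY (Customer_Dim_Key)
);

-- the zero (0), 'Unknown', row
INSERT INTO Customer_Dim VALUES (0, 0, 'Unknown', CURRENT_TIMESTAMP, CURRENT_TIMESTAMP);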

Related References

 

 

Datastage – When checking operator: Operator of type “APT_TSortOperator”: will partition despite the preserve-partitioning flag on the data set on input port 0

APT_TSortOperator Warning

The APT_TSortOperator warning happens when there is a conflict in the partitioning behavior between stages.  Usually, it is because the successor (downstream) stage has its ‘Partitioning / Collecting’ and ‘Sorting’ properties set in a way that conflicts with the predecessor (upstream) stage’s properties, which are set to preserve partitioning.  This can occur when the predecessor stage has the “Preserve Partitioning” property set to:

  • ‘Default (Propagate)’
  • ‘Propagate’, or
  • ‘Set’
Preserve Partitioning Property – list

Message ID

  • IIS-DSEE-TFOR-00074

Message Text

  • <<Link Name Where Warning Occurred>>: When checking operator: Operator of type “APT_TSortOperator”: will partition despite the preserve-partitioning flag on the data set on input port 0.

Warning Fixes

  • First, verify whether the partitioning behaviors of both stages are correct
  • If so, set the predecessor’s ‘Preserve Partitioning’ property to “Clear”
  • If not, correct the partitioning behavior of the stage which is in error

Clear Partitioning Property Screenshot

Preserve Partitioning Property – Set To Clear

Infosphere DataStage – Designer Client Repository Structure

Default Repository Structure

When a project is created, there is a default repository structure created for use in the DataStage designer client.

Default DataStage Repository Structure

However, some additional organization will be required for most DataStage projects.  Usually, this organization occurs in these areas:

  • Addition of structure within the “Jobs” folder
  • Addition of a “Parameter Sets” folder
  • Addition of structure within the “Table Definitions” folder
  • Addition of a “Developer Work Area” folder

Repository Structure within the “Jobs” folder

Below is a sample folder structure for multiple applications that share a common repository.  The pattern includes, but does not illustrate, all the other delivered folders. In addition to the core folder structure, developers can create individual working, test, and in-progress folders, which do not migrate but keep work segregated.

Jobs Folder Pattern Datastage Repository Structure

Parameter Sets Folders

The parameter set folders are for two sets of information.

  • First are the database parameters, which include data connections and the attached parameter sets.
  • Second are the job parameters, which may include parameter sets for things like e-mail parameters, surrogate key file paths, etc.; using parameter sets for these is a best practice, rather than creating them as project-level parameters.
Parameter Sets Folder Pattern Datastage Repository Structure

Table Definitions

The Table Definitions folder has folders added to segregate the imported metadata for the source and target systems and, in some cases, may need folders to logically organize imported metadata which resides within the same database and/or schema but belongs to different logical layers.

Table Definitions Folder Pattern DataStage Repository Structure

Infosphere Datastage – Standard Practice – Sequence Naming Conventions

Standards

Standard practices help you and others understand your work.  This can be very important when working on large teams, working across team boundaries, or when large, complex sets of processes and objects are involved.  When you consider the importance of naming conventions coupled with standard practice, the benefit should be obvious, yet teams often fail to follow or document their conventions.  So, these standard naming conventions may help when none exist, or when you need to assemble your own.

<<SomeIdentifier>> should be replaced with the appropriate information; a couple of example names follow the tables below.

  1. Sequence Object Naming Conventions

Entity | Convention
Master Control Sequence (parent) | Master_<<Application>>_<<Application Job Stream Name>>_Seq
Sequence | <<Application>>_<<Job_Name>>_Seq

 

  2. Sequence Stage Naming Conventions

Entity | Convention
End Loop | EL_<<PrimaryFunction>>
Error Handler | EH_<<PrimaryFunction>>
Execute Command | EC_<<PrimaryFunction>>
Job Activity | Job_<<PrimaryFunction>>
Nested Condition | NC_<<PrimaryFunction>>
Notify | Notify_<<PrimaryFunction>>
Routine Activity | Rtn_<<PrimaryFunction>>
Sequence Links (messages) | Msg_<<Number or Description>>
Sequence Links (non-messages) | Lnk_<<Number or Description>>
Sequencer | Seqr_<<Number or Description>>
Sequencer (All) | SeqAll_<<Identifier>>
Sequencer (Any) | SeqAny_<<Identifier>>
Start Loop | SL_<<PrimaryFunction>>
Terminator Activity | TA_<<PrimaryFunction>>
User Variables | UV_<<PrimaryPurpose>>
Wait For File | WFF_<<PrimaryFunction>>
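
For example (hypothetical application and job names), a master control sequence for a ‘Sales’ application nightly job stream might be named Master_Sales_Nightly_Seq, and the sequence for its customer dimension load job might be named Sales_Load_Customer_Dim_Seq.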

 

 

Related References

InfoSphere DataStage – Ways to Create a Datastage Parameter Set

Parameter Sets

There are three primary ways to create a parameter set, and this is a different practice from adding ‘User Defined’ variables in the InfoSphere DataStage Administrator. The ways to create a parameter set are:

  • Create a parameter set from a data connection stage
  • Create a Parameter Set from the navigation of DataStage designer, and
  • Create a Parameter Set from a job

Create a parameter set from Data Connection Stage

This is used to create parameter sets for database connection parameters.

To create a new Parameter Set from a Data Connection

  • Select: File > New > Other and select “Data Connection”
  • Complete the Data Connection stage properties, then save the stage.
  • Open the Connection stage and navigate to the “Parameters” tab
  • Then, click on the “Associated Parameter Set” button, and choose the “Create & Attach” menu item
  • This will launch a dialog
  • Fill out the appropriate information on the General tab and then proceed to the Parameters tab:
  • In the Parameters Tab, enter in the Parameters you wish to include in this Parameter Set
  • On the Values tab, specify a Value File name (please follow naming convention standards to prevent rework and other problems). This is the name of the file that will automatically be created on the Engine tier. This tab also allows you to view/edit values located in the value file.
  • Click OK to save the Parameter set.

Create a Parameter Set from the navigation of DataStage designer

This is, perhaps, the more traditional way of creating a parameter set.

To create a new Parameter Set

  • Select: File > New > Other and select “Parameter Set”
  • This will launch a dialog
  • Fill out the appropriate information on the General tab and then proceed to the Parameters tab:
  • In the Parameters tab, enter the parameters you wish to include in this Parameter Set. Note: existing environment variables can also be added.
  • On the Values tab, specify a Value File name (please follow naming convention standards to prevent rework and other problems). This is the name of the file that will automatically be created on the Engine tier. This tab also allows you to view/edit values located in the value file.
  • Click OK to save the Parameter set.

Create a Parameter Set from a job

This approach is, perhaps, less traditional, but it is equally effective if you find yourself creating additional jobs and now need to share the same parameters.  This is a quick and easy way to generate a parameter set from an existing job.

To create a new Parameter Set from a job

  • Open the job that you want to create a parameter set for.
  • Click “Edit > Job Properties” to open the “Job Properties” window.
  • Click the “Parameters” tab.
  • Press and hold the Ctrl key, then select the parameters that you want to include in the parameter set.
  • With your parameters highlighted, click “Create Parameter Set”.  The Parameter Set window opens.
    • Enter a name and short description for your parameter set.
    • Click the “Parameters” tab; the parameters that you selected are listed.
    • Click the ”Values” tab.
    • Enter a name in the Value File name field, then press Enter.  The value for each of your parameters is automatically populated with the path name that you entered.
    • If a default value is not already set, enter a value for each parameter. For example, if the variable is a Pathname type, enter a default path name.
    • Click “OK” to close the Parameter Set window.
    • In the Save Parameter Set As window, select the folder where you want to save your parameter set and click Save. When prompted to replace the selected parameters with the parameter set, click Yes.
  • Click “OK” to close the Job Properties window.
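
Once a parameter set has been added to a job, its parameters are referenced in stage properties and derivations using the #ParameterSetName.ParameterName# notation; for example (hypothetical names), #ps_DatabaseConnection.Database#.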

Related References

Infosphere Datastage – Data type conversions from Netezza to DataStage

IBM Infosphere Information Server (IIS)

 

Some recent research to eliminate some ETL data conversion issues made me want to enhance the documentation provided by IBM, to avoid repeating the research.

Netezza data types and their equivalent InfoSphere DataStage data types

Netezza data type | InfoSphere DataStage data type (SQL type) | Notes
BYTEINT | TinyInt |
SMALLINT | SmallInt |
INT | Integer |
BIGINT | BigInt |
NUMERIC(p, s) | Numeric, Decimal, Double | Decimal and Double are aliases of Numeric
FLOAT(p) | Float |
REAL | Real |
DOUBLE PRECISION | Double |
CHAR(n) | Char |
VARCHAR(n) | VarChar |
NCHAR(n) | NChar |
NVARCHAR(n) | NVarChar |
BOOLEAN | Bit |
DATE | Date |
TIME | Time |
TIME WITH TIME ZONE | VarChar | The value of time with time zone will be returned without the time zone information.
TIMESTAMP | TimeStamp |
INTERVAL | VarChar | You cannot load the interval data type from an external table.
ROWID | BigInt |
TRANSACTION ID | BigInt |
DATASLICE | Integer |

Related References

Data type conversions from Netezza to DataStage

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from Netezza to DataStage

*DataStage*DSR_PROJECT (Action=8); check DataStage is set up correctly in project

Error

 

Basically, the Action=8 error, which I normally see when opening the DataStage Director client application, means that one or more of the RT_LOG files have become corrupted.  Usually, this problem occurs in relation to disk space issues, although there can be other causes.

Error Message

Error calling subroutine: *DataStage*DSR_PROJECT (Action=8); check DataStage is set up correctly in project

(Subroutine failed to complete successfully (30107))

The Cleanup approach

The cleanup process really consists of three primary steps:

  • Free disk space
  • Restart application process
  • And, fix corrupted log

Free Disk Space

This can consist of (a couple of example commands for checking usage follow the list):

  • Cleaning ‘/tmp’ Space
  • Removing any large unnecessary files
  • Enlarging ‘/tmp’ space allocation
  • Adding additional disk space, if necessary
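
A couple of standard Linux commands can help assess the situation (a minimal sketch; adjust paths for your environment):

df -h /tmp                        # how much space is free on the /tmp filesystem
du -sh /tmp/* | sort -h | tail    # the largest items currently consuming /tmp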

Restart Application Processes

Once you have freed up disk space, restarting the VM/server is recommended. However, if that is not a realistic option, then at least restart the InfoSphere DataStage engine to ensure the newly freed space is registering with the applications and that everything is restarted and running.

Fix Corrupted logs

Perhaps the cleanest way to reset all the logs is to perform a ‘Multiple Job Compile’.  Running the jobs will also overwrite the logs, but that is a little more hit and miss if not all of the jobs are in job streams/batches which can be run at this time.  The logs can also be manually overwritten by compiling a job or performing a reset; the trick with a manual reset is that you have to know which jobs to reset, so it could take a while to get them all. Finally, the logs can be manually dropped and recreated, but I recommend that approach only as a last resort.

 

 

*DataStage*DSR_SELECT (Action=3); check DataStage is set up correctly in project

Error

Having encountered this DataStage client error in Linux a few times recently, I thought I would document the solution, which has worked for me.

Error Message:

Error calling subroutine: *DataStage*DSR_SELECT (Action=3); check DataStage is set up correctly in project

(Subroutine failed to complete successfully (30107))

Probable Cause of Error

  • The node agent (NodeAgents.sh) has stopped running
  • Insufficient /tmp disk space

Triage Approach

To fix this error in Linux:

  • Ensure disk space is available; you may want to clean up the /tmp directory of any excess, non-required files.
  • Start NodeAgents.sh, if it is not running

Command to verify Node Agent is running

ps -ef | grep java | grep Agent

 

Command to Start Node Agent

This example command assumes the shell script is in its normal location; if not, you will need to adjust the path.

/opt/IBM/InformationServer/ASBNode/bin/NodeAgents.sh start

Node Agent Logs

These logs may be helpful:

  • asbagent_startup.err
  • asbagent_startup.out

Node Agent Logs Location

This command will get you to where the logs are normally located:

cd /opt/IBM/InformationServer/ASBNode/

InfoSphere Datastage – How to Improve Sequential File Performance Using Parallel Environment Variables

APT_FILE_EXPORT_BUFFER_SIZE and APT_FILE_IMPORT_BUFFER_SIZE in DataStage Administrator

While extensive use of sequential files is not a best practice, sometimes there is no way around it, due to legacy systems and/or existing processes. However, recently, I have encountered a number of customers who are seeing significant performance issues with sequential-file-intensive processes. Sometimes it’s the job design, but often, when you look at the project configuration, they still have the default values. This is a quick and easy thing to check and adjust to get a quick performance win, if the values have not already been adjusted.

These are delivered variables but should seriously be considered for adjustment in nearly all DataStage ETL projects. The adjustment must be based on the amount of available memory, the volume of workload that is sequential-file intensive, and the environment you’re working in. Some experimentation may be required, but I have provided a few recommendations below.

 

Environment Variable Properties

 

Category | Type | Parameter Name | Prompt | Size | Default Value
Parallel > Operator Specific | String | APT_FILE_EXPORT_BUFFER_SIZE | Sequential write buffer size | Adjustable in 8 KB units. Recommended values – Dev: 2048; Test & Prod: 4096. | 128
Parallel > Operator Specific | String | APT_FILE_IMPORT_BUFFER_SIZE | Sequential read buffer size | Adjustable in 8 KB units. Recommended values – Dev: 2048; Test & Prod: 4096. | 128
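
To put those units in perspective: the default of 128 works out to 128 x 8 KB = 1 MB per buffer, while the recommended settings of 2048 and 4096 work out to 16 MB and 32 MB respectively.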

Related References

 

 

InfoSphere DataStage – How to calculate age in a transformer

Age

Occasionally, there is a need to calculate the age between two dates for any number of reasons; for example, the age of a person, of an asset, or of an event.  So, having recently had to think about how to do this in a DataStage transformer, rather than in SQL, I thought it might be good to document a couple of approaches which can provide the age.  This code does it at the year level; however, if you need the decimal digits or other handling, the rounding within the DecimalToDecimal function can be changed accordingly.

Age Calculation using Julian Date

DecimalToDecimal((JulianDayFromDate(Lnk_In_Tfm.PROCESSING_DT) - JulianDayFromDate(<<Input date (e.g. Date of Birth)>>)) / 365.25, 'trunc_zero')

Age Calculation using Julian Date with Null Handling

If a date can be missing from your source input data, then null handling is recommended to prevent job failure.  This code uses 1901-01-01 as the null replacement value, but it can be any date your business requirements stipulate.

DecimalToDecimal((JulianDayFromDate(Lnk_In_Tfm.PROCESSING_DT) - JulianDayFromDate(NullToValue(<<Input date (e.g. Date of Birth)>>, StringToDate('1901-01-01', "%yyyy-%mm-%dd")))) / 365.25, 'trunc_zero')

Calculate Age Using DaysSinceFromDate

DecimalToDecimal(DaysSinceFromDate(<<Processing or current date>>, <<Input date (e.g. Date of Birth)>>) / 365.25, 'trunc_zero')

Calculate Age Using DaysSinceFromDate with Null Handling

Here is a second example of null handling being applied to the input data.

DecimalToDecimal(DaysSinceFromDate(<<Processing or current date>>, NullToValue(<<Input date (e.g. Date of Birth)>>, StringToDate('1901-01-01', "%yyyy-%mm-%dd"))) / 365.25, 'trunc_zero')

 

 

 

 

Surrogate Key File Effective Practices

Surrogate Key File Effective Practices

Here are a few thoughts on effectively working with IBM InfoSphere Information Server DataStage surrogate key files, which may prove useful for developers.

 

Placement

  • The main thing about placement is that it should be in a consistent location. Developers and production support teams should not need to guess or look up where it is for every DataStage project. So, it is best to put the surrogate key files in the same base path, with each project having its own subfolder, to facilitate migrations and to reduce the possibility of human error. Here is the path structure which is commonly used:

Path

  • /data/SRKY/<<Project Name>>

Parameter Sets

  • As a best practice, the surrogate key file path should be in a parameter set and the parameter used in the jobs, as needed.  This simplifies maintenance, if and when changes to the path are required, and during migrations.

Surrogate Key Parameter Set Screenshot – Example Parameter Tab

Surrogate Key Parameter Set Screenshot – Example Values Tab

Surrogate Key Job Parameter Example Path Using Parameter

Permissions

  • DataStage must have permissions to:
    • The entire parent path
    • The project folder, and
    • The surrogate key files themselves.

To ensure that DataStage has access to the path and surrogate key files, ensure the following (example commands follow the list):

  • The ‘dsadm’ (owner) and ‘dstage’ (group) have access to the folders in the path, with at least a “-rw-r--r--“ (644) permissions level. Keeping the permissions to a minimum can, for obvious reasons, prevent inadvertent overwrites of the surrogate key files; thus avoiding some potentially serious cleanup.
  • The ‘dsadm’ (owner) and ‘dstage’ (group) have access to the surrogate key files
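
For example (a minimal sketch, assuming the common path above and a hypothetical project named MyProject; adjust ownership and permissions to your site standards):

chown -R dsadm:dstage /data/SRKY/MyProject      # owner dsadm, group dstage on the project folder and its files
chmod 644 /data/SRKY/MyProject/*                # minimum file-level permissions (owner read/write, group/other read)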

Surrogate Key File Minimum Permissions