What Are The DataStage / QualityStage Join Stages?

Three Stages Which Join Records

While chasing an error that applied only to join-type stages, I thought it might be nice to identify what the InfoSphere Information Server DataStage / QualityStage join stages are.  There are three of them, as you can see from the picture above:

  • Join Stage,
  • Lookup Stage,
  • and Merge Stage.

All three stages join data based on the values of identified key columns.
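As a rough analogy only (the stages are configured in the DataStage Designer, not written in SQL), matching records on key columns resembles a SQL join. The sketch below assumes two hypothetical tables, orders and customers, with customer_id standing in for the identified key column; none of these names come from DataStage.

```sql
-- Hypothetical tables: orders (primary input) and customers (reference input).
-- customer_id plays the role of the identified key column.
SELECT o.order_id,
       o.customer_id,
       c.customer_name
FROM   orders AS o
LEFT JOIN customers AS c
       ON c.customer_id = o.customer_id;
```

A left outer join keeps every orders row even when no customers row matches; switching to an inner join would drop the unmatched rows instead.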

Related References

IBM Knowledge Center, InfoSphere Information Server 11.7.0, InfoSphere DataStage and QualityStage, Developing parallel jobs, Processing Data, Lookup Stage

IBM Knowledge Center, InfoSphere Information Server 11.7.0, InfoSphere DataStage and QualityStage, Developing parallel jobs, Processing Data, Join Stage

IBM Knowledge Center, InfoSphere Information Server 11.7.0, InfoSphere DataStage and QualityStage, Developing parallel jobs, Processing Data, Merge Stage

https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.7.0/com.ibm.swg.im.iis.ds.parjob.dev.doc/topics/c_deeref_Merge_Stage.html

Parallel jobs on Windows fail with APT_IOPort::readBlkVirt error

APT_IOPort::readBlkVirt Error Screenshot

This is a known error on Windows systems and applies to DataStage and QualityStage jobs using any or all of the three join-type stages (Join, Merge, and Lookup).

Error Message

  • <<Link name>>,0: APT_IOPort::readBlkVirt: read for block header, partition 0, [fd 4], returned -1 with errno 10,054 (Unknown error)

Message ID

  • IIS-DSEE-TFIO-00223

Applies To

  • Windows systems only
  • Parallel Engine jobs using the three join-type stages (Join, Merge, and Lookup). It does not apply to Server Engine jobs.
  • InfoSphere Information Server (IIS), DataStage and QualityStage 9.1 and higher

The Fix

  • Add the APT_NO_IOCOMM_OPTIMIZATION environment variable in the project Administrator and set it to blank or 0. I left it blank so it would not impact other jobs.
  • Add the environment variable to the job producing the error and set it to 1.

What APT_NO_IOCOMM_OPTIMIZATION Does

  • Sets the use of shared memory as the transport type, rather than using the default sockets transport type.
  • Note that in most cases the sockets transport type is faster, so you likely will not want to set this across the project as the default for all jobs. It is best to apply it as necessary for problematic jobs.

Related References

InfoSphere DataStage and QualityStage, Version 9.1 Job Compatibility

IBM Support, JR54078: PARALLEL JOBS ON WINDOWS FAIL WITH APT_IOPORT::READBLKVIRT; ERROR

IBM Support, Information Server DataStage job fails with unknown error 10,054.

 

SQL server table Describe (DESC) equivalent

Transact SQL (T-SQL)

Microsoft SQL Server doesn’t seem to have a DESCRIBE command, and folks usually seem to want to build a stored procedure to get describe-like behavior.  However, this is not always practical, depending on your permissions. So, the simple SQL below will provide describe-like information in a pinch.  You may want to dress it up a bit, but I usually just use it raw, as shown below, after adding the table name.

Describe T-SQL Equivalent

Select *
From INFORMATION_SCHEMA.COLUMNS
Where TABLE_NAME = '<<TABLENAME>>';
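If the raw output is wider than you need, one way to dress it up is to select only the columns a DESCRIBE would typically show. This is a minimal sketch; the schema name 'dbo' and the table name 'MyTable' are placeholders for illustration, not values from this post.

```sql
-- Placeholder schema and table names: replace 'dbo' and 'MyTable' with your own.
SELECT COLUMN_NAME,
       DATA_TYPE,
       CHARACTER_MAXIMUM_LENGTH,
       IS_NULLABLE,
       COLUMN_DEFAULT
FROM   INFORMATION_SCHEMA.COLUMNS
WHERE  TABLE_SCHEMA = 'dbo'
  AND  TABLE_NAME   = 'MyTable'
ORDER BY ORDINAL_POSITION;  -- list columns in their defined order
```

Filtering on TABLE_SCHEMA as well as TABLE_NAME avoids mixing results when the same table name exists in more than one schema.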

Related References

Microsoft SQL Server – Useful links

Microsoft SQL Server 2017

Here are a few references for the Microsoft SQL Server 2017 database, which may be helpful.

Table Of Useful Microsoft SQL Server Database References

Reference Type | Link
SQL Server 2017 Download Page | https://www.microsoft.com/en-us/sql-server/sql-server-downloads
SQL Server version, edition, and update level | https://support.microsoft.com/en-us/help/321185/how-to-determine-the-version–edition-and-update-level-of-sql-server-a
SQL Server 2017 Release Notes | https://docs.microsoft.com/en-us/sql/sql-server/sql-server-2017-release-notes
SQL Server Transact SQL Commands | https://technet.microsoft.com/en-us/library/ms189826(v=sql.90).aspx


What is Source Control?

Acronyms, Abbreviations, Terms, And Definitions

Source control is an information technology environment management system for storing, tracking, and managing changes to software. This is commonly done by creating branches (copies for safely creating new features) off of the stable master version of the software, then merging stable feature branches back into the master version. Source control is also known as version control or revision control.

DataStage – How to Pass the Invocation ID from one Sequence to another

DataStage Invocation ID Passing Pattern Overview

When you are controlling a chain of sequences in the job stream and taking advantage of reusable (multiple instance) jobs, it is useful to be able to pass the Invocation ID from the master controlling sequence and have it passed down and assigned to the job run.  This can easily be done, without needing to manually enter the values in each of the sequences, by leveraging the DSJobInvocationId variable.  For this to work:

  • The job must have ‘Allow Multiple Instance’ enabled
  • The parent sequence must have the Invocation Name entered
  • The receiving child sequence must have the invocation variable (DSJobInvocationId) entered
  • At runtime, DataStage generates an instance of the multi-instance job for that Invocation ID, with its own logs.

Variable Name

  • DSJobInvocationId

Note

This approach allows for the reuse of jobs and the assignment of meaningful instance extension names, which are managed from a single point of entry in the object tree.

Related References: 

IBM Knowledge Center > InfoSphere Information Server 11.5.0

InfoSphere DataStage and QualityStage > Designing DataStage and QualityStage jobs > Building sequence jobs > Sequence job activities > Job Activity properties

Why Consilience Is Important?

Tree of knowledge

What is Consilience?

Consilience is the confluence of concepts and/or principles from different disciplines, especially when forming a comprehensive unifying theory.

Independent Confirmation

Why are some inventions discovered at the same time in different parts of the world? Does this have something to do with the scientific process of “sharing important discoveries?” Generally, scientists believe that they are part of a community of knowledge. Their discoveries do not occur in a vacuum. They must give credit to those who went before and created the foundation for their work. Therefore, when they discover something new, they are required to share it with the entire world. This sharing is part of knowledge evolution. Interestingly enough, it is also key to the World Wide Web. Collaboration is one of the key strengths of the Internet. It is a way to increase overall knowledge of Planet Earth. Science can also increase the strength of their theories through independent confirmation.

Result Conciliation

There are oftentimes prescriptions for the types and numbers of witnesses to accomplish certain legal requirements. Anyone who has completed an experiment understands the importance of result conciliation. A hypothesis is not accepted as true unless its results can be repeated by independent sources. This shows that the reality is objective. The word Consilience was formed from two Latin elements: “com” meaning “together” and the suffix “-silience” meaning “jumping.” Therefore, Consilience means “jumping together” or a “convergence of proof from independent sources.” Scientists should use different methods to reach the same conclusion. Business and economics have a similar concept. Just think of the concept of a Recession or Depression. These are officially declared when a variety of indicators are in agreement: stock market, employment, inflation, money supply, and so forth.

Knowledge Evolution

Consulting can use the concept of Consilience to teach firms how to follow objective norms. Technology consulting can compare a subjective company’s practices to objective industry norms. The best career development is successful based on objective, independent analysis. The concordance of evidence can help a business create a successful strategy. Consilience is the convergence of evidence from independent sources to prove the validity of a conclusion. Objective corporate success can be achieved by satisfying objective needs of your customers. Business intelligence requires an objective standard, such as Consilience to be useful.

Conclusion

Consilience is important to you because the answer to any given problem may not necessarily come from within your field of expertise and experience. Rather, to be truly competitive in an ever-increasing world of knowledge, we need to adopt a broad-scoped renaissance approach to learning and thinking, which folds in other sets of concepts and principles to create durable solutions for today and tomorrow.