Infosphere DataStage – Boolean Handling for Netezza

Datastage Director Message - Numeric string expected

Datastage Director Message – Numeric string expected

 

Beware when you see this message when working with Boolean in DataStage, the message displays as informational (at list it did for me) not as a warning or an error.  Even though it seems innocuous, what it meant for my job, was the Boolean (‘true’ / ‘false’) was not being interpreted and everything posted to ‘false’.

In DataStage the Netezza ‘Boolean’ field/Data SQL type maps to the ‘Bit’ SQL type, which expects a numeric input of Zero (0) or one (1).  So, my solution (once I detected the problem during unit testing) was to put Transformer Stage logic in place to convert the Boolean input to the expected number value.

 

Netezza to Datastage Data Type Mapping

Netezza data types

InfoSphere DataStage

data types (SQL types)

Expected Input value

BOOLEAN Bit 0 or 1 (1 = true, 0 = false)

 

Transformer Stage logic Boolean Handling Logic

A Netezza Boolean field can store: true values, false values, and null. So, some thought should be given to you desired data outcome for nulls

This first example sets a that the nulls are set to a specific value, which can support a specific business rule for null handling and, also, provide null handling for non-nullable fields.  Here we are setting nulls to the numeric value for ‘true’ and all other non-true inputs to ‘false’.

If isnull(Lnk_Src_In.USER_ACTIVE) then 1 Else if Lnk_Src_In.USER_ACTIVE = ‘true’ Then 1 Else 0

These second examples sets a that the nulls are set by the Else value, if your logic direction is correct value and still provides null handling for non-nullable fields.

  • If  Lnk_Src_In.USER_ACTIVE = ‘true’ Then 1 Else 0

  • If  Lnk_Src_In.USER_ACTIVE = ‘False’ Then 0 Else 1

Director Log Message

Message ID

  • IIS-DSEE-TBLD-00008

Message Text

  • <<Link Name Where Message Occurred>>: Numeric string expected. Use default value.

Or something like this:

  • <<Link Name Where Message Occurred>>: Numeric string expected for input column ‘<<Field Name Here>>‘. Use default value.

Related References

Boolean

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza user-defined functions, UDX data types reference information, Supported data types, Boolean

https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.udf.doc/r_udf_boolean_datatype.html

Data types and aliases

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza stored procedures, NZPLSQL statements and grammar, Variables and constants, Data types and aliases

https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.sproc.doc/c_sproc_data_types_aliases.html

Logical data types

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Data types, Logical data types

https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_data_types_logical.html

Data type conversions from Netezza to DataStage

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from Netezza to DataStage

https://www.ibm.com/support/knowledgecenter/en/SSZJPZ_11.5.0/com.ibm.swg.im.iis.conn.netezza.use.doc/topics/nzcc_mappingdatatypes.html

Infosphere Datastage – Data type conversions from Netezza to DataStage

IBM Infosphere Information Server (IIS)

IBM Infosphere Information Server (IIS)

 

Some recent research to eliminate some ETL Data Conversion issues, made me want to make an enhancement to the documentation provided by IBM, to prevent repeating the research.

Netezza data types and their equivalent InfoSphere DataStage data types

Netezza data types

InfoSphere DataStage data types (SQL types)

Notes

 
BYTEINT TinyInt
SMALLINT SmallInt
INT Integer
BIGINT BigInt
NUMERIC(p, s) Numeric, decimal, double Decimal and double are aliases of Numeric
FLOAT(p) Float
REAL Real
DOUBLE PRECISION Double
CHAR(n) Char
VARCHAR(n) VarChar
NCHAR(n) NChar
NVARCHAR(n) NVarChar
BOOLEAN Bit
DATE Date
TIME Time
TIME WITH TIME ZONE VarChar The value of time with time zone will be returned without the time zone information.
TIMESTAMP TimeStamp
INTERVAL VarChar You cannot load the interval data type from an external table.
ROWID BigInt
TRANSACTION ID BigInt
DATASLICE Integer

Related References

Data type conversions from Netezza to DataStage

InfoSphere Information Server,InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from Netezza to DataStage

Netezza / PureData – What is the maximum length of CHAR, VARCHAR, and NVARCHAR fields

IBM Netezza / IBM PureData for Analytics

IBM Netezza / IBM PureData for Analytics

 

 

During a recent project, I had reason to investigate what the maximum character field links for Netezza to be able to fit the data ETL ‘ed from the source.  The constraints on Netezza character fields are not as long as some other databases, therefore, I thought I would document and a quick post for future reference.   Knowing these limits will help with your ETL design and construction but you may know where to truncate your field input data.

 

Data type Alias names InfoSphere DataStage data types (SQL type) Maximum Length Notes
CHAR CHARACTER, CHAR(n), CHARACTER(n) Char 64,000 Fixed-length character string, blank padded to length n. If you do not specify n, the default is an unsized CHAR value.
VARCHAR CHARACTER VARYING, VARCHAR(n), CHARACTER VARYING(n), CHAR VARYING(n) VarChar, LongVarChar 64,000 Variable length character string to a maximum length of n. If you do not specify n, the default is an unsized VARCHAR value. There is no blank padding, and the value is stored as entered.
NCHAR NATIONAL CHARACTER, NATIONAL CHAR(n), NCHAR(size) NChar, Char, 16,000 Fixed-length character string, blank padded to length n. If you do not specify n, the default is an unsized NCHAR value.
NVARCHAR NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, NATIONAL CHAR VARYING(n), NATIONAL CHARACTER VARYING(n), and NVARCHAR(n) VarChar, NVarChar ,LongVarChar, LongNVarChar, VarChar 16,000 Variable length character string to a maximum length of n. If you do not specify n, the default is an unsized NVARCHAR value.

Related References

Data types and aliases

PureData System for Analytics ,PureData System for Analytics 7.2.1, IBM Netezza stored procedures, NZPLSQL statements and grammar, Variables and constants, Data types and aliases

Data type conversions from DataStage to Netezza

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from DataStage to Netezza

IBM InfoSphere DataStage – Parallel Environment Variables

Most of this list of Parallel Environment Variables can be found in the IBM InfoSphere DataStage, Version 11.5 documentation.  However, I have started to find variables,  which I use and are not included in the IBM list.  So, for simplicity, I will make additions and clarifications to the IBM list, as I run across them, on this page.

Performance Tuning

 These environment variable are frequently used in tuning Datastage performance.

  • APT_BUFFER_FREE_RUN (See, also, Buffering)
  • APT_BUFFER_MAXIMUM_MEMORY (See, also,  Buffering)
  • APT_COMPRESS_BOUNDED_FIELDS
  • APT_FILE_IMPORT_BUFFER_SIZE  (See, also, Reading and Writing Files)
  • APT_FILE_EXPORT_BUFFER_SIZE (See, also, Reading and Writing Files)
  • TMPDIR (This variable also specifies the directory for Netezza log files on all operating systems.  Setting TMDIR paths is a Netezza best practice.)

Buffering

These environment variable are all concerned with the buffering InfoSphere DataStage performs on stage links to avoid deadlock situations

  • APT_BUFFER_FREE_RUN
  • APT_BUFFER_MAXIMUM_MEMORY
  • APT_BUFFER_MAXIMUM_TIMEOUT
  • APT_BUFFER_DISK_WRITE_INCREMENT
  • APT_BUFFERING_POLICY
  • APT_DISABLE_ROOT_FORKJOIN

Building Custom Stages

These environment variables are concerned with the building of custom operators that form the basis of customized stages.

  • APT_BUILDOP_SOURCE_CHARSET
  • APT_SINGLE_CHAR_CASE
  • DS_OPERATOR_BUILDOP_DIR
  • DS_OPERATOR_WRAPPED_DIR
  • OSH_BUILDOP_CODE
  • OSH_BUILDOP_HEADER
  • OSH_BUILDOP_NO_OPTIMIZE
  • OSH_BUILDOP_OBJECT
  • OSH_BUILDOP_WRAPPER
  • OSH_BUILDOP_XLC_BIN
  • OSH_CBUILDOP_XLC_BIN
  • OSH_STDOUT_MSG

Compiler

These environment variables specify details about the C++ compiler used by InfoSphere DataStage in connection with parallel jobs.

  • APT_COMPILER
  • APT_COMPILEOPT
  • APT_LINKER
  • APT_LINKOPT

DB2 Support

These environment variables are concerned with setting up access to DB2 databases from InfoSphere DataStage.

  • APT_DB2INSTANCE_HOME
  • APT_DB2READ_LOCK_TABLE
  • APT_DBNAME
  • APT_DEBUG_DB2
  • APT_RDBMS_COMMIT_ROWS
  • APT_TIME_ALLOW_24
  • DB2DBDFT

Debugging

These environment variables are concerned with the debugging of InfoSphere DataStage parallel jobs.

  • APT_DEBUG_OPERATOR
  • APT_DEBUG_MODULE_NAMES
  • APT_DEBUG_PARTITION
  • APT_DEBUG_SIGNALS
  • APT_DEBUG_STEP
  • APT_DEBUG_SUBPROC
  • APT_EXECUTION_MODE
  • APT_NO_PM_SIGNAL_HANDLERS
  • APT_PM_DBX
  • APT_PM_SHOW_PIDS
  • APT_PM_XTERM
  • APT_PXDEBUGGER_FORCE_SEQUENTIAL
  • APT_SHOW_LIBLOAD
  • DS_OSH_WRAPPER_DEBUG_CONNECT
  • DS_OSH_WRAPPER_TIMEOUT
  • DS_PXDEBUG

Decimal Support

These environment variables are concerned with support for decimal columns.

  • APT_DECIMAL_INTERM_PRECISION
  • APT_DECIMAL_INTERM_SCALE
  • APT_DECIMAL_INTERM_ROUNDMODE
  • DS_USECOLPREC_FOR_DECIMAL_KEY

Disk I/O

These environment variables are all concerned with when and how InfoSphere DataStage parallel jobs write information to disk.

  • APT_BUFFER_DISK_WRITE_INCREMENT
  • APT_EXPORT_FLUSH_COUNT
  • APT_IO_MAP/APT_IO_NOMAP and APT_BUFFERIO_MAP/APT_BUFFERIO_NOMAP
  • APT_PHYSICAL_DATASET_BLOCK_SIZE

General Job Administration

These environment variables are concerned with details about the running of InfoSphere DataStage and IBM InfoSphere QualityStage® parallel jobs

  • APT_CLOBBER_OUTPUT
  • APT_CONFIG_FILE
  • APT_DISABLE_COMBINATION
  • APT_DONT_COMPRESS_BOUNDED_FIELDS
  • APT_EXECUTION_MODE
  • APT_FILE_EXPORT_ADD_BOM
  • APT_IMPORT_FORCE_QUOTE_DELIM
  • APT_ORCHHOME
  • APT_STARTUP_SCRIPT
  • APT_NO_STARTUP_SCRIPT
  • APT_STARTUP_STATUS
  • DSForceTerminate
  • DSIPC_OPEN_TIMEOUT
  • DSJOB_DOMAIN
  • DSWaitShutdown
  • DS_FORCE_ABORT_AT_WARN_LIMIT
  • DS_LOG_AUTOPURGE_IGNORE_STATUS

Job Monitoring

These environment variables are concerned with the Job Monitor on InfoSphere DataStage.

  • APT_MONITOR_SIZE
  • APT_MONITOR_MINTIME
  • APT_MONITOR_TIME
  • APT_NO_JOBMON
  • APT_PERFORMANCE_DATA

Lookup support

This environment variable is concerned with lookup tables.

  • APT_LUTCREATE_NO_MMAP

Miscellaneous

These environment variables do not fit into the other categories.

  • APT_AGGREGATOR_NULLABLE_OUTPUT
  • APT_COPY_TRANSFORM_OPERATOR
  • APT_DATASET_FLUSH_NOFSYNC
  • APT_DATASET_FLUSH_NOSYNC
  • APT_DATE_CENTURY_BREAK_YEAR
  • APT_DATE_ADD_ROLLOVER
  • APT_DISABLE_FASTALLOC
  • APT_DONT_OPTIMIZE_MODIFY
  • APT_EBCDIC_VERSION
  • APT_FILE_EXPORT_DEFAULTS_TO_CONDUCTOR
  • APT_FIFO_DIRECTORY
  • APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL
  • APT_IMPORT_REJECT_INVALID_CHARS
  • APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS
  • APT_INSERT_COPY_BEFORE_MODIFY
  • APT_ISVALID_BACKCOMPAT
  • APT_OLD_BOUNDED_LENGTH
  • APT_OLD_CUTOFF
  • APT_OUTPUT_LOCALE
  • APT_OPERATOR_REGISTRY_PATH
  • APT_OSL_PARAM_ESC_SQUOTE
  • APT_OSL_RESTORE_BACKSLASH
  • APT_PLAYERS_REPORT_IN
  • APT_PM_ACCEPT_CONNECTION_RETRIES
  • APT_PM_ACCEPT_CONNECTION_TIMEOUT
  • APT_PM_NO_SHARED_MEMORY
  • APT_PM_NO_NAMED_PIPES
  • APT_PM_SCORE_DIR
  • APT_PM_STARTUP_CONCURRENCY
  • APT_RECORD_COUNTS
  • APT_RESPATH
  • APT_SHOW_COMPONENT_CALLS
  • APT_STACK_TRACE
  • APT_SURRKEY_BLOCK_WRITE
  • APT_SURRKEY_LOCKSTATE_RETRIES
  • APT_THREAD_SAFE_FAST_ALLOC
  • APT_TRANSFORM_ABORT_ON_CONVERSION_ERROR
  • APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING
  • APT_TRANSFORM_LOOP_WARNING_THRESHOLD
  • APT_TRANSFORM_OPERATOR_DEBUG
  • APT_USE_CRLF
  • APT_VIEWDATA_TEMP_DIR
  • DSAttLockWait
  • DSE_SLAVE_CLOSE_SOCKET_ON_EXEC
  • DSJobStartedMax
  • DSODBC_EXECUTE_MULTI_STMTS_AS_ONE
  • DSODBC_FATAL_ERRORS
  • DSODBC_NEW_FVMARKERS
  • DSODBC_NO_BIGINT_WARNINGS
  • DSODBC_NO_DB2_DELETE_WARNINGS
  • DSODBC_NO_METADATA_WARNINGS
  • DSR_RECORD_DELETE_BEFORE_WRITERAW
  • DSSocketNotifyTimeout
  • DSWaitResetStartup
  • DSWaitStartup
  • DS_API_DEBUG
  • DS_CHANGE_SEQ_PUT_BEHAVIOR
  • DS_EXECUTE_NO_MASKING
  • DS_HOSTNAME_ALIAS
  • DS_IPCPUT_OLD_TIMEOUT_BEHAVIOR
  • DS_LEGACY_AUDIT
  • DS_LOGDETAIL_INVOCATION_OLDFORM
  • DS_LONG_JOBSTATUS_LOCK
  • DS_MAX_PREC_FOR_FLOATS
  • DS_MMAPPATH
  • DS_MMAPSIZE
  • DS_NO_INSTANCE_PURGING
  • DS_OPTIMIZE_FILE_BROWSE
  • DS_SEQ_BYPASS_CHOWN
  • DS_STATUSSTARTED_CHECK
  • DS_TRX_ALLOW_LINKVARIABLES
  • OSH_PRELOAD_LIBS
  • PX_DBCONNECTHOME
  • DS_USE_OLD_STATUS_PURGE
  • DS_USE_OSHSCRIPT
  • DS_USE_SERVER_AUTH_ONLY
  • ODBCBindingOrder
  • ODBCstripCRLF
  • OSHMON_INIT_RETRY
  • OSHMON_TRACE

Network

These environment variables are concerned with the operation of InfoSphere DataStage parallel jobs over a network.

  • APT_DEBUG_ENVIRONMENT
  • APT_PM_ENV_DEBUG
  • APT_PM_ENV_NOCLOBBER
  • APT_IO_MAXIMUM_OUTSTANDING
  • APT_IOMGR_CONNECT_ATTEMPTS
  • APT_PLAYER_CONNECTION_PORT
  • APT_PM_CONDUCTOR_TIMEOUT
  • APT_PM_CONDUCTOR_HOSTNAME
  • APT_PM_CONNECT_USING_NAME
  • APT_PM_NO_TCPIP
  • APT_PM_NODE_TIMEOUT
  • APT_PM_PLAYER_TIMEOUT
  • APT_PM_SHOWRSH
  • APT_PM_STARTUP_PORT
  • APT_RECVBUFSIZE
  • APT_USE_IPV4

National Language Support (NLS)

These environment variables are concerned with InfoSphere DataStage’s implementation of NLS.

  • APT_COLLATION_SEQUENCE
  • APT_COLLATION_STRENGTH
  • APT_ENGLISH_MESSAGES
  • APT_IMPEXP_CHARSET
  • APT_INPUT_CHARSET
  • APT_OS_CHARSET
  • APT_OUTPUT_CHARSET
  • APT_STRING_CHARSET

Oracle Support

These environment variables are concerned with the interaction between InfoSphere DataStage and Oracle databases.

  • APT_ORACLE_LOAD_OPTIONS
  • APT_ORACLE_NO_OPS
  • APT_ORACLE_PRESERVE_BLANKS
  • APT_ORA_IGNORE_CONFIG_FILE_PARALLELISM
  • APT_ORA_WRITE_FILES
  • APT_ORAUPSERT_COMMIT_ROW_INTERVAL
  • APT_ORAUPSERT_COMMIT_TIME_INTERVAL
  • ODBCKeepSemicolon

Partitioning

The following environment variables are concerned with how InfoSphere DataStage automatically partitions data.

  • APT_NO_PART_INSERTION
  • APT_NO_PARTSORT_OPTIMIZATION
  • APT_PARTITION_COUNT
  • APT_PARTITION_NUMBER

Reading and Writing Files

These environment variables are concerned with reading and writing files.

  • APT_DELIMITED_READ_SIZE
  • APT_FILE_IMPORT_BUFFER_SIZE
  • APT_FILE_EXPORT_BUFFER_SIZE
  • APT_IMPORT_FILE_PATTERN_CMD
  • APT_IMPORT_HANDLE_SHORT
  • APT_IMPORT_PATTERN_USES_CAT
  • APT_IMPORT_PATTERN_USES_FILESET_MOUNTED
  • APT_MAX_DELIMITED_READ_SIZE
  • APT_STRING_PADCHAR

Reporting

These environment variables are concerned with various aspects of InfoSphere DataStage jobs reporting their progress.

  • APT_DUMP_SCORE
  • APT_ERROR_CONFIGURATION
  • APT_MSG_FILELINE
  • APT_PM_PLAYER_MEMORY
  • APT_PM_PLAYER_TIMING
  • APT_RECORD_COUNTS
  • OSH_DUMP
  • OSH_ECHO
  • OSH_EXPLAIN
  • OSH_PRINT_SCHEMAS

SAS Support

These environment variables are concerned with InfoSphere DataStage interaction with SAS.

  • APT_HASH_TO_SASHASH
  • APT_NO_SASOUT_INSERT
  • APT_NO_SAS_TRANSFORMS
  • APT_SAS_ACCEPT_ERROR
  • APT_SAS_CHARSET
  • APT_SAS_CHARSET_ABORT
  • APT_SAS_COMMAND
  • APT_SASINT_COMMAND
  • APT_SAS_DEBUG
  • APT_SAS_DEBUG_IO
  • APT_SAS_DEBUG_LEVEL
  • APT_SAS_DEBUG_VERBOSE
  • APT_SAS_S_ARGUMENT
  • APT_SAS_NO_PSDS_USTRING
  • APT_SAS_SCHEMASOURCE_DUMP
  • APT_SAS_SHOW_INFO
  • APT_SAS_TRUNCATION

Sorting

The following environment variables are concerned with how InfoSphere DataStage automatically sorts data.

  • APT_NO_SORT_INSERTION
  • APT_SORT_INSERTION_CHECK_ONLY
  • APT_TSORT_NO_OPTIMIZE_BOUNDED
  • APT_TSORT_STRESS_BLOCKSIZE

Teradata Support

The following environment variables are concerned with InfoSphere DataStage interaction with Teradata databases.

  • APT_TERA_64K_BUFFERS
  • APT_TERA_NO_ERR_CLEANUP
  • APT_TERA_NO_PERM_CHECKS
  • APT_TERA_NO_SQL_CONVERSION
  • APT_TERA_SYNC_DATABASE
  • APT_TERA_SYNC_PASSWORD
  • APT_TERA_SYNC_USER

Transport Blocks

The following environment variables are all concerned with the block size used for the internal transfer of data as jobs run. Some of the settings only apply to fixed length records.

  • APT_AUTO_TRANSPORT_BLOCK_SIZE
  • APT_LATENCY_COEFFICIENT
  • APT_DEFAULT_TRANSPORT_BLOCK_SIZE
  • APT_MAX_TRANSPORT_BLOCK_SIZE
  • APT_MIN_TRANSPORT_BLOCK_SIZE

Related References

TMPDIR

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Environment variables: Netezza connector, TMPDIR

Environment Variables

InfoSphere Information Server, InfoSphere Information Server 11.5.0, InfoSphere DataStage and QualityStage, Reference, Parallel Job Reference, Environment Variables

Environment Variables For The Parallel Engine

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Installing, Installing IBM InfoSphere Information Server software, Configuring software, Configuring a parallel processing environment, Setting environment variables for the parallel engine, Environment variables for the parallel engine