Datastage Large VARCHAR Performance

IBM Infosphere Information Server (IIS), Datastage Large VARCHAR Performance

IBM Infosphere Information Server (IIS)

To the IBM Infosphere Information Server Parallel Environment Variables, I would added APT_COMPRESS_BOUNDED_FIELDS variable under “Reading and Writing Files” or, perhaps, create a new category called “Performance” for VARCHAR handling.  Using this parameter can help with overall performance of jobs having a significant number of Varchar fields, especially, large Varchar fields.  This parameter can also reduce sort and dataset storage space consumption within parallel jobs. This parameter is applicable to IBM Infosphere Information Server 8.0.1 fp3 and more recent versions.

To enable the parameter ensure the APT_COMPRESS_BOUNDED_FIELDS is set to 1 in your project environment variables, you may need to add the variable. I used these properties:

 

Category Name Type Prompt Value
User defined APT_COMPRESS_BOUNDED_FIELDS String VARCHAR Compression for Performance 1

 

To further improve performance:

  • Set VARCHAR field length to zero. This will avoid column padding, thereby, reducing I/O and space consumed by Datasets and Sorts
  • Avoid writing to files, if possible.
  • Avoid use of Datasets, if possible; especially, persistent datasets.
  • Avoid use of sorts within the Datastage job, if possible.
  • Pay attention to your partitioning strategy within the Datastage job to avoid unintentional sort operations.

One thought on “Datastage Large VARCHAR Performance

  1. Pingback: IBM InfoSphere DataStage – Parallel Environment Variables | Scientia Compendium

Comments are closed.