What Is A Code Snippet?

Acronyms, Abbreviations, Terms, And Definitions


A code snippet is a programming term for a small piece of reusable source code, available in either binary or text form. Code snippets are commonly defined as units or functional methods that can be readily integrated into larger modules to provide functionality. The term also refers to the practice of minimizing the repetition of code that is common to many applications.

Java programmers use code snippets as an informative means of supporting the coding process. Typically, a snippet shows an entire functional unit, such as a small program, a single function, a class, a template, or a group of related functions.

Programmers use code snippets for several purposes. For example, they use a snippet to show code as a proven solution to a given problem, to illustrate programming "tricks" of non-trivial implementation, or to highlight the peculiarities of a given compiler. Some people use snippets as examples of code portability, or simply to reduce programming time. Organized, thematic collections of snippets form a digital collection of tips and tricks and act as a source for learning and refining programming.

A snippet is short and fulfills a particular task well; it does not need any extra code beyond the standard library and system-dependent code. A snippet is not a complete program; for that, you would submit the code to a source code repository, which is the best place to manage larger programs. Ideally, a snippet should be a section of code that you can snip out of a larger program and easily reuse in another. To make snippets simple to use, it is good practice to encapsulate them in a function or class, potentially as a framework for starting a new program.

For a programmer, having good code snippets is very important. People use different ways to keep their code with them, and there are many online solutions as well. Having good code in hand is very important to delivering a best-in-class product. Snippets should always be modular and portable, so they can be plugged into your code easily. Many people use GitHub Gists to keep their snippets, and Ruby programmers use modules to create code snippets.
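As a minimal illustration, using SQL (the language used for most examples on this blog), the sketch below encapsulates a small piece of reusable logic as a view so it can be plugged into larger queries; the view, column, and record_status names are hypothetical.

-- Hypothetical snippet: encapsulate a reusable filter as a view
CREATE OR REPLACE VIEW Blog.active_host_v AS
SELECT hostid
FROM Blog.host_dim
WHERE record_status = 'ACTIVE';

-- Reuse of the snippet inside a larger query
SELECT f.*
FROM Blog.job_stage_fact f
JOIN Blog.active_host_v v ON f.hostid = v.hostid;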


Netezza / PureData – How to add a Foreign Key

DDL (Data Definition Language), Netezza PureData How to add a Foreign Key

DDL (Data Definition Language)

Adding a foreign key to tables in Netezza / PureData is a best practice, especially when working with dimensionally modeled data warehouse structures and with modern governance, integration (including virtualization), and presentation semantics (including reporting, business intelligence, and analytics).

Foreign Key (FK) Guidelines

  • A primary key must be defined on the table and field (or fields) to which you intend to link the foreign key
  • Avoid using distribution keys as foreign keys
  • Foreign Key field should not be nullable
  • Your foreign key link field(s) must be of the same format(s) (e.g. integer to integer, etc.)
  • Apply standard naming conventions to constraint name:
    • FK_<<Constraint_Name>>_<<Number>>
    • <<Constraint_Name>>_FK<<Number>>
  • Please note that foreign key constraints are not enforced in Netezza

Steps to add a Foreign Key

The process for adding foreign keys involves just a few steps:

  • Verify guidelines above
  • Alter table add constraint SQL command
  • Run statistics, which is optional but strongly recommended (a sketch follows the example foreign key command below)

Basic Foreign Key SQL Command Structure

Here is the basic syntax for adding a foreign key:

ALTER TABLE <<Owner>>.<<NAME_OF_TABLE_BEING_ALTERED>>

ADD CONSTRAINT <<Constraint_Name>>_fk<<Number>>

FOREIGN KEY (<<Field_Name or Field_Name List>>) REFERENCES <<Owner>>.<<Target_FK_Table_Name>> (<<Field_Name or Field_Name List>>) <<ON UPDATE | ON DELETE>> <<Action>>;

Example Foreign Key SQL Command

This is a simple, one-field example of a foreign key (FK):

 

ALTER TABLE Blog.job_stage_fact

ADD CONSTRAINT job_stage_fact_host_dim_fk1

FOREIGN KEY (hostid) REFERENCES Blog.host_dim(hostid) ON DELETE cascade ON UPDATE no action;
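
To cover the last step in the list above, a minimal sketch of gathering statistics after the constraint has been added might look like the following, reusing the same example table.

-- Optional, but strongly recommended after altering the table
GENERATE STATISTICS ON Blog.job_stage_fact;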

Related References

Alter Table

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL command reference, Alter Table, constraints

 

 

Database – What is a foreign key?

Acronyms, Abbreviations, Terms, And Definitions, DDL (Data Definition Language), What is a foreign key

Acronyms, Abbreviations, Terms, And Definitions

Definition of a Foreign Key

  • A foreign Key (FK) is a constraint that references the unique primary key (PK) of another table.

Facts About Foreign Keys

  • Foreign Keys act as a cross-reference between tables, linking the foreign key (child record) to the primary key (parent record) of another table, establishing a link/relationship between the table keys
  • Foreign keys are not enforced by all RDBMS
  • The concept of referential integrity is derived from foreign key theory
  • Because Foreign keys involve more than one table relationship, their implementation can be more complex than primary keys
  • A foreign-key constraint implicitly defines an index on the foreign-key column(s) in the child table; however, manually defining a matching index may improve join performance in some databases
  • SQL normally provides the following referential integrity actions for deletions when enforcing foreign keys

Cascade

  • The deletion of a parent (primary key) record may cause the deletion of corresponding foreign-key records.

No Action

  • Forbids the deletion of a parent (primary key) record if there are dependent foreign-key records. No Action does not mean the foreign-key constraint is suppressed.

Set null

  • The deletion of a parent (primary key) record causes the corresponding foreign-key to be set to null.

Set default

  • The deletion of a parent (primary key) record causes the corresponding foreign keys to be set to a default value instead of null
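
As a sketch of how one of these actions appears in SQL, the hypothetical constraint below sets the child key to null when the parent record is deleted; the table and column names are illustrative only, the child column must be nullable for SET NULL to work, and (as noted elsewhere) not all RDBMS, including Netezza, enforce the constraint.

-- Hypothetical example: the child foreign key is set to NULL when the parent is deleted
ALTER TABLE Blog.order_fact
ADD CONSTRAINT order_fact_customer_dim_fk1
FOREIGN KEY (customerid) REFERENCES Blog.customer_dim (customerid)
ON DELETE SET NULL ON UPDATE NO ACTION;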


Netezza / PureData – How to Get the Number for the Day of the Week in SQL


Netezza / PureData – Numeric Day of Week

I had reason today to get the number of the day of the week in PureData / Netezza, which I don’t seem to have discussed in previous posts.  So, here is a simple script to get the number of the day of the week, in a couple of flavors, which may prove useful.

Basic Format

select extract(dow from <<FieldName>>) from <<SchemaName>>.<<tableName>>

Example SQL

 

SELECT

CURRENT_DATE

,  TO_CHAR(CURRENT_DATE,'DAY') AS DAY_OF_WEEK

--WEEK STARTS ON MONDAY

,  EXTRACT(DOW FROM CURRENT_DATE)-1 AS DAY_OF_WEEK_NUMBER_STARTS_ON_MONDAY

--WEEK STARTS ON SUNDAY

,  EXTRACT(DOW FROM CURRENT_DATE) AS DAY_OF_WEEK_NUMBER_STARTS_ON_SUNDAY

--WEEK STARTS ON SATURDAY

,  EXTRACT(DOW FROM CURRENT_DATE)+1 AS DAY_OF_WEEK_NUMBER_STARTS_ON_SATURDAY

FROM _V_DUAL;
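
As a quick usage sketch, the same day-of-week number can drive a WHERE clause; the table and date column below are hypothetical.

-- Hypothetical example: keep only rows that fall on a Sunday
-- (EXTRACT(DOW ...) = 1 when the week is counted as starting on Sunday, per the example above)
SELECT *
FROM Blog.job_stage_fact
WHERE EXTRACT(DOW FROM load_date) = 1;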

 

Related References

Extract date and time values

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Functions and operators, Functions, Extract date and time values

Date/time functions

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Netezza SQL extensions, Date/time functions

Template patterns for date/time conversions

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Netezza SQL extensions, Conversion functions, Template patterns for date/time conversions

https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_ntz_sql_extns_templ_patterns_date_time_conv.html

Database – What is a Composite Primary Key?

Database Table


What is a Composite Primary Key?

A composite primary key is a primary key that is defined by having multiple fields (columns) in it.  Like a primary key, exactly what a composite primary key is depends on the database.  Essentially, a composite primary key:

  • Is a combination of Fields (columns) which uniquely identifies every row.
  • Is an index in database systems which use indexes for optimization
  • Is a type of table constraint
  • Is applied with a data definition language (DDL) alter command (see the sketch after this list)
  • May define a parent-child relationship between tables
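
As a minimal sketch of that DDL alter command, the statement below adds a two-column composite primary key; the constraint, table, and column names are hypothetical, and, as with foreign keys, Netezza does not enforce the constraint.

-- Hypothetical example: two columns together uniquely identify each row
ALTER TABLE Blog.job_stage_fact
ADD CONSTRAINT job_stage_fact_pk
PRIMARY KEY (jobid, hostid);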


Data Modeling – Dimension Table Effective Practices

Database Table


I’ve had these notes lying around for a while, so I thought I would consolidate them here.  Here are a few guidelines to ensure the quality of your dimension table structures.

Dimension Table Effective Practices

  • The table naming convention should identify it as a dimension table. For example:
    • Suffix Pattern:
      • <<TableName>>_Dim
      • <<TableName>>_D
    • Prefix Pattern:
      • Dim_<<TableName>>
      • D_<<TableName>>
  • Have a Primary Key (PK) assigned on the table surrogate key
  • Audit fields – Type 1 dimensions should:
    • Have a Created Date timestamp – When the record was initially created
    • Have a Last Update Timestamp – When the record was last updated
  • Job Flow: Do not place the dimension processing in the fact jobs.
  • Every dimension should have a Zero (0), ‘Unknown’, row
  • Fields should be ‘NOT NULL’, replacing nulls with a zero (0) for numeric and integer type fields, or a space (‘ ‘) for character type fields.
  • Keep dimension processing outside of the fact jobs (a sample DDL sketch follows this list)
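
A minimal sketch of a dimension table shell that follows these practices might look like the following; all names and data types are hypothetical.

-- Hypothetical example: dimension table with surrogate key, NOT NULL fields, and Type 1 audit fields
CREATE TABLE Blog.host_dim
(
    host_srky        BIGINT        NOT NULL,  -- surrogate key
    host_name        VARCHAR(255)  NOT NULL,
    created_date     TIMESTAMP     NOT NULL,  -- when the record was initially created
    last_update_date TIMESTAMP     NOT NULL   -- when the record was last updated
);

-- Primary key on the surrogate key (documents intent; not enforced by Netezza)
ALTER TABLE Blog.host_dim
ADD CONSTRAINT host_dim_pk PRIMARY KEY (host_srky);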


Datastage – When checking operator: Operator of type “APT_TSortOperator”: will partition despite the preserve-partitioning flag on the data set on input port 0

APT_TSortOperator Warning


The APT_TSortOperator warning happens when there is a conflict in the partitioning behavior between stages.  Usually, this is because the successor (downstream) stage has the ‘Partitioning / Collecting’ and ‘Sorting’ properties set in a way that conflicts with the predecessor (upstream) stage’s properties, which are set to preserve partitioning.  This can occur when the predecessor stage has the “Preserve Partitioning” property set to:

  • ‘Default (Propagate)’
  • ‘Propagate’, or
  • ‘Set’

Preserve Partitioning Property – list

Message ID

  • IIS-DSEE-TFOR-00074

Message Text

  • <<Link Name Where Warning Occurred>>: When checking operator: Operator of type “APT_TSortOperator”: will partition despite the preserve-partitioning flag on the data set on input port 0.

Warning Fixes

  • First, verify whether the partitioning behaviors of both stages are correct
  • If so, set the predecessor ‘Preserve Partitioning’ property to “Clear”
  • If not, then correct the partitioning behavior of the stage which is in error

Clear Partitioning Property Screenshot


Preserve Partitioning Property – Set To Clear

Infosphere DataStage – Designer Client Repository Structure

Default Repository Structure

When a project is created, there is a default repository structure created for use in the DataStage designer client.

Default DataStage Repository Structure


However, some additional organization will be required for most DataStage projects.  Usually, this organization occurs in these areas:

  • Addition of structure within the “Jobs” folder
  • Addition of a “Parameter Sets” folder
  • Addition of structure within the “Table Definitions” folder
  • Addition of a “Developer Work Area” folder

Repository Structure within the “Jobs” folder

Below is a sample of a folder structure for multiple applications that share a common repository.  The pattern includes, but does not illustrate, all other delivered folders. In addition to the core folder structure, developers can create individual working, test, and in-progress folders, which do not migrate, but keep work segregated.

Jobs Folder Pattern Datastage Repository Structure


Parameter Sets Folders

The parameter set folders are for two sets of information.

  • First are the database parameters, which include data connections and the attached parameter sets.
  • Second are the job parameters, which may include parameter sets for things like e-mail parameters, surrogate key file paths, etc.; using parameter sets for these is a best practice, rather than creating them as project-level parameters.
Parameter Sets Folder Pattern Datastage Repository Structure


Table Definitions

The Table Definitions folder has folders added to segregate the imported metadata for source and target systems and, in some cases, may need folders to logically organize imported metadata which may reside within the same database and/or schema, but belong to different logical layers.

Table Definitions Folder Pattern DataStage Repository Structure


InfoSphere DataStage – DataStage Parallel Job Peer Code Review Checklist Template

SDLC Development Phase


Peer code review happens during the development phase and focuses on the overall quality of, and compliance to standards of, code and configuration artifacts. However, the hard part of performing a peer code review isn’t performing the review, but rather achieving consistency and thoroughness in the review.  This is where a checklist can contribute significantly, providing a list of things to check and a relative weight for the findings.  I hope this template assists with your DataStage job review process.

 

What is a Peer Review?

Acronyms, Abbreviations, Terms, And Definitions


A peer review is an examination of a Software Development Life Cycle (SDLC) work product by team members other than the work product’s author, to identify defects and omissions and to assess compliance with standards.  This process provides an opportunity for quality assurance, knowledge sharing, and product improvement early in the SDLC.

InfoSphere DataStage – Ways to Create a Datastage Parameter Set

Parameter Sets


There are three primary ways to create a parameter set, which is a different practice from adding ‘User Defined’ variables in InfoSphere DataStage Administrator. The ways to create a parameter set are:

  • Create a parameter set from a data connection stage
  • Create a Parameter Set from the navigation of DataStage designer, and
  • Create a Parameter Set from a job

Create a parameter set from Data Connection Stage

This is used to create parameter sets for database connection parameters.

To create a new Parameter Set from a Data Connection

  • Select: File > New > Other and select “Data Connection”
  • Complete the data Connection stage properties, then save the stage.
  • Open the Connection stage and navigate to the “Parameters” Tab
  • Then, click on the “Associated Parameter Set” button, and choose the “Create & Attach” menu item
  • This will Launch a Dialog
  • Fill out the appropriate information on the General tab and then proceed to the Parameters Tab:
  • In the Parameters Tab, enter in the Parameters you wish to include in this Parameter Set
  • On the Values tab, specify a Value File name (please follow naming convention standards to prevent rework and other problems). This is the name of the file that will automatically be created on the Engine tier. This tab also allows you to view/edit values located in the value file.
  • Click OK to save the Parameter set.

Create a Parameter Set from the navigation of DataStage designer

This is, perhaps, the more traditional way of creating a parameter set.

To create a new Parameter Set

  • Select: File > New > Other and select “Parameter Set”
  • This will Launch a Dialog
  • Fill out the appropriate information on the General tab and then proceed to the Parameters Tab:
  • In the Parameters Tab, enter the Parameters you wish to include in this Parameter Set.
    Note: Existing Environment Variables can also be added.
  • On the Values tab, specify a Value File name (please follow naming convention standards to prevent rework and other problems). This is the name of the file that will automatically be created on the Engine tier. This tab also allows you to view/edit values located in the value file.
  • Click OK to save the Parameter set.

Create a Parameter Set from a job

This approach is, perhaps, less traditional, but is equally effective if you find yourself creating additional jobs and now need to share the same parameters.  This is a quick and easy way to generate a parameter set from an existing job.

To create a new Parameter Set from a job

  • Open the job that you want to create a parameter set for.
  • Click “Edit > Job Properties” to open the “Job Properties” window.
  • Click the “Parameters” tab.
  • Press and hold the Ctrl key, then select the parameters that you want to include in the parameter set.
  • With your parameters highlighted, click “Create Parameter Set”.  The Parameter Set window opens.
    • Enter a name and short description for your parameter set.
    • Click the “Parameters” tab; the parameters that you selected are listed.
    • Click the “Values” tab.
    • Enter a name in the Value File name field, then press Enter.  The value for each of your parameters is automatically populated with the path name that you entered.
    • If a default value is not already set, enter a value for each parameter. For example, if the variable is a Pathname type, enter a default path name.
    • Click “OK” to close the Parameter Set window.
    • In the Save Parameter Set As window, select the folder where you want to save your parameter set and click Save. When prompted to replace the selected parameters with the parameter set, click Yes.
  • Click “OK” to close the Job Properties window.


Infosphere Datastage – Data type conversions from Netezza to DataStage

IBM Infosphere Information Server (IIS)


 

Some recent research to eliminate some ETL data conversion issues made me want to enhance the documentation provided by IBM, to avoid repeating the research.

Netezza data types and their equivalent InfoSphere DataStage data types

Netezza data types | InfoSphere DataStage data types (SQL types) | Notes

BYTEINT | TinyInt |
SMALLINT | SmallInt |
INT | Integer |
BIGINT | BigInt |
NUMERIC(p, s) | Numeric, Decimal, Double | Decimal and Double are aliases of Numeric
FLOAT(p) | Float |
REAL | Real |
DOUBLE PRECISION | Double |
CHAR(n) | Char |
VARCHAR(n) | VarChar |
NCHAR(n) | NChar |
NVARCHAR(n) | NVarChar |
BOOLEAN | Bit |
DATE | Date |
TIME | Time |
TIME WITH TIME ZONE | VarChar | The value of time with time zone will be returned without the time zone information.
TIMESTAMP | TimeStamp |
INTERVAL | VarChar | You cannot load the interval data type from an external table.
ROWID | BigInt |
TRANSACTION ID | BigInt |
DATASLICE | Integer |

Related References

Data type conversions from Netezza to DataStage

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from Netezza to DataStage

Netezza / PureData – What is the maximum length of CHAR, VARCHAR, and NVARCHAR fields

IBM Netezza / IBM PureData for Analytics


 

 

During a recent project, I had reason to investigate the maximum character field lengths for Netezza, to be able to fit the data ETL’ed from the source.  The limits on Netezza character fields are not as long as in some other databases; therefore, I thought I would document them in a quick post for future reference.  Knowing these limits will help with your ETL design and construction, so you know where to truncate your field input data.

 

Data type | Alias names | InfoSphere DataStage data types (SQL type) | Maximum Length | Notes
CHAR | CHARACTER, CHAR(n), CHARACTER(n) | Char | 64,000 | Fixed-length character string, blank padded to length n. If you do not specify n, the default is an unsized CHAR value.
VARCHAR | CHARACTER VARYING, VARCHAR(n), CHARACTER VARYING(n), CHAR VARYING(n) | VarChar, LongVarChar | 64,000 | Variable-length character string to a maximum length of n. If you do not specify n, the default is an unsized VARCHAR value. There is no blank padding, and the value is stored as entered.
NCHAR | NATIONAL CHARACTER, NATIONAL CHAR(n), NCHAR(size) | NChar, Char | 16,000 | Fixed-length character string, blank padded to length n. If you do not specify n, the default is an unsized NCHAR value.
NVARCHAR | NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, NATIONAL CHAR VARYING(n), NATIONAL CHARACTER VARYING(n), NVARCHAR(n) | VarChar, NVarChar, LongVarChar, LongNVarChar | 16,000 | Variable-length character string to a maximum length of n. If you do not specify n, the default is an unsized NVARCHAR value.
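
As a quick sketch of what these limits look like in DDL, the hypothetical table below declares a VARCHAR column at the 64,000 maximum.

-- Hypothetical example: a staging column declared at the VARCHAR maximum of 64,000
CREATE TABLE Blog.varchar_limit_example
(
    record_id BIGINT,
    raw_text  VARCHAR(64000)
);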

Related References

Data types and aliases

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza stored procedures, NZPLSQL statements and grammar, Variables and constants, Data types and aliases

Data type conversions from DataStage to Netezza

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from DataStage to Netezza

Netezza / PureData – How to add comments on a field

Netezza / PureData Table Documentation Practices


The ‘Comment On Column’ statement provides the same self-documentation capability as ‘Comment On Table’, but drives the capability down to the column (field) level.  This provides an opportunity to describe the purpose, business meaning, and/or source of a field to other developers and users.  The comment code is part of the DDL and can be migrated with the table structure DDL.  The statements can be run independently or, when working with Aginity for PureData System for Analytics, they can be run as a group with the table DDL, using the ‘Execute as a Single Batch’ (Ctrl+F5) command.

Basic ‘COMMENT ON field’ Syntax

  • The basic syntax to add a comment to a column is:

COMMENT ON COLUMN <<Schema.TableName.ColumnName>> IS '<<Descriptive Comment>>';

Example ‘COMMENT ON Field’ Syntax

  • This is example syntax, which would need to be changed and applied to each column field:

COMMENT ON COLUMN time_dim.time_srky IS 'time_srky is the primary key and is a surrogate key derived from the date business/natural key';
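
As a sketch of running the column comments as a group, the statements can simply be stacked in a single script and executed as one batch; aside from the time_srky example above, the column names below are hypothetical.

-- Hypothetical example: several column comments run together as a single batch
COMMENT ON COLUMN time_dim.time_srky IS 'time_srky is the primary key and is a surrogate key derived from the date business/natural key';
COMMENT ON COLUMN time_dim.calendar_date IS 'Calendar date business/natural key';
COMMENT ON COLUMN time_dim.day_of_week_nbr IS 'Numeric day of the week';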

 


Netezza / PureData – How to add comments on a table

Comment On A Netezza / PureData Table


Using the ‘Comment On’ capabilities of Netezza is an easy way to provide self-documentation within the DDL structure of a table.  This can be very helpful to developers and users who can see the DDL of the table.  Personally, I think of it as a way to document the purpose and/or business meaning of a table.  This can be especially helpful when different teams may be working with the tables.  It also reduces the reliance on developer and/or user memory, or on searching for technical documentation, which may or may not be readily available.

Basic ‘COMMENT ON Table’ Syntax

The basic syntax to add a comment to a table is:

COMMENT ON TABLE <<Schema.TableName>> IS '<<Descriptive Comment>>';

 

Example ‘COMMENT ON Table’ Syntax

COMMENT ON TABLE date_dim IS 'Date dimension table, which provides basic calendar date information';


InfoSphere Datastage – How to Improve Sequential File Performance Using Parallel Environment Variables

APT_FILE_EXPORT_BUFFER_SIZE and APT_FILE_IMPORT_BUFFER_SIZE in DataStage Administrator


While extensive use of sequential files is not best practice, sometimes there is no way around it, due to legacy systems and/or existing processes. Recently, however, I have encountered a number of customers who are seeing significant performance issues with sequential-file-intensive processes. Sometimes it’s the job design, but often, when you look at the project configuration, they still have the default values. This is a quick and easy thing to check and adjust for a quick performance win, if the values have not already been adjusted. These are delivered variables, but they should seriously be considered for adjustment in nearly all DataStage ETL projects. The adjustment must be based on the amount of available memory, the volume of workload that is sequential-file intensive, and the environment you’re working in. Some experiential adjustment may be required, but I have provided a few recommendations below.

 

Environment Variable Properties

 

Category | Type | Parameter Name | Prompt | Size | Default Value
Parallel > Operator Specific | String | APT_FILE_EXPORT_BUFFER_SIZE | Sequential write buffer size | Adjustable in 8 KB units. Recommended values for Dev: 2048; Test & Prod: 4096. | 128
Parallel > Operator Specific | String | APT_FILE_IMPORT_BUFFER_SIZE | Sequential read buffer size | Adjustable in 8 KB units. Recommended values for Dev: 2048; Test & Prod: 4096. | 128


Surrogate Key File Effective Practices

Database Table, Surrogate Key File Effective Practices

Surrogate Key File Effective Practices

Here are a few thoughts on effectively working with IBM Infosphere, Information Server, DataStage surrogate key files, which may prove useful for developers.

 

Placement

  • The main thing about placement is that it should be in a consistent location. Developers and production support teams should not need to guess or look up where it is for every DataStage project. So, it is best to put the surrogate keys in the same base path, with each project having its own subfolder, to facilitate migrations and to reduce the possibility of human error. Here is the path structure which is commonly used:

Path

  • /data/SRKY/<<Project Name>>

Parameter Sets

  • As a best practice, the surrogate key file path should be in a parameter set and the parameter used in the jobs, as needed.  This simplifies maintenance, if and when changes to the path are required, and during migrations.

Surrogate Key Parameter Set Screenshot – Example Parameter Tab


Surrogate Key Parameter Set Screenshot – Example Values Tab


Surrogate Key Job Parameter Example Path using Parameter


Permissions

  • DataStage must have permissions to:
    • The entire parent path
    • The project folder, and
    • The surrogate key files themselves.

To ensure that DataStage has access to the path and surrogate files, ensure:

  • The ‘dsadm’ (owner) and ‘dstage’ (group) have access to folders in the path, with at least a “-rw-r--r--” (644) permissions level. Keeping the permissions to a minimum can, for obvious reasons, prevent inadvertent overwrites of the surrogate key files; thus avoiding some potentially serious cleanup.
  • The ‘dsadm’ (owner) and ‘dstage’ (group) have access to the surrogate key files

Surrogate Key File Minimum Permissions


Productivity Tip – Quickly create a new surrogate key file

Linux


This productivity tip shows how we can quickly create a new surrogate key file in Linux.  The example leverages native capabilities of Red Hat Enterprise Linux (RHEL) to skip a few commands, by using an existing surrogate key file to create a new surrogate file with a minimum of keystrokes and command-line entries.

Creating a New Surrogate Key File From an Existing File

The basic process consists of just a few steps:

  1. Navigate to the location of your existing surrogate key files
  2. Copy an existing surrogate file
  3. Empty the new surrogate key file

Navigate to the location of your existing surrogate key files

This is a preparatory step; you will need to look at the path variable for the project you are working with to know where to go.  The actual path to the surrogate files for your project can vary by project.

Copy an existing surrogate file

Assuming you have existing surrogate key files configured as needed, the use of the copy (cp) command with the interactive and preserve options can eliminate the need to create the file and then set groups and permissions.  The interactive (-i) option prevents you from overwriting an existing file, in case you made a filename typo, and the preserve (-p) option preserves the specified attributes (e.g. ownership and permissions).

Basic Command

  • Here are the command formats with interactive and preserve; either format works
    • cp -ip <<FileName to Be Copied>> <<New Filename>>
  • Here is the command format with only preserve
    • cp -p <<FileName to Be Copied>> <<New Filename>>

Example Command

  • cp -ip srky blogexample.srky
Copy Surrogate Key With Permissions


Empty the new surrogate key file

Setting the newly created surrogate key file to null will empty the file, so DataStage can begin from the point configured in your DataStage job.

Basic Command

  • cat /dev/null > <<FileName>>

Example Command

  • cat /dev/null > blogexample.srky
Empty Surrogate Key File

