DataStage – Netezza Connector Action Column

Over the years have occasionally use the action column feature, however, the last month or so I have found myself using it quite a lot. This is especially true in relation to the tea set and not just in relation to the change capture stage.

The first thing you need to know is, if you want to prevent getting the ‘no action column found’ notice on the target stage, need to ensure that the action column has been coded to be a single character field char (1). Otherwise, the Netezza connector stage will not recognize your field as an action column.

While most developers will commonly work with the action column feature in relation to the change capture stage, it can also be very useful if you have created a field from one or more inputs to tell you what behavior the row requires. I have found that this approach can be very useful and efficient under the right circumstances.

Example Pattern for Action Column Using Multiple Source Selects

Example Pattern for Action Column Using Multiple Source Selects

Action column configuration example

Action Column Field Type

Action Column Field Type

 Change Code Values Mapping To Action Column

  • Here’s a quick reference table to provide the interpretation of the change type code to the actual one character action column value to which it will need to be interpreted.

Change Code Type

Change Type Code

Action Column Value

Copy (Data Without Changes)

0

No
value for this Change Type

Insert

1

I

Delete

2

D

Update

3

U

Example Transformer Stage, Derivation

  •  Here is a quick transformer stage derivation coding example to take advantage of the action call capabilities. If you haven’t already handled the removal of the copy rows, you may also want to add a constraint.
  • The combination I most frequently find myself using is the insert and update combination.
if Lnk_Out_To_Tfm.change_code=1 then ‘I’

Else if Lnk_Out_To_Tfm.change_code=2 then ‘D’

Else if Lnk_Out_To_Tfm.change_code=3 then ‘U’

Related References

Home > InfoSphere Information Server 11.7.0 > InfoSphere DataStage and QualityStage > Developing parallel jobs > Introduction to InfoSphere DataStage Balanced Optimization > Job design considerations  > Specific considerations for the Netezza connector

Netezza / PureData – List of Views against a table

PureData Powered by Netezza

PureData Powered by Netezza

I have found myself using this simple, but useful SQL time in recent weeks to research different issues and to help with impact analysis.  So, I thought I would post it while I’m thinking about it.  It just gives a list of views using a table, which can be handy to know.  This SQL is simple and could be converted to an equi-join.  I used the like statement mostly because I sometimes want to know if there are other views a similar nature in the same family (by naming convention) of tables.

Select All Fields From The _V_View

This is the simplest form of this SQL to views, which a table.

Select * from _v_view

where DEFINITION like ‘%<<TABLE_NAME>>%’ ;

Select Minimal Fields From The _V_View

This is the version of the SQL, which I normally use, to list the views, which use a table.

Select VIEWNAME, OWNER from _v_view

where DEFINITION like ‘%<<TABLE_NAME>>%’;

Related References

Big Data vs. Virtualization

Big Data Information Approaches

Big Data Information Approaches

Globally, organizations are facing challenges emanating from data issues, including data consolidation, value, heterogeneity, and quality. At the same time, they have to deal with the aspect of Big Data. In other words, consolidating, organizing, and realizing the value of data in an organization has been a challenge over the years. To overcome these challenges, a series of strategies have been devised. For instance, organizations are actively leveraging on methods such as Data Warehouses, Data Marts, and Data Stores to meet their data assets requirements. Unfortunately, the time and resources required to deliver value using these legacy methods is a distressing issue. In most cases, typical Data Warehouses applied for business intelligence (BI) rely on batch processing to consolidate and present data assets. This traditional approach is affected by the latency of information.

Big Data

As the name suggests, Big Data describes a large volume of data that can either be structured or unstructured. It originates from business processes among other sources. Presently, artificial intelligence, mobile technology, social media, and the Internet of Things (IoT) have become new sources of vast amounts of data. In Big Data, the organization and consolidation matter more than the volume of the data. Ultimately, big data can be analyzed to generate insights that can be crucial in strategic decision making for a business.

Features of Big Data

The term Big Data is relatively new. However, the process of collecting and preserving vast amounts of information for different purposes has been there for decades. Big Data gained momentum recently with the three V’s features that include volume, velocity, and variety.

Volume: First, businesses gather information from a set of sources, such as social media, day-to-day operations, machine to machine data, weblogs, sensors, and so on. Traditionally, storing the data was a challenge. However, the requirement has been made possible by new technologies such as Hadoop.

Velocity: Another defining nature of Big Data is that it flows at an unprecedented rate that requires real-time processing. Organizations are gathering information from RFID tags, sensors, and other objects that need timely processing of data torrents.

Variety: In modern enterprises, information comes in different formats. For instance, a firm can gather numeric and structured data from traditional databases as well as unstructured emails, video, audio, business transactions, and texts.

Complexity: As mentioned above, Big Data comes from diverse sources and in varying formats. In effect, it becomes a challenge to consolidate, match, link, cleanse, or modify this data across an organizational system. Unfortunately, Big Data opportunities can only be explored when an organization successfully correlates relationships and connects multiple data sets to prevent it from spiraling out of control.

Variability: Big Data can have inconsistent flows within periodic peaks. For instance, in social media, a topic can be trending, which can tremendously increase collected data. Variability is also common while dealing with unstructured data.

Big Data Potential and Importance

The vast amount of data collected and preserved on a global scale will keep growing. This fact implies that there is more potential to generate crucial insights from this information. Unfortunately, due to various issues, only a small fraction of this data actually gets analyzed. There is a significant and untapped potential that businesses can explore to make proper and beneficial use of this information.

Analyzing Big Data allows businesses to make timely and effective decisions using raw data. In reality, organizations can gather data from diverse sources and process it to develop insights that can aid in reducing operational costs, production time, innovating new products, and making smarter decisions. Such benefits can be achieved when enterprises combine Big Data with analytic techniques, such as text analytics, predictive analytics, machine learning, natural language processing, data mining and so on.

Big Data Application Areas

Practically, Big Data can be used in nearly all industries. In the financial sector, a significant amount of data is gathered from diverse sources, which requires banks and insurance companies to innovate ways to manage Big Data. This industry aims at understanding and satisfying their customers while meeting regulatory compliance and preventing fraud. In effect, banks can exploit Big Data using advanced analytics to generate insights required to make smart decisions.

In the education sector, Big Data can be employed to make vital improvements on school systems, quality of education and curriculums. For instance, Big Data can be analyzed to assess students’ progress and to design support systems for professors and tutors.

Healthcare providers, on the other hand, collect patients’ records and design various treatment plans. In the healthcare sector, practitioners and service providers are required to offer accurate and timely treatment that is transparent to meet the stringent regulations in the industry and to enhance the quality of life. In this case, Big Data can be managed to uncover insights that can be used to improve the quality of service.

Governments and different authorities can apply analytics to Big Data to create the understanding required to manage social utilities and to develop solutions necessary to solve common problems, such as city congestion, crime, and drug use. However, governments must also consider other issues such as privacy and confidentiality while dealing with Big Data.

In manufacturing and processing, Big Data offers insights that stakeholders can use to efficiently use raw materials to output quality products. Manufacturers can perform analytics on big data to generate ideas that can be used to increase market share, enhance safety, minimize wastage, and solve other challenges faster.

In the retail sector, companies rely heavily on customer loyalty to maintain market share in a highly competitive market. In this case, managing big data can help retailers to understand the best methods to utilize in marketing their products to existing and potential consumers, and also to sustain relationships.

Challenges Handling Big Data

With the introduction of Big Data, the challenge of consolidating and creating value on data assets becomes magnified. Today, organizations are expected to handle increased data velocity, variety, and volume. It is now a business necessity to deal with traditional enterprise data and Big Data. Traditional relational databases are suitable for storing, processing, and managing low-latency data. Big Data has increased volume, variety, and velocity, making it difficult for legacy database systems to efficiently handle it.

Failing to act on this challenge implies that enterprises cannot tap the opportunities presented by data generated from diverse sources, such as machine sensors, weblogs, social media, and so on. On the contrary, organizations that will explore Big Data capabilities amidst its challenges will remain competitive. It is necessary for businesses to integrate diverse systems with Big Data platforms in a meaningful manner, as heterogeneity of data environments continue to increase.

Virtualization

Virtualization involves turning physical computing resources, such as databases and servers into multiple systems. The concept consists of making the function of an IT resource simulated in software, making it identical to the corresponding physical object. Virtualization technique uses abstraction to create a software application to appear and operate like hardware to provide a series of benefits ranging from flexibility, scalability, performance, and reliability.

Typically, virtualization is made possible using virtual machines (VMs) implemented in microprocessors with necessary hardware support and OS-level implementations to enhance computational productivity. VMs offers additional convenience, security, and integrity with little resource overhead.

Benefits of Virtualization

Achieving the economics of wide-scale functional virtualization using available technologies is easy to improve reliability by employing virtualization offered by cloud service providers on fully redundant and standby basis. Traditionally, organizations would deploy several services to operate at a fraction of their capacity to meet increased processing and storage demands. These requirements resulted in increased operating costs and inefficiencies. With the introduction of virtualization, the software can be used to simulate functionalities of hardware. In effect, businesses can outstandingly eliminate the possibility of system failures. At the same time, the technology significantly reduces capital expense components of IT budgets. In future, more resources will be spent on operating, than acquisition expenses. Company funds will be channeled to service providers instead of purchasing expensive equipment and hiring local personnel.

Overall, virtualization enables IT functions across business divisions and industries to be performed more efficiently, flexibly, inexpensively, and productively. The technology meaningfully eliminates expensive traditional implementations.

Apart from reducing capital and operating costs for organizations, virtualization minimizes and eliminates downtime. It also increases IT productivity, responsiveness, and agility. The technology provides faster provisioning of resources and applications. In case of incidents, virtualization allows fast disaster recovery that maintains business continuity.

Types of Virtualization

There are various types of virtualization, such as a server, network, and desktop virtualization.

In server virtualization, more than one operating system runs on a single physical server to increase IT efficiency, reduce costs, achieve timely workload deployment, improve availability and enhance performance.

Network virtualization involves reproducing a physical network to allow applications to run on a virtual system. This type of virtualization provides operational benefits and hardware independence.

In desktop virtualization, desktops and applications are virtualized and delivered to different divisions and branches in a company. Desktop virtualization supports outsourced, offshore, and mobile workers who can access simulate desktop on tablets and iPads.

Characteristics of Virtualization

Some of the features of virtualization that support the efficiency and performance of the technology include:

Partitioning: In virtualization, several applications, database systems, and operating systems are supported by a single physical system since the technology allows partitioning of limited IT resources.

Isolation: Virtual machines can be isolated from the physical systems hosting them. In effect, if a single virtual instance breaks down, the other machine, as well as the host hardware components, will not be affected.

Encapsulation: A virtual machine can be presented as a single file while abstracting other features. This makes it possible for users to identify the VM based on a role it plays.

Data Virtualization – A Solution for Big Data Challenges

Virtualization can be viewed as a strategy that helps derive information value when needed. The technology can be used to add a level of efficiency that makes big data applications a reality. To enjoy the benefits of big data, organizations need to abstract data from different reinforcements. In other words, virtualization can be deployed to provide partitioning, encapsulation, and isolation that abstracts the complexities of Big Data stores to make it easy to integrate data from multiple stores with other data from systems used in an enterprise.

Virtualization enables ease of access to Big Data. The two technologies can be combined and configured using the software. As a result, the approach makes it possible to present an extensive collection of disassociated and structured and unstructured data ranging from application and weblogs, operating system configuration, network flows, security events, to storage metrics.

Virtualization improves storage and analysis capabilities on Big Data. As mentioned earlier, the current traditional relational databases are incapable of addressing growing needs inherent to Big Data. Today, there is an increase in special purpose applications for processing varied and unstructured big data. The tools can be used to extract value from Big Data efficiently while minimizing unnecessary data replication. Virtualization tools also make it possible for enterprises to access numerous data sources by integrating them with legacy relational data centers, data warehouses, and other files that can be used in business intelligence. Ultimately, companies can deploy virtualization to achieve a reliable way to handle complexity, volume, and heterogeneity of information collected from diverse sources. The integrated solutions will also meet other business needs for near-real-time information processing and agility.

In conclusion, it is evident that the value of Big Data comes from processing information gathered from diverse sources in an enterprise. Virtualizing big data offers numerous benefits that cannot be realized while using physical infrastructure and traditional database systems. It provides simplification of Big Data infrastructure that reduces operational costs and time to results. Shortly, Big Data use cares will shift from theoretical possibilities to multiple use patterns that feature powerful analytics and affordable archival of vast datasets. Virtualization will be crucial in exploiting Big Data presented as abstracted data services.

 

Netezza / Puredata – How to replace or trim CHAR(0) is NULL characters in a field

PureData Powered by Netezza

PureData Powered by Netezza

Occasionally, one runs into the problem of hidden field values breaking join criteria.  I have had to clean up bad archive and conversion data with hidden characters serval times over the last couple of weeks, so, I thought I might as well capture this note for future use.

I tried the Replace command which is prevalent for Netezza answers to this issue on the web, but my client’s version does not support that command.  So, I needed to use the Translate command instead to accomplish it.  It took a couple of searches of the usual bad actors to find the character causing the issue, which on this day was chr(0).  Here is a quick mockup of the command I used to solve this issue.

Example Select Statement

Here is a quick example select SQL to identify problem rows.

SELECT TRANSLATE(F.BLOGTYPE_CODE, CHR(0), ”) AS BLOGTYPE_CODE, BT.BLOG_TYP_ID, LENGTH(BT.BLOG_TYP_ID) AS LNGTH_BT, LENGTH(F.BLOGTYPE_CODE) AS LNGTH_ BLOGTYPE

FROM  BLOGS_TBL F,  BLOG_TYPES BT WHERE TRANSLATE(F.BLOGTYPE_CODE, CHR(0), ”) =  BT.BLOG_TYP_ID AND LENGTH(BT.BLOG_TYP_ID) <>Length(LENGTH(F.BLOGTYPE_CODE) ;

 

Example Update Statement

Here is a quick shell update statement to remove the Char(0) characters from the problem field.

Update <<Your Table Name>> A

Set A.<<Your Field Name>> = TRANSLATE(A.<<Your FieldName>>, CHR(0), ”)

where length(A.<<Your Field Name>>) <> Length(A.<<Your FieldName>>) And << Additional criteria>>;

 

 

 

SQL Server – how to know when a stored procedure ran last

Microsoft SQL Server 2017

Microsoft SQL Server 2017

This week I needed to know if a stored procedure was running when expected during our batch.  So, here is a quick couple of SQL to answer the question:

When a Stored Procedure was run last

This version of the SQL gives the date for the last time the Stored Procure was run:

select distinct   top 1     s.last_execution_time

from  sys.dm_exec_query_stats s

cross apply sys.dm_exec_query_plan (s.plan_handle) p

where  object_name(p.objectid, db_id(‘<<DATABASE_NAME>>’)) = ‘<<STORED_PROCEDURE_NAME>>’

Order by s.last_execution_time desc

Get a list of when Stored Procedure has been run

This version of the SQL provides a list of dates of when Stored Procure has been run:

select distinct   s.last_execution_time

from  sys.dm_exec_query_stats s

cross apply sys.dm_exec_query_plan (s.plan_handle) p

where object_name(p.objectid, db_id(‘<<DATABASE_NAME>>’)) = ‘<<STORED_PROCEDURE_NAME>>’

Order by s.last_execution_time desc

 

 

Netezza / PureData – How to Substring on a Character

PureData Powered by Netezza

PureData Powered by Netezza

 

I had a reason this week to perform a substring on a character in Netezza this week, something I have not had a need to do before.  The process was not as straightforward as I would have thought, since the command is explained as a static position command, and the IBM documentation, honestly, wasn’t much help.  Knowing full well, that text strings are variable having to provide a static position is not terribly useful in and of itself.  So, we need to use an expression to make the substring command flexible and dynamic.

I did get it work the way I needed, but it took two commands to make it happen:

  • The First was the ’instr’ command to identify the field and character I wanted to substring on: instr(<<FIELD_NAME>>,’~’) as This provides the position number of the tilde (~).
  • The second was the ‘substr’ command in which I embedded the ‘instr’ command: substr(<<FIELD_NAME>>,0,instr(<<FIELD_NAME>>,’~’) )

This worked nicely for what I needed, which was to pick out a file name from the beginning of a string, which was delimited with a tilde (~)

Substring on a Character Command Format

  • This format example starts with position zero (0) as position 1 of substring command and goes to the first tilde (~) as position 2 of the substring command.
Select  <<FIELD_NAME>>

, instr(<>,’~’) as pos2

, substr(<<FIELD_NAME>>,0,instr(<<FIELD_NAME>>,’~’) ) as Results

From <<Table_Name>>

where  <<Where_Clause>>;

 

 

Related references

IBM Knowledge Center, Home, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Netezza SQL extensions, Character string functions

IBM Knowledge Center, Home PureData System for Analytics 7.0.3, IBM Netezza Database User’s Guide, Netezza SQL basics, Netezza SQL extensions, Character functions

Oracle TO_CHAR to SQL Server CONVERT Equivalents to change Date to String

Transact SQL (T-SQL)

Transact SQL (T-SQL)

When it comes to SQL I tend to lean on the SQL I have used the most over the years, which is Oracle.  Today was no exception, I found myself trying to use the TO_CHAR command in SQL Server to format a date, which of course does not work. So, after a little thought, here are some examples of how you can the SQL Server Convert Command the achieve the equivalent result.

Example SQL Server Date Conversion SQL

Example SQL Server Date Conversion SQL

Example SQL Server Date Conversion SQL Code

This SQL of examples runs, as is, no from table required.

 

Select

CONVERT(VARCHAR(10), GETDATE(), 20) as
‘YYYY-MM-DD’

,CONVERT(VARCHAR(19), GETDATE(), 20) as ‘YYYY-MM-DD HH24:MI:SS’

,CONVERT(VARCHAR(8), GETDATE(), 112) as YYYYMMDD

,CONVERT(VARCHAR(6), GETDATE(), 112) as YYYYMM

,CONVERT(VARCHAR(12), DATEPART(YEAR, GETDATE()))+ RIGHT(‘0’+CAST(MONTH(GETDATE()) AS VARCHAR(2)),2)
as
YYYYMM_Method_2

,CONVERT(VARCHAR(4), GETDATE(), 12) as YYMM

,CONVERT(VARCHAR(4), GETDATE(), 112) as YYYY

,CONVERT(VARCHAR(4), DATEPART(YEAR, GETDATE())) as YYYY_Method_2

,CONVERT(VARCHAR(4), YEAR(GETDATE())) as YYYY_Method_3

,RIGHT(‘0’+CAST(MONTH(GETDATE()) AS VARCHAR(2)),2) as Two_Digit_Month

,SUBSTRING(ltrim(CONVERT(VARCHAR(4), GETDATE(), 12)),3,2) as Two_Digit_Month_2

,CONVERT(VARCHAR(10), GETDATE(), 111) as ‘YYYY/MM/DD’

,CONVERT(VARCHAR(5), GETDATE(), 8) as ‘HH24:MI’

,CONVERT(VARCHAR(8), GETDATE(), 8) ‘HH24:MI:SS’

Map TO_CHAR formats to SQL Server

You can map an Oracle TO_CHAR formats to SQL Server alternative commands as follows:

TO_CHAR
String

VARCHAR
Length

SQL
Server Convert Style

YYYY-MM-DD

VARCHAR(10)

20,
21, 120, 121, 126 and 127

YYYY-MM-DD
HH24:MI:SS

VARCHAR(19)

20,
21, 120 and 121

YYYYMMDD

VARCHAR(8)

112

YYYYMM

VARCHAR(6)

112

YYMM

VARCHAR(4)

12

YYYY

VARCHAR(4)

112

MM

VARCHAR(2)

12

YYYY/MM/DD

VARCHAR(10)

111

HH24:MI

VARCHAR(5)

8,
108, 14 and 114

HH24:MI:SS

VARCHAR(8)

8,
108, 14 and 114

Translating the formats commands

Here are some example of translating the formats commands.

Format

SQL
Server

YYYY-MM-DD

CONVERT(VARCHAR(10),
GETDATE(), 20)

YYYY-MM-DD
HH24:MI:SS

CONVERT(VARCHAR(19),
GETDATE(), 20)

YYYYMMDD

CONVERT(VARCHAR(8),
GETDATE(), 112)

YYYYMM

CONVERT(VARCHAR(6),
GETDATE(), 112)

YYMM

CONVERT(VARCHAR(4),
GETDATE(), 12)

YYYY

CONVERT(VARCHAR(4),
GETDATE(), 112)

YYYY

CONVERT(VARCHAR(4),
DATEPART(YEAR, GETDATE()))

YYYY

CONVERT(VARCHAR(4),
YEAR(GETDATE()))

MM

RIGHT(‘0’+CAST(MONTH(GETDATE())
AS VARCHAR(2)),2)

MM

SUBSTRING(ltrim(CONVERT(VARCHAR(4),
GETDATE(), 12)),3,2)

YYYY/MM/DD

CONVERT(VARCHAR(10),
GETDATE(), 111)

HH24:MI

CONVERT(VARCHAR(5),
GETDATE(), 8)

HH24:MI:SS

CONVERT(VARCHAR(8),
GETDATE(), 8)

Related Reference

Microsoft Docs, SQL, T-SQL Functions, GETDATE (Transact-SQL)

Microsoft Docs, SQL, T-SQL Functions, Date and Time Data Types and Functions (Transact-SQL)

Microsoft Docs, SQL, T-SQL Functions, DATEPART (Transact-SQL)

 

 

SQL server table Describe (DESC) equivalent

 

Transact SQL (T-SQL)

Transact SQL (T-SQL)

Microsoft SQL Server doesn’t seem have a describe command and usually, folks seem to want to build a stored procedure to get the describe behaviors.  However, this is not always practical based on your permissions. So, the simple SQL below will provide describe like information in a pinch.  You may want to dress it up a bit; but I usually just use it raw, as shown below by adding the table name.

Describe T-SQL Equivalent

Select *

From INFORMATION_SCHEMA.COLUMNS

Where TABLE_NAME = ‘<<TABLENAME>>’;

Related References

Microsoft SQL Server – Useful links

Microsoft SQL Server 2017

Microsoft SQL Server 2017

Here are a few references for the Microsoft SQL Server 2017 database, which may be helpful.

Table Of Useful Microsoft SQL Server Database References

Reference Type

Link

SQL Server 2017 Download Page

https://www.microsoft.com/en-us/sql-server/sql-server-downloads

SQL SERVER version, edition, and update level

https://support.microsoft.com/en-us/help/321185/how-to-determine-the-version–edition-and-update-level-of-sql-server-a

SQL Server 2017 Release Notes

https://docs.microsoft.com/en-us/sql/sql-server/sql-server-2017-release-notes

SQL Server Transact SQL Commands

https://technet.microsoft.com/en-us/library/ms189826(v=sql.90).aspx

Related References

Netezza / PureData – How To Get A List Of When A Store Procedure Was Last Changed Or Created

Netezza / Puredata - SQL (Structured Query Language)

Netezza / Puredata – SQL (Structured Query Language)

In the continuing journey to track down impacted objects and to determine when the code in a database was last changed or added, here is another quick SQL, which can be used in Aginity Workbench for Netezza to retrieve a list of when Store Procedures were last updated or were created.

SQL List of When A Stored Procedure was Last Changed or Created

select t.database — Database
, t.OWNER — Object Owner
, t.PROCEDURE — Procedure Name
, o.objmodified — The Last Modified Datetime
, o.objcreated — Created Datetime

from _V_OBJECT o
, _v_procedure t
where
o.objid = t.objid
and t.DATABASE = ‘<<Database Name>>
order by o.objmodified Desc, o.objcreated Desc;

 

Related References

 

Netezza / PureData – How To Get a SQL List of When View Was Last Changed or Created

Netezza / PureData SQL (Structured Query Language)

Netezza / PureData SQL (Structured Query Language)

Sometimes it is handy to be able to get a quick list of when a view was changed last.  It could be for any number of reason, but sometimes folks just lose track of when a view was last updated or even need to verify that it hadn’t been changed recently.  So here is a quick SQL, which can be dropped in Aginity Workbench for Netezza to create a list of when a view was created or was update dated last.  Update the Database name in the SQL and run it.

SQL List of When A view was Last Changed or Created

select t.database — Database
, t.OWNER — Object Owner
, t.VIEWNAME — View Name
, o.objmodified — The Last Modified Datetime
, o.objcreated — Created Datetime

from _V_OBJECT o
,_V_VIEW_XDB t
where
o.objid = t.objid
and DATABASE = ‘<<Database Name>>
order by o.objcreated Desc, o.objmodified Desc;

Related References

 

Netezza / PureData – How To Quote a Single Quote in Netezza SQL

How To Quote a Single Quote in Netezza SQL?

The short answer is to use four single quotes (””), which will result in a single quote within the select statement results.

How to Assemble the SQL to Quote a Single Quote in a SQL Select Statement

Knowing how to construct a list to embed in a SQL where clause ‘in’ list or to add to an ETL job can be a serious time saver eliminating the need to manually edit large lists.  In the example below, I used the Select result set to create a rather long list of values, which needed to be included in an ELT where clause.  By:

  • Adding the comma delimiter (‘,’) and a Concatenate (||) on the front
  • Followed by adding a quoted single Quote (entered as four single quotes (””)) and a Concatenate (||)
  • The Field I which to have delaminated and Quoted (S1.ORDER_NUM)
  • And closed with a quoted single Quote (entered as four single quotes (””))

This results in a delimited and quoted list ( ,’116490856′) which needs only to have the parentheses added and the first comma removed, which is much less work than manually editing the 200 item that resulted from this select.

Example SQL:

SELECT Distinct

‘,’||””|| S1.ORDER_NUM||”” as Quoted_Order_Number

FROM Sales S1

 

How to Quote A Single Quote Example SQL

How to Quote A Single Quote Example SQL

Related Reference

DataStage – How to use single quoted parameter list in an Oracle Connector

Data Integration

Data Integration

While working with a client’s 9.1 DataStage version, I ran into a situation where they wanted to parameterize SQL where clause lists in an Oracle Connector stage, which honestly was not very straight forward to figure out.  First, if the APT_OSL_PARAM_ESC_SQUOTE is not set and single quotes are used in the parameter, the job creates unquoted invalid SQL when the parameter is populated.  Second, I found much of the information confusing and/or incomplete in its explanation.   After some research and some trial and error, here is how I resolved the issue.  I’ll endeavor to be concise, but holistic in my explanation.

When this Variable applies

This where I know this process applies, there may be other circumstances to which is this applicable, but I’m listing the ones here with which I have recent experience.

Infosphere Information Server Datastage

  • Versions 91, 11.3, and 11.5

Oracle RDBMS

  • Versions 11g and 12c

Configurations process

Here is a brief explanation of the steps I used to implement the where clause as a parameter.  Please note that in this example, I am using a job parameter to populate on a portion of the where clause, you can certainly pass the entire where clause as a parameter, if it is not too long.

Configure Project Variable in Administrator

  • Add APT_OSL_PARAM_ESC_SQUOTE to project in Administrator
  • Populate the APT_OSL_PARAM_ESC_SQUOTE Variable \
APT_OSL_PARAM_ESC_SQUOTE Project Variable

APT_OSL_PARAM_ESC_SQUOTE Project Variable

Create job parameter

Following your project name convention or standard practice, if you customer and/or project do not have established naming conventions, create the job parameter in the job. See jp_ItemSource parameter in the image below.

Job Parameter In Oracle Connector

Job Parameter In Oracle Connector

Add job parameter to Custom SQL in Select Oracle Connector Stage

On the Job parameter has been created, add the job parameter to the SQL statement of the job.

Job Parameter In SQL

Job Parameter In SQL

Related References

IBM Knowledge Center > InfoSphere Information Server 11.5.0

Connecting to data sources > Databases > Oracle databases > Oracle connector

IBM Support > Limitation of the Parameter APT_OSL_PARAM_ESC_SQUOTE on Plugins on Parallel Canvas

IBM Knowledge Center > InfoSphere Information Server 11.5.0

InfoSphere DataStage and Quality > Stage > Reference > Parallel Job Reference > Environment Variables > Miscellaneous > APT_OSL_PARAM_ESC_SQUOTE

 

Netezza / PureData – How to build a multi table drop command from a select

Database Management

Database Management

How to Quick Drop Multiple Tables

occasionally, there is a need to quickly drop a list of tables and you don’t always want to write or generate each command individually in Aginity.  So, here is a quick example of how you can use a ‘Select’ SQL statement to generate a list of drop commands for you. Now, this approach assumes there is a common naming convention, so, you may need to adapt it to your needs.

An outline of the Drop Multiple Tables Process

Here is a quick summary of the steps to generate the drop statements from _V_Table:

  1. Build required Netezza SQL select; paying particular attention to the where clause criteria to exclude any unnecessary tables.
  2. Execute the SQL statement
  3. Copy from Aginity Results Tab without headers
  4. Past into new Aginity Query window
  5. validate that only the tables are in the list — No extras
  6. Click with the SQL Drop command list and Execute as a single batch

Example generate the drop statements

select  ‘Drop table ‘||tablename||’;’

from _V_TABLE

where tablename like ‘NZCC_TT_%’;

 

Related References

IBM Knowledge Center > PureData System for Analytics 7.2.1

IBM Netezza database user documentation > Netezza SQL command reference > Drop Table

How to know if your Oracle Client install is 32 Bit or 64 Bit

Oracle Database, How to know if your Oracle Client install is 32 Bit or 64 Bit

Oracle Database

 

How to know if your Oracle Client install is 32 Bit or 64 Bit

Sometimes you just need to know if your Oracle Client install is 32 bit or 64 bit. But how do you figure that out? Here are two methods you can try.

The first method

Go to the %ORACLE_HOME%\inventory\ContentsXML folder and open the comps.xml file.
Look for <DEP_LIST> on the ~second screen.

If you see this: PLAT=”NT_AMD64” then your Oracle Home is 64 bit
If you see this: PLAT=”NT_X86” then your Oracle Home is 32 bit.

It is possible to have both the 32-bit and the 64-bit Oracle Homes installed.

The second method

This method is a bit faster. Windows has a different lib directory for 32-bit and 64-bit software. If you look under the ORACLE_HOME folder if you see a “lib” AND a “lib32” folder you have a 64 bit Oracle Client. If you see just the “lib” folder you’ve got a 32 bit Oracle Client.

Related References

 

OLTP vs Data Warehousing

Database, OLTP vs Data Warehousing

Database

OLTP Versus Data Warehousing

I’ve tried to explain the difference between OLTP systems and a Data Warehouse to my managers many times, as I’ve worked at a hospital as a Data Warehouse Manager/data analyst for many years. Why was the list that came from the operational applications different than the one that came from the Data Warehouse? Why couldn’t I just get a list of patients that were laying in the hospital right now from the Data Warehouse? So I explained, and explained again, and explained to another manager, and another. You get the picture.
In this article I will explain this very same thing to you. So you know  how to explain this to your manager. Or, if you are a manager, you might understand what your data analyst can and cannot give you.

OLTP

OLTP stands for OLine Transactional Processing. With other words: getting your data directly from the operational systems to make reports. An operational system is a system that is used for the day to day processes.
For example: When a patient checks in, his or her information gets entered into a Patient Information System. The doctor put scheduled tests, a diagnoses and a treatment plan in there as well. Doctors, nurses and other people working with patients use this system on a daily basis to enter and get detailed information on their patients.
The way the data is stored within operational systems is so the data can be used efficiently by the people working directly on the product, or with the patient in this case.

Data Warehousing

A Data Warehouse is a big database that fills itself with data from operational systems. It is used solely for reporting and analytical purposes. No one uses this data for day to day operations. The beauty of a Data Warehouse is, among others, that you can combine the data from the different operational systems. You can actually combine the number of patients in a department with the number of nurses for example. You can see how far a doctor is behind schedule and find the cause of that by looking at the patients. Does he run late with elderly patients? Is there a particular diagnoses that takes more time? Or does he just oversleep a lot? You can use this information to look at the past, see trends, so you can plan for the future.

The difference between OLTP and Data Warehousing

This is how a Data Warehouse works:

How a Data Warehouse works

How a Data Warehouse works

The data gets entered into the operational systems. Then the ETL processes Extract this data from these systems, Transforms the data so it will fit neatly into the Data Warehouse, and then Loads it into the Data Warehouse. After that reports are formed with a reporting tool, from the data that lies in the Data Warehouse.

This is how OLTP works:

How OLTP works

How OLTP works

Reports are directly made from the data inside the database of the operational systems. Some operational systems come with their own reporting tool, but you can always use a standalone reporting tool to make reports form the operational databases.

Pro’s and Con’s

Data Warehousing

Pro’s:

  • There is no strain on the operational systems during business hours
    • As you can schedule the ETL processes to run during the hours the least amount of people are using the operational system, you won’t disturb the operational processes. And when you need to run a large query, the operational systems won’t be affected, as you are working directly on the Data Warehouse database.
  • Data from different systems can be combined
    • It is possible to combine finance and productivity data for example. As the ETL process transforms the data so it can be combined.
  • Data is optimized for making queries and reports
    • You use different data in reports than you use on a day to day base. A Data Warehouse is built for this. For instance: most Data Warehouses have a separate date table where the weekday, day, month and year is saved. You can make a query to derive the weekday from a date, but that takes processing time. By using a separate table like this you’ll save time and decrease the strain on the database.
  • Data is saved longer than in the source systems
    • The source systems need to have their old records deleted when they are no longer used in the day to day operations. So they get deleted to gain performance.

Con’s:

  • You always look at the past
    • A Data Warehouse is updated once a night, or even just once a week. That means that you never have the latest data. Staying with the hospital example: you never knew how many patients are in the hospital are right now. Or what surgeon didn’t show up on time this morning.
  • You don’t have all the data
    • A Data Warehouse is built for discovering trends, showing the big picture. The little details, the ones not used in trends, get discarded during the ETL process.
  • Data isn’t the same as the data in the source systems
    • Because the data is older than those of the source systems it will always be a little different. But also because of the Transformation step in the ETL process, data will be a little different. It doesn’t mean one or the other is wrong. It’s just a different way of looking at the data. For example: the Data Warehouse at the hospital excluded all transactions that were marked as cancelled. If you try to get the same reports from both systems, and don’t exclude the cancelled transactions in the source system, you’ll get different results.

online transactional processing (OLTP)

Pro’s

  • You get real time data
    • If someone is entering a new record now, you’ll see it right away in your report. No delays.
  • You’ve got all the details
    • You have access to all the details that the employees have entered into the system. No grouping, no skipping records, just all the raw data that’s available.

Con’s

  • You are putting strain on an application during business hours.
    • When you are making a large query, you can take processing space that would otherwise be available to the people that need to work with this system for their day to day operations. And if you make an error, by for instance forgetting to put a date filter on your query, you could even bring the system down so no one can use it anymore.
  • You can’t compare the data with data from other sources.
    • Even when the systems are similar. Like an HR system and a payroll system that use each other to work. Data is always going to be different because it is granulated on a different level, or not all data is relevant for both systems.
  • You don’t have access to old data
    • To keep the applications at peak performance, old data, that’s irrelevant to day to day operations is deleted.
  • Data is optimized to suit day to day operations
    • And not for report making. This means you’ll have to get creative with your queries to get the data you need.

So what method should you use?

That all depends on what you need at that moment. If you need detailed information about things that are happening now, you should use OLTP.
If you are looking for trends, or insights on a higher level, you should use a Data Warehouse.

 Related References

 

 

Oracle – How to get a list of user permission grants

IBM Infosphere Information Server (IIS), Oracle – How to get a list of user permission grants

IBM Infosphere Information Server (IIS)

Since the Infosphere, information server, repository, has to be installed manually with the scripts provided in the IBM software, sometimes you run into difficulties. So, here’s a quick script, which I have found useful in the past to identify user permissions for the IAUSER on Oracle database’s to help rundown discrepancies in user permissions.

 

SELECT *

FROM ALL_TAB_PRIVS

WHERE  GRANTEE = ‘iauser’

 

If we cannot run against the ALL_TAB_PRIVS view, then we can try the ALL_TAB_PRIVS view:

 

SELECT *

FROM USER_TAB_PRIVS

WHERE  GRANTEE = ‘iauser’

 

Related References

oracle help Center > Database Reference > ALL_TAB_PRIVS view

Netezza / PureData – Aginity for Netezza shortcut key list

Aginity for netezza shortcut key list

Aginity for Netezza

Recently, while working with a couple of my teammates on different projects I picked up a couple shortcut keys for Aginity for netezza, which I did not know existed. So, I thought about be nice to put a list of shortcut keys for future reference. I don’t use most of them very often, but I have flagged the ones that I have found to be frequently useful. I hope you find this useful as well.

Frequently Used By Me

Shortcut Keystrokes

Shortcut Description

Alt-C Complete Code Snippet
Alt-F4 Exit
Alt-Q Go to Query
Alt-R Go to Results
Alt-T Go to Tree
Atl-H User Query History
Ctrl-Alt-0 Goto Bookmark 0
Ctrl-Alt-1 Goto Bookmark 1
Ctrl-Alt-2 Goto Bookmark 2
Ctrl-Alt-3 Goto Bookmark 3
Ctrl-Alt-4 Goto Bookmark 4
Ctrl-Alt-5 Goto Bookmark 5
Ctrl-Alt-6 Goto Bookmark 6
Ctrl-Alt-7 Goto Bookmark 7
Ctrl-Alt-8 Goto Bookmark 8
Ctrl-Alt-9 Goto Bookmark 9
X Ctrl-Alt-C Comment Selection
Ctrl-Alt-Left Goto Previous Bookmark
Ctrl-Alt-Right Goto Next Bookmark
Ctrl-Alt-Shift-0 Set Bookmark 0
Ctrl-Alt-Shift-1 Set Bookmark 1
Ctrl-Alt-Shift-2 Set Bookmark 2
Ctrl-Alt-Shift-3 Set Bookmark 3
Ctrl-Alt-Shift-4 Set Bookmark 4
Ctrl-Alt-Shift-5 Set Bookmark 5
Ctrl-Alt-Shift-6 Set Bookmark 6
Ctrl-Alt-Shift-7 Set Bookmark 7
Ctrl-Alt-Shift-8 Set Bookmark 8
Ctrl-Alt-Shift-9 Set Bookmark 9
Ctrl-Alt-Shift-U Change select case
X Ctrl-Alt-U Uncomment Selection
Ctrl-Alt-W Word Wrap
X Ctrl-A Select All
Ctrl-B Toggle Object Browser
X Ctrl-C Copy
X Ctrl+Double click object name Find object in browser panel for current database
X Ctrl-F Find
Ctrl-F5 Execute as Single Batch
Ctrl-F6 Next Query Tab
Ctrl-G Goto Line
Ctrl-H Replace
Ctrl-N New Query Window
Ctrl-O Open SQL File
Ctrl-P Print
Ctrl-R Toggle
Ctrl-S Save Query
Ctrl-Shift-F6 Previous Query Tab
Ctrl-Shift-U Make selection UPPER case
Ctrl-T Add New Query Editor
Ctrl-U Make selection LOWER case
Ctrl-V Paste
X Ctrl-X Cut
X Ctrl-Y Redo
X Ctrl-Z Undo
F11 Toggle full screen
F12 Select Query at Cursor
 X F3 Find Again
F5 Execute
F8 Explain
F9 Toggle Bookmark
Shift-F5 Execute All

Related References

 

Netezza / PureData – Table Describe SQL

Netezza Puredata Table Describe SQL

Netezza / Puredata Table Describe SQL

If you want to describe a PureData / Netezza table in SQL, it can be done, but Netezza doesn’t have a describe command.  Here is a quick SQL, which will give the basic structure of a table or a view.  Honestly, if you have Aginity Generating the DDL is fast and more informative, at least to me.  If you have permissions to access NZSQL you can also use the slash commands (e.g. \d).

Example Netezza Table Describe SQL

select  name as Table_name,

owner as Table_Owner,

Createdate as Table_Created_Date,

type as Table_Type,

Database as Database_Name,

schema as Database_Schema,

attnum as Field_Order,

attname as Field_Name,

format_type as Field_Type,

attnotnull as Field_Not_Null_Indicator,

attlen as Field_Length

from _v_relation_column

where

name='<<Table Name Here>>’

Order by attnum;

 

Related References

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1

IBM Netezza database user documentation, Command-line options for nzsql, Internal slash options

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1

IBM Netezza getting started tips, About the Netezza data warehouse appliance, Commands and queries, Basic Netezza SQL information, Commonly used nzsql internal slash commands

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1

IBM Netezza database user documentation, Netezza SQL introduction, The nzsql command options, Slash options

 

 

Aginity For Netezza – How to Generate DDL

Aginity, Aginity for Netezza, Netezza, PureData, DDL, SQL

Aginity

How to Generate Netezza Object DDL

In ‘Aginity for Netezza’ this process is easy, if you have a user with sufficient permissions.

The basic process is:

  • In the object browser, navigate to the Database
  • select the Object (e.g. table, view, stored procedure)
  • Right Click, Select ‘Script’ > ‘DDL to query window’
  • The Object DDL will appear in the query window
Create DDL to Query Window

Create DDL to Query Window

Related References