OLTP vs Data Warehousing


OLTP Versus Data Warehousing

I’ve tried to explain the difference between OLTP systems and a Data Warehouse to my managers many times, as I’ve worked at a hospital as a Data Warehouse manager/data analyst for many years. Why was the list that came from the operational applications different from the one that came from the Data Warehouse? Why couldn’t I just get a list of the patients lying in the hospital right now from the Data Warehouse? So I explained, and explained again, and explained to another manager, and another. You get the picture.
In this article I will explain this very same thing to you, so you know how to explain it to your manager. Or, if you are a manager, you might come to understand what your data analyst can and cannot give you.

OLTP

OLTP stands for OnLine Transaction Processing. In other words: getting your data directly from the operational systems to make reports. An operational system is a system that is used for the day-to-day processes.
For example: when a patient checks in, his or her information gets entered into a Patient Information System. The doctor puts scheduled tests, a diagnosis, and a treatment plan in there as well. Doctors, nurses, and other people working with patients use this system on a daily basis to enter and retrieve detailed information on their patients.
The data within operational systems is stored in a way that lets it be used efficiently by the people working directly on the product, or with the patient in this case.

Data Warehousing

A Data Warehouse is a big database that fills itself with data from operational systems. It is used solely for reporting and analytical purposes; no one uses this data for day-to-day operations. The beauty of a Data Warehouse is, among other things, that you can combine the data from the different operational systems. You can, for example, combine the number of patients in a department with the number of nurses. You can see how far a doctor is behind schedule and find the cause by looking at the patients. Does he run late with elderly patients? Is there a particular diagnosis that takes more time? Or does he just oversleep a lot? You can use this information to look at the past and see trends, so you can plan for the future.

The difference between OLTP and Data Warehousing

This is how a Data Warehouse works:

How a Data Warehouse works

The data gets entered into the operational systems. Then the ETL processes Extract this data from those systems, Transform it so it fits neatly into the Data Warehouse, and then Load it into the Data Warehouse. After that, reports are built with a reporting tool from the data in the Data Warehouse.

This is how OLTP works:

How OLTP works

Reports are made directly from the data inside the database of the operational systems. Some operational systems come with their own reporting tool, but you can always use a standalone reporting tool to make reports from the operational databases.

Pros and Cons

Data Warehousing

Pros:

  • There is no strain on the operational systems during business hours
    • As you can schedule the ETL processes to run during the hours when the fewest people are using the operational system, you won’t disturb the operational processes. And when you need to run a large query, the operational systems won’t be affected, as you are working directly on the Data Warehouse database.
  • Data from different systems can be combined
    • It is possible to combine finance and productivity data for example. As the ETL process transforms the data so it can be combined.
  • Data is optimized for making queries and reports
    • You use different data in reports than you use on a day-to-day basis. A Data Warehouse is built for this. For instance, most Data Warehouses have a separate date table where the weekday, day, month, and year are saved. You can make a query to derive the weekday from a date, but that takes processing time. By using a separate table like this you save time and decrease the strain on the database (see the sketch after this list).
  • Data is saved longer than in the source systems
    • The source systems need to have old records deleted when they are no longer used in the day-to-day operations; they get deleted to maintain performance. The Data Warehouse can keep that history.
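
To illustrate the date-table point above, here is a minimal sketch in SQL. The table and column names (visit_fact, date_dim, visit_date, calendar_date, weekday_name) are hypothetical, not from any particular system:

-- Count visits per weekday by joining to a pre-computed date dimension,
-- instead of deriving the weekday from every visit date at query time.
SELECT d.weekday_name,
count(*) as visit_count
FROM visit_fact f
JOIN date_dim d ON d.calendar_date = f.visit_date
GROUP BY d.weekday_name;

Because weekday_name is stored once per calendar date, the database avoids running a date-to-weekday function on every fact row.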

Cons:

  • You always look at the past
    • A Data Warehouse is updated once a night, or even just once a week. That means that you never have the latest data. Staying with the hospital example: you never know how many patients are in the hospital right now, or which surgeon didn’t show up on time this morning.
  • You don’t have all the data
    • A Data Warehouse is built for discovering trends, showing the big picture. The little details, the ones not used in trends, get discarded during the ETL process.
  • Data isn’t the same as the data in the source systems
    • Because the data is older than that in the source systems, it will always be a little different. The Transformation step in the ETL process also changes the data a little. It doesn’t mean one or the other is wrong; it’s just a different way of looking at the data. For example: the Data Warehouse at the hospital excluded all transactions that were marked as cancelled. If you try to get the same report from both systems, and don’t exclude the cancelled transactions in the source system, you’ll get different results.

Online Transaction Processing (OLTP)

Pros

  • You get real time data
    • If someone is entering a new record now, you’ll see it right away in your report. No delays.
  • You’ve got all the details
    • You have access to all the details that the employees have entered into the system. No grouping, no skipping records, just all the raw data that’s available.

Cons

  • You are putting strain on an application during business hours.
    • When you run a large query, you can take processing capacity that would otherwise be available to the people who need the system for their day-to-day operations. And if you make an error, for instance by forgetting to put a date filter on your query, you could even bring the system down so no one can use it anymore.
  • You can’t compare the data with data from other sources.
    • Even when the systems are similar, like an HR system and a payroll system that work together, the data is always going to be different, because it is granular at a different level or not all data is relevant to both systems.
  • You don’t have access to old data
    • To keep the applications at peak performance, old data that is irrelevant to day-to-day operations is deleted.
  • Data is optimized to suit day to day operations
    • And not for report making. This means you’ll have to get creative with your queries to get the data you need.

So what method should you use?

That all depends on what you need at that moment. If you need detailed information about things that are happening now, you should use OLTP.
If you are looking for trends, or insights on a higher level, you should use a Data Warehouse.


Netezza / PureData – Casting Numbers to Character Data Type

Cast Conversion Format

I noticed that someone had been searching for this example on my site, so here is a quick example of how to cast number data to a character data type. I ran this SQL example in Netezza and it worked fine.

Basic Casting Conversion Format

cast(<<FieldName>> as <<CharacterType_or_Alias>>) as <<FieldName>>

 

Example Casting Number to Character Data Type SQL

 

SELECT
---- Casting Integer to Character Data Type ----
SUBMITDATE_SRKY as SUBMITDATE_SRKY_INTEGER
, cast(SUBMITDATE_SRKY as char(10)) as Integer_to_CHAR
, cast(SUBMITDATE_SRKY as varchar(10)) as Integer_to_VARCHAR
, cast(SUBMITDATE_SRKY as nchar(10)) as Integer_to_NCHAR
, cast(SUBMITDATE_SRKY as nvarchar(10)) as Integer_to_NVARCHAR
---- Casting Double Precision Number to Character Data Type ----
, HOST_CPU_SECS as DOUBLE_PRECISION_NUMBER
, cast(HOST_CPU_SECS as char(30)) as DOUBLE_PRECISION_to_CHAR
, cast(HOST_CPU_SECS as varchar(30)) as DOUBLE_PRECISION_to_VARCHAR
, cast(HOST_CPU_SECS as nchar(30)) as DOUBLE_PRECISION_to_NCHAR
, cast(HOST_CPU_SECS as nvarchar(30)) as DOUBLE_PRECISION_to_NVARCHAR
---- Casting Numeric to Character Data Type ----
, TOTALRUNTIME as NUMERIC_FIELD
, cast(TOTALRUNTIME as char(30)) as NUMERIC_FIELD_to_CHAR
, cast(TOTALRUNTIME as varchar(30)) as NUMERIC_FIELD_to_VARCHAR
, cast(TOTALRUNTIME as nchar(30)) as NUMERIC_FIELD_to_NCHAR
, cast(TOTALRUNTIME as nvarchar(30)) as NUMERIC_FIELD_to_NVARCHAR
FROM NETEZZA_QUERY_FACT;

Related References

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1, IBM Netezza stored procedures, NZPLSQL statements and grammar, Variables and constants, Data types and aliases

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1, IBM Netezza database user documentation, SQL statement grammar, Explicit and implicit casting, Summary of Netezza casting

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Netezza SQL extensions

 

Databases – Database Isolation Level Cross Reference


Here is a quick reference table of some common database and/or connection types that use connection-level isolation, with their equivalent isolation levels. This quick reference may prove useful as a job aid when working with, and making decisions about, isolation levels; a brief example of setting an isolation level follows the table.

Database isolation levels

Data source | Most restrictive | More restrictive | Less restrictive | Least restrictive
Amazon SimpleDB | Serializable | Repeatable read | Read committed | Read uncommitted
dashDB | Repeatable read | Read stability | Cursor stability | Uncommitted read
DB2® family of products | Repeatable read | Read stability | Cursor stability | Uncommitted read
Informix® | Repeatable read | Repeatable read | Cursor stability | Dirty read
JDBC | Serializable | Repeatable read | Read committed | Read uncommitted
MariaDB | Serializable | Repeatable read | Read committed | Read uncommitted
Microsoft SQL Server | Serializable | Repeatable read | Read committed | Read uncommitted
MySQL | Serializable | Repeatable read | Read committed | Read uncommitted
ODBC | Serializable | Repeatable read | Read committed | Read uncommitted
Oracle | Serializable | Serializable | Read committed | Read committed
PostgreSQL | Serializable | Repeatable read | Read committed | Read committed
Sybase | Level 3 | Level 3 | Level 1 | Level 0
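
As a hedged illustration of how a level is selected in practice: several of the systems above (SQL Server, PostgreSQL, and MySQL, for example) accept a standard SQL statement to change the level for the current transaction or session. Exact syntax and scope vary by product, so check your database’s documentation.

-- Request a stricter level for the work that follows
-- (statement shown in ANSI form; scope varies by product).
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;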

 


What are the dashDB isolation levels?

dashDB

 

Isolation levels are part of the ACID (Atomicity, Consistency, Isolation, Durability) paradigm in database control. Isolation levels allow developers and users to trade off consistency for a potential gain in performance. Therefore, it is essential to understand them and how they apply in Structured Query Language (SQL). The dashDB RDBMS has four isolation levels:

Repeatable Read (RR)

  • The repeatable read (RR) isolation level locks all the rows that an application references during a unit of work (UOW). If an application issues a SELECT statement twice within the same unit of work, the same result is returned each time. Under RR, lost updates, access to uncommitted data, non-repeatable reads, and phantom reads are not possible.
  • Under RR, an application can retrieve and operate on the rows as many times as necessary until the UOW completes. However, no other application can update, delete, or insert a row that would affect the result set until the UOW completes. Applications running under the RR isolation level cannot see the uncommitted changes of other applications. This isolation level ensures that all returned data remains unchanged until the time the application sees the data, even when temporary tables or row blocking is used.
  • Every referenced row is locked, not just the rows that are retrieved. For example, if you scan 20 000 rows and apply predicates to them, locks are held on all 20 000 rows, even if, say, only 200 rows qualify. Another application cannot insert or update a row that would be added to the list of rows referenced by a query if that query were to be executed again. This prevents phantom reads.
  • Because RR can acquire a considerable number of locks, this number might exceed limits specified by the locklist and maxlocks database configuration parameters. To avoid lock escalation, the optimizer might elect to acquire a single table-level lock for an index scan, if it appears that lock escalation is likely. If you do not want table-level locking, use the read stability isolation level.
  • While evaluating referential constraints, dashDB might occasionally upgrade the isolation level used on scans of the foreign table to RR, regardless of the isolation level that was previously set by the user. This results in additional locks being held until commit time, which increases the likelihood of a deadlock or a lock timeout. To avoid these problems, create an index that contains only the foreign key columns, which the referential integrity scan can use instead.

Read Stability (RS)

  • The read stability (RS) isolation level locks only those rows that an application retrieves during a unit of work. RS ensures that any qualifying row read during a UOW cannot be changed by other application processes until the UOW completes, and that any change to a row made by another application process cannot be read until the change is committed by that process. Under RS, access to uncommitted data and non-repeatable reads are not possible. However, phantom reads are possible. Phantom reads might also be introduced by concurrent updates to rows where the old value did not satisfy the search condition of the original application but the new updated value does.
  • For example, a phantom row can occur in the following situation:
    • Application process P1 reads the set of rows n that satisfy some search condition.
    • Application process P2 then inserts one or more rows that satisfy the search condition and commits those new inserts.
    • P1 reads the set of rows again with the same search condition and obtains both the original rows and the rows inserted by P2.
  • In a dashDB environment, an application running at this isolation level might reject a previously committed row value, if the row is updated concurrently on a different member. To override this behavior, specify the WAIT_FOR_OUTCOME option.
  • This isolation level ensures that all returned data remains unchanged until the time the application sees the data, even when temporary tables or row blocking is used.
  • The RS isolation level provides both a high degree of concurrency and a stable view of the data. To that end, the optimizer ensures that table-level locks are not obtained until lock escalation occurs.
  • The RS isolation level is suitable for an application that:
    • Operates in a concurrent environment
    • Requires qualifying rows to remain stable for the duration of a unit of work
    • Does not issue the same query more than once during a unit of work, or does not require the same result set when a query is issued more than once during a unit of work

Cursor Stability (CS)

  • The cursor stability (CS) isolation level locks any row being accessed during a transaction while the cursor is positioned on that row. This lock remains in effect until the next row is fetched or the transaction terminates. However, if any data in the row was changed, the lock is held until the change is committed.
  • Under this isolation level, no other application can update or delete a row while an updatable cursor is positioned on that row. Under CS, access to the uncommitted data of other applications is not possible. However, non-repeatable reads and phantom reads are possible.
  • Cursor Stability (CS) is the default isolation level.
  • Cursor Stability (CS) is suitable when you want maximum concurrency and need to see only committed data.
  • In a dashDB environment, an application running at this isolation level may return or reject a previously committed row value, if the row is concurrently updated on a different member. The WAIT FOR OUTCOME option of the concurrent access resolution setting can be used to override this behavior.

Uncommitted Read (UR)

  • The uncommitted read (UR) isolation level allows an application to access the uncommitted changes of other transactions. Moreover, UR does not prevent another application from accessing a row that is being read, unless that application is attempting to alter or drop the table.
  • Under UR, access to uncommitted data, non-repeatable reads, and phantom reads are possible. This isolation level is suitable if you run queries against read-only tables, or if you issue SELECT statements only, and seeing data that has not been committed by other applications is not a problem.
  • Uncommitted Read (UR) works differently for read-only and updatable cursors.
  • Read-only cursors can access most of the uncommitted changes of other transactions.
  • Tables, views, and indexes that are being created or dropped by other transactions are not available while the transaction is processing. Any other changes by other transactions can be read before they are committed or rolled back. Updatable cursors operating under UR behave as though the isolation level were CS.
  • If an uncommitted read application uses ambiguous cursors, it might use the CS isolation level when it runs. To prevent this escalation, modify the cursors in the application program to be unambiguous and/or change the SELECT statements to include the FOR READ ONLY clause (see the sketch below).
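
As a sketch of that last point: in DB2-family SQL, from which dashDB derives, an isolation clause on a query scopes the isolation level to that single statement. The table name below is hypothetical:

-- Read-only query that tolerates uncommitted data, for this statement only.
SELECT patient_id, status
FROM reporting.patient_status
FOR READ ONLY
WITH UR;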

 

Related References

IBM dashDB

Accessing remote data sources with fluid queries on dashDB Local, Developing for federation

 

InfoSphere / DataStage – What are the supported connector stages for dashDB?

dashDB

In a recent discussion, the question came up concerning which InfoSphere DataStage connectors and/or stages are supported by IBM for dashDB. So, it seems appropriate to share the insight gained from having the question answered.

What DataStage Connectors and/or Stages are Supported for dashDB

You have three choices of connector; which may best meet your needs will depend on the nature of your environment and the configuration choices that have been applied:

  1. The DB2 Connector Stage
  2. The JDBC Connector stage
  3. The ODBC Stage

Related References

Connecting to IBM dashDB

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Information Server on Cloud offerings, Connecting to other systems, Connecting to IBM dashDB

DB2 connector

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, IBM DB2 databases, DB2 connector

ODBC stage

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Older stages for connectivity, ODBC stage

JDBC data sources

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Multiple data sources, JDBC data sources

Netezza – [SQLCODE=HY000][Native=46] ERROR: External Table : count of bad input rows reached maxerrors limit

SQL (Structured Query Language)

While helping a customer, we encountered the [SQLCODE=HY000][Native=46] error, which was a new one for me. So here are a few notes to help the next unlucky soul who may run into it.

Netezza Error Reason:

  • [SQLCODE=HY008][Native=51] Operation canceled; [SQLCODE=HY000][Native=46] ERROR: External Table : count of bad input rows reached maxerrors limit

What Does the Error Mean

  • In a nutshell, it means invalid data was submitted and could not be inserted.

What To Do

  • Basically, you need to go to the Netezza logs to see why the rows were rejected, resolve the input data errors, and then resubmit your transactions. The logs are temporary and reused, so you need to get to them before they are overwritten.

Where Are The Data Logs

  • In Linux, the logs can be found in /tmp:

For nzload Method Logs

  • /tmp/<<database name>>.<<table name>>.nzlog
  • /tmp/<<database name>>.<<table name>>.nzbad

For External Table Load Logs

  • /tmp/<<external table name>>.log
  • /tmp/<<external table name>>.bad


Netezza / PureData – SQL Cast Conversion to Integers

Cast Conversion Format

From time to time, I have had to look up this information, especially when working with old-school, SQL-intensive ETL patterns where the transformations are performed in SQL rather than within the ETL application. So, rather than scrambling to find the differing integer casting range limits, bit-size conversions, and aliases for the different integer types and piecing it all together, I thought I would consolidate it here for future reference.

There are some important rules in the summary of Netezza casting reference below, but here are a few, which I have found useful to remember:

  • It is safer to convert from a smaller to a larger integer type; otherwise the input values may not fit
  • You may need to apply other data treatments to handle special characters and decimals, to prevent casting errors

Basic Casting Conversion Format

cast(<<FieldName>> as <<IntegerType_or_Alias>>) as <<FieldName>>

 

Example Casting Integers SQL

Select
cast(127 as BYTEINT) as BYTEINT_8_Bit,
cast(127 as Int1) as BYTEINT_ALIAS,
cast(32767 as SMALLINT) as SMALLINT_16_Bit,
cast(32767 as Int2) as SMALLINT_ALIAS,
cast(2147483647 as INTEGER) as INTEGER_32_Bit,
cast(2147483647 as Int) as INTEGER_ALIAS_1,
cast(2147483647 as Int4) as INTEGER_ALIAS_2,
cast(9223372036854775807 as BIGINT) as BIGINT_64_bit,
cast(9223372036854775807 as Int8) as BIGINT_ALIAS_1
From _v_dual;

 

Table Of Netezza Integer Types

Type | Alias Names | Bit Size | Value Range
BYTEINT | INT1 | 8-bit | -128 to 127
SMALLINT | INT2 | 16-bit | -32,768 to 32,767
INTEGER | INT, INT4 | 32-bit | -2,147,483,648 to 2,147,483,647
BIGINT | INT8 | 64-bit | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Related References

 

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1, IBM Netezza stored procedures, NZPLSQL statements and grammar, Variables and constants, Data types and aliases

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1, IBM Netezza database user documentation, SQL statement grammar, Explicit and implicit casting, Summary of Netezza casting

How to get a previously run SQL in Aginity

Every once in a while, it happens that a SQL statement needs to be reused, but for whatever reason it was not saved. If it wasn’t run too far back in time, you can recover the SQL from the User Query History. This is a fairly straightforward process, if not exactly an obvious one.

The basic process is:

  • Connect to the database
  • Navigate to View > User Query History
Aginity View User Query History

  • Select the SQL Statement you want to retrieve

Note: you may want to select a database and/or user to narrow the rows returned.

  • Right-click to display the command menu
Copy SQL Statement from Aginity User Query History

  • Select ‘Copy SQL Statement’
  • Now you can paste the SQL in any Aginity Query Tab of your choosing

Netezza Connector Stage, Table name required warning for User-defined SQL Write mode

Recently, while working at a customer site, I encountered an anomaly in the Netezza Connector stage: when choosing the ‘User-defined SQL’ write mode, the ‘Table name’ property displays a caution/warning even though a table name should not be required. If you are using a user-defined SQL statement and/or have parameterized your SQL scripts to make the job reusable, each SQL statement and/or SQL script would have its own schema and table name being passed in. After some investigation, a workaround was found, which allows you to populate the table name and still leverage different schema and table names within your SQL statement and/or script.

Table Name, User-defined SQL, Warning

You will notice, in the screenshot below, that the ‘User-defined SQL’ write mode property has been chosen, a parameter has been placed in the ‘User-defined SQL’ property, and the ‘Read user defined SQL from a file’ property has been set to ‘Yes’. However, a yellow triangle displays on the ‘Table name’ property, marking it as a required item. This also occurs when placing SQL statements in the ‘User-defined SQL’ property, whether reading from a file or not.

Netezza Connector User-Defined SQL, Table Name Required Warning

Table Name, User-defined SQL, Warning Workaround

After some experimentation, the workaround is straightforward enough. Basically, give the ‘Table name’ property something it can read successfully, so the stage can move on to the user-defined SQL and/or SQL script file, which the process actually needs to execute. In the screenshot below, the SYSTEM.DEFINITION_SCHEMA._V_DUAL view was used so it could be found, and then the script file passed by the parameter runs fine. Another view or table, which the DataStage user has access to, should work just as well.

Netezza Connector, User-Defined SQL, Table Name Required Fix


Netezza / PureData – How to convert an epoch field to a timestamp

How To Convert An Epoch Field To A Timestamp

This is one of those items which we need to do in Netezza SQL from time to time, but it is not exactly obvious to the average SQL user. So, having a pattern for converting an epoch field to a timestamp can be a real time saver.

Epoch Field Conversion to A Timestamp Format

to_timestamp('1970-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS') + (<<EpochFieldName>> * interval '.000001 seconds') as Entry_TS_Timestamp_Format

Epoch Timestamp Conversion Example SQL 1

select
entry_ts as Original_Format_Entry_TS,
to_timestamp('1970-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS') + (entry_ts * interval '.000001 seconds') as Entry_TS_Timestamp_Format
from _v_system_util
limit 1;

Epoch Timestamp Conversion Example SQL 2

select

entry_ts as Orginial_Format_Entry_TS,

to_date(‘1/1/1970′,’MM/DD/YYYY’) + (entry_ts * interval ‘.000001 seconds’) as Entry_TS_Timestamp_Format

from _v_system_util

limit 1;


Netezza / PureData – Row number within a group

SQL (Structured Query Language)

Sometimes it is useful to be able to number data within its groups, which can be quickly and easily done using the ROW_NUMBER function in Netezza. However, you do need to pay particular attention to your ‘partition by’ (conceptually, a group-by operation) field list and field order, as well as your ‘order by’ field list; otherwise, you can get some bizarre results. I find it helpful to think of them as having a parent-child relationship. In this line of thought, your ‘partition by’ field list should usually have one field fewer than your ‘order by’. This is not an absolute rule, but under normal circumstances your ‘order by’ field list should contain all the fields in your ‘partition by’ and, occasionally, provide a sort direction (ascending or descending).

SQL Netezza Row Number Format

ROW_NUMBER() OVER( PARTITION BY <<Partition_By_Field_List>> ORDER BY <<Sort_By_Field_List [direction asc/desc]>> ) as ROW_NUM

 

Example Row Number SQL

Row Number Within A Group

SELECT
qh_database
, qh_sessionid
, qh_tend
, ROW_NUMBER() OVER( PARTITION BY qh_database, qh_sessionid ORDER BY qh_database, qh_sessionid, qh_tend desc ) as ROW_NUM
FROM nz_query_history
where qh_database = 'system'
order by qh_database, qh_sessionid, qh_tend desc;

 

Related References

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Netezza SQL extensions, Character string functions

IBM Knowledge Center, PureData System for Analytics, Version 7.1.0, IBM Netezza Database User’s Guide, Netezza SQL basics, Functions and operators, Functions, Standard string functions

Netezza / PureData – How to find and kill table locks

Netezza/PureData Table Session Locks

Sometimes there is a need to find and/or kill (terminate) table locks so that application processes and user access can be restored. Doing this is relatively straightforward, if you have access and the appropriate permissions on the Netezza PureData server.

How to find table locks on a Netezza database

  • Log into the Netezza server
  • From the command line, navigate to the Netezza directory (e.g. cd /nz)
  • On the command line, enter the show locks command

 

Show Locks Command (nz_show_locks) Syntax

nz_show_locks <db name> <tablename>

 

Example Show Locks Command (nz_show_locks)

nz_show_locks dashboard_staging stg_nz_query_history

Netezza PureData Kill Table Session Locks

How to kill table locks on a Netezza database

  • Perform the find locks steps above
  • Then, on the command line, enter the kill session locks command

Kill Sessions Command (nzsession) Syntax

nzsession subcmd [subcmd options]

Example Kill Sessions Command (nzsession)

nzsession abort -id 523662 -force


Netezza / PureData – CURRENT_TIMESTAMP Function

Netezza / PureData Analytics Current Timestamp

The CURRENT_TIMESTAMP function returns the current date and time from PureData Analytics (Netezza) and is an easy way to retrieve the current timestamp via SQL.

Example SQL

Select CURRENT_TIMESTAMP as "Current Time"
from _V_DUAL;


Netezza / PureData – Case Statement Example

Netezza PureData Case Function

The CASE statement is one which I occasionally use within Netezza/PureData SQL and stored procedures. Basically, the Case function provides ‘IF-THEN-ELSE’ decision capability. This capability allows the evaluation of conditions, returning a value when the first condition is met, and/or providing break-out logic if the condition or conditions are not met.

Case Function Language Structure

Case Function Basic Structure

CASE
WHEN <search-condition-1> THEN <result-1>
WHEN <search-condition-2> THEN <result-2>
WHEN <search-condition-n> THEN <result-n>
ELSE <default-result>
END

Case Function Simple Structure

CASE
WHEN <search-condition-1> THEN <result-1>
ELSE <default-result>
END

Case Function Embedded in SQL Select Example

Case Function Embedded In SQL Select Statement

 

SQL Used in Embedded Example

Select
now() as "Time",
current_date as "Today",
(case
when (DATE_PART('HOUR', NOW()) > 12)
THEN date(current_date + cast('1 days' as interval))
ELSE current_date
END) as "PROCESSING_DATE"
From ADMIN._v_dual;

 

Related References

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Functions and operators, Functions

DB2 JDBC ISJDBC.CONFIG Configuration

JDBC (Java Database Connectivity)

Here are a few pointers for building an IBM InfoSphere Information Server (IIS) isjdbc.config file for an IBM DB2 Universal Driver, Type 4.

Where to place JAR files

For InfoSphere Information Server installs, as a standard practice, create a custom jdbc folder in the install path, and install any downloaded JAR files not already installed by other applications in that jdbc folder. Usually, the jdbc folder path looks something like this:

  • /opt/IBM/InformationServer/jdbc

CLASSPATH

  • db2jcc.jar
  • The classpath must include the complete path and JAR name

CLASS_NAMES

  • com.ibm.db2.jcc.DB2Driver

JAR Source URL

DB2 DEFAULT PORT

  • 50000

JDBC URL FORMAT

  • jdbc:db2://<<host>>[:<<port>>]/<<database name>>

JDBC URL EXAMPLE

jdbc:db2://127.0.0.1:50000/IADB

isjdbc.config EXAMPLE

CLASSPATH=/opt/IBM/InformationServer/ASBNode/lib/java/db2jcc.jar;

CLASS_NAMES=com.ibm.db2.jcc.DB2Driver;

 

Isjdbc.config FILE PLACEMENT

  • /opt/IBM/InformationServer/Server/DSEngine

Related References

 

Data Warehouse – Effective Practices

Methodology

Effective Practices

Effective practices are enablers, which can improve performance, data availability, environment stability, resource consumption, and data accuracy.

Use of an Enterprise Scheduler

The scheduling service in InfoSphere Information Server (IIS) leverages the operating system (OS) scheduler; a common enterprise scheduler can provide these capabilities beyond those of a typical OS scheduler:

  • Centralized control, monitoring, and maintenance of job stream processes
  • Improved insight into and control of cycle processes
  • Improved intervention capabilities, including alerts, job stream suspension, auto-restarts, and upstream/downstream dependency monitoring
  • Reduced time-to-recovery and increased flexibility in recovery options
  • Improved ability to monitor and alert for mission-critical processes that may be delayed or failing
  • Improved ability to automate disparate process requirements within and across systems
  • Improved load balancing to optimize use of resources or to compensate for loss of a given resource
  • Improved scalability and adaptability to infrastructure or application environment changes

Use of Data Source Timestamps

When they exist or can be added to the data, ‘created’ and ‘last updated’ timestamps can greatly reduce the impact of Change Data Capture (CDC) operations, especially if the data warehouse, data model, and load process store the last successful run time of CDC jobs. This reduces the number of rows required to be processed and reduces the load on the RDBMS and/or ETL application server. Leveraging ‘created’ and ‘last updated’ timestamps can also greatly reduce the processing time required to perform the same CDC processes. A minimal sketch follows.
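
As a minimal sketch of timestamp-based CDC, assuming a hypothetical source table (src.customer with a last_updated_ts column) and a hypothetical ETL control table (etl_run_log) that records each job’s last successful run:

-- Pull only the rows changed since the last successful CDC run.
SELECT c.*
FROM src.customer c
WHERE c.last_updated_ts >
( SELECT max(r.run_end_ts)
FROM etl_run_log r
WHERE r.job_name = 'CUSTOMER_CDC'
AND r.status = 'SUCCESS' );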

Event Based Scheduling

Event-based scheduling, when coupled with an enterprise scheduler, can increase data availability and distribute work opportunistically. It can allow all or part of a process stream to begin as soon as predecessor data sources have completed the requisite processes. This lets processes start as soon as possible, which can reduce resource bottlenecks and contention and, potentially, make data available earlier than a static, time-based schedule would. Event-based scheduling can also delay processing should the source system’s requisite processing be delayed, thereby improving data accuracy in the receiving system.

Integrated RDBMS Maintenance

Integrating RDBMS maintenance into the process job stream can provide on-demand optimization as the processes move through their flow, improving performance. Performing maintenance such as indexing, distribution, and grooming at key points ensures that the data structures are optimized for follow-on processes to consume.
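
For example, on Netezza/PureData a job stream step might groom a table and refresh its statistics right after a large load; the table name here is hypothetical:

-- Reclaim space from deleted and updated rows after a heavy load.
GROOM TABLE sales_fact RECORDS ALL;
-- Refresh optimizer statistics so follow-on processes plan well.
GENERATE STATISTICS ON sales_fact;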

Application Server and Storage Space Monitoring and Maintenance

Monitoring and actively clearing disk space not only improves overall performance and reduces costs, but also improves application stability.

Data Retention Strategies

Data retention strategies are an often overlooked form of data maintenance. They deal with establishing policies that ensure only truly necessary data is kept, and that information which is no longer necessary is purged, by essential category, to limit legal liability, limit data growth and storage costs, and improve RDBMS performance.

Use Standard Practices

Use of standard practices, both application and industry, allows experienced resources to more readily understand the major application activities, their relationships, dependencies, design, and code. This facilitates resourcing and support over the life cycle of the application.

 

Netezza / PureData – How to set the processing date to the next day based on a timestamp

There are a variety of uses for this logic within Netezza / PureData SQL, when working with interfaces, ETL, ELT, snapshot facts and/or tables, setting the processing / run-control date, and the list goes on. However, rather than reinventing this logic each time, I thought I would save this code snippet for future use and reference.

What this code does is straightforward enough. It:

  • Determines the current time of day using the Now() function
  • Determines the hour from that time using the Date_Part function
  • Determines the current date using the current_date function
  • Then, using the Case function, sets the processing date based on the current hour of the day

SQL used to calculate processing date

Select
now() as "Time",
current_date as "Today",
(case
when (DATE_PART('HOUR', NOW()) > 12)
THEN date(current_date + cast('1 days' as interval))
ELSE current_date
END) as "PROCESSING_DATE"
From ADMIN._v_dual;

Morning Example of SQL Results

Processing Date Morning Example

Afternoon Example of SQL Results

Processing Date Afternoon Example


Structured Query Language (SQL) Tuning

Tuning SQL is one of those skills which is part art and part science. However, there are a few fundamental approaches which can help ensure optimal SQL select statement performance.

Structuring your SQL

Much performance can be gained through good SQL statement organization and sound logic.

Where Clause Concepts:

Use criteria ordering and set-theory thinking. SQL can be coupled with set theory to aid in conceiving of the operations being conducted. Order your selection criteria so that the criteria which arrive at the smallest possible row set execute first; doing so reduces the volume of rows to be processed by follow-on operations. This does require an understanding of the data relationships to be effective.

SQL First Select Criteria Pie Chart

 

Join Rules (equi joins, etc.)

When constructing your joins, consider these rules:

  • Join on keys and indexed columns: the efficiency of your program improves when tables are joined on indexed columns rather than on non-indexed ones.
  • Use equi-joins (=) whenever possible
  • Avoid the use of sub-queries
  • Re-write EXISTS and NOT EXISTS subqueries as outer joins
  • Avoid OUTER joins on fields containing nulls
  • Avoid RIGHT OUTER JOINs: always select FROM your primary table (or derived table) and LEFT OUTER JOIN to auxiliary tables.
  • Use joins instead of subqueries: a join can be more efficient than a correlated subquery or a subquery using IN (see the sketch after this list).
  • Use caution when specifying ORDER BY with a join: when the results of a join must be sorted, limiting the ORDER BY to columns of a single table can allow the database to avoid a sort.
  • Provide adequate search criteria: when possible, provide additional search criteria in the WHERE clause for every table in a join. These criteria are in addition to the join criteria, which are mandatory to avoid Cartesian products.
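
To illustrate the subquery-to-join rule, here is a hedged sketch using hypothetical customers and orders tables:

-- Subquery version: list customers that have at least one order.
SELECT c.customer_id, c.customer_name
FROM customers c
WHERE c.customer_id IN (SELECT o.customer_id FROM orders o);

-- Join rewrite: often more efficient; DISTINCT prevents duplicate
-- customer rows when a customer has many orders.
SELECT DISTINCT c.customer_id, c.customer_name
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id;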

Order of Operations SQL & “PEMDAS”

To improve your SQL, careful attention needs to be paid to the mathematical order of operations; especially, parentheses since they not only set the order of operation, but also the boundaries of each subset operation.

  • PEMDAS is “Parentheses, Exponents, Multiplication and Division, and Addition and Subtraction”.
  • Use parentheses () to group and specify the order of execution. SQL observes the normal rules of arithmetic operator precedence (see the sketch after the table below).
Precedence | Operator(s) | Operation(s) | Notes
1 | ( ) | Parentheses | If the parentheses are nested, the expression in the innermost pair is evaluated first. If there are several un-nested parentheses, they are evaluated left to right.
2 | *, /, % | Multiplication, Division, Modulus | If there are several, evaluation is left to right.
3 | +, - | Addition, Subtraction | If there are several, evaluation is left to right.
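
A small sketch of how parentheses change a result (the FROM _v_dual clause works on Netezza; omit it on databases that allow a FROM-less SELECT):

SELECT 2 + 3 * 4 AS no_parens, -- multiplication first: 14
(2 + 3) * 4 AS with_parens -- parentheses first: 20
FROM _v_dual;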

 

Index Leveraging (criteria ordering, hints, append, etc.)

  • Avoid full table scans: within the scope of a SQL statement, there are many conditions that will cause the SQL optimizer to invoke a full-table scan. Avoid queries:
    • with NULL conditions (IS NULL, IS NOT NULL)
    • against unindexed columns
    • with LIKE conditions
    • with not-equals conditions (<>, !=, NOT IN)
    • that use built-in functions (to_char, substr, decode, UPPER)
  • Use UNION ALL instead of UNION if business rules allow (see the sketch after this list)
    • UNION: specifies that multiple result sets are to be combined and returned as a single result set. The query optimizer performs extra work to avoid returning duplicate rows.
    • UNION ALL: incorporates all rows into the results, including duplicates. The query optimizer just needs to concatenate the result sets, with no extra work.
  • Use stored procedures instead of ad hoc queries when possible. Stored procedures are precompiled and cached.
  • Avoid cursor use when possible
  • Select only the rows needed
  • Use the NOLOCK hint in select statements to avoid blocking (SQL Server)
  • Commit transactions in smaller batches
  • Whenever possible, use tables instead of views
  • Make sure comparison columns, whether in a JOIN or a WHERE clause, are exactly the same data type. For example, if we compare a varchar column to an nchar column, the query optimizer has to do a CONVERT before comparing the values.
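
A short sketch of the UNION versus UNION ALL point, using hypothetical current_sales and archived_sales tables:

-- UNION ALL concatenates the result sets; duplicates are kept,
-- and the optimizer skips the sort/de-duplication work.
SELECT order_id, region FROM current_sales
UNION ALL
SELECT order_id, region FROM archived_sales;

Use UNION instead only when duplicate rows must be removed; that is the version that pays the extra de-duplication cost.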

Note: you do not necessarily need to remove all full table scans from your query’s execution plan. Tables with few rows, few columns, or thin columns may fit into a few database blocks; in this case, a full table scan will always be the most efficient access path.

Oracle SQL – Trimming a Specific Literal from a Field

Oracle Database

I have frequently used the TRIM command for one purpose or another, but I had not used it to extract a literal out of a field until recently, when I used it to extract invalid values from a field in SQL. So, I am providing a sample SQL for future reference.

 

SQL Pattern:

Select trim('I' FROM <<FieldName>>) AS <<FieldName>>
From <<TableName>>;

 

Substitution example:

In this example an invalid indicator (‘Y’) is removed, a Null Value Logic (NVL) substitution to an integer-compatible value is performed, and then the field is converted to an integer.

 

Select
CAST(NVL(trim('Y' FROM AA.Status_ID), '0') AS INT) AS PACK_SZ
From Inventory_code_TBL AA;

 
