An Overview of DB2 Federation

DB2 Federation

Data analytics has changed: data is no longer manageable in relational databases alone. Data now flows from a variety of sources in different formats, which means it is no longer practical to store all of it in a single repository. Some data is best suited to relational databases, some to Apache Hadoop, and some to NoSQL databases.

During analysis, a great deal of time is spent bringing this distributed data together instead of obtaining insights. This is where Db2 Federation comes to the rescue of data analysts: the federation concept in DB2 eliminates the need to physically consolidate data from different repositories and reduces the hassle of getting insights.

What is DB2 Federation?

DB2 federation is a data integration technology that permits remote database objects to be accessed as if they were local DB2 database objects. It connects multiple databases and makes them appear as a single database.

How does DB2 federation work?

Federation allows you to access data that lives in multiple distributed databases with a single query. When implemented in an organization, it can be used to reach data in any of the organization’s Db2 databases, whether local or in the cloud.
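To make this concrete, here is a minimal sketch of how a federated data source is typically defined in DB2 SQL; the server, host, credential, schema, and table names (remote_sales, SALESDB, and so on) are all hypothetical:

-- Register a wrapper and a remote DB2 server (names and options are examples).
CREATE WRAPPER DRDA;
CREATE SERVER remote_sales TYPE DB2/UDB VERSION '11.1' WRAPPER DRDA
  OPTIONS (HOST 'remotehost.example.com', PORT '50000', DBNAME 'SALESDB');

-- Map a local user to credentials on the remote server.
CREATE USER MAPPING FOR local_user SERVER remote_sales
  OPTIONS (REMOTE_AUTHID 'sales_user', REMOTE_PASSWORD '********');

-- A nickname makes the remote table queryable as if it were local.
CREATE NICKNAME sales_remote FOR remote_sales.SALES.ORDERS;

-- One query can now join local and remote data transparently.
SELECT c.customer_name, r.order_total
FROM local_schema.customers c
JOIN sales_remote r ON r.customer_id = c.customer_id;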

Why use DB2 federation?

So, why should you use federation? It brings data of all formats together into one virtual source, and with data retrieved from one virtual source, analysis becomes more cost-effective and efficient.

What are the primary use cases for DB2 federation?

Merging of various sources of data

DB2 federation facilitates consolidating data from local and cloud sources into one virtual data source. This eliminates data migration, which can be expensive and troublesome.

Increase the capacity of a repository beyond the fixed limits

Physical storage capacity is bound to have a limit, which is one reason an organization may have distributed its data across various repositories. With federation, the storage is virtual, so a single query is no longer confined to one repository’s capacity. This technology can greatly help you if your physical dataset is running low on space.

Linking up to Db2 Warehouse on Cloud

People who use Db2 products can federate data from Db2 on Cloud and Db2 Warehouse on Cloud. This gives them a joint interface where they can access, add, query, and analyze data without encountering complex ETL processes. Better still, no additional code is required to execute these operations, which makes the products approachable for people with less technical know-how.

Split data across different servers

At times, you might choose to partition your data. With federation, partitioned data can be queried through a unified interface, as sketched below. Federation allows you to balance your workloads more evenly, scale specific parts of an app, and create micro-services that work together harmoniously.
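As a rough sketch of that unified interface, partitioned data on two federated servers might be exposed through nicknames and a single view; the nicknames and view name here are hypothetical:

-- Assume nicknames sales_2016 and sales_2017 already point at tables
-- on two different federated servers.
CREATE VIEW all_sales AS
SELECT * FROM sales_2016
UNION ALL
SELECT * FROM sales_2017;

-- Consumers query one object, regardless of where the rows live.
SELECT COUNT(*) FROM all_sales;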

Generally, DB2 federation makes data easier to access by bringing it together into a single virtual source, which saves both cost and time. When you want to analyze data, you can get insights immediately instead of spending a lot of time querying repository after repository.

Related References

OLTP vs Data Warehousing


OLTP Versus Data Warehousing

I’ve tried to explain the difference between OLTP systems and a Data Warehouse to my managers many times, as I’ve worked at a hospital as a Data Warehouse manager/data analyst for many years. Why was the list that came from the operational applications different from the one that came from the Data Warehouse? Why couldn’t I just get a list of the patients lying in the hospital right now from the Data Warehouse? So I explained, and explained again, and explained to another manager, and another. You get the picture.
In this article I will explain this very same thing to you, so you know how to explain it to your manager. Or, if you are a manager, you might understand what your data analyst can and cannot give you.

OLTP

OLTP stands for OnLine Transaction Processing. In other words: getting your data directly from the operational systems to make reports. An operational system is a system that is used for the day-to-day processes.
For example: when a patient checks in, his or her information gets entered into a Patient Information System. The doctor puts scheduled tests, a diagnosis, and a treatment plan in there as well. Doctors, nurses, and other people working with patients use this system on a daily basis to enter and retrieve detailed information on their patients.
The data within operational systems is stored so that it can be used efficiently by the people working directly on the product, or with the patient in this case.

Data Warehousing

A Data Warehouse is a big database that fills itself with data from operational systems. It is used solely for reporting and analytical purposes. No one uses this data for day-to-day operations. The beauty of a Data Warehouse is, among other things, that you can combine the data from the different operational systems. You can actually combine the number of patients in a department with the number of nurses, for example. You can see how far a doctor is behind schedule and find the cause of that by looking at the patients. Does he run late with elderly patients? Is there a particular diagnosis that takes more time? Or does he just oversleep a lot? You can use this information to look at the past and see trends, so you can plan for the future.

The difference between OLTP and Data Warehousing

This is how a Data Warehouse works:

How a Data Warehouse works

The data gets entered into the operational systems. The ETL processes then Extract this data from those systems, Transform the data so it fits neatly into the Data Warehouse, and Load it into the Data Warehouse. After that, reports are built with a reporting tool from the data that lies in the Data Warehouse.

This is how OLTP works:

How OLTP works

Reports are made directly from the data inside the database of the operational systems. Some operational systems come with their own reporting tool, but you can always use a standalone reporting tool to make reports from the operational databases.

Pros and Cons

Data Warehousing

Pros:

  • There is no strain on the operational systems during business hours
    • As you can schedule the ETL processes to run during the hours the least amount of people are using the operational system, you won’t disturb the operational processes. And when you need to run a large query, the operational systems won’t be affected, as you are working directly on the Data Warehouse database.
  • Data from different systems can be combined
    • It is possible to combine finance and productivity data, for example, because the ETL process transforms the data so it can be combined.
  • Data is optimized for making queries and reports
    • You use different data in reports than you use on a day-to-day basis. A Data Warehouse is built for this. For instance: most Data Warehouses have a separate date table where the weekday, day, month, and year are saved. You can write a query to derive the weekday from a date, but that takes processing time. By using a separate table like this you save time and decrease the strain on the database (see the sketch after this list).
  • Data is saved longer than in the source systems
    • The source systems have their old records deleted when those records are no longer used in day-to-day operations, to maintain performance. The Data Warehouse keeps them.
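To make the date-table example concrete, here is a minimal sketch; the sales and dim_date tables and their columns are hypothetical:

-- Deriving the weekday on the fly costs processing time on every row:
SELECT TO_CHAR(s.sale_date, 'DAY') AS weekday, COUNT(*) AS sales_count
FROM sales s
GROUP BY TO_CHAR(s.sale_date, 'DAY');

-- Reading it from a pre-built date table avoids the derivation:
SELECT d.weekday_name, COUNT(*) AS sales_count
FROM sales s
JOIN dim_date d ON s.sale_date = d.calendar_date
GROUP BY d.weekday_name;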

Cons:

  • You always look at the past
    • A Data Warehouse is updated once a night, or even just once a week. That means you never have the latest data. Staying with the hospital example: you never know how many patients are in the hospital right now, or which surgeon didn’t show up on time this morning.
  • You don’t have all the data
    • A Data Warehouse is built for discovering trends, showing the big picture. The little details, the ones not used in trends, get discarded during the ETL process.
  • Data isn’t the same as the data in the source systems
    • Because the data is older than that of the source systems, it will always be a little different. The Transformation step in the ETL process also changes the data a little. That doesn’t mean one or the other is wrong; it’s just a different way of looking at the data. For example: the Data Warehouse at the hospital excluded all transactions that were marked as cancelled. If you try to get the same reports from both systems, and don’t exclude the cancelled transactions in the source system, you’ll get different results.

OLTP (Online Transaction Processing)

Pros:

  • You get real time data
    • If someone is entering a new record now, you’ll see it right away in your report. No delays.
  • You’ve got all the details
    • You have access to all the details that the employees have entered into the system. No grouping, no skipping records, just all the raw data that’s available.

Cons:

  • You are putting strain on an application during business hours.
    • When you are making a large query, you can take up processing capacity that would otherwise be available to the people who need to work with this system for their day-to-day operations. And if you make an error, for instance by forgetting to put a date filter on your query, you could even bring the system down so no one can use it anymore.
  • You can’t compare the data with data from other sources.
    • Even when the systems are similar, like an HR system and a payroll system that work together, the data is always going to be different, because it is granulated on a different level or because not all data is relevant for both systems.
  • You don’t have access to old data
    • To keep the applications at peak performance, old data that’s irrelevant to day-to-day operations is deleted.
  • Data is optimized to suit day to day operations
    • And not for report making. This means you’ll have to get creative with your queries to get the data you need.

So what method should you use?

That all depends on what you need at that moment. If you need detailed information about things that are happening now, you should use OLTP.
If you are looking for trends, or insights on a higher level, you should use a Data Warehouse.


Databases – Database Isolation Level Cross Reference


Here is a quick-reference table of some common database and/or connection types that use connection-level isolation, with their equivalent isolation levels. It may prove useful as a job aid when working with, and making decisions about, isolation levels.

Database isolation levels (from most to least restrictive)

Data source                  Most restrictive     More restrictive     Less restrictive     Least restrictive
Amazon SimpleDB              Serializable         Repeatable read      Read committed       Read uncommitted
dashDB                       Repeatable read      Read stability       Cursor stability     Uncommitted read
DB2® family of products      Repeatable read      Read stability*      Cursor stability     Uncommitted read
Informix®                    Repeatable read      Repeatable read      Cursor stability     Dirty read
JDBC                         Serializable         Repeatable read      Read committed       Read uncommitted
MariaDB                      Serializable         Repeatable read      Read committed       Read uncommitted
Microsoft SQL Server         Serializable         Repeatable read      Read committed       Read uncommitted
MySQL                        Serializable         Repeatable read      Read committed       Read uncommitted
ODBC                         Serializable         Repeatable read      Read committed       Read uncommitted
Oracle                       Serializable         Serializable         Read committed       Read committed
PostgreSQL                   Serializable         Repeatable read      Read committed       Read committed
Sybase                       Level 3              Level 3              Level 1              Level 0

 


Netezza / PureData – What client tools to use with Netezza

IBM Netezza / IBM PureData for Analytics

Occasionally, a client will want a list of tools to work with Netezza / PureData other than the Netezza Administrator client. Honestly, many tools could be used, provided they have ODBC and/or JDBC connectivity; however, these are the tools that keep being used across different customers.

Open Source

For customers willing to work with an open source tool, Aginity for Netezza provides a significant set of capabilities, including script generation, which can be a significant productivity accelerator for development and operations support teams.

Aginity for Netezza

Free From IBM

For customers who do not want to work with open source tools, IBM Data Studio is an acceptable tool and has the added benefit of being free from IBM.

IBM Data Studio



Where to download IBM Data Studio?

IBM Data Studio Client

IBM Data Studio is offered free from IBM and can be helpful when working with DB2 and PureData/Netezza using a JDBC driver.

What you need to download IBM Data Studio

  • You will need an IBM ID and password

Basic download steps

IBM Sign In Screen

  • Enter your IBM ID and password, then click ‘Sign in’.
  • On the IBM Data Studio Client license page, check ‘I agree’ and then click ‘I confirm’.
IBM Data Studio Client License Screen

  • On the IBM Data Studio Client download page, select the desired method tab, then select the desired product or products and click ‘Download now’.
IBM Data Studio Client Download Files Screen

 

Related References

IBM Data Studio

IBM Data Studio Client (Download) – IBM Support

Download and install IBM Data Studio Version 4.1.x – IBM Support

System requirements for IBM Data Studio Version 4.1.x – IBM Knowledge Center

Overview of IBM Data Studio – IBM Knowledge Center

 

Netezza / PureData – How to convert an epoch field to a timestamp

How To Convert An Epoch Field To A Timestamp

This is one of those items which we need to do in Netezza SQL from time to time, but which is not exactly obvious to the average SQL user. So, having a pattern for converting an epoch field to a timestamp can be a real time saver.

Epoch Field Conversion to A Timestamp Format

to_timestamp('1970-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS') + (<epoch_field> * interval '.000001 seconds') as Entry_TS_Timestamp_Format

Epoch Timestamp Conversion Example SQL 1

select
    entry_ts as Original_Format_Entry_TS,
    to_timestamp('1970-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS') + (entry_ts * interval '.000001 seconds') as Entry_TS_Timestamp_Format
from _v_system_util
limit 1;

Epoch Timestamp Conversion Example SQL 2

select
    entry_ts as Original_Format_Entry_TS,
    to_date('1/1/1970','MM/DD/YYYY') + (entry_ts * interval '.000001 seconds') as Entry_TS_Timestamp_Format
from _v_system_util
limit 1;


Netezza / PureData – CURRENT_TIMESTAMP Function

Netezza PureData Analytics CURRENT_TIMESTAMP Function

The CURRENT_TIMESTAMP function returns the current date and time from PureData Analytics (Netezza) and is an easy way to retrieve the current timestamp via SQL.

Example SQL

Select CURRENT_TIMESTAMP as "Current Time"
from _V_DUAL;


The Value of Data Indicator Flags

Data Indicator Flags

I cannot count the times that using a flag (also called an indicator) has been described as a nice-to-have in database table design, at least until the code runs into complexity and/or performance challenges.

When designing your data models, ETLs, and reports, it is useful to consider how indicator flags can help. Indicator flags are normally binary in nature, such as True/False or Yes/No, but they don’t always need to be binary.

How indicator flags can help your processes and reporting:

  • Provide an equijoin for complex business rules which can otherwise result in ‘Not In List’, ‘In List’, ‘Not Exists’, ‘Exists’, ‘Not Equal To’, and sub-select SQL statements
  • Provide a processing marker to prevent lookups to other tables to determine an attribute. For example, a snapshot type (daily, weekly, monthly) flag, which can be used to apply data retention rules.
  • Provide a mark for special circumstances. For example, a Legal Hold flag to mark records to be exempted from removal and/or change to meet legal proceeding requirements (see the sketch after this list).
  • The judicious planning and use of flags can reduce the number of full table scans required against large tables.
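As a minimal sketch of the equijoin point, consider a retention purge; the orders and legal_hold_orders tables and the legal_hold_flag column are hypothetical:

-- Without a flag, the purge needs a sub-select against another table:
DELETE FROM orders
WHERE order_date < CURRENT_DATE - 2555   -- roughly seven years of history
  AND order_id NOT IN (SELECT order_id FROM legal_hold_orders);

-- With an indicator flag, the rule becomes a simple equality predicate:
DELETE FROM orders
WHERE order_date < CURRENT_DATE - 2555
  AND legal_hold_flag = 'N';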


DB2 JDBC ISJDBC.CONFIG Configuration

JDBC (Java Database Connectivity)

Here are a few pointers for building an IBM InfoSphere Information Server (IIS) isjdbc.config file for an IBM DB2 Universal Driver, Type 4.

Where to place JAR files

For InfoSphere Information Server installs, as a standard practice, create a custom jdbc folder in the install path, and place any downloaded JAR files not already installed by other applications in that jdbc folder. Usually, the jdbc folder path looks something like this:

  • /opt/IBM/InformationServer/jdbc

CLASSPATH

  • db2jcc.jar
  • The classpath must include the complete path and JAR name

CLASS_NAMES

  • com.ibm.db2.jcc.DB2Driver


DB2 DEFAULT PORT

  • 50000

JDBC URL FORMAT

  • jdbc:db2://<host>[:<port>]/<database_name>

JDBC URL EXAMPLE

jdbc:db2://127.0.0.1:50000/IADB

isjdbc.config EXAMPLE

CLASSPATH=/opt/IBM/InformationServer/ASBNode/lib/java/db2jcc.jar;
CLASS_NAMES=com.ibm.db2.jcc.DB2Driver;

 

Isjdbc.config FILE PLACEMENT

  • /opt/IBM/InformationServer/Server/DSEngine


DB2 to Netezza data type conversions

IBM DB2 Netezza SQL

 

This is an extract table I created from the IBM source a while back, when investigating what format to convert data fields into for IBM InfoSphere DataStage. I have had it floating around in my notes, but lately I have found myself referencing it to help other team members, so it seems useful to include it here. The notes column holds a few snippets of information which I have found useful when planning data field conversions.

 

Transformer data type   DB2 data type     Netezza data type               Notes
BIGINT                  BIGINT            BIGINT
BINARY                  CHARACTER         Not supported
BIT                     SMALLINT          BOOLEAN
CHAR                    CHARACTER         CHAR
DATE                    DATE              DATE
DECIMAL                 DECIMAL           NUMERIC or DECIMAL
DOUBLE                  DOUBLE            DOUBLE PRECISION or FLOAT(15)
FLOAT                   DOUBLE            FLOAT
INTEGER                 INTEGER           INTEGER
LONGVARBINARY           BLOB              Not supported
LONGVARCHAR             CLOB              VARCHAR                         The maximum character string size is 64,000.
NUMERIC                 DECIMAL           NUMERIC
REAL                    REAL              REAL or FLOAT(6)
SMALLINT                SMALLINT          SMALLINT
TIME                    TIME              TIME                            DB2 9.1 TIME data type does not support fractional digits or microseconds.
TIMESTAMP               TIMESTAMP         TIMESTAMP                       DB2 9.1 TIME data type does not support fractional digits or microseconds.
TINYINT                 SMALLINT          BYTEINT
VARBINARY               VARCHAR           Not supported
VARCHAR                 VARCHAR           VARCHAR                         The maximum character string size is 64,000.
WCHAR                   GRAPHIC           Not supported
WLONGVARCHAR            LONG VARGRAPHIC   Not supported
WVARCHAR                VARGRAPHIC        Not supported

 

Related References

InfoSphere Information Server 11.5.0 – Data types

IBM PureData System for Analytics, Version 7.1 – Data types and aliases

Netezza – How to Get the Database UTC Offset With SQL

 

Netezza Database Timezone Offset

Recently, I had reason to know the Coordinated Universal Time (UTC) time zone offset of the Netezza database, for incorporation into an ETL. So, here is an easy Aginity Workbench SQL to pull time zone information from your database, if you have permission to the _VT_PG_TIME_OFFSET view.

UTC Time Zone Offset SQL

SELECT (select EPOCH from _VT_PG_TIME_OFFSET) as "EPOCH",
       (select ONE_MICRO from _VT_PG_TIME_OFFSET) as "ONE_MICRO",
       (select TZOFFSET from _VT_PG_TIME_OFFSET) as "TZOFFSET",
       (select TZNAME from _VT_PG_TIME_OFFSET) as "TZNAME"
FROM _V_dual
Limit 1;


Data Warehouse – Effective Practices

Methodology

Effective Practices

Effective practices are enablers, which can improve performance, data availability, environment stability, resource consumption, and data accuracy.

Use of an Enterprise Scheduler

The scheduling service in InfoSphere Information Server (IIS) leverages the operating system (OS) scheduler; a common enterprise scheduler can provide these capabilities beyond those of a common OS scheduler:

  • Centralized control, monitoring, and maintenance of job stream processes
  • Improved insight into and control of cycle processes
  • Improved intervention capabilities, including alerts, job stream suspension, auto-restarts, and upstream/downstream dependency monitoring
  • Reduced time-to-recovery and increased flexibility in recovery options
  • Improved ability to monitor and alert for mission critical process that may be delayed or failing
  • Improved ability to automate disparate process requirements within and across systems
  • Improved load balancing to optimize use of resources or to compensate for loss of a given resource
  • Improved scalability and adaptability to infrastructure or application environment changes

Use of Data Source Timestamps

When they exist or can be added to data, ‘created’ and ‘last updated’ timestamps can greatly reduce the impact of Change Data Capture (CDC) operations, especially if the data warehouse, data model, and load process store the last successful run time of the CDC jobs. This reduces the number of rows that must be processed and reduces the load on the RDBMS and/or ETL application server, which can also greatly reduce the processing time required to perform the same CDC processes. A sketch follows.
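As a minimal sketch of that pattern, assume a hypothetical etl_run_log table that stores the last successful run time for each CDC job:

-- Extract only the rows created or changed since the last successful run.
SELECT *
FROM source_schema.customer c
WHERE c.last_updated_ts > (SELECT MAX(r.last_success_ts)
                           FROM etl_schema.etl_run_log r
                           WHERE r.job_name = 'CUSTOMER_CDC');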

Event Based Scheduling

Event-based scheduling, when coupled with an enterprise scheduler, can increase data availability and distribute work opportunistically. It can allow all or part of a process stream to begin as soon as predecessor data sources have completed the requisite processes, which reduces resource bottlenecks and contention and, potentially, makes data available earlier than a static time-based schedule would. Event-based scheduling can also delay processing should the source system’s requisite processing be delayed, thereby improving data accuracy in the receiving system.

Integrated RDBMS Maintenance

Integrating RDBMS maintenance into the process job stream can perform on-demand optimization as the processes move through their flow, improving performance. Performing items such as indexing, distribution, and grooming maintenance at key points ensures that the data structures are optimized for follow-on processes to consume, as in the sketch below.
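For example, on Netezza/PureData a job stream might run maintenance statements immediately after a large load and before downstream consumers start; the sales_fact table name is hypothetical:

-- Reclaim space from deleted and updated rows, then refresh statistics
-- so follow-on queries get good plans.
GROOM TABLE sales_fact RECORDS ALL;
GENERATE STATISTICS ON sales_fact;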

Application Server and Storage Space Monitoring and Maintenance

Monitoring and actively clearing disk space not only improves overall performance and reduces costs, but also improves application stability.

Data Retention Strategies

Data retention strategies are an often overlooked form of data maintenance. They establish policies which ensure that only truly necessary data is kept, and that information which is no longer necessary is purged by essential category, to limit legal liability, limit data growth and storage costs, and improve RDBMS performance.

Use Standard Practices

Use of standard practices, both application and industry, allows experienced resources to more readily understand the major application activities and their relationships, dependencies, design, and code. This facilitates resourcing and support over the life cycle of the application.

 

How to Get the Current Status of a DB2 Database

To determine the current configuration of a DB2 database on a Linux server:

  • Sign on to the Linux server containing the database as the instance owner (e.g. db2inst1)
  • Run the following command to find out the current state of the database: db2 get db cfg for <database_name>
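A minimal example session, reusing the database name IADB from the JDBC example earlier in these notes (substitute your own database name and instance owner):

su - db2inst1
db2 get db cfg for IADB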

You should receive a response similar to this, but with additional values:

DB2 Current Database Configuration Example


Data Modeling – Database Table Field Ordering Effective Practices

Database Table

Field ordering can help the performance of inserts and updates and also keeps developers and users from having to search the entire table structure to be sure they have all the keys, etc.

Table Field Ordering

  1. Distribution Field or Fields (if no distribution field is set, the first field will be used by default)
  2. Primary Key Columns (including Parent and Child key fields)
  3. Foreign Key Columns (Not Null)
  4. Not Null Columns
  5. Nullable Columns
  6. Created Date Timestamp
  7. Modified (or Last Updated) Date Timestamp
  8. Large text Fields
  9. Large binary Columns or Binary Field references
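A minimal DDL sketch of this ordering, using a hypothetical Netezza-style table (the names and types are illustrative only):

CREATE TABLE customer_order (
    customer_id   BIGINT NOT NULL,    -- 1. distribution field
    order_id      BIGINT NOT NULL,    -- 2. primary key column
    store_id      INTEGER NOT NULL,   -- 3. foreign key column
    order_status  CHAR(1) NOT NULL,   -- 4. not null column
    promo_code    VARCHAR(20),        -- 5. nullable column
    created_ts    TIMESTAMP,          -- 6. created date timestamp
    modified_ts   TIMESTAMP,          -- 7. modified date timestamp
    order_notes   VARCHAR(64000)      -- 8. large text field
)
DISTRIBUTE ON (customer_id);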


Structured Query Language (SQL) Tuning


Tuning SQL is one of those skills which is part art and part science. However, there are a few fundamental approaches which can help ensure optimal SQL select statement performance.

Structuring your SQL

Much performance can be gained through good SQL statement organization and sound logic.

Where Clause Concepts:

Use criteria ordering and set-theory thinking. SQL can be coupled with set theory to aid conception of the operations being conducted. Order your selection criteria so that the criteria which arrive at the smallest possible row set execute first. Doing so reduces the volume of rows to be processed by follow-on operations. This does require an understanding of the data relationships to be effective.

SQL First Select Criteria Pie Chart

 

Join Rules (equi joins, etc.)

When constructing your joins, consider these rules:

  • Join on keys and indexed columns: The efficiency of your program improves when tables are joined based on indexed columns, rather than on non-indexed ones.
  • Use equi-joins (=) whenever possible
  • Avoid using sub-queries
  • Re-write EXISTS and NOT EXISTS subqueries as outer joins
  • Avoid OUTER joins on fields containing nulls
  • Avoid RIGHT OUTER JOINs: always select FROM your primary table (or derived table) and LEFT OUTER JOIN to auxiliary tables.
  • Use joins instead of subqueries: a join can be more efficient than a correlated subquery or a subquery using IN (see the sketch after this list). Use caution when specifying ORDER BY with a join: when the results of a join must be sorted, limiting the ORDER BY to columns of a single table can allow the database to avoid a sort.
  • Provide adequate search criteria: when possible, provide additional search criteria in the WHERE clause for every table in a join. These criteria are in addition to the join criteria, which are mandatory to avoid Cartesian products.
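As a minimal sketch of the subquery-to-join rewrite, using hypothetical orders and customers tables:

-- Subquery form:
SELECT o.order_id, o.order_total
FROM orders o
WHERE o.customer_id IN (SELECT c.customer_id
                        FROM customers c
                        WHERE c.region = 'WEST');

-- Equivalent join form, which optimizers often handle more efficiently:
SELECT o.order_id, o.order_total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.region = 'WEST';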

Order of Operations SQL & “PEMDAS”

To improve your SQL, careful attention needs to be paid to the mathematical order of operations, especially parentheses, since they not only set the order of operations but also the boundaries of each subset operation.

  • PEMDAS is “Parentheses, Exponents, Multiplication and Division, and Addition and Subtraction”.
  • Use parentheses () to group and specify the order of execution. SQL observes the normal rules of arithmetic operator precedence.
Precedence   Operator(s)   Operation(s)                        Notes
1            ( )           Parentheses                         If parentheses are nested, the expression in the innermost pair is evaluated first. If there are several un-nested parentheses, they are evaluated left to right.
2            * / %         Multiplication, Division, Modulus   If there are several, evaluation is left to right.
3            + -           Addition, Subtraction               If there are several, evaluation is left to right.

 

Index Leveraging (criteria ordering, hints, append, etc.)

  • Avoid full table scans: within the scope of a SQL statement, there are many conditions that will cause the SQL optimizer to invoke a full-table scan. Avoid queries:
    • with NULL conditions (IS NULL, IS NOT NULL)
    • against unindexed columns
    • with LIKE conditions
    • with not-equals conditions (<>, !=, NOT IN)
    • that use built-in functions (to_char, substr, decode, UPPER) on indexed columns; see the sketch after the note below
  • Use UNION ALL instead of UNION if business rules allow:
    • UNION: specifies that multiple result sets are to be combined and returned as a single result set. The query optimizer performs extra work to avoid returning duplicate rows.
    • UNION ALL: incorporates all rows into the results, including duplicates. The query optimizer just needs to concatenate the result sets, with no extra work.
  • Use stored procedures instead of ad hoc queries when possible; stored procedures are precompiled and cached.
  • Avoid cursor use when possible
  • Select only the rows needed
  • Use the NOLOCK hint in select statements to avoid blocking (where the database supports it)
  • Commit transactions in smaller batches
  • Whenever possible use tables instead of views
  • Make sure comparison columns, whether in a JOIN or WHERE clause, are exactly the same data type. For example, if we compare a varchar column to nchar columns, the query optimizer has to do a CONVERT before comparing the values.

Note: You do not necessarily need to remove all full table scans from your query’s execution plan. Tables with few rows, few columns, or thin columns may fit into few database blocks; in this case, a full table scan will always be the most efficient access path.
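As a minimal illustration of the built-in-function point, assume a hypothetical employees table indexed on last_name:

-- The function on the indexed column typically defeats the index:
SELECT employee_id FROM employees WHERE UPPER(last_name) = 'SMITH';

-- An index-friendly alternative, if the data is stored consistently cased:
SELECT employee_id FROM employees WHERE last_name = 'Smith';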

Netezza/PureData – How to Identify Your Netezza Version Using SQL

IBM Netezza / IBM PureData for Analytics

I had a need today to find the version of Netezza which my customer was using, without DBA access. A quick way to determine the Netezza version of your environment is to run this SQL.

 

select system_software_version from _v_system_info;

 

Netezza/PureData SQL Date Formatting Examples

PureData Powered by Netezza

Below is a SQL of various examples of IBM Netezza/PureData date formats, which may be a useful reference.

SELECT
CURRENT_DATE,
-----------------Sort Pattern 1---------------------
TO_CHAR(CURRENT_DATE,'YYYYMMDD') AS DATE_SORT_PATTERN1,
--------------------Date Formats--------------------
TO_CHAR(CURRENT_DATE,'DD/MM/YYYY') AS DATE_PATTERN1,
TO_CHAR(CURRENT_DATE,'DD/MON/YY') AS DATE_PATTERN2,
TO_CHAR(CURRENT_DATE,'DD/MON/YYYY') AS DATE_PATTERN3,
TO_CHAR(CURRENT_DATE,'DD-MON-YY') AS DATE_PATTERN4,
TO_CHAR(CURRENT_DATE,'MM.DD.YY') AS DATE_PATTERN5,
TO_CHAR(CURRENT_DATE,'MM.DD.YYYY') AS DATE_PATTERN6,
TO_CHAR(CURRENT_DATE,'MM/DD/YYYY') AS DATE_PATTERN7,
TO_CHAR(CURRENT_DATE,'MM-DD-YY') AS DATE_PATTERN8,
TO_CHAR(CURRENT_DATE,'MM-DD-YYYY') AS DATE_PATTERN9,
TO_CHAR(CURRENT_DATE,'YY.MM.DD') AS DATE_PATTERN10,
TO_CHAR(CURRENT_DATE,'YY/MM/DD') AS DATE_PATTERN11,
TO_CHAR(CURRENT_DATE,'YY-MM-DD') AS DATE_PATTERN12,
TO_CHAR(CURRENT_DATE,'YYYY.DDD') AS YEAR_AND_DAY_NUMBER_OF_YEAR,
TO_CHAR(CURRENT_DATE,'YYYY.MM.DD') AS DATE_PATTERN13,
TO_CHAR(CURRENT_DATE,'YYYY/MM/DD') AS DATE_PATTERN14,
TO_CHAR(CURRENT_DATE,'YYYY-MM-DD') AS DATE_PATTERN15,
--------------------Parts of Date-------------------
TO_CHAR(CURRENT_DATE,'YYYY') AS FOUR_DIGIT_YEAR,
TO_CHAR(CURRENT_DATE,'YY') AS YEAR_OF_CENTURY,
TO_CHAR(CURRENT_DATE,'J') AS JULIAN_CURRENT_DATE,
TO_CHAR(CURRENT_DATE,'Q') AS CALENDAR_QUARTER_OF_YEAR,
TO_CHAR(CURRENT_DATE,'WW') AS CALENDAR_WEEK_OF_YEAR,
TO_CHAR(CURRENT_DATE,'DDD') AS CALENDAR_DAY_OF_YEAR_NUMBER,
TO_CHAR(CURRENT_DATE,'W') AS WEEK_OF_MONTH,
TO_CHAR(CURRENT_DATE,'DAY') AS NAME_OF_DAY_OF_WEEK,
TO_CHAR(CURRENT_DATE,'DY') AS ABBREVIATED_DAY_OF_WEEK,
TO_CHAR(CURRENT_DATE,'MM') AS NUMERIC_MONTH_OF_YEAR,
TO_CHAR(CURRENT_DATE,'MON') AS ABBREVIATED_MONTH_OF_YEAR
FROM _V_DUAL;


Why Use Materialized Views?

Database Views

If we consider materialized views (MVs) in their simplest form, as a point-in-time stored query result, then materialized views serve two primary purposes: performance optimization and semantics simplification.

Performance Optimization

There are several ways in which materialized views can improve performance:

  • Reduce database workloads: materialized views can reduce database workloads by pre-assembling frequently used queries, thereby eliminating the repetitive execution of joins, aggregations, and filtering.
  • Facilitate database optimizers: in some databases, materialized views can be partitioned and indexed, and those structures are considered by the database optimizer. Also, in some databases where more than one materialized view has been applied to a table, the optimizer will consider all the associated materialized views when optimizing queries.
  • Reduce network workloads: by using database replication and/or mass deployment techniques, materialized views can be distributed into closer proximity to their consumers, reducing data volume across the network and providing business continuation/disaster recovery capabilities should the primary site become temporarily unavailable.
  • Precalculation and/or pre-aggregation: performing calculation and aggregation of information when a materialized view is created eliminates the need to perform those functions on demand as various consumers submit requests.
  • Data subsets: applying filters to eliminate unnecessary data (e.g. history data no longer in common reporting use) or unnecessary data attributes (e.g. unused columns or columns intended for other purposes) means the cost of filtering those items is effectively eliminated for consumers of the materialized view (see the sketch after this list).
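As a minimal sketch of the data-subset case, here is a hypothetical Netezza-style materialized view; note that Netezza materialized views are essentially thin, optionally sorted projections of a single table (no joins or aggregates), so other databases would express richer views differently:

-- Keep only the columns reporting consumers actually use.
CREATE MATERIALIZED VIEW orders_thin_mv AS
SELECT order_id, customer_id, order_date, order_total
FROM orders
ORDER BY order_date;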

Semantics Simplification

Materialized views can be used to simplify the semantics provided to information consumers with ad hoc capabilities and/or to simplify the construction of reporting and analytics objects. Depending on the database and/or integration tools used to create them, materialized views can simplify the consumer experience in several ways:

  • Reduce or eliminate join coding: when constructed, materialized views can perform the joins and populate the materialized view with the results from the joined tables, eliminating the need for the consumer to perform this function as an ad hoc user or in the semantics of reporting and analytics tools.
  • Pre-application of business rules: when constructed, materialized views can apply business rules, adding indicator flags and pre-applying special business logic to the data, thus eliminating the need for the consumer to perform this function as an ad hoc user or in the semantics of reporting and analytics tools.
  • Precalculation and/or pre-aggregation: performing calculation and aggregation of information when the materialized view is created lets the consumer use the results without building the calculations and/or aggregations into an ad hoc query or into the semantics of reporting and analytics tools. This also helps ensure information accuracy and consistency.
  • Data subsets: by pre-filtering the data during the creation of the view, unnecessary or unused data and columns are not exposed to consumers and do not need to be filtered out of ad hoc queries or in the semantics of reporting and analytics tools.

Netezza / PureData – Now() Command For Current Date

PureData Analytics Now() Command

There is more than one way to retrieve current date information from PureData Analytics (Netezza). Using the now() command is an easy way to retrieve the current date from within SQL.

 

Example SQL

Select now() as "Today"
from _V_DUAL;

 

Related Reference

Date/time functions – IBM Netezza Database User’s Guide, PureData System for Analytics 7.1.0