How the IBM Common SQL Engine (CSE) Improves DB2


Today, newfound efficiencies and innovation are key to the success of any business, whether small, medium, or large. In the rapidly evolving field of data analytics, innovative approaches to handling data are particularly important, since data is the most valuable resource any business can have. The IBM Common SQL Engine delivers application and query compatibility that allows companies to turn their data into actionable insights and unleash the power of their databases without constraints.

But, is this really important?

Yes. Many businesses have accumulated enormous amounts of data over the years. This data resides in higher volumes, in more locations throughout the enterprise (on premises and in the cloud), and in greater variety. In theory, this data should be a huge advantage, providing the enterprise with actionable insights. But often it doesn’t work out that way.

IBM Hybrid Data Management

With such a massive store of complex legacy data, many organizations find it hard to decide what to do with it, or even where to start. Migrating all of that data into new systems is simply a non-starter. As a solution, enterprises are turning to IBM Db2, a hybrid data management approach that marries data and analytics seamlessly. IBM Db2 hybrid data management allows flexible deployment of data on the cloud and on premises.

However, this level of flexibility typically requires organizations to rewrite or restructure the queries and applications that will use the diverse, ever-changing data. These changes may even require licensing new software, which is costly and often unfeasible. To bridge this gap, the Common SQL Engine (CSE) comes into play.

How Is the IBM Common SQL Engine Positioning Db2 for the Future?

The IBM Common SQL Engine inserts a single layer of data abstraction at the data source. This means that, instead of migrating all the data at once, you can apply data analytics wherever the data resides (whether on private, public, or hybrid cloud) by using the Common SQL Engine as a bridge.

IBM’s Common SQL Engine provides portability and consistency of SQL commands, meaning that SQL is functionally portable across multiple implementations. It allows workloads to move seamlessly to the cloud and supports multi-platform integration and configuration, regardless of the programming language used.

Ideally, the Common SQL Engine is the heart of query and application compatibility. But it does much more.

Its compatibility extends beyond data and analytics applications to include security, management, governance, and other functionality as well.

How does this improve the quality, flexibility, and portability of Db2?

By allowing integration across multiple platforms, workloads, and programming languages, the Common SQL Engine ultimately creates a “data without limits” environment for the Db2 hybrid data management family through:

  1. Query and Application Compatibility

The Common SQL Engine (CSE) ensures that users can write a query once and be confident that it will work across the Db2 hybrid data management family of offerings. With the CSE, you can change your data infrastructure and location (on cloud or on premises) without having to worry about license costs and application compatibility.

  2. Data Virtualization and Integration

The Common SQL Engine has built-in data virtualization services that ensure you can access your data from all of your sources. These services span the Db2 family of offerings, including IBM Db2, IBM Db2 Warehouse, and IBM Db2 Big SQL, among others.

These services also apply to the IBM Integrated Analytics System, Teradata, Oracle, PureData, and Microsoft SQL Server. You can also work seamlessly with open-source solutions such as Hive, and with cloud sources such as Amazon Redshift. Such levels of integration are unprecedented.

By allowing users to pull data from Db2 data stores and integrate it with data from non-IBM stores using a single query, the Common SQL Engine places Db2 in a commanding position compared to other data stores.

  3. Flexible Licensing

Licensing is one of the hardest nuts to crack, especially for organizations that rely on technologies such as the cloud to deliver their services. While application compatibility and data integration will save you time, flexible licensing saves you money, on the spot.

IBM’s Common SQL Engine allows flexible licensing, meaning that you can purchase one license model and deploy it wherever needed, or as your data architecture evolves. Using IBM’s FlexPoint licensing, you can purchase FlexPoints and use them across all Db2 data management offerings. This keeps licensing convenient and in one place.

Flexible licensing not only simplifies the adoption and exchange of platform capabilities, it also positions your business strategically by making it more agile. Your data managers will be able to access the tools they need on the fly, without going through a slow and tedious procurement process.

The IBM Db2 Data Management Family Is Supported by the Common SQL Engine (CSE)

IBM Db2 is a family of customizable, deployable databases that allows enterprises to leverage existing investments. IBM Db2 lets businesses use any type of data, structured or unstructured, in a database or data warehouse. It provides the right data foundation with industry-leading data compression, on-premises and cloud deployment options, modern data security, robust performance for mixed workloads, and the ability to adjust and scale without redesigning.

The IBM Db2 family enables businesses to adapt, scale quickly, and remain competitive without compromising security, risk levels, or privacy. It features:

  • Always-on availability
  • Deployment flexibility: on-premises, scale-on-demand, and private or cloud deployments
  • Compression and performance
  • Embedded IoT technology that allows businesses to act fast, on the fly

Some of these Db2 family offerings that are supported by the common SQL engine include:

  • Db2 Database
  • Db2 Hosted
  • Db2 Big SQL
  • Db2 on Cloud
  • Db2 Warehouse
  • Db2 Warehouse on Cloud
  • IBM Integrated Analytics System (IIAS)

Db2 Family Offerings and Beyond

Since the Common SQL Engine focuses mainly on data federation and portability, other non-IBM databases can also plug into the engine for SQL processing. These third-party offerings include:

  • Watson Data Platform
  • Oracle
  • Hadoop
  • Microsoft SQL Server
  • Teradata
  • Hive

Conclusion

The IBM Common SQL Engine is allowing organizations to fully use data analytics to future-proof their businesses while remaining agile and competitive. Besides the benefits of the robust tools woven into the CSE, the engine offers superior analytics and machine-learning positioning, with data processing that can run 2X to 5X faster. The IBM Common SQL Engine adds important capabilities to Db2, including freedom of location, freedom of use, and freedom of assembly.

Related References

An Overview of DB2 Federation

DB2 Federation


Data analytics has changed: data is no longer manageable in relational databases alone. Data flows in from various sources and in different formats, which means it is not practical to store all of it in the same repository. Some data is best suited to relational databases, some to Apache Hadoop, and some to NoSQL databases.

During data analysis, a great deal of time is spent trying to bring distributed data together instead of obtaining insights. Db2 federation comes to the rescue of data analysts: the federation concept in Db2 removes the need to physically consolidate data into one repository and reduces the hassle of getting insights.

What is DB2 Federation?

DB2 federation is a data integration technology that permits remote database objects to be accessed as local DB2 database objects. This technology connects multiple databases and makes them appear as one database.

How does DB2 federation work?

Federation allows you to access all of your data on multiple distributed databases using a single query. When implemented in an organization, this technology can be used to access data on any of the organization’s Db2 systems, whether local or in the cloud.
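For illustration only, here is a minimal sketch of how a remote table might be exposed through Db2 federation. The server, schema, credential, and table names (REMOTE_DWH, SALES.ORDERS, and so on) are hypothetical, the FEDERATED setting is changed from the Db2 command line, and the exact wrapper and server options vary by Db2 version and by the type of data source being federated.

-- Federation must be enabled at the instance level (run from the Db2 command line)
-- db2 update dbm cfg using FEDERATED YES

-- Define the remote data source and the credentials used to reach it (names are illustrative)
CREATE WRAPPER DRDA;
CREATE SERVER REMOTE_DWH TYPE DB2/UDB VERSION '11.5' WRAPPER DRDA
       AUTHORIZATION "fed_user" PASSWORD "fed_password"
       OPTIONS (DBNAME 'SALESDB');
CREATE USER MAPPING FOR USER SERVER REMOTE_DWH
       OPTIONS (REMOTE_AUTHID 'fed_user', REMOTE_PASSWORD 'fed_password');

-- Expose the remote table as a local nickname, then query it like any local table
CREATE NICKNAME SALES.ORDERS_REMOTE FOR REMOTE_DWH.SALES.ORDERS;

SELECT o.order_id, o.order_total
FROM SALES.ORDERS_REMOTE o
WHERE o.order_date >= '2018-01-01';

Once the nickname exists, it can be joined to local tables in the same statement, which is what allows a single query to span local and remote data.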

Why use DB2 federation?

So, why should you use federation? It brings data of all formats together into one virtual source, and with data retrieved from one virtual source, analyzing it becomes cost-effective and efficient.

What are the primary use cases for DB2 federation?

Merging various sources of data

DB2 federation consolidates data from local and cloud sources to form one virtual data source. This eliminates the need to migrate data, which can be expensive and troublesome.

Increasing the capacity of a repository beyond its fixed limits

Physical storage capacity is bound to have a limit, which is one reason you may find that an organization has distributed its data across various repositories. With federation, the combined store is virtual, so it is not bound by the limits of any single repository. This technology can greatly help if your physical repository is running low on space.

Linking up to Db2 Warehouse on Cloud

People who use Db2 products can federate data from Db2 on Cloud and Db2 Warehouse on Cloud. This gives them a joint interface where they can access, add, query, and analyze data without complex ETL processes. Better still, no additional code is required to execute these processes, which makes it easy for people with less technical know-how to use these products smoothly.

Split data across different servers

At times, you might choose to partition your data. With federation, partitioned data can be queried through a unified interface. Federation allows you to better balance your workloads, scale specific parts of an application, and create micro-services that work together harmoniously.

Generally, Db2 federation simplifies data access by bringing data together into a single virtual source, which brings cost- and time-saving benefits. When you want to analyze data, you can get insights immediately instead of spending a lot of time querying separate repositories.

Related References

How to know if your Oracle Client install is 32 Bit or 64 Bit


Sometimes you just need to know if your Oracle Client install is 32 bit or 64 bit. But how do you figure that out? Here are two methods you can try.

The first method

Go to the %ORACLE_HOME%\inventory\ContentsXML folder and open the comps.xml file.
Look for the <DEP_LIST> element, which typically appears around the second screen of text.

If you see PLAT="NT_AMD64", your Oracle Home is 64-bit.
If you see PLAT="NT_X86", your Oracle Home is 32-bit.

It is possible to have both the 32-bit and the 64-bit Oracle Homes installed.

The second method

This method is a bit faster. Windows uses different lib directories for 32-bit and 64-bit software. Look under the ORACLE_HOME folder: if you see both a “lib” and a “lib32” folder, you have a 64-bit Oracle Client; if you see only the “lib” folder, you have a 32-bit Oracle Client.

Related References

 

OLTP vs Data Warehousing


OLTP Versus Data Warehousing

I’ve tried to explain the difference between OLTP systems and a Data Warehouse to my managers many times, as I’ve worked at a hospital as a Data Warehouse manager and data analyst for many years. Why was the list that came from the operational applications different from the one that came from the Data Warehouse? Why couldn’t I just get a list of the patients lying in the hospital right now from the Data Warehouse? So I explained, and explained again, and explained to another manager, and another. You get the picture.
In this article I will explain this very same thing to you, so you know how to explain it to your manager. Or, if you are a manager, so you understand what your data analyst can and cannot give you.

OLTP

OLTP stands for Online Transaction Processing. In other words: getting your data directly from the operational systems to make reports. An operational system is a system that is used for the day-to-day processes.
For example: when a patient checks in, his or her information gets entered into a Patient Information System. The doctor puts scheduled tests, a diagnosis, and a treatment plan in there as well. Doctors, nurses, and other people working with patients use this system on a daily basis to enter and retrieve detailed information on their patients.
The data in operational systems is stored so it can be used efficiently by the people working directly on the product, or with the patient in this case.

Data Warehousing

A Data Warehouse is a big database that fills itself with data from operational systems. It is used solely for reporting and analytical purposes; no one uses this data for day-to-day operations. The beauty of a Data Warehouse is, among other things, that you can combine the data from different operational systems. You can, for example, combine the number of patients in a department with the number of nurses. You can see how far a doctor is behind schedule and find the cause by looking at the patients. Does he run late with elderly patients? Is there a particular diagnosis that takes more time? Or does he just oversleep a lot? You can use this information to look at the past and see trends, so you can plan for the future.

The difference between OLTP and Data Warehousing

This is how a Data Warehouse works:

How a Data Warehouse works


The data gets entered into the operational systems. Then the ETL process Extracts the data from those systems, Transforms it so it fits neatly into the Data Warehouse, and Loads it into the Data Warehouse. After that, reports are built with a reporting tool from the data in the Data Warehouse.

This is how OLTP works:

How OLTP works


Reports are made directly from the data inside the database of the operational systems. Some operational systems come with their own reporting tool, but you can always use a standalone reporting tool to make reports from the operational databases.

Pros and Cons

Data Warehousing

Pros:

  • There is no strain on the operational systems during business hours
    • Because you can schedule the ETL processes to run during the hours when the fewest people are using the operational system, you won’t disturb the operational processes. And when you need to run a large query, the operational systems won’t be affected, as you are working directly on the Data Warehouse database.
  • Data from different systems can be combined
    • It is possible to combine finance and productivity data, for example, because the ETL process transforms the data so it can be combined.
  • Data is optimized for making queries and reports
    • You use different data in reports than you use on a day-to-day basis, and a Data Warehouse is built for this. For instance, most Data Warehouses have a separate date table where the weekday, day, month, and year are saved. You can make a query to derive the weekday from a date, but that takes processing time; by using a separate table like this you save time and decrease the strain on the database (see the sketch after this list).
  • Data is saved longer than in the source systems
    • The source systems have their old records deleted when they are no longer used in the day-to-day operations, in order to maintain performance.
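As a minimal sketch of the date-table point above (the fact and date dimension tables, admission_fact and date_dim, are hypothetical, and the exact date-part function varies by database):

-- Deriving the weekday at query time: recalculated for every row
SELECT f.admission_date,
       EXTRACT(DOW FROM f.admission_date) AS weekday_number
FROM admission_fact f;

-- Reading the weekday from a pre-built date dimension: a simple join, no per-row calculation
SELECT f.admission_date,
       d.weekday_name,
       d.month_name,
       d.calendar_year
FROM admission_fact f
JOIN date_dim d ON d.calendar_date = f.admission_date;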

Cons:

  • You always look at the past
    • A Data Warehouse is updated once a night, or even just once a week. That means you never have the latest data. Staying with the hospital example: you never know exactly how many patients are in the hospital right now, or which surgeon didn’t show up on time this morning.
  • You don’t have all the data
    • A Data Warehouse is built for discovering trends and showing the big picture. The little details, the ones not used in trends, get discarded during the ETL process.
  • The data isn’t the same as the data in the source systems
    • Because the data is older than that of the source systems, it will always be a little different. The Transformation step in the ETL process also changes the data a little. That doesn’t mean one or the other is wrong; it’s just a different way of looking at the data. For example, the Data Warehouse at the hospital excluded all transactions that were marked as cancelled. If you try to get the same report from both systems and don’t exclude the cancelled transactions in the source system, you’ll get different results.

Online Transaction Processing (OLTP)

Pros

  • You get real time data
    • If someone is entering a new record now, you’ll see it right away in your report. No delays.
  • You’ve got all the details
    • You have access to all the details that the employees have entered into the system. No grouping, no skipping records, just all the raw data that’s available.

Cons

  • You put strain on an application during business hours
    • When you run a large query, you can take up processing capacity that would otherwise be available to the people who need the system for their day-to-day operations. And if you make an error, for instance by forgetting to put a date filter on your query, you could even bring the system down so no one can use it anymore.
  • You can’t compare the data with data from other sources
    • Even when the systems are similar, like an HR system and a payroll system that work together, the data is always going to be different, because it is granular at a different level or because not all data is relevant to both systems.
  • You don’t have access to old data
    • To keep the applications at peak performance, old data that is irrelevant to day-to-day operations is deleted.
  • Data is optimized to suit day-to-day operations
    • And not for report making. This means you’ll have to get creative with your queries to get the data you need.

So what method should you use?

That all depends on what you need at that moment. If you need detailed information about things that are happening now, you should use OLTP.
If you are looking for trends, or insights on a higher level, you should use a Data Warehouse.

 Related References

 

 

Netezza / PureData – Table Describe SQL


If you want to describe a PureData / Netezza table in SQL, it can be done, but Netezza doesn’t have a describe command. Here is a quick SQL statement that will give the basic structure of a table or a view. Honestly, if you have Aginity, generating the DDL is faster and more informative, at least to me. If you have permissions to access nzsql, you can also use the slash commands (e.g. \d).

Example Netezza Table Describe SQL

select name as Table_Name,
       owner as Table_Owner,
       createdate as Table_Created_Date,
       type as Table_Type,
       database as Database_Name,
       schema as Database_Schema,
       attnum as Field_Order,
       attname as Field_Name,
       format_type as Field_Type,
       attnotnull as Field_Not_Null_Indicator,
       attlen as Field_Length
from _v_relation_column
where name = '<<Table Name Here>>'
order by attnum;

 

Related References

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1

IBM Netezza database user documentation, Command-line options for nzsql, Internal slash options

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1

IBM Netezza getting started tips, About the Netezza data warehouse appliance, Commands and queries, Basic Netezza SQL information, Commonly used nzsql internal slash commands

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1

IBM Netezza database user documentation, Netezza SQL introduction, The nzsql command options, Slash options

 

 

Netezza / PureData – Substring Function Example


The Substring function (SUBSTRING, or SUBSTR) in Netezza PureData provides the capability to parse character-type fields based on position within a character string.

Substring Functions Basic Syntax

SUBSTRING Function Syntax

SUBSTRING(<<CharacterField>>, <<StartingPosition integer>>, <<Number of characters integer - optional>>)

 

SUBSTR Function Syntax

SUBSTR(<<CharacterField>>, <<StartingPosition integer>>, <<Number of characters integer - optional>>)

 

Example Substring SQL

Netezza / PureData Substring Example


Substring SQL Used In Example

SELECT LOCATIONTEXT
-- From the left of the string
-- Using the SUBSTRING function
,'==SUBSTRING From the Left==' as Divider1
,SUBSTRING(LOCATIONTEXT,1,5) as Beginning_Using_SUBSTRING_LFT
,SUBSTRING(LOCATIONTEXT,7,6) as Middle_Using_SUBSTRING_LFT
,SUBSTRING(LOCATIONTEXT,15) as End_Using_SUBSTRING_LFT
,'==SUBSTR From the Left==' as Divider2
-- Using the SUBSTR function
,SUBSTR(LOCATIONTEXT,1,5) as Beginning_Using_SUBSTR_LFT
,SUBSTR(LOCATIONTEXT,7,6) as Middle_Using_SUBSTR_LFT
,SUBSTR(LOCATIONTEXT,15) as End_Using_SUBSTR_LFT
-- From the right of the string
,'==SUBSTRING From the Right==' as Divider3
,SUBSTRING(LOCATIONTEXT,LENGTH(LOCATIONTEXT)-18, 8) as Beginning_Using_SUBSTRING_RGT
,SUBSTRING(LOCATIONTEXT,LENGTH(LOCATIONTEXT)-9, 6) as Middle_Using_SUBSTRING_RGT
,SUBSTRING(LOCATIONTEXT,LENGTH(LOCATIONTEXT)-1) as End_Using_SUBSTRING_RGT
,'==SUBSTR From the Right==' as Divider4
,SUBSTR(LOCATIONTEXT,LENGTH(LOCATIONTEXT)-18, 8) as Beginning_Using_SUBSTR_RGT
,SUBSTR(LOCATIONTEXT,LENGTH(LOCATIONTEXT)-9, 6) as Middle_Using_SUBSTR_RGT
,SUBSTR(LOCATIONTEXT,LENGTH(LOCATIONTEXT)-1) as End_Using_SUBSTR_RGT
FROM BLOG.D_ZIPCODE
WHERE STATE = 'PR'
AND CITY = 'REPTO ROBLES';

Related References

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1

IBM Netezza database user documentation, Netezza SQL basics, Netezza SQL extensions, Character string functions

IBM Knowledge Center, PureData System for Analytics, Version 7.1.0

IBM Netezza Database User’s Guide, Netezza SQL basics, Functions and operators, Functions, Standard string functions

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1

IBM Netezza database user documentation, Netezza SQL command reference, Functions

Netezza / PureData – Substring Function On Specific Delimiter


The Substring function (SUBSTRING, or SUBSTR) in Netezza PureData provides the capability to parse character-type fields based on position within a character string. With a little creativity, however, it is also possible to substring based on the position of a specific character in the string. This approach gives the substring function more flexibility and makes it more useful in many cases, and it works fine with either the SUBSTRING or SUBSTR function. In this example, I used the POSITION function to provide the numbers for the substring command.

 

Example Substring SQL


Netezza PureData Substring Function On Specific Character In String

 

Substring SQL Used In Example

select LOCATIONTEXT
,position(',' in LOCATIONTEXT) as Comma_Position_In_String
-- Without adjustment
,SUBSTRING(LOCATIONTEXT,position(',' in LOCATIONTEXT)) as Substring_On_Comma
-- Adjusted to account for the extra space
,SUBSTRING(LOCATIONTEXT,position(',' in LOCATIONTEXT)+2) as Substring_On_Comma_Adjusted
,'==Breaking_Up_The_String==' as Divider
-- Breaking up the string
,SUBSTRING(LOCATIONTEXT,1, position(' ' in LOCATIONTEXT)-1) as Beginning_Of_String
,SUBSTRING(LOCATIONTEXT,position(' ' in LOCATIONTEXT)+1, position(' ' in LOCATIONTEXT)-1) as Middle_Of_String
,SUBSTRING(LOCATIONTEXT,position(',' in LOCATIONTEXT)+2) as End_Of_String
FROM Blog.D_ZIPCODE
where STATE = 'PR'
AND CITY = 'REPTO ROBLES';

Related References

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1

IBM Netezza database user documentation, Netezza SQL basics, Netezza SQL extensions, Character string functions

IBM Knowledge Center, PureData System for Analytics, Version 7.1.0

IBM Netezza Database User’s Guide, Netezza SQL basics, Functions and operators, Functions, Standard string functions

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1

IBM Netezza database user documentation, Netezza SQL command reference, Functions

Netezza / PureData – Position Function


Netezza / PureData Position Function

 

The POSITION function in Netezza is simple enough: it returns the position of a specified character within a string (char, varchar, nvarchar, etc.), or zero if the character is not found. The real power of this function comes when you embed it within a character function that requires a numeric argument, but the character’s position varies from row to row in a field.

The Position Function’s Basic Syntax

position(<<character or Character String>> in <<CharacterFieldName>>)

 

Example Position Function SQL


Netezza PureData Position Function

 

Position Function SQL Used in Example

select LOCATIONTEXT, CITY
,'==Position Function Return Values==' as Divider
,position(',' in LOCATIONTEXT) as Position_In_Nbr_String
,position('-' in LOCATIONTEXT) as Position_Value_Not_Found
,'==Position Combined with Substring Function==' as Divider2
,SUBSTRING(LOCATIONTEXT,position(',' in LOCATIONTEXT)+2) as Position_Used_In_Substring_Function
FROM Blog.D_ZIPCODE where STATE = 'MN' AND CITY = 'RED WING' limit 1;

 

 

Related References

IBM Knowledge Center, PureData System for Analytics, Version 7.1.0

IBM Netezza Database User’s Guide, Netezza SQL basics, Functions and operators, Functions, Standard string functions

IBM Knowledge Center, PureData System for Analytics, Version 7.2.1

IBM Netezza database user documentation, Netezza SQL command reference, Functions

 

Data Modeling – Column Data Classification


 

Column Data Classification

When analyzing individual column data at its most foundational level, column data can be classified by its fundamental use and characteristics. Granted, when you start rolling the structure up into multiple columns, table structures, and table relationships, other classifications and behaviors, such as keys (primary and foreign), indexes, and distribution, come into play. However, when working with existing data sets it is often essential to understand the nature of the existing data before beginning the modeling and information governance process.


Generally, individual columns can be classified into the following classifications (a small example table sketch follows the list):

  • Identifier — A column/field which is unique to a row and/or can identify related data (e.g., Person ID, National Identifier). Basically, think primary key and/or foreign key.
  • Indicator — A column/field, often called a flag, that has a binary condition (e.g., True or False, Yes or No, Female or Male, Active or Inactive). Frequently used to indicate compliance with a specific business rule.
  • Code — A column/field that has a distinct and defined set of values, often abbreviated (e.g., State Code, Currency Code)
  • Temporal — A column/field that contains some type of date, timestamp, time, interval, or numeric duration data
  • Quantity — A column/field that contains a numeric value (decimals, integers, etc.) and is not classified as an Identifier or Code (e.g., Price, Amount, Asset Value, Count)
  • Text — A column/field that contains alphanumeric values, possibly long text, and is not classified as an Identifier or Code (e.g., Name, Address, Long Description, Short Description)
  • Large Object (LOB) — A column/field that contains data such as traditional long text fields or binary data like graphics. Large objects can be broadly classified as Character Large Objects (CLOBs), Binary Large Objects (BLOBs), and Double-Byte Character Large Objects (DBCLOBs or NCLOBs).
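As a small illustration of these classifications, here is a hypothetical table sketch (the table, column names, and data types are invented for this example, and LOB type names vary by database):

CREATE TABLE patient_visit
(
    visit_id           INTEGER NOT NULL,   -- Identifier (primary key candidate)
    patient_id         INTEGER NOT NULL,   -- Identifier (foreign key candidate)
    active_indicator   CHAR(1),            -- Indicator (e.g., Y or N)
    state_code         CHAR(2),            -- Code (distinct, defined set of values)
    admit_timestamp    TIMESTAMP,          -- Temporal
    visit_charge       DECIMAL(12,2),      -- Quantity
    visit_note         VARCHAR(4000),      -- Text
    discharge_summary  CLOB                -- Large Object (LOB)
);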

Related References

What is a Common Data Model (CDM)?


 

A Common Data Model (CDM) is a shared data structure designed to provide well-formed and standardized data structures within an industry (e.g., medical, insurance) or business channel (e.g., human resource management, asset management), which can be applied to give organizations a consistent, unified view of business information. These common models can be leveraged by organizations as accelerators to form the foundation of their information architecture, including SOA interchanges, mashups, data virtualization, an Enterprise Data Model (EDM), and business intelligence (BI), and/or to standardize their data models to improve metadata management and data integration practices.

Related references

IBM, IBM Analytics

IBM Analytics, Technology, Database Management, Data Warehousing, Industry Models

github.com

Observational Health Data Sciences and Informatics (OHDSI)/Common Data Model

Oracle

Oracle Technology Network, Database, More Key Features, Utilities Data Model

Oracle

Industries, Communications, Service Providers, Products, Data Mode, Oracle Communications Data Model

Oracle

Oracle Technology Network, Database, More Key Features, Airline data Model

 

PureData / Netezza – What date/time ranges are supported by Netezza?


Date/Time ranges supported by Netezza

Here is a synopsis of the temporal ranges (date, time, and timestamp) that Netezza / PureData supports.

Temporal type, supported range, and size in bytes:

  • Date: A month, day, and year. Values range from January 1, 0001, to December 31, 9999. Size: 4 bytes.
  • Time: An hour, minute, and second to six decimal places (microseconds). Values range from 00:00:00.000000 to 23:59:59.999999. Size: 8 bytes.

Related References

Temporal data types

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Data types, Temporal data types

Netezza date/time data type representations

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza user-defined functions, Data type helper API reference, Temporal data type helper functions, Netezza date/time data type representations

Date/time functions

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Netezza SQL extensions, Date/time functions

Netezza / PureData – How to add a Foreign Key


Adding a foreign key to tables in Netezza / PureData is a best practice, especially when working with dimensionally modeled data warehouse structures and with modern governance, integration (including virtualization), and presentation semantics (including reporting, business intelligence, and analytics).

Foreign Key (FK) Guidelines

  • A primary key must be defined on the table and field (or fields) to which you intend to link the foreign key
  • Avoid using distribution keys as foreign keys
  • Foreign Key field should not be nullable
  • Your foreign key link field(s) must be of the same format(s) (e.g. integer to integer, etc.)
  • Apply standard naming conventions to constraint name:
    • FK_<<Constraint_Name>>_<<Number>>
    • <<Constraint_Name>>_FK<<Number>>
  • Please note that foreign key constraints are not enforced in Netezza

Steps to add a Foreign Key

The process for adding foreign keys involves just a few steps:

  • Verify guidelines above
  • Alter table add constraint SQL command
  • Run statistics, which is optional but strongly recommended (see the GENERATE STATISTICS sketch after the example below)

Basic Foreign Key SQL Command Structure

Here is the basic syntax for adding Foreign key:

ALTER TABLE <<Owner>>.<<NAME_OF_TABLE_BEING_ALTERED>>
ADD CONSTRAINT <<Constraint_Name>>_fk<<Number>>
FOREIGN KEY (<<Field_Name or Field_Name List>>)
REFERENCES <<Owner>>.<<Target_FK_Table_Name>> (<<Field_Name or Field_Name List>>)
<<On Update | On Delete>> <<action>>;

Example Foreign Key SQL Command

This is a simple, one-field example of a foreign key (FK):

 

ALTER TABLE Blog.job_stage_fact
ADD CONSTRAINT job_stage_fact_host_dim_fk1
FOREIGN KEY (hostid) REFERENCES Blog.host_dim(hostid)
ON DELETE CASCADE ON UPDATE NO ACTION;
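After the constraint has been added, statistics can be refreshed on the altered table. In Netezza / PureData this is typically done with the GENERATE STATISTICS command, for example:

GENERATE STATISTICS ON Blog.job_stage_fact;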

Related References

Alter Table

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL command reference, Alter Table, constraints

 

 

Databases – Database Isolation Level Cross Reference


 

Here is a quick-reference table of some common database and/or connection types that use connection-level isolation, and their equivalent isolation levels. This quick reference may prove useful as a job aid when working with, and making decisions about, isolation level usage (a short usage sketch follows the table).

Database isolation levels, listed from most restrictive to least restrictive for each data source:

  • Amazon SimpleDB: Serializable, Repeatable read, Read committed, Read uncommitted
  • dashDB: Repeatable read, Read stability, Cursor stability, Uncommitted read
  • DB2® family of products: Repeatable read, Read stability*, Cursor stability, Uncommitted read
  • Informix®: Repeatable read, Repeatable read, Cursor stability, Dirty read
  • JDBC: Serializable, Repeatable read, Read committed, Read uncommitted
  • MariaDB: Serializable, Repeatable read, Read committed, Read uncommitted
  • Microsoft SQL Server: Serializable, Repeatable read, Read committed, Read uncommitted
  • MySQL: Serializable, Repeatable read, Read committed, Read uncommitted
  • ODBC: Serializable, Repeatable read, Read committed, Read uncommitted
  • Oracle: Serializable, Serializable, Read committed, Read committed
  • PostgreSQL: Serializable, Repeatable read, Read committed, Read committed
  • Sybase: Level 3, Level 3, Level 1, Level 0
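As a usage sketch, here is how an isolation level might be selected. The Db2 statements below are examples of one product’s syntax, the table name is hypothetical, and other databases use their own equivalents (such as a SET TRANSACTION statement):

-- Db2: set the default isolation for the current session
SET CURRENT ISOLATION = UR;

-- Db2: override the isolation for a single statement
SELECT COUNT(*) FROM sales.order_fact WITH UR;

-- Many other databases accept a SET TRANSACTION form instead
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;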

 

Related References

 

Database – What is a foreign key?


Definition of a Foreign Key

  • A Foreign Key (FK) is a constraint that references the unique primary key (PK) of another table.

Facts About Foreign Keys

  • Foreign keys act as a cross-reference between tables, linking the foreign key (child record) to the primary key (parent record) of another table, thereby establishing a link/relationship between the table keys
  • Foreign keys are not enforced by all RDBMS
  • The concept of referential integrity is derived from foreign key theory
  • Because Foreign keys involve more than one table relationship, their implementation can be more complex than primary keys
  • A foreign-key constraint implicitly defines an index on the foreign-key column(s) in the child table; however, manually defining a matching index may improve join performance in some databases
  • SQL normally provides the following referential integrity actions for deletions when enforcing foreign keys (a small example follows the definitions below)

Cascade

  • The deletion of a parent (primary key) record may cause the deletion of corresponding foreign-key records.

No Action

  • Forbids the deletion of a parent (primary key) record, if there are dependent foreign-key records.   No Action does not mean to suppress the foreign-key constraint.

Set null

  • The deletion of a parent (primary key) record causes the corresponding foreign-key to be set to null.

Set default

  • The deletion of a parent (primary key) record causes the corresponding foreign keys to be set to a default value instead of null.
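For illustration, here is a minimal generic-SQL sketch of how one of these actions is declared on a foreign-key constraint (the table and column names are hypothetical, and not every database supports every action):

ALTER TABLE order_line
    ADD CONSTRAINT order_line_order_fk1
    FOREIGN KEY (order_id) REFERENCES order_header (order_id)
    ON DELETE CASCADE;   -- alternatively: NO ACTION, SET NULL, or SET DEFAULT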

Related References

 

Netezza / PureData – How to rebuild a Netezza view in Aginity


 

Rebuilding a Netezza view sometimes becomes necessary when the view’s source tables have changed underneath it. Rebuilding a view can be done on Netezza or in Aginity. In Aginity it is a simple process, assuming your user has permission to create or replace the view. The process breaks down into just a few steps (a small example follows the steps):

Generate the create / replace view SQL of the original view into the query window, if you don’t have it already

In the object browser:

  • Navigate to the Database and view you wish to rebuild
  • Select the view and right click
  • Select ‘Scripts’, then ‘DDL to Query window’

Make any updates to the create / replace view SQL

  • This step is not always necessary; sometimes the change that invalidated the view did not actually affect the code of the view. If changes are necessary, make the needed updates to the SQL code.

Execute The code

  • This I usually do by choosing the ‘Execute as a single batch’ option.  Make sure the code executes successfully.

Verify the view

  • To verify the view, simply execute a select statement and make sure it runs without errors and/or warnings.
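As a small example, the regenerated DDL and the verification select might look like the following sketch (the view, table, and column names are hypothetical):

CREATE OR REPLACE VIEW blog.active_host_v AS
SELECT hostid,
       hostname,
       host_status
FROM blog.host_dim
WHERE host_status = 'ACTIVE';

-- Verification: this select should run without errors or warnings
SELECT * FROM blog.active_host_v limit 10;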

Related References

 

Netezza / PureData Date – Difference in Days SQL


Netezza / PureData Date Difference in Days

Since Netezza does not have a datediff function, the ‘old school’ method of calculating the difference, in days, between dates must be used.

Subtracting Inclusive Dates

To subtract two dates and include the end date as a day in the calculation (1 day is added):

select (date('2015-12-31') - date('2015-01-01')) + 1 as Inclusive_dates
from _v_dual;

 

Subtracting Non-inclusive dates

To subtract the dates non-inclusively, simply subtract one date from the other:

select date('2015-12-31') - date('2015-01-01') as Non_Inclusive_dates
from _v_dual;

 

Example SQL From Graphic

select (date('2015-12-31') - date('2015-01-01')) + 1 as Inclusive_dates,
       date('2015-12-31') - date('2015-01-01') as Non_Inclusive_dates
from _v_dual;

 

Related References

What are the types of Database Management Systems (DBMS)?


It is important to understand the differences between the types of Database Management Systems (DBMS), since the structure of each type influences integration approaches, functionality, overall speed, and scalability.

The Five Types of Database Management Systems (DBMS)

The five basic types of databases are:

  • Hierarchical
  • Object-Oriented
  • Network
  • Relational
  • Flat File

Hierarchical Database (HDB)

  • A hierarchical database (HDB or HDBMS) is a design that uses a one-to-many relationship for data elements. Hierarchical database models use a tree structure that links several disparate elements to one “owner,” or “parent,” primary record.

Object-Oriented Database (OODB)

  • Object-Oriented databases (OODB or OODBMS) integrate object orientation with database capabilities. Object orientation allows a more direct representation and modeling of real-world problems, while database functionality is needed to ensure persistence and concurrent sharing of information in applications.

Network Database (NDB)

  • Network databases (NDB or NDBMS) are quite like hierarchical databases, except that they allow multiple records to be linked to the same owner file. The model can be seen as an upside-down tree where the branches are the member information linked to the owner, which is the bottom of the tree. The multiple linkages that this allows make the network database model very flexible. In addition, relationships in the network database model are defined as many-to-many, because one owner file can be linked to many member files and vice versa.

Relational Database (RDB)

  • In simplest terms, a relational database (RDB or RDBMS) is one that presents information in formally described tables with rows and columns. A table is referred to as a relation in the sense that it is a collection of objects of the same type (rows). Data in a table can be related by common keys or concepts, and the ability to retrieve related data from a table is the basis for the term relational database. Data can be accessed or reassembled in many ways without having to reorganize the database table structure.

Flat File Database (FFDB)

  • A flat file database (FFDB or FFDBM) describes any of various means of encoding a database model as a single file or collection of files, which can be plain text files or binary files. There are usually no structural relationships between the records. Each line of a text file holds one record, with fields separated by delimiters such as commas or tabs. This is a very old database approach, but it can still be found in use today, often with some relational capability enhancements. Some examples of current flat file databases are: GRAV, Jekyll, Kerby, and Monstra.

Related References

 

Database – What is a Composite Primary Key?


What is a Composite Primary Key?

A Composite Primary Key is a primary key that is defined by having multiple fields (columns) in it. Like a primary key, exactly what a composite primary key is depends on the database. Essentially, a Composite Primary Key (see the example after this list):

  • Is a combination of Fields (columns) which uniquely identifies every row.
  • Is an index in database systems which use indexes for optimization
  • Is a type of table constraint
  • Is applied with a data definition language (DDL) alter command
  • And may define a parent-child relationship between tables
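For example, here is a minimal sketch of applying a composite primary key with a DDL alter command (the table and column names are hypothetical):

ALTER TABLE blog.order_line_fact
    ADD CONSTRAINT order_line_fact_pk
    PRIMARY KEY (order_id, order_line_number);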

Related References

Database – What is a Primary Key?


What is a Primary Key?

What a primary key is depends, somewhat, on the database.  However, in its simplest form a primary key:

  • Is a field (Column) or combination of Fields (columns) which uniquely identifies every row.
  • Is an index in database systems which use indexes for optimization
  • Is a type of table constraint
  • Is applied with a data definition language (DDL) alter command
  • And, depending on the data model, can define a parent-child relationship between tables

Related References

Data Modeling – Fact Table Effective Practices


Here are a few guidelines for modeling and designing fact tables (a small example follows the list).

Fact Table Effective Practices

  • The table naming convention should identify it as a fact table. For example:
    • Suffix Pattern:
      • <<TableName>>_Fact
      • <<TableName>>_F
    • Prefix Pattern:
      • FACT_<<TableName>>
      • F_<<TableName>>
  • Must contain a temporal dimension surrogate key (e.g. date dimension)
  • Measures should be nullable – this has an impact on aggregate functions (SUM, COUNT, MIN, MAX, AVG, etc.)
  • Dimension surrogate keys (srky) should have a foreign key (FK) constraint
  • Do not place the dimension processing in the fact jobs
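As a small illustration of these practices, here is a hypothetical fact table sketch (all table, column, and constraint names are invented for this example):

CREATE TABLE sales_fact                    -- suffix pattern identifies it as a fact table
(
    date_dim_srky      INTEGER NOT NULL,   -- temporal dimension surrogate key
    product_dim_srky   INTEGER NOT NULL,   -- dimension surrogate key
    sales_amount       DECIMAL(12,2),      -- nullable measure
    sales_quantity     INTEGER             -- nullable measure
);

ALTER TABLE sales_fact
    ADD CONSTRAINT sales_fact_date_dim_fk1
    FOREIGN KEY (date_dim_srky) REFERENCES date_dim (date_dim_srky);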

Related References