Netezza – JDBC ISJDBC.CONFIG Configuration

JDBC (Java Database Connectivity)

This JDBC information is based on the Netezza (7.2.0) JDBC driver for InfoSphere Information Server 11.5, so here are a few pointers for building an IBM InfoSphere Information Server (IIS) isjdbc.config file.

Where to place JAR files

For InfoSphere Information Server installs, as a standard practice, create a custom jdbc folder in the install path and place any downloaded JAR files not already installed by other applications in that jdbc folder. Usually, the jdbc folder path looks something like this:

  • /opt/IBM/InformationServer/jdbc

CLASSPATH

  • nzjdbc3.jar
  • The classpath entry must contain the complete path and the JAR name

CLASS_NAMES

  • org.netezza.Driver

JAR Source URL

IBM Netezza Client Components V7.2 for Linux

 

File name

  • nz-linuxclient-v7.2.0.0.tar.gz

Unpack tar.gz

  • tar -zxvf nz-linuxclient-v7.2.0.0.tar.gz -C /opt/IBM/InformationServer/jdbc

NETEZZA DEFAULT PORT

  • 5480

JDBC URL FORMAT

  • jdbc:netezza://<host>:<port>/<database>

JDBC URL EXAMPLE

  • jdbc:netezza://10.999.0.99:5480/dashboard

 

isjdbc.config EXAMPLE

CLASSPATH=/usr/jdbc/nzjdbc3.jar;/usr/jdbc/nzjdbc.jar;/usr/local/nz/lib/nzjdbc3.jar;

CLASS_NAMES=org.netezza.Driver;

 

isjdbc.config FILE PLACEMENT

  • /opt/IBM/InformationServer/Server/DSEngine

 

Data Warehouse – Effective Practices

Methodology

Effective Practices

Effective practices are enablers that can improve performance, data availability, environment stability, resource consumption, and data accuracy.

Use of an Enterprise Scheduler

The scheduling service in InfoSphere Information Server (IIS) leverages the operating system (OS) scheduler, but a common enterprise scheduler can provide these capabilities beyond those of a common OS scheduler:

  • Centralized control, monitoring, and maintenance of job stream processes
  • Improved insight into and control of cycle processes
  • Improved intervention capabilities, including alerts, job stream suspension, auto-restarts, and upstream/downstream dependency monitoring
  • Reduced time-to-recovery and increased flexibility in recovery options
  • Improved ability to monitor and alert for mission-critical processes that may be delayed or failing
  • Improved ability to automate disparate process requirements within and across systems
  • Improved load balancing to optimize use of resources or to compensate for loss of a given resource
  • Improved scalability and adaptability to infrastructure or application environment changes

Use of Data Source Timestamps

When they exist or can be added to data, ‘created’ and ‘last updated’ timestamps can greatly reduce the impact of Change Data Capture (CDC) operations, especially if the data warehouse, data model, and load process store the last successful run time of CDC jobs. This reduces the number of rows required to be processed and reduces the load on the RDBMS and/or ETL application server. Leveraging ‘created’ and ‘last updated’ timestamps can also greatly reduce the processing time required to perform the same CDC processes.
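
For illustration, here is a minimal SQL sketch of this pattern. It assumes a hypothetical etl_run_log table that records the last successful run time of each CDC job; all table and column names are illustrative, not from any specific product.

  -- Extract only the rows changed since the last successful CDC run.
  -- customer, last_updated_ts, and etl_run_log are hypothetical placeholders.
  SELECT c.*
  FROM   customer c
  WHERE  c.last_updated_ts > ( SELECT MAX(r.last_success_ts)
                               FROM   etl_run_log r
                               WHERE  r.job_name = 'CUSTOMER_CDC' );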

Event Based Scheduling

Event-based scheduling, when coupled with an enterprise scheduler, can increase data availability and distribute work opportunistically. Event-based scheduling can allow all or part of a process stream to begin as soon as predecessor data sources have completed the requisite processes. This can allow processes to begin as soon as possible, which can reduce resource bottlenecks and contention and, potentially, allows data to be made available earlier than a static time-based schedule would. Event-based scheduling can also delay processing, should the source system’s requisite processing be delayed, thereby improving data accuracy in the receiving system.

Integrated RDBMS Maintenance

Integrating RDBMS maintenance into the process job stream can perform on-demand optimization as the processes move through their flow, improving performance. Performing maintenance items such as indexing, distribution, and grooming at key points ensures that the data structures are optimized for follow-on processes to consume.
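
In a Netezza environment, for example, a job-stream step might issue maintenance commands like these after a heavy load step; GROOM TABLE and GENERATE STATISTICS are Netezza SQL commands, and the table name sales_fact is illustrative:

  -- Reclaim space from logically deleted rows after heavy update/delete activity.
  GROOM TABLE sales_fact;
  -- Refresh optimizer statistics so follow-on queries get good plans.
  GENERATE STATISTICS ON sales_fact;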

Application Server and Storage Space Monitoring and Maintenance

Monitoring and actively clearing disk space not only improves overall performance and reduces costs, but also improves application stability.

Data Retention Strategies

Data retention strategies are an often overlooked form of data maintenance. They establish policies to ensure that only truly necessary data is kept and that information, by essential category, which is no longer necessary is purged, in order to limit legal liability, limit data growth and storage costs, and improve RDBMS performance.
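
As a minimal sketch of a retention purge, assuming an ANSI-style RDBMS and an illustrative seven-year policy (the table, column, and interval syntax vary by platform):

  -- Purge rows that have aged out of the assumed seven-year retention policy.
  DELETE FROM sales_history
  WHERE  transaction_date < CURRENT_DATE - INTERVAL '7' YEAR;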

Use Standard Practices

Use of standard practices, both application and industry, allows experienced resources to more readily understand the major application activities, their relationships, dependencies, design, and code. This facilitates resourcing and support over the life cycle of the application.

 

How to get the current Status of a DB2 Database

To determine the current configuration of a DB2 database on a Linux server:

  • Sign on to the Linux server containing the database as the instance owner (e.g., db2inst1)
  • Run the following command to find out the current state of the database: db2 get db cfg for <database name>

You should receive a response similar to this, but with additional values:

DB2 Current Database Configuration Example


Data Modeling – Database Table Field Ordering Effective Practices

Database Table

Field ordering can help performance on inserts and updates and also keeps developers and users from having to search the entire table structure to be sure they have all the keys, etc. A brief CREATE TABLE sketch follows the list below.

Table Field Ordering

  1. Distribution field or fields; if no distribution field is set, the first field will be used by default.
  2. Primary Key Columns (including Parent and Child key fields)
  3. Foreign Key Columns (Not Null)
  4. Not Null Columns
  5. Nullable Columns
  6. Created Date Timestamp
  7. Modified (or Last Updated) Date Timestamp
  8. Large text Fields
  9. Large binary Columns or Binary Field references
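
Below is a minimal Netezza-style sketch of this ordering; the table, columns, and data types are illustrative, not a recommendation for any specific model:

  -- Hypothetical example table showing the field ordering above.
  CREATE TABLE customer
  (
    customer_id    BIGINT        NOT NULL,  -- 1 & 2: distribution field / primary key
    region_id      INTEGER       NOT NULL,  -- 3: foreign key (not null)
    customer_name  VARCHAR(100)  NOT NULL,  -- 4: not null column
    phone_number   VARCHAR(20),             -- 5: nullable column
    created_ts     TIMESTAMP,               -- 6: created date timestamp
    modified_ts    TIMESTAMP,               -- 7: modified (last updated) timestamp
    notes          VARCHAR(5000)            -- 8: large text field
  )
  DISTRIBUTE ON (customer_id);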


What is the Relationship Between CPUs and Cores?

CPU 8 core chip

 

Until 1996, the central processing unit (CPU) and the core were essentially the same thing. These days, a CPU’s relationship to its cores can be 1-to-1 or 1-to-many. Therefore, cores are essentially subcomponents of the CPU architecture.

CPU to Core Relationship: 1-to-1

 

CPU to Core Relationship: 1-to-Many

 


Structured Query Language (SQL) Tuning


Tuning SQL is one of those skills that is part art and part science. However, there are a few fundamental approaches that can help ensure optimal SQL SELECT statement performance.

Structuring your SQL

By structuring SQL statements well, much performance can be gained through good statement organization and sound logic.

Where Clause Concepts:

Use criteria ordering and set-theory thinking. SQL can be coupled with set theory to aid conception of the operations being conducted. Order your selection criteria to execute the criteria that arrive at the smallest possible row set first. Doing so reduces the volume of rows to be processed by follow-on operations. This does require an understanding of the data relationships to be effective; a short sketch follows the figure below.

SQL First Select Criteria Pie Chart
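
As a brief sketch of criteria ordering (all names are illustrative; note that many modern optimizers reorder predicates themselves, so this matters most on engines that honor the written order):

  -- Put the criterion that yields the smallest row set first.
  SELECT o.order_id, o.status
  FROM   orders o
  WHERE  o.order_date >= DATE '2015-01-01'   -- most selective: narrow date window
    AND  o.status = 'OPEN'                   -- further narrows the set
    AND  o.region_cd IN ('NA', 'EU');        -- least selective criterion last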

 

Join Rules (equi joins, etc.)

When constructing your joins, consider these rules:

  • Join on keys and indexed columns: the efficiency of your program improves when tables are joined based on indexed columns rather than on non-indexed ones.
  • Use equi-joins (=) whenever possible.
  • Avoid using subqueries.
  • Re-write EXISTS and NOT EXISTS subqueries as outer joins (see the sketch after this list).
  • Avoid OUTER joins on fields containing nulls.
  • Avoid RIGHT OUTER JOINs: always select FROM your primary table (or derived table) and LEFT OUTER JOIN to auxiliary tables.
  • Use joins instead of subqueries: a join can be more efficient than a correlated subquery or a subquery using IN.
  • Use caution when specifying ORDER BY with a join: when the results of a join must be sorted, limiting the ORDER BY to columns of a single table can cause the database to avoid a sort.
  • Provide adequate search criteria: when possible, provide additional search criteria in the WHERE clause for every table in a join. These criteria are in addition to the join criteria, which are mandatory to avoid Cartesian products.
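
Here is a hedged sketch of the EXISTS/NOT EXISTS rewrite mentioned above; the table and column names are illustrative:

  -- Subquery form: customers with no orders.
  SELECT c.customer_id
  FROM   customer c
  WHERE  NOT EXISTS ( SELECT 1
                      FROM   orders o
                      WHERE  o.customer_id = c.customer_id );

  -- Equivalent outer-join form: keep only the unmatched left-side rows.
  SELECT c.customer_id
  FROM   customer c
  LEFT OUTER JOIN orders o
         ON o.customer_id = c.customer_id
  WHERE  o.customer_id IS NULL;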

Order of Operations SQL & “PEMDAS”

To improve your SQL, careful attention needs to be paid to the mathematical order of operations, especially parentheses, since they not only set the order of operations but also the boundaries of each subset operation.

  • PEMDAS stands for “Parentheses, Exponents, Multiplication and Division, and Addition and Subtraction”.
  • Use parentheses () to group and specify the order of execution; SQL observes the normal rules of arithmetic operator precedence (a short sketch follows the table below).
Precedence | Operator(s) | Operation(s)                      | Notes
1          | ( )         | Parentheses                       | If the parentheses are nested, the expression in the innermost pair is evaluated first. If there are several un-nested parentheses, they are evaluated left to right.
2          | * / %       | Multiplication, Division, Modulus | If there are several, evaluation is left to right.
3          | + -         | Addition, Subtraction             | If there are several, evaluation is left to right.
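
A small sketch of how parentheses set both the order of execution and the boundaries of each subset (names are illustrative; in SQL, AND binds more tightly than OR, just as multiplication binds more tightly than addition in arithmetic):

  -- Without parentheses, this returns all 'WEST' rows regardless of year:
  SELECT * FROM sales
  WHERE  region = 'EAST' AND sale_year = 2015 OR region = 'WEST';

  -- With parentheses, the year filter applies to both regions:
  SELECT * FROM sales
  WHERE  (region = 'EAST' OR region = 'WEST') AND sale_year = 2015;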

 

Index Leveraging (criteria ordering, hints, append, etc.)

  • Avoid full table scans: within the scope of a SQL statement, there are many conditions that will cause the SQL optimizer to invoke a full-table scan. Avoid queries:
  • with NULL conditions (IS NULL, IS NOT NULL)
  • against unindexed columns
  • with LIKE conditions
  • with not-equals conditions (<>, !=, NOT IN)
  • that apply built-in functions (TO_CHAR, SUBSTR, DECODE, UPPER) to filter columns
  • Use UNION ALL instead of UNION if business rules allow (see the sketches after the note below):
  • UNION: specifies that multiple result sets are to be combined and returned as a single result set; the query optimizer performs extra work to remove duplicate rows.
  • UNION ALL: incorporates all rows into the results, including duplicates; the query optimizer just needs to concatenate the result sets with no extra work.
  • Use stored procedures instead of ad hoc queries when possible; stored procedures are precompiled and cached.
  • Avoid cursor use when possible.
  • Select only the rows needed.
  • Use the NOLOCK hint (SQL Server) in SELECT statements to avoid blocking.
  • Commit transactions in smaller batches.
  • Whenever possible, use tables instead of views.
  • Make sure comparison columns, whether in a JOIN or a WHERE clause, are exactly the same data type. For example, if we compare a VARCHAR column to an NCHAR column, the query optimizer has to do a CONVERT before comparing the values.

Note: You do not necessarily need to remove all full table scans from your query’s execution plan. Tables with few rows, few columns, or thin columns may fit into a few database blocks; in this case, a full table scan will always be the most efficient access path.
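
Two of the points above as hedged sketches (names are illustrative):

  -- Function-wrapped predicates can block index use; compare against the raw
  -- column where the data allows it:
  --   Slower:  WHERE UPPER(last_name) = 'SMITH'
  --   Faster:  WHERE last_name = 'Smith'

  -- UNION ALL skips the duplicate-elimination work that UNION performs;
  -- acceptable here only if the two sets cannot overlap:
  SELECT order_id FROM current_orders
  UNION ALL
  SELECT order_id FROM archived_orders;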

Why Use Materialized Views?

Database Views

If we consider materialized views (MVs) in their simplest form, as a point-in-time stored query result, then materialized views serve two primary purposes: performance optimization and semantics simplification.

Performance Optimization

There are several ways in which materialized views can improve performance:

  • Reduce database workloads: materialized views can reduce database workloads by pre-assembling frequently used queries, thereby eliminating the repetitive execution of joins, aggregations, and filtering.
  • Facilitate database optimizers: in some databases, materialized views can be partitioned and indexed, and these structures are considered by the database optimizer. Also, in some databases, where more than one materialized view has been applied to a table, the optimizer will consider all the associated materialized views when optimizing queries.
  • Reduce network workloads: by using database replication and/or mass deployment techniques, materialized views can be distributed into closer proximity to their consumers, thereby reducing the data volume crossing the network and providing business continuation/disaster recovery capabilities, should the primary site become temporarily unavailable.
  • Precalculation and/or preaggregation: performing calculation and aggregation of information upon creation of a materialized view eliminates the need to perform these functions on demand as various consumers submit requests (see the sketch after this list).
  • Data subsets: applying filters to eliminate unnecessary data (e.g., history data no longer in common reporting use) or unnecessary data attributes (e.g., unused columns or columns intended for other information purposes) means the cost of filtering these items is effectively eliminated for consumers of the materialized view.
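
As a minimal sketch of a pre-aggregating materialized view (Oracle-style syntax is shown; materialized view syntax and capabilities vary by platform, and all names are illustrative):

  -- Pre-aggregate daily sales so consumers avoid repeating the GROUP BY work.
  CREATE MATERIALIZED VIEW mv_daily_sales AS
  SELECT sale_date,
         region_cd,
         SUM(sale_amount) AS total_sales,
         COUNT(*)         AS order_count
  FROM   sales
  GROUP  BY sale_date, region_cd;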

Semantics Simplification

Materialized views can be used to simplify the semantics provided to information consumers with ad hoc capabilities and/or to simplify the construction of reporting and analytics objects. Depending on the database and/or integration tools used to create them, materialized views can simplify the consumer experience by:

  • Reducing or eliminating join coding: when constructed, materialized views can perform the joins and populate the view with the resulting values from the joined tables, thereby eliminating the need for the consumer to perform this function as an ad hoc user or in the semantics of reporting and analytics tools (a sketch follows this list).
  • Pre-applying business rules: when constructed, materialized views can apply business rules to facilitate queries, adding indicator flags and pre-applying special business logic to data and populating the view with the resulting values, thus eliminating the need for the consumer to perform this function as an ad hoc user or in the semantics of reporting and analytics tools.
  • Precalculating and/or pre-aggregating: performing calculation and aggregation of information upon creation of a materialized view allows the consumer to use the results without needing to build the calculations and/or aggregations into an ad hoc query or into the semantics of reporting and analytics tools. This also helps to ensure information accuracy and consistency.
  • Providing data subsets: by prefiltering the data during the creation of the view, unnecessary or unused data and columns are not available to consumers and do not need to be filtered out of ad hoc queries or the semantics of reporting and analytics tools.
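
A hedged sketch of the join-flattening and business-rule points above (Oracle-style syntax; the names and the 10,000 threshold are illustrative):

  -- Pre-join customer and orders and pre-apply a business rule as a flag,
  -- so ad hoc users and reporting tools need neither the join nor the logic.
  CREATE MATERIALIZED VIEW mv_customer_orders AS
  SELECT c.customer_id,
         c.customer_name,
         o.order_id,
         o.order_total,
         CASE WHEN o.order_total >= 10000 THEN 'Y' ELSE 'N' END AS large_order_flag
  FROM   customer c
  JOIN   orders o ON o.customer_id = c.customer_id;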

What is the difference between left join and left outer join in SQL

When working with different databases, syntax can cause questions and confusion. Having recently been asked what the difference was between a left join and a left outer join, a subject I hadn’t thought about in a while, I thought a simple explanation might be in order. Actually, there is no difference between a left join and a left outer join, other than syntax: both perform the exact same operation in SQL. The left (outer) join retains the rows for which there was a match in both tables and also retains the rows that exist only in the left (controlling) table of the join. A short example follows the figure below.

Left Outer Join Relationship
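
As a short illustration, these two statements are equivalent; the OUTER keyword is optional, and the table names are illustrative:

  -- Both keep every customer row, with NULLs for customers that have no orders.
  SELECT c.customer_id, o.order_id
  FROM   customer c
  LEFT JOIN orders o ON o.customer_id = c.customer_id;

  SELECT c.customer_id, o.order_id
  FROM   customer c
  LEFT OUTER JOIN orders o ON o.customer_id = c.customer_id;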