PureData – Table Effective Practices

IBM Netezza PureData Table Effective Practices

Database Table

Here a few tips, which can make a significant difference to the efficiency and effectiveness of developers and users, making information available to them when developing and creating analytic objects.  This information can, also, be very help to data modelers.  While some of these recommendations are not enforced by Netezza/PureData, this fact makes them no less helpful to your community.

Alter table to Identify Primary Keys (PK)

  • Visually helps developers and users to know what the keys primary keys of the table are
  • Primary key information can, also, be imported as meta data by other IBM tools (e.g. InfoSphere, Datastage, Data Architect, Governance Catalog, Aginity, etc.)
  • The query optimizer will use these definitions define efficient query execution plans

Alter table to Identify Foreign Keys (FK)

  • Illustrate table relationships for developers and users
  • Foreign key information can, also, be imported as meta data by other IBM tools (e.g. InfoSphere, Datastage, Data Architect, Governance Catalog, Aginity, etc.)
  • The query optimizer will use these definitions define efficient query execution plans

Limit Distribution Key to Non-Updatable Fields

  • This one seems obvious, but this problem occurs regularly, if tables and optimizations are not properly planned; Causing an error will be generated, if an update is attempted against a field contain in the distribution of a table.

Use Null on Fields

  • Using ‘Not Null’ whenever the field data and ETL transformation rules can enforce it, helps improve performance by reducing the number of null condition checks performed and reduces storage.

Use Consistence Field Properties

  • Use the same data type and field length in all tables with the same field name, reduces the amount of interpretation/conversion required by the system, developers, and report SQL.

Schedule Table Optimizations

  • Work with your DBA’s to determine the best scheduling time, system user, and priority of groom and generate statistics operations. Keep in mind the relationship to when the optimizations occur in relation to when users need to consume the data. All too often, this operation is not performed before users need the performance and/or is driven by DBA choice, without proper consideration to other processing performance needs.  This has proven, especially true, in data warehousing when the DBA does not have Data warehousing experience and/or does not understand the load patterns of the ETL/ELT process.

Related Links

 

IBM InfoSphere Business Glossary Anywhere Client Is Gone

IBM_Infosphere Business Glossary Anywhere

IBM InfoSphere Business Glossary Anywhere

I thought I would post this, to point out what appears to be a case of incorrect  documentation, with regard to InfoSphere Information Server (IIS) 11.5 and 11.3.

According to IBM’s own documentation ‘InfoSphere Business Glossary, InfoSphere Business Glossary Anywhere, and InfoSphere Metadata Workbench are consolidated into the new product InfoSphere Information Governance Catalog.’  So, beware of these obsolete links, provided below.

Source link

Obsolete IBM Documentation Links for InfoSphere Business Glossary Anywhere InfoSphere

11.5

11.3