Database – What is a Primary Key?

Database Table

Database Table

What is a primary Key?

What a primary key is depends, somewhat, on the database.  However, in its simplest form a primary key:

  • Is a field (Column) or combination of Fields (columns) which uniquely identifies every row.
  • Is an index in database systems which use indexes for optimization
  • Is a type of table constraint
  • Is applied with a data definition language (DDL) alter command
  • And, depending on the data model can, define parent-Child relationship between tables

Related References

Oracle Database – Useful links

Oracle Database

Oracle Database

Some useful oracle documentation references.

Reference Type Link

Database SQL Language Reference (11.2)

https://docs.oracle.com/cd/E11882_01/server.112/e41084/

Database SQL Language Reference (11.2)

https://docs.oracle.com/cd/E11882_01/server.112/e16604/toc.htm
Database SQL Language Reference (12c) https://docs.oracle.com/database/121/SQLRF/toc.htm

Database (12c) SQL*Plus User’s Guide and Reference

https://docs.oracle.com/database/121/SQPUG/toc.htm

SQL Developer Documentation

http://docs.oracle.com/cd/E12151_01/

Netezza / PureData – What client tools to use with Netezza

IBM Netezza / IBM PureData for Analytics

IBM Netezza / IBM PureData for Analytics

 

Occasionally, a client will want a list of tools to work with Netezza / PureData, other than the Netezza Administrator Client.  Honestly, there are several tools which could be used, if they have odbc and/or jdbc connectivity.  However, these are the tools which keep being used across different customers.

Open Source

For customers willing to work with an open source tool Aginity for Netezza provides a significant set of capabilities, including script generation, which can be a significant productivity accelerator for development and operation support teams.

Aginity for Netezza

Free From IBM

For customer who do not want to work with open source tools, then IBM Data studio is an acceptable tool and has the added benefit of being free from IBM.

IBM Data Studio

IBM Software > Products > Data management platform > Data management > IBM Data Studio

Related References

Where to download IBM Data Studio?

IBM Data Studio Client

IBM Data Studio Client

IBM data studio is offered free from IBM, and can be helpful when working with DB2 and Puredata/Netezza using a JDBC driver.

What you need to Down Load IBM Data Studio

  • You will need an IBM ID and password

Basic down load steps

IBM Sign In Screen

IBM Sign In Screen

  • Enter you IBM ID, and password, then click ‘sign in’.
  • On the IBM Data Studio Client, license page, check ‘I agree’ and then click ‘I confirm’
IBM Data Studio Client License Screen

IBM Data Studio Client License Screen

  • On the IBM Data Studio Client, download page, Select the desired method tab, Then
    • Select the desired product or products and click ‘Download now.
IBM Data Studio Client Download Files Screen

IBM Data Studio Client Download Files Screen

 

Related References

IBM Data Studio

IBM Software > Products > Data management platform > Data management > IBM Data Studio

IBM Data Studio Client (Download)

IBM Support

Download and install IBM Data Studio Version 4.1.x

IBM Support

System requirements for IBM Data Studio Version 4.1.x

IBM Knowledge Center

Data Studio, Data Studio 4.1.1, Overview, Overview of IBM Data Studio

 

Aginity for Netezza – How to disable and/or Enable the Autocomplete Function

Aginity for Netezza an out complete feature, which auto completes brackets, quotes, and comment character, which some folks find useful and helpful.  However, if you are an old-school style coder like me, you may find these annoying, to say the least.  Usually, when I’m coding I want to write straight through without the need to think about how the application may be trying to help me, which means I usually end up cleanup all the autocomplete added characters after the fact.  Disabling this feature is quick and easy and can be easily be reinstated later, if you find you want to leverage some or all of the autocomplete features.

Aginity Tools Options

Aginity Tools Options

 

To Disable the Autocomplete Functions

  • Navigate to: Tools > Options
  • Then, Navigate to: Query Analyzer > Code autocomplete
  • Then, Uncheck the feature or features you wish to disable and press, ‘OK
Disable Aginity Code Autocomplete Options Disable

Disable Aginity Code Autocomplete Options Disable

To Enable the Autocomplete Functions

  • Navigate to: Tools > Options
  • Then, Navigate to: Query Analyzer > Code autocomplete
  • Then, Check the feature or features you wish to enable and press, ‘OK
Enable Aginity Code Autocomplete Options Disable

Enable Aginity Code Autocomplete Options Disable

Related References

What is BYOD?

Acronyms, Abbreviations, Terms, And Definitions

Acronyms, Abbreviations, Terms, And Definitions

 

What is BYOD?

Basically, BYOD (bring your own device) is an information technology trend toward employee-owned devices within a business, in which consumer software and hardware are being integrated into the enterprise workplace.

 

Benefits of BYOD

The benefits of BYOD depend upon the point of view, here is a quick list

  • Supports integrated remote work and remote workforce augmentation with requiring the acquisition of hardware and software
  • Reduced software learning curve for employees
  • Increased availability of work force and access to network.

 

Drawbacks of BYOD

  • Increased Support requirements
  • Increase Security and data exposure risk
  • Less consistent hardware and Software environment

 

PureData – Table Effective Practices

IBM Netezza PureData Table Effective Practices

Database Table

Here a few tips, which can make a significant difference to the efficiency and effectiveness of developers and users, making information available to them when developing and creating analytic objects.  This information can, also, be very help to data modelers.  While some of these recommendations are not enforced by Netezza/PureData, this fact makes them no less helpful to your community.

Alter table to Identify Primary Keys (PK)

  • Visually helps developers and users to know what the keys primary keys of the table are
  • Primary key information can, also, be imported as meta data by other IBM tools (e.g. InfoSphere, Datastage, Data Architect, Governance Catalog, Aginity, etc.)
  • The query optimizer will use these definitions define efficient query execution plans

Alter table to Identify Foreign Keys (FK)

  • Illustrate table relationships for developers and users
  • Foreign key information can, also, be imported as meta data by other IBM tools (e.g. InfoSphere, Datastage, Data Architect, Governance Catalog, Aginity, etc.)
  • The query optimizer will use these definitions define efficient query execution plans

Limit Distribution Key to Non-Updatable Fields

  • This one seems obvious, but this problem occurs regularly, if tables and optimizations are not properly planned; Causing an error will be generated, if an update is attempted against a field contain in the distribution of a table.

Use Null on Fields

  • Using ‘Not Null’ whenever the field data and ETL transformation rules can enforce it, helps improve performance by reducing the number of null condition checks performed and reduces storage.

Use Consistence Field Properties

  • Use the same data type and field length in all tables with the same field name, reduces the amount of interpretation/conversion required by the system, developers, and report SQL.

Schedule Table Optimizations

  • Work with your DBA’s to determine the best scheduling time, system user, and priority of groom and generate statistics operations. Keep in mind the relationship to when the optimizations occur in relation to when users need to consume the data. All too often, this operation is not performed before users need the performance and/or is driven by DBA choice, without proper consideration to other processing performance needs.  This has proven, especially true, in data warehousing when the DBA does not have Data warehousing experience and/or does not understand the load patterns of the ETL/ELT process.

Related Links

 

Infosphere Information Server (IIS) Commonly Used Parameters

Parameters are a very big key in, Infosphere Information Server (IIS), to process flexibility in sequences, DataStage jobs and DataQuality jobs.  Parameterization also help reduce development effort, reduce the number of jobs required, and reuse of jobs, by allowing construction of multi instance jobs, which are essentially reused code.

However, sometimes when starting a project, parameters need to created and doing so from memory, doesn’t always achieve the best results. So here is a quick starter list of parameters, which seem to be commonly encountered.  Hopefully, this list will aide in your parametrization efforts and setup.

 

Type Prompt Description Example
String DS_DIR Dataset Directory Path
String DS_LOG_DIR Log File Directory Path
String QUOTES Quotes
String RECIPIENT_EMAIL Recipient Email Address
String SENDER_EMAIL Senders Email Address
String SMTP_SERVER SMTP Mail Server Name
String SQL_DIR SQL File Directory path
String DATE_OFFSET Date Offset Number 1
String DS_ENVIRONMENT Datastage Environment PROD
String SRC_DIR Source Files Directory
String SRC_KEY_GEN_DIR Source Key Generator Files Directory Path
String SRC_REJ_DIR Source Reject Files Directory Path
String WRK_DIR Working Directory Path
String SRC_TABLE Source Table Name
String DB_SCHEMA Database Schema Name
String TGT_TBL Target Table Name
String PROC_DTE Run Control or processing date
String CURR_DTE Current Date

Related References

Infosphere Information Server (IIS) Component Alignment

Infosphere Information Server SDLC Alignment

Infosphere Information Server SDLC Alignment

In recent history, I have been asked several times to describe where different IIS components fit in the Software Development Lifecycle (SDLC) process.  The graphic above, list most of the more important IIS components in their relative SDLC relationships. However, it is important to note that that these are not absolutes. Many applications may cross boundaries depending on the practices of the individual company, the application spurt licensed by the company, and/or the applications implemented by the company.  For example, many components will participate in the sustainment phase of SDLC, although I did not list him in that role. This is especially true, if you’re using the governance tools (e.g. governance catalog ) and supporting your sustainment activities with modeling and development tools, such as, data architect.

Related References

InfoSphere Information Server (IIS) Component Descriptions

Infosphere Information Server LaunchPad Page

Infosphere Information Server LaunchPad Page

 

Each IIS component has a primary function in the InfoSphere architecture, which can by synopsized as follows:

 

Application Function
Blueprint Director IBM InfoSphere Blueprint Director is aimed at the Information Architect designing solution architectures for information-intensive projects.
Cognos (If purchased) Governance Dashboard (Framework Manager Model provided by IBM), Semantics, Analytics, and Reporting
Data Architect Data Architect is an enterprise data modeling and integration design tool. You can use it to discover, model, visualize, relate, and standardize diverse and distributed data assets, including dimensional models.
Data Click Data Click is an exciting new capability that helps novices and business users retrieve data and provision systems easily in only a few clicks.
Datastage DataStage is a data integration tool that enables users to move and transform data between operational, transactional, and analytical target systems.
Discovery Discovery is used to identify the transformation rules that have been applied to source system data to populate a target. Once accurately defined, these business objects and transformation rules provide the essential input into information-centric projects.
Fasttrack FastTrack streamlines collaboration between business analysts, data modelers, and developers by capturing and defining business requirements in a common format and, then, transforming that business logic (Source-to-Target-Mapping (STTM)) directly into DataStage ETL jobs.
Glossary Anywhere Business Glossary Anywhere, its companion module, augments Governance Catalog with more ease-of-use and extensibility features.
Governance Catalog The Governance Catalog includes business glossary assets (categories, terms, information governance policies, and information governance rules) and information assets.
Information Analyzer Information Analyzer provides capabilities to profile and analyze data.
Information Services Director Information Services Director provides a unified and consistent way to publish and manage shared information services in a service-oriented architecture (SOA).
Metadata Asset Manager Import, export, and manage common metadata assets in Metadata Repository and across applications
Operations Console Admin workspaces to investigate data, deploy applications, Web services, and monitor schedules and logs.
Qualitystage QualityStage provides data cleansing capabilities to help ensure quality and consistency by standardizing, validating, matching, and merging information to create comprehensive and authoritative information.
Server Manager Deployment tool to move, deploy, and control DataStage and QualityStage assets.

Related References

Data Modeling – What is Data Modeling?

Data Models, Data Modeling, What is data Modeling, logical Model, Conceptual Model, Physical Model

Data Models

Data modeling is the documenting of data relationships, characteristics, and standards based on its intended use of the data.   Data modeling techniques and tools capture and translate complex system designs into easily understood representations of the data creating a blueprint and foundation for information technology development and reengineering.

A data model can be thought of as a diagram  that illustrates the relationships between data. Although capturing all the possible relationships in a data model can be very time-intensive, a well-documented models allow stakeholders to identify errors and make changes before any programming code has been written.

Data modelers often use multiple models to view the same data and ensure that all processes, entities, relationships and data flows have been identified.

There are several different approaches to data modeling, including:

Concept Data Model (CDM)

  • The Concept Data Model (CDM) identifies the high level information entities, their relationships, and organized in the Entity Relationship Diagram (ERD).

Logical Data Model (LDM)

  • The Logical Data Model (LDM)  defines detail business information (in business terms) within each of the Concept Data Model and is a refinement of the information entities of the Concept Data Model.   Logical data model are non-RDBMS specific  business definition of tables, fields, and attributes contained within each information entity from which the Physical Data Model (PDM) and Entity Relationship Diagram (ERD) is produced.

Physical Data Model (PDM)

  • The Physical Data Model (PDM)  provides the actual technical details of the model and database object (e.g. table names, field names, etc.) to facilitate creation of accurate detail technical designs and actual database creation.  Physical Data Models are RDBMS specific definition of the logical model used build database, create deployable DDL statements, and to produce the Entity Relationship Diagram (ERD).

Related References

 

Useful Tools for Technical Consultants and Developers

Here is a quick list of technical consultant and developer tools (free or open source), which have proven useful over the years to me.  Hopefully, this information will be of use to you has well.

Text and Code Editors

Notepad++

Notepad++, which I have been using for years now, is a free source code editor which supports several programming languages running under the MS Windows environment (governed by GPL License).  I have repeatedly found Notepad++ extraordinarily useful.  Especially, the language features (e.g. Shell, XML, SQL) or needing to work with files and data without the worry of hidden character.

Image and Screenshot Tools

Greenshot

Greenshot is a light-weight screenshot software tool for Windows, which can be very helpful for capturing application screenshots.  Greenshot allows quick creation of screenshots, easy annotation, highlighting and obfuscating using the built-in image editor and sending the screenshot to a file, the clipboard, a printer or as e-mail attachment

PhotoScape

PhotoScape is an easy photo editing software editing, fixing and enhance photos, screenshots, and images.

Useful Open Source Tools for Project Plans

Google – gantterforgoogledrive

This tool can be to create projects and Gantter charts, when you don’t have a project planning tool available. this tools does seam to have some size and complexity limitations.

Microsoft Project Viewer

This is a useful Microsoft Project plan view for those times when you have MS project available.

ProjectLibre OPENPROJ

I have found this tool to be a useful substitute for MS Project and I have not had any issues with size or complexity limitations.  I have also found the export and import capability useful when working on projects teams who do no use a consistent project planning tool.  Additionally, if you are or have been a MS Project user, you should most of the functionality moderately intuitive, at least, I didn’t have any problems adapting.

Diagram Viewers

Microsoft Visio 2016 Viewer

The Microsoft Visio Viewer can be very handy when you need to view Visio diagrams and there are not enough licenses to go around, or you just need to read them and have no need to create or update the diagrams.

Google chrome addin 

if you are a chrome user you may want to consider this alternative.  Especially, if you are a MAc and/or Linux user.