Management Principles – You Can’t Manage What You Don’t Measure


Management and Measurement

‘You can’t manage what you don’t measure’ is an old management adage that has been around for many years. Most attribute it to Peter Drucker, while some claim the quote was first used by Dr. W. Edwards Deming, and it remains a bone of contention whether the quote is typically used in its original context.

Irrespective of who said it first, I have always agreed with the principle. Coming from a corporate background where this is one of the management principles often cited, I was surprised to learn that there are those who strongly disagree with the statement. This group argues that many things being managed at work aren’t measurable, from the confidence we instill in a new, young manager to the quality of new hires.

The argument is made that quantity is easy to measure, e.g., how much salespeople sell, how many leads marketing creates, or how many phone calls telemarketing makes, but that quality can’t be measured, e.g., excellent customer service, good technical support, or what differentiates a good consultant from a great one.

What to measure

Many organizations use Key Performance Indicators (KPIs) at multiple levels to measure their success in reaching targets, and will then manage the factors influencing a KPI to get it to where they want it to be. A KPI is a measurable value that shows how effectively a company is reaching its key business objectives.

Setting a KPI and measuring a specific value is, however, not always as straightforward as it might seem. To set a KPI, the underlying business objective needs to be properly understood. In one example, a department manager’s KPI included the volume of sales, measured in dollars. In an effort to improve sales, the manager decided to change the remuneration of her sales reps from a fixed salary to a small basic salary plus commission on sales made. The idea was to incentivize the work, which would lead to increased sales. In the early months after implementing the change, the sales made by account reps did indeed increase dramatically. However, the CFO then discovered that the profit margin on those increased sales was substantially lower than the minimum the company expected. The sales reps were discounting the product to increase sales, which resulted in high commissions, but the net effect was that the company made less profit.
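
To illustrate, here is a minimal sketch in Python (the figures are purely hypothetical) of why tracking sales volume alone can hide exactly the margin erosion the CFO found; measuring gross margin alongside volume surfaces it immediately.

  # Illustrative only: discounted deals raise the volume KPI while profit suffers.
  deals = [
      # (sale_amount, cost_of_goods) -- hypothetical figures
      (12_000, 9_000),
      (18_000, 16_500),   # heavily discounted deal
      (25_000, 23_800),   # heavily discounted deal
  ]

  sales_volume = sum(sale for sale, _ in deals)
  gross_profit = sum(sale - cost for sale, cost in deals)
  margin_pct = gross_profit / sales_volume * 100

  print(f"Sales volume KPI: ${sales_volume:,}")     # looks great on its own
  print(f"Gross margin KPI: {margin_pct:.1f}%")     # the figure the CFO cared about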

It is critical that the company’s objectives are clearly understood by all parties and that a suitable metric is measured to check if the objective is being met.

Can quality be measured?

Those arguing that quality, such as excellent customer service or good technical support, can’t be measured often express the view that the only way a company can determine how good its service or support is, is by asking the customer. I agree with that statement, but when you do that, aren’t you measuring these aspects? If 50% of your customers feel that your service and support are good, that is a measure against which you can manage and improve those objectives.

The same can be done for any qualitative metric. It merely becomes a question of what is appropriate to measure and how to obtain those metrics. Qualitative measures often have to be taken indirectly, i.e., you measure indirect results rather than direct ones.
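
As a minimal sketch (the survey responses are hypothetical), turning ‘ask the customer’ into a number you can track over time might look like this:

  # Hypothetical survey responses on a simple "good"/"poor" scale.
  responses = ["good", "poor", "good", "good", "poor", "good"]

  good_pct = 100 * sum(1 for r in responses if r == "good") / len(responses)
  print(f"{good_pct:.0f}% of customers rate service/support as good")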

The role of Business Intelligence

With the sheer volume of data available across the business, and with much of it residing in different systems, it becomes very difficult to extract the relevant metrics to measure and improve. This is where Business Intelligence (BI) comes in.

BI utilizes computer-based techniques to identify, extract, and analyze business data, such as sales, marketing, and production data, in order to make substantial improvements. Business Intelligence uses data already collected in the business, and it is able to draw on sources as diverse as website analytics, accounting systems, customer relationship management (CRM), and email management systems.

A Business Intelligence system can automatically use and analyze all the information from these applications in real time. This enables companies to quickly see, manage, and improve their performance. BI goes further than simply measuring performance so that it can be improved; it also helps identify weaknesses in the company.
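
As a minimal sketch (the extracts and field names are hypothetical), bringing data from two of those sources together into a single per-customer view might look like this:

  # Hypothetical extracts keyed by customer id: a BI layer joins CRM data and
  # web analytics into one view that can be measured and managed.
  crm = {
      101: {"name": "Acme", "open_support_tickets": 2},
      102: {"name": "Globex", "open_support_tickets": 0},
  }
  web = {
      101: {"visits_last_30d": 14},
      102: {"visits_last_30d": 3},
  }

  unified = {cid: {**crm[cid], **web.get(cid, {"visits_last_30d": 0})} for cid in crm}
  for cid, row in unified.items():
      print(cid, row)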

When an organization grows to the point where huge volumes of data are involved, analytics are used to examine large and varied data sets to uncover correlations, hidden patterns, customer preferences, and market trends, so that organizations can make more informed business decisions.

Both BI and big data analytics can hugely benefit organization and planning within any business. If you have all this information, irrespective of how exactly it was obtained or measured, managing the direction you want to go in becomes an informed decision that can be planned for, rather than a guessing game based on ‘gut feel.’

A crucial element required in today’s fast-moving world is an organization’s ability to respond rapidly to changes in both the external and internal environment. This is known as business agility, and it is not possible if the business does not measure what is going on inside and around it and then manage accordingly.


Data Modeling – Column Data Classification


When analyzing individual column data at its most foundational level, column data can be classified by its fundamental use and characteristics.  Granted, when you start rolling the structure up into multiple columns, table structures, and table relationships, other classifications and behaviors, such as keys (primary and foreign), indexes, and distribution, come into play.  However, when working with existing data sets it is often essential to understand the nature of the existing data to begin the modeling and information governance process.

Column Data Classification

Generally, individual columns can be classified into the following classifications (a simple classification sketch follows the list):

  • Identifier — A column/field which is unique to a row and/or can identify related data (e.g., Person ID, National Identifier). Basically, think primary key and/or foreign key.
  • Indicator — A column/field, often called a flag, that has a binary condition (e.g., True or False, Yes or No, Female or Male, Active or Inactive). Frequently used to indicate compliance with a specific business rule.
  • Code — A column/field that has a distinct and defined set of values, often abbreviated (e.g., State Code, Currency Code)
  • Temporal — A column/field that contains some type of date, timestamp, time, interval, or numeric duration data
  • Quantity — A column/field that contains a numeric value (decimals, integers, etc.) and is not classified as an Identifier or Code (e.g., Price, Amount, Asset Value, Count)
  • Text — A column/field that contains alphanumeric values, possibly long text, and is not classified as an Identifier or Code (e.g., Name, Address, Long Description, Short Description)
  • Large Object (LOB) — A column/field that contains data exceeding the capacity of traditional long text fields, or binary data such as graphics. Large objects can be broadly classified as Character Large Objects (CLOBs), Binary Large Objects (BLOBs), and Double-Byte/National Character Large Objects (DBCLOBs/NCLOBs).
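
As a minimal sketch (the heuristics and column names are purely illustrative), a first-pass classifier over column names and declared data types might look like this; real profiling would also inspect the data values themselves.

  import re

  # Naive name/type heuristics for assigning the classifications described above.
  def classify_column(name: str, data_type: str) -> str:
      n, t = name.lower(), data_type.lower()
      if t in ("blob", "clob", "dbclob", "nclob"):
          return "Large Object (LOB)"
      if n.endswith("_id") or n.endswith("_key") or n == "id":
          return "Identifier"
      if n.endswith("_flag") or n.startswith("is_"):
          return "Indicator"
      if n.endswith("_code") or n.endswith("_cd"):
          return "Code"
      if t in ("date", "timestamp", "time", "interval") or re.search(r"date|time", n):
          return "Temporal"
      if t in ("integer", "decimal", "numeric", "float", "bigint"):
          return "Quantity"
      return "Text"

  print(classify_column("person_id", "integer"))        # Identifier
  print(classify_column("active_flag", "char"))         # Indicator
  print(classify_column("currency_code", "char"))       # Code
  print(classify_column("order_date", "date"))          # Temporal
  print(classify_column("asset_value", "decimal"))      # Quantity
  print(classify_column("long_description", "varchar")) # Text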

What is a Common Data Model (CDM)?


A Common Data Model (CDM) is a shared data structure designed to provide well-formed and standardized data structures within an industry (e.g., medical, insurance) or business channel (e.g., human resource management, asset management), which can be applied to give organizations a consistent, unified view of business information.  These common models can be leveraged as accelerators by organizations to form the foundation for their information architecture, including SOA interchanges, mashups, data virtualization, an Enterprise Data Model (EDM), and business intelligence (BI), and/or to standardize their data models to improve metadata management and data integration practices.
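
As a minimal sketch (the ‘Party’ entity and source field names are hypothetical and not taken from any published CDM), the idea is that each source system maps its own records onto one common, standardized structure:

  from dataclasses import dataclass

  # A common, standardized structure shared by all source systems.
  @dataclass
  class Party:
      party_id: str
      legal_name: str
      country_code: str      # ISO 3166 alpha-2

  def from_crm(rec: dict) -> Party:          # CRM-specific mapping
      return Party(rec["cust_no"], rec["cust_name"], rec["ctry"])

  def from_billing(rec: dict) -> Party:      # billing-system-specific mapping
      return Party(rec["account_id"], rec["account_holder"], rec["country"])

  print(from_crm({"cust_no": "C-1", "cust_name": "Acme Ltd", "ctry": "GB"}))
  print(from_billing({"account_id": "A-9", "account_holder": "Acme Ltd", "country": "GB"}))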


Data Modeling – Fact Table Effective Practices


Here are a few guidelines for modeling and designing fact tables; a small naming-convention check sketch follows the list.

Fact Table Effective Practices

  • The table naming convention should identify it as a fact table. For example:
    • Suffix Pattern:
      • <<TableName>>_Fact
      • <<TableName>>_F
    • Prefix Pattern:
      • FACT_<<TableName>>
      • F_<<TableName>>
  • Must contain a temporal dimension surrogate key (e.g., date dimension)
  • Measures should be nullable; this has an impact on aggregate functions (SUM, COUNT, MIN, MAX, AVG, etc.)
  • Dimension surrogate keys (srky) should have a foreign key (FK) constraint
  • Do not place the dimension processing in the fact jobs
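
As a minimal sketch (table and column names are hypothetical), two of these practices, the naming convention and the required temporal dimension surrogate key, can be checked mechanically:

  import re

  # Matches the suffix/prefix fact table naming patterns listed above.
  FACT_NAME = re.compile(r"^(FACT_|F_).+|.+(_Fact|_F)$", re.IGNORECASE)

  tables = {  # table name -> column names (hypothetical definitions)
      "Sales_Fact":   ["date_srky", "product_srky", "sales_amount"],
      "F_Inventory":  ["warehouse_srky", "quantity_on_hand"],   # no date key
      "SalesSummary": ["date_srky", "total_amount"],            # bad name
  }

  for name, columns in tables.items():
      if not FACT_NAME.match(name):
          print(f"{name}: name does not follow the fact table convention")
      # Naive check for a temporal dimension surrogate key by column name.
      if not any("date" in c.lower() for c in columns):
          print(f"{name}: missing a temporal (date) dimension surrogate key")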

Data Modeling – Dimension Table Effective Practices


I’ve had these notes lying around for a while, so I thought I would consolidate them here.  Here are a few guidelines to ensure the quality of your dimension table structures; a small sketch illustrating a couple of these practices follows the list.

Dimension Table Effective Practices

  • The table naming convention should identify it as a dimension table. For example:
    • Suffix Pattern:
      • <<TableName>>_Dim
      • <<TableName>>_D
    • Prefix Pattern:
      • Dim_<<TableName>>
      • D_<<TableName>>
  • Have a Primary Key (PK) assigned on the table’s surrogate key
  • Audit fields – Type 1 dimensions should:
    • Have a Created Date timestamp – when the record was initially created
    • Have a Last Update timestamp – when the record was last updated
  • Job Flow: Do not place the dimension processing in the fact jobs; keep dimension processing in its own jobs.
  • Every dimension should have a zero (0) ‘Unknown’ row
  • Fields should be ‘NOT NULL’, replacing nulls with a zero (0) for numeric and integer type fields or a space (‘ ’) for character type fields.
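
As a minimal sketch (the product dimension and its fields are hypothetical), seeding the zero (0) ‘Unknown’ row and applying the NOT NULL defaults described above might look like this:

  # Zero (0) surrogate key row representing the 'Unknown' dimension member.
  UNKNOWN_ROW = {"product_srky": 0, "product_code": " ", "product_name": "Unknown",
                 "list_price": 0}

  def apply_not_null_defaults(row: dict) -> dict:
      # Replace nulls with zero for numerics and a space for character fields.
      defaults = {"product_code": " ", "product_name": " ", "list_price": 0}
      return {k: (defaults.get(k, v) if v is None else v) for k, v in row.items()}

  rows = [UNKNOWN_ROW,
          apply_not_null_defaults({"product_srky": 1, "product_code": "AB1",
                                   "product_name": None, "list_price": 9.99})]
  for r in rows:
      print(r)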


InfoSphere DataStage – DataStage Parallel Job Peer Code Review Checklist Template


Peer code review happens during the development phase and focuses on the overall quality of, and compliance to standards in, code and configuration artifacts. However, the hard part of performing a peer code review isn’t performing the review, but rather achieving consistency and thoroughness in the review.  This is where a checklist can contribute significantly, providing a list of things to check and a relative weight for the findings.  I hope this template assists with your DataStage job review process; a small weighted-checklist scoring sketch follows.
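
As a minimal sketch (the checklist items and weights are hypothetical examples, not the template itself), weighting each check keeps review findings consistent and comparable across jobs:

  # Hypothetical weighted checklist: each finding is scored by its weight.
  checklist = [  # (check description, weight, passed?)
      ("Job and stage naming standards followed",         3, True),
      ("Annotations/description fields completed",        2, False),
      ("Reject/error links handled per standards",        5, True),
      ("Parameter sets used instead of hard-coded paths", 5, False),
  ]

  max_score = sum(w for _, w, _ in checklist)
  score = sum(w for _, w, passed in checklist if passed)
  print(f"Review score: {score}/{max_score} ({100 * score / max_score:.0f}%)")
  for desc, w, passed in checklist:
      if not passed:
          print(f"Finding (weight {w}): {desc}")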

 

ETL Error Handling Effective Practices


ETL (extract, transform, and load) error handling practices can vary, but three basic approaches can significantly assist in establishing effective ETL error handling.  Effective error handling begins in the requirements and design phases. All too often, error handling is left to the build phase and falls to individual developer practices. This is an area where standard practices are not well defined or adopted by the ETL developer community.  So, here are a few effective error handling practices which will contribute to process stability, information timeliness, and information accuracy, and reduce the level of effort required to support the application once in operation.

Anticipating ETL Errors

In the requirements and design phases, with proper consideration, many errors can be avoided altogether in the ETL process. When discussing requirements and preparing designs, consideration should be given to error handling, especially the treatment of common errors. As an effective practice, anticipated errors should be treated within the ETL process. Some examples to consider are listed below (a small cleansing sketch follows the list):

  • Replacement of special characters: do any special characters need to be removed, if found? This is generally determined at the field level and should be considered in the source-to-target mapping (STTM) and business rules. Also, when passing data between differing systems and working with VARCHAR fields, consider whether the ‘Unicode’ extended property should be set.
  • Removal of leading and trailing spaces: removal of unnecessary leading and trailing spaces should be considered when changing fields from CHAR to VARCHAR and when working with keys used as primary keys, join keys, and/or lookup keys.
  • Deduplication of data: duplicate data prevention practices and business rules should always be considered. These can be of a few types:
    • First, file processing conventions, such as assigning timestamps to files and removing or moving processed files to prevent reprocessing.
    • Second, rules for deduplication of duplicate rows, including the identification of appropriate keys for determining duplicate rows.
    • Third, if duplicate rows are being produced as a result of more than one input source system, identification of the authoritative source should be considered to resolve conflicts.
  • Null Value Treatment: null value treatment can be extraordinarily important, especially when working with keys and traditional data warehouse models. It is important to be mindful that, to the database and the ETL, nulls and spaces are not the same thing. They may or may not be the same thing in the mind of the consumer of the information, so business rules should indicate the treatment of both spaces and nulls. In some circumstances, especially when using surrogate keys in data warehousing, business processes sometimes need to know the difference between a null, a space, and an unknown value. These three scenarios should therefore be considered when forming business rules and implementing their treatment in the ETL.  Here are a few questions that could be asked when forming your solution:
    • Do nulls and spaces mean the same thing to the business community?
    • Is a space considered an unknown value?
    • Does a null need to be uniquely identified as different from a space and/or an unknown look up value?
    • If surrogate keys are in use for the field in question, which of these scenarios require a unique surrogate key, other than the unknown unique surrogate key?
  • Missing or Invalid Value Replacement or Defaults: having replacement values or defaults is especially important for any fields which are not nullable and/or require a surrogate key for data warehouse dimensions.  Also, for reporting to be meaningful, replacement or default value assignments can be important as well (e.g., for cubes and statistical calculations).
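
As a minimal sketch (field names and rules are hypothetical; the actual rules belong in the STTM and business rules), several of the anticipated-error treatments above look like this in code:

  import re

  def cleanse(row: dict) -> dict:
      out = dict(row)
      # Replace special characters and trim leading/trailing spaces.
      out["name"] = re.sub(r"[^A-Za-z0-9 .,-]", "", out.get("name") or "").strip()
      # Null treatment: default a missing amount to zero.
      out["amount"] = out["amount"] if out.get("amount") is not None else 0
      return out

  rows = [{"id": 1, "name": "  Acme* Ltd ", "amount": None},
          {"id": 1, "name": "Acme Ltd", "amount": 10.0},     # duplicate id
          {"id": 2, "name": "Globex", "amount": 5.0}]

  # Deduplicate on the business key (id), keeping the first occurrence.
  seen, deduped = set(), []
  for r in (cleanse(r) for r in rows):
      if r["id"] not in seen:
          seen.add(r["id"])
          deduped.append(r)
  print(deduped)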

Rejecting Rows

Rows should not be rejected unless there is a specific business requirement and/or need to do so.  Rejecting rows causes data inaccuracies by omission and undermines the consumer’s confidence in the accuracy of the information being delivered.  This can be especially problematic for accounting and other activities which must balance across information sets.

  • If value lookups are in use (a lookup-with-default sketch follows this list):
    • Unknown and null values need to have a treatment rule to prevent errors.
    • Two surrogate key or transformation default values may be necessary if the ability to distinguish between an unknown/invalid value and a null value is required.
    • Make sure the lookup ‘Key Type’ settings are aligned (e.g., equality, caseless equality) with the formatting of both inputs to the lookup.
    • Ensure that the complete unique key is in use.
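
As a minimal sketch (the currency lookup and surrogate key values are hypothetical), resolving unmatched and null values to their own default surrogate keys keeps rows flowing instead of rejecting them, and the trimmed, caseless comparison mirrors aligning the lookup ‘Key Type’ with the input formatting:

  currency_lookup = {"USD": 1, "CAD": 2, "EUR": 3}
  UNKNOWN_SRKY, NULL_SRKY = -1, -2   # distinct defaults, per the practice above

  def resolve_currency(code):
      # Nulls and blank values get their own surrogate key, distinct from unknowns.
      if code is None or code.strip() == "":
          return NULL_SRKY
      return currency_lookup.get(code.strip().upper(), UNKNOWN_SRKY)

  for code in ["usd", "  EUR ", "XXX", None, ""]:
      print(repr(code), "->", resolve_currency(code))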

Information Consistency Practices

Information consistency practices allow the information to be transformed and enriched to make it more consistent for ‘like to like’ comparisons, usability, and/or readability. As an effective practice, consider these standard formatting recommendations, which can be good requirements questions and should be included in the STTM (a small standardization sketch follows the list):

  • Make descriptive and/or text fields consistent in their format (e.g., mixed case, proper case, upper case).
  • Use consistent date formatting when converting dates to text fields.
  • When dealing with currency, convert to consistent ISO currency codes (e.g., USD, CAD, EUR) and a consistent decimal precision (e.g., two decimal places).
  • Classify financial records into categories (e.g., credit and debit), with a default group behavior included (e.g., N/A or Unknown).
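
As a minimal sketch (field names and formats are hypothetical; the agreed formats should be captured in the STTM), applying these recommendations to a single record might look like this:

  from datetime import date

  def standardize(record: dict) -> dict:
      return {
          "customer_name": record["customer_name"].strip().title(),    # proper case
          "invoice_date": record["invoice_date"].strftime("%Y-%m-%d"), # ISO 8601 date
          "currency_code": record["currency_code"].strip().upper(),    # ISO currency code
          "amount": round(float(record["amount"]), 2),                 # two decimal places
          "category": record.get("category") or "Unknown",             # default group
      }

  print(standardize({"customer_name": "acme ltd", "invoice_date": date(2020, 3, 1),
                     "currency_code": "usd", "amount": "1234.5", "category": None}))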