Data Warehousing vs. Data Virtualization

Information Management
Information Management

Today, a business heavily depends on data to gain insights into their processes and operations and to develop new ways to increase market share and profits. In most cases, data required to generate the insights are sourced and located in diverse places, which requires reliable access mechanism. Currently, data warehousing and data virtualization are two principal techniques used to store and access the sources of critical data in a company. Each approach offers various capabilities and can be deployed for particular use cases as described in this article.

Data Warehousing

A data warehouse is designed and developed to secure host historical data from different sources. In effect, this technique protects data sources from performance degradation caused by the impact of sophisticated analytics and enormous demands for reports. Today, various tools and platforms have been developed for data warehouse automation in companies. They can be deployed to quicken development, automate testing, maintenance, and other steps involved in data warehousing. In a data warehouse, data is stored as a series of snapshots, where a record represents data at a particular time. In effect, companies can analyze data warehouse snapshots to compare data between different periods. The results are converted into insights required to make crucial business decisions.

Moreover, a data warehouse is optimized for other functions, such as data retrieval. The technology duplicates data to allow database de-normalization that enhances query performance. The solution is further deployed to create an enterprise data warehouse (EDW) used to service the entire organization.

Data Warehouse Information Architecture
Data Warehouse Information Architecture

Features of a Data Warehouse

A data warehouse is subject-oriented, and it is designed to help entities analyze data. For instance, a company can start a data warehouse focused on sales to learn more about sales data. Analytics on this warehouse can help establish insights such as the best customer for the period. The data warehouse is subject oriented since it can be defined based on a subject matter.

A data warehouse is integrated. Data from various sources is first out into a consistent format. The process requires the firm to resolve some challenges, such as naming conflicts and inconsistencies on units of measure.

A data warehouse in nonvolatile. In effect, data entered into the warehouse should not change after it is stored. This feature increases accuracy and integrity in data warehousing.

A data warehouse is time variant since it focuses on data changes over time. Data warehousing discovers trends in business by using large amounts of historical data. In effect, a typical operation in a data warehouse scans millions of rows to return an output.

A data warehouse is designed and developed to handle ad hoc queries. In most cases, organizations may not predict the amount of workload of a data warehouse. Therefore, it is recommendable to optimize the data warehouse to perform optimally over any possible query operation.

A data warehouse is regularly updated by the ETL process using bulk data modification techniques. Therefore, end users cannot directly update the data warehouse.

Advantages of Data Warehousing

The primary motivation for developing a data warehouse is to provide timely information required for decision making in an organization. A business intelligence data warehouse serves as an initial checkpoint for crucial business data. When a company stores its data in a data warehouse, tracking it becomes natural. The technology allows users to perform quick searches to be able to retrieve and analyze static data.

Another driver for companies investing in data warehouses involves integrating data from disparate sources. This capability adds value to operational applications like customer relationship management systems. A well-integrated warehouse allows the solution to translate information to a more usable and straightforward format, making it easy for users to understand the business data.

The technology also allows organizations to perform a series of analysis on data.

A data warehouse reduces the cost to access historical data in an organization.

Data warehousing provides standardization of data across an organization. Moreover, it helps identify and eliminate errors. Before loading data, the solution shows inconsistencies to users and corrects them.

A data warehouse also improves the turnaround time for analysis and report generation.

The technology makes it easy for users to access and share data. A user can conduct a quick search on a data warehouse to find and analyze static data without wasting time.

Data warehousing removes informational processing load from transaction-oriented databases.

Disadvantages of Data Warehousing

While data warehousing technology is undoubtedly beneficial to many organizations, not all data warehouses are relevant to a business. In some cases, a data warehouse can be expensive to scale and maintain.

Preparing a data warehouse is time-consuming since it requires users to input raw data, which has to be achieved manually.

A data warehouse is not a perfect choice for handing unstructured and complex raw data. Moreover, it faces difficulties incompatibility. Depending on the data sources, companies may require a business intelligence team to ensure compatibility is achieved for data coming from sources running distinct operating systems and programs.

The technology requires a maintenance cost to continue working correctly. The solution needs to be updated with latest features that might be costly. Regularly maintaining a data warehouse will need a business to spend more on top of the initial investment.

A data warehouse use can be limited due to information privacy and confidentiality issues. In most cases, businesses collect and store sensitive data belonging to their clients. Viewing it is only allowed to individual employees, which limits the benefits offered by a data warehouse.

Data Warehousing Use Case

There are a series of ways organizations use data warehouses. Businesses can optimize the technology for performance by identifying the type of data warehouse they have.

  1. A data warehouses can be used by an organization that is struggling to report efficiently on business operations and activities. The solution makes it possible to access the required data
  2. A data warehouse is necessary for an organization where data is copied separately by different divisions for analysis in spreadsheets that are not consistent with one another.
  3. Data warehousing is crucial in organizations where uncertainties about data accuracy are causing executives to question the veracity of reports.
  4. A data warehouse is crucial for business intelligence acceleration. The technology delivers rapid data insights to analysts at different scales, concurrency, and without requiring manual tuning or optimization of a database.
Data Virtualization Information Architecture
Data Virtualization Information Architecture

Data Virtualization

Data virtualization technology does not require transfer or storage of data. Instead, users employ a combination of application programming interfaces (APIs) and metadata (data about data) to interface with data in different sources. Users use joined queries to gain access to the original data sources. In other words, data virtualization offers a simplified and integrated view to business data in real-time as requested by business users, applications, and analytics. In effect, the technology makes it possible to integrate data from distinct sources, formats, and locations, without replication. It creates a unified virtual data layer that delivers data services to support users and various business applications.

Data virtualization performs many of the same data integration functions, that is, extract, transform, and load, data replication, and federation. It leverages modern technology to deliver real-time data integration with agility, low cost, and high speed. In effect, data virtualization eliminates traditional data integration and reduces the need for replicated data warehouses and data marts in most cases.

Capabilities and Benefits of Data Virtualization

There are various benefits of implementing data virtualization in an organization.

Firstly, data virtualization allows access and leverage of all information that helps a firm achieve a competitive advantage. The solution offers a unified virtual layer that abstracts the underlying source complexity and presents disparate data sources as a single source.

Data virtualization is cheaper since it does not require actual hardware devices to be installed. In other words, organizations no longer need to purchase and dedicate a lot of IT resources and additional monetary investment to create on-site resources, similar to the one used in a data warehouse.

Data virtualization allows speedy deployment of resources. In this solution, resource provisioning is fast and straightforward. Organizations are not required to set up physical machines or to create local networks or install other IT components. Users have a single point of access to a virtual environment that can be distributed to the entire company.

Data virtualization is an energy-efficient system since the solution does not require additional local hardware and software. Therefore, an organization will not be required to install cooling systems.

Disadvantages of Data Virtualization

Data virtualization creates a security risk. In the modern world, having information is a cheap way to make money. In effect, company data is frequently targeted by hackers. Implementing data virtualization from disparate sources may give an opportunity to malicious users to steal critical information and use it for monetary gain.

Data virtualization requires a series of channels or links that must work in cohesion to perform the intended task. In this cases, all data sources should be available for virtualization to work effectively.

Data Virtualization Use Cases

  • Companies that rely on business intelligence require data virtualization for rapid prototyping to meet immediate business needs. Data virtualization can create a real-time reporting solution that unifies access to multiple internal databases.
  • Provisioning data services for single-view applications, such as in customer service and call center applications require data virtualization.

 

End Of Support For IBM InfoSphere 9.1.0

IBM Information Server (IIS)
IBM Information Server (IIS)

End of Support for IBM InfoSphere Information Server 9.1.0

IBM InfoSphere Information Server 9.1.0 will reach End of Support on 2018-09-30.  If you are still on the InfoSphere Information Server (IIS) 9.1.0, I hope you have a plan to migrate to an 11-series version soon.  InfoSphere Information Server (IIS) 11.7 would be worth considering if you don’t already own an 11-series license. InfoSphere Information Server (IIS) 11.7 will allow you to take advantage of the evolving thin client tools and other capabilities in the 2018 release pipeline without needing to perform another upgrade.

Related References

IBM Support, End of support notification: InfoSphere Information Server 9.1.0

IBM Support, Software lifecycle, InfoSphere Information Server 9.1.0

IBM Knowledge Center, Home, InfoSphere Information Server 11.7.0, IBM InfoSphere Information Server Version 11.7.0 documentation

Infosphere Information Server (IIS) – Where you can view DataStage and QualityStage Logs?

During the course of the week, the discussion happened regarding the different places where a person might read the DataStage and QualityStage logs in InfoSphere. I hadn’t really thought about it, but here are a few places that come to mind:

  • IBM InfoSphere DataStage and QualityStage Operations Console
  • IBM InfoSphere DataStage and QualityStage Director client
  • IBM InfoSphere DataStage and QualityStage Designer client by pressing Ctrl+L

Printable PDF Version of this Article

Related Reference

IBM Knowledge Center> InfoSphere Information Server 11.7.0 > InfoSphere DataStage and QualityStage > Monitoring jobs

IBM Knowledge Center > InfoSphere Information Server 11.7.0 > Installing > Troubleshooting software installation > Log files

Essbase Connector Error – Client Commands are Currently Not Being Accepted

DataStage Essbase Connector, Essbase Connector Error, Client Commands are Currently Not Being Accepted
DataStage Essbase Connector

While investigating a recent Infosphere Information Server (IIS), Datastage, Essbase Connect error I found the explanations of the probable causes of the error not to be terribly meaningful.  So, now that I have run our error to ground, I thought it might be nice to jot down a quick note of the potential cause of the ‘Client Commands are Currently Not Being Accepted’ error, which I gleaned from the process.

Error Message Id

  • IIS-CONN-ESSBASE-01010

Error Message

An error occurred while processing the request on the server. The error information is 1051544 (message on contacting or from application:[<<DateTimeStamp>>]Local////3544/Error(1013204) Client Commands are Currently Not Being Accepted.

Possible Causes of The Error

This Error is a problem with access to the Essbase object or accessing the security within the Essbase Object.  This can be a result of multiple issues, such as:

  • Object doesn’t exist – The Essbase object didn’t exist in the location specified,
  • Communications – the location is unavailable or cannot be reached,
  • Path Security – Security gets in the way to access the Essbase object location
  • Essbase Security – Security within the Essbase object does not support the user or filter being submitted. Also, the Essbase object security may be corrupted or incomplete.
  • Essbase Object Structure –  the Essbase object was not properly structured to support the filter or the Essbase filter is malformed for the current structure.

Related References

IBM Knowledge Center, InfoSphere Information Server 11.7.0, Connecting to data sources, Enterprise applications, IBM InfoSphere Information Server Pack for Hyperion Essbase

Printable PDF Version of This Article

 

Parallel jobs on Windows fail with APT_IOPort::readBlkVirt;error

APT_IOPort::readBlkVirt Error Screenshot
APT_IOPort::readBlkVirt Error Screenshot

This a known error for windows systems and applies to DataStage and DataQuality jobs using the any or all the three join type stages (Join, Merge, and Lookup).

Error Message

  • <<Link name>>,0: APT_IOPort::readBlkVirt: read for block header, partition 0, [fd 4], returned -1 with errno 10,054 (Unknown error)

Message ID

  • IIS-DSEE-TFIO-00223

Applies To

  • Windows systems only
  • Parallel Engine Jobs the three join type stages (Join, Merge, and Lookup). It does not apply to Server Engine jobs.
  • Infosphere Information Server (IIS), Datastage and DataQuality 9.1 and higher

The Fix

  • Add the APT_NO_IOCOMM_OPTIMIZATION in project administrator and set to blank or 0. I left it blank so it would not impact other jobs
  • Add the environment variable to the job producing the error and set to 1

What it APT_NO_IOCOMM_OPTIMIZATION Does

  • Sets the use of shared memory as the transport type, rather than using the default sockets transport type.
  • Note that in most cases sockets transport type is faster, so, you likely will not to set this across the project as the default for all job. It is best to apply it as necessary for problematic jobs.

Related References

InfoSphere DataStage and QualityStage, Version 9.1 Job Compatibility

IBM Support, JR54078: PARALLEL JOBS ON WINDOWS FAIL WITH APT_IOPORT::READBLKVIRT; ERROR

IBM Support, Information Server DataStage job fails with unknown error 10,054.

 

DataStage – How to Pass the Invocation ID from one Sequence to another

DataStage Invocation ID Passing Pattern Overview
DataStage Invocation ID Passing Pattern Overview

When you are controlling a chain of sequences in the job stream and taking advantage of reusable (multiple instances) jobs it is useful to be able to pass the Invocation ID from the master controlling sequence and have it passed down and assigned to the job run.  This can easily be done with needing to manual enter the values in each of the sequences, by leveraging the DSJobInvocationId variable.  For this to work:

  • The job must have ‘Allow Multiple Instance’ enabled
  • The Invocation Id must be provided in the Parent sequence must have the Invocation Name entered
  • The receiving child sequence will have the invocation variable entered
  • At runtime, a DataStage invocation id instance of the multi-instance job will generate with its own logs.

Variable Name

  • DSJobInvocationId

Note

This approach allows for the reuse of job and the assignment of meaningful instance extension names, which are managed for a single point of entry in the object tree.

Related References: 

IBM Knowledge Center > InfoSphere Information Server 11.5.0

InfoSphere DataStage and QualityStage > Designing DataStage and QualityStage jobs > Building sequence jobs > Sequence job activities > Job Activity properties

DataStage – How to use single quoted parameter list in an Oracle Connector

Data Integration
Data Integration

While working with a client’s 9.1 DataStage version, I ran into a situation where they wanted to parameterize SQL where clause lists in an Oracle Connector stage, which honestly was not very straight forward to figure out.  First, if the APT_OSL_PARAM_ESC_SQUOTE is not set and single quotes are used in the parameter, the job creates unquoted invalid SQL when the parameter is populated.  Second, I found much of the information confusing and/or incomplete in its explanation.   After some research and some trial and error, here is how I resolved the issue.  I’ll endeavor to be concise, but holistic in my explanation.

When this Variable applies

This where I know this process applies, there may be other circumstances to which is this applicable, but I’m listing the ones here with which I have recent experience.

Infosphere Information Server Datastage

  • Versions 91, 11.3, and 11.5

Oracle RDBMS

  • Versions 11g and 12c

Configurations process

Here is a brief explanation of the steps I used to implement the where clause as a parameter.  Please note that in this example, I am using a job parameter to populate on a portion of the where clause, you can certainly pass the entire where clause as a parameter, if it is not too long.

Configure Project Variable in Administrator

  • Add APT_OSL_PARAM_ESC_SQUOTE to project in Administrator
  • Populate the APT_OSL_PARAM_ESC_SQUOTE Variable \
APT_OSL_PARAM_ESC_SQUOTE Project Variable
APT_OSL_PARAM_ESC_SQUOTE Project Variable

Create job parameter

Following your project name convention or standard practice, if you customer and/or project do not have established naming conventions, create the job parameter in the job. See jp_ItemSource parameter in the image below.

Job Parameter In Oracle Connector
Job Parameter In Oracle Connector

Add job parameter to Custom SQL in Select Oracle Connector Stage

On the Job parameter has been created, add the job parameter to the SQL statement of the job.

Job Parameter In SQL
Job Parameter In SQL

Related References

IBM Knowledge Center > InfoSphere Information Server 11.5.0

Connecting to data sources > Databases > Oracle databases > Oracle connector

IBM Support > Limitation of the Parameter APT_OSL_PARAM_ESC_SQUOTE on Plugins on Parallel Canvas

IBM Knowledge Center > InfoSphere Information Server 11.5.0

InfoSphere DataStage and Quality > Stage > Reference > Parallel Job Reference > Environment Variables > Miscellaneous > APT_OSL_PARAM_ESC_SQUOTE