Big Data vs. Virtualization

Big Data Information Approaches

Globally, organizations face challenges emanating from data issues, including data consolidation, value, heterogeneity, and quality. At the same time, they must contend with Big Data. In other words, consolidating, organizing, and realizing the value of data in an organization has been a challenge for years. To overcome these challenges, a series of strategies has been devised. For instance, organizations actively leverage methods such as Data Warehouses, Data Marts, and Data Stores to meet their data asset requirements. Unfortunately, the time and resources required to deliver value using these legacy methods remain a serious concern. In most cases, the typical Data Warehouse applied for business intelligence (BI) relies on batch processing to consolidate and present data assets, and this traditional approach suffers from information latency.

Big Data

As the name suggests, Big Data describes a large volume of data that can either be structured or unstructured. It originates from business processes among other sources. Presently, artificial intelligence, mobile technology, social media, and the Internet of Things (IoT) have become new sources of vast amounts of data. In Big Data, the organization and consolidation matter more than the volume of the data. Ultimately, big data can be analyzed to generate insights that can be crucial in strategic decision making for a business.

Features of Big Data

The term Big Data is relatively new. However, the process of collecting and preserving vast amounts of information for different purposes has existed for decades. Big Data gained momentum more recently with the three V's (volume, velocity, and variety), to which complexity and variability are often added.

Volume: First, businesses gather information from a set of sources, such as social media, day-to-day operations, machine-to-machine data, weblogs, sensors, and so on. Traditionally, storing data at this scale was a challenge; however, new technologies such as Hadoop have made it practical.

Velocity: Another defining characteristic of Big Data is that it flows at an unprecedented rate, requiring real-time processing. Organizations are gathering information from RFID tags, sensors, and other objects that need timely processing of data torrents.

Variety: In modern enterprises, information comes in different formats. For instance, a firm can gather numeric and structured data from traditional databases as well as unstructured emails, video, audio, business transactions, and texts.

Complexity: As mentioned above, Big Data comes from diverse sources and in varying formats. In effect, it becomes a challenge to consolidate, match, link, cleanse, or modify this data across an organizational system. Big Data opportunities can only be exploited when an organization successfully correlates relationships and connects multiple data sets; otherwise, the data can spiral out of control.

Variability: Big Data can have inconsistent flows with periodic peaks. For instance, on social media, a trending topic can tremendously increase the volume of collected data. Variability is also common when dealing with unstructured data.

Big Data Potential and Importance

The vast amount of data collected and preserved on a global scale will keep growing. This fact implies that there is more potential to generate crucial insights from this information. Unfortunately, due to various issues, only a small fraction of this data actually gets analyzed. There is a significant and untapped potential that businesses can explore to make proper and beneficial use of this information.

Analyzing Big Data allows businesses to make timely and effective decisions using raw data. In reality, organizations can gather data from diverse sources and process it to develop insights that can aid in reducing operational costs, production time, innovating new products, and making smarter decisions. Such benefits can be achieved when enterprises combine Big Data with analytic techniques, such as text analytics, predictive analytics, machine learning, natural language processing, data mining and so on.

Big Data Application Areas

Practically, Big Data can be used in nearly all industries. In the financial sector, a significant amount of data is gathered from diverse sources, which requires banks and insurance companies to innovate ways to manage Big Data. The industry aims to understand and satisfy its customers while meeting regulatory compliance and preventing fraud. In effect, banks can exploit Big Data using advanced analytics to generate the insights required to make smart decisions.

In the education sector, Big Data can be employed to make vital improvements on school systems, quality of education and curriculums. For instance, Big Data can be analyzed to assess students’ progress and to design support systems for professors and tutors.

Healthcare providers, on the other hand, collect patients’ records and design various treatment plans. In the healthcare sector, practitioners and service providers are required to offer accurate and timely treatment that is transparent to meet the stringent regulations in the industry and to enhance the quality of life. In this case, Big Data can be managed to uncover insights that can be used to improve the quality of service.

Governments and different authorities can apply analytics to Big Data to create the understanding required to manage social utilities and to develop solutions necessary to solve common problems, such as city congestion, crime, and drug use. However, governments must also consider other issues such as privacy and confidentiality while dealing with Big Data.

In manufacturing and processing, Big Data offers insights that stakeholders can use to efficiently use raw materials to output quality products. Manufacturers can perform analytics on big data to generate ideas that can be used to increase market share, enhance safety, minimize wastage, and solve other challenges faster.

In the retail sector, companies rely heavily on customer loyalty to maintain market share in a highly competitive market. In this case, managing big data can help retailers to understand the best methods to utilize in marketing their products to existing and potential consumers, and also to sustain relationships.

Challenges Handling Big Data

With the introduction of Big Data, the challenge of consolidating and creating value from data assets becomes magnified. Today, organizations are expected to handle increased data velocity, variety, and volume, and dealing with both traditional enterprise data and Big Data is now a business necessity. Traditional relational databases are suitable for storing, processing, and managing structured, low-latency data, but the increased volume, variety, and velocity of Big Data make it difficult for legacy database systems to handle efficiently.

Failing to act on this challenge means that enterprises cannot tap the opportunities presented by data generated from diverse sources, such as machine sensors, weblogs, social media, and so on. Conversely, organizations that explore Big Data capabilities despite its challenges will remain competitive. Businesses must integrate diverse systems with Big Data platforms in a meaningful manner, as the heterogeneity of data environments continues to increase.

Virtualization

Virtualization involves turning physical computing resources, such as databases and servers, into multiple virtual systems. The concept consists of simulating the function of an IT resource in software so that it behaves identically to the corresponding physical object. Virtualization uses abstraction to make software appear and operate like hardware, providing benefits ranging from flexibility and scalability to performance and reliability.

Typically, virtualization is made possible using virtual machines (VMs), implemented on microprocessors with the necessary hardware support and OS-level facilities to enhance computational productivity. VMs offer additional convenience, security, and integrity with little resource overhead.

Benefits of Virtualization

Achieving wide-scale functional virtualization is now economical with available technologies, and reliability can be improved by employing virtualization offered by cloud service providers on a fully redundant, standby basis. Traditionally, organizations would deploy several servers operating at a fraction of their capacity to meet increased processing and storage demands, which resulted in increased operating costs and inefficiencies. With the introduction of virtualization, software can be used to simulate the functionality of hardware, and businesses can largely eliminate the possibility of system failures. At the same time, the technology significantly reduces the capital-expense component of IT budgets; in the future, more resources will be spent on operating expenses than on acquisition, with company funds channeled to service providers instead of purchasing expensive equipment and hiring local personnel.

Overall, virtualization enables IT functions across business divisions and industries to be performed more efficiently, flexibly, inexpensively, and productively. The technology largely eliminates the need for expensive traditional implementations.

Apart from reducing capital and operating costs for organizations, virtualization minimizes and eliminates downtime. It also increases IT productivity, responsiveness, and agility. The technology provides faster provisioning of resources and applications. In case of incidents, virtualization allows fast disaster recovery that maintains business continuity.

Types of Virtualization

There are various types of virtualization, such as server, network, and desktop virtualization.

In server virtualization, more than one operating system runs on a single physical server to increase IT efficiency, reduce costs, achieve timely workload deployment, improve availability and enhance performance.

Network virtualization involves reproducing a physical network to allow applications to run on a virtual system. This type of virtualization provides operational benefits and hardware independence.

In desktop virtualization, desktops and applications are virtualized and delivered to different divisions and branches in a company. Desktop virtualization supports outsourced, offshore, and mobile workers, who can access simulated desktops on tablets and other mobile devices.

Characteristics of Virtualization

Some of the features of virtualization that support the efficiency and performance of the technology include:

Partitioning: In virtualization, several applications, database systems, and operating systems are supported by a single physical system since the technology allows partitioning of limited IT resources.

Isolation: Virtual machines are isolated from the physical systems hosting them. In effect, if a single virtual instance breaks down, the other virtual machines, as well as the host hardware components, are not affected.

Encapsulation: A virtual machine can be presented as a single file, abstracting away its internal details. This makes it possible for users to identify a VM by the role it plays.

Data Virtualization – A Solution for Big Data Challenges

Virtualization can be viewed as a strategy that helps derive information value when needed. The technology adds a level of efficiency that makes Big Data applications a reality. To enjoy the benefits of Big Data, organizations need to abstract data away from its disparate physical stores. In other words, virtualization can be deployed to provide the partitioning, encapsulation, and isolation that abstract the complexities of Big Data stores, making it easy to integrate data from multiple stores with other data from systems used across the enterprise.

Virtualization enables ease of access to Big Data. The two technologies can be combined and configured in software. As a result, the approach makes it possible to present an extensive collection of disassociated structured and unstructured data, ranging from application and weblogs, operating system configuration, network flows, and security events to storage metrics.

Virtualization improves storage and analysis capabilities for Big Data. As mentioned earlier, traditional relational databases are incapable of addressing the growing needs inherent to Big Data. Today, there is an increase in special-purpose applications for processing varied and unstructured Big Data. These tools can be used to extract value from Big Data efficiently while minimizing unnecessary data replication. Virtualization tools also make it possible for enterprises to access numerous data sources by integrating them with legacy relational data centers, data warehouses, and other files that can be used in business intelligence. Ultimately, companies can deploy virtualization to achieve a reliable way to handle the complexity, volume, and heterogeneity of information collected from diverse sources. Integrated solutions will also meet other business needs for near-real-time information processing and agility.

In conclusion, it is evident that the value of Big Data comes from processing information gathered from diverse sources in an enterprise. Virtualizing Big Data offers numerous benefits that cannot be realized using physical infrastructure and traditional database systems, and it simplifies Big Data infrastructure in ways that reduce operational costs and time to results. Soon, Big Data use cases will shift from theoretical possibilities to multiple use patterns that feature powerful analytics and affordable archival of vast datasets. Virtualization will be crucial in exploiting Big Data presented as abstracted data services.

 

Netezza / PureData – How to replace or trim CHR(0) NULL characters in a field

PureData Powered by Netezza

Occasionally, one runs into the problem of hidden field values breaking join criteria.  I have had to clean up bad archive and conversion data with hidden characters several times over the last couple of weeks, so, I thought I might as well capture this note for future use.

I tried the Replace command, which is the prevalent Netezza answer to this issue on the web, but my client’s version does not support that command.  So, I needed to use the Translate command instead to accomplish it.  It took a couple of searches of the usual bad actors to find the character causing the issue, which on this day was chr(0).  Here is a quick mockup of the command I used to solve this issue.

Example Select Statement

Here is a quick example select SQL to identify problem rows.

SELECT TRANSLATE(F.BLOGTYPE_CODE, CHR(0), '') AS BLOGTYPE_CODE,
BT.BLOG_TYP_ID,
LENGTH(BT.BLOG_TYP_ID) AS LNGTH_BT,
LENGTH(F.BLOGTYPE_CODE) AS LNGTH_BLOGTYPE
FROM BLOGS_TBL F, BLOG_TYPES BT
WHERE TRANSLATE(F.BLOGTYPE_CODE, CHR(0), '') = BT.BLOG_TYP_ID
AND LENGTH(BT.BLOG_TYP_ID) <> LENGTH(F.BLOGTYPE_CODE);

 

Example Update Statement

Here is a quick shell update statement to remove the Char(0) characters from the problem field.

UPDATE <<Your Table Name>> A
SET A.<<Your Field Name>> = TRANSLATE(A.<<Your Field Name>>, CHR(0), '')
WHERE LENGTH(A.<<Your Field Name>>) <> LENGTH(TRANSLATE(A.<<Your Field Name>>, CHR(0), ''))
AND <<Additional criteria>>;
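
If CHR(0) turns out not to be the culprit, the same length-comparison trick can probe for other common hidden characters. Here is a hypothetical diagnostic sketch against the mockup table above; the table and field names are illustrative:

-- Probe for common hidden characters by comparing the field length
-- before and after stripping each candidate character.
SELECT COUNT(*) AS TOTAL_ROWS,
SUM(CASE WHEN LENGTH(TRANSLATE(F.BLOGTYPE_CODE, CHR(0), '')) <> LENGTH(F.BLOGTYPE_CODE) THEN 1 ELSE 0 END) AS HAS_CHR0,
SUM(CASE WHEN LENGTH(TRANSLATE(F.BLOGTYPE_CODE, CHR(9), '')) <> LENGTH(F.BLOGTYPE_CODE) THEN 1 ELSE 0 END) AS HAS_TAB,
SUM(CASE WHEN LENGTH(TRANSLATE(F.BLOGTYPE_CODE, CHR(13), '')) <> LENGTH(F.BLOGTYPE_CODE) THEN 1 ELSE 0 END) AS HAS_CR
FROM BLOGS_TBL F;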

 

 

 

Data Warehousing vs. Data Virtualization

Information Management

Today, businesses heavily depend on data to gain insights into their processes and operations and to develop new ways to increase market share and profits. In most cases, the data required to generate those insights is sourced from, and located in, diverse places, which requires a reliable access mechanism. Currently, data warehousing and data virtualization are the two principal techniques used to store and access the sources of critical data in a company. Each approach offers distinct capabilities and suits particular use cases, as described in this article.

Data Warehousing

A data warehouse is designed and developed to securely host historical data from different sources. In effect, this technique protects data sources from the performance degradation caused by sophisticated analytics and heavy demand for reports. Today, various tools and platforms have been developed for data warehouse automation in companies. They can be deployed to speed up development and to automate testing, maintenance, and the other steps involved in data warehousing. In a data warehouse, data is stored as a series of snapshots, where each record represents data at a particular time. In effect, companies can analyze data warehouse snapshots to compare data between different periods. The results are converted into insights required to make crucial business decisions.
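
For example, because each record represents data as of a point in time, a period-over-period comparison reduces to straightforward SQL. The sketch below is a hypothetical illustration only; the SALES_SNAPSHOT table, its columns, and the date arithmetic are invented for the example and would vary by platform:

-- Compare totals between two monthly snapshots of a
-- hypothetical SALES_SNAPSHOT table.
SELECT CURR.PRODUCT_ID,
CURR.TOTAL_SALES AS CURRENT_MONTH,
PREV.TOTAL_SALES AS PRIOR_MONTH,
CURR.TOTAL_SALES - PREV.TOTAL_SALES AS MONTH_OVER_MONTH_CHANGE
FROM SALES_SNAPSHOT CURR
JOIN SALES_SNAPSHOT PREV
ON PREV.PRODUCT_ID = CURR.PRODUCT_ID
AND PREV.SNAPSHOT_DATE = ADD_MONTHS(CURR.SNAPSHOT_DATE, -1)
WHERE CURR.SNAPSHOT_DATE = DATE '2018-06-30';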

Moreover, a data warehouse is optimized for functions such as data retrieval. The technology duplicates data to allow database de-normalization, which enhances query performance. The solution is often extended into an enterprise data warehouse (EDW) used to service the entire organization.

Data Warehouse Information Architecture

Features of a Data Warehouse

A data warehouse is subject-oriented, and it is designed to help entities analyze data. For instance, a company can build a data warehouse focused on sales to learn more about sales data. Analytics on this warehouse can surface insights such as who the best customer was for the period. The data warehouse is subject-oriented because it is defined around a subject matter.

A data warehouse is integrated. Data from various sources is first put into a consistent format. The process requires the firm to resolve challenges such as naming conflicts and inconsistencies in units of measure.

A data warehouse is nonvolatile. In effect, data entered into the warehouse should not change after it is stored. This feature increases accuracy and integrity in data warehousing.

A data warehouse is time-variant since it focuses on data changes over time. Data warehousing discovers trends in business by using large amounts of historical data; in effect, a typical operation in a data warehouse scans millions of rows to return an output.

A data warehouse is designed and developed to handle ad hoc queries. In most cases, organizations cannot predict the workload of a data warehouse, so it is advisable to optimize the data warehouse to perform well over any possible query operation.

A data warehouse is regularly updated by the ETL process using bulk data modification techniques; therefore, end users cannot directly update the data warehouse.

Advantages of Data Warehousing

The primary motivation for developing a data warehouse is to provide the timely information required for decision making in an organization. A business intelligence data warehouse serves as the initial checkpoint for crucial business data. When a company stores its data in a data warehouse, tracking it becomes natural, and users can perform quick searches to retrieve and analyze static data.

Another driver for companies investing in data warehouses involves integrating data from disparate sources. This capability adds value to operational applications like customer relationship management systems. A well-integrated warehouse allows the solution to translate information to a more usable and straightforward format, making it easy for users to understand the business data.

The technology also allows organizations to perform a series of analysis on data.

A data warehouse reduces the cost to access historical data in an organization.

Data warehousing provides standardization of data across an organization. Moreover, it helps identify and eliminate errors: before data is loaded, the solution exposes inconsistencies to users so they can be corrected.

A data warehouse also improves the turnaround time for analysis and report generation.

The technology makes it easy for users to access and share data. A user can conduct a quick search on a data warehouse to find and analyze static data without wasting time.

Data warehousing removes informational processing load from transaction-oriented databases.

Disadvantages of Data Warehousing

While data warehousing technology is undoubtedly beneficial to many organizations, not all data warehouses are relevant to a business. In some cases, a data warehouse can be expensive to scale and maintain.

Preparing a data warehouse is time-consuming since it requires users to input raw data, which has to be achieved manually.

A data warehouse is not a perfect choice for handling unstructured and complex raw data. Moreover, it faces compatibility difficulties: depending on the data sources, companies may require a business intelligence team to ensure compatibility for data coming from sources running distinct operating systems and programs.

The technology carries an ongoing maintenance cost to continue working correctly, and keeping the solution updated with the latest features can be costly. Regularly maintaining a data warehouse requires a business to spend well beyond the initial investment.

A data warehouse's use can also be limited by information privacy and confidentiality issues. In most cases, businesses collect and store sensitive data belonging to their clients; viewing it is restricted to particular employees, which limits the benefits offered by a data warehouse.

Data Warehousing Use Case

There are a series of ways organizations use data warehouses. Businesses can optimize the technology for performance by identifying the type of data warehouse they have.

  1. A data warehouse can be used by an organization that is struggling to report efficiently on business operations and activities. The solution makes it possible to access the required data.
  2. A data warehouse is necessary for an organization where data is copied separately by different divisions for analysis in spreadsheets that are not consistent with one another.
  3. Data warehousing is crucial in organizations where uncertainties about data accuracy are causing executives to question the veracity of reports.
  4. A data warehouse is crucial for business intelligence acceleration. The technology delivers rapid data insights to analysts at different scales, concurrency, and without requiring manual tuning or optimization of a database.

Data Virtualization Information Architecture

Data Virtualization

Data virtualization technology does not require the transfer or storage of data. Instead, users employ a combination of application programming interfaces (APIs) and metadata (data about data) to interface with data in different sources, gaining access to the original data sources through combined queries. In other words, data virtualization offers a simplified, integrated view of business data in real time, as requested by business users, applications, and analytics. In effect, the technology makes it possible to integrate data from distinct sources, formats, and locations, without replication. It creates a unified virtual data layer that delivers data services to support users and various business applications.

Data virtualization performs many of the same functions as traditional data integration, that is, extract, transform, and load (ETL), data replication, and federation. It leverages modern technology to deliver real-time data integration with agility, low cost, and high speed. In effect, data virtualization eliminates much traditional data integration work and reduces the need for replicated data warehouses and data marts in many cases.

Capabilities and Benefits of Data Virtualization

There are various benefits of implementing data virtualization in an organization.

Firstly, data virtualization allows access to, and leverage of, all the information that helps a firm achieve a competitive advantage. The solution offers a unified virtual layer that abstracts the underlying source complexity and presents disparate data sources as a single source.

Data virtualization is cheaper since it does not require actual hardware devices to be installed. In other words, organizations no longer need to purchase and dedicate a lot of IT resources and additional monetary investment to create on-site resources, similar to the one used in a data warehouse.

Data virtualization allows speedy deployment of resources. In this solution, resource provisioning is fast and straightforward. Organizations are not required to set up physical machines or to create local networks or install other IT components. Users have a single point of access to a virtual environment that can be distributed to the entire company.

Data virtualization is an energy-efficient system since the solution does not require additional local hardware and software. Therefore, an organization will not be required to install cooling systems.

Disadvantages of Data Virtualization

Data virtualization creates a security risk. In the modern world, information is a cheap way to make money, so company data is frequently targeted by hackers. Implementing data virtualization over disparate sources may give malicious users an opportunity to steal critical information and use it for monetary gain.

Data virtualization requires a series of channels or links that must work in cohesion to perform the intended task. In this case, all data sources must be available for virtualization to work effectively.

Data Virtualization Use Cases

  • Companies that rely on business intelligence require data virtualization for rapid prototyping to meet immediate business needs. Data virtualization can create a real-time reporting solution that unifies access to multiple internal databases.
  • Provisioning data services for single-view applications, such as customer service and call center applications, also requires data virtualization.

 

End Of Support For IBM InfoSphere 9.1.0

IBM Information Server (IIS)

End of Support for IBM InfoSphere Information Server 9.1.0

IBM InfoSphere Information Server 9.1.0 will reach End of Support on 2018-09-30.  If you are still on the InfoSphere Information Server (IIS) 9.1.0, I hope you have a plan to migrate to an 11-series version soon.  InfoSphere Information Server (IIS) 11.7 would be worth considering if you don’t already own an 11-series license. InfoSphere Information Server (IIS) 11.7 will allow you to take advantage of the evolving thin client tools and other capabilities in the 2018 release pipeline without needing to perform another upgrade.

Related References

IBM Support, End of support notification: InfoSphere Information Server 9.1.0

IBM Support, Software lifecycle, InfoSphere Information Server 9.1.0

IBM Knowledge Center, Home, InfoSphere Information Server 11.7.0, IBM InfoSphere Information Server Version 11.7.0 documentation

Essbase Connector Error – Client Commands are Currently Not Being Accepted

DataStage Essbase Connector

While investigating a recent InfoSphere Information Server (IIS) DataStage Essbase Connector error, I found the explanations of the probable causes of the error not to be terribly meaningful.  So, now that I have run our error to ground, I thought it might be nice to jot down a quick note on the potential causes of the ‘Client Commands are Currently Not Being Accepted’ error, which I gleaned from the process.

Error Message Id

  • IIS-CONN-ESSBASE-01010

Error Message

An error occurred while processing the request on the server. The error information is 1051544 (message on contacting or from application:[<<DateTimeStamp>>]Local////3544/Error(1013204) Client Commands are Currently Not Being Accepted.

Possible Causes of The Error

This error indicates a problem with access to the Essbase object, or to the security within the Essbase object.  It can be the result of multiple issues, such as:

  • Object doesn’t exist – the Essbase object didn’t exist in the location specified,
  • Communications – the location is unavailable or cannot be reached,
  • Path Security – security prevents access to the Essbase object location,
  • Essbase Security – security within the Essbase object does not support the user or filter being submitted; the Essbase object security may also be corrupted or incomplete,
  • Essbase Object Structure – the Essbase object was not properly structured to support the filter, or the Essbase filter is malformed for the current structure.

Related References

IBM Knowledge Center, InfoSphere Information Server 11.7.0, Connecting to data sources, Enterprise applications, IBM InfoSphere Information Server Pack for Hyperion Essbase


 

DataStage – How to Pass the Invocation ID from one Sequence to another

DataStage Invocation ID Passing Pattern Overview

When you are controlling a chain of sequences in the job stream and taking advantage of reusable (multiple-instance) jobs, it is useful to be able to pass the Invocation ID from the master controlling sequence and have it passed down and assigned to the job run.  This can easily be done without needing to manually enter the values in each of the sequences, by leveraging the DSJobInvocationId variable (a command-line illustration follows the list below).  For this to work:

  • The job must have ‘Allow Multiple Instance’ enabled
  • The parent sequence must supply the Invocation ID; its Job Activity must have the Invocation Name entered
  • The receiving child sequence must have the invocation variable entered
  • At runtime, a DataStage invocation ID instance of the multi-instance job is generated with its own logs
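
For reference, a specific invocation of a multi-instance job can also be launched from the dsjob command line, where the instance is addressed as jobname.invocationid. This is a hypothetical example; the project, job, and invocation names are placeholders:

# Run one invocation of a multi-instance job and wait for its status;
# MyProject, LoadCustomers, and DAILY_US are placeholder names.
dsjob -run -jobstatus MyProject LoadCustomers.DAILY_US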

Variable Name

  • DSJobInvocationId

Note

This approach allows for the reuse of jobs and the assignment of meaningful instance extension names, which are managed from a single point of entry in the object tree.

Related References: 

IBM Knowledge Center > InfoSphere Information Server 11.5.0

InfoSphere DataStage and QualityStage > Designing DataStage and QualityStage jobs > Building sequence jobs > Sequence job activities > Job Activity properties

Netezza / PureData – How To Quote a Single Quote in Netezza SQL

How To Quote a Single Quote in Netezza SQL?

The short answer is to use four single quotes (''''), which will result in a single quote within the select statement results.

How to Assemble the SQL to Quote a Single Quote in a SQL Select Statement

Knowing how to construct a list to embed in a SQL where clause ‘in’ list, or to add to an ETL job, can be a serious time saver, eliminating the need to manually edit large lists.  In the example below, I used the select result set to create a rather long list of values which needed to be included in an ETL where clause.  By:

  • Adding the comma delimiter (',') and a concatenate (||) on the front
  • Followed by adding a quoted single quote (entered as four single quotes ('''')) and a concatenate (||)
  • The field I wish to have delimited and quoted (S1.ORDER_NUM)
  • And closing with a quoted single quote (entered as four single quotes (''''))

This results in a delimited and quoted list (,'116490856') which needs only to have the parentheses added and the first comma removed, which is much less work than manually editing the 200 items that resulted from this select.

Example SQL:

SELECT DISTINCT
','||''''|| S1.ORDER_NUM||'''' AS QUOTED_ORDER_NUMBER
FROM Sales S1;
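
Once the first comma is trimmed off and parentheses are added, the generated list drops straight into a where clause. A hypothetical fragment, with invented order numbers:

-- Illustrative use of the generated, quoted list in an 'in' list.
SELECT *
FROM Sales S1
WHERE S1.ORDER_NUM IN ('116490856', '116490857', '116490858');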

 

How to Quote A Single Quote Example SQL

Related Reference

DataStage – How to use single quoted parameter list in an Oracle Connector

Data Integration

While working with a client’s 9.1 DataStage version, I ran into a situation where they wanted to parameterize SQL where-clause lists in an Oracle Connector stage, which honestly was not very straightforward to figure out.  First, if APT_OSL_PARAM_ESC_SQUOTE is not set and single quotes are used in the parameter, the job creates unquoted, invalid SQL when the parameter is populated.  Second, I found much of the information confusing and/or incomplete in its explanation.   After some research and some trial and error, here is how I resolved the issue.  I’ll endeavor to be concise, but holistic, in my explanation.

When this Variable applies

Here is where I know this process applies; there may be other circumstances to which it is applicable, but I’m listing the ones with which I have recent experience.

Infosphere Information Server Datastage

  • Versions 9.1, 11.3, and 11.5

Oracle RDBMS

  • Versions 11g and 12c

Configuration process

Here is a brief explanation of the steps I used to implement the where clause as a parameter.  Please note that in this example I am using a job parameter to populate only a portion of the where clause; you can certainly pass the entire where clause as a parameter, if it is not too long.

Configure Project Variable in Administrator

  • Add APT_OSL_PARAM_ESC_SQUOTE to the project in Administrator
  • Populate the APT_OSL_PARAM_ESC_SQUOTE variable

APT_OSL_PARAM_ESC_SQUOTE Project Variable

Create job parameter

Following your project naming convention or standard practice (if your customer and/or project does not have established naming conventions, adopt a sensible one), create the job parameter in the job. See the jp_ItemSource parameter in the image below.

Job Parameter In Oracle Connector

Add job parameter to Custom SQL in Select Oracle Connector Stage

Once the job parameter has been created, add it to the SQL statement of the job, roughly as in the sketch below.

Job Parameter In SQL
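
For illustration, the custom SQL might reference the parameter along these lines. The table and column names are hypothetical, and at runtime #jp_ItemSource# resolves to the single-quoted list (e.g., 'A','B','C') supplied to the parameter:

-- Hypothetical custom SQL in the Oracle Connector stage;
-- #jp_ItemSource# is replaced at runtime with the quoted list.
SELECT ITEM_ID,
ITEM_NAME,
ITEM_SOURCE
FROM ITEM_MASTER
WHERE ITEM_SOURCE IN (#jp_ItemSource#)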

Related References

IBM Knowledge Center > InfoSphere Information Server 11.5.0

Connecting to data sources > Databases > Oracle databases > Oracle connector

IBM Support > Limitation of the Parameter APT_OSL_PARAM_ESC_SQUOTE on Plugins on Parallel Canvas

IBM Knowledge Center > InfoSphere Information Server 11.5.0

InfoSphere DataStage and QualityStage > Reference > Parallel Job Reference > Environment Variables > Miscellaneous > APT_OSL_PARAM_ESC_SQUOTE

 

OLTP vs Data Warehousing

Database

OLTP Versus Data Warehousing

I’ve tried to explain the difference between OLTP systems and a Data Warehouse to my managers many times, as I’ve worked at a hospital as a Data Warehouse Manager/data analyst for many years. Why was the list that came from the operational applications different from the one that came from the Data Warehouse? Why couldn’t I just get a list of the patients lying in the hospital right now from the Data Warehouse? So I explained, and explained again, and explained to another manager, and another. You get the picture.
In this article I will explain this very same thing to you, so you know how to explain it to your manager. Or, if you are a manager, you might understand what your data analyst can and cannot give you.

OLTP

OLTP stands for OnLine Transaction Processing. In other words: getting your data directly from the operational systems to make reports. An operational system is a system that is used for the day-to-day processes.
For example: when a patient checks in, his or her information gets entered into a Patient Information System. The doctor puts scheduled tests, a diagnosis, and a treatment plan in there as well. Doctors, nurses, and other people working with patients use this system on a daily basis to enter and get detailed information on their patients.
The data within operational systems is stored in a way that lets it be used efficiently by the people working directly on the product, or with the patient in this case.

Data Warehousing

A Data Warehouse is a big database that fills itself with data from operational systems. It is used solely for reporting and analytical purposes. No one uses this data for day-to-day operations. The beauty of a Data Warehouse is, among other things, that you can combine the data from the different operational systems. You can actually combine the number of patients in a department with the number of nurses, for example. You can see how far a doctor is behind schedule and find the cause of that by looking at the patients. Does he run late with elderly patients? Is there a particular diagnosis that takes more time? Or does he just oversleep a lot? You can use this information to look at the past and see trends, so you can plan for the future.

The difference between OLTP and Data Warehousing

This is how a Data Warehouse works:

How a Data Warehouse works

The data gets entered into the operational systems. The ETL processes then Extract this data from these systems, Transform the data so it will fit neatly into the Data Warehouse, and then Load it into the Data Warehouse. After that, reports are built with a reporting tool from the data that lies in the Data Warehouse.

This is how OLTP works:

How OLTP works

Reports are made directly from the data inside the database of the operational systems. Some operational systems come with their own reporting tool, but you can always use a standalone reporting tool to make reports from the operational databases.

Pros and Cons

Data Warehousing

Pros:

  • There is no strain on the operational systems during business hours
    • As you can schedule the ETL processes to run during the hours when the fewest people are using the operational system, you won’t disturb the operational processes. And when you need to run a large query, the operational systems won’t be affected, as you are working directly on the Data Warehouse database.
  • Data from different systems can be combined
    • It is possible to combine finance and productivity data for example. As the ETL process transforms the data so it can be combined.
  • Data is optimized for making queries and reports
    • You use different data in reports than you use on a day-to-day basis. A Data Warehouse is built for this. For instance: most Data Warehouses have a separate date table where the weekday, day, month, and year are saved. You can make a query to derive the weekday from a date, but that takes processing time. By using a separate table like this, you’ll save time and decrease the strain on the database.
  • Data is saved longer than in the source systems
    • The source systems need to have their old records deleted when they are no longer used in day-to-day operations, to maintain performance, so the Data Warehouse becomes the keeper of that history.

Cons:

  • You always look at the past
    • A Data Warehouse is updated once a night, or even just once a week. That means you never have the latest data. Staying with the hospital example: you never know how many patients are in the hospital right now, or which surgeon didn’t show up on time this morning.
  • You don’t have all the data
    • A Data Warehouse is built for discovering trends, showing the big picture. The little details, the ones not used in trends, get discarded during the ETL process.
  • Data isn’t the same as the data in the source systems
    • Because the data is older than that of the source systems, it will always be a little different. But also because of the Transformation step in the ETL process, the data will differ slightly. It doesn’t mean one or the other is wrong; it’s just a different way of looking at the data. For example: the Data Warehouse at the hospital excluded all transactions that were marked as cancelled. If you try to get the same reports from both systems and don’t exclude the cancelled transactions in the source system, you’ll get different results.

Online Transaction Processing (OLTP)

Pros:

  • You get real time data
    • If someone is entering a new record now, you’ll see it right away in your report. No delays.
  • You’ve got all the details
    • You have access to all the details that the employees have entered into the system. No grouping, no skipping records, just all the raw data that’s available.

Cons:

  • You are putting strain on an application during business hours.
    • When you are making a large query, you can take processing space that would otherwise be available to the people that need to work with this system for their day to day operations. And if you make an error, by for instance forgetting to put a date filter on your query, you could even bring the system down so no one can use it anymore.
  • You can’t compare the data with data from other sources.
    • Even when the systems are similar, like an HR system and a payroll system that work together, data is always going to be different, because it is granulated at a different level or because not all data is relevant to both systems.
  • You don’t have access to old data
    • To keep the applications at peak performance, old data that’s irrelevant to day-to-day operations is deleted.
  • Data is optimized to suit day to day operations
    • And not for report making. This means you’ll have to get creative with your queries to get the data you need.

So what method should you use?

That all depends on what you need at that moment. If you need detailed information about things that are happening now, you should use OLTP.
If you are looking for trends, or insights at a higher level, you should use a Data Warehouse.


 

 

Oracle – How to get a list of user permission grants

IBM Infosphere Information Server (IIS)

Since the InfoSphere Information Server repository has to be installed manually with the scripts provided in the IBM software, you sometimes run into difficulties. So, here’s a quick script which I have found useful in the past for identifying user permissions for the IAUSER on Oracle databases, to help run down discrepancies in user permissions.

 

SELECT *
FROM ALL_TAB_PRIVS
WHERE GRANTEE = 'iauser';

 

If we cannot run against the ALL_TAB_PRIVS view, then we can try the USER_TAB_PRIVS view:

 

SELECT *
FROM USER_TAB_PRIVS
WHERE GRANTEE = 'iauser';

 

Related References

Oracle Help Center > Database Reference > ALL_TAB_PRIVS view

What is a Common Data Model (CDM)?

Data Model

 

What is a Common Data Model (CDM)?

 

A Common Data Model (CDM) is a shared data structure designed to provide well-formed and standardized data structures within an industry (e.g., medical, insurance) or business channel (e.g., human resource management, asset management), which can be applied to give organizations a consistent, unified view of business information.   These common models can be leveraged as accelerators by organizations to form the foundation for their information, including SOA interchanges, mashups, data virtualization, an Enterprise Data Model (EDM), business intelligence (BI), and/or to standardize their data models to improve metadata management and data integration practices.

Related references

IBM, IBM Analytics

IBM Analytics, Technology, Database Management, Data Warehousing, Industry Models

github.com

Observational Health Data Sciences and Informatics (OHDSI)/Common Data Model

Oracle

Oracle Technology Network, Database, More Key Features, Utilities Data Model

Oracle

Industries, Communications, Service Providers, Products, Data Model, Oracle Communications Data Model

Oracle

Oracle Technology Network, Database, More Key Features, Airline Data Model

 

What is Information Management?

Information Management (IM)

Information Management Definition

Information Management (IM) tends to vary based on your business perspective, but broadly it is all the systems, processes, and practices (business and technical) within an organization for the creation, use, and disposal of business information to support business operations.

Information Management (IM) Activities

Information Management activities may include, but are not limited to:

  • Information creation, capture, storage, and disposal
  • The governance of information, practices, meaning and usage
  • Information protection, regulatory compliance, privacy, and limiting legal liability
  • Technological infrastructure, such as, architecture, strategies and delivery enablement


 

Netezza / PureData – How to convert a timestamp to date in SQL

Netezza Convert Timestamp to Date

 

Since a timestamp contains the date, the date function can easily convert a timestamp to a date in Netezza SQL.

Date Function Arguments, per IBM Documentation

  • First argument data type: date
  • Second argument data type: numeric(8,0)
  • Returns: date

Example Timestamp to Date Conversion SQL

In this example, CREATEDDATE is stored as a timestamp, despite the field name.

SELECT
DATE(CREATEDDATE)
, COUNT(*) AS CNT
FROM BLOG.CASES_FS
GROUP BY
DATE(CREATEDDATE)
ORDER BY
DATE(CREATEDDATE);
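
As an aside, an explicit cast accomplishes the same conversion. To the best of my knowledge, both forms below behave equivalently on PureData / Netezza, though it is worth verifying on your version:

-- Equivalent cast-based conversions of a timestamp to a date.
SELECT CREATEDDATE::DATE AS CREATED_DATE_SHORTHAND
, CAST(CREATEDDATE AS DATE) AS CREATED_DATE_ANSI
FROM BLOG.CASES_FS
LIMIT 10;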

Related References

Date/time functions

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Netezza SQL extensions, Date/time functions

Netezza date/time data type representations

PureData System for Analytics, PureData System for Analytics 7.2.0, IBM Netezza User-Defined Functions, Data type helper API reference, Temporal data type helper functions, Netezza date/time data type representations

SFDC – Using a date literal in a where clause

Salesforce Connector

I found working with date literals in InfoSphere SFDC Connector SOQL to be counterintuitive, at least relative to how I normally use SQL.  I spent a little time running trials in Workbench before I finally locked onto the ‘where clause’ criteria data pattern.  So, here is a quick example.

SOQL DATE String Literals Where Clause Rules

Basically, the date pattern is straightforward. The basic rules for a SOQL where clause are:

  • No quotes
  • No functions
  • No casting; the date literal is written directly into the SOQL where clause

Example SOQL DATE String Literals

So, here are a couple of date string literal examples in SOQL:

  • 1901-01-01
  • 2016-01-31
  • 9999-10-31

Example SQL with Date String Literal Where Clause

 

Select
t.id,
t.Name,
t.Target_Date__c,
t.User_Active__c
From Target_and_Segmentation__c t
where t.Target_Date__c > 2014-10-31
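
Datetime fields follow a similar quote-free pattern, but use the UTC datetime format. Here is a hypothetical example; the Last_Updated__c field is invented for illustration:

Select
t.id
From Target_and_Segmentation__c t
where t.Last_Updated__c > 2014-10-31T00:00:00Z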

 

Related References

Salesforce Developer Documentation

Home, Developer Documentation, Force.com SOQL and SOSL Reference

https://developer.salesforce.com/docs/atlas.en-us.soql_sosl.meta/soql_sosl/sforce_api_calls_soql_select_dateformats.htm

Salesforce Workbench

Home, Technical Library, Workbench

 

InfoSphere / DataStage – What are the supported Connector stages for dashDB?

dashDB

In a recent discussion, the question came up concerning which InfoSphere DataStage connectors and/or stages are supported by IBM for dashDB.  So, it seems appropriate to share the insight gained from having the question answered.

Which DataStage Connectors and/or Stages are Supported for dashDB

You have three connector choices; which one best meets your needs depends on the nature of your environment and the configuration choices that have been applied (a JDBC connection example follows the list):

  1. The DB2 Connector Stage
  2. The JDBC Connector stage
  3. The ODBC Stage
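
If you use the JDBC Connector stage, the connection URL follows the standard IBM DB2 JDBC driver pattern, since dashDB is DB2-based. Here is a hypothetical example, where the host and port are placeholders to adapt and BLUDB is the usual default dashDB database name:

jdbc:db2://<your-dashdb-host>:50001/BLUDB:sslConnection=true;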

Related References

Connecting to IBM dashDB

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Information Server on Cloud offerings, Connecting to other systems, Connecting to IBM dashDB

DB2 connector

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, IBM DB2 databases, DB2 connector

ODBC stage

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Older stages for connectivity, ODBC stage

JDBC data sources

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Multiple data sources, JDBC data sources

Netezza / PureData – How to get the day-of-week number in SQL?

Netezza / PureData – Numeric Day of Week

I had reason today to get the day-of-week number in PureData / Netezza, which I don’t seem to have discussed in previous posts.  So, here is a simple script to get the number for the day of the week, in a couple of flavors, which may prove useful.

Basic Format

select extract(dow from <<FieldName>>) from <<SchemaName>>.<<tableName>>

Example SQL

 

SELECT
CURRENT_DATE
, TO_CHAR(CURRENT_DATE,'DAY') AS DAY_OF_WEEK
--WEEK STARTS ON MONDAY
, EXTRACT(DOW FROM CURRENT_DATE)-1 AS DAY_OF_WEEK_NUMBER_STARTS_ON_MONDAY
--WEEK STARTS ON SUNDAY
, EXTRACT(DOW FROM CURRENT_DATE) AS DAY_OF_WEEK_NUMBER_STARTS_ON_SUNDAY
--WEEK STARTS ON SATURDAY
, EXTRACT(DOW FROM CURRENT_DATE)+1 AS DAY_OF_WEEK_NUMBER_STARTS_ON_SATURDAY
FROM _V_DUAL;

 

Related References

Extract date and time values

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Functions and operators, Functions, Extract date and time values

Date/time functions

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Netezza SQL extensions, Date/time functions

Template patterns for date/time conversions

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Netezza SQL extensions, Conversion functions, Template patterns for date/time conversions

https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_ntz_sql_extns_templ_patterns_date_time_conv.html

Database – What is a Primary Key?

Database Table

What is a Primary Key?

What a primary key is depends, somewhat, on the database.  However, in its simplest form a primary key:

  • Is a field (column) or combination of fields (columns) which uniquely identifies every row
  • Is an index in database systems which use indexes for optimization
  • Is a type of table constraint
  • Is applied with a data definition language (DDL) create or alter command (see the sketch after this list)
  • And, depending on the data model, can define a parent-child relationship between tables
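
As a minimal illustration of the constraint in DDL form, here is a generic ANSI-style sketch; the table and column names are invented, and the exact syntax varies by database:

-- Primary key declared inline at table creation.
CREATE TABLE CUSTOMER (
CUSTOMER_ID INTEGER NOT NULL,
CUSTOMER_NAME VARCHAR(100),
CONSTRAINT PK_CUSTOMER PRIMARY KEY (CUSTOMER_ID)
);

-- Or applied to an existing table with a DDL alter command.
ALTER TABLE CUSTOMER
ADD CONSTRAINT PK_CUSTOMER PRIMARY KEY (CUSTOMER_ID);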


What is the convert function in Datastage?

Algorithm

 

What is the convert function in Datastage?

In its simplest form, the convert function in InfoSphere DataStage is a string replacement operation.  Convert can be used to replace a specific character, a list of characters, or a Unicode character (e.g., the Thumbs Up Sign or Grinning Face emoji).

Convert Syntax

convert('<<Value to be replaced>>', '<<Replacement value>>', <<Input field>>)

Using the Convert Function to remove a list of Characters

The following examples handle/convert special characters in a transformer stage which can cause issues in XML processing and certain databases.

Convert a list of General Characters

Convert(";:?\+&,*`#'$()|^~@{}[]%!", "", TrimLeadingTrailing(Lnk_In.Description))

Convert Decimal and Double Quotes

Convert('".', '', Lnk_In.Description)

Convert Char(0)

This example replaces Char(0) with nothing, essentially removing it as padding and/or space.

convert(Char(0), '', Lnk_In.Description)

 

Related References

String functions

InfoSphere Information Server, InfoSphere Information Server 11.5.0, InfoSphere DataStage and QualityStage, Developing parallel jobs, Parallel transform functions, String functions

Data Modeling – Fact Table Effective Practices

Database Table

Here are a few guidelines for modeling and designing fact tables.

Fact Table Effective Practices

  • The table naming convention should identify it as a fact table. For example:
    • Suffix Pattern:
      • <<TableName>>_Fact
      • <<TableName>>_F
    • Prefix Pattern:
      • FACT_<<TableName>>
      • F_<<TableName>>
  • Must contain a temporal dimension surrogate key (e.g., date dimension)
  • Measures should be nullable – this has an impact on aggregate functions (SUM, COUNT, MIN, MAX, AVG, etc.)
  • Dimension surrogate keys (srky) should have a foreign key (FK) constraint
  • Do not place the dimension processing in the fact jobs (see the sketch after this list)
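
As a minimal sketch of these practices, assuming a hypothetical sales subject area and existing DATE_DIM and PRODUCT_DIM dimension tables, a fact table might be declared roughly as follows (names invented, syntax ANSI-flavored):

-- Hypothetical fact table following the naming and key practices above.
CREATE TABLE SALES_FACT (
DATE_SRKY INTEGER NOT NULL,      -- temporal dimension surrogate key
PRODUCT_SRKY INTEGER NOT NULL,   -- dimension surrogate key
SALES_AMOUNT DECIMAL(18,2),      -- nullable measure
UNITS_SOLD INTEGER,              -- nullable measure
CONSTRAINT FK_SALES_FACT_DATE FOREIGN KEY (DATE_SRKY) REFERENCES DATE_DIM (DATE_SRKY),
CONSTRAINT FK_SALES_FACT_PRODUCT FOREIGN KEY (PRODUCT_SRKY) REFERENCES PRODUCT_DIM (PRODUCT_SRKY)
);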
