
The emerging Internet of Things (IoT) extends digital connectivity to devices and sensors in homes, businesses, vehicles and potentially almost anywhere. This innovation enables devices designed for it to generate and transmit data about their operations; analytics using this data can facilitate monitoring and a range of automatic functions.

To perform these functions IoT requires what Ventana Research calls Operational Intelligence (OI), a discipline that has evolved from the capture and analysis of instrumentation, networking and machine-to-machine interactions of many types. We define operational intelligence as a set of event-centered information and analytic processes operating across an organization that enable people to use that event information to take effective actions and make optimal decisions. Our benchmark research into Operational Intelligence shows that organizations most often want to use such event-centric architectures for defining metrics (37%) and assigning thresholds for alerts (35%) and for more action-oriented processes of sending notifications to users (33%) and linking events to activities (27%).
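The four uses named above (defining a metric, assigning an alert threshold, notifying a user and linking the event to an activity) can be sketched in a few lines of Python. This is only an illustration of the event-centered pattern, not any vendor's implementation; the names `Event`, `ThresholdRule` and `evaluate`, and the sample sensor readings, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Event:
    source: str      # device or sensor that emitted the event
    metric: str      # name of the metric being reported
    value: float     # observed value


@dataclass
class ThresholdRule:
    metric: str                      # metric the rule watches
    limit: float                     # alert when the value exceeds this limit
    notify: Callable[[str], None]    # hypothetical hook that notifies a user
    activity: str                    # follow-up activity linked to the alert


def evaluate(events: Iterable[Event], rules: List[ThresholdRule]) -> List[str]:
    """Check each event against every matching rule as it arrives."""
    triggered = []
    for event in events:
        for rule in rules:
            if event.metric == rule.metric and event.value > rule.limit:
                message = (f"{event.source}: {event.metric}={event.value} "
                           f"exceeds {rule.limit}; start '{rule.activity}'")
                rule.notify(message)
                triggered.append(message)
    return triggered


# Illustrative data only: one rule and two readings from a made-up pump sensor.
sent: List[str] = []
rules = [ThresholdRule("temperature_c", 80.0, sent.append, "inspect cooling unit")]
stream = [Event("pump-1", "temperature_c", 72.5),
          Event("pump-1", "temperature_c", 84.1)]
alerts = evaluate(stream, rules)
```

Here only the second reading crosses the threshold, so one notification is sent and linked to the follow-up activity. In a real deployment the `notify` hook would presumably hand off to email, a dashboard or a workflow system.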

In many industries, organizations can gain competitive advantage if they can reduce the elapsed time between an event occurring and actions taken or decisions made in response to it. Existing business intelligence (BI) tools provide useful analysis of and reporting on data drawn from previously recorded transactions, but to improve competitiveness and maximize efficiencies organizations are concluding that employees and processes – in IT, business operations and front-line customer sales, service and support – also need to be able to detect and respond to events as they happen. Our research into big data integration shows that nearly one in four companies currently integrate data into big data stores in real time. The challenge is to go further and act upon both the data that is stored and the data that is streaming in a timely manner.

The evolution of operational intelligence, especially in conjunction with IoT, is encouraging companies to revisit their priorities and spending for information technology and application management. However, sorting out the range of options poses a challenge for both business and IT leaders. Some see potential value in expanding their network infrastructure to support OI. Others are implementing event processing (EP) systems that employ new technology to detect meaningful patterns, anomalies and relationships among events. Increasingly, organizations are using dashboards, visualization and modeling to notify nontechnical people of events and enable them to understand their significance and take appropriate and immediate action.

As with any innovation, using OI for IoT may require substantial changes. These are among the challenges organizations face as they consider adopting operational intelligence:

  • They find it difficult to evaluate the business value of enabling real-time sensing of data and event streams using identification tags, agents and other systems embedded not only in physical locations like warehouses but also in business processes, networks, mobile devices, data appliances and other technologies.
  • They lack an IT architecture that can support and integrate these systems as the volume and frequency of information increase.
  • They are uncertain how to set reasonable business and IT expectations, priorities and implementation plans for important technologies that may conflict or overlap. These can include business intelligence, event processing, business process management, rules management, network upgrades and new or modified applications and databases.
  • They don’t understand how to create a personalized user experience that enables nontechnical employees in different roles to monitor data or event streams, identify significant changes, quickly understand the correlation between events and develop a context in which to determine the right decisions or actions to take.

Ventana Research has announced new benchmark research on The Internet of Things and Operational Intelligence that will identify trends and best practices associated with this technology and these processes. It will explore organizations’ experiences with initiatives related to events and data and with attempts to align IT projects, resources and spending with new business objectives that demand real-time intelligence and event-driven architectures. The research will investigate how organizations are increasing their responsiveness to events by rebalancing the roles of networks, applications and databases to reduce latency; it also will explore ways in which they are using sensor data and alerts to anticipate problematic events. We will benchmark the performance of organizations’ implementations, including IoT, event stream processing, event and activity monitoring, alerting, event modeling and workflow, and process and rules management.

As operational intelligence evolves as the core of IoT platforms, it is an important time to take a closer look at this emerging opportunity and challenge. If you are interested in learning more or becoming involved in this upcoming research, please let me know.


Ventana Research

Splunk’s annual gathering, this year called .conf 2015, in late September hosted almost 4,000 Splunk customers, partners and employees. It is one of the fastest-growing user conferences in the technology industry. The area dedicated to Splunk partners has grown from a handful of booths a few years ago to a vast showroom floor many times larger. While the conference’s main announcement was the release of Splunk Enterprise 6.3, its flagship platform, the progress the company is making in the related areas of machine learning and the Internet of Things (IoT) most caught my attention.

Splunk’s strength is its ability to index, normalize, correlate and query data throughout the technology stack, including applications, servers, networks and sensors. It uses distributed search that enables correlation and analysis of events across local- and wide-area networks without moving vast amounts of data. Its architectural approach unifies cloud and on-premises implementations and provides extensibility for developers building applications. Originally, Splunk provided an innovative way to troubleshoot complex technology issues, but over time new uses for Splunk-based data have emerged, including digital marketing analytics, cyber security, fraud prevention and connecting digital devices in the emerging Internet of Things. Ventana Research has covered Splunk since its establishment in the market, most recently in this analysis of mine.

Splunk’s experience in dealing directly with distributed, time-series data and processes on a large scale puts it in position to address the Internet of Things from an industrial perspective. This sort of data is at the heart of large-scale industrial control systems, but it often arrives in different formats and over different protocols. For instance, sensor technology and control systems that were invented 10 to 20 years ago use very different technology than modern systems. Furthermore, as with computer technology, there are multiple layers in stack models that have to communicate. Splunk’s tools help engineers and systems analysts cross-reference these disparate systems in the same way that they query computer system and network data; however, the systems can be vastly different. To address this challenge, Splunk turns to its partners and its extensible platform. For example, Kepware has developed plug-ins that use its more than 150 communication drivers so users can stream real-time industrial sensor and machine data directly into the Splunk platform. Currently, the primary value drivers for organizations in this field of the industrial IoT are operational efficiency, predictive maintenance and asset management. At the conference, Splunk showcased projects in these areas including one with Target that uses Splunk to improve operations in robotics and manufacturing.

For its part, Splunk is taking a multipronged approach by acquiring companies, investing in internal development and enabling its partner ecosystem to build new products. One key enabler of its approach to IoT is machine learning algorithms built on the Splunk platform. In machine learning a model can use new data to continuously learn and adapt its answers to queries. This differs from conventional predictive analytics, in which users build models and validate them based on a particular sample; the model does not adapt over time. With machine learning, for instance, if a piece of equipment or an automobile shows a certain optimal pattern of operation over time, an algorithm can identify that pattern and build a model for how that system should behave. When the equipment begins to act in a less optimal or anomalous way, the system can alert a human operator that there may be a problem, or in a machine-to-machine situation, it can invoke a process to solve the problem or recalibrate the machine.
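To make the equipment example concrete, here is a minimal sketch of a model that learns a normal operating pattern from incoming readings and flags departures from it. It uses a simple running z-score (Welford's incremental mean and variance), which is my stand-in for illustration and not a description of Splunk's algorithms; the class name, the limit of 3 standard deviations, the warm-up length and the vibration readings are all assumptions.

```python
import math


class RunningBaseline:
    """Learns the mean and variance of a signal incrementally (Welford's method)."""

    def __init__(self, z_limit: float = 3.0, warmup: int = 10):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations from the mean
        self.z_limit = z_limit
        self.warmup = warmup   # readings to observe before raising any alert

    def is_anomalous(self, x: float) -> bool:
        """Score x against the baseline learned so far, without updating it."""
        if self.n < self.warmup:
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return x != self.mean
        return abs(x - self.mean) / std > self.z_limit

    def update(self, x: float) -> None:
        """Fold a new reading into the baseline so the model keeps adapting."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def monitor(readings):
    """Return (index, value) for each reading that departs from the learned pattern."""
    baseline = RunningBaseline()
    alerts = []
    for i, x in enumerate(readings):
        if baseline.is_anomalous(x):
            alerts.append((i, x))   # alert an operator; keep the outlier out of the baseline
        else:
            baseline.update(x)
    return alerts


# Illustrative vibration readings with one injected fault at index 12.
normal = [50.0, 50.4, 49.8, 50.1, 49.9, 50.2, 50.3, 49.7, 50.0, 50.1, 49.9, 50.2]
vibration = normal + [58.0] + normal[:3]
alerts = monitor(vibration)
```

The baseline keeps adjusting as normal readings arrive, which is the adaptive behavior described above. In a machine-to-machine setting the alert could trigger a recalibration routine instead of notifying a person.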

Machine learning algorithms allow event processes to be audited, analyzed and acted upon in real time. They enable predictive capabilities for maintenance, transportation and logistics, and asset management and can also be applied in more people-oriented domains such as fraud prevention, security, business process improvement, and digital products. IoT potentially can have a major impact on business processes, but only if organizations can realign systems to discover-and-adapt rather than model-and-apply approaches. For instance, processes are often carried out in an uneven fashion different from the way the model was conceived and communicated through complex process documentation and systems. As more process flows are directly instrumented and more processes carried out by machines, the ability to model directly based on the discovery of those event flows and to adapt to them (through human learning or machine learning) becomes key to improving organizational processes. Such realignment of business processes, however, often involves broad organizational transformation. Our benchmark research on operational intelligence shows that challenges associated with people and processes, rather than information and technology, most often hold back organizational improvement.

Two product announcements made at the conference illuminate the direction Splunk is taking with IoT and machine learning. The first is User Behavior Analytics (UBA), based on its acquisition of Caspida, which produces advanced algorithms that can detect anomalous behavior within a network. Such algorithms can model internal user behavior, and when behavior deviates from the specified norm, they can generate an alert that can be addressed through investigative processes using Splunk Enterprise Security 4.0. Together, Splunk Enterprise Security 4.0 and UBA won the 2015 Ventana Research CIO Innovation Award. The acquisition of Caspida shows that Splunk is not afraid to acquire companies in niche areas where it can exploit its platform to deliver organizational value. I expect that we will see more such acquisitions of companies with high-value ML algorithms as Splunk carves out specific positions in the emergent markets.

The other product announced is IT Service Intelligence (ITSI), which highlights machine learning algorithms alongside Splunk’s core capabilities. The IT Service Intelligence App is an application in which end users deploy machine learning to see patterns in various IT service scenarios. ITSI can inform and enable multiple business uses such as predictive maintenance, churn analysis, service level agreements and chargebacks. Similar to UBA, it uses anomaly detection to point out issues and enables managers to view highly distributed processes such as claims process data in insurance companies. At this point, however, use of ITSI (like other areas of IoT) may encounter cultural and political issues as organizations deal with changes in the roles of IT and operations management. Splunk’s direction with ITSI shows that the company is staying close to its IT operations knitting as it builds out application software, but such development also puts Splunk into new competitive scenarios where legacy technology and processes may still be considered good enough.

We note that ITSI is built using Splunk’s Machine Learning Toolkit and showcase, which currently is in preview mode. The platform is an important development for the company and fills one of the gaps that I pointed out in its portfolio last year. Addressing this gap enables Splunk and its partners to create services that apply advanced analytics to big data that almost half (45%) of organizations find important. I consider the use of predictive and advanced analytics a killer application for big data, and our benchmark research on big data analytics backs this claim: Predictive analytics is the type of analytics most (64%) organizations wish to pursue on big data.

Organizations currently looking at IoT use cases should consider Splunk’s strategy and tools in the context of specific problems they need to address. Machine learning algorithms built for particular industries are key, so it is important to understand whether the problem can be addressed using prebuilt applications provided by Splunk or one of its partners, or whether the organization will need to build its own algorithms using the Splunk machine learning platform or alternatives. Evaluate both the platform capabilities and the instrumentation, the type of protocols and formats involved and how that data will be consumed into the system and related in a uniform manner. Most of all, be sure the skills and processes in the organization align with the technology from an end user and business perspective.



The concept and implementation of what is called big data are no longer new, and many organizations, especially larger ones, view it as a way to manage and understand the flood of data they receive. Our benchmark research on big data analytics shows that business intelligence (BI) is the most common type of system to which organizations deliver big data. However, BI systems aren’t a good fit for analyzing big data. They were built to provide interactive analysis of structured data sources using Structured Query Language (SQL). Big data includes large volumes of data that does not fit into rows and columns, such as sensor data, text data and Web log data. Such data must be transformed and modeled before it can fit into paradigms such as SQL.
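As a small illustration of the transformation step described above, the sketch below parses Web log text into rows and columns and only then applies SQL. It uses Python's standard `re` and `sqlite3` modules; the log lines, the regular expression and the `requests` table are hypothetical examples I made up for the purpose, and real log formats vary.

```python
import re
import sqlite3

# Hypothetical log lines in a common-log-like layout (illustrative data only).
raw_logs = [
    '10.0.0.1 - - [12/Oct/2015:10:01:22 +0000] "GET /pricing HTTP/1.1" 200 5120',
    '10.0.0.2 - - [12/Oct/2015:10:01:25 +0000] "GET /cancel HTTP/1.1" 200 2048',
    'corrupted line that does not match the pattern',
    '10.0.0.1 - - [12/Oct/2015:10:02:03 +0000] "POST /login HTTP/1.1" 302 0',
]

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+)'
)


def parse(lines):
    """Turn free-form text lines into (ip, ts, method, path, status, size) rows."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:  # skip lines that cannot be modeled as a row
            g = match.groupdict()
            yield (g["ip"], g["ts"], g["method"], g["path"],
                   int(g["status"]), int(g["size"]))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests (ip TEXT, ts TEXT, method TEXT, path TEXT, "
    "status INTEGER, size INTEGER)"
)
conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?, ?, ?)", parse(raw_logs))

# Once the data is in rows and columns, ordinary SQL applies.
hits_per_ip = conn.execute(
    "SELECT ip, COUNT(*) FROM requests GROUP BY ip ORDER BY ip"
).fetchall()
```

The parsing step is where the modeling decisions live: which fields to keep, how to type them and what to do with lines that do not fit. At big data volumes that work is typically what separates raw storage from something a BI tool can query.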

The result is that currently many organizations run separate systems for big data and business intelligence. On one system, conventional BI tools as well as new visual discovery tools act on structured data sources to do fast interactive analysis. In this area analytic databases can use column store approaches and visualization tools as a front end for fast interaction with the data. On other systems, big data is stored in distributed systems such as the Hadoop Distributed File System (HDFS). Tools that use it have been developed to access, process and analyze the data. Commercial distribution companies aligned with the open source Apache Software Foundation, such as Cloudera, Hortonworks and MapR, have built ecosystems around the MapReduce processing paradigm. MapReduce works well for search-based tasks but not so well for the interactive analytics for which business intelligence systems are known. This situation has created a divide between business technology users, who gravitate to visual discovery tools that provide easily accessible and interactive data exploration, and more technically skilled users of big data tools that require sophisticated access paradigms and elongated query cycles to explore data.

There are two challenges with the MapReduce approach. First, working with it is a highly technical endeavor that requires advanced skills. Our big data analytics research shows that lack of skills is the most widespread reason for dissatisfaction with big data analytics, mentioned by more than two-thirds of companies. To fill this gap, vendors of big data technologies should facilitate use of familiar interfaces including query interfaces and programming language interfaces. For example, our research shows that standard SQL is the most important method for implementing analysis on Hadoop. To deal with this challenge, the distribution companies and others offer SQL abstraction layers on top of HDFS, such as Hive and Cloudera Impala. Companies that I have written about include Datameer and Platfora, whose systems help users interact with Hadoop data via interactive systems such as spreadsheets and multidimensional cubes. With their familiar interaction paradigms such systems have helped increase adoption of Hadoop and enable more than a few experts to access big data systems.

The second challenge is latency. As a batch process MapReduce must sort and aggregate all of the data before creating analytic output. Technologies such as Tez, developed by Hortonworks, and Cloudera Impala aim to address such speed limitations; the first leverages MapReduce, and the other circumvents MapReduce altogether. Adoption of these tools has moved the big data market forward, but challenges remain such as the continuing fragmentation of the Hadoop ecosystem and a lack of standardization in approaches.
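The source of that latency is easier to see in a toy version of the paradigm. The sketch below is a single-process imitation of the map, shuffle and reduce phases in plain Python; it does not use Hadoop's API, and the function names and sample records are my own assumptions. The point is the barrier in the middle: no result can be produced until every mapped record has been grouped and sorted.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (key, 1) for each word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group all values by key; needs every mapped pair before it can finish."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Aggregate the values collected for one key."""
    return key, sum(values)


def map_reduce(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)            # barrier: the full data set is consumed here
    return [reduce_phase(k, v) for k, v in grouped.items()]


# Illustrative log messages only.
records = ["error disk full", "warning disk slow", "error network down"]
counts = map_reduce(records)
```

On three short records the barrier is invisible, but on a cluster the same step involves writing intermediate data to disk and moving it across the network, which is why interactive query response is hard to achieve with this model.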

An emerging technology holds promise for bridging the gap between big data and BI in a way that can unify big data ecosystems rather than dividing them. Apache Spark, under development since 2010 at the University of California, Berkeley’s AMPLab, addresses both usability and performance concerns for big data. It adds flexibility by running on multiple platforms in terms of both clustering (such as Hadoop YARN and Apache Mesos) and distributed storage (for example, HDFS, Cassandra, Amazon S3 and OpenStack’s Swift). Spark also expands the potential uses because the platform includes an SQL abstraction layer (Spark SQL), a machine learning library (MLlib), a graph library (GraphX) and a near-real-time engine (Spark Streaming). Furthermore, Spark can be programmed using modern languages such as Python and Scala. Having all of these components integrated is important because interactive business intelligence, advanced analytics and operational intelligence on big data can all work together without the complexity of the separate proprietary systems that were previously needed to do the same things.

Because of this potential Spark is becoming a rallying point for providers of big data analytics. It has become the most active Apache project as key open source contributors moved their focus from other Hadoop projects to it. Out of the effort in Berkeley, Databricks was founded for commercial development of open source Apache Spark and has raised more than $46 million. Since the initial release in May 2014 the momentum for Spark has continued to build; major companies have made announcements around Apache Spark. IBM said it will dedicate 3,500 researchers and engineers to develop the platform and help customers deploy it. This is the largest dedicated Spark effort in the industry, akin to the move IBM made in the late 1990s with the Linux open source operating system. Oracle has built Spark into its Big Data Appliance. Microsoft has Spark as an option on its HDInsight big data approach but has also announced Prajna, an alternative approach to Spark. SAP has announced integration with its SAP HANA platform, although it represents “coopetition” for SAP’s in-memory platform. In addition, all the major business intelligence players have built or are building connectors to run on Spark. In time, Spark likely will serve as a data ingestion engine for connecting devices in the Internet of Things (IoT). For instance, Spark can integrate with technologies such as Apache Kafka or Amazon Kinesis to instantly process and analyze IoT data so that immediate action can be taken. In this way, as it is envisioned by its creators, Spark can serve as the nexus of multiple systems.

Because it is a flexible in-memory technology for big data, Spark opens the door to many new opportunities, which in business use include interactive analysis, advanced customer analytics, fraud detection, and systems and network management. At the same time, it is not yet a mature technology and for this reason, organizations considering adoption should tread carefully. While Spark may offer better performance and usability, MapReduce is already widely deployed. For those users, it is likely best to maintain the current approach and not fix what is not broken. For future big data use, however, Spark should be carefully compared to other big data technologies. In this case as well as others, technical skills can still be a concern. Scala, for instance, one of the key languages used with Spark, has little adoption, according to our recent research on next-generation predictive analytics. Manageability is an issue as for any other nascent technology and should be carefully addressed up front. While, as noted, vendor support for Spark is becoming apparent, frequent updates to the platform can mean disruption to systems and processes, so examine the processes for these updates. Be sure that vendor support is tied to meaningful business objectives and outcomes. Spark is an exciting new technology, and for early adopters that wish to move forward with it today, both big opportunities and challenges are in store.



One of the key findings in our latest benchmark research into predictive analytics is that companies are incorporating predictive analytics into their operational systems more often than was the case three years ago. The research found that companies are less inclined to purchase stand-alone predictive analytics tools (29% vs 44% three years ago) and more inclined to purchase predictive analytics built into business intelligence systems (23% vs 20%), applications (12% vs 8%), databases (9% vs 7%) and middleware (9% vs 2%). This trend is not surprising since operationalizing predictive analytics – that is, building predictive analytics directly into business process workflows – improves companies’ ability to gain competitive advantage: those that deploy predictive analytics within business processes are more likely to say they gain competitive advantage and improve revenue through predictive analytics than those that don’t.

In order to understand the shift that is underway, it is important to understand how predictive analytics has historically been executed within organizations. The marketing organization provides a useful example since it is the functional area where organizations most often deploy predictive analytics today. In a typical organization, those doing statistical analysis will export data from various sources into a flat file. (Often IT is responsible for pulling the data from the relational databases and passing it over to the statistician in a flat file format.) Data is cleansed, transformed, and merged so that the analytic data set is in a normalized format. It then is modeled with stand-alone tools and the model is applied to records to yield probability scores. In the case of a churn model, such a probability score represents how likely someone is to defect. For a marketing campaign, a probability score tells the marketer how likely someone is to respond to an offer. These scores are produced for marketers on a periodic basis – usually monthly. Marketers then work on the campaigns informed by these static models and scores until the cycle repeats itself.
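The traditional cycle described above can be sketched as a small batch job: read a flat file, cleanse it, apply a previously fitted model and produce a probability score per customer. The sketch assumes a logistic model for illustration; the column names, coefficients and records are hypothetical, and the coefficients merely stand in for a model that would have been fitted in a stand-alone statistical tool.

```python
import csv
import io
import math

# Hypothetical flat-file extract; in practice this would be exported from source systems.
FLAT_FILE = """customer_id,months_active,support_calls,monthly_spend
C001,36,0,80.0
C002,3,5,25.0
C003,12,,40.0
"""

# Hypothetical coefficients standing in for a model fitted in a stand-alone tool.
INTERCEPT = -1.0
WEIGHTS = {"months_active": -0.05, "support_calls": 0.6, "monthly_spend": -0.01}


def cleanse(row):
    """Convert text fields to numbers, treating blanks as zero."""
    return {name: float(row[name] or 0.0) for name in WEIGHTS}


def churn_probability(features):
    """Apply a logistic model: p = 1 / (1 + exp(-(intercept + w . x)))."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def score_file(text):
    """Score every record in the extract and return {customer_id: probability}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["customer_id"]: churn_probability(cleanse(row)) for row in reader}


scores = score_file(FLAT_FILE)
```

Nothing in this job is wrong in itself. The limitation is that the scores are frozen at the moment the file was exported, and they stay that way until the next run.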

The challenge presented by this traditional model is that a lot can happen in a month and the heavy reliance on process and people can hinder the organization’s ability to respond quickly to opportunities and threats. This is particularly true in fast-moving consumer categories such as telecommunications or retail. For instance, if a person visits the company’s cancelation policy web page the instant before he or she picks up the phone to cancel the contract, this customer’s churn score will change dramatically and the action that the call center agent should take will need to change as well. Perhaps, for example, that score change should mean that the person is now routed directly to an agent trained to deal with possible defections. But such operational integration requires that the analytic software be integrated with the call agent software and web tracking software in near-real time.
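A minimal sketch of that operational integration might look like the following, in which a Web tracking event adjusts the stored churn score immediately and the call-routing decision reads the latest score. Everything here is hypothetical: the event names, the additive score adjustments and the 0.50 routing cutoff are simplifications I chose for illustration, and a production system would more likely re-run the model on updated features.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical score adjustments per web event, and a hypothetical routing cutoff.
EVENT_UPLIFT = {"viewed_cancellation_policy": 0.40, "viewed_upgrade_offer": -0.10}
RETENTION_CUTOFF = 0.50


@dataclass
class ChurnScores:
    scores: Dict[str, float] = field(default_factory=dict)  # latest score per customer

    def on_web_event(self, customer_id: str, event: str) -> float:
        """Adjust the stored score as soon as a tracked web event arrives."""
        current = self.scores.get(customer_id, 0.0)
        updated = min(1.0, max(0.0, current + EVENT_UPLIFT.get(event, 0.0)))
        self.scores[customer_id] = updated
        return updated

    def route_call(self, customer_id: str) -> str:
        """Pick an agent queue using the most recent score at call time."""
        score = self.scores.get(customer_id, 0.0)
        return "retention_specialist" if score >= RETENTION_CUTOFF else "general_queue"


store = ChurnScores({"C001": 0.15})           # score from the last monthly batch
before = store.route_call("C001")
store.on_web_event("C001", "viewed_cancellation_policy")
after = store.route_call("C001")
```

With the monthly score alone the customer goes to the general queue; once the page visit is registered, the same call is routed to a retention specialist.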

Similarly, the models themselves need to be constantly updated to deal with the fast pace of change. For instance, if a telecommunications carrier competitor offers a large rebate to customers to switch service providers, an organization’s churn model can be rendered out of date and should be updated. Our research shows that organizations that constantly update their models gain competitive advantage more often than those that only update them periodically (86% vs 60% average), more often show significant improvement in organizational activities and processes (73% vs 44%), and are more often very satisfied with their predictive analytics (57% vs 23%).
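One way to picture constant updating, as opposed to periodic refitting, is a model that adjusts its weights each time a new outcome is observed. The sketch below uses a single-feature logistic model trained by stochastic gradient descent; this is an assumed technique chosen for brevity, not a method taken from the research, and the feature (whether a customer was exposed to the competitor's rebate), the learning rate and the data are hypothetical.

```python
import math


class OnlineChurnModel:
    """Logistic model updated one observation at a time with gradient descent."""

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        """Return the current churn probability for feature vector x."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, churned: int) -> None:
        """Nudge the weights toward the latest observed outcome (0 or 1)."""
        error = self.predict(x) - churned
        self.bias -= self.learning_rate * error
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]


# Hypothetical stream: customers exposed to the rebate (feature = 1.0) churn,
# customers not exposed (feature = 0.0) stay.
model = OnlineChurnModel(n_features=1)
exposed = [1.0]
before = model.predict(exposed)
for _ in range(50):
    model.update(exposed, churned=1)
    model.update([0.0], churned=0)
after = model.predict(exposed)
```

Before any outcomes arrive the model is indifferent to the rebate; after a stream of observations it assigns exposed customers a higher churn probability without anyone rebuilding it by hand.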

Building predictive analytics into business processes is more easily discussed than done; complex business and technical challenges must be addressed. The skills gap that I recently wrote about is a significant barrier to implementing predictive analytics. Making predictive analytics operational requires not only statistical and business skills but technical skills as well. From a technical perspective, one of the biggest challenges for operationalizing predictive analytics is accessing and preparing data, which I wrote about. Four out of ten companies say that this is the part of the predictive analytics process where they spend the most time. Choosing the right software is another challenge that I wrote about. Making that choice includes identifying the specific integration points with business intelligence systems, applications, database systems, and middleware. These decisions will depend on how people use the various systems and what areas of the organization are looking to operationalize predictive analytics processes.

For those that are willing to take on the challenges of operationalizing predictive analytics the rewards can be significant, including much better competitive positioning and new revenue opportunities. Furthermore, once predictive analytics is initially deployed in the organization it snowballs, with more than nine in ten companies going on to increase their use of predictive analytics. Once companies reach that stage, one third of them (32%) say predictive analytics has had a transformational impact and another half (49%) say it provides significant positive benefits.



Our benchmark research into predictive analytics shows that lack of resources, including budget and skills, is the number-one business barrier to the effective deployment and use of predictive analytics; awareness – that is, an understanding of how to apply predictive analytics to business problems – is second. In order to secure resources and address awareness problems a business case needs to be created and communicated clearly wherever appropriate across the organization. A business case presents the reasoning for initiating a project or task. A compelling business case communicates the nature of the proposed project and the arguments, both quantified and unquantifiable, for its deployment.

The first steps in creating a business case for predictive analytics are to understand the audience and to communicate with the experts who will be involved in leading the project. Predictive analytics can be transformational in nature and therefore the audience potentially is broad, including many disciplines within the organization. Understand who should be involved in business case creation, a list that may include business users, analytics users and IT. Those most often primarily responsible for designing and deploying predictive analytics are data scientists (in 31% of organizations), the business intelligence and data warehouse team (27%), those working in general IT (16%) and line of business analysts (13%), so be sure to involve these groups. Understand the specific value and challenges for each of the constituencies so the business case can represent the interests of these key stakeholders. I discuss the aspects of the business where these groups will see predictive analytics most adding value here and here.

For the business case for a predictive analytics deployment to be persuasive, executives also must understand how specifically the deployment will impact their areas of responsibility and what the return on investment will be. For these stakeholders, the argument should be multifaceted. At a high level, the business case should explain why predictive analytics is important and how it fits with and enhances the organization’s overall business plan. Industry benchmark research and relevant case studies can be used to paint a picture of what predictive analytics can do for marketing (48%), operations (44%) and IT (40%), the functions where predictive analytics is used most.

A business case should show how predictive analytics relates to other relevant innovation and analytic initiatives in the company. For instance, companies have been spending money on big data, cloud and visualization initiatives where software returns can be more difficult to quantify. Our research into big data analytics and data and analytics in the cloud shows that the top benefits of these initiatives are communication and knowledge sharing. Fortunately, the business case for predictive analytics can cite the tangible business benefits our research identified, the most often identified of which are achieving competitive advantage (57%), creating new revenue opportunities (50%), and increasing profitability (46%). But the business case can be made even stronger by noting that predictive analytics can have added value when it is used to leverage other current technology investments. For instance, our big data analytics research shows that the most valuable type of analytics to be applied to big data is predictive analytics.

To craft the specifics of the business case, concisely define the business issue that will be addressed. Assess the current environment and offer a gap analysis to show the difference between the current environment and the future environment. Offer a recommended solution, but also offer alternatives. Detail the specific value propositions associated with the change. Create a financial analysis summarizing costs and benefits. Support the analysis with a timeline including roles and responsibilities. Finally, detail the major risk factors and opportunity costs associated with the project.
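For the financial analysis, three commonly used summary figures are net present value, simple return on investment and payback period. The sketch below shows one way to compute them; the 10 percent discount rate and all cost and benefit figures are hypothetical placeholders, not numbers from our research.

```python
def net_present_value(rate, cash_flows):
    """Discount yearly cash flows (year 0 first) back to today's value."""
    return sum(cf / (1.0 + rate) ** year for year, cf in enumerate(cash_flows))


def simple_roi(total_benefit, total_cost):
    """Return on investment as a fraction of cost."""
    return (total_benefit - total_cost) / total_cost


def payback_year(cash_flows):
    """First year in which cumulative cash flow is non-negative, or None."""
    cumulative = 0.0
    for year, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return year
    return None


# Hypothetical figures: an up-front cost followed by three years of net benefit.
costs = [200_000, 50_000, 50_000, 50_000]
benefits = [0, 150_000, 200_000, 250_000]
net_flows = [b - c for b, c in zip(benefits, costs)]

npv = net_present_value(0.10, net_flows)
roi = simple_roi(sum(benefits), sum(costs))
payback = payback_year(net_flows)
```

Presenting all three together is useful because they answer different stakeholder questions: whether the project creates value after discounting, how large the return is relative to spend, and how long the organization waits to recover its outlay.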

For complex initiatives, break the overall project into a series of shorter projects. If the business case is for a project that will involve substantial work, consider providing separate timelines and deliverables for each phase. Doing so will keep stakeholders both informed and engaged during the time it takes to complete the full project. For large predictive analytics projects, it is important to break out the due-diligence phase and try not to make any hard commitments until that phase is completed. After all, it is difficult to establish defensible budgets and timelines until one knows the complete scope of the project.

Ensure that the project timeline is realistic and addresses all the key components needed for a successful deployment. In particular with predictive analytics projects, make certain that it reflects a thoughtful approach to data access, data quality and data preparation. We note that four in 10 organizations say that the most time spent in the predictive analytics process is in data preparation and another 22 percent say that they spend the most time accessing data sources. If data issues have not been well thought through, it is next to impossible for the predictive analytics initiative to be successful. Read my recent piece on operationalizing predictive analytics to show how predictive analytics will align with specific business processes.

If you are proposing the implementation of new predictive analytics software, highlight the multiple areas of return beyond competitive advantage and revenue benefits. Specifically, new software can have a lower total cost of ownership and generate direct cost savings from improved operating efficiencies. A software deployment also can yield benefits related to people (productivity, insight, fewer errors), management (creativity, speed of response), process (shorter time on task or time to complete) and information (easier access, more timely, accurate and consistent). Create a comprehensive list of the major benefits the software will provide compared to the existing approach, quantifying the impact wherever possible. Detail all major costs of ownership whether the implementation is on-premises or cloud-based: these will include licensing, maintenance, implementation consulting, internal deployment resources, training, hardware and other infrastructure costs. In other words, think broadly about both the costs and the sources of return in building the case for new technology. Also, read my recent piece on procuring predictive analytics software.

Understanding the audience, painting the vision, crafting the specific case, outlining areas of return, specifying software, noting risk factors, and being as comprehensive as possible are all part of a successful business plan process. Sometimes, the initial phase is really just a pitch for project funding and there won’t be any dollar allocation until people are convinced that the program will get them what they need. In such situations multiple documents may be required, including a short one- to two-page document that outlines the vision and makes a high-level argument for action from the organizational stakeholders. Once a cross-functional team and executive support are in place, a more formal assessment and design plan following the principles above will have to be built.

Predictive analytics offers significant returns for organizations willing to pursue it, but establishing a solid business case is the first step for any organization.


Ventana Research

Our research into next-generation predictive analytics shows that along with not having enough skilled resources, which I discussed in my previous analysis, the inability to readily access and integrate data is a primary reason for dissatisfaction with predictive analytics (in 62% of participating organizations). Furthermore, this area consumes the most time in the predictive analytics process: The research finds that preparing data for analysis (40%) and accessing data (22%) are the parts of the predictive analysis process that create the most challenges for organizations. To allow more time for actual analysis, organizations must work to improve their data-related processes.

Organizations apply predictive analytics to many categories of information. Our research shows that the most common categories are customer (used by 50%), marketing (44%), product (43%), financial (40%) and sales (38%). Such information often has to be combined from various systems and enriched with information from new sources. Before users can apply predictive analytics to these blended data sets, the information must be put into a common form and represented as a normalized analytic data set. Unlike in data warehouse systems, which provide a single data source with a common format, today data is often located in a variety of systems that have different formats and data models. Much of the current challenge in accessing and integrating data comes from the need to include not only a variety of relational data sources but also less structured forms of data. Data that varies in both structures and sizes is commonly called big data.

To deal with the challenge of storing and computing big data, organizations planning to use predictive analytics increasingly turn to big data technology. While flat files and relational databases on standard hardware, each cited by almost two-thirds (63%) of participants, are still the most commonly used tools for predictive analytics, more than half (52%) of organizations now use data warehouse appliances for predictive analytics, and 31 percent use in-memory databases, which the second-highest percentage (24%) plan to adopt in the next 12 to 24 months. Hadoop and NoSQL technologies lag in adoption, currently used by one in four organizations, but in the next 12 to 24 months an additional 29 percent intend to use Hadoop and 20 percent more will use other NoSQL approaches. Furthermore, more than one-quarter (26%) of organizations are evaluating Hadoop for use in predictive analytics, which is the most of any technology.

Some organizations are considering moving from on-premises to cloud-based storage of data for predictive analytics; the most common reasons for doing so are to improve access to data (for 49%) and preparation of data for analysis (43%). This trend speaks to the increasing importance of cloud-based data sources as well as cloud-based tools that provide access to many information sources and provide predictive analytics. As organizations accumulate more data and need to apply predictive analytics in a scalable manner, we expect the need to access and use big data and cloud-based systems to increase.

While big data systems can help handle the size and variety of data, they do not of themselves solve the challenges of data access and normalization. This is especially true for organizations that need to blend new data that resides in isolated systems. How to do this is critical for organizations to consider, especially in light of the people using predictive analytics systems and their skills. There are three key considerations here. One is the user interface, the most common of which are spreadsheets (used by 48%), graphical workflow modeling tools (44%), integrated development environments (37%) and menu-driven modeling tools (35%). Second is the number of data sources to deal with and which are supported by the system; our research shows that four out of five organizations need to access and integrate five or more data sources. The third consideration is which analytic languages and libraries to use and which are supported by the system; the research finds that Microsoft Excel, SQL, R, Java and Python are the most widely used for predictive analytics. Considering these three priorities in terms of the resident skills, processes, current technology and information sources that need to be accessed is crucial for delivering value to the organization with predictive analytics.

While there has been an exponential increase in data available to use in predictive analytics as well as advances in integration technology, our research shows that data access and preparation are still the most challenging and time-consuming tasks in the predictive analytics process. Although technology for these tasks has improved, complexity of the data has increased through the emergence of different data types, large-scale data and cloud-based data sources. Organizations must pay special attention to how they choose predictive analytics tools that can give easy access to multiple diverse data sources including big data stores and provide capabilities for data blending and provisioning of analytic data sets. Without these capabilities, predictive analytics tools will fall short of expectations.


Ventana Research

The Performance Index analysis we performed as part of our next-generation predictive analytics benchmark research shows that only one in four organizations, those functioning at the highest Innovative level of performance, can use predictive analytics to compete effectively against others that use this technology less well. We analyze performance in detail in four dimensions (People, Process, Information and Technology), and for predictive analytics we find that organizations perform best in the Technology dimension, with 38 percent reaching the top Innovative level. This is often the case in our analyses, as organizations initially perform better in the details of selecting and managing new tools than in the other dimensions. Predictive analytics is not a new technology per se, but the difference is that it is becoming more common in business units, as I have written.

In contrast to organizations’ performance in the Technology dimension, only 10 percent reach the Innovative level in People and only 11 percent in Process. This disparity uncovered by the research analysis suggests there is value in focusing on the skills that are used to design and deploy predictive analytics. In particular, we found that one of the two most-often cited reasons why participants are not fully satisfied with the organization’s use of predictive analytics is that there are not enough skilled resources (cited by 62%). In addition, 29 percent said that the need for too much training or customized skills is a barrier to changing their predictive analytics.

The challenge for many organizations is to find the combination of domain knowledge, statistical and mathematical knowledge, and technical knowledge that they need to be able to integrate predictive analytics into other technology systems and into operations in the lines of business, which I also have discussed. The need for technical knowledge is evident in the research findings on the jobs held by individual participants: Three out of four require technical sophistication. More than one-third (35%) are data scientists who have a deep understanding of predictive analytics and its use as well as of data-related technology; one-fourth are data analysts who understand the organization’s data and systems but have limited knowledge of predictive analytics; and 16 percent described themselves as predictive analytics experts who have a deep understanding of this topic but not of technology in general. The research also finds that those most often primarily responsible for designing and deploying predictive analytics are data scientists (in 31% of organizations) or members of the business intelligence and data warehouse team (27%). This focus on business intelligence and data warehousing represents a shift toward integrating predictive analytics with other technologies and indicates a need to scale predictive analytics across the organization.

In only about half (52%) of organizations are the people who design and deploy predictive analytics the same people who utilize the output of these processes. The most common reasons cited by research participants that users of predictive analytics don’t produce their own analyses are that they don’t have enough skills training (79%) and don’t understand the mathematics involved (66%). The research also finds evidence that skills training pays off: Fully half of those who said they received adequate training in applying predictive analytics to business problems also said they are very satisfied with their predictive analytics; percentages dropped precipitously for those who said the training was somewhat adequate (8%) and inadequate (6%). It is clear that professionals trained in both business and technology are necessary for an organization to successfully understand, deploy and use predictive analytics.

To determine the technical skills and training necessary for predictive analytics, it is important to understand which languages and libraries are used. The research shows that the most common are SQL (used by 67% of organizations) and Microsoft Excel (64%), with which many people are familiar and which are relatively easy to use. The three next-most commonly used are much more sophisticated: the open source language R (by 58%), Java (42%) and Python (36%). Overall, many languages are in use: Three out of five organizations use four or more of them. This array reflects the diversity of approaches to predictive analytics. Organizations must assess what languages make sense for their uses, and vendors must support many languages for predictive analytics to meet the demands of all customers.

The research thus makes clear that organizations must pay attention to a variety of skills and how to combine them with technology to ensure success in using predictive analytics. Not all the skills necessary in an analytics-driven organization can be combined in one person, as I discussed in my analysis of analytic personas. We recommend that as organizations focus on the skills discussed above, they consider creating cross-functional teams from both business and technology groups.


Ventana Research

To impact business success, Ventana Research recommends viewing predictive analytics as a business investment rather than an IT investment. Our recent benchmark research into next-generation predictive analytics reveals that since our previous research on the topic in 2012, funding has shifted from general business budgets (previously 44%) to line of business IT budgets (previously 19%). Now more than half of organizations fund such projects from business budgets: 29 percent from general business budgets and 27 percent from a line of business IT budget. This shift in buying reflects the mainstreaming of predictive analytics in organizations, which I recently wrote about.

This shift in funding of initiatives coincides with a change in the preferred format for predictive analytics. The research reveals that the share of organizations that prefer to purchase predictive analytics as stand-alone technology has fallen 15 percentage points since the previous research (29% now vs. 44% then). Instead we find growing demand for predictive analytics tools that can be integrated with operational environments such as business intelligence or transaction applications. More than two in five (43%) organizations now prefer predictive analytics embedded in other technologies. This integration can help businesses respond faster to market opportunities and competitive threats without having to switch applications.

The features most often sought in predictive analytics products further confirm business interest. Usability (very important to 67%) and capability (59%) are the top buying criteria, followed by reliability (52%) and manageability (49%). This is consistent with the priorities of organizations three years ago with one important exception: Manageability was one of the two least important criteria then (33%) but today is nearly tied with reliability for third place. This change makes sense in light of a broader use of predictive analytics and the need to manage an increasing variety of models and input variables.

Further, as a business investment predictive analytics is most often used in front-office functions, but the research shows that IT and operations are closely associated with these functions. The top four areas of predictive analytics use are marketing (48%), operations (44%), IT (40%) and sales (38%). In the previous research operations ranked much lower on the list.

To select the most useful product, organizations must understand where IT and business buyers agree and disagree on what matters. The research shows that they agree closely on how to deploy the tools: Both expressed a greater preference to deploy on-premises (business 53%, IT 55%) but also agree in the number of those who prefer it on demand through cloud computing (business 22%, IT 23%). More than 90 percent on both sides said the organization plans to deploy more predictive analytics, and they also were in close agreement (business 32%, IT 33%) that doing so would have a transformational impact, enabling the organization to do things it couldn’t do before.

However, some distinctions are important to consider, especially when looking at the business case for predictive analytics. Business users more often focus on the benefit of achieving competitive advantage (60% vs. 50% of IT) and creating new revenue opportunities (55% vs. 41%), which are the two benefits most often cited overall. On the other hand, IT professionals more often focus on the benefits of increased upselling and cross-selling (53% vs. 32%), reduced risk (26% vs. 21%) and better compliance (26% vs. 19%); the last two reflect key responsibilities of the IT group.

Despite strong business involvement, when it comes to products, IT, technical and data experts are indispensable for the evaluation and use of predictive analytics. Data scientists or the head of data management are most often involved in recommending (52%) and evaluating (56%) predictive analytics technologies. Reflecting the need to deploy predictive analytics to business units, analysts and IT staff are the next-most influential roles for evaluating and recommending. This involvement of technically sophisticated individuals combined with the movement away from organizations buying stand-alone tools indicates an increasingly team-oriented approach.

Purchase of predictive analytics often requires approval from high up in the organization, which underscores the degree of enterprise-wide interest in this technology. The CEO or president is most likely to be involved in the final decision in small (87%) and midsize (76%) companies. In contrast, large companies rely most on IT management (40%), and very large companies rely most on the CIO or head of IT (60%). We again note the importance of IT in the predictive analytics decision-making process in larger organizations. In the previous research, in large companies IT management was involved in approval in 9 percent of them and the CIO was involved in only 40 percent.

As predictive analytics becomes more widely used, buyers should take a broad view of the design and deployment requirements of the organization and specific lines of business. They should consider which functional areas will use the tools and consider issues involving people, processes and information as well as technology when evaluating such systems. We urge business and IT buyers to work together during the buying process with the common goal of using predictive analytics to deliver value to the enterprise.


Ventana Research

Our recently released benchmark research into next-generation predictive analytics shows that in this increasingly important area many organizations are moving forward in the dimensions of information and technology, but most are challenged to find people with the right skills and to align organizational processes to derive business value from predictive analytics.

For those that have done so, the rewards can be significant. One-third of organizations participating in the research said that using predictive analytics leads to transformational change – that is, it enables them to do things they couldn’t do before – and at least half said that it provides competitive advantage or creates new revenue opportunities. Reflecting the momentum behind predictive analytics today, virtually all participants (98%) that have engaged in predictive analytics said that they will be rolling out more of it.

Our research shows that predictive analytics is being used most often in the front offices of organizations, specifically in marketing (48%), operations (44%) and IT (40%). While operations and IT are not often considered front-office functions, we find that they are using predictive analytics in service to customers. For instance, the ability to manage and impact the customer experience by applying analytics to big data is an increasingly important approach that I recently wrote about. As conventional channels of communication give way to digital channels, the use of predictive analytics in operations and IT becomes more valuable for marketing and customer service.

However, the most widespread barrier to making changes in predictive analytics is lack of resources (cited by 52% of organizations), which includes finding the necessary skills to design and deploy programs. The research shows that currently consultants and data scientists are those most often needed. Half the time those designing the system are also the end users of it, which indicates that using predictive analytics still requires advanced skills. Lack of awareness (cited by 48%) is the second-most common barrier; many organizations fail to understand the value of predictive analytics in their business. Some of the reluctance to implement predictive analytics may be because doing so can require significant change. Predictive analytics often represents a new way of thinking and can necessitate revamping of key organizational processes.

From a technical perspective, the most common deployment challenge is difficulty in integrating predictive analytics into the information architecture, an issue cited by half of participants. This is not surprising given the diversity of tools and databases involved in big data. Problems with accessing source data (30%), inappropriate algorithms (26%) and inaccurate results (21%) also impede use. Accessing and normalizing data sources is a significant issue as many different types of data must be incorporated to use predictive analytics optimally. Blending this data and turning it into a clean analytic data set often takes significant effort. Confirming this is the finding that data preparation is the most challenging part of the analytic process for half of the organizations in the research.

Regarding interaction with other established systems, business intelligence is most often the integration point (for 56% of companies). However, predictive analytics also is increasingly embedded in databases and middleware. The ability to perform modeling in databases is important since it enables analysts to work with large data sets and do more timely model updates and scoring. Embedding into middleware has grown fourfold since our previous research on predictive analytics in 2012; this has implications for the emerging Internet of Things (IoT), through which people will interact with an increasing array of devices.

Another sign of the broader adoption of predictive analytics is how and where buying decisions are made. Budgets for predictive analytics are shifting. Since the previous research, funding from general business budgets has declined 9 percent while funding from line-of-business IT budgets has increased 8 percent. This comports with a shift in the form in which organizations prefer to buy predictive analytics, which now is less as a stand-alone product and more embedded in other systems. Usability and functionality are still the top buying criteria, reflecting needs to simplify predictive analytics tools and address the skills gap while still being able to access a range of capabilities.

Overall the research shows that the application of predictive analytics to business processes sets high-performing organizations apart from others. Companies more often achieve competitive advantage with predictive analytics when they support the deployment of predictive analytics in business processes (66% vs. 57% overall), use business intelligence and data warehouse teams to design and deploy predictive analytics (71% vs. 58%) and fund predictive analytics as a shared service (73% vs. 58%). Similarly, those that train employees in the application of predictive analytics to business problems achieve more satisfaction and better outcomes.

Organizations looking to improve their business through predictive analytics should examine what others are doing. Since the time of our previous research, innovation has expanded and there are more peer organizations across industries and business functions that can be emulated. And the search for such innovation need not be limited to within one’s industry; cross-industry examples also can be enlightening. More concretely, the research finds that people and processes are where organizations can improve most in predictive analytics. We advise them to concentrate on streamlining processes, acquiring necessary skills and supporting both with technology available in the market. To begin, develop a practical predictive analytics strategy and enlist all stakeholders in the organization to support initiatives.


Ventana Research

Our benchmark research into big data analytics shows that marketing in the form of cross-selling and upselling (38%) and customer understanding (32%) are the top use cases for big data analytics. Related to these uses, organizations today spend billions of dollars on programs seeking customer loyalty and satisfaction. A powerful metric that impacts this spending is net promoter score (NPS), which attempts to connect brand promotion with revenue. NPS has proven to be a popular metric among major brands and Fortune 500 companies. Today, however, the advent of big data systems brings the value and the accuracy of NPS into question. It and similar loyalty metrics face displacement by big data analytics capabilities that can replace stated behavior and survey-based attitudinal data with actual behavioral data (sometimes called revealed behavior) combined with unstructured data sources such as social media. Revealed behavior shows what people have actually done and thus is a better predictor of what they will do in the future than what they say they have done or intend to do in the future. With interaction through various customer touch points (the omnichannel approach) it is possible to measure both attitudes and revealed behavior in a digital format and to analyze such data in an integrated fashion. Using innovative technology such as big data analytics can overcome three inherent drawbacks of NPS and similar customer loyalty and satisfaction metrics.

Such metrics have been part of the vernacular in boardrooms, organizational cultures and MBA programs since the 1980s, based on frameworks such as the Balanced Scorecard introduced by Kaplan and Norton. Net promoter score, a metric to inform the customer quadrant of such scorecards, is based on surveys in which participants are asked how likely they are to promote a brand based on an 11-point scale. The percentage of detractors (scores 0-6) is subtracted from the percentage of promoters (9-10) to produce the net promoter score. This score helps companies assess satisfaction around a brand and allows executives and managers to allocate resources. The underlying assumption is that attitude toward a brand is a leading indicator of intent and behavior. As such, NPS ostensibly can predict things such as churn behavior (the net number of new customers minus those leaving). By understanding attitudes and behavioral intent, marketers can intervene with actions such as timely offers and others intended to change behavior such as customers leaving.
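The scoring rule just described can be written out as a short calculation. The following is only a minimal sketch in Python: the function name and the sample ratings are invented for illustration and do not come from any survey data in our research.

```python
def net_promoter_score(ratings):
    """Return NPS on a -100..100 scale from 0-10 likelihood-to-promote ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must be on the 0-10 scale")
    promoters = sum(1 for r in ratings if r >= 9)   # scores 9-10
    detractors = sum(1 for r in ratings if r <= 6)  # scores 0-6
    # Scores of 7-8 count toward the total but toward neither group.
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical survey responses, made up for this example.
sample = [10, 9, 9, 8, 7, 6, 5, 10, 9, 8]
print(net_promoter_score(sample))  # 5 promoters, 2 detractors of 10 -> 30.0
```

Note that respondents who answer 7 or 8 affect the score only through the denominator, so two very different response distributions can produce the same NPS.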

Until recently, NPS and similar loyalty approaches have been among the most widely adopted methods to track attitudes and behaviors in customer interactions and to provide a logical way to impact and improve the customer experience. The prominence of such loyalty programs and metrics reflects an increasing focus on the customer. An indication of this increased focus is found in our next-generation customer analytics benchmark research, in which improving the customer experience (63%), improving customer service strategy (57%) and improving outcomes of interactions (51%) are the top drivers for adopting customer analytics. Nevertheless, while satisfaction and loyalty metrics such as NPS are entrenched in many organizations, there are three fundamental problems with them that can be overcome using big data analytics. Let’s look at each of these challenges and how big data analytics can overcome them.

It is prone to error. Current methods and metrics are vulnerable to errors, most deriving from one of three sources.

Coverage error results from measuring only a segment of a population and projecting the results onto the entire population. The problem here is clear if we imagine using data about California to draw conclusions about the entire United States. While researchers try to overcome such coverage error with stratified sampling methods, it necessitates significant investment usually not associated with business research. Additionally, nonresponse error, a subset of coverage error, results from people opting out of being measured.

Sample error is the statistical error associated with making conclusions about a population based on only a subset of a population. Researchers can overcome it by increasing sample sizes, but this, too, requires significant investment usually not associated with business research.

Measurement error is a complex topic that deserves an extended discussion beyond the scope here, but it presumes that analysts should start with a hypothesis and try to disprove it rather than to prove it. From there, iteration is needed to come as close to the truth as possible. In the case of NPS, measurement error can simply be the result of people not telling the truth or being unduly influenced by a recent experience that skews evaluations such as brand impression or likelihood to promote a brand. Another instance occurs when a proper response option is not represented and people are forced to give an incorrect response.
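To make the earlier point about sample error and sample size concrete, here is a rough sketch using the standard margin-of-error formula for a single survey proportion. It assumes a simple random sample and a 95 percent confidence level (z = 1.96); the function name and the sample sizes are illustrative assumptions, not figures from our research.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p observed in a
    simple random sample of size n (normal approximation)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


# Worst case is p = 0.5; compare two hypothetical sample sizes.
for n in (400, 1600):
    print(n, round(margin_of_error(0.5, n), 4))
```

Under these assumptions the error shrinks only with the square root of the sample size, so halving it requires roughly four times as many respondents, which is why larger samples demand significant investment.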

Big data can address these error vulnerabilities because it uses a census approach to data collection. Today companies can capture data about nearly every customer interaction with the brand, including customer service calls, website experiences, social media posts and transactions. Because the data is collected across the entire population and includes more revealed behavior than attitudinal and stated behavior, the error problems associated with NPS can be largely overcome.

It lacks causal linkage with financial metrics. The common claim that a higher NPS leads to increased revenue, like the presumed relationship between customer satisfaction and business outcomes, is impossible to prove in all circumstances and all industries. For instance, a pharmaceutical company trying to tie NPS to revenue might ask a doctor how likely he is to write a prescription for a certain drug. The doctor might see this as a compromising question and not be willing to answer honestly. Regarding satisfaction metrics, Microsoft in the 1990s had very low user satisfaction but high loyalty because it had a virtual monopoly. The airline industry today sees similar dynamics.

Big data analytics can show causal linkage between measurement of the customer experience and the organization’s financial metrics. It can link systems of record such as enterprise resource planning and enterprise performance management with systems of engagement such as content management, social media, marketing and sales. Collecting large data sets of customer interactions over time enables systems to relate customer experiences with purchase behaviors such as recency, frequency and size of purchase. This can be done on an ongoing basis and can be tested with randomized experiments. With big data platforms that can reduce data to the lowest common denominators in the form of key-value pairs, the only obstacles are to have the right skill sets, big data analytic software and enough data to be able to isolate variables and repeat the experiments over time. When there is enough data to do so, causal patterns emerge that can link customer attitudes and experiences directly with transactional outcomes. As long as there is enough data, such linkage can be revealed in any type of market such as wallet share in consumer packaged goods or “winner take all” markets such as automobiles.
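As an illustration of the purchase behaviors mentioned above, the sketch below derives recency, frequency and average purchase size per customer from a list of transactions. The record layout, customer identifiers and amounts are hypothetical; a real deployment would read such records from the organization’s own systems of record.

```python
from datetime import date


def purchase_behavior(purchases, as_of):
    """Summarize (customer_id, purchase_date, amount) records into
    recency in days, frequency and average purchase size per customer."""
    totals = {}
    for customer, day, amount in purchases:
        t = totals.setdefault(customer, {"last": day, "count": 0, "spend": 0.0})
        t["last"] = max(t["last"], day)  # most recent purchase date
        t["count"] += 1
        t["spend"] += amount
    return {
        customer: {
            "recency_days": (as_of - t["last"]).days,
            "frequency": t["count"],
            "avg_purchase": t["spend"] / t["count"],
        }
        for customer, t in totals.items()
    }


# Hypothetical transactions, made up for this example.
purchases = [
    ("a", date(2015, 1, 5), 40.0),
    ("a", date(2015, 3, 1), 60.0),
    ("b", date(2015, 2, 10), 25.0),
]
summary = purchase_behavior(purchases, as_of=date(2015, 3, 31))
print(summary["a"])  # recency 30 days, frequency 2, average purchase 50.0
```

Summaries like these are the kind of behavioral variables that could then be related to recorded customer experiences; the sketch covers only the aggregation step, not the causal analysis or randomized experiments.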

It lacks actionable data. Often loyalty metrics such as NPS are tied to employee compensation. Those employees have a motivation to understand the metric and what action is needed to improve the score, but that is not easy due to a number of factors. Unlike quantitative metrics such as revenue or profitability, NPS and similar loyalty metrics are softer metrics whose impacts are not easily understood. Furthermore, the measurement may happen just once or twice a year, and the composition of the sample can change over time. Often a customer satisfaction team and the consultants responsible for the research and analysis prepare the trend and driver analysis and share it with various teams, along with suggested areas for improvement and actions to take. Such information is disseminated based on aggregated data broken out by important product and service segments and perhaps customer journey timelines. The problem is that even if employees understand the metric and how to affect it, by the time the organization acts, the action is no longer timely and is not tailored to individual customers.

Big data analytics inherently has a streamlined capability to act upon data. Instead of the traditional process of reporting results and waiting months for action to be taken on those results and new results to show up in an NPS program, data can be acted upon immediately by all employees. A big reason for this is that data is now collected at a granular level for individual customers. For instance, if a customer with a high customer lifetime value (CLV) score shows signs that are precursors of switching companies, a report can be issued to show all interactions in that individual’s customer journey and highlight the most impactful events. Then an alert can be sent and a personal interaction such as a phone call or a face-to-face meeting can be set up with the objective of preventing the customer’s defection. Incentives such as a bank automatically waiving certain fees, an airline giving an upgrade to first class or a grocery store giving a gift certificate can be recommended by the system as a next best action. It can also be done on a more automated but still personalized basis where the individual customer can be discreetly addressed to see how he or she can be made happy. Each of the actions can be measured against the value of the customer and contextualized for that customer. In this way, big data analytics platforms can bring together what used to be separate analytic models and action plans related to loyalty, churn, micromarketing campaigns and next best action. It is not surprising in this context that applying predictive analytics is the most important capability for big data analytics for nearly two-thirds (64%) of organizations participating in our research.

I wrote about these ideas a few years ago, but only recently have I seen information systems capable of disrupting this entire category.
It will not happen overnight since many NPS and satisfaction programs are tied to a component of employee compensation and internal processes that are not easily changed. Furthermore, NPS can still have value as a metric to understand word of mouth around a brand and in areas that lack data and better metrics. However, as attitudinal and behavioral big data continue to be collected and big data analytics technology continues to mature, revealed behavior will always outperform attitudinal and stated behavior data. Organizations that can challenge their conventional NPS wisdom and overcome internal political obstacles are likely to see superior return from their customer experience management investments.
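
To make the earlier defection-prevention example concrete, here is a minimal Python sketch of a next-best-action rule. Every name and number in it is a hypothetical assumption for illustration: the `Customer` fields, the thresholds, the incentive costs and the rule that an incentive should cost no more than a fixed share of CLV. A real system would draw on predictive models rather than fixed thresholds.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    clv: float          # customer lifetime value, in currency units
    churn_signals: int  # count of observed precursors of switching


# Hypothetical incentives as (name, cost to the company).
INCENTIVES = [
    ("gift certificate", 25.0),
    ("fee waiver", 50.0),
    ("first-class upgrade", 300.0),
]


def next_best_action(customer, clv_threshold=1000.0,
                     signal_threshold=2, max_cost_share=0.1):
    """Recommend an action for a high-value customer at risk of defecting.

    Returns None when no intervention is triggered.
    """
    # Only act on high-CLV customers showing enough churn precursors.
    if customer.clv < clv_threshold:
        return None
    if customer.churn_signals < signal_threshold:
        return None

    # Measure each incentive against the value of the customer.
    budget = customer.clv * max_cost_share
    affordable = [item for item in INCENTIVES if item[1] <= budget]
    if not affordable:
        return "personal phone call"
    # Pick the most valuable incentive that fits within the budget.
    return max(affordable, key=lambda item: item[1])[0]


at_risk = Customer("a", clv=5000.0, churn_signals=3)
recommendation = next_best_action(at_risk)
```

In this sketch the recommendation scales with customer value, which mirrors the idea that each action is measured against the value of the customer.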

