
The concept and implementation of what is called big data are no longer new, and many organizations, especially larger ones, view it as a way to manage and understand the flood of data they receive. Our benchmark research on big data analytics shows that business intelligence (BI) is the most common type of system to which organizations deliver big data. However, BI systems aren’t a good fit for analyzing big data. They were built to provide interactive analysis of structured data sources using Structured Query Language (SQL). Big data includes large volumes of data that does not fit into rows and columns, such as sensor data, text data and Web log data. Such data must be transformed and modeled before it can fit into paradigms such as SQL.

The result is that many organizations currently run separate systems for big data and business intelligence. On one side, conventional BI tools as well as newer visual discovery tools act on structured data sources to provide fast, interactive analysis; here, analytic databases can use column-store approaches, with visualization tools as a front end for rapid interaction with the data. On the other side, big data is stored in distributed systems such as the Hadoop Distributed File System (HDFS), and a separate set of tools has been developed to access, process and analyze it. Commercial distribution companies aligned with the open source Apache Software Foundation, such as Cloudera, Hortonworks and MapR, have built ecosystems around the MapReduce processing paradigm. MapReduce works well for search-based tasks but not so well for the interactive analytics for which business intelligence systems are known. This situation has created a divide between business technology users, who gravitate to visual discovery tools that provide easily accessible, interactive data exploration, and more technically skilled users of big data tools, which require sophisticated access paradigms and lengthy query cycles to explore data.

There are two challenges with the MapReduce approach. First, working with it is a highly technical endeavor that requires advanced skills. Our big data analytics research shows that lack of skills is the most widespread reason for dissatisfaction with big data analytics, mentioned by more than two-thirds of companies. To fill this gap, vendors of big data technologies should facilitate the use of familiar interfaces, including query interfaces and programming language interfaces. For example, our research shows that standard SQL is the most important method for implementing analysis on Hadoop. To deal with this challenge, the distribution companies and others offer SQL abstraction layers on top of HDFS, such as Hive and Cloudera Impala. Companies that I have written about include Datameer and Platfora, whose systems help users interact with Hadoop data through familiar interactive paradigms such as spreadsheets and multidimensional cubes. Such systems have helped increase adoption of Hadoop and enable more than just a few experts to access big data systems.
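To make the contrast concrete, here is a minimal sketch of querying Hadoop data through a SQL layer rather than writing MapReduce jobs by hand. It assumes a hypothetical HiveServer2 endpoint and a web_logs table already defined over files in HDFS; the host, table and column names are illustrative rather than drawn from any particular deployment.

```python
# Minimal sketch: querying Hadoop data through a SQL layer (Hive) instead of
# writing MapReduce jobs directly. Assumes HiveServer2 is reachable and a
# hypothetical external table `web_logs` has been defined over files in HDFS.
from pyhive import hive  # pip install pyhive[hive]

conn = hive.connect(host="hadoop-edge-node", port=10000, username="analyst")
cursor = conn.cursor()

# Familiar SQL replaces a hand-written MapReduce job: count page views per URL.
cursor.execute("""
    SELECT url, COUNT(*) AS views
    FROM web_logs
    WHERE event_date = '2015-10-01'
    GROUP BY url
    ORDER BY views DESC
    LIMIT 10
""")

for url, views in cursor.fetchall():
    print(url, views)

cursor.close()
conn.close()
```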

The second challenge is latency. As a batch process, MapReduce must sort and aggregate all of the data before creating analytic output. Technologies such as Tez, developed by Hortonworks, and Cloudera Impala aim to address these speed limitations; the first generalizes the MapReduce paradigm into a more flexible execution engine, and the other circumvents MapReduce altogether. Adoption of these tools has moved the big data market forward, but challenges remain, such as the continuing fragmentation of the Hadoop ecosystem and a lack of standardization in approaches.

An emerging technology holds promise for bridging the gap between big data and BI in a way that can unify big data ecosystems rather than dividing them. Apache Spark, under development since 2010 at the University of California, Berkeley's AMPLab, addresses both usability and performance concerns for big data. It adds flexibility by running on multiple platforms, in terms of both cluster management (such as Hadoop YARN and Apache Mesos) and distributed storage (for example, HDFS, Cassandra, Amazon S3 and OpenStack's Swift). Spark also expands the potential uses because the platform includes an SQL abstraction layer (Spark SQL), a machine learning library (MLlib), a graph library (GraphX) and a near-real-time engine (Spark Streaming). Furthermore, Spark can be programmed using modern languages such as Python and Scala. Having all of these components integrated is important because interactive business intelligence, advanced analytics and operational intelligence can all run on big data without the complexity of the separate, often proprietary systems that previously were necessary to do the same things.
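As a rough illustration of how these components fit together, the following PySpark sketch uses Spark SQL to aggregate semistructured log data and MLlib to cluster the results, all in one program on one engine. The file path, column names and session settings are assumptions for illustration only.

```python
# Minimal PySpark sketch of the integration described above: Spark SQL for
# interactive queries and MLlib for machine learning on the same data.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("bi-and-ml-on-one-engine").getOrCreate()

# Spark SQL: load semistructured JSON logs and query them with plain SQL.
logs = spark.read.json("hdfs:///data/web_logs/2015/10/")
logs.createOrReplaceTempView("web_logs")
daily = spark.sql("""
    SELECT user_id, COUNT(*) AS page_views, SUM(duration_sec) AS total_time
    FROM web_logs
    GROUP BY user_id
""")

# MLlib: cluster users on the aggregates produced by the SQL step above.
features = VectorAssembler(
    inputCols=["page_views", "total_time"], outputCol="features"
).transform(daily)
model = KMeans(k=4, seed=42, featuresCol="features").fit(features)
model.transform(features).select("user_id", "prediction").show(5)

spark.stop()
```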

Because of this potential, Spark is becoming a rallying point for providers of big data analytics. It has become the most active Apache project as key open source contributors moved their focus from other Hadoop projects to it. Out of the Berkeley effort, Databricks was founded to commercialize open source Apache Spark and has raised more than $46 million. Since the initial release in May 2014, momentum for Spark has continued to build, and major companies have made announcements around it. IBM said it will dedicate 3,500 researchers and engineers to develop the platform and help customers deploy it; this is the largest dedicated Spark effort in the industry, akin to the move IBM made in the late 1990s with the Linux open source operating system. Oracle has built Spark into its Big Data Appliance. Microsoft offers Spark as an option on its HDInsight big data service but has also announced Prajna, an alternative to Spark. SAP has announced integration between Spark and its SAP HANA platform, although Spark represents "coopetition" for SAP's own in-memory platform. In addition, all the major business intelligence players have built or are building connectors to run on Spark. In time, Spark likely will serve as a data ingestion engine for connecting devices in the Internet of Things (IoT). For instance, Spark can integrate with technologies such as Apache Kafka or Amazon Kinesis to instantly process and analyze IoT data so that immediate action can be taken. In this way, as it is envisioned by its creators, Spark can serve as the nexus of multiple systems.
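A simplified sketch of that IoT ingestion pattern might look like the following, using Spark Streaming's Kafka integration. The broker address, topic name, payload fields and alert threshold are hypothetical.

```python
# Hedged sketch of the IoT pattern described above: Spark Streaming reads
# sensor events from a Kafka topic and flags readings that need immediate action.
import json
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="iot-ingest")
ssc = StreamingContext(sc, batchDuration=5)  # 5-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc, ["sensor-readings"], {"metadata.broker.list": "kafka-broker:9092"}
)

def flag_overheating(record):
    # Each Kafka record is a (key, value) pair; the value is a JSON payload.
    reading = json.loads(record[1])
    return reading.get("temperature_c", 0) > 90

alerts = stream.filter(flag_overheating)
alerts.pprint()  # in practice this would feed an alerting or scoring system

ssc.start()
ssc.awaitTermination()
```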

Because it is a flexible in-memory technology for big data, Spark opens the door to many new opportunities, which in business use include interactive analysis, advanced customer analytics, fraud detection, and systems and network management. At the same time, it is not yet a mature technology, and for this reason organizations considering adoption should tread carefully. While Spark may offer better performance and usability, MapReduce is already widely deployed; for those users, it is likely best to maintain the current approach and not fix what is not broken. For future big data use, however, Spark should be carefully compared to other big data technologies. Here as elsewhere, technical skills can still be a concern: Scala, for instance, one of the key languages used with Spark, has little adoption, according to our recent research on next-generation predictive analytics. Manageability is an issue, as with any nascent technology, and should be carefully addressed up front. While, as noted, vendor support for Spark is growing, frequent updates to the platform can mean disruption to systems and processes, so examine the processes for these updates, and be sure that vendor support is tied to meaningful business objectives and outcomes. Spark is an exciting new technology, and for early adopters that wish to move forward with it today, both big opportunities and challenges are in store.

Regards,

Ventana Research

One of the key findings in our latest benchmark research into predictive analytics is that companies are incorporating predictive analytics into their operational systems more often than was the case three years ago. The research found that companies are less inclined to purchase stand-alone predictive analytics tools (29% vs 44% three years ago) and more inclined to purchase predictive analytics built into business intelligence systems (23% vs 20%), applications (12% vs 8%), databases (9% vs 7%) and middleware (9% vs 2%). This trend is not surprising since operationalizing predictive analytics – that is, building predictive analytics directly into business process workflows – improves companies’ ability to gain competitive advantage: those that deploy predictive analytics within business processes are more likely to say they gain competitive advantage and improve revenue through predictive analytics than those that don’t.

In order to understand the shift that is underway, it is important to understand how predictive analytics has historically been executed within organizations. The marketing organization provides a useful example since it is the functional area where organizations most often deploy predictive analytics today. In a typical organization, those doing statistical analysis will export data from various sources into a flat file. (Often IT is responsible for pulling the data from the relational databases and passing it over to the statistician in a flat file format.) Data is cleansed, transformed, and merged so that the analytic data set is in a normalized format. It then is modeled with stand-alone tools and the model is applied to records to yield probability scores. In the case of a churn model, such a probability score represents how likely someone is to defect. For a marketing campaign, a probability score tells the marketer how likely someone is to respond to an offer. These scores are produced for marketers on a periodic basis – usually monthly. Marketers then work on the campaigns informed by these static models and scores until the cycle repeats itself.
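In compressed form, that traditional batch cycle might look something like the sketch below, which uses hypothetical file names and columns and a stand-alone modeling library to produce the monthly churn scores.

```python
# A compressed sketch of the traditional batch workflow described above:
# a flat-file extract is cleaned, a churn model is fit with a stand-alone tool
# (scikit-learn here), and probability scores are written out for marketing.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# 1. Flat-file extract handed over by IT.
customers = pd.read_csv("monthly_extract.csv")

# 2. Cleanse and transform into a modeling-ready data set.
customers = customers.dropna(subset=["tenure_months", "support_calls", "monthly_spend"])
features = customers[["tenure_months", "support_calls", "monthly_spend"]]
target = customers["churned_last_period"]

# 3. Fit the churn model and score every customer.
model = LogisticRegression().fit(features, target)
customers["churn_score"] = model.predict_proba(features)[:, 1]

# 4. Static scores delivered to marketing until the next monthly cycle.
customers[["customer_id", "churn_score"]].to_csv("churn_scores_october.csv", index=False)
```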

The challenge presented by this traditional model is that a lot can happen in a month and the heavy reliance on process and people can hinder the organization’s ability to respond quickly to opportunities and threats. This is particularly true in fast-moving consumer categories such as telecommunications or retail. For instance, if a person visits the company’s cancelation policy web page the instant before he or she picks up the phone to cancel the contract, this customer’s churn score will change dramatically and the action that the call center agent should take will need to change as well. Perhaps, for example, that score change should mean that the person is now routed directly to an agent trained to deal with possible defections. But such operational integration requires that the analytic software be integrated with the call agent software and web tracking software in near-real time.
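A bare-bones sketch of that kind of operational integration follows. The model file, feature names and routing threshold are hypothetical, and a production deployment would sit behind the call center and web tracking systems rather than a single function.

```python
# Illustrative sketch (not a reference architecture) of the operational
# integration described above: a web event arrives, the customer's churn score
# is recomputed in near real time, and the call-routing decision changes.
import joblib
import pandas as pd

model = joblib.load("churn_model.pkl")  # a previously trained churn model (hypothetical)

def handle_event(customer_features, event):
    # Update behavioral features the moment the event is observed.
    if event["page"] == "/account/cancellation-policy":
        customer_features["visited_cancellation_page"] = 1

    score = model.predict_proba(pd.DataFrame([customer_features]))[0, 1]

    # Routing decision made while the customer is still on the line.
    if score > 0.8:
        return "route_to_retention_specialist"
    return "route_to_standard_queue"
```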

Similarly, the models themselves need to be constantly updated to deal with the fast pace of change. For instance, if a competitor offers a large rebate to customers to switch service providers, a telecommunications carrier’s churn model can be rendered out of date and should be updated. Our research shows that organizations that constantly update their models gain competitive advantage more often than those that only update them periodically (86% vs. 60% on average), more often show significant improvement in organizational activities and processes (73% vs. 44%), and are more often very satisfied with their predictive analytics (57% vs. 23%).

Building predictive analytics into business processes is more easily discussed than done; complex business and technical challenges must be addressed. The skills gap that I recently wrote about is a significant barrier to implementing predictive analytics: making predictive analytics operational requires not only statistical and business skills but technical skills as well. From a technical perspective, one of the biggest challenges for operationalizing predictive analytics is accessing and preparing data, which I have also written about. Four out of ten companies say that this is the part of the predictive analytics process where they spend the most time. Choosing the right software is another challenge I have written about. Making that choice includes identifying the specific integration points with business intelligence systems, applications, database systems and middleware. These decisions will depend on how people use the various systems and which areas of the organization are looking to operationalize predictive analytics processes.

For those willing to take on the challenges of operationalizing predictive analytics, the rewards can be significant, including better competitive positioning and new revenue opportunities. Furthermore, once predictive analytics is initially deployed in an organization its use snowballs: more than nine in ten companies go on to increase their use of predictive analytics. Once companies reach that stage, one-third of them (32%) say predictive analytics has had a transformational impact and another half (49%) say it provides significant positive benefits.

Regards,

Ventana Research

Our benchmark research into predictive analytics shows that lack of resources, including budget and skills, is the number-one business barrier to the effective deployment and use of predictive analytics; awareness – that is, an understanding of how to apply predictive analytics to business problems – is second. To secure resources and address awareness problems, a business case needs to be created and communicated clearly wherever appropriate across the organization. A business case presents the reasoning for initiating a project or task. A compelling business case communicates the nature of the proposed project and the arguments, both quantifiable and unquantifiable, for its deployment.

The first steps in creating a business case for predictive analytics are to understand the audience and to communicate with the experts who will be involved in leading the project. Predictive analytics can be transformational in nature, and therefore the audience is potentially broad, including many disciplines within the organization. Understand who should be involved in creating the business case, a list that may include business users, analytics users and IT. Those most often primarily responsible for designing and deploying predictive analytics are data scientists (in 31% of organizations), the business intelligence and data warehouse team (27%), those working in general IT (16%) and line-of-business analysts (13%), so be sure to involve these groups. Understand the specific value and challenges for each of the constituencies so the business case can represent the interests of these key stakeholders. I discuss the aspects of the business where these groups will see predictive analytics most adding value here and here.

For the business case for a predictive analytics deployment to be persuasive, executives also must understand how specifically the deployment will impact their areas of responsibility and what the return on investment will be. For these stakeholders, the argument should be multifaceted. At a high level, the business case should explain why predictive analytics is important and how it fits with and enhances the organization’s overall business plan. Industry benchmark research and relevant case studies can be used to paint a picture of what predictive analytics can do for marketing (48%), operations (44%) and IT (40%), the functions where predictive analytics is used most.

A business case should show how predictive analytics relates to other relevant innovation and analytic initiatives in the company. For instance, companies have been spending money on big data, cloud and visualization initiatives where software returns can be more difficult to quantify. Our research into big data analytics and data and analytics in the cloud shows that the top benefit of these initiatives is improved communication and knowledge sharing. Fortunately, the business case for predictive analytics can cite the tangible business benefits our research identified, the most often mentioned of which are achieving competitive advantage (57%), creating new revenue opportunities (50%) and increasing profitability (46%). But the business case can be made even stronger by noting that predictive analytics can add value when it is used to leverage other current technology investments. For instance, our big data analytics research shows that the most valuable type of analytics to be applied to big data is predictive analytics.

To craft the specifics of the business case, concisely define the business issue that will be addressed. Assess the current environment and offer a gap analysis to show the difference between the current environment and the desired future environment. Offer a recommended solution, but also offer alternatives. Detail the specific value propositions associated with the change. Create a financial analysis summarizing costs and benefits. Support the analysis with a timeline including roles and responsibilities. Finally, detail the major risk factors and opportunity costs associated with the project.
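The financial analysis itself can be quite simple in structure, as in the illustrative summary below; every figure is a placeholder to be replaced with the organization's own estimates.

```python
# A minimal, purely illustrative cost/benefit summary of the kind the business
# case should contain. All figures are hypothetical placeholders.
costs = {
    "software_licensing": 150000,
    "implementation_consulting": 80000,
    "training": 25000,
    "internal_staff_time": 60000,
}
benefits = {
    "incremental_revenue": 300000,
    "operating_efficiencies": 90000,
}

total_cost = sum(costs.values())
total_benefit = sum(benefits.values())
net_benefit = total_benefit - total_cost
roi_pct = 100.0 * net_benefit / total_cost

print("Total cost:   ", total_cost)
print("Total benefit:", total_benefit)
print("Net benefit:  ", net_benefit)
print("ROI (%):      ", round(roi_pct, 1))
```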

For complex initiatives, break the overall project into a series of shorter projects. If the business case is for a project that will involve substantial work, consider providing separate timelines and deliverables for each phase. Doing so will keep stakeholders both informed and engaged during the time it takes to complete the full project. For large predictive analytics projects, it is important to break out the due-diligence phase and try not to make any hard commitments until that phase is completed. After all, it is difficult to establish defensible budgets and timelines until one knows the complete scope of the project.

Ensure that the project timeline is realistic and addresses all the key components needed for a successful deployment. In particular with predictive analytics projects, make certain that it reflects a thoughtful approach to data access, data quality and data preparation. We note that four in ten organizations say that the most time spent in the predictive analytics process is in data preparation, and another 22 percent say that they spend the most time accessing data sources. If data issues have not been well thought through, it is next to impossible for a predictive analytics initiative to be successful. Read my recent piece on operationalizing predictive analytics to see how predictive analytics can align with specific business processes.

If you are proposing the implementation of new predictive analytics software, highlight the multiple areas of return beyond competitive advantage and revenue benefits. Specifically, new software can have a lower total cost of ownership and generate direct cost savings from improved operating efficiencies. A software deployment also can yield benefits related to people (productivity, insight, fewer errors), management (creativity, speed of response), process (shorter time on task or time to complete) and information (easier access; more timely, accurate and consistent data). Create a comprehensive list of the major benefits the software will provide compared to the existing approach, quantifying the impact wherever possible. Detail all major costs of ownership, whether the implementation is on-premises or cloud-based; these will include licensing, maintenance, implementation consulting, internal deployment resources, training, hardware and other infrastructure costs. In other words, think broadly about both the costs and the sources of return in building the case for new technology. Also, read my recent piece on procuring predictive analytics software.

Understanding the audience, painting the vision, crafting the specific case, outlining areas of return, specifying software and noting risk factors, while being as comprehensive as possible, are all part of a successful business case process. Sometimes the initial phase is really just a pitch for project funding, and there won’t be any dollar allocation until people are convinced that the program will get them what they need. In such situations multiple documents may be required, including a short one- to two-page document that outlines the vision and makes a high-level argument for action to the organizational stakeholders. Once a cross-functional team and executive support are in place, a more formal assessment and design plan following the principles above will need to be built.

Predictive analytics offers significant returns for organizations willing to pursue it, but establishing a solid business case is the first step for any organization.

Regards,

Ventana Research

As I discussed recently in reviewing the state of data and analytics in the cloud, usability is a top evaluation criterion for organizations selecting cloud-based analytics software. Access to data in both cloud and on-premises systems is an essential antecedent of usability: it can help business people perform analytic tasks themselves without having to rely on IT. Some tools allow data integration by business users on an ad hoc basis, but to provide an enterprise integration process and a governed information platform, IT involvement is often necessary. Once that is done, though, using cloud-based data for analytics can help, empowering business users and improving communication and processes.

To be able to make the best decisions, organizations need access to multiple integrated data sources. The research finds that the most common data sources are predictable: business applications (51%), business intelligence applications (51%), data warehouses or operational data stores (50%), relational databases (41%) and flat files (33%). Increasingly, though, organizations also are including less structured sources such as semistructured documents (33%), social media (27%) and nonrelational database systems (19%). In addition there are important external data sources, including business applications (for 61%), social media data (48%), Internet information (42%), government sources (33%) and market data (29%). Whether stored in the cloud or locally, data must be normalized and combined into a single data set so that analytics can be performed.
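As a small, hypothetical example of that normalization and combination step, the sketch below merges a warehouse extract, a flat file and a semistructured document into a single analytic data set; the connection string, file names and columns are illustrative.

```python
# Hedged sketch of the point above: structured and semistructured sources are
# normalized and combined into a single data set before analysis.
import json
import pandas as pd
import sqlalchemy

# Relational source: the data warehouse.
engine = sqlalchemy.create_engine("postgresql://analyst@dwh-host/warehouse")
orders = pd.read_sql(
    "SELECT account_id, SUM(amount) AS revenue FROM orders GROUP BY account_id",
    engine,
)

# Flat file maintained by a business team.
regions = pd.read_csv("account_regions.csv")

# Semistructured source: JSON documents exported from a business application.
with open("accounts.json") as f:
    accounts = pd.json_normalize(json.load(f))

# Normalize the join key, then combine everything into one analytic data set.
for frame in (orders, regions, accounts):
    frame["account_id"] = frame["account_id"].astype(str)

analytic_set = (
    accounts.merge(orders, on="account_id", how="left")
            .merge(regions, on="account_id", how="left")
)
print(analytic_set.head())
```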

Given the distributed nature of data sources as well as the diversity of data types, information platforms and integration approaches are changing. While more than three in five companies (61%) still do integration primarily between on-premises systems, significant percentages are now integrating from the cloud to on-premises systems (47%) and from on-premises systems to the cloud (39%). In the future this trend will become more pronounced: according to our research, 85 percent of companies eventually will integrate cloud data with on-premises sources, and 84 percent will do the reverse. We expect that hybrid architectures, a mix of on-premises and cloud data infrastructures, will prevail in enterprise information architectures for years to come, while slowly evolving toward parity in bidirectional data transfer between the two.

Further analysis shows that a focus on integrating data for cloud analytics can give organizations competitive advantage. Those who said it is very important to integrate data for cloud-based analytics (42% of participants) also said they are very confident in their ability to use the cloud for analytics (35%); that’s three times more often than those who said integrating data is important (10%) or somewhat important (9%). Those saying that integration is very important also said more often that cloud-based analytics helps their customers, partners and employees in an array of ways, including improved presentation of data and analytics (62% vs. 43% of those who said integration is important or somewhat important), gaining access to many different data sources (57% vs. 49%) and improved data quality and data management (59% vs. 53%). These numbers indicate that organizations that neglect the integration aspects of cloud analytics are likely to be at a disadvantage compared to their peers that make it a priority.

Integration for cloud analytics is typically a manual task. In particular, almost half (49%) of organizations in the research use spreadsheets to manage the integration and preparation of cloud-based data. Yet doing so poses serious challenges: 58 percent of those using spreadsheets said it hampers their ability to manage processes efficiently. While traditional methods may suffice for integrating relatively small and well-defined data sets in an on-premises environment, they have limits when dealing with the scale and complexity of cloud-based data. The research also finds that organizations utilizing newer integration tools are satisfied with them more often than those using older tools. More than three-fourths (78%) of those using tools provided by a cloud applications provider said they are satisfied or somewhat satisfied with them, as are even more (86%) of those using data integration tools designed for cloud computing; by comparison, fewer of those using spreadsheets (56%) or traditional enterprise data integration tools (71%) are satisfied.

This is not surprising. Modern cloud connectors are designed to connect via loosely coupled interfaces that allow cloud systems to share data in a flexible manner. The research thus suggests that for organizations needing to integrate data from cloud-based data sources, switching to modern integration tools can streamline the process.
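For illustration, the pattern behind such connectors can be as simple as the following sketch, which pages through a hypothetical cloud application's REST API and loads the results for analysis; the endpoint, authentication scheme and paging fields are assumptions, not a specific vendor's interface.

```python
# Minimal sketch of pulling data from a cloud application through a loosely
# coupled REST interface, as described above. Real cloud connectors wrap and
# harden this basic pattern.
import pandas as pd
import requests

BASE_URL = "https://api.example-app.com/v1/opportunities"
HEADERS = {"Authorization": "Bearer <access-token>"}

def fetch_all(url):
    """Follow the API's paging links until every record has been retrieved."""
    records = []
    while url:
        response = requests.get(url, headers=HEADERS, timeout=30)
        response.raise_for_status()
        payload = response.json()
        records.extend(payload["items"])
        url = payload.get("next_page")  # None on the last page
    return records

opportunities = pd.DataFrame(fetch_all(BASE_URL))
print(len(opportunities), "records pulled from the cloud application")
```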

Overall three-quarters of companies in our research said that it is important or very important to access data from cloud-based sources for analysis. Cloud-based analytics isn’t useful unless the right data can be fed into the analytic process. But without capable tools this is not easy to do. A substantial impediment is that analysts spend the majority of their time in accessing and preparing the data rather than in actual analysis. Complicating the task, each data source can represent a different, possibly complex, data model. Furthermore, the data sets may have varying data formats and interface requirements, which are not easily addressed with legacy integration tools.

Such complexity is the new reality, and new tools and approaches have come to market to address these complexities. For organizations looking to integrate their data for cloud-based analytics, we recommend exploring these new integration processes and technologies.

Regards,

Ventana Research
