You are currently browsing the category archive for the ‘IT Analytics & Performance’ category.

Splunk’s annual gathering, this year called .conf 2015, in late September hosted almost 4,000 Splunk customers, partners and employees. It is one of the fastest-growing user conferences in the technology industry. The area dedicated to Splunk partners has grown from a handful of booths a few years ago to a vast showroom floor many times larger. While the conference’s main announcement was the release of Splunk Enterprise 6.3, its flagship platform, the progress the company is making in the related areas of machine learning and the Internet of Things (IoT) most caught my attention.

Splunk’s strength is its ability to index, normalize, correlate and query data throughout the technology stack, including applications, servers, networks and sensors. It uses distributed search that enables correlation and analysis of events across local- and wide-area networks without moving vast amounts of data. Its architectural approach unifies cloud and on-premises implementations and provides extensibility for developers building applications. Originally, Splunk provided an innovative way to troubleshoot complex technology issues, but over time new uses for Splunk-based data have emerged, including digital marketing analytics, cyber security, fraud prevention and connecting digital devices in the emerging Internet of Things. Ventana Research has covered Splunk since its establishment in the market, most recently in this analysis of mine.

Splunk’s experience in dealing directly with distributed, time-series data and processes on a large scale puts it in position to address the Internet of Things from an industrial perspective. This sort of data is at the heart of large-scale industrial control systems, but it often comes in different formats and its implementation is based on different formats and protocols. For instance, sensor technology and control systems that were invented 10 to 20 years ago use very different technology than modern systems. Furthermore, as with computer technology, there are multiple layers in stack models that have to communicate. Splunk’s tools help engineers and systems analysts cross-reference these disparate systems in the same way that it queries computer system and network data, however, the systems can be vastly different. To address this challenge, Splunk turns to its partners and its extensible platform. For example, Kepware has developed plug-ins that use its more than 150 communication drivers so users can stream real-time industrial sensor and machine data directly into the Splunk platform. Currently, the primary value drivers for organizations in this field of the industrial IoT are operational efficiency, predictive maintenance and asset management. At the conference, Splunk showcased projects in these areas including one with Target that uses Splunk to improve operations in robotics and manufacturing.

For its part, Splunk is taking a multipronged approach by acquiring companies, investing in internal development and enabling its partner ecosystem to build new products. One key enabler of its approach to IoT is machine learning algorithms built on the Splunk platform. In machine learning a model can use new data to continuously learn and adapt its answers to queries. This differs from conventional predictive analytics, in which users build models and validate them based on a particular sample; the model does not adapt over time. With machine learning, for instance, if a piece of equipment or an automobile shows a certain optimal pattern of operation over time, an algorithm can identify that pattern and build a model for how that system should behave. When the equipment begins to act in a less optimal or anomalous way, the system can alert a human operator that there may be a problem, or in a machine-to-machine situation, it can invoke a process to solve the problem or recalibrate the machine.

Machine learning algorithms allow event processes to be audited, analyzed and acted upon in real time. They enable predictive capabilities for maintenance, transportation and logistics, and asset management and can also be applied in more people-oriented domains such as fraud prevention, security, business process improvement, and digital products.  IoT potentially can have a major impact on business processes, but only if organizations can realign systems to discover-and-adapt rather than model-and-apply approaches. For instance, processes are often carried out in an uneven fashion different from the way the model was conceived and communicated through complex process documentation and systems. As more process flows are directly instrumented and more processes carried out by machines, the ability to model directly based on the discovery of those event flows and to adapt to them (through human learning or machine learning) becomes key to improving organizational processes. Such realignment of business processes, however, often involves broad organizational transformation.Our benchmark research on operational intelligence shows that challenges associated with people and processes, rather than information and technology, most often hold back organizational improvement.

Two product announcements made at the conference illuminate the direction Splunk is taking with IoT and machine learning. The first is User Behavior Analytics (UBA), based VR2015_InnovationAwardWinneron its acquisition of Caspida, which produces advanced algorithms that can detect anomalous behavior within a network. Such algorithms can model internal user behavior, and when behavior deviates from the specified norm, it can generate an alert that can be addressed through investigative processes usingSplunk Enterprise Security 4.0. Together, Splunk Enterprise Security 4.0 and UBA won the 2015 Ventana Research CIO Innovation Award.The acquisition of Caspida shows that Splunk is not afraid to acquire companies in niche areas where they can exploit their platform to deliver organizational value. I expect that we will see more such acquisitions of companies with high value ML algorithms as Splunk carves out specific positions in the emergent markets.

The other product announced is IT Service Intelligence (ITSI), which highlights machine learning algorithms alongside of Splunk’s core capabilities. The IT Service Intelligence App is an application in which end users deploy machine learning to see patterns in various IT service scenarios. ITSI can inform and enable multiple business uses such as predictive maintenance, churn analysis, service level agreements and chargebacks. Similar to UBA, it uses anomaly detection to point out issues and enables managers to view highly distributed processes such as claims process data in insurance companies. At this point, however, use of ITSI (like other areas of IoT) may encounter cultural and political issues as organizations deal with changes in the roles of IT and operations management. Splunk’s direction with ITSI shows that the company is staying close to its IT operations knitting as it builds out application software, but such development also puts Splunk into new competitive scenarios where legacy technology and processes may still be considered good enough.

We note that ITSI is built using Splunk’s Machine Learning Toolkit and showcase, which currently is in preview mode. The vr_Big_Data_Analytics_08_top_capabilities_of_big_data_analyticsplatform is an important development for the company and fills one of the gaps that I pointed out in its portfolio last year. Addressing this gap enables Splunk and its partners to create services that apply advanced analytics to big data that almost half (45%) of organizations find important. The use of predictive and advanced analytics on big data I consider a killer application for big data; our benchmark research on big data analytics backs this claim: Predictive analytics is the type of analytics most (64%) organizations wish to pursue on big data.

Organizations currently looking at IoT use cases should consider Splunk’s strategy and tools in the context of specific problems they need to address. Machine learning algorithms built for particular industries are key so it is important to understand if the problem can be addressed using prebuilt applications provided by Splunk or one of its partners, or if the organization will need to build its own algorithms using the Splunk machine learning platform or alternatives. Evaluate both the platform capabilities and the instrumentation, the type of protocols and formats involved and how that data will be consumed into the system and related in a uniform manner. Most of all, be sure the skills and processes in the organization align with the technology from an end user and business perspective.

Regards,

Ventana Research

The concept and implementation of what is called big data are no longer new, and many organizations, especially larger ones, view it as a way to manage and understand the flood of data they receive. Our benchmark research on big data analytics shows that business intelligence (BI) is the most common type of system to which organizations deliver big data. However, BI systems aren’t a good fit for analyzing big data. They were built to provide interactive analysis of structured data sources using Structured Query Language (SQL). Big data includes large volumes of data that does not fit into rows and columns, such as sensor data, text data and Web log data. Such data must be transformed and modeled before it can fit into paradigms such as SQL.

The result is that currently many organizations run separate systems for big data and business intelligence. On one system, conventional BI tools as well as new visual discovery tools act on structured data sources to do fast interactive analysis. In this area analytic databases can use column store approaches and visualization tools as a front end for fast interaction with the data. On other systems, big data is stored in distributed systems such as the Hadoop Distributed File System (HDFS). Tools that use it have been developed to access, process and analyze the data. Commercial distribution companies aligned with the open source Apache Foundation, such as Cloudera, Hortonworks and MapR, have built ecosystems around the MapReduce processing paradigm. MapReduce works well for search-based tasks but not so well for the interactive analytics for which business intelligence systems are known. This situation has created a divide between business technology users, who gravitate to visual discovery tools that provide easily accessible and interactive data exploration, and more technically skilled users of big data tools that require sophisticated access paradigms and elongated query cycles to explore data.

vr_Big_Data_Analytics_07_dissatisfaction_with_big_data_analyticsThere are two challenges with the MapReduce approach. First, working with it is a highly technical endeavor that requires advanced skills. Our big data analytics research shows that lack of skills is the most widespread reason for dissatisfaction with big data analytics, mentioned by more than two-thirds of companies. To fill this gap, vendors of big data technologies should facilitate use of familiar interfaces including query interfaces and programming language interfaces. For example, our research shows that Standard SQL is the most important method for implementing analysis on Hadoop. To deal with this challenge, the distribution companies and others offer SQL abstraction layers on top of HDFS, such as HIVE and Cloudera Impala. Companies that I have written about include Datameer and Platfora, whose systems help users interact with Hadoop data via interactive systems such as spreadsheets and multidimensional cubes. With their familiar interaction paradigms such systems have helped increase adoption of Hadoop and enable more than a few experts to access big data systems.

The second challenge is latency. As a batch process MapReduce must sort and aggregate all of the data before creating analytic output. Technology such as Tez, developed by Hortonworks, and Cloudera Impala aim to address such speed limitations; the first leverages MapReduce, and the other circumvents MapReduce altogether. Adoption of these tools has moved the big data market forward, but challenges remain such as the continuing fragmentation of the Hadoop ecosystem and a lack of standardization in approaches.

An emerging technology holds promise for bridging the gap between big data and BI in a way that can unify big data ecosystems rather than dividing them. Apache Spark, under development since 2010 at the University of California Berkeley’s AMPLab, addresses both usability and performance concerns for big data. It adds flexibility by running on multiple platforms in terms of both clustering (such as Hadoop YARN and Apache Mesos) and distributed storage (for example, HDFS, Cassandra, Amazon S3 and OpenStack’s Swift). Spark also expands the potential uses because the platform includes an SQL abstraction layer (Spark SQL), a machine learning library (MLlib), a graph library (GraphX) and a near-real-time engine (Spark Streaming). Furthermore, Spark can be programmed using modern languages such as Python and Scala. Having all of these components integrated is important because interactive business intelligence, advanced analytics and operational intelligence on big data all can work without dealing with the complexity of having individual proprietary systems that were necessary to do the same things previously.

Because of this potential Spark is becoming a rallying point for providers of big data analytics. It has become the most active Apache project as key open source contributors moved their focus from other Hadoop projects to it. Out of the effort in Berkeley, Databricks was founded for commercial development of open source Apache Spark and has raised more than $46 million. Since the initial release in May 2014 the momentum for Spark has continued to build; major companies have made announcements around Apache Spark. IBM said it will dedicate 3,500 researchers and engineers to develop the platform and help customers deploy it. This is the largest dedicated Spark effort in the industry, akin to the move IBM made in the late 1990s with the Linux open source operating system. Oracle has built Spark into its Big Data Appliance. Microsoft has Spark as an option on its HDInsight big data approach but has also announced Prajna, an alternative approach to Spark. SAP has announced integration with its SAP HANA platform, although it represents “coopetition” for SAP’s in-memory platform. In addition, all the major business intelligence players have built or are building connectors to run on Spark. In time, Spark likely will serve as a data ingestion engine for connecting devices in the Internet of Things (IoT). For instance, Spark can integrate with technologies such as Apache Kafka or Amazon Kinesis to instantly process and analyze IoT data so that immediate action can be taken. In this way, as it is envisioned by its creators, Spark can serve as the nexus of multiple systems.

Because it is a flexible in-memory technology for big data, Spark opens the door to many new opportunities, which in business use include interactive analysis, advanced customer analytics,VentanaResearch_NextGenPredictiveAnalytics_BenchmarkResearchfraud detection, and systems and network management. At the same time, it is not yet a mature technology and for this reason,  organizations considering adoption should tread carefully. While Spark may offer better performance and usability, MapReduce is already widely deployed. For those users, it is likely best to maintain the current approach and not fix what is not broken. For future big data use, however, Spark should be carefully compared to other big data technologies. In this case as well as others, technical skills can still be a concern. Scala, for instance, one of the key languages used with Spark, has little adoption, according to our recent research on next-generation predictive analytics. Manageability is an issue as for any other nascent technology and should be carefully addressed up front. While, as noted, vendor support for Spark is becoming apparent, frequent updates to the platform can mean disruption to systems and processes, so examine the processes for these updates. Be sure that vendor support is tied to meaningful business objectives and outcomes. Spark is an exciting new technology, and for early adopters that wish to move forward with it today, both big opportunities and challenges are in store.

Regards,

Ventana Research

RSS Tony Cosentino’s Analyst Perspectives at Ventana Research

  • An error has occurred; the feed is probably down. Try again later.

Tony Cosentino – Twitter

Stats

  • 72,942 hits
%d bloggers like this: