You are currently browsing the monthly archive for November 2014.

PentahoWorld, the first user conference for this 10-year-old supplier of data integration and business intelligence that provides business analytics, attracted more than 400 customers in roles ranging from IT and database professionals to business analysts and end users. The diversity of the crowd reflects Pentaho’s broad portfolio of products. It covers the integration aspects of big data analytics with the Pentaho Data Integration tools and the front-end tools and visualization with the Pentaho Business Analytics. In essence its portfolio provides end-to-end data to analytics through what they introduced as Big Data Orchestration that brings governed data delivery and streamlined data refinery together on one platform.

vr_BDI_03_plans_for_big_data_technologyPentaho has made progress in business over the past year, picking up Fortune 1000 clients and moving from providing analytics to midsize companies to serving more major companies such as Halliburton, Lufthansa and NASDAQ. One reason for this success is Pentaho’s ability to integrate large scale data from multiple sources including enterprise data warehouses, Hadoop and other NoSQL approaches. Our research into big data integration shows that Hadoop is a key technology that 44 percent of organizations are likely to use, but it is just one option in the enterprise data environment. A second key for Pentaho has been the embeddable nature of its approach, which enables companies, especially those selling cloud-based software as a service (SaaS), to use analytics to gain competitive advantage by placing its tools within their applications. For more detail on Pentaho’s analytics and business intelligence tools please my previous analytic perspective.

A key advance for the company over the past year has been the development and refinement of what the company calls big data blueprints. These are general use cases in such areas as ETL offloading and customer analytics. Each approach includes design patterns for ETL and analytics that work with high-performance analytic databases including NoSQL variants such as Mongo and Cassandra.

The blueprint concept is important for several reasons. First, it helps Pentaho focus on specific market needs. Second, it shows customers and partners processes that enable them to get immediate return on the technology investment. The same research referenced above shows that organizations manage their information and technology better than their people and processes; to realize full value from spending on new technology, they need to pay more attention to how the technology fits with these cultural aspects.

vr_Info_Optimization_09_most_important_end_user_capabilitiesAt the user conference, the company announced release 5.2 of its core business analytics products and featured its Governed Data Delivery concept and Streamlined Data Refinery. The Streamlined Data Refinery provides a process for business analysts to access the already integrated data provided through PDI and create data models on the fly. The advantage is that this is not a technical task and the business analyst does not have to understand the underlying metadata or the data structures. The user chooses the dimensions of the analysis using menus that offer multiple combinations to be chosen in an ad hoc manner. Then the Streamlined Data Refinery automatically generates a data cube that is available for fast querying of an analytic database. Currently, Pentaho supports only the HP Vertica database, but its roadmap promises to add high-performance databases from other suppliers. The entire process can take only a few minutes and provides a much more flexible and dynamic process than asking IT to rebuild a data model every time a new question is asked.

While Pentaho Data Integration enables users to bring together all available data and integrate it to find new insights, Streamlined Data Refinery gives business users direct access to the blended data. In this way they can explore data dynamically without involving IT. The other important aspect is that it easily provides the lineage of the data. Internal or external auditors often need to understand the nature of the data and the integration, which data lineage supports. Such a feature should benefit all types of businesses but especially those in regulated industries. This approach addresses the two top needs of business end users, which according to our benchmark research into information optimization, are to drill into data (for 37%) and search for specific information (36%).

Another advance is Pentaho 5.2’s support for Kerberos security on Cloudera, Hortonworks and MapR. Cloudera, currently the largest Hadoop distribution, and Hortonworks, which is planning to raise capital via a public offering, hold the lion’s share of the commercial Hadoop market. Kerberos puts a layer of authentication security between the Pentaho Data Integration tool and the Hadoop data. This helps address security concerns which have dramatically increased over the past year after major breaches at retailers, banks and government institutions.

These announcements show results of Pentaho’s enterprise-centric customer strategy as well as the company’s investment in senior leadership. Christopher Dziekan, the new chief product officer, presented a three-year roadmap that focuses on data access, governance and data integration. It is good to see the company put its stake in the ground with a well-formed vision of the big data market. Given the speed at which the market is changing and the necessity for Pentaho to consider the needs of its open source community, it will be interesting to see how the company adjusts the roadmap going forward.

For enterprises grappling with big data integration and trying to give business users access to new information sources, Pentaho’s Streamlined Data Refinery deserves a look. For both enterprises and ISVs that want to apply integration and analytics in context of another application, Pentaho’s REST-based APIs allow embedding of end-to-end analytic capabilities. Together with the big data blue prints discussed above, Pentaho is able to deliver a targeted yet flexible approach to big data.


Ventana Research

vr_oi_information_sources_for_operational_intelligenceAt a conference of more than 3,500 users, Splunk executives showed off their company’s latest tools. Splunk makes software for discovering, monitoring and analyzing machine data, which is often considered data exhaust since it is a by-product of computing processes and applications. But machine data is essential to a smoothly running technology infrastructure that supports business process. One advantage is that because machine data not recorded by end users, it is less subject to input error. Splunk has grown rapidly by solving fundamental problems associated with the complexities of information technology and challenging assumptions in IT systems and network management that is rapidly being referred to as big data analytics. The two main and related assumptions it challenges are that different types of IT systems should be managed separately and that data should be modeled prior to recording it. Clint Sharp, Splunk’s director of product marketing, pointed out that network and system data can come from several sources and argued that utilizing point solution tools and a “model first” approach does not work when it has to deal with big data and a question-and-answer paradigm. Our research into Operational Intelligence finds that IT systems are most important information source in almost two thirds (62%) of organizations. Splunk used the conference to show how it has brought to these data management innovations the business trends of mobility, cloud deployment and security.

Presenters from major customer companies demonstrated how they work with Splunk Enterprise. For example, according to Michael Connor, senior platform architect for Coca-Cola, bringing all the company’s data into Splunk allowed the IT department to reduce trouble tickets by 80 percent and operating costs by 40 percent. Beyond asserting the core value of streamlining IT operations and the ability to quickly provision system resources, Connor discussed other uses for data derived from the Splunk product. Coca-Cola IT used a free community add-on to deliver easy-to-use dashboards for the security team. He showed how channel managers compare different vending environments in ways they had never done before. They also can conduct online ethnographic studies to better understand behavior patterns and serve different groups. For Coca-Cola, the key to success for the application was to bring data from various platforms in the organization into one data platform. This challenge, he said, is more to do with people and processes than technology, since many parts of an organization are protective of their data, in effect forming what he called “data cartels.” This situation is not uncommon. Our research into information optimization shows that organizations need these so-called softer disciplines to catch up with their capabilities in technology and information to realize full value from data and analytics initiatives.

In keeping up with trends, Splunk is making advances in mobility. One is MINT for monitoring mobile devices. With the company’s acquisition of BugSense as a foundation, Splunk has built an extension of its core platform that consumes and indexes application and other machine data from mobile devices. The company is offering the MINT Express version to developers so they can build the operational service into their applications. Similar to the core product, MINT has the ability to track transactions, network latency and crashes throughout the IT stack. It can help application developers quickly solve user experience issues by understanding root causes and determining responsibility. For instance, MINT Express can answer questions such as these: Is it an application issue or a carrier issue? Is it a bad feature or a system problem? After it is applied, end-user customers get a better digital experience which results in more time spent with the application and increased customer loyalty in a mobile environment where the cost of switching is low. Splunk also offers MINT Enterprise, which allows users to link and cross-reference data in Splunk Enterprise. The ability to instrument data in a mobile environment, draw a relationship with the enterprise data  and display key operational variables is critical to serving and satisfying consumers. By extending this capability into the mobile sphere, Splunk MINT delivers value for corporate IT operations as well as the new breed of cloud software providers. However, Splunk risks stepping on its partners’ toes as it takes advantage of certain opportunities as in mobility. In my estimation, the risk is worth taking given that mobility is a systemic change that represents enormous opportunity. Our research into business technology innovation shows mobility in a virtual tie with collaboration for the second-most important innovation priority for companies today.

vr_Info_Optimization_17_with_more_sources_easy_access_more_importantCloud computing is another major shift that the company is prioritizing. Praveen Rangnath, director of cloud product marketing, said that Splunk Cloud enables the company to deliver 100 percent on service level agreements through fail-over capabilities across AWS availability zones, redundant operations across indexers and search heads, and by using Splunk on Splunk itself. Perhaps the most important capability of the cloud product is its integration of enterprise and on-demand systems. This capability allows a single view and queries across multiple data sources no matter where they physically reside. Coupled with Splunk’s abilities to ingest data from various NoSQL systems – such as Mongo, Cassandra, Accumulo, Amazon’s Elastic Map Reduce, Amazon S3 and even mainframes – with the Ironstream crawler, its hybrid search capability is unique. The company’s significant push into the cloud is reflected by both a 33 percent reduction in price and its continued investment into the platform. According to our research into information optimization one of the biggest challenges with big data is simplification of data access; as data sources increase easy access becomes more important. More than 92 percent of organizations that have  16 to 20 data sources rated information simplification very important. As data proliferates both on-premises and in the cloud, Splunk’s software abstracts users from the technical complexities of integrating and accessing the hybrid environment. (Exploring this and related issues, our upcoming benchmark research into data and analytics in the cloud will examine trends in business intelligence and analytics related to cloud computing.)

vr_ngbi_br_importance_of_bi_technology_considerations_updatedUsability is another key consideration: In our research on next-generation business intelligence nearly two-thirds (63%) of organizations said that is an important evaluation criterion, more than any other one. At the user conference Divanny Lamas, senior manager of product management, discussed new features aimed at the less sophisticated Splunk user. Advanced Feature Extractor enables users to extract fields in a streamlined fashion that does not require them to write an expression. Instant Pivot enables easy access to a library of panels and dashboards that allows end users to pivot and visually explore data. Event Pattern Detection clusters patterns in the data to make different product usage metrics and issues impacting downtime easier to resolve. Each of these advances represents progress in broader usability and organizational appeal. While Splunk continues to make its data accessible to business users, gaining broader adoption is still an uphill battle because much of the Splunk data is technical in nature. The current capabilities address the technologically sophisticated knowledge worker or the data analyst, while a library of plug-ins allows more line-of-business end-users to perform visualization. (For more on the analytic user personas that matter in the organization and what they need to be successful, please see my analysis.)

Splunk is building an impressive platform for collecting and analyzing data across the organization. The question from the business analytics perspective is whether the data can be modeled in ways that easily represent each organization’s unique business challenges. Splunk provides search capabilities for IT data by default, but when other data sources need to be brought in for more advanced reporting and correlation, it requires the data to be normalized, categorized and parsed. Currently, business users apply various data models and frameworks from major IT vendors as well as various agencies and data brokers. This dispersion could provide an opportunity for Splunk to provide a unified platform; the more data businesses ingest, the more likely they will rely on such a platform. Splunk’s Common Information Model provides a metadata framework using key-value pair representation similar to what other providers of cloud analytic applications are doing. When we consider the programmable nature of the platform including RESTful APIs and various SDKs, HUNK’s streamlined access to Hadoop and other NoSQL sources, Splunk BD connect for relational sources, the Splunk Cloud hybrid access model and the instrumentation of mobile data in MINT, the expansive platform idea seems plausible.

vr_Big_Data_Analytics_04_types_of_big_data_for_analyticsA complicating factor as to whether Splunk will become such a platform for operational intelligence and big data analytics is the Internet of Things (IoT), which collects data from various devices. Massive amounts of sensor data already are moving through the Internet, but IoT approaches and service architectures are works in progress. Currently, many of these architectures do not communicate with others. Given Splunk’s focus on machine data which is a key type of input for big data analytics in 42 percent of organizations according to our research, IoT appears to be a natural fit as it is generating event-centered data which is a type of input for big data analytics in 48 percent of organizations. There is some debate about whether Splunk is a true event processing engine, but that depends on how the category is defined. Log messages, its specialty, are not events per se but rather are data related to something that has happened in an IT infrastructure. Once correlated, this data points directly to something of significance, including events that can be acted upon. If such a correlation triggers a system action, and that action is taken in time to solve the problem, then the data provides value and it should not matter if the system is acting in real time or near real time. In this way, the data itself is Splunk’s advantage. To be successful in becoming a broader data platform, the company will need to advance its Common Information Model, continue to emphasize the unique value of machine data, build their developer and partner ecosystem, and encourage customers to push the envelope and develop new use cases.

For organizations considering Splunk for the first time, IT operations, developer operations, security, fraud management and compliance management are obvious areas to evaluate. Splunk’s core value is that it simplifies administration, reduces IT costs and can reduce risk through pattern recognition and anomaly detection. Each of these areas can deliver value immediately. For those with a current Splunk implementation, we suggest examining use cases related to business analytics. Specifically, comparative analysis and analysis of root causes, online ethnography and feature optimization in the context of the user experience can all deliver value. As ever more data comes into their systems, companies also may find it reasonable to consider Splunk in other new ways like big data analytics and operational intelligence.


Ventana Research

RSS Tony Cosentino’s Analyst Perspectives at Ventana Research

  • An error has occurred; the feed is probably down. Try again later.

Tony Cosentino – Twitter

Error: Twitter did not respond. Please wait a few minutes and refresh this page.


  • 73,783 hits
%d bloggers like this: