
PentahoWorld 2015, Pentaho’s second annual user conference, held in mid-October, centered on the general availability of release 6.0 of its data integration and analytics platform and on its acquisition by Hitachi Data Systems (HDS) earlier this year. Company spokespeople detailed the development of the product in relation to the roadmap laid out in 2014 and outlined plans for integrating it with the portfolios of HDS and its parent company, Hitachi. They also discussed Pentaho’s and HDS’s shared intentions regarding the Internet of Things (IoT), particularly in telecommunications, healthcare, public infrastructure and IT analytics.

Pentaho competes on the basis of what it calls a “streamlined data refinery,” which provides a flexible way to access, transform and integrate data and to embed and present analytic data sets in usable formats without writing new code. In addition, it integrates a visual analytic workflow interface with a business intelligence front end that includes customization extensions. This is a differentiator for the company, since much of the self-serve analytics market in which it competes is still dominated by separate point products.

Pentaho 6 aims to provide manageable and scalable self-service analytics. A key advance in the new version is what Pentaho calls “virtualized data sets,” which logically aggregate multiple data sets according to transformations and integration specified in the Pentaho Data Integration (PDI) analytic workflow interface. This virtual approach allows the physical processing to be executed close to the data in various systems such as Hadoop or an RDBMS, which relieves users of the burden of continually moving data back and forth between the query and response systems. In this way, logical data sets can be served up in a timely and flexible manner for consumption in Pentaho Analytics as well as in other front-end interfaces.
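To make the idea of a virtual data set concrete, here is a minimal Python sketch of the general pushdown pattern; it is not Pentaho’s implementation or API. It assumes two independent in-memory SQLite databases standing in for, say, an RDBMS and a Hadoop store, and the `VIRTUAL_DATASET` definition and `query_virtual_dataset` function are hypothetical names. Each source runs its own aggregation, and only the small aggregated results are combined at the logical layer.

```python
import sqlite3

# Two independent in-memory databases stand in for separate systems
# (for example an RDBMS and a Hadoop store); the data is made up.
orders_db = sqlite3.connect(":memory:")
orders_db.executescript("""
    CREATE TABLE orders (region TEXT, amount REAL);
    INSERT INTO orders VALUES ('east', 100), ('east', 50), ('west', 70);
""")
events_db = sqlite3.connect(":memory:")
events_db.executescript("""
    CREATE TABLE clicks (region TEXT, user_id INTEGER);
    INSERT INTO clicks VALUES ('east', 1), ('east', 2), ('west', 3), ('north', 4);
""")

# Hypothetical virtual data set: a list of (connection, SQL) legs.
# The aggregation in each SQL statement is pushed down to the system that owns the data.
VIRTUAL_DATASET = [
    (orders_db, "SELECT region, SUM(amount) FROM orders GROUP BY region"),
    (events_db, "SELECT region, COUNT(*) FROM clicks GROUP BY region"),
]


def query_virtual_dataset(legs):
    """Run each leg at its source, then inner-join the aggregated results on the first column."""
    results = [dict(conn.execute(sql).fetchall()) for conn, sql in legs]
    shared_keys = set.intersection(*(set(r) for r in results))
    return {key: tuple(r[key] for r in results) for key in sorted(shared_keys)}


print(query_virtual_dataset(VIRTUAL_DATASET))  # {'east': (150.0, 2), 'west': (70.0, 1)}
```

Only two small aggregated result sets leave their sources here, which is the point of executing work close to the data; a production engine would presumably also push filters and joins down wherever each source supports them.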

One challenge that emerges when accessing multiple integrated and transformed data sets is data lineage. Tracking lineage is important for establishing users’ trust in the data, because it enables them to ascertain where the data originated before it was transformed and integrated. This is particularly useful in regulated industries that may need to access and track source data to prove compliance. Lineage becomes even more complicated with event data, since each event must be traced completely back to its source and events occur in large numbers in more than a third of the organizations in our operational intelligence benchmark research, which examined operations-centric analytics and business intelligence.

Similarly, Pentaho 6 uses Simple Network Management Protocol (SNMP) to deliver application programming interface (API) extensions so that third-party tools can help provide governance lower in the system stack and further improve the reliability of data. Our benchmark research consistently shows that manageability of systems is important for user organizations, particularly in big data environments.

The flexibility introduced with virtual tables and the improvements in Pentaho 6.0 around in-line modeling (a concept I discussed after last year’s event) are two critical means of building self-service analytic environments. Marrying various data systems with different data models, sometimes referred to as big data integration, has proven to be a difficult challenge in such environments. Pentaho’s continued focus on big data integration and its provision of an integration backbone for many business intelligence tools (in addition to its own) are potential competitive differentiators for the company. While analysts and users prefer integrated tool sets, today’s fragmented analytics market is increasingly dominated by separate tools that prepare data and surface data for consumption. Front-end tools alone cannot automate the big data integration process, which Pentaho PDI can do.

Our research into big data integration shows the importance of eliminating manual tasks in this process: 78 percent of companies said it is important or very important to automate their big data integration processes. Pentaho’s ability to integrate with multiple visual analytics tools is important for the company, especially in light of the HDS accounts, which likely use a variety of front-end tools. In addition, the ability to provide an integrated front end can be attractive to independent software vendors, analytics services providers and certain end-user organizations that would like to embed both integration and visualization without having to license multiple products.

Going forward, Pentaho is focused on joint opportunities with HDS such as the emerging Internet of Things. Pentaho cites established industrial customers such as Halliburton, Intelligent Mechatronic Systems and Kirchoff Datensysteme Software as reference accounts for IoT. In addition, a conference participant from Caterpillar Marine Asset Intelligence shared how it embeds Pentaho to help analyze and predict failures of maritime equipment. Pentaho’s ability to integrate and analyze multiple data sources is key to delivering business value in each of these environments, but the company also possesses a little-known asset in the Weka machine learning library, which is an integrated part of the product suite. Our research on next-generation predictive analytics finds that Weka is used by 5 percent of organizations, and many of the companies that use it are large or very large, which is Pentaho’s target market. Given the importance of machine learning in the IoT category, it will be interesting to see how Pentaho leverages this asset.

Also at the conference, an HDS spokesperson discussed the company’s target markets for IoT, or what it calls “social innovation.” These markets include telecommunications, healthcare, public infrastructure and IT analytics and reflect HDS’s customer base and the core businesses of its parent company Hitachi. Pentaho Data Integration is currently embedded within major customer environments such as Caterpillar, CERN, FINRA, Halliburton, NASDAQ, Sears and Staples, but not all of these companies fit directly into the IoT segments HDS outlined. While Hitachi’s core businesses provide fertile ground in which to grow its business, Pentaho will need to develop integration with the large industrial control systems already in place in those organizations.

The integration of Pentaho into HDS is a key priority. The 2,000-strong global sales force of HDS is now incented to sell Pentaho, and it will be important for the reps to include it as they discuss their accounts’ needs. While Pentaho’s portfolio can potentially broaden sales opportunities for HDS, big data software is a more consultative sale than the price-driven hardware and systems the sales force may be used to. Furthermore, the buying centers, which are shifting from IT to lines of business, can differ significantly depending on the type of organization and its objectives. Addressing this will require significant training within the HDS sales force and with partner consulting channels. The joint sales efforts will be well served by emphasizing the “big data blueprints” Pentaho has developed over the last couple of years and by developing new ones that speak to IoT and the combined capabilities of the two companies.

HDS says it will begin to embed Pentaho into its product portfolio but has promised to leave Pentaho’s roadmap intact. This is important because Pentaho has done a good job of listening to its customers and addressing the complexities that exist in big data and open source environments. As the next chapter unfolds, I will be looking at how the company integrates its platform with the HDS portfolio and expands it to deal with the complexities of IoT, which we will be investigating in an upcoming benchmark research study.

Organizations that need to use large-scale integrated data sets should consider Pentaho, which provides one of the most flexible yet mature tools in the market. Its analytics tool provides an integrated and embeddable front end that should be of particular interest to analytics services providers and independent software vendors seeking to make information management and data analytics core capabilities. For existing HDS customers, the Pentaho portfolio will open conversations in new areas of those organizations and potentially add considerable value within accounts.


Ventana Research

PentahoWorld, the first user conference held by this 10-year-old supplier of data integration and business analytics software, attracted more than 400 customers in roles ranging from IT and database professionals to business analysts and end users. The diversity of the crowd reflects Pentaho’s broad portfolio of products. It covers the integration aspects of big data analytics with the Pentaho Data Integration tools and the front-end tools and visualization with Pentaho Business Analytics. In essence, its portfolio takes users from data to analytics end to end through what the company introduced as Big Data Orchestration, which brings governed data delivery and streamlined data refinery together on one platform.

Pentaho has made progress in business over the past year, picking up Fortune 1000 clients and moving from providing analytics to midsize companies to serving major companies such as Halliburton, Lufthansa and NASDAQ. One reason for this success is Pentaho’s ability to integrate large-scale data from multiple sources, including enterprise data warehouses, Hadoop and other NoSQL approaches. Our research into big data integration shows that Hadoop is a key technology that 44 percent of organizations are likely to use, but it is just one option in the enterprise data environment. A second key for Pentaho has been the embeddable nature of its approach, which enables companies, especially those selling cloud-based software as a service (SaaS), to gain competitive advantage by placing Pentaho’s analytics tools within their applications. For more detail on Pentaho’s analytics and business intelligence tools, please see my previous analyst perspective.

A key advance for the company over the past year has been the development and refinement of what the company calls big data blueprints. These are general use cases in areas such as ETL offloading and customer analytics. Each includes design patterns for ETL and analytics that work with high-performance analytic databases, including NoSQL variants such as MongoDB and Cassandra.

The blueprint concept is important for several reasons. First, it helps Pentaho focus on specific market needs. Second, it shows customers and partners processes that enable them to get immediate return on the technology investment. The same research referenced above shows that organizations manage their information and technology better than their people and processes; to realize full value from spending on new technology, they need to pay more attention to how the technology fits with these cultural aspects.

At the user conference, the company announced release 5.2 of its core business analytics products and featured its Governed Data Delivery concept and Streamlined Data Refinery. The Streamlined Data Refinery provides a process for business analysts to access the already integrated data provided through PDI and create data models on the fly. The advantage is that this is not a technical task: the business analyst does not have to understand the underlying metadata or data structures. The user chooses the dimensions of the analysis from menus that allow multiple combinations to be selected in an ad hoc manner. The Streamlined Data Refinery then automatically generates a data cube that is available for fast querying in an analytic database. Currently, Pentaho supports only the HP Vertica database, but its roadmap promises to add high-performance databases from other suppliers. The entire process can take only a few minutes and is much more flexible and dynamic than asking IT to rebuild a data model every time a new question is asked.
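The dimension-driven cube generation described above can be illustrated with a short, hypothetical Python sketch. It is not how the Streamlined Data Refinery or Vertica actually build cubes; it simply assumes the analyst’s menu choices arrive as a list of dimension names, and the `build_cube` function and sample rows are invented for illustration. For each subset of the chosen dimensions it precomputes a sum of the measure, so later questions become lookups rather than requests to IT for a new data model.

```python
from collections import defaultdict
from itertools import combinations

# Made-up sample rows standing in for already integrated data.
ROWS = [
    {"region": "east", "product": "a", "year": 2014, "sales": 10.0},
    {"region": "east", "product": "b", "year": 2014, "sales": 5.0},
    {"region": "west", "product": "a", "year": 2013, "sales": 7.0},
]


def build_cube(rows, dimensions, measure):
    """Precompute SUM(measure) for every subset of the chosen dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                # The grouping key is the tuple of this row's values for the chosen dimensions.
                totals[tuple(row[d] for d in dims)] += row[measure]
            cube[dims] = dict(totals)
    return cube


# The analyst picked "region" and "product" from the menus; "year" is ignored.
cube = build_cube(ROWS, ["region", "product"], "sales")
print(cube[("region",)])  # {('east',): 15.0, ('west',): 7.0}
print(cube[()])           # {(): 22.0}
```

A full cube grows as 2^n groupings in the number of chosen dimensions, which is one reason to generate aggregates only for the dimensions an analyst actually selects.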

While Pentaho Data Integration enables users to bring together all available data and integrate it to find new insights, Streamlined Data Refinery gives business users direct access to the blended data. In this way they can explore data dynamically without involving IT. The other important aspect is that it easily provides the lineage of the data. Internal or external auditors often need to understand the nature of the data and how it was integrated, which data lineage supports. Such a feature should benefit all types of businesses but especially those in regulated industries. This approach addresses the two top needs of business end users, which, according to our benchmark research into information optimization, are to drill into data (37%) and to search for specific information (36%).
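Data lineage of the kind auditors rely on can be sketched as a log that travels with the data. The following Python sketch is a hypothetical illustration, not Pentaho’s lineage feature: the `TracedDataset` type and the `load`, `transform` and `blend` helpers are assumed names, and every operation appends a record of what it did, so a blended result carries the history of all of its inputs.

```python
from dataclasses import dataclass, field


@dataclass
class TracedDataset:
    """Rows plus an ordered log of their sources and the steps applied to them."""
    rows: list
    lineage: list = field(default_factory=list)


def load(source_name, rows):
    # Lineage starts with the system the rows were read from.
    return TracedDataset(list(rows), [f"source:{source_name}"])


def transform(ds, step_name, fn):
    # Apply fn to the rows and append the step to the existing lineage.
    return TracedDataset(fn(ds.rows), ds.lineage + [f"transform:{step_name}"])


def blend(left, right, step_name, fn):
    # Combine two data sets; the result inherits the lineage of both inputs.
    return TracedDataset(
        fn(left.rows, right.rows),
        left.lineage + right.lineage + [f"blend:{step_name}"],
    )


# Hypothetical sources: customer regions from a CRM, sales amounts from a warehouse.
crm = load("crm", [("c1", "east"), ("c2", "west")])
sales = load("warehouse", [("c1", 100.0), ("c2", 40.0), ("c1", 20.0)])
large = transform(sales, "min_amount_30", lambda rows: [r for r in rows if r[1] >= 30])
joined = blend(
    crm, large, "join_on_customer",
    lambda a, b: [(cust, region, amt) for cust, region in a for c2, amt in b if cust == c2],
)
print(joined.rows)     # [('c1', 'east', 100.0), ('c2', 'west', 40.0)]
print(joined.lineage)  # ['source:crm', 'source:warehouse', 'transform:min_amount_30', 'blend:join_on_customer']
```

Keeping lineage as a flat list is a simplification; a fuller design would record a graph so an auditor can see exactly which input fed which step.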

Another advance is Pentaho 5.2’s support for Kerberos security on Cloudera, Hortonworks and MapR. Cloudera, currently the largest Hadoop distribution, and Hortonworks, which is planning to raise capital via a public offering, hold the lion’s share of the commercial Hadoop market. Kerberos puts a layer of authentication security between the Pentaho Data Integration tool and the Hadoop data. This helps address security concerns, which have increased dramatically over the past year after major breaches at retailers, banks and government institutions.

These announcements show results of Pentaho’s enterprise-centric customer strategy as well as the company’s investment in senior leadership. Christopher Dziekan, the new chief product officer, presented a three-year roadmap that focuses on data access, governance and data integration. It is good to see the company put its stake in the ground with a well-formed vision of the big data market. Given the speed at which the market is changing and the necessity for Pentaho to consider the needs of its open source community, it will be interesting to see how the company adjusts the roadmap going forward.

For enterprises grappling with big data integration and trying to give business users access to new information sources, Pentaho’s Streamlined Data Refinery deserves a look. For both enterprises and ISVs that want to apply integration and analytics in the context of another application, Pentaho’s REST-based APIs allow embedding of end-to-end analytic capabilities. Together with the big data blueprints discussed above, these capabilities enable Pentaho to deliver a targeted yet flexible approach to big data.



We recently released our benchmark research on big data analytics, and it sheds light on many of the most important discussions occurring in business technology today. The study’s structure was based on the big data analytics framework that I laid out last year as well as the framework that my colleague Mark Smith put forth on the four types of discovery technology available. These frameworks view big data and analytics as part of a major change that includes a movement from designed data to organic data, the bringing together of analytics and data in a single system, and a corresponding move away from the technology-oriented three Vs of big data to the business-oriented three Ws of data. Our big data analytics research confirms these trends but also reveals some important subtleties and new findings with respect to this important emerging market. I want to share three of the most interesting and even surprising results and their implications for the big data analytics market.

First, we note that communication and knowledge sharing is a primary benefit of big data analytics initiatives, but it is a latent one. Among organizations planning to deploy big data analytics, the benefits most often anticipated are faster response to opportunities and threats (57%), improving efficiency (57%), improving the customer experience (48%) and gaining competitive advantage (43%). However, once a big data analytics system has moved into production, the benefits most often mentioned as achieved are better communication and knowledge sharing (51%), gaining competitive advantage (51%), improved efficiency in business processes (49%) and improved customer experience and satisfaction (46%). (The chart shows rankings of first choices as most important.) Although the last three of these benefits are predictable, it’s noteworthy that the benefit of communication and knowledge sharing, while not a priority before deployment, becomes one of the two most often cited later.

As for the implications, in our view, one reason why communication and knowledge sharing are more often seen as a key benefit after deployment rather than before is that agreement on big data analytics terminology is often lacking within organizations. Participants from fewer than half (44%) of organizations said that the people making business technology decisions mostly agree or completely agree on the meaning of big data analytics, while the same number said there are many different opinions about its meaning. To address this particular challenge, companies should pay more attention to setting up internal communication structures prior to the launch of a big data analytics project, and we expect collaborative technologies to play a larger role in these initiatives going forward.

A second finding of our research is that integration of distributed data is the most important enabler of big data analytics. Asked the meaning of big data analytics in terms of capabilities, the largest percentage (76%) of participants said it involves analyzing data from all sources rather than just one, while for 55 percent it means analyzing all of the data rather than just a sample of it. (We allowed multiple responses.) More than half (56%) told us they view big data as finding patterns in large and diverse data sets in Hadoop, which indicates the continuing influence of this original big data technology. A second tier of percentages emphasizes timeliness as an aspect of big data: doing real-time processing on streams of data (44%), visualizing large structured data sets in seconds (40%) and doing real-time scoring against a database record (36%).

The implications here are that the primary characteristic of big data analytics technology is the ability to analyze data from many data sources. This shows that companies today are focused on bringing together multiple information sources and secondarily being able to process all data rather than just a sample, as well as being able to do machine learning on especially large data sets. Fast processing and the ability to analyze streams of data are relegated to third position in these priorities. That suggests that the so-called three Vs of big data are confusing the discussion by prioritizing volume, velocity and variety all at once. For companies engaged in big data analytics today, sourcing and integration of various data sources in an expedient manner is the top priority, followed by the ideas of size and then speed of arrival of data.

Third, we found that usage is not confined to particular industries, certain types of companies or certain functional areas. Among 25 uses for big data analytics, three of the four that participants most often said they are personally involved with relate to customers and sales: enabling cross-selling and up-selling (38%), understanding the customer better (32%) and optimizing pricing (28%). Meanwhile, optimizing IT operations ranked fifth (24%), though it was most often chosen by those in IT roles (76%). What is particularly fascinating, however, is that 17 of the 25 use cases were named by more than 10 percent of participants, which indicates the wide range of uses for big data analytics.

The primary implication of this finding is that big data analytics is not following the famous technology adoption curves outlined in books such as Geoffrey Moore’s seminal work, “Crossing the Chasm.” That is, companies are not following a narrowly defined path that solves only one particular problem. Instead, they are creatively deploying technological innovations en route to a diverse set of outcomes. And this is occurring across organizational functions and industries, including conservative ones, which conflicts with conventional wisdom. For this reason, companies are more often looking across industries and functional disciplines as part of their due diligence on big data analytics to come up with unique applications that may yield competitive advantage or organizational efficiencies.

In summary, it has been difficult for companies to define what big data analytics actually means and how to prioritize their investments accordingly. Research such as ours can help organizations address this issue. While the above discussion outlines a few of the interesting findings of this research, it also yields many more insights, related to aspects as diverse as big data in the cloud, sandbox environments, embedded predictive analytics, the most important data sources in use, and the challenges of choosing an architecture and deploying big data analytic products. For a copy of the executive summary, download it directly from the Ventana Research community.



Ventana Research recently completed the most comprehensive evaluation of mobile business intelligence products and vendors available anywhere today. The evaluation covers 16 technology vendors’ offerings for smartphones and tablets running on Apple, Google Android, Microsoft Surface and RIM BlackBerry devices, assessed in seven key categories: usability, manageability, reliability, capability, adaptability, vendor validation and TCO and ROI. The result is our Value Index for Mobile Business Intelligence in 2014. The analysis shows that the top supplier is MicroStrategy, which qualifies as a Hot vendor and is followed by 10 other Hot vendors: IBM, SAP, QlikTech, Information Builders, Yellowfin, Tableau Software, Roambi, SAS, Oracle and arcplan.

Our expertise, hands-on experience and the buyer research from our benchmark research on next-generation business intelligence and on information optimization informed our product evaluations in this new Value Index. The research examined business intelligence on mobile technology to determine organizations’ current and planned use and the capabilities required for successful deployment.

What we found was wide interest in mobile business intelligence and a desire to improve the use of information in 40 percent of organizations, though adoption is less pervasive than interest. Fewer than half of organizations currently access BI capabilities on mobile devices, but nearly three-quarters (71%) expect their mobile workforce to be able to access BI capabilities in the next 12 months. The research also shows strong executive support: Nearly half of executives said that mobility is very important to their BI processes.

Ease of access and use are important criteria in this Value Index because the largest percentage of organizations identified usability as an important factor in evaluations of mobile business intelligence applications. This is an emphasis that we find in most of our research, and in this case it also may reflect users’ experience with first-generation business intelligence on mobile devices; not all those applications were optimized for touch-screen interfaces and designed to support gestures. It is clear that today’s mobile workforce requires the ability to access and analyze data simply and in a straightforward manner, using an intuitive interface.

The top five companies’ products in our 2014 Mobile Business Intelligence Value Index all provide strong user experiences and functionality. MicroStrategy stood out across the board, finishing first in five categories, most notably in user experience, mobile application development and presentation of information. IBM, the second-place finisher, has made significant progress in mobile BI with six releases in the past year, adding support for Android, advanced security features and an extensible visualization library. SAP placed third on the strength of its steady support for mobile access to the SAP BusinessObjects platform, its support for access to SAP Lumira and its integrated mobile device management software, which together helped produce high scores in various categories. QlikTech’s flexible offline deployment capabilities for the iPad and its high ranking in the assurance-related category of TCO and ROI secured it the fourth spot. Information Builders rounds out the top five: with the latest release of WebFOCUS, which renders content directly in HTML5, along with its Active Technologies and Mobile Faves, the company delivers strong mobile capabilities. Other noteworthy innovations in mobile BI include Yellowfin’s collaboration technology and Roambi’s use of storyboarding in its Flow application.

Although there is some commonality in how vendors provide mobile access to data, there are many differences among their offerings that can make one a better fit than another for an organization’s particular needs. For example, companies that want their mobile workforce to be able to engage in root-cause discovery analysis may prefer tools from Tableau and QlikTech. For large companies looking for a custom application approach, MicroStrategy or Roambi may be good choices, while others looking for streamlined collaboration on mobile devices may prefer Yellowfin. Many companies may base the decision on mobile business intelligence on which vendor they currently have installed. Customers with large implementations from IBM, SAP or Information Builders will be reassured to find that these companies have made mobility a critical focus.

To learn more about this research and to download a free executive summary, please visit


Tony Cosentino

Vice President and Research Director
