You are currently browsing the monthly archive for September 2013.

Pentaho recently announced Pentaho 5.0, which represents a major advancement for this supplier of business analytics and data integration software as well as for the open source community to which it contributes. In fact, with 250 new features and enhancements in the 5.0 release, it’s important not to lose the forest for the trees. Some of the highlights are a new user interface that caters to specific roles within the organization, tight integration with emerging databases such as MongoDB, and enhanced extensibility. With a funding round of $60 million coming less than a year ago and growing market momentum around big data and analytics, it appears that Pentaho has doubled down at the right time in its efforts to balance the needs of the enterprise with those of the end user.

The 5.0 products have been completely redesigned for discovery analytics, content creation, accessibility and simplified administration. One key area of change is around roles, or what I call personas. I recently discussed the different analytic personas that are emerging in today’s data-driven organization, and Pentaho has done a good job of addressing these at each level of the organization. In particular, the system addresses each part of the analytic value chain, from data integration through to analytic discovery and visualization.

I was surprised by the usability of the visual analytics tool, which offers a host of capabilities that enable easy data exploration and visual discovery. Features such as drag-and-drop conditional formatting, including color coding, are simple, intuitive and powerful. Drop-down charting reveals an impressive list of visualizations that can be changed with a single click. Users will need to know the chart types by name, however, since no thumbnail visuals appear on hovering over them and there is no chart recommendation engine. But overall, the release’s ease-of-use developments are a major improvement in an already usable system that our firm rated Hot in the 2012 Value Index on Business Intelligence, putting Pentaho on par with other best-in-class tools. According to our benchmark research on next-generation business intelligence systems, usability is becoming more important in business intelligence and is the key buying criterion 63 percent of the time.

Advances from an enterprise perspective include features that will help IT manage the large volumes of data entering the environment, through support for big data sources and streamlined automation of data integration. Capabilities such as job restart, rollback and load balancing are all included. Administrators can more easily configure and manage the system, including security levels, licensing and servers. In addition, new REST services APIs simplify the embedding of analytics and reporting into SaaS implementations. This last advancement is significant; as I discussed in a recent piece, making analytics available anywhere is extremely important.

No discussion of big data integration and analytics is complete without mention of Pentaho Data Integration (PDI), which I consider the crown jewel of the Pentaho portfolio. The value of PDI is derived from its ability to put big data integration and business analytics in the same workflow. Its user-friendly graphical paradigm helps a range of IT staff and analysts blend data from multiple platforms at the semantic layer rather than the user level. This enables centralized agreement on data definitions so companies can govern and secure their information environments. The Pentaho approach addresses the two biggest barriers to information management revealed in our benchmark research: data spread across too many systems (67%) and multiple versions of the truth (64%). While other tools on the market facilitate blending at the business-user level, there is an inherent danger in such an approach, because each individual can create analysis according to the definition that best suits his or her argument. It is similar to the spreadsheet problem we have now, in which many analysts come together, each with a different understanding of the source data.

Pentaho’s depth in data integration is robust, and its big data support has been expanding rapidly to cover the data sources in use today as well as those our Big Data benchmark research found are planned: data warehouse appliances (35%), in-memory databases (34%), specialized DBMSs (33%) and Hadoop (32%). Beyond these big data and RDBMS sources, Pentaho has also expanded to non-SQL sources. The open source and pluggable nature of the Pentaho architecture allows community-driven evolution beyond traditional JDBC and ODBC drivers and gives an increasingly important leverage point for using its platform. For example, the just-announced MongoDB Connector enables deep integration that includes replica sets, tag sets and read and write preferences, as well as first-of-its-kind reporting on the MongoDB NoSQL database. MongoDB is a document database, a new class of database that allows a more flexible, object-oriented approach to accessing new sources of information. The emergence of MongoDB mirrors that of new, more flexible notation formats such as JavaScript Object Notation (JSON). While reporting is still basic, I expect the initial integration with MongoDB to be just a first step for the Pentaho community in optimizing information around this big data store. Additionally, Pentaho announced new integration with Splunk, Amazon Redshift and Cloudera Impala, as well as certifications including MongoDB, Cassandra, Cloudera, Intel, Hortonworks and MapR.
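To see why a document model is more flexible than a relational row, here is a small Python sketch using only the standard library. The field names and values are invented for illustration; this is not Pentaho’s connector or MongoDB’s API, just the shape of the data involved.

```python
import json

# A relational row forces every record into the same fixed columns.
relational_row = ("cust-001", "Acme Corp", "US", None)  # (id, name, country, phone)

# A JSON-style document, of the kind MongoDB stores, can nest structures
# and omit or add fields per record without a schema change.
document = {
    "_id": "cust-001",
    "name": "Acme Corp",
    "country": "US",
    "contacts": [
        {"type": "sales", "email": "sales@acme.example"},
        {"type": "support", "phone": "+1-555-0100"},
    ],
}

# JSON round-trips the nested structure intact.
serialized = json.dumps(document)
restored = json.loads(serialized)
print(restored["contacts"][0]["email"])
```

Reporting against such documents means navigating nested lists rather than flat columns, which is why deep connector support matters.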

Currently the analytics and BI market is bifurcated, with the so-called stack vendors occupying entrenched positions in many organizations and visual discovery vendors selling to business users through a viral, bottom-up strategy. Both sides are moving to the middle in their development efforts, addressing the data integration that is already integral to the Pentaho approach. The challenge for the traditional enterprise BI vendors is to build flexible, user-friendly visual platforms, while for the newcomers it’s applying structure and governance to their visually oriented information environments. Arguably, Pentaho is building its platform from the middle out. The company has done a good job of balancing usability with the governance and security models needed for a holistic approach that both IT and end users can support. Organizations looking for a unified data integration and business analytics approach for business and IT, including advanced analytics and embedded approaches to information-driven applications, should consider Pentaho.


Tony Cosentino

VP and Research Director

In his keynote speech at the sixth annual Tableau Customer Conference, company co-founder and CEO Christian Chabot borrowed from Steve Jobs’ famous line that the computer “is the equivalent of a bicycle for our minds” to suggest that his company’s software is such a new bicycle. He went on to build an argument about the nature of invention and Tableau’s place in it. The people who make great discoveries, Chabot said, start with both intuition and logic. This approach allows them to look at ideas and information from different perspectives and to see things that others don’t see. In a similar vein, he went on, Tableau allows us to look at things differently, understand patterns and generate new ideas that might not arise using traditional tools. Chabot’s key point was profound: new technologies such as Tableau’s visual analytics software, which draw on new and big data sources of information, are enablers and accelerators of human understanding.

Tableau represents a new class of business intelligence (BI) software designed for business analytics: it allows users to visualize and interact with data in new ways and does not mandate that relationships in the data be predefined. This business analytics focus is critical, as it is the top-ranked technology innovation in business today, identified by 39 percent of organizations in our research. In traditional BI systems, data is modeled in so-called cubes or other predefined structures, which allow users to slice and dice data instantaneously and in a user-friendly fashion. The cube structure solves the problem of abstracting the complexity of the structured query language (SQL) of the database and the inordinate amount of time it can take to read data from a row-oriented database. However, with memory costs decreasing significantly, the advent of new column-oriented databases, and approaches such as VizQL (Tableau’s proprietary query language that allows direct visual query of a database), traditional BI methods are now challenged by new ones. Tableau has been able to effectively exploit the exploratory aspects of data through its technological approach, even more so with the advent of many new big data sources that require more discovery-oriented methods.

After Chabot’s speech, Chris Stolte, Tableau’s co-founder and chief development officer, took the audience through the company’s product vision, which is centered on the themes of seamless access to data, visual analytics for everyone, ease of use and beauty, storytelling, and enterprise analytics everywhere. This is essential, as what Tableau delivers in its software falls under the discovery methods of business analytics; my colleague has pointed out four types of discovery, of which Tableau currently supports two: data and visual discovery. Most important, Tableau is working to address a broader set of the user personas I have outlined to the industry, expanding further to analysts, publishers and data geeks. As part of Stolte’s address, the product team took the stage to discuss innovations coming in Tableau 8.1, scheduled for fall 2013, and in the 8.2 release due early in 2014, all of which were publicly disclosed in the Internet broadcast of the keynote.

One of those innovations is a new connection interface that enables a workflow for connecting to data. It provides very light data integration capabilities with which users can clean and reshape data in a number of ways. The software automatically detects inner-join keys with a single click, and the new table shows up automatically. Users can easily switch between left and right joins as well as change the join field altogether. Once data is extracted and imported, new tools such as a data parser enable users to specify a particular date format. While these data integration capabilities are admittedly lightweight compared with tools such as Informatica or Pentaho (which just released its 5.0 platform, tying data integration to its business analytics offering), they are a welcome development for users who still spend the majority of their time cleaning, preparing and reviewing data rather than analyzing it. Our benchmark research on information management shows that dirty data is a barrier to information management 58 percent of the time. Tableau’s developments, and others in the area of information management, should continue to erode the entrenchment of tools inappropriately used for analytics, especially spreadsheets, whose use and misuse my colleague Robert Kugel has researched. These advancements in simpler access to data are critical, as 44 percent of organizations indicated that more time is spent on data-related activities than on analytic tasks.
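The join behavior described above is generic, so as a minimal pure-Python sketch (not Tableau’s implementation, and with made-up sample records), here is the difference between an inner join, which drops unmatched rows, and a left join, which keeps them:

```python
orders = [
    {"order_id": 1, "customer_id": "A", "amount": 100},
    {"order_id": 2, "customer_id": "B", "amount": 250},
    {"order_id": 3, "customer_id": "C", "amount": 75},
]
customers = {"A": "Acme Corp", "B": "Bolt Inc"}  # note: no entry for "C"

# Inner join: keep only orders whose customer_id matches a customer.
inner = [
    {**o, "customer_name": customers[o["customer_id"]]}
    for o in orders
    if o["customer_id"] in customers
]

# Left join: keep every order; unmatched customers become None.
left = [
    {**o, "customer_name": customers.get(o["customer_id"])}
    for o in orders
]

print(len(inner))  # 2 -- order 3 dropped, no matching customer
print(len(left))   # 3 -- order 3 kept with customer_name of None
```

The choice between the two matters in data preparation because an inner join silently discards rows, which is exactly the kind of detail analysts catch during cleaning and review.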

A significant development in the 8.1 release of Tableau is the integration of R, the open source programming language for statistics and predictive analytics. Advances in the R language through the R community have been robust and continue to gain steam, as I discussed recently in an analysis of opportunities and barriers for R. Tableau users will still need to know the details of R, but now output can be natively visualized in Tableau. Depending on their needs, users can gain a more integrated and advanced statistical experience with Tableau partner tools such as Alteryx, which both my colleague Richard Snow and I have written about this year. Alteryx integrates with R at a higher level of abstraction and also integrates directly with Tableau output. While R integration is important for Tableau to provide new capabilities, it should be noted that this is a single-threaded approach and will be limited to running in memory. This will be a concern for those trying to analyze truly large data sets, since a single-threaded approach limits the modeler to about a terabyte of data. For now, Tableau likely will serve mostly as an R sandbox for sample data; when users need to move algorithms into production for larger data, they probably will have to use a parallelized environment. Other BI vendors have gone further here; Information Builders, for example, has already embedded R into its WebFOCUS BI product, which is designed for analysts and hides the complexities of R.

Beyond the R integration, Tableau showed useful descriptive analytic methods such as box plots and percentile aggregations. Forecast improvements facilitate changing prediction bands and adjusting seasonality factors. Different ranking methods can be used, and two-pass totals provide useful data views. While these analytic developments are nice, they are not groundbreaking. Tools like BIRT Analytics, Datawatch Panopticon and Tibco Spotfire are still ahead with their ability to visualize data models using methods such as decision trees and clustering. Meanwhile, SAP just acquired KXEN and will likely start to integrate predictive capabilities into SAP Lumira, its visual analytics platform. SAS is also integrating easy-to-use high-end analytics into its Visual Analytics tool, and IBM’s SPSS and Cognos Insight work together for advanced analytics. Our research on predictive analytics shows that classification trees (69%), followed by regression techniques and association rules (66% and 61%, respectively), are the statistical techniques most often used in organizations today. Tableau also indicated future investments in improving location and map visualization. This goal aligns with our location analytics research, which found that 48 percent of businesses say using location analytics significantly improves their business processes. Tableau’s advancements in visualizing analytics are great for analysts and data geeks but still beyond the competencies of information consumers. At the same time, Tableau has done what Microsoft has not done with Microsoft Excel: deliver simplicity in preparing analytics for interactive visualization and discovery. In addition, Tableau is easily accessible from mobile technology such as tablets, which is definitely not a strong spot for Microsoft.
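For readers who want a refresher on what a box plot actually summarizes, this standard-library Python sketch computes the usual components: quartiles, the interquartile range, the 1.5×IQR fences and any outliers. The data values are made up for illustration.

```python
import statistics

data = sorted([4, 7, 8, 10, 12, 13, 14, 18, 21, 23, 50])

# Quartiles: n=4 yields the three cut points Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # the "box" of the box plot

# Whiskers extend to the most extreme points within 1.5 * IQR of the box;
# anything beyond the fences is drawn as an individual outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3)
print(outliers)
```

Percentile aggregations in a BI tool are this same computation applied per group, which is why they pair naturally with box plot visuals.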

Making things easier for analysts and knowledge workers, Tableau has added two-click copy and paste of dashboards between workbooks, a significant development that allows users to organize visualizations in an expedient fashion. Users can store collections of data and visualizations in folders in the data window, where they access all the components used for discovery, and search makes it easy to find any detail. Transparency features and quick filter formatting allow users to match brand logos and colors. Presentation mode allows full-screen centered display, and calendar controls let users select dates in ways they are familiar with from other calendaring tools. What was surprising is that Tableau did not show how to present supporting information alongside a visualization, such as the free-form text analysts add when using Microsoft PowerPoint, or how to integrate content and documents beyond structured data. My colleague has pointed to the failures of business intelligence and the need for analysts to provide more context and information with their visualizations, and it appears Tableau is starting to address these issues.

Tableau developer Robert Kosara showed a Tableau 8.2 feature called Storypoints, which puts a navigator at the top of the screen and pulls together different visualizations to support a logical argument. This is important in addressing what my colleague has called the visual anarchy we have today. Storypoints link directly to data visualizations, across which users can navigate to see varying states of the visualization. Storytelling is of keen interest to analysts because it is the primary way in which they prepare information and analytics for review and in support of decision-making in the organization. Encapsulating observations in a story, though, requires more than navigating across states of a visualization with a descriptive caption; it should support embedding descriptive information related to the visualization itself, not just navigation across it. Tableau has more to offer with its canvas layout and the embedding of other presentation components but did not spend much time outlining what is fully possible today. Storytelling and collaboration are a hot area of development, with multiple approaches coming to market, including those from Datameer, Roambi, Yellowfin and QlikTech (with its acquisition of NComVA, a Swedish visualization company). These approaches need to streamline the cumbersome process of copying data from Excel into PowerPoint, building charts and annotating slides. Tableau’s Storypoints, along with guided navigation on visualizations and dashboard copy and paste, are good first steps that can be a superior alternative to a personal productivity approach with Microsoft Office, but Tableau will still need more depth to replicate the flexibility of Microsoft PowerPoint in particular.

The last area of development, and perhaps the most important for Tableau, is making the platform and tools more enterprise-ready. Security, availability, scalability and manageability are the hallmarks of an enterprise-grade application, and Tableau is advancing in each of these areas. The Tableau 8.1 release includes external load-balancing support and more dynamic support for host names. Companies using the SAML standard can administer single sign-on that delegates authentication to an identity provider. IPv6 support for next-generation Internet applications and, perhaps most important from a scalability perspective, 64-bit architecture have been extended across the product line. (Relevant to this last development, Tableau Desktop in version 8.2 will run natively on the Apple Mac, which garnered perhaps the loudest cheer of the day from the mostly youthful attendees.) For proof of its scalability, Tableau pointed to Tableau Public, which fields more than 70,000 views each day. Furthermore, Tableau’s cloud implementation, Tableau Online, offers a multi-tenant architecture and a columnar database for scalability, performance and unified upgrades for cloud users. Providing a cloud deployment is critical, according to our research, which found that a quarter of organizations prefer a cloud approach; however, cloud BI applications have seen slower adoption.

Enterprise implementations are the battleground on which Tableau wants to compete and is making inroads through rapid adoption by the analysts who are responsible for analytics across the organization. During the keynote, Chabot took direct aim at legacy business intelligence vendors, suggesting that using enterprise BI platform tools is akin to tying a brick to a pen and trying to draw. Enterprise BI platforms, he argued, are designed to work in opposition to great thinking. They are not iterative and exploratory in nature but rather developed in a predefined fashion. While this may be true in some cases, those same BI bricks are often the foundation of many organizations and they are not always easy to remove. Last year I argued in analyzing Tableau 8.0 that the company was not ready to compete for larger BI implementations. New developments coming this year address some of these shortcomings, but there is still the lingering question of the entrenchment of BI vendors and the metrics that are deployed broadly in organizations. Companies such as IBM, Oracle and SAP have a range of applications including finance, planning and others that reside at the heart of most organizations, and these applications can dictate the metrics and key indicators to which people and organizations are held accountable. For Tableau to replace them would require more integration with the established metrics and indicators that are managed within these tools and associated databases. For large rollouts driven by well-defined parameterized reporting needs and interaction with enterprise applications, Tableau still has work to do. Furthermore, every enterprise BI vendor has its own visual offering and is putting money into catching up to Tableau.

In sum, Tableau’s best-in-class ease of use does serve as a bicycle for the analytical mind, and with its IPO this year, Tableau is pedaling as fast as ever to continue its innovations. We research thousands of BI deployments and recently awarded Cisco our Business Technology Leadership Award in Analytics for 2013 for its use of Tableau Software. Cisco, which has many business intelligence (BI) tools, uses Tableau to design analytics and visualize data in multiple areas of its business. Tableau’s ability to capture the hearts and minds of the analysts responsible for analytics, and to demonstrate business value in a short period of time, or time-to-value (TTV), which as I have pointed out is even more important for big data, is why it is growing rapidly and building a community with passion for its products. Our research finds that business is driving adoption of business analytics, which helps Tableau avoid the politics of IT while addressing the top selection criterion of usability. In addition, the wave of business improvement initiatives is changing how 60 percent of organizations select technology, with buyers no longer simply accepting the IT standard or existing technology approach. Buyers in both IT and business should pay close attention to these trends, and all organizations looking to compete with analytics through simple but elegant visualizations should consider Tableau’s offerings.


Tony Cosentino

VP and Research Director

R, the open source programming language for statistics and graphics, has now become established in academic computing and holds significant potential for businesses struggling to fill the analytics skills gap. The software industry has picked up on this potential, and the majority of business intelligence and analytics players have added an R-oriented strategy to their portfolio. In this context, it is relevant to look at some of the problems that R addresses and some of the challenges to its adoption.

As I mentioned, perhaps the most important potential for R is to address the analytic skills gap, which our research shows is a priority for organizations. This is a serious and growing issue as more enterprises try to deal with the huge volumes of data they accumulate now, which continue to increase. Our benchmark research on big data identifies the biggest challenges to implementing big data as staffing (cited by 79% of organizations) and training (77%). Since R is a widely used statistical language in academia today, current and future graduates may well help fill this gap with what they have learned.

Another challenge facing companies is the lack of usability of advanced analytic languages and tools. Across our research, usability is rising in importance in just about every category. Analytical programming is not something the information consumer or the knowledge worker can do, as I outlined in a recent analysis on personas in business analytics, but those in the analyst community can readily learn the R language. R’s object-orientation is often put forth as providing an intuitive language that is easier to learn than conventional systems; this starts to explain its massive following, which already numbers in the millions of users.

On another front, R addresses the need for analytics to be part of larger analytic workflows. It is easier to embed into applications than other statistical languages, and unlike embeddable approaches such as Python, R does not require users to pull together a variety of elements to address a particular statistical problem. Fundamentally, R is more mature than Python from an algorithmic point of view, and its terminology is oriented more to the statistical user than the computer programmer.

Perhaps the broadest opportunity for R is to address new use cases and the creation of innovative analytical assets for companies. The fact that it is open source means that each time a new analytical process is developed, it is released and tested almost immediately if submitted through the R project community. Furthermore, R does a nice job of working with diverse data sets, an important part of doing predictive analytics on big data in today’s highly distributed environments. As I often mention, new information sources are more important than tools and techniques. R does not, however, directly address the largest obstacle found in our predictive analytics research, cited by over half (55%) of organizations: difficulty integrating predictive analytics into the information architecture, where data integration for analytic processes is still ad hoc and needs to be automated.

Last, but not least in terms of opportunities, R addresses the cost pressures that face business users and IT professionals alike. Some might argue that R is free in the way a puppy is free (requiring lots of effort after adoption), but in the context of an organization’s ability to bootstrap an analytic initiative, low startup cost is a critical element. With online courses and a robust community available, analysts can get up to speed quickly and begin to add value with little direct investment from their employers.

Despite all these positive aspects, there are factors holding back adoption of R in the enterprise. The downside of being free is the perceived lack of support for enterprises that commit to an open source application. This can be a particularly high barrier in industries with established analytic agendas, such as some areas of banking, consumer products and pharmaceuticals. (Ironically, these industries are some of the biggest innovators with R in other parts of their business.)

And we must note that ease of use for R still seems to stop with experienced analysts used to a coding paradigm. No graphical user environment such as IBM SPSS Modeler or SAS Enterprise Miner has yet emerged as a standard approach for R, so the required level of user sophistication is higher and analytical processes are more difficult to troubleshoot. That said, offerings are maturing rapidly, both as stand-alone tools like Revolution Analytics, which I have already covered, and as R embedded within business intelligence tools such as those from Information Builders.

Finally, the scalability of R is limited to what can be loaded into memory. How large the analyzed data sets can grow is a matter of debate; one LinkedIn group discussion claimed that an R analytic data set can scale to a terabyte in memory, while my discussions with users suggest that large production implementations are not viable without parallelizing the code in some sort of distributed architecture. Generally speaking, as analytic data sets get into the terabyte range, parallelization is necessary.
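The terabyte ceiling is easy to sanity-check with back-of-the-envelope arithmetic; the row and column counts below are purely illustrative:

```python
# Rough in-memory footprint of a purely numeric analytic data set:
# rows x columns x 8 bytes per double-precision value.
rows = 2_000_000_000      # two billion observations (illustrative)
cols = 64                 # numeric variables per observation (illustrative)
bytes_per_value = 8       # one double-precision float

total_bytes = rows * cols * bytes_per_value
total_tb = total_bytes / 1024**4

print(round(total_tb, 2))  # ~0.93 TB, already near a single node's RAM
```

Real R objects carry additional overhead beyond the raw values, so the practical ceiling on one machine is lower still, which is why production-scale modeling pushes toward distributed architectures.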

In my next analysis, I will look at some of the industries and companies that are using R to achieve competitive advantage, which according to our benchmark research into predictive analytics is the number-one benefit of predictive analytics for more than two-thirds of companies. I will also provide an update on enterprise software vendors’ strategies and where they are incorporating R into their software portfolios.


Tony Cosentino

VP and Research Director
