You are currently browsing the tag archive for the ‘Data Discovery’ tag.

In 2014, IBM announced Watson Analytics, which uses machine learning and natural language processing to unify and simplify the user experience in each step of the analytic processing: data acquisition, data preparation, analysis, dashboarding and storytelling.  After a relatively short beta testing period involving more than 22,000 users, IBM released Watson Analytics for general availability in December. There are two editions: the “freemium” trial version allows 500MB of data storage and access to file sizes less than 100,000 rows of data and 50 columns; the personal edition is a monthly subscription that enables larger files and more storage.

Its initial release includes functions to explore, predict and assemble data. Many of the features are based on IBM’s SPSS Analytic Catalyst, which I wrote about and which won the 2013 Ventana Research Technology Innovation Award for business analytics. Once data is uploaded, the explore function enables users to analyze data in an iterative fashion using natural language processing and simple point-and-click actions. Algorithms decide the best fit for graphics based on the data, but users may choose other graphics as needed. An “insight bar” shows other relevant data that may contain insights such as potential market opportunities.

The ability to explore data through visualizations with minimal knowledge is a primary aim of modern analytics tools. With the explore function incorporating natural language processing, which other tools in the market lack, IBM makes analytics accessible to users without the need to drag and drop dimensions and measures across the screen. This feature should not be underestimated; usability is the buying criterion for analytics tools most widely cited in our benchmark research on next-generation business intelligence (by 63% of organizations).

vr_ngbi_br_importance_of_bi_technology_considerations_updatedThe predict capability of Watson Analytics focuses on driver analysis, which is useful in a variety of circumstances such as sales win and loss, market lift analysis, operations and churn analysis. In its simplest form, a driver analysis aims to understand causes and effects among multiple variables. This is a complex process that most organizations leave to their resident statistician or outsource to a professional analyst. By examining the underlying data characteristics, the predict function can address data sets, including what may be considered big data, with an appropriate algorithm. The benefit for nontechnical users is that Watson Analytics makes the decision on selecting the algorithm and presents results in a relatively nontechnical manner such as spiral diagrams or tree diagrams. Having absorbed the top-level information, users can drill down into top key drivers. This ability enables users to see relative attribute influences and interactivity between attributes. Understanding interactivity is an important part of driver analysis since causal variables often move together (a challenge known as multicollinearity) and it is sometimes hard to distinguish what is actually causing a particular outcome. For instance, analysis may blame the customer service department for a product defect and point to it as the primary driver of customer defection. Accepting this result, a company may mistakenly try to fix customer service when a product issue needs to be addressed. This approach also overcomes the challenge of Simpson’s paradox, in which a trend that appears in different groups of data disappears or reverses when these groups are combined. This is a hindrance for some visualization tools in the market.

Once users have analyzed the data sufficiently and want to create and share their analysis, the assemble function enables them to bring together various dashboard visualizations in a single screen. Currently, Watson Analytics does such sharing (as well as comments related to the visualizations) via email. In the future, it would good to see capabilities such as annotation and cloud-based sharing in the product.

Full data preparation capabilities are not yet integrated into Watson Analytics. Currently, it includes a data quality report that gives confidence levels for the current data based on its cleanliness, and basic sort, transform and relabeling are incorporated as well. I assume that IBM has much more in the works here. For instance, its DataWorks cloud service offers APIs for some of the best data preparation and master data management available today. DataWorks can mask data at the source and do probabilistic matching against many sources including both cloud and on-premises addresses.  This is a major challenge organizations face when needing to conduct analytics across many data sets. For instance, in multichannel marketing, each individual customer may have many email addresses as well as different mailing addresses, phone numbers and identifiers for social media. A so-called “golden record” needs to be created so all such information can be linked together. Conceptually, the data becomes one long row of data related to that golden record, rather than multiple unassociated data in rows of shorter length. This data needs to be brought into a company’s own internal systems, and personally identifiable information must be stripped out before anything moves into a public domain. In a probabilistic matching system, data is matched not on one field but through associations of data which gives levels of certainty that records should be merged. This is different than past approaches and one of the reasons for significant innovation in the category. Multiple startups have been entering the data preparation space to address the need for a better user experience in data preparation. Such needs have been documented as one of the foundational issues facing the world of big data. Our benchmark research into information optimization shows that data preparation (47%) and quality and consistency (45%) are the most time-consuming tasks for organizations in analytics.

Watson Analytics is deployed on IBM’s SoftLayer cloud vr_Info_Optimization_04_basic_information_tasks_consume_timetechnology and is part of a push to move its analytic portfolio into the cloud. Early in 2015 the company plans to move its SPSS and Cognos products into the cloud via a managed service, thus offloading tasks such as setup, maintenance and disaster recovery management. Watson Analytics will be offered as a set of APIs much as the broader Watson cognitive computing platform has been. Last year, IBM said it would move almost all of its software portfolio to the cloud via its Bluemix service platform. These cloud efforts, coupled with the company’s substantial investment in partner programs with developers and universities around the world, suggest that Watson may power many next-generation cognitive computing applications, a market estimated to grow into the tens of billions of dollars in the next several years.

Overall, I expect Watson Analytics to gain more attention and adoption in 2015 and beyond. Its design philosophy and user experience are innovative, but work must be done in some areas to make it a tool that professionals use in their daily work. Given the resources IBM is putting into the product and the massive amounts of product feedback it is receiving, I expect initial release issues to be worked out quickly through the continuous release cycle. Once they are, Watson Analytics will raise the bar on self-service analytics.


Ventana Research

Qlik was an early pioneer in developing a substantial market for a visual discovery tool that enables end users to easily access and manipulate analytics and data. Its QlikView application uses an associative experience that takes  an in-memory, correlation-based approach to present a simpler design and user experience for analytics than previous tools. Driven by sales of QlikView, the company’s revenue has grown to more than $.5 billion, and originating in Sweden it has a global presence.

At its annual analyst event in New York the business intelligence and analytics vendor discussed recent product developments, in particular the release of Qlik Sense. It is a drag-and-drop visual analytics tool targeted at business users but scalable enough for enterprise use. Its aim is to give business users a simplified visual analytic experience that takes advantage of modern cloud technologies. Such a user experience is important; our benchmark research into next-generation business intelligence shows that usability is an important buying criterion for nearly two out of three (63%) companies. A couple of months ago, Qlik introduced Qlik Sense for desktop systems, and at the analyst event it announced general availability of the cloud and server editions.

vr_bti_br_technology_innovation_prioritiesAccording to our research into business technology innovation, analytics is the top initiative for new technology: 39 percent of organizations ranked it their number-one priority. Analytics includes exploratory and confirmatory approaches to analysis. Ventana Research refers to exploratory analytics as analytic discovery and segments it into four categories that my colleague Mark Smith has articulated. Qlik’s products belong in the analytic discovery category. Users can use the tool to investigate data sets in an intuitive and visual manner, often conducting root cause analysis and decision support functions. This software market is relatively young, and competing companies are evolving and redesigning their products to suit changing tastes. Tableau, one of Qlik’s primary competitors, which I wrote about recently, is adapting its current platform to developments in hardware and in-memory processing, focusing on usability and opening up its APIs. Others have recently made their first moves into the market for visual discovery applications, including Information Builders and MicroStrategy. Companies such as Actuate, IBM, SAP, SAS and Tibco are focused on incorporating more advanced analytics in their discovery tools. For buyers, this competitive and fragmented market creates a challenge when comparing offers in the analytic discovery market.

A key differentiator is Qlik Sense’s new modern architecture, which is designed for cloud-based deployment and embedding in other applications for specialized use. Its analytic engine plugs into a range of Web services. For instance, the Qlik Sense API enables the analytic engine to call to a data set on the fly and allow the application to manipulate data in the context of a business process. An entire table can be delivered to node.js, which extends the JavaScript API to offer server-side features and enables the Qlik Sense engine to take on an almost unlimited number of real-time connections  by not blocking input and output. Previously developers could write PHP script and pipe SQL to get the data, and the resulting application is viable but complex to build and maintain. Now all they need is JavaScript and HTML. The Qlik Sense architecture abstracts the complexity and allows JavaScript developers to make use of complex constructs without intricate knowledge of the database. The new architecture can decouple the Qlik engine from the visualizations themselves, so Web developers can define expressions and dimensions without going into the complexities of the server-side architecture. Furthermore, by decoupling the services, developers gain access to open source visualization technologies such as d3.js. Cloud-based business intelligence and extensible analytics are becoming a hot topic. I have written about this, including a glimpse of our newly announced benchmark research on the next generation of data and analytics in the cloud. From a business user perspective, these types of architectural changes may not mean much, but for developers, OEMs and UX design teams, it allows much faster time to value through a simpler component-based approach to utilizing the Qlik analytic engine and building visualizations.

vr_Big_Data_Analytics_06_benefits_realized_from_big_data_analyticsThe modern architecture of Qlik Sense together with the company’s ecosystem of more than 1,000 partners and a professional services organization that has completed more than 2,700 consulting engagements, gives Qlik a competitive position. The service partner relationships, including those with major systems integrators, are key to the company’s future since analytics is as much about change management as technology. Our research in analytics consistently shows that people and processes lag technology and information in performance with analytics. Furthermore, in our benchmark research into big data analytics, the benefits most often mentioned as achieved are better communication and knowledge sharing (24%), better management and alignment of business goals (18%), and gaining competitive advantage (17%).

As tested on my desktop, Qlik Sense shows an intuitive interface with drag-and-drop capabilities for building analysis. Formulas are easy to incorporate as new measures, and the palate offers a variety of visualization options which automatically fit to the screen. The integration with QlikView is straightforward in that a data model from QlikView can be saved seamlessly and opened intact in Qlik Sense. The storyboard function allows for multiple visualizations to build into narratives and for annotations to be added including linkages with data. For instance, annotations can be added to specific inflection points in a trend line or outliers that may need explanation. Since the approach is all HTML5-based, the visualizations are ready for deployment to mobile devices and responsive to various screen sizes including newer smartphones, tablets and the new class of so-called phablets. In the evaluation of vendors in our Mobile Business Intelligence Value Index Qlik ranked fourth overall.

In the software business, of course, technology advances alone don’t guarantee success. Qlik has struggled to clarify the position its next-generation product and it is not a replacement for QlikView. QlikView users are passionate about keeping their existing tool because they have already designed dashboards and calculations using this tool. Vendors should not underestimate user loyalty and adoption. Therefore Qlik now promises to support both products for as long as the market continues to demand them. The majority of R&D investment will go into Qlik Sense as developers focus on surpassing the capabilities of QlikView. For now, the company will follow a bifurcated strategy in which the tools work together to meet needs for various organizational personas. To me, this is the right strategy. There is no issue in being a two-product company, and the revised positioning of Qlik Sense complements QlikView both on the self-service side and the developer side. Qlik Sense is not yet as mature a product as QlikView, but from a business user’s perspective it is a simple and effective analysis tool for exploring data and building different data views. It is simpler because users no do not need to script the data in order to create the specific views they deem necessary. As the product matures, I expect it to become more than an end user’s visual analysis tool since the capabilities of Qlik Sense lends itself to web scale approaches. Over time, it will be interesting to see how the company harmonizes the two products and how quickly customers will adopt Qlik Sense as a stand-alone tool.

For companies already using QlikView, Qlik Sense is an important addition to the portfolio. It will allow business users to become more engaged in exploring data and sharing ideas. Even for those not using QlikView, with its modern architecture and open approach to analytics, Qlik Sense can help future-proof an organization’s current business intelligence architecture. For those considering Qlik for the first time, the choice may be whether to bring in one or both products. Given the proven approach of QlikView, in the near term a combination approach may be a better solution in some organizations. Partners, content providers and ISVs should consider Qlik Branch, which provides resources for embedding Qlik Sense directly into applications. The site provides developer tools, community efforts such as d3.js integrations and synchronization with Github for sharing and branching of designs. For every class of user, Qlik Sense can be downloaded for free and tested directly on the desktop. Qlik has made significant strides with Qlik Sense, and it is worth a look for anybody interested in the cutting edge of analytics and business intelligence.


Ventana Research

Tableau Software introduced its latest advancements in analytics and business intelligence software along with its future plan to more than 5,000 attendees at its annual user conference in its home town of Seattle. The enthusiasm of the primarily millennial-age crowd reflected not only the success of the young company but also its aspirations. The market for what Ventana Research calls visual and data discovery and Tableau have experienced rapid growth that is likely to continue.

vr_ngbi_br_importance_of_bi_technology_considerations_updatedThe company focuses on the mission of usability, which our benchmark research into next-generation business intelligence shows to be a top software buying criterion for more organizations (63%) than any other. Tableau introduced advances in this area including analytic ease of use, APIs, data preparation, storyboarding and mobile technology support as part of its user-centric product strategy. Without revealing specific timelines, executives said that the next major release, Tableau 9.0, likely will be available in the first half of 2015 as outlined by the CEO in his keynote.

Chief Development Officer and co-founder Chris Stolte showed upcoming ease-of-use features such as the addition of Excel-like functionality within workbooks. Users can type a formula directly into a field and use auto-completion or drag and drop to bring in other fields that are components of the calculation. The new calculation can be saved as a metric and easily added to the Tableau data model. Others announced included table calculations, geographic search capabilities and radial and lasso selection on maps. The live demonstration between users onstage was seamless and created flows that the audience could understand. The demonstration reflected impressive navigation capabilities.

Stolte also demonstrated upcoming statistical capabilities.  Box plots have been available since Tableau 8.1, but now the capabilities have been extended for comparative analysis across groups and to create basic forecast models. The comparative descriptive analytics has been improved with drill-down features and calculations within tables. This is important since analysis between and within groups is necessary to use descriptive statistics to reveal business insights. Our research into big data analytics shows that the some of the most important analytic approaches are descriptive in nature: Pivot tables (48%), classification or decision trees (39%) and clustering (37%) are the methods most widely used for big data analytics.

When it comes to predictive analytics, however, Tableau is still somewhat limited. Companies such as IBM, Information Builders, MicroStrategy, SAS and SAP have focused more resources on incorporating advanced analytics in their discovery tools; Tableau has to catch up in this area. Forecasting of basic trend lines is a first step, but if the tool is meant for model builders, then I’d like to see more families of curves and algorithms to fit different data sets such as seasonal variations. Business users, Tableau’s core audience, need automated modeling approaches that can evaluate the characteristics of the data and produce adequate models. How different stakeholders communicate around the statistical parameters and models is also unclear to me. Our research shows that summary statistics and model comparisons are important capabilities for administering and sharing predictive analytics. Overall, Tableau is making strides in both descriptive and predictive statistics and making this intuitive for users.

vr_Info_Optimization_04_basic_information_tasks_consume_timePresenters also introduced new data preparation capabilities on Excel imports including the abilities to view delimiters, split columns and even un-pivot data. The software also has some ability to clean up spreadsheets such as getting rid of headers and footers. Truly dirty data, such as survey data captured in spreadsheets or created with custom calculations and nesting, is not the target here. The data preparation capabilities can’t compare with those provided by companies such as Alteryx, Infromatica, Paxata, Pentaho, Tamr or Trifacta. However, it is useful to quickly upload and clean a basic Excel document and then visualize it in a single application. According to our benchmark research on information optimization, data preparation (47%) and checking data for quality and consistency (45%) are the primary tasks on which analysts spend their time.

Storytelling (which Tableau calls Storypoints), is an exciting area of development for the company. Introduced last year, it enables users to build a sequential narrative that includes graphics and text. New developments enable the user to view thumbnails of different visualizations and easily pull them into the story. Better control over calculations, fonts, colors and text positioning were also introduced. While these changes may seem minor, they are important to this kind of application. A major reason that most analysts take their data out of an analytic tool and put it into PowerPoint is to have this type of control and ease of use. While PowerPoint remains dominant when it comes to communicating analytic results in business, a new category of tools is challenging Microsoft’s preeminence in this area. Tableau Storypoints is one of the easiest to use in the market.

API advancements were discussed by Francois Ajenstat, senior director of product management, who suggested that in the future anything done on Tableau Server can be done through APIs. In this way different capabilities will be exposed so that other software can use them (Tableau visualizations, for example) within the context of their workflows. As well, Tableau has added REST APIs including JavaScript capabilities, which allow users to embed Tableau in applications to do such things as customize dashboards. The ability to visualize JSON data in Tableau is also a step forward since exploiting new data sources is the fastest way to gain business advantage from analytics. This capability was demonstrated using government data, which is commonly packaged in JSON format. As Tableau continues its API initiatives, we hope to see more advances in exposing APIs so Tableau can be integrated into governance workflows, which can be critical to enterprise implementations. APIs also can enable analytic workflow tools to more easily access the product so statistical modelers can understand the data prior to building models. While Tableau integrates on the back end with analytic tools such as Alteryx, the visual and data discovery aspects that must precede statistical model building are still a challenge. Having more open APIs will open up opportunities for Tableau’s customers and could broaden its partner ecosystem.

The company made other enterprise announcements such as Kerberos Security, easier content management, an ability to seamlessly integrate with, and a parallelized approach to accessing very large data sets. These are all significant developments. As Tableau advances its enterprise vision and continues to expand on the edges of the organization, I expect it to compete in more enterprise deals. The challenge the company faces is still one of the enterprise data model. Short of displacing the enterprise data models that companies have invested in over the years, it will continue to be an uphill battle for Tableau to displace large incumbent BI vendors. Our research into Information Optimization shows that the integration with security and user-access frameworks is the biggest technical challenge for optimizing information. For a deeper look at Tableau’s battle for the enterprise, please read my previous analysis.

VRMobileBIVIPerhaps the most excitement from the audience came from the introduction of Project Elastic, a new mobile application with which users can automatically turn an email attachment in Excel into a visualization. The capability is native so it works in offline mode and provides a fast and responsive experience. The new direction bifurcates Tableau’s mobile strategy which heretofore was a purely HTML5 strategy introduced with Tableau 6.1. Tableau ranked seventh in our 2014 Mobile Business Intelligence Value Index.

Tableau has a keen vision of how it can carve out a place in the analytics market. Usability has a major role in building a following among users that should help it continue to grow. The Tableau cool factor won’t go unchallenged, however. Major players are introducing visual discovery products amid discussion about the need for more governance of data into the enterprise and cloud computing; Tableau likely have to blend into the fabric of analytics and BI in organizations. To do so, the company will need to forge significant partnerships and open its platform for easy access.

Organizations considering visual discovery software in the context of business and IT should include Tableau. For enterprise implementations, consideration should be done to ensure Tableau can support the broader manageability and reliability requirements for larger scale deployments. Visualization of data continues to be a critical method to understand the challenges of global business but should not be the only analytic approach taken for all types of users. Tableau is on the leading edge of visual discovery and should not be overlooked.


Ventana Research

Our recently released benchmark research on information optimization shows that 97 percent of organizations find it important or very important to make information available to the business and customers, Ventana_Research_Benchmark_Research_Logoyet only 25 percent are satisfied with the technology they use to provide that access. This wide gap between importance and satisfaction reflects the complexity of preparing and presenting information in a world where users need to access many forms of data that exist across distributed systems.

Information optimization is a new focus in the enterprise software market. It builds on existing investments in business applications, business intelligence and information management and also benefits from recent advances in business analytics and big data, lifting information to higher levels of use and greater value in organizations. Information optimization also builds on information management and information applications, areas Ventana Research has previously researched. For more on the background and definition of information optimization, please see my colleague Mark Smith’s foundational analysis.

vr_Info_Optimization_01_whos_responsible_for_information_availabilityThe drive to improve information availability derives from a need for greater operational efficiency, according to two-thirds (67%) of organizations. The imperative is so strong that 43 percent of all organizations currently are making changes to how they design and deploy information, while another 37 percent plan to make changes in the next 12 months. The pressure for such change is being directed toward the IT group, which is involved with the task of optimizing information in more than four-fifths of organizations with or without line of business support. IT, however, is in an untenable position, as demands are far outstripping its available resources and technology to deal with the problem, which leads to dissatisfaction with the IT department in two out of five organizations, according to our research. Internally, many organizations try to optimize information using manual spreadsheet processes and are confident in their ability to get by 73% of the time. But when the focus turns to the ability to make information available to partners or customers, an increasingly important capability in today’s information-driven economy, the confidence rate drops dramatically to 62% and 55% respectively.

A large part of the information optimization challenge is users’ vr_Info_Optimization_09_most_important_end_user_capabilitiesdifferent requirements. For instance, the top needs of analysts are extracting information, designing and integrating metrics, and developing access policies. In contrast, the top needs of business users are drilling into information (37%), search capabilities (36%) and collaboration (27%). IT must also consider multiple points of integration such as security frameworks and information modeling, as well as integration with operational and content management systems. This is complicated further by multiple new standards coming into play as customer and financial data – still the most important information systems in the organization – append less structured sources of data that add context and value. SQL is still the dominant standard when it comes to information platforms, but less structured approaches such as XML and JSON are emerging fast. Furthermore, innovations in the collaborative and mobile workforce are driving standards such as HTML5 and must be considered carefully when optimizing information. Platform considerations are also affected by the increasing use of analytic databases, in-memory approaches and Hadoop. Traditional approaches like an RDBMS on standard hardware and flat files are still the most common, but the most growth is with in-memory systems and Hadoop. This is interesting because these technologies allow for multiple new approaches to analysis such as visual discovery and machine learning on large data sets.  Adding to the impetus for change is that organizations using an RDBMS on standard hardware and flat files are less satisfied than those using the more innovative approaches to big data.

Information optimization also encounters challenges associated with data preparation and data presentation. In our research, 47 percent of organizations said that  they spend the largest portion of their time in data preparation, but less than half said they are satisfied with their process of creating information. Contributing to this dissatisfaction are lack of resources, lack of flexibility and speed of integration. Lack of resources and speed of integration tend to move together. That is, when more financial and human resources are dedicated to the integration efforts, satisfaction is higher. Adding more human and financial resources does not necessarily increase flexibility. That is a function of both tools and processes, and we see it as a result of divergent data preparation workflows occurring in organizations. One is a more structured approach that follows more traditional ETL paths that can lead to timely integration of data once everything is defined and the system is in place, but is less flexible. Another data preparation approach is to merge internal and external information on the fly in a sandbox environment or in response to sudden market challenges. These different information flows ultimately have to support specific forms of information presentation for users, whether that be the creation of an analytic data set for a complex statistical procedure by a data scientist within the organization or a single number with qualitative context for an executive on a mobile device.

Thus it is clear that information optimization is a critical focus for organizations; it’s also an important area of study for Ventana Research in 2014. Our latest benchmark research shows that the challenges are complex and involve the entire organization. As new technologies come to market and information processes must be aligned with the needs of the lines of business and the functional roles within organizations, companies that are able to simplify access to information and analytics through the information optimization approaches discussed above will provide an edge on competitors.


Tony Cosentino

VP & Research Director

Paxata, a new data and analytics software provider says it wants to address one of the most pressing challenges facing today’s analyst performing analytics: simplifying data preparation. This trend toward simplification is well aligned with the market’s desire for improving usability, which our benchmark research into Next-Generation Business Intelligence shows is a primary buying consideration in two-thirds (64%) of companies. This trend is driving significant adoption of business-friendly-front-end visual and data discovery tools and is part of my research agenda for 2014.

On the back end, however, there is still considerable complexity. VR_Benchmark_Research_logoNon-traditional relational database systems such as Hadoop and big data appliances address the need to store and to some degree query massive amounts of structured and unstructured data. But the ability to efficiently and effectively blend these data sources and any third-party cloud-based data is still a challenge.

To address this challenge, the front end analytics tools that are being adopted by analysts and the multitude of back-end database systems must be integrated to deliver high quality analytic data sets. Today, this is no easy task. My latest benchmark research into Information Optimization recently released finds that when companies create and deploy information, the largest portions of time are spent on preparing data for analysis (49%) and reviewing data for quality and consistency issues (47%). In fact, our research shows that analysts consistently spend anywhere from 40 percent to 60 percent of their time in the data preparation phase that precedes actual analysis of the data.

Paxata and its Adaptive Data Preparation platform aims to solve the challenge of data preparation by improving the data vr_ss21_spreadsheets_arent_easily_replacedaggregation, enrichment, quality and governance processes. It does this using a spreadsheet paradigm, a choice of approach that should resonate well with business analysts; our research into spreadsheet use in today’s enterprises finds that the majority of them (56%) are resistant to a move away from spreadsheets.

In Paxata’s design, once the data is loaded the software displays the combined dataset in a spreadsheet format and the user then manipulates the rows and columns to accomplish the various data preparation tasks. For instance, to profile the data, the analyst can first use a search box and an autocomplete query to find the data of interest and then use color-coded cells and visualization techniques to highlight patterns in the data. For data that may include multiple duplicate records such as addresses, the company includes services that help to sort through these records and make suggestions on what records to combine. This last task may be of particular interest for marketers attempting to combine multiple third-party data sources that list several addresses and names for the same individual.

Another key aspect of Paxata’s software is a history function that allows users to return to any step in the data preparation process and make changes on the fly. This ability to explore the lineage of the data enables another interesting function: “Paxata Share.” This collaborative capability enables multiple users to collaboratively evaluate the differences between data sets by looking at different assumptions that went into the processing of the data. This function is particularly interesting as it has the potential to solve the challenge of “battling boardroom facts” – the situation in which people come to a meeting with different versions of the truth based on the same data sources but different data preparation assumptions.

Under the covers, Paxata’s offering boasts a cloud-based multi-tenant architecture hosted on Rackspace and leveraging the OpenStack platform. The company says its product can comfortably handle big data, processing millions of rows (or about a terabyte) of data in real time. If data sets are larger than this, a batch process can replace the real-time analysis.

In my view, the main value of Paxata’s technology lies in the data analyst time it potentially can save. Much of the functionality it offers involves data discovery driven by the kinds of machine learning algorithms that my colleague Mark Smith discussed Four types of Discovery Technology. For instance, the Paxata software will recommend data and metric definitions based on the business context in which the analyst is working – a customer versus a supply chain context, for example – and these recommendations will sharpen as more data runs through the system.

Paxata is off to a great start, though the data connectors its product offers currently are limited; this will improve as it builds out connectors for more data sources. The company will also need to sort through a very noisy marketplace of companies that provide similar services, on-premises or in the cloud, and that all are adapting their messages to address the data preparation challenge. On its website, Paxata lists Cloudera, Qlik Technologies and Tableau as technology partners. The company also lists dozens of information enrichment partners including government organizations and data companies such as Acxiom, DataSift, and Esri. The list of information partners is extensive, which reflects a thoughtful focus on the value of third-party data sources.

Utilizing efficient cloud computing technology, Paxata is able to come out of the gate with aggressive pricing listed on the company site that is about $300 per month which is pretty small amount for the time that is saved on daily, weekly and monthly basis. Such pricing should help adoption especially with business analysts that the company targets. Organizations that are struggling with the time they put into the data preparation phase of analytics and those that are looking to leverage outside data sources in new and innovative ways should look into Paxata.


Tony Cosentino

VP and Research Director

Like every large technology corporation today, IBM faces an innovator’s dilemma in at least some of its business. That phrase comes from Clayton Christensen’s seminal work, The Innovator’s Dilemma, originally published in 1997, which documents the dynamics of disruptive markets and their impacts on organizations. Christensen makes the key point that an innovative company can succeed or fail depending on what it does with the cash generated by continuing operations. In the case of IBM, it puts around US$6 billion a year into research and development; in recent years much of this investment has gone into research on big data and analytics, two of the hottest areas in 21st century business technology. At the company’s recent Information On Demand (IOD) conference in Las Vegas, presenters showed off much of this innovative portfolio.

At the top of the list is Project Neo, which will go into beta release early in 2014. Its purpose to fill the skills gap related to big data analytics, which our benchmark research into big data shows is held back most by lack of knowledgeable staff (79%) and lack of training (77%). The skills situation can be characterized as a three-legged stool of domain knowledge (that is, line-of-business knowledge), statistical knowledge and technological knowledge. With Project Neo, IBM aims to reduce the technological and statistical demands on the domain expert and empower that person to use big data analytics in service of a particular outcome, such as reducing customer churn or presenting the next best offer. In particular, Neo focuses on multiple areas of discovery, which my colleague Mark Smith outlined. Most of the industry discussion about simplifying analytics has revolved around visualization rather than data discovery, which applies analytics that go beyond visualization, or information discovery, which addresses how we find and access information in a highly distributed environment. These areas are the next logical steps after visualization for software vendors to address, and IBM takes them seriously with Neo.

At the heart of Neo are the same capabilities found in IBM’s SPSSUntitled 1 Analytic Catalyst, which won the 2013 Ventana Research Innovation Award for analytics and which I wrote about. It also includes IBM’s BLU acceleration against the DB2 database, an in-memory optimization technique, which I have discussed as well, that provides access to the analysis of large data sets. The company’s Vivisimo acquisition, which is now called InfoSphere Data Explorer, adds information discovery capabilities. Finally, the Rapid Adaptive Visualization Engine (RAVE), which is IBM’s visualization approach across its portfolio, is layered on top for fast, extensible visualizations. Neo itself is a work in progress currently offered only over the cloud and back-ended by the DB2 database. However, following the acquisition earlier this year of SoftLayer, which provides a cloud infrastructure platform. I would expect to also have IBM make Neo to allow it to access more sources than just loaded data into IBM DB2.

IBM also recently started shipping SPSS Modeler 16.0. IBM bought SPSS in 2009 and has invested in Modeler heavily. Modeler Untitled 2(formerly SPSS Clementine) is an analytic workflow tool akin to others in the market such as SAS Enterprise Miner, Alteryx and more recent entries such as SAP Lumira. SPSS Modeler enables analysts at multiple levels to interact on analytics and do both data exploration and predictive analytics. Analysts can move data from multiple sources and integrate it into one analytic workflow. These are critical capabilities as our predictive analytics benchmark research shows: The biggest challenges to predictive analytics are architectural integration (for 55% of organizations) and lack of access to necessary source data (35%).

IBM has made SPSS the centerpiece of its analytic portfolio and offers it at three levels, Professional, Premium and Gold. With the top-level Gold edition, Modeler 16.0 includes capabilities that are ahead of the market: run-time integration with InfoSphere Streams (IBM’s complex event processing product), IBM’s Analytics Decision Management (ADM) and the information optimization capabilities of G2, a skunks-works project by led by Jeff Jonas, chief scientist of IBM’s Entity Analytics Group.

Integration with InfoSphere Streams that won a Ventana Research Technology Innovation award in 2013 enables event processing to occur in an analytic workflow within Modeler. This is a particularly compelling capability as the so-called “Internet of things” begins to evolve and the ability to correlate multiple events in real time becomes crucial. In such real-time environments, often quantified in milliseconds, events cannot be pushed back into a database and wait to be analyzed.

Decision management is another part of SPSS Modeler. Once models are built, users need to deploy them, which often entails steps such as integrating with rules and optimizing parameters. In a next best offer situation in a retail banking environment, for instance, a potential customer may score highly on propensity want to take out a mortgage and buy a house, but other information shows that the person would not qualify for the loan. In this case, the model itself would suggest telling the customer about mortgage offers, but the rules engine would override it and find another offer to discuss. In addition, there are times when optimization exercises are needed such as Monte Carlo simulations to help to figure out parameters such as risk using “what-if” modelling. In many situations, to gain competitive advantage, all of these capabilities must be rolled into a production environment where individual records are scored in real time against the organization’s database and integrated with the front-end system such as a call center application. The net capability that IBM’s ADM  brings is the ability to deploy analytical models into the business without consuming significant resources.

G2 is a part of Modeler and developed in IBM’s Entity Analytics Group. The group is garnering a lot of attention both internally and externally for its work around “entity analytics” – the idea that each information entity has characteristics that are revealed only in contextual information – charting innovative methods in the areas of data integration and privacy. In the context of Modeler this has important implications for bringing together disparate data sources that naturally link together but otherwise would be treated separately. A core example is that an individual may have multiple email addresses in different databases, has changed addresses or changed names perhaps due to a new marital status. Through machine-learning processes and analysis of the surrounding data, G2 can match records and attach them with some certainty to one individual. The system also strips out personally identifiable information (PII) to meet privacy and compliance standards. Such capabilities are critical for business as our latest benchmark research on information optimization shows that two in five organizations have more than 10 different data sources that they need to integrate and that the ability to simplify access to these systems is important to virtually all organizations (97%).

With the above capabilities, SPSS Modeler Gold edition achieves  market differentiation, but IBM still needs to show the advantage of base editions such as Modeler Professional. The marketing issue for SPSS Modeler is that it is considered a luxury car in a market being infiltrated by compacts and kit cars. In the latter case there is the R programming language, which is open-source and ostensibly free, but the challenge here is that companies need R programmers to run everything. SPSS Modeler and other such visually oriented tools (many of which integrate with open source R) allow easier collaboration on analytics, and ultimately the path to value is shorter. Even at its base level Modeler is an easy-to-use and capable statistical analysis tool that allows for collaborative workgroups and is more mature than many others in the market.

Companies must consider predictive analytics capabilities or Untitledrisk being left behind. Our research into predictive analytics shows that two-thirds of companies see predictive analytics as providing competitive advantage (68%) and particularly important in revenue-generating functions such as marketing (for 70%) and forecasting (72%). Companies currently looking into discovery analytics may want to try Neo, which will be available in beta in early 2014. Those interested in predictive analytics should consider the different levels of SPSS 16.0 as well as IBM’s flagship Signature Solutions, which I have covered. IBM has documented use cases that can give users guidance in terms of leading-edge deployment patterns and leveraging analytics for competitive advantage. If you have not taken a look at the depth of the analytic technology portfolio at IBM, I would make sure to do so, as you might miss some fundamental advancements to the processing of data and analytics to provide the valuable insights required to operate effectively in the global marketplace.


Tony Cosentino

VP and Research Director

A few months ago, I wrote an article on the four pillars of big data analytics. One of those pillars is what is called discovery analytics or where visual analytics and data discovery combine together to meet the business and analyst needs. My colleague Mark Smith subsequently clarified the four types of discovery analytics: visual discovery, data discovery, information discovery and event discovery. Now I want to follow up with a discussion of three trends that our research has uncovered in this space. (To reference how I’m using these four discovery terms, please refer to Mark’s post.)

The most prominent of these trends is that conversations about visual discovery are beginning to include data discovery, and vendors are developing and delivering such tool sets today. It is well-known that while big data profiling and the ability to visualize data give us a broader capacity for understanding, there are limitations that can be vr_predanalytics_predictive_analytics_obstaclesaddressed only through data mining and techniques such as clustering and anomaly detection. Such approaches are needed to overcome statistical interpretation challenges such as Simpson’s paradox. In this context, we see a number of tools with different architectural approaches tackling this obstacle. For example, Information Builders, Datameer, BIRT Analytics and IBM’s new SPSS Analytic Catalyst tool all incorporate user-driven data mining directly with visual analysis. That is, they combine data mining technology with visual discovery for enhanced capability and more usability. Our research on predictive analytics shows that integrating predictive analytics into the existing architecture is the most pressing challenge (for 55% or organizations). Integrating data mining directly into the visual discovery process is one way to overcome this challenge.

The second trend is renewed focus on information discovery (i.e., search), especially among large enterprises with widely distributed systems as well as the big data vendors serving this market. IBM acquired Vivisimo and has incorporated the technology into its PureSystems and big data platform. Microsoft recently previewed its big data information discovery tool, Data Explorer. Oracle acquired Endeca and has made it a key component of its big data strategy. SAP added search to its latest Lumira platform. LucidWorks, an independent information discovery vendor that provides enterprise support for open source Lucene/Solr, adds search as an API and has received significant adoption. There are different levels of search, from documents to social media data to machine data,  but I won’t drill into these here. Regardless of the type of search, in today’s era of distributed computing, in which there’s a need to explore a variety of data sources, information discovery is increasingly important.

The third trend in discovery analytics is a move to more embeddable system architectures. In parallel with the move to the cloud, architectures are becoming more service-oriented, and the interfaces are hardened in such a way that they can integrate more readily with other systems. For example, the visual discovery market was born on the client desktop with Qlik and Tableau, quickly moved to server-based apps and is now moving to the cloud. Embeddable tools such as D3, which is essentially a visualization-as-a-service offering, allow vendors such as Datameer to include an open source library of visualizations in their products. Lucene/Solr represents a similar embedded technology in the information discovery space. The broad trend we’re seeing is with RESTful-based architectures that promote a looser coupling of applications and therefore require less custom integration. This move runs in parallel with the decline in Internet Explorer, the rise of new browsers and the ability to render content using JavaScript Object Notation (JSON). This trend suggests a future for discovery analysis embedded in application tools (including, but not limited to, business intelligence). The environment is still fragmented and in its early stage. Instead of one cloud, we have a lot of little clouds. For the vendor community, which is building more platform-oriented applications that can work in an embeddable manner, a tough question is whether to go after the on-premises market or the cloud market. I think that each will have to make its own decision on how to support customer needs and their own business model constraints.


Tony Cosentino

VP and Research Director

Actuate recently announced BIRT Analytics Version 4.2, part of its portfolio of business intelligence software. The new release includes several techniques used by analytics professionals placed behind a user-friendly interface that does not require advanced knowledge of statistics. Beyond the techniques themselves, release 4.2 focuses on guiding users through processes such as campaign analytics and targeting.

With the release, Actuate is focusing on what I have already assessed in BIRT Analytics and to support more advanced analytics within organizations like marketing. For these users, a handful of analytical techniques cover the majority of uses cases. Our benchmark research into predictive analytics shows that vr_predanalytics_top_predictive_techniques_usedclassification trees (used by 69% of participants), regression techniques (66%), association rules (49%) and k-nearest neighbor algorithms (36%) are the techniques used most often. While BIRT Analytics uses Holt-Winters exponential smoothing for forecasting rather than linear regression and k-means for clustering, the key point is that it addresses the most important uses in the organization through a nontechnical user interface. Using techniques like regression or supervised learning algorithms increases complexity, and such analysis often requires formidable statistical knowledge from the user. In addition to the techniques mentioned above, BIRT Analytics reduces complexity by offering Venn diagram set analysis, a geographic mapping function, and the ability to compare attributes using z-score analysis. A z-score is a standardized unit of measure (relative to the model parameters mu and sigma) that represents how far away from a model’s mean a particular measurement rests. The higher the absolute value of the z-score, the more significant the attribute. This analysis is a simple way of showing things such as the likelihood that a particular email campaign segment will respond to a particular offer; such knowledge helps marketers understand what drives response rates and build lift into a marketing campaign. With this analytical tool set, the marketer or front-line analyst is able to dive directly into cluster analysis, market basket analysis, next-best-offer analysis, campaign analysis, attribution modeling, root-cause analysis and target marketing analysis in order to impact outcome metrics such as new customer acquisition, share-of-wallet, customer loyalty and retention.

Actuate also includes the iWorkflow application in release 4.2. It enables users to set business rules based on constantly calculated measurements and their variance relative to optimal KPI values. If the value falls outside of the critical range, it can start an automated process or send a notification for manual effort to remedy a situation. For instance, if an important customer satisfaction threshold is not being met, the system can notify a customer experience manager to take action that corrects the situation . In the same way, the iWorkflow tool allows users to preprogram distribution of analytical results across the organization based on particular roles or security criteria. As companies work to link market insights with operational objectives, Actuate ought to integrate more tightly with applications from companies such as Eloqua, Marketo and Today this has to be done manually and prevents the automation of closed-loop workflows in areas such as campaign management and customer experience management. Once this is done, however, the tool becomes more valuable to users. The ability to embed analytics into the workflows of the applications themselves is the next challenge for vendors of tools for visual discovery and data discovery.

Other enhancements to BIRT Analytics address data loading and data preparation. The data loader adds a drag-and-drop capability for mapped fields, can incorporate both corporate data and personal data from the desktop and automates batch loading. New preprocessing techniques include scaling approaches and data mapping. The abilities to load data into the columnar store from different information sources and to manipulate the data in databases are important areas that Actuate should continue to develop. Information sources will always be more important than the tools themselves, and data preprocessing is still where most analysts spend the bulk of their time.

BIRT Analytics has been overlooked by many companies in the United States since the roots of the company are in Spain, but the vr_bti_br_whats_driving_change_to_technology_selectiontechnology offers capabilities on par with many of the leaders in the BI category, and some are even more advanced. According to our business technology innovation benchmark research, companies are instituting new technology because of bottom-line considerations such as improvements in business initiatives (60%) and in processes (57%). Furthermore, usability is the top evaluation criterion for business intelligence tools in almost two-thirds (64%) of companies, according to our research on next-generation business intelligence. These are among the reasons we are seeing mass adoption of discovery tools such as BIRT Analytics. Those looking into discovery tools, and especially marketing departments that want to put a portfolio of analytics directly into the hands of the marketing analyst and the data-savvy marketer, should consider BIRT Analytics 4.2.


Tony Cosentino

VP and Research Director

Big data analytics is being offered as the key to addressing a wide array of management and operational needs across business and IT. But the label “big data analytics” is used in a variety of ways, confusing people about its usefulness and value and about how best to implement to drive business value. The uncertainty this causes poses a challenge for organizations that want to take advantage of big data in order to gain competitive advantage, comply with regulations, manage risk and improve profitability.

Recently, I discussed a high-level framework for thinking about big data analytics that aligns with former Census Director Robert Groves’ ideas of designed data on the one hand and organic data on the other. This second article completes that picture by looking at four specific areas that constitute the practical aspects of big data analytics – topics that must be brought into any holistic discussion of big data analytics strategy. Today, these often represent point-oriented approaches, but architectures are now coming to market that promise more unified solutions.

Big Data and Information Optimization: the intersection of big data analytics and traditional approaches to analytics. Analytics performed by database professionals often differ significantly from analytics delivered by line-of-business staffers who work in more flat-file-oriented environments. Today, advancements in in-memory systems, vr_bigdata_obstacles_to_big_data_analyticsin-database analytics and workload-specific appliances provide scalable architectures that bring processing to the data source and allow organizations to push analytics out to a broader audience, but how to bridge the divide between the two kinds of analytics is still a key question. Given the relative immaturity of new technologies and the dominance of relational databases for information delivery, it is critical to examine how all analytical assets will interact with core database systems.  As we move to operationalizing analytics on an industrial scale, the current advanced analytical approaches break down because it requires pulling data into a separate analytic environment and does not leverage advances in parallel computing. Furthermore, organizations need to determine how they can apply existing skill sets and analytical access paradigms such as business intelligence tools, SQL, spreadsheets and visual analysis, to big data analytics. Our recent big data benchmark research shows that the skills gap is the biggest issue facing analytics initiatives with staffing and training as an obstacle in over three quarters of organizations.

Visual analytics and data discovery: Visualizing data is a hot topic, especially in big data analytics. Much of big data analysis is about finding patterns in data and visualizing them so that people can tell a story and give context to large and diverse sets of data. Exploratory analytics allows us to develop and investigate hypotheses, reduce data, do root-cause analysis and suggest modeling approaches for our predictive analytics. Until now the focus of these tools has been on descriptive statistics related to SQL or flat file environments, but now visual analytics vendors are bringing predictive capabilities into the market to drive usability, especially at the business user level. This is a difficult challenge because the inherent simplicity of these descriptive visual tools clashes with the inherent complexity that defines predictive analytics. In addition, companies are looking to apply visualization to the output of predictive models as well. Visual discovery players are opening up their APIs in order to directly export predictive model output.

New tools and techniques in visualization along with the proliferation of in-memory systems allow companies the means of sorting through and making sense of big data, but exactly how these tools work, the types of visualizations that are important to big data analytics and how they integrate into our current big data analytics architecture are still key questions, as is the issue of how search-based data discovery approaches fit into the architectural landscape.

Predictive analytics: Visual exploration of data cannot surface all patterns, especially the most complex ones. To make sense of enormous data sets, data mining and statistical techniques can find patterns, relationships and anomalies in the data and use them to predict future outcomes for individual cases. Companies need to investigate the use of advanced analytic approaches and algorithmic methods that can transform and analyze organic data for uses such as predicting security threats, uncovering fraud or targeting product offers to particular customers.

Commodity models (a.k.a. good-enough models) are allowing business users to drive the modeling process. How these models can be vr_predanalytics_benifits_of_predictive_analyticsbuilt and consumed at the front line of the organization with only basic oversight by a statistician data scientist is a key area of focus as organizations endeavor to bring analytics into the fabric of the organization. The increased load on the back end systems is another key consideration if the modeling is a dynamic software driven approach. How these models are managed and tracked is yet another consideration. Our research on predictive analytics shows that companies that update their models more frequently have much higher satisfaction ratings than those that update on a less frequent basis.  The research further shows that in over half of organizations that competitive advantage and revenue growth are the primary reasons that predictive analytics are deployed.

Right-time and real-time analytics: It’s important to investigate the intersection of big data analytics with right-time and real-time systems and learn how participants are using big data analytics in production on an industrial scale. This usage guides the decisions that we make today around how to begin the task of big data analytics. Another choice organizations must make is whether to capture and store all of their data and analyze it on the back end, attempt to process it on the fly, or do both. In this context, event processing and decision management technologies represent a big part of big data analytics since they can help examine data streams for value and deliver information to the front lines of the organization immediately. How traditionally batch-oriented big data technologies such as Hadoop fit into the broader picture of right-time consumption still needs to be answered as well. Ultimately, as happens with many aspects of big data analytics, the discussion will need to center on the use case and how to address the time to value (TTV) equation.

Organizations embarking on a big data strategy must not fail to consider the four areas above. Furthermore, their discussions cannot cover just the technological approaches, but must include people, processes and the entire information landscape. Often, this endeavor requires a fundamental rethinking of organizational processes and questioning of the status quo.  Only then can companies see the forest for the trees.


Tony Cosentino
VP and Research Director

Actuate this week announced BIRT Analytics, and thereby puts itself firmly into supporting a range of business analytics needs from data discovery and visualization to a range of data mining and predictive capabilities that allows itself new avenues of growth. Actuate has long been a staple of large Business Intelligence deployments; in fact the company says that ActuateOne delivers more insights to more people than all other BI applications combined. This is likely true, given that Actuate is embedded in major consumer applications across industries worldwide. This announcement builds and utilizes its advancements into big data that I already assessed last year that can help it further expand its technology value to business and IT.

Tools such as BIRT Analytics can change the organizational culture aroundvr_ngbi_br_importance_of_bi_technology_considerations data and analytics. They put the power of data discovery and data visualization into the hands of tool-savvy managers as well as business analysts.  While Actuate has allowed highly functional and interactive dashboards in the past, BIRT Analytics brings the usability dimension to a different level. Usability is of the highest importance to 63 percent of organizations for business intelligence software, according to our next-generation business intelligence benchmark research, and one where BIRT Analytics and other tools in its class really show their value. The technology allows not just for visual data exploration, but also for new sources of data to be connected and analyzed without a predefined schema. This fits well with the current world of distributed computing, where everything can no longer be nicely modeled in one place. The software can gather data from different sources, including big data sources, flat files and traditional relational databases, and mash these up through visually appealing toolsets, allowing end user analysts to bypass IT and avoid much of the data preparation that has been a hallmark of business intelligence in the past. In fact our recent business technology innovation benchmark research shows that only a little more than half of companies are satisfied with their analytic processes, and 44 percent of organizations indicate the most time-consuming part of the analytics process is data-related tasks that Actuate is addressing with their ability to handle data efficiently.

Some of the advantages of the BIRT Analytics product are its fast in-memory engine,vr_predanalytics_benifits_of_predictive_analytics its ability to handle large amounts of data, and the more advanced analytic capabilities in the system. The company’s web site says it offers the fastest data loading tool in the industry with the FastDB main memory database system and an ability to explore 6 billion records in less than a second. These are impressive numbers, especially as we look at big data analytics, which often runs against terabytes of data. The usability of this tool’s analytics features is particularly impressive. For instance, set analysis, clustering and predictive capabilities are all part of the software, allowing analysts who aren’t necessarily data scientists to conduct advanced data analysis. These capabilities give tools like BIRT Analytics an advantage in the market since they offer simple end-user-driven ways to produce market segmentation and forecasting reports. These advancements help Actuate provide new benefits of its BIRT Analytics that according to our benchmark research on predictive analytics, 68 percent of organizations see predictive analytics as a source of competitive advantage.

Actuate already ranked as a hot vendor in the 2012 Ventana Research Business Intelligence Value Index thanks to its enterprise-level reliability and validation of its deployments which this release will help it even more in its ratings.  In the short term, BIRT Analytics will certainly boost Actuate’s market momentum and allow it to compete in areas where it would not have been seen before and help it expand its value to its existing customers.


Tony Cosentino

VP and Research Director

RSS Tony Cosentino’s Analyst Perspectives at Ventana Research

  • An error has occurred; the feed is probably down. Try again later.

Tony Cosentino – Twitter

Error: Twitter did not respond. Please wait a few minutes and refresh this page.


  • 73,049 hits
%d bloggers like this: