
A few months ago, I wrote an article on the four pillars of big data analytics. One of those pillars is what is called discovery analytics, in which visual analytics and data discovery combine to meet business and analyst needs. My colleague Mark Smith subsequently clarified the four types of discovery analytics: visual discovery, data discovery, information discovery and event discovery. Now I want to follow up with a discussion of three trends that our research has uncovered in this space. (To reference how I’m using these four discovery terms, please refer to Mark’s post.)

The most prominent of these trends is that conversations about visual discovery are beginning to include data discovery, and vendors are developing and delivering such tool sets today. It is well known that while big data profiling and the ability to visualize data give us a broader capacity for understanding, there are limitations that can be addressed only through data mining and techniques such as clustering and anomaly detection. Such approaches are needed to overcome statistical interpretation challenges such as Simpson’s paradox. In this context, we see a number of tools with different architectural approaches tackling this obstacle. For example, Information Builders, Datameer, BIRT Analytics and IBM’s new SPSS Analytic Catalyst tool all incorporate user-driven data mining directly with visual analysis. That is, they combine data mining technology with visual discovery for enhanced capability and greater usability. Our research on predictive analytics shows that integrating predictive analytics into the existing architecture is the most pressing challenge (for 55% of organizations). Integrating data mining directly into the visual discovery process is one way to overcome this challenge.
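The Simpson’s paradox point is easy to see with a toy example. The sketch below (illustrative numbers I have made up, not from our research) shows an aggregate comparison reversing once the data is segmented, which is exactly the kind of trap that clustering and drill-down data mining help analysts avoid:

```python
# Each segment admits women at a higher rate, yet the pooled rate
# favors men -- a classic Simpson's paradox (illustrative data only).
admits = {  # dept -> group -> (admitted, applied)
    "A": {"men": (80, 100), "women": (18, 20)},
    "B": {"men": (2, 10), "women": (24, 100)},
}

def rate(admitted, applied):
    return admitted / applied

# Within every department, women are admitted at the higher rate.
for dept, groups in admits.items():
    assert rate(*groups["women"]) > rate(*groups["men"])

# Pooled across departments, the comparison reverses.
men = [sum(g["men"][i] for g in admits.values()) for i in (0, 1)]
women = [sum(g["women"][i] for g in admits.values()) for i in (0, 1)]
print(f"men: {rate(*men):.2f}, women: {rate(*women):.2f}")  # men: 0.75, women: 0.35
```

A purely visual summary of the pooled rates would point an analyst to exactly the wrong conclusion; segmenting the data first surfaces the real pattern.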

The second trend is renewed focus on information discovery (i.e., search), especially among large enterprises with widely distributed systems as well as the big data vendors serving this market. IBM acquired Vivisimo and has incorporated the technology into its PureSystems and big data platform. Microsoft recently previewed its big data information discovery tool, Data Explorer. Oracle acquired Endeca and has made it a key component of its big data strategy. SAP added search to its latest Lumira platform. LucidWorks, an independent information discovery vendor that provides enterprise support for open source Lucene/Solr, offers search as an API and has seen significant adoption. There are different levels of search, from documents to social media data to machine data, but I won’t drill into these here. Regardless of the type of search, in today’s era of distributed computing, in which there’s a need to explore a variety of data sources, information discovery is increasingly important.

The third trend in discovery analytics is a move to more embeddable system architectures. In parallel with the move to the cloud, architectures are becoming more service-oriented, and the interfaces are hardened in such a way that they can integrate more readily with other systems. For example, the visual discovery market was born on the client desktop with Qlik and Tableau, quickly moved to server-based apps and is now moving to the cloud. Embeddable tools such as D3, an open source JavaScript visualization library, allow vendors such as Datameer to include a rich library of visualizations in their products. Lucene/Solr represents a similar embedded technology in the information discovery space. The broad trend we’re seeing is with RESTful architectures that promote a looser coupling of applications and therefore require less custom integration. This move runs in parallel with the decline in Internet Explorer, the rise of new browsers and the ability to render content using JavaScript Object Notation (JSON). This trend suggests a future for discovery analysis embedded in application tools (including, but not limited to, business intelligence). The environment is still fragmented and in its early stage. Instead of one cloud, we have a lot of little clouds. For the vendor community, which is building more platform-oriented applications that can work in an embeddable manner, a tough question is whether to go after the on-premises market or the cloud market. I think that each will have to make its own decision on how to support customer needs and their own business model constraints.

Regards,

Tony Cosentino

VP and Research Director

Datameer, a Hadoop-based analytics company, had a major presence at the recent Hadoop Summit, led by CEO Stefan Groschupf’s keynote and panel appearance. Besides announcing its latest product release, which is an important advance for the company and its users, Datameer’s outspoken CEO put forth contrarian arguments about the current direction of some of the distributions in the Hadoop ecosystem.

The challenge for the growing ecosystem surrounding Hadoop, the open source processing paradigm, has been in accessing data and building analytics that serve business uses in a straightforward manner. Our benchmark research into big data shows that the two most pressing challenges to big data analytics are staffing (79%) and training (77%). This so-called skills gap is at the heart of the Hadoop debate since it often takes someone with not just domain skills but also programming and statistical skills to derive value from data in a Hadoop cluster. Datameer is dedicated to addressing this challenge by integrating its software directly with the various Hadoop distributions to provide analytics and access tools, which include visualization and a spreadsheet interface. My coverage of Datameer from last year covers this approach in more detail.

At the conference, Datameer announced version 3.0 of its namesake product with a celebrity twist. Olympic athlete Sky Christopherson presented a keynote telling how the U.S. women’s cycling team, a heavy underdog, used Datameer to help it earn a silver medal in London. Following that introduction, Groschupf, one of the original contributors to Nutch (Hadoop’s predecessor), discussed features of Datameer 3.0 and what the company is calling “Smart” analytics, which include a variety of advanced analytic techniques such as clustering, decision trees, recommendations and column dependencies.

Our benchmark research into predictive analytics shows that classification trees (used by 69% of participants) and association rules (49%) are two of the techniques used most often; both are included in the Datameer product. (Note: Datameer utilizes K-means, an unsupervised clustering approach, rather than K-nearest neighbor, which is a supervised classification approach.) Both on stage and in a private briefing, company spokespeople downplayed the specific techniques in favor of the usability aspects and examples of business use for each of them. Clustering of Hadoop data allows marketing and business analytics professionals to view how data groups together naturally, while decision trees help analysts see how sets group and deconstruct from a linear subset perspective rather than from a framed Venn diagram perspective. In this regard clustering is more of a bottom-up approach and decision trees more of a top-down approach. For instance, in a cluster analysis, the analyst combines multiple attributes at one time to understand the dimensions upon which the data group. This can inform broad decisions about strategic messaging and product development. In contrast, with a decision tree, one can look, for instance, at all sales data to see which industries are most likely to buy a product, then follow the tree to see what size of companies within the industry are the best prospects, and then the subset of buyers within those companies who are the best targets.
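To make the bottom-up versus top-down contrast concrete, here is a minimal, standard-library-only Python sketch. It is my own illustration with invented customer data, not Datameer’s implementation: K-means lets segments emerge without any labels, while a decision split works down from a known outcome.

```python
# Hypothetical customer records: (annual_spend, visits_per_month)
data = [(95, 2), (105, 3), (110, 2), (480, 9), (520, 11), (505, 10)]

def kmeans2(points, iters=10):
    """Two-cluster K-means, seeded with the first and last points.
    A sketch: assumes clusters never empty out during iteration."""
    centers = [points[0], points[-1]]
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[d.index(min(d))].append(p)  # assign to nearest center
        centers = [tuple(sum(dim) / len(g) for dim in zip(*g)) for g in groups]
    return groups

low, high = kmeans2(data)  # bottom-up: natural segments emerge, no labels used
print(len(low), len(high))  # 3 3

# Top-down: a one-level decision split finds the threshold on one
# attribute that best predicts a known outcome.
bought = [0, 0, 0, 1, 1, 1]  # hypothetical "purchased premium plan" flag

def best_split(points, labels, feature=0):
    """Threshold on one feature minimizing misclassifications."""
    best = (None, len(labels) + 1)
    for t in sorted(p[feature] for p in points):
        errs = sum((p[feature] > t) != bool(y) for p, y in zip(points, labels))
        if errs < best[1]:
            best = (t, errs)
    return best

print(best_split(data, bought))  # (110, 0): "spend > 110" predicts a buyer perfectly
```

The clustering step never looks at the `bought` flag, which is what lets it inform open-ended questions like messaging and product strategy; the split step starts from that flag, which is what makes it suited to targeted questions like “who buys?”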

Datameer’s column dependencies can show analysts relationships between different column variables. The output appears much like a correlation matrix, but uses a technique called Mutual Information. The key benefit of this technique over a traditional correlation approach is that it allows comparison between different types of variables, such as continuous and categorical variables. However, there is a trade-off in usability: The numeric output is not represented by the correlation coefficient with which many analysts are familiar. (I encourage Datameer to give analysts a quick reference of some type to help interpret the numbers associated with this less-known output.) Once the output is understood, it can be useful in exploring specific relationships and testing hypotheses. For instance, a company can test the hypothesis that it is more vertically focused than competitors by looking at industry and deal close rates. If there is no relationship between the variables, the hypothesis may be dismissed and a more horizontal strategy pursued.
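For readers who want to see how Mutual Information differs from a correlation coefficient in practice, here is a small Python sketch (my own illustration, not Datameer’s code) that computes it directly from paired categorical samples, something a Pearson coefficient cannot handle:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits, estimated from paired samples. Works for
    categorical data, unlike Pearson correlation; 0 means independent."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical deal data: industry vs. whether the deal closed.
industry = ["health", "health", "retail", "retail", "bank", "bank"]
closed   = ["yes",    "yes",    "no",     "no",     "yes",  "no"]
print(round(mutual_information(industry, closed), 3))  # 0.667
```

Note the interpretation issue raised above: the result is measured in bits, not on the familiar -1 to +1 correlation scale, so analysts need a reference point (0 bits means no relationship; higher values mean a stronger dependency).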

The other technique Datameer spoke of is recommendation, also known as next best offer analysis; it is a relatively well known technique that has been popularized by Amazon and other retailers. Recommendation engines can help marketing and sales teams increase share of wallet through cross-sell and up-sell opportunities. While none of these four techniques is new to the world of analytics, the novelty is that Datameer allows this analysis directly on Hadoop, which incorporates new forms of data including Web behavior data and social media data. While many in the Hadoop ecosystem focus on descriptive analysis related to SQL, Datameer’s foray into more advanced analytics pushes the Hadoop envelope.
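The idea behind a recommendation engine can be sketched in a few lines of Python using simple co-purchase counts. This is a toy illustration with invented baskets, far simpler than production engines like Amazon’s or Datameer’s:

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets: recommend the item most often
# co-purchased with what the customer already owns.
baskets = [
    {"laptop", "mouse", "dock"},
    {"laptop", "mouse"},
    {"laptop", "dock", "monitor"},
    {"laptop", "mouse", "pad"},
]

# Count how often each ordered pair of items appears in the same basket.
co = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co[(a, b)] += 1
        co[(b, a)] += 1

def recommend(owned, n=1):
    """Score candidate items by co-purchase counts with owned items."""
    scores = Counter()
    for item in owned:
        for (a, b), c in co.items():
            if a == item and b not in owned:
                scores[b] += c
    return [item for item, _ in scores.most_common(n)]

print(recommend({"laptop"}))  # ['mouse'] -- the most frequent co-purchase
```

The cross-sell and up-sell value comes from the same counts: a sales team can rank every customer’s next best offer by what similar baskets contained.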

Aside from the launch of Datameer 3.0, Groschupf and his team used Hadoop Summit to espouse the position that the SQL approach of many Hadoop vendors is a mistake. The crux of the argument is that Hadoop is a sequential access technology (much like a magnetic cassette tape) in which a large portion of the data must be read before the correct data can be pulled off the disk. Groschupf argues that this is fundamentally inefficient and that current MPP SQL approaches do a much better job of processing SQL-related tasks. To illustrate the difference he characterized Hadoop as a freight train and an analytic appliance database as a Ferrari; each, of course, has its proper uses. Customers thus should decide what they want to do with the data from a business perspective and then choose the appropriate technology.

This leads to another point Groschupf made to me: that the big data discussion is shifting away from the technical details to a business orientation. In support of this point, he showed me a comparison of the Google search terms “big data” and “Hadoop.” The latter was more common in the past few years, when it was almost synonymous with big data, but now generic searches for big data are more common. Our benchmark research into business technology innovation shows a similar shift in buying criteria, with about two-thirds (64%) of buyers naming usability as the most important priority. By the way, a number of Ventana Research blogs including this one have focused on the trend of outcome-based buying and decision making.

For organizations curious about big data and what they can do to take advantage of it, Datameer can be a low-risk place to start exploring. The company offers a free download version of its product so you can start looking at data immediately. The idea of time-to-value is critical with big data, and this is a key value proposition for Datameer. I encourage users to test the product with an eye to uncovering interesting data that was never available for analysis before. This will help build the big data business use case, especially in a bootstrap funding environment where money, skills and time are short.

Regards,

Tony Cosentino

VP and Research Director

Users of big data analytics are finally going public. At the Hadoop Summit last June, many vendors were still speaking of a large retailer or a big bank as users but could not publicly disclose their partnerships. Companies experimenting with big data analytics felt that their proof of concept was so innovative that once it moved into production, it would yield a competitive advantage to the early mover. Now many companies are speaking openly about what they have been up to in their business laboratories. I look forward to attending the 2013 Hadoop Summit in San Jose to see how much things have changed in just a single year for Hadoop-centered big data analytics.

Our benchmark research into operational intelligence, which I argue is another name for real-time big data analytics, shows diversity in big data analytics use cases by industry. The goals of operational intelligence are an interesting mix as the research shows relative parity among managing performance (59%), detecting fraud and security (59%), complying with regulations (58%) and managing risk (58%), but when we drill down into different industries there are some interesting nuances. For instance, healthcare and banking are driven much more by risk and regulatory compliance, services such as retail are driven more by performance, and manufacturing is driven more by cost reduction. All of these make sense given the nature of the businesses. Let’s look at them in more detail.

The retail industry, driven by market forces and facing discontinuous change, is adopting big data analytics out of competitive necessity. The discontinuity comes in the form of online shopping and the need for traditional retailers to supplement their brick-and-mortar locations. JCPenney and Macy’s provide a sharp contrast in how two retailers approached this challenge. A few years ago, the two companies eyed a similar competitive space, but since that time, Macy’s has implemented systems based on big data analytics and is now sourcing locally for online transactions and can optimize pricing of its more than 70 million SKUs in just one hour using SAS High Performance Analytics. The Macy’s approach has, in Sun Tzu-like fashion, turned the “showroom floor” disadvantage into a customer experience advantage. JCPenney, on the other hand, relied on gut-feel management decisions based on classic brand merchandising strategies and ended up alienating its customers, generating lawsuits and issuing a well-publicized apology to its customers. Other companies including Sears are doing similarly innovative work with suppliers such as Teradata and innovative startups like Datameer in data hub architectures built around Hadoop.

Healthcare is another interesting market for big data, but the dynamics that drive it are less about market forces and more about government intervention and compliance issues. Laws such as HIPAA, the Affordable Care Act, OC-10 and the HITECH Act of 2009 all have implications for how these organizations implement technology and analytics. Our recent benchmark research on governance, risk and compliance indicates that many companies have significant concerns about compliance issues: 53 percent of participants said they are concerned about them, and 42 percent said they are very concerned. Electronic health records (EHRs) are moving these organizations to more patient-centric systems, and one goal of the Affordable Care Act is to use technology to produce better outcomes through what it calls meaningful use standards. Facing this tidal wave of change, companies including IBM analyze historical patterns and link them with real-time monitoring, helping hospitals save the lives of at-risk babies. This use case was made into a now-famous commercial by advertising firm Ogilvy about the so-called data babies. IBM has also shown how cognitive question-and-answer systems such as Watson assist doctors in diagnosis and treatment of patients.

Data blending, the ability to mash together different data sources without having to manipulate the underlying data models, is another analytical technique gaining significant traction. Kaiser Permanente is able to use tools from Alteryx, which I have assessed, to consolidate diverse data sources, including unstructured data, to streamline operations and improve customer service. The two organizations made a joint presentation similar to the one here at Alteryx’s user conference in March.

Financial services, which my colleague Robert Kugel covers, is being driven by a combination of regulatory forces and competitive market forces on the sales end. Regulations produce a lag in the adoption of certain big data technologies, such as cloud computing, but areas such as fraud and risk management are being revolutionized by the ability, provided through in-memory systems, to look at every transaction rather than only a sampling of transactions through traditional audit processes. Furthermore, the ability to pair advanced analytical algorithms with in-memory real-time rules engines helps detect fraud as it occurs, and thus criminal activity may be stopped at the point of transaction. On a broader scale, new risk management frameworks are becoming the strategic and operational backbone for decision-making in financial services.

On the retail banking side, copious amounts of historical customer data from multiple banking channels combined with government data and social media data are providing banks the opportunity to do microsegmentation and create unprecedented customer intimacy. Big data approaches to micro-targeting and pricing algorithms, which Rob recently discussed in his blog on Nomis, enable banks and retailers alike to target individuals and customize pricing based on an individual’s propensity to act. While partnerships in the financial services arena are still held close to the vest, the universal financial services providers – Bank of America, Citigroup, JPMorgan Chase and Wells Fargo – are making considerable investments into all of the above-mentioned areas of big data analytics.

Industries other than retail, healthcare and banking are also seeing tangible value in big data analytics. Governments are using it to provide proactive monitoring and responses to catastrophic events. Product and design companies are leveraging big data analytics for everything from advertising attribution to crowdsourcing of new product innovation. Manufacturers are preventing downtime by studying interactions within systems and predicting machine failures before they occur. Airlines are recalibrating their flight routing systems in real time to avoid bad weather. From hospitality to telecommunications to entertainment and gaming, companies are publicizing their big data-related success stories.

Our research shows that until now, big data analytics has primarily been the domain of larger, digitally advanced enterprises. However, as use cases make their way through business and their tangible value is accepted, I anticipate that the activity around big data analytics will increase with companies that reside in the small and midsize business market. At this point, just about any company that is not considering how big data analytics may impact its business faces an unknown and uneasy future. What a difference a year makes, indeed.

Regards,

Tony Cosentino

VP and Research Director

I had a refreshing call this morning with a vendor that did not revolve around integration of systems, types of data, and the intricacies of NoSQL approaches. Instead, the discussion was about how its business users analyze an important and complex problem and how the company’s software enables that analysis. The topic of big data never came up, and it was not needed, because the conversation was business-driven and issue-specific.

By contrast, we get a lot of briefings that start with big data’s impact on business, but devolve into details about how data is accessed and the technology architecture. Data access and integration are important, but when we talk about big data analytics, focusing on the business issues is even more critical. Our benchmark research into big data shows that companies employ storage (95%) and reporting (94%) of big data, but very few use it for data mining (55%) and what-if scenario modeling (49%). That must change. Descriptive analysis on big data is quickly turning into table stakes; the real competitive value of big data analytics is in the latter two categories.

Not every big data vendor drowns its message in technospeak. IBM, for instance, stokes the imagination with analytical systems such as Watson and does a good job of bringing its business-focused story to a diverse audience through its Global Business Services arm. Some newer players paint compelling pictures as well. Companies such as PivotLink, PlanView and SuccessFactors (now part of SAP) deliver analytics stories from different organizational perspectives. Part of their advantage is that they start from a cloud and application perspective, but they also tell the analytics story in context of business, not in context of technology.

Providing that business perspective is a more difficult task for BI companies that have been pitching their software to IT departments for years, but even some of these have managed to buck this trend. Alteryx, for instance, differentiates itself by putting forward compelling industry-specific use cases, and espousing the concept of the data artisan. This right-brain/left-brain approach appeals to both the technical and business sides of the house. Datameer also does a good job of producing solid business use cases. Its recent advancements in visualization help the company paint the analytical picture from a business perspective. Unfortunately, other examples seem few and far between. Most companies are still caught pitching technology-centric solutions, despite the fact that, in the new world of analytics, it’s about business solutions, not features on a specification sheet.

This focus on business issues over technology is important because the business side of the house today controls more and more of the technology spending. While business managers understand business and often have a firm grasp of analytics, they don’t always understand or care about the intricacies of different processing techniques and data models. In our upcoming benchmark research on next-generation BI systems, the data from which I’m currently analyzing, we see this power shift clearly. While IT still has veto power, decisions are being driven by business users and being ratified at the top of the organization.

The Ventana Research Maturity Model from our business analytics benchmark research shows that the analytics category is still immature, with only 15 percent of companies reaching the innovative level. So how do we begin to change this dialog from a technology-driven discussion to a business-driven discussion? From the client perspective, it starts with a blue sky approach, since the technological limitations that drove the old world of analytics no longer exist. This blank canvas may be framed by metrics such as revenue, profit and share of wallet, but the frame is now extending itself into less tangible and forward-looking areas such as customer churn and brand equity. If these output metrics are the frame, it’s the people, process, information and tools that are our brushes with which we paint. The focal point of the piece is always the customer.

If a business has a hard time thinking in terms of a blank canvas, it can examine a number of existing cases that show the value of utilizing big data analytics to help illuminate customer behavior, web usage, security, location, fraud, regulation and compliance. Some of the bigger ones are briefly discussed in my recent blog entry on predictive analytics.

The big data industry, if we can call it that, is quickly moving from a focus on the technology stack to a focus on tangible business outcomes and time-to-value (TTV). The innovations of the last few years have enabled companies to take a blue sky perspective and do things that they have never thought possible. The key is to start with the business problem you are looking to solve; the technology will work itself out from there.

Regards,

Tony Cosentino

VP and Research Director

As volumes of data grow in organizations, so do the number of deployments of Hadoop, and as Hadoop becomes widespread, more organizations demand data analysis, ease of use and visualization of large data sets. In our benchmark research on Hadoop, 88 percent of organizations said analyzing Hadoop data is important, and in our research on business analytics 89 percent said it is important to make it simpler to provide analytics and metrics to all users who need them. As my colleague Mark Smith has noted, Datameer has an ambitious plan to tackle these issues. It aims to provide a single solution in lieu of the common three-step process involving data integration, data warehouse and BI, giving analysts the ability to apply analytics and visualization to find the dynamic “why” behind data rather than just the static “what.”

The Datameer approach places Hadoop at the center of the computing environment rather than looking at it as simply another data source. This, according to company officers, allows Datameer to analyze large, diverse data sets in ways that traditional approaches cannot, which in turn enables end users to answer questions that may have fallen outside the purview of the standard information architecture. However, Datameer does not offer its software as a replacement for traditional systems but as a complement to them. The company positions its product to analyze interaction data and data relationships, supplementing transactional data analysis; both are key types of big data that need analysis. Of course, given that most companies are not likely to rip and replace years of system investment and user loyalty, this coexistence strategy is a pragmatic one.

Datameer approaches analytics via a spreadsheet environment. This, too, is pragmatic because, as our business analytics benchmark research shows, spreadsheets are the number-one tool used to generate analytics (by 60% of organizations). Datameer provides descriptive analysis and an interactive dialog box for nested joins of large data sets, but the tool moves beyond traditional analysis with its ability to provide analytics for unstructured data. Path and pattern analyses enable discovery of patterns in massive data sets. Relational statistics, including different cluster techniques, allow for data reduction and latent variable groupings. Data parsing technology is a big part of unstructured data analysis, and Datameer provides prebuilt algorithms for social media text analytics and blogs, among other sources. In all, more than 200 prebuilt algorithms come standard in the Datameer tool set. In addition, users can access spreadsheet macros, open APIs to integrate functions and use the Predictive Model Markup Language (PMML) for model exchange.

In its latest release, version 2.0, Datameer has advanced its business infographics tool with a visualization layer that enables exploratory data analysis (EDA) through a standard library of widgets, including graphs, charts, diagrams, maps and word clouds. Visualization is one of the key areas lacking in big data deployments today. Analysts work in a free-form layout environment with an easy-to-use drag-and-drop paradigm. Datameer’s WYSIWYG editor provides real-time management of the creation and layout of infographics, allowing analysts to see exactly what the end design will look like as they create it. The product also now distributes through HTML5, which allows cross-platform delivery to multiple environments. This is particularly important as Datameer is targeting the enterprise environment, and HTML5 provides a low-maintenance “build once, deploy anywhere” model for mobile platforms.

Datameer is an innovative company, but its charter is a big one, given that it is in a competitive environment at multiple levels of the value delivery chain. Its ability to seamlessly integrate analytics and visualization tools on the Hadoop platform is a unique value proposition; at the same time, it will likely need to put more effort into visualization to match what is available from other data discovery players. All in all, for enterprises looking to take advantage of large-scale data in the near term that don’t want to wait for other vendors to provide integrated tools on top of Hadoop, Datameer is a company to consider.

Regards,

Tony Cosentino – VP & Research Director
