Ventana Research recently completed the most comprehensive evaluation of analytics and business intelligence products and vendors available anywhere. As I discussed recently, such research is necessary and timely as analytics and business intelligence is now a fast-changing market. Our Value Index for Analytics and Business Intelligence in 2015 scrutinizes 15 top vendors and their product offerings in seven key categories: Usability, Manageability, Reliability, Capability, Adaptability, Vendor Validation and TCO/ROI. The analysis shows that the top supplier is Information Builders, which qualifies as a Hot vendor and is followed by 10 other Hot vendors: SAP, IBM, MicroStrategy, Oracle, SAS, Qlik, Actuate (now part of OpenText) and Pentaho.

The evaluations drew on our research and analysis of vendors and products along with their responses to our detailed RFI or questionnaire, our own hands-on experience and the buyer-related findings from our benchmark research on next-generation business intelligence, information optimization and big data analytics. The benchmark research examines analytics and business intelligence from various perspectives to determine organizations’ current and planned use of these technologies and the capabilities they require for successful deployments.

We find that the processes that comprise business intelligence today have expanded beyond standard query, reporting, analysis and publishing capabilities. They now include sourcing and integration of data and, at later stages, the use of analytics for planning and forecasting and of capabilities that apply analytics and metrics to collaborative interaction and performance management. Our research on big data analytics finds that new technologies collectively known as big data are influencing the evolution of business intelligence as well; here in-memory systems (used by 50% of participating organizations), Hadoop (42%) and data warehouse appliances (33%) are the most important innovations. In-memory computing in particular has changed BI because it enables rapid processing of even complex models with very large data sets. In-memory computing also can change how users access data through data visualization and incorporate data mining, simulation and predictive analytics into business intelligence systems. Thus the ability of products to work with big data tools figured in our assessments.

In addition, the 2015 Value Index includes assessments of vendors’ self-service tools and cloud deployment options. New self-service approaches can enable business users to reduce their reliance on IT to access and use data and analysis. However, our information optimization research shows that this change is slow to proliferate. In four out of five organizations, IT currently is involved in making information available to end users and remains entrenched in the operations of business intelligence systems.

Similarly, our research, as well as the lack of maturity of the cloud-based products evaluated, shows that organizations are still in the early stages of cloud adoption for analytics and business intelligence; deployments are mostly departmental in scope. We are exploring these issues further in our benchmark research into data and analytics in the cloud, which will be released in the second quarter of 2015.

The products offered by the five top-rated companies in the Value Index provide exceptional functionality and a superior user experience. However, Information Builders stands out, providing an exceptional user experience and a completely integrated portfolio of data management, predictive analytics, visual discovery and operational intelligence capabilities in a single platform. SAP, in second place, is not far behind, having made significant progress by integrating its Lumira platform into its BusinessObjects Suite; it added predictive analytics capabilities, which led to higher Usability and Capability scores. IBM, MicroStrategy and Oracle, the next three, each provide a robust integrated platform of capabilities. The key differentiator between them and the top two is that they do not have superior scores in all seven categories.

In evaluating products for this Value Index we found some noteworthy innovations in business intelligence. One is Qlik Sense, which has a modern architecture that is cloud-ready and supports responsive design on mobile devices. Another is SAS Visual Analytics, which combines predictive analytics with visual discovery in ways that are a step ahead of others currently in the market. Pentaho’s Automated Data Refinery concept adds its unique Pentaho Data Integration platform to business intelligence for a flexible, well-managed user experience. IBM Watson Analytics uses advanced analytics and natural language processing for an interactive experience beyond the traditional paradigm of business intelligence. Tableau, which led the field in the category of Usability, continues to innovate in the area of user experience and aligning technology with people and process. MicroStrategy’s innovative Usher technology addresses the need for identity management and security, especially in an evolving era in which individuals utilize multiple devices to access information.

The Value Index analysis uncovered notable differences in how well products satisfy the business intelligence needs of employees working in a range of IT and business roles. Our analysis also found substantial variation in how products provide development, security and collaboration capabilities and role-based support for users. Thus, we caution that similar vendor scores should not be taken to imply that the packages evaluated are functionally identical or equally well suited for use by every organization or for a specific process.

To learn more about this research and to download a free executive summary, please visit.

Regards,

Ventana Research

Just a few years ago, the prevailing view in the software industry was that the category of business intelligence (BI) was mature and without room for innovation. Vendors competed in terms of feature parity and incremental advancements of their platforms. But since then business intelligence has grown to include analytics, data discovery tools and big data capabilities to process huge volumes and new types of data much faster. As is often the case with change, though, this one has created uncertainty. For example, only one in 11 participants in our benchmark research on big data analytics said that their organization fully agrees on the meaning of the term “big data analytics.”

There is little question that clear definitions of analytics and business intelligence as they are used in business today would be of value. But some IT analyst firms have tried to oversimplify the process of updating these definitions by merely combining a market basket of discovery capabilities under the label of analytics. In our estimation, this attempt is neither accurate nor useful. Discovery tools are only components of business intelligence, and their capabilities cannot accomplish all the tasks comprehensive BI systems can do. Some firms seem to want to reduce the field further by overemphasizing the visualization aspect of discovery. While visual discovery can help users solve basic business problems, other BI and analytic tools are available that can attack more sophisticated and technically challenging problems. In our view, visual discovery is one of four types of analytic discovery that can help organizations identify and understand the masses of data they accumulate today. But for many organizations visualization alone cannot provide them with the insights necessary to help make critical decisions, as interpreting the analysis requires expertise that mainstream business professionals lack.

In Ventana Research’s view, business intelligence is a technology managed by IT that is designed to produce information and reports from business data to inform business about the performance of activities, people and processes. It has provided and will continue to provide great value to business, but in itself basic BI will not meet the new generation of requirements that businesses face; they need not just information but guidance on how to take advantage of opportunities, address issues and mitigate the risks of subpar performance. Analytics is a component of BI that is applied to data to generate information, including metrics. It is a technology-based set of methodologies used by analysts as well as the information gained through the use of tools designed to help those professionals. These thoughtfully crafted definitions inform the evaluation criteria we apply in our new and comprehensive 2015 Analytics and Business Intelligence Value Index, which we will publish soon. As with all business tools, applications and systems we assess in this series of indexes, we evaluate the value of analytic and business intelligence tools in terms of five functional categories – usability, manageability, reliability, capability and adaptability – and two customer assurance categories – validation of the vendor and total cost of ownership and return on investment (TCO/ROI). We feature our findings in these seven areas of assessment in our Value Index research and reports. In the Analytics and Business Intelligence Value Index for 2015 we assess in depth the products of 15 of the leading vendors in today’s BI market.

The Capabilities category examines the breadth of functionality that products offer and assesses their ability to deliver the insights today’s enterprises need. For our analysis we divide this category into three subcategories for business intelligence: data, analytics and optimization. We explain each of them below.

The data subcategory of Capabilities examines data access and preparation along with supporting integration and modeling. New data sources are coming into being continually; for example, data now is generated in sensors in watches, smartphones, cars, airplanes, homes, utilities and an assortment of business, network, medical and military equipment. In addition, organizations increasingly are interested in behavioral and attitudinal data collected through various communication platforms. Examples include Web browser behavior, data mined from the Internet, social media and various survey and community polling data. The data access and integration process identifies each type of data, integrates it with all other relevant types, checks it all for quality issues, maps it back to the organization’s systems of record and master data, and manages its lineage. Master data management in particular, including newer approaches such as probabilistic matching, is a key component for creating a system that can combine data types across the organization and in the cloud to create a common organizational vernacular for the use of data.
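
To make the probabilistic matching idea concrete, here is a minimal Python sketch of weighted fuzzy matching between a candidate record and a master record; it is illustrative only, not any vendor’s implementation, and the field names, weights and threshold are assumptions.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score for two strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_score(candidate: dict, master: dict, weights: dict) -> float:
    """Weighted probabilistic match score across several fields."""
    total = sum(weights.values())
    score = sum(weights[f] * similarity(candidate.get(f, ""), master.get(f, ""))
                for f in weights)
    return score / total

# Illustrative records and weights (assumed field names).
master = {"name": "Acme Corporation", "city": "San Jose", "phone": "408-555-0100"}
candidate = {"name": "ACME Corp.", "city": "San Jose", "phone": "4085550100"}
weights = {"name": 0.6, "city": 0.2, "phone": 0.2}

score = match_score(candidate, master, weights)
print(f"match score = {score:.2f}")
if score >= 0.75:  # threshold chosen for illustration only
    print("treat as the same master-data entity")
```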

Ascertaining which systems must be accessed and how is a primary challenge for today’s business intelligence platforms. A key part of data access is the user interface. Whether it appears in a Web browser or on a laptop, a smartphone, a tablet or a wearable device, data must be presented in a manner optimized for the interface. Examining the user interface for business intelligence systems was a primary interest of our 2014 Mobile Business Intelligence Value Index. In that research, we learned that vendors are following divergent paths and that it may be hard for some to change course as they continue. Therefore how a vendor manages mobile access and other new access methods affects its products’ value for particular organizations.

Once data is accessed, it must be modeled in a useful way. Data models in the form of OLAP cubes and predefined relationships of data sometimes grow overly complex, but there is value in premodeling data in ways that make sense to business people, most of whom are not up to modeling it for themselves. Defining data relationships and transforming data through complex manipulations is often needed, for instance, to define performance indicators that align with an organization’s business initiatives. These manipulations can include business rules or what-if analysis within the context of a model or external to it. Finally, models must be flexible so they do not hinder the work of organizational users. The value of premodeling data is that it provides a common view for business users so they need not redefine data relationships that have already been thoroughly considered.

The analytics subcategory includes analytic discovery, prediction and integration. Discovery and prediction roughly map to the ideas of exploratory and confirmatory analytics, which I have discussed. Analytic discovery includes calculation and visualization processes that enable users to move quickly and easily through data to create the types of information they need for business purposes. Complementing it is prediction, which typically follows discovery. Discovery facilitates root-cause and historical analysis, but to look ahead and make decisions that produce desired business outcomes, organizations need to track various metrics and make informed predictions. Analytic integration encompasses customization of both discovery and predictive analytics and embedding them in other systems such as applications and portals.

The optimization subcategory includes collaboration, organizational management, information optimization, action and automation. Collaboration is a key consideration for today’s analytic platforms. It includes the ability to publish, share and coordinate various analytic and business intelligence functions. Notably, some recently developed collaboration platforms incorporate many of the characteristics of social platforms such as Facebook or LinkedIn. Organizational management attempts to manage to particular outcomes and sometimes provides performance indicators and scorecard frameworks. Action assesses how technology directly assists decision-making in an operational context. This includes gathering inputs and outputs for collaboration before and after a decision, predictive scoring that prescribes action and delivery of the information in the correct form to the decision-maker. Finally, automation triggers alerts based on statistical thresholds or rules and should be managed as part of a workflow. Agent technology takes automation to a level that is more proactive and autonomous.
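
As a simple illustration of the kind of statistical trigger described above (not a reference to any specific product), the following Python sketch flags a metric that drifts more than three standard deviations from its recent history; the metric and threshold are assumptions.

```python
import statistics

def should_alert(history, latest, z_threshold=3.0):
    """Trigger an alert when the latest value is a statistical outlier."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return False
    z = abs(latest - mean) / stdev
    return z > z_threshold

# Illustrative daily order counts followed by a suspicious spike.
history = [102, 98, 105, 99, 101, 97, 103]
latest = 160
if should_alert(history, latest):
    print("alert: metric outside expected range; route to workflow")
```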

This broad framework of data, analytics and optimization fits with a process orientation to business analytics that I have discussed. Our benchmark research on information optimization indicates that the people and process dimensions of performance are less well developed than the information and technology aspects, and thus a focus on these aspects of business intelligence and analytics will be beneficial.

In our view, it’s important to consider business intelligence software in a broad business context rather than in artificially separate categories that are designed for IT only. We advise organizations seeking to gain a competitive edge to adopt a multifaceted strategy that is business-driven, incorporates a complete view of BI and analytics, and uses the comprehensive evaluation criteria we apply.

Regards,

Ventana Research

In many organizations, advanced analytics groups and IT are separate, and there often is a chasm of understanding between them, as I have noted. A key finding in our benchmark research on big data analytics is that communication and knowledge sharing is a top benefit of big data analytics initiatives, but often it is a latent benefit. That is, prior to deployment, communication and knowledge sharing is deemed a marginal benefit, but once the program is deployed it is deemed a top benefit. From a tactical viewpoint, organizations may not spend enough time defining a common vocabulary for big data analytics prior to starting the program; our research shows that fewer than half of organizations have agreement on the definition of big data analytics. It makes sense therefore that, along with a technical infrastructure and management processes, explicit communication processes at the beginning of a big data analytics program can increase the chance of success. We found these qualities in the Chorus platform of Alpine Data Labs, which received the Ventana Research Technology Innovation Award for Predictive Analytics in September 2014.

Alpine Chorus 5.0, the company’s flagship product, addresses the big data analytics communication challenge by providing a user-friendly platform for multiple roles in an organization to build and collaborate on analytic projects. Chorus helps organizations manage the analytic life cycle from discovery and data preparation through model development and model deployment. It brings together analytics professionals via activity streams for rapid collaboration and workspaces that encourage projects to be managed in a uniform manner. While activity streams enable group communication via short messages and file sharing, workspaces allow each analytic project to be managed separately with capabilities for project summary, tracking and data source mapping. These functions are particularly valuable as organizations embark on multiple analytic initiatives and need to track and share information about models as well as the multitude of data sources feeding the models.

The Alpine platform addresses the challenge of processing big data by parallelizing algorithms to run across big data platforms such as Hadoop and making them accessible to a wide audience of users. The platform supports most analytic databases and all major Hadoop distributions. Alpine was an early adopter of Apache Spark, an open source in-memory data processing framework that one day may replace the original map-reduce processing paradigm of Hadoop. Alpine Data Labs has been certified by Databricks, the primary contributor to the Spark project, which is responsible for 75 percent of the code added in the past year. With Spark, Alpine’s analytic models such as logistic regression run in a fraction of the time previously possible, and Spark enables new approaches such as one the company calls Sequoia Forest, a machine learning technique that is a more robust version of random forest analysis. Our big data analytics research shows that predictive analytics is a top priority for about two-thirds (64%) of organizations, but they often lack the skills to deploy a fully customized approach. This is likely a reason that companies now are looking for more packaged approaches to implementing big data analytics (44%) than custom approaches (36%), according to our research. Alpine taps into this trend by delivering advanced analytics directly in Hadoop and the HDFS file system with its in-cluster analytic capabilities that address the complex parallel processing tasks needed to run in distributed environments such as Hadoop.
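
To illustrate the general pattern of in-cluster model training on Spark that this discussion refers to, here is a generic PySpark sketch, not Alpine’s code; the HDFS path and column names are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("churn-logit").getOrCreate()

# Hypothetical HDFS path; assume columns "tenure", "monthly_spend" and "label".
df = spark.read.csv("hdfs:///data/churn.csv", header=True, inferSchema=True)

assembler = VectorAssembler(inputCols=["tenure", "monthly_spend"],
                            outputCol="features")
train = assembler.transform(df).select("features", "label")

# Fit the model; Spark parallelizes the work across the cluster.
model = LogisticRegression(labelCol="label", featuresCol="features").fit(train)
print(model.coefficients, model.intercept)

spark.stop()
```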

A key differentiator for Alpine is usability. Its graphical user interface provides a visual analytic workflow experience built on popular algorithms to deliver transformation capabilities and predictive analytics on big data. The platform supports scripts in the R language, which can be cut and pasted into the workflow development studio; custom operators for more advanced users; and Predictive Model Markup Language (PMML), which enables extensible model sharing and scoring across different systems. The complexities of the underlying data stores and databases as well as the orchestration of the analytic workflow are abstracted from the user. Using it, an analyst or statistician does not need to know programming languages or the intricacies of the database technology to build analytic models and workflows.

It will be interesting to see what direction Alpine will take as the big data industry continues to evolve; currently there are many point tools, each strong in a specific area of the analytic process. For many of the analytic tools currently available in the market, co-opetition among vendors prevails, in which partner ecosystems compete with stack-oriented approaches. The decisions vendors make in terms of partnering as well as research and development are often a function of these market dynamics, and buyers should be keenly aware of who aligns with whom. For example, Alpine currently partners with Qlik and Tableau for data visualization but also offers its own data visualization tool. Similarly, it offers data transformation capabilities, but its toolbox could be complemented by data preparation and master data solutions. This emerging area of self-service data preparation is important to line-of-business analysts, as my colleague Mark Smith recently discussed.

Alpine Labs is one of many companies that have been gaining traction in the booming analytics market. With a cadre of large clients and venture capital backing of US$23 million in series A and B, Alpine competes in an increasingly crowded and diverse big data analytics market. The management team includes industry veterans Joe Otto and Steve Hillion. Alpine seems to be particularly well suited for customers that have a clear understanding of the challenges of advanced analytics and are committed to using it with big data to gain a competitive advantage. Competitive advantage is the benefit cited most often, by over two-thirds (68%) of organizations, in our predictive analytics benchmark research. A key differentiator for Alpine Labs is the collaboration platform, which helps companies clear the communication hurdle discussed above and address the advanced analytics skills gap at the same time. The collaboration assets embedded into the application and the usability of the visual workflow process enable the product to meet a host of needs in predictive analytics. This platform approach to analytics is often missing in organizations grounded in individual processes and spreadsheet approaches. Companies seeking to use big data with advanced analytics tools should include Alpine Labs in their consideration.

Regards,

Ventana Research

Organizations should consider multiple aspects of deploying big data analytics. These include the type of analytics to be deployed, how the analytics will be deployed technologically and who must be involved both internally and externally to enable success. Our recent big data analytics benchmark research assesses each of these areas. How an organization views these deployment considerations may depend on the expected benefits of the big data analytics program and the particular business case to be made, which I discussed recently.

According to the research, the most important capability of big data analytics is predictive analytics (64%), but among companies that have deployed big data analytics, descriptive analytic approaches of query and reporting (74%) and data discovery (64%) are more readily available than predictive capabilities (57%). Such statistics may be a function of big data technologies such as Hadoop and their associated distributions having prioritized the ability to run descriptive statistics through standard SQL, which is the most common method for implementing analysis on Hadoop. Cloudera’s Impala, Hortonworks’ Stinger (an extension of Apache Hive), MapR’s Drill, IBM’s Big SQL, Pivotal’s HAWQ and Facebook’s open-source contribution of Presto SQL all focus on accessing data through an SQL paradigm. It is not surprising then that the technology research participants use most for big data analytics is business intelligence (75%) and that the most-used analytic methods — pivot tables (46%), classification (39%) and clustering (37%) — are descriptive and exploratory in nature. Similarly, participants said that visualization of big data allows analysts to perform faster analysis (49%), understand context better (48%), perform root-cause analysis (40%) and display multiple result sets (40%), but visualization does not provide more advanced analytic capabilities. While various vendors now offer approaches to run advanced analytics on big data, the research shows that in terms of big data, organizational capabilities still revolve around more basic analytic access.
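
As a small illustration of this descriptive, SQL-style access pattern, the sketch below runs an aggregate query against Hive, one of the SQL-on-Hadoop engines named above, using the PyHive client; the host, database, table and column names are assumptions.

```python
from pyhive import hive  # assumes a reachable HiveServer2 endpoint

# Hypothetical connection details and table.
conn = hive.connect(host="hadoop-edge-node", port=10000, database="sales")
cursor = conn.cursor()

# A typical descriptive query: order counts and revenue by region.
cursor.execute("""
    SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY region
    ORDER BY revenue DESC
""")

for region, orders, revenue in cursor.fetchall():
    print(region, orders, revenue)

cursor.close()
conn.close()
```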

For companies that are implementing advanced analytic capabilities on big data, there are further analytic process considerations, and many have not yet tackled those. Model building and model deployment should be manageable and timely, involve specialized personnel, and integrate into the broader enterprise architecture. While our research provides an in-depth look at adoption of the different types of in-database analytics, deployment of advanced analytic sandboxes, data mining, model management, integration with business processes and overall model deployment, that is beyond the topic here.

Beyond analytic considerations, a host of technological decisions must be made around big data analytics initiatives. One of these is the degree of customization necessary. As technology advances, customization is giving way to more packaged approaches to big data analytics. According to our research, the majority (54%) of companies that have already implemented big data analytics did custom builds using big data-specific languages and interfaces. Those that have not yet deployed are more likely to purchase a dedicated or packaged application (44%) than to do a custom build (36%). We think that this pre- and post-deployment comparison reflects a maturing market.

The move from custom approaches to standardized ones has important implications for the skill sets needed for a big data analytics initiative. In comparing the skills that organizations said they currently have to the skills they need to be successful with big data analytics, it is clear that companies should spend more time building employees’ statistical, mathematical and visualization skills. On the flip side, organizations should make sure their tools can support skill sets that they already have, such as use of spreadsheets and SQL. This is convergent with other findings about training needs, which include applying analytics to business problems (54%), training on big data analytics tools (53%), analytic concepts and techniques (46%) and visualizing big data (41%). The data shows that as approaches become more standardized and the market focus shifts toward them from customized implementations, skill needs are shifting as well. This is not to say that demand is moving away from the data scientist completely. According to our research, organizations that involve cross-functional teams or data scientists in the deployment process are realizing the most significant impact. It is clear that multiple approaches for personnel, departments and current vendors play a role in deployments and that some approaches will be more effective than others.

Cloud computing is another key consideration with respect to deploying analytics systems as well as sandbox modeling and testing environments. For deployment of big data analytics, 27 percent of companies currently use a cloud-based method, while 58 percent said they do not and 16 percent do not know what is used. Not surprisingly, far fewer IT professionals (19%) than business users (40%) said they use cloud-based deployments for big data analytics. The flexibility and capability that cloud resources provide is particularly attractive for sandbox environments and for organizations that lack big data analytic expertise. However, for big data model building, the largest group of organizations (42%) still utilizes a dedicated internal sandbox environment to build models, while fewer (19%) use a non-dedicated internal sandbox (that is, a container in a data warehouse used to build models) and others use a cloud-based sandbox either as a completely separate physical environment (9%) or as a hybrid approach (9%). From this last data we infer that business users are sometimes using cloud-based systems to do big data analytics without the knowledge of IT staff. Among organizations that are not using cloud-based systems for big data analytics, security (45%) is the primary reason that they do not.

Perhaps the most important consideration for big data analytics is choosing vendors to partner with to achieve organizational objectives. When we understand the move from custom technological approaches to more packaged ones and the types of analytics currently being implemented for big data, it is not surprising that a majority of research participants (52%) are looking to their business intelligence systems providers to supply their big data analytics solution. However, a significant number of companies (35%) said they will turn to a specialist analytics provider or their database provider (34%). When evaluating big data analytics, usability is the most important vendor consideration but not by as wide a margin as in categories such as business intelligence. A look at criteria rated important and very important by research participants reveals usability is the highest ranked (94%), but functionality (92%) and reliability (90%) follow closely. Among innovative new technologies, collaboration is important (78%) while mobile access (46%) is much less so. Coupled with the finding that communication and knowledge sharing combined is an important benefit of big data analytics, it is clear that organizations are cognizant of the collaborative imperative when choosing a big data analytics product.

Deployment of big data analytics starts with forethought and a well-defined business case that includes the expected benefits I discussed in my previous analysis. Once the outcome-driven framework is established, organizations should consider the types of analytics needed, the enabling technologies and the people and processes necessary for implementation. To learn more about our big data analytics research, download a copy of the executive summary here.

Regards,

Tony Cosentino

VP & Research Director

SAS Institute, a long-established provider of analytics software, showed off its latest technology innovations and product road maps at its recent analyst conference. In a very competitive market, SAS is not standing still, and executives showed progress on the goals introduced at last year’s conference, which I covered. SAS’s Visual Analytics software, integrated with an in-memory analytics engine called LASR, remains the company’s flagship product in its modernized portfolio. CEO Jim Goodnight demonstrated Visual Analytics’ sophisticated integration with statistical capabilities, which is something the company sees as a differentiator going forward. The product already provides automated charting capabilities, forecasting and scenario analysis, and SAS probably has been doing user-experience testing, since the visual interactivity is better than what I saw last year. SAS has put Visual Analytics on a six-month release cadence, which is a fast pace but necessary to keep up with the industry.

Visual discovery alone is becoming an ante in the analytics market, since just about every vendor has some sort of discovery product in its portfolio. For SAS to gain on its competitors, it must make advanced analytic capabilities part of the product. In this regard, Dr. Goodnight demonstrated the software’s visual statistics capabilities, which can switch quickly from visual discovery into regression analysis, running multiple models simultaneously and then optimizing the best model. The statistical product is scheduled for availability in the second half of this year. With the ability to automatically create multiple models and output summary statistics and model parameters, users can create and optimize models in a more timely fashion, so the information can become actionable sooner. In our research on predictive analytics, the largest percentage of participants (68%) cited competitive advantage as a benefit of predictive analytics, and our research also shows that companies able to update their models daily or more often are very satisfied with their predictive analytics tools more often than others are. The ability to create models in an agile and timely manner is valuable for various uses in a range of industries.
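
The idea of automatically fitting several candidate models and keeping the best one can be sketched generically, here with scikit-learn rather than SAS; the synthetic data and the list of candidate models are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.7]) + rng.normal(scale=0.5, size=200)

candidates = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}

# Score each candidate with 5-fold cross-validated R^2 and keep the best.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)
print(best_name, round(scores[best_name], 3))
```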

There are three ways that SAS enables high-performance computing. The first is the more traditional grid approach, which distributes processing across multiple nodes. The second is the in-database approach, which allows SAS to run as a process inside the database. The third is extracting data and running it in-memory. The system has the flexibility to run on different large-scale database types such as MPP databases as well as Hadoop infrastructure through Pig and Hive. This is important because for 64 percent of organizations, the ability to run predictive analytics on big data is a priority, according to our recently released research on big data analytics. SAS can run via MapReduce or directly access the underlying Hadoop Distributed File System and pull the data into LASR, the SAS in-memory system. SAS works with almost all commercial Hadoop implementations, including Cloudera, Hortonworks, EMC’s Pivotal and IBM’s InfoSphere BigInsights. The ability to put analytical processes into the MapReduce paradigm is compelling as it enables predictive analytics on big data sets in Hadoop, though the immaturity of initiatives such as YARN may relegate the jobs to batch processing for the time being. The flexibility of LASR and the associated portfolio can help organizations overcome the challenge of architectural integration, which is the most widespread technological barrier to predictive analytics (for 55% of participants in that research). Of note is that the SAS approach provides a purely analytical engine; since there is no SQL involved in the algorithms, SQL-related overhead is nonexistent and the engine runs directly on the supporting system’s resources.

As well as innovating with Visual Analytics and Hadoop, SAS has a clear direction in its road map, intending to integrate the data integration and data quality aspects of the portfolio in a single workflow with the Visual Analytics product. Indeed, data preparation is still a key sticking point for organizations. According to our benchmark research on information optimization, time spent in analytic tasks is still consumed most by data preparation (for 47%) and data quality and consistency (45%). The most valuable task, interpretation of the data, ranks fourth at 33 percent of analytics time. This is a big area of opportunity in the market, as reflected by the flurry of funding for data preparation software companies in the fourth quarter of 2013. For further analysis of SAS’s data management and big data efforts, please read my colleague Mark Smith’s analysis.

Established relationships with companies like Teradata and a reinvigorated relationship with SAP position SAS to remain at the heart of enterprise analytic architectures. In particular, the co-development effort that allows the SAS predictive analytics workbench to run on top of SAP HANA is promising, which raises the question of how aggressive SAP will be in advancing its own advanced analytic capabilities on HANA. One area where SAS could learn from SAP is in its developer ecosystem. While SAP has thousands of developers building applications for HANA, SAS could do a better job of providing the tools developers need to extend the SAS platform. SAS has been able to prosper with a walled-garden approach, but the breadth and depth of innovation across the technology and analytics industry puts this type of strategy under pressure.

Overall, SAS impressed me with what it has accomplished in the past year and the direction it is heading in. The broad-based development efforts raise a final question of where the company should focus its resources. Based on its progress in the past year, it seems that a lot has gone into visual analytics, visual statistics, LASR and alignment with the Hadoop ecosystem. In 2014, the company will continue horizontal development, but there is a renewed focus on specific analytic solutions as well. At a minimum, the company has good momentum in retail, fraud and risk management, and manufacturing. I’m encouraged by this industry-centric direction because I think that the industry needs to move away from the technology-oriented V’s toward the business-oriented W’s.

For customers already using SAS, the company’s road map is designed to capture market advantage with minimal disruption to existing environments. In particular, focusing on solutions as well as technological depth and breadth is a viable strategy. While it still may make sense for customers to look around at the innovation occurring in analytics, moving to a new system will often incur high switching costs in productivity as well as money. For companies just starting out with visual discovery or predictive analytics, SAS Visual Analytics provides a good point of entry, and SAS has a vision for more advanced analytics down the road.

Regards,

Tony Cosentino

VP and Research Director

While covering providers of business analytics software, it is also interesting for me to look at some that focus on the people, process and implementation aspects in big data and analytics. One such company is Nuevora, which uses a flexible platform to provide customized analytic solutions. I recently met the company’s founder, Phani Nagarjuna, when we appeared on a panel at the Predictive Analytics World conference in San Diego.

Nuevora focuses on big data and analytics from the perspective of the analytic life cycle; that is, it helps companies bring together data and process, visualize and model the data to reach specific business outcomes. Nuevora aims to package implementations of analytics for vertical industries by putting together data sources and analytical techniques, and designing the package to be consumed by a target user group. While the core of the analytic service may be the same within an industry category, each solution is customized to the particulars of the client and its view of the market. Using particular information sources and models depending on their industry, customers can take advantage of advances in big data and analytics including new data sources and technologies. For its part Nuevora does not have to reinvent the wheel for each engagement. It has established patterns of data processing and prebuilt predictive analytics apps that are based on best practices and designed to solve specific problems within industry segments.

The service is currently delivered via a managed service on Nuevora servers called the Big Data Analytics & Apps Platform (nBAAP), but the company’s roadmap calls for more of a software as a service (SaaS) delivery model. Currently nBAAP uses Hadoop for data processing, R for predictive analytics and Tableau for visualizations. This approach brings together best-of-breed point solutions to address specific business issues. As a managed service, it has flexibility in design, and the company can reuse existing SAS and SPSS code for predictive models and can integrate with different BI tools depending on the customer’s environment.

Complementing the nBAAP approach is the Big Data & Analytics Maturity (nBAM) Assessment Framework. This is an industry-based consulting framework that guides companies through their analytic planning process by looking at organizational goals and objectives, establishing a baseline of the current environment, and putting forward a plan that aligns with the analytical frameworks and industry-centric approaches in nBAAP.

From an operating perspective, Nagarjuna, a native of India, taps analytics talent from universities there and places strategic solution consultants in client-facing roles in the United States. The company focuses primarily on big data analytics in marketing, which makes sense since, according to our benchmark research on predictive analytics, revenue-generating functions such as forecasting (cited by 72% of organizations) and marketing (67%) are the two primary use cases for predictive analytics. Nuevora has mapped multiple business processes related to goals such as gaining a 360-degree view of the customer. For example, at a high level, it divides marketing into areas such as retention, cross-sell and up-sell, profitability and customer lifetime value. These provide building blocks for the overall strategy of the organization, and each can be broken down into finer divisions, linkages and algorithms based on the industry. These building blocks also serve as the foundation for the deployment patterns of raw data and preselected data variables, metrics, models, visuals, model update guidelines and expected outcomes.

By providing preprocessing capabilities that automatically produce the analytic data set, then providing updated and optimized models, and finally enabling consumption of these models through the relevant user paradigm, Nuevora addresses some of the key challenges in analytics today. The first is data preparation, which our research shows takes from 40 to 60 percent of analysts’ time. The second is addressing outdated models. Our research on predictive analytics shows that companies that update their models often are much more satisfied with them than are those that do not. While the appropriate timing of model updates is relative to the business context and market changes, our research shows that about one month is optimal.

Midsize or larger companies looking to take advantage of big data and analytics matched with specific business outcomes, without having to hire data scientists and build a full solution internally, should consider Nuevora.

Regards,

Tony Cosentino

VP and Research Director

Datameer, a Hadoop-based analytics company, had a major presence at the recent Hadoop Summit, led by CEO Stefan Groschupf’s keynote and panel appearance. Besides announcing its latest product release, which is an important advance for the company and its users, Datameer’s outspoken CEO put forth contrarian arguments about the current direction of some of the distributions in the Hadoop ecosystem.

The challenge for the growing ecosystem surrounding Hadoop, the open source processing paradigm, has been in accessing data and building analytics that serve business uses in a straightforward manner. Our benchmark research into big data shows that the two most pressing challenges to big data analytics are staffing (79%) and training (77%). This so-called skills gap is at the heart of the Hadoop debate since it often takes someone with not just domain skills but also programming and statistical skills to derive value from data in a Hadoop cluster. Datameer is dedicated to addressing this challenge by integrating its software directly with the various Hadoop distributions to provide analytics and access tools, which include visualization and a spreadsheet interface. My coverage of Datameer from last year covers this approach in more detail.

At the conference, Datameer made the announcement of version 3.0 of its namesake product with a celebrity twist. Olympic athlete Sky Christopherson presented a keynote telling how the U.S. women’s cycling team, a heavy underdog, used Datameer to help it earn a silver medal in London. Following that introduction, Groschupf, one of the original contributors to Nutch (Hadoop’s predecessor), discussed features of Datameer 3.0 and what the company is calling “Smart” analytics, which include a variety of advanced analytic techniques such as clustering, decision trees, recommendations and column dependencies.

Our benchmark research into predictive analytics shows that classification trees (used by 69% of participants) and association rules (49%) are two of the techniques used most often; both are included in the Datameer product. (Note: Datameer utilizes K-means, an unsupervised clustering approach, rather than K-nearest neighbor, which is a supervised classification approach.) Both on stage and in a private briefing, company spokespeople downplayed the specific techniques in favor of the usability aspects and examples of business use for each of them. Clustering of Hadoop data allows marketing and business analytics professionals to view how data groups together naturally, while decision trees help analysts see how sets group and deconstruct from a linear subset perspective rather than from a framed Venn diagram perspective. In this regard clustering is more of a bottom-up approach and decision trees more of a top-down approach. For instance, in a cluster analysis, the analyst combines multiple attributes at one time to understand the dimensions upon which the data group. This can inform broad decisions about strategic messaging and product development. In contrast, with a decision tree, one can look, for instance, at all sales data to see which industries are most likely to buy a product, then follow the tree to see what size of companies within the industry are the best prospects, and then the subset of buyers within those companies who are the best targets.
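
To make the bottom-up versus top-down contrast concrete, here is a generic scikit-learn sketch, not Datameer’s implementation, that clusters synthetic customer records with K-means and then fits a shallow decision tree over the same data; the attribute names and purchase rule are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
# Synthetic attributes: company size (employees) and annual spend.
X = np.column_stack([rng.integers(10, 5000, 300), rng.gamma(2.0, 50.0, 300)])
bought = (X[:, 1] > 120).astype(int)  # illustrative purchase flag

# Bottom-up: let K-means group records by similarity.
clusters = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)

# Top-down: a decision tree splits the population toward likely buyers.
tree = DecisionTreeClassifier(max_depth=2, random_state=1).fit(X, bought)
print(export_text(tree, feature_names=["employees", "annual_spend"]))
```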

Datameer’s column dependencies can show analysts relationships between different column variables. The output appears much like a correlation matrix but uses a technique called mutual information. The key benefit of this technique over a traditional correlation approach is that it allows comparison between different types of variables, such as continuous and categorical variables. However, there is a trade-off in usability: The numeric output is not represented by the correlation coefficient with which many analysts are familiar. (I encourage Datameer to give analysts a quick reference of some type to help interpret the numbers associated with this lesser-known output.) Once the output is understood, it can be useful in exploring specific relationships and testing hypotheses. For instance, a company can test the hypothesis that it is more vertically focused than competitors by looking at industry and deal close rates. If there is no relationship between the variables, the hypothesis may be dismissed and a more horizontal strategy pursued.
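
A brief sketch of the difference: Pearson correlation is undefined for a categorical column, while mutual information, estimated here with scikit-learn after binning the numeric variable, still measures the dependency; the synthetic variables and the binning choice are assumptions.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(2)
industry = rng.choice(["retail", "finance", "health"], size=500)  # categorical
close_rate = np.where(industry == "finance", 0.4, 0.2) + rng.normal(0, 0.05, 500)

# Correlation cannot be computed against a string-valued column, so bin the
# numeric variable and compare the two label arrays with mutual information.
rate_bins = np.digitize(close_rate, np.quantile(close_rate, [0.25, 0.5, 0.75]))
mi = mutual_info_score(industry, rate_bins)
print(f"mutual information = {mi:.3f}")  # a value well above 0 signals a dependency
```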

The other technique Datameer spoke of is recommendation, also known as next best offer analysis; it is a relatively well known technique that has been popularized by Amazon and other retailers. Recommendation engines can help marketing and sales teams increase share of wallet through cross-sell and up-sell opportunities. While none of these four techniques is new to the world of analytics, the novelty is that Datameer allows this analysis directly on Hadoop, which incorporates new forms of data including Web behavior data and social media data. While many in the Hadoop ecosystem focus on descriptive analysis related to SQL, Datameer’s foray into more advanced analytics pushes the Hadoop envelope.
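
For the recommendation technique mentioned here, a common generic approach, and only a sketch rather than Datameer’s algorithm, is item-based collaborative filtering over a user-item matrix with cosine similarity; the tiny purchase matrix below is made up.

```python
import numpy as np

# Rows are users, columns are products; 1 = purchased (made-up data).
purchases = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
])

# Cosine similarity between product columns.
norms = np.linalg.norm(purchases, axis=0)
item_sim = (purchases.T @ purchases) / np.outer(norms, norms)

def recommend(user_row, top_n=2):
    """Score unpurchased items by similarity to what the user already owns."""
    scores = item_sim @ user_row
    scores[user_row == 1] = -np.inf  # do not re-recommend owned items
    return list(np.argsort(scores)[::-1][:top_n])

print(recommend(purchases[0]))  # next-best-offer candidates for user 0
```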

Aside from the launch of Datameer 3.0, Groschupf and his team used Hadoop Summit to espouse the position that the SQL approach of many Hadoop vendors is a mistake. The crux of the argument is that Hadoop is a sequential access technology (much like a magnetic cassette tape) in which a large portion of the data must be read before the correct data can be pulled off the disk. Groschupf argues that this is fundamentally inefficient and that current MPP SQL approaches do a much better job of processing SQL-related tasks. To illustrate the difference he characterized Hadoop as a freight train and an analytic appliance database as a Ferrari; each, of course, has its proper uses. Customers thus should decide what they want to do with the data from a business perspective and then choose the appropriate technology.

This leads to another point Groschupf made to me: that the big data discussion is shifting away from the technical details to a business orientation. In support of this point, he showed me a comparison of the Google search terms “big data” and “Hadoop.” The latter was more common in the past few years, when it was almost synonymous with big data, but now generic searches for big data are more common. Our benchmark research into business technology innovation shows a similar shift in buying criteria, with about two-thirds (64%) of buyers naming usability as the most important priority. By the way, a number of Ventana Research blogs including this one have focused on the trend of outcome-based buying and decision-making.

For organizations curious about big data and what they can do to take advantage of it, Datameer can be a low-risk place to start exploring. The company offers a free download version of its product so you can start looking at data immediately. The idea of time-to-value is critical with big data, and this is a key value proposition for Datameer. I encourage users to test the product with an eye to uncover interesting data that was never available for analysis before. This will help build the big data business use case especially in a bootstrap funding environment where money, skills and time are short.

Regards,

Tony Cosentino

VP and Research Director

Hadoop Summit is the biggest event on the West Coast centered on Hadoop, the open source technology for large-scale data processing. The conference organizers, Hortonworks, estimated that more than 2,400 people attended, which if true would be double-digit growth from last year. Growth on the supplier side was even larger, which indicates the opportunity this market represents. Held in Silicon Valley, the event attracts enterprise customers, industry innovators, thought leaders and venture capitalists. Many announcements were made – too many to cover here. But I want to comment on a few important ones and explain what they mean to the emerging Hadoop ecosystem and the broader market.

Hortonworks is a company spun off by the architects of Yahoo’s Hadoop implementation. Flush with $50 million in new venture funding, the company announced the preview distribution of Apache Hadoop 2.0. This represents a fundamental shift away from the batch-only approach to processing big data of the previous generation. In particular, YARN (Yet Another Resource Negotiator; the Yahoo roots are evident in the name) promises to solve the challenge of running multiple workloads on one cluster. YARN replaces the first-generation MapReduce job scheduler, in which a MapReduce job sees itself as the only tenant of HDFS, the Hadoop file system, and precludes any other workload. Under YARN, MapReduce becomes just one client of the resource manager, which can allocate cluster resources according to differing workload needs. According to Bob Page, product manager at Hortonworks, and Shaun Connolly, VP of corporate strategy, this mixed-workload capability opens the door to additional ISV plug-ins, including advanced analytics and stream processing. Integrating workloads is an important step forward for advanced analytics; our benchmark research into predictive analytics shows that the biggest challenge to predictive analytics for more than half (55%) of companies is integrating it into the enterprise architecture. Furthermore, stream processing opens the door to a variety of uses in operational intelligence, such as fraud prevention and network monitoring, that have not been possible with Hadoop. The company plans general availability of Apache Hadoop 2.0 in the fall. Beyond YARN, the new version will bring Hive on Tez for SQL query support, high availability, snapshots, disaster recovery and better rolling-upgrade support. Hortonworks simultaneously announced a certification program that allows application providers to be certified on the new version. This major release is a significant step toward enterprise readiness, and Hortonworks, which depends on the open source releases for its commercial subscriptions, will now be better able to compete against rivals that have built proprietary extensions to Hadoop into their offerings.
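As a purely conceptual illustration of the resource-manager model (this is not the YARN API, and every name here is hypothetical), the sketch below shows the basic idea of a single scheduler granting containers to both a batch job and a streaming job on one cluster:

```python
from dataclasses import dataclass, field

@dataclass
class ToyResourceManager:
    """Toy stand-in for a cluster resource manager; not the real YARN API."""
    total_containers: int
    allocations: dict = field(default_factory=dict)

    def request(self, app: str, containers: int) -> int:
        free = self.total_containers - sum(self.allocations.values())
        granted = min(containers, free)
        self.allocations[app] = self.allocations.get(app, 0) + granted
        return granted

    def release(self, app: str) -> None:
        self.allocations.pop(app, None)

rm = ToyResourceManager(total_containers=100)
# A batch MapReduce job and a streaming job share one cluster instead of
# the batch job monopolizing it, which is the point of the YARN design.
print(rm.request("mapreduce-batch", 70))   # 70 containers granted
print(rm.request("stream-processor", 50))  # only 30 remain, so 30 granted
rm.release("mapreduce-batch")
print(rm.request("stream-processor", 50))  # capacity freed, 50 more granted
```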

As noted, various vendors announced their own Hadoop advances at the summit. For one, Teradata continues to expand its Hadoop-based product portfolio and the Unified Data Architecture that I covered recently. The company introduced the Teradata Appliance for Hadoop as well as support for Hadoop on Dell commodity hardware. While another Hadoop appliance in the market may not be big news, Teradata’s commitment to the Hadoop community is important. Its professional services organization works in close partnership with Hortonworks, and the two will now offer full scoping and integration services along with the current support services. This enables Teradata to maintain its trusted-advisor role within accounts while Hortonworks takes advantage of a robust services and account management structure to help create new business.

Quentin Clark, Microsoft’s VP for SQL Server, gave a keynote address acknowledging the sea change that is occurring as a result of Hadoop. But he emphasized Microsoft’s entrenched position with Excel and SQL Server and the ability to use them alongside Hortonworks for big data. It’s a sound if unfortunate argument that spreadsheets are not going away soon; in our latest benchmark research into spreadsheets, 56 percent of participants said that user resistance is the biggest obstacle to change. At the same time, Microsoft has challenges in big data, such as providing a truly usable interface beyond Excel, which I recently discussed. At Hadoop Summit, Microsoft reiterated announcements already made in May, which were covered by my colleague. They included Hortonworks Data Platform for Windows, which makes Windows a priority operating system for Hadoop alongside SQL Server, and HDInsight running on Azure, Microsoft’s cloud platform. The relationship should help Hortonworks overcome objections about the security and manageability of its platform, while Microsoft should benefit from increased sales of its System Center, Active Directory and Windows software. Microsoft also announced the HDP Management Packs for System Center, which make Hadoop easier to manage on Windows or Linux and utilize the Ambari API for integration. Perhaps the most interesting demonstration from Microsoft was the preview of Data Explorer. This application provides text-based search across multiple data sources, after which the system can import the various sources automatically, independent of their type or location. Along with companies like Lucidworks (which my colleague Mark Smith recently discussed) and Splunk, Microsoft is advancing in the important area of information discovery, one of the four types of big data discovery Mark follows.

Datameer made the important announcement of version 3.0 of its namesake flagship product, with a celebrity twist. Olympic athlete Sky Christopherson gave a keynote telling how the U.S. women’s cycling team, a heavy underdog, used Datameer to help it earn a silver medal in London. Following that, Stefan Groschupf, CEO of Datameer and one of the original contributors to Nutch (Hadoop’s predecessor), discussed advances in 3.0, which include a variety of advanced analytic techniques such as clustering, decision trees, recommendations and column dependencies. The ability to perform these types of advanced analytics and visualize the data natively on Hadoop is not otherwise available in the market today. My coverage of Datameer from last year can be found here.
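Of the four techniques, column dependencies may be the least familiar; the idea is to measure how strongly the values in one column track another. A minimal sketch using a plain correlation matrix on hypothetical data (this is not Datameer’s algorithm):

```python
import pandas as pd

# Hypothetical clickstream summary with a few numeric columns
df = pd.DataFrame({
    "page_views": [3, 8, 2, 12, 7, 15],
    "time_on_site": [40, 95, 25, 160, 80, 190],
    "purchases": [0, 1, 0, 2, 1, 2],
})

# Pairwise correlations as a crude measure of column dependency;
# values near +/-1 suggest one column largely determines another
print(df.corr().round(2))
```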

Splunk announced Hunk, a tool that integrates exploration and visualization in Hadoop and will enable easier access for ‘splunking’ Hadoop clusters. In this tool, Splunk introduces a virtual indexing technology in which indexing occurs in an ad hoc fashion as data is fed into a columnar store. This enables analysts to test hypotheses through a slice-and-dice approach once the initial search discovery phase is completed. Sanjay Mehta, VP of marketing for Splunk, explained to me how such a tool enables faster time-to-value for Hadoop. Currently there are many requests for data resting in Hadoop, but it takes a data scientist to fulfill them. By applying Splunk’s tools to the Hadoop world, data scientists can move on to more valuable tasks while users trained in Splunk field and address such requests. Hunk is still somewhat technical in nature and requires specific Splunk training, but the demonstration showed a no-code approach that, through the familiar Splunk interface, returns rich descriptive data in visual form, which can then be explored iteratively. My most recent analysis of Splunk can be found here.
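The underlying pattern of indexing data on demand and then slicing and dicing it can be sketched simply. The example below builds an ad hoc inverted index over a few hypothetical log events and filters on it; it illustrates the general technique, not Hunk’s virtual index.

```python
from collections import defaultdict

# Hypothetical raw log events, the kind of data that might rest in Hadoop
events = [
    {"host": "web01", "status": 200, "path": "/checkout"},
    {"host": "web02", "status": 500, "path": "/checkout"},
    {"host": "web01", "status": 500, "path": "/search"},
    {"host": "web03", "status": 200, "path": "/search"},
]

# Build an ad hoc inverted index on demand: (field, value) -> event positions
index = defaultdict(set)
for pos, event in enumerate(events):
    for field, value in event.items():
        index[(field, value)].add(pos)

def search(**criteria):
    """Return events matching every field=value criterion (slice and dice)."""
    hits = set(range(len(events)))
    for field, value in criteria.items():
        hits &= index[(field, value)]
    return [events[pos] for pos in sorted(hits)]

print(search(status=500))                    # all failing requests
print(search(status=500, path="/checkout"))  # drill down to one path
```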

Pentaho also made several announcements. The biggest in terms of market impact is that it has become the sole ETL provider for Rackspace’s Hadoop-as-a-service initiative, which aims to deliver a full big data platform in the cloud. Pentaho also announced the Pentaho Labs initiative, which will be the R&D arm for the Pentaho open source community. This move should lift both the enterprise product and the community, especially in the context of Pentaho’s recent acquisition of Webdetails, a Portuguese analytics and visualization company active in Pentaho’s open source community. The company also announced the Adaptive Big Data Layer, which provides a series of plug-ins across the Hadoop ecosystem, including all of the major distributions. And a new partnership with Splunk enables read/write access to the Splunk data fabric. Pentaho also is providing tighter integration with MongoDB (including its aggregation framework) and the Cassandra DBMS. Terilyn Palanca, director of Pentaho’s big data product marketing, and Rebecca Shomair, corporate communications director, made the point that companies need to hedge their bets within the increasingly divergent Hadoop ecosystem and that Pentaho can help them reduce risk in this regard. Mark Smith’s most recent analysis of Pentaho can be found here.
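For readers unfamiliar with MongoDB’s aggregation framework, the kind of query such integrations push down looks like the minimal sketch below. It assumes a local MongoDB instance and a hypothetical “orders” collection, and it is not Pentaho-generated code.

```python
from pymongo import MongoClient

# Assumes a local MongoDB instance and a hypothetical "orders" collection
client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]

# Aggregation-framework pipeline: total revenue per region for shipped orders
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$region", "revenue": {"$sum": "$amount"}}},
    {"$sort": {"revenue": -1}},
]

for row in orders.aggregate(pipeline):
    print(row["_id"], row["revenue"])
```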

In general, what struck me most about this exciting week in the world of Hadoop was the divergent philosophies and incentives at work in the market. The distributions of MapR, Hortonworks, Cloudera and Pivotal (Greenplum) continue to compete for dominance with varying degrees of proprietary and open source approaches. Teradata is also becoming a subscription reseller of Hortonworks HDP to provide even more options to its customers. Datameer and Platfora are taking pure-play integrated approaches, and Teradata, Microsoft and Pentaho are looking at ways to marry the old with the new by guarding current investments while adding new Hadoop-based capabilities. Another thing that struck me was that no business intelligence vendors had a presence apart from visual discovery provider Tableau. This is curious given that many vendors this week talked about responding to the demands of business users for easier access to Hadoop data. This is something our research shows to be a buying trend in today’s environment: Usability is the most important buying criterion in almost two out of three (64%) organizations. Use cases say a lot about usability, and the number of people talking, on stage and off, about their Hadoop experiences increased dramatically this year. I recently wrote about how much has changed in the use of big data in one year, and the discussions at Hadoop Summit confirmed my thoughts. Hortonworks and its ecosystem of partners now have a greater opportunity to meet a new generation of big data and information optimization needs. At the same time, we are still in the early stages of turning this technology to business use, which requires a focus on use cases and on gaining benefits on a continuous basis. Disruptive innovations often take decades to be fully embraced by organizations and society at large. Keep in mind that it was only in December 2004 that Google published its groundbreaking paper on MapReduce.
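For anyone who has never looked under the hood, a toy word count in the MapReduce style shows how small the programming model at the heart of all this really is. This is a single-process Python sketch, not a distributed Hadoop job.

```python
from collections import defaultdict
from itertools import chain

documents = [
    "big data is more than hadoop",
    "hadoop made big data practical",
]

# Map: emit (word, 1) for every word in every document
def map_phase(doc):
    return [(word, 1) for word in doc.split()]

# Shuffle: group intermediate pairs by key
grouped = defaultdict(list)
for word, count in chain.from_iterable(map_phase(d) for d in documents):
    grouped[word].append(count)

# Reduce: sum the counts for each word
word_counts = {word: sum(counts) for word, counts in grouped.items()}
print(word_counts)  # e.g. {'big': 2, 'data': 2, 'hadoop': 2, ...}
```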

Regards,

Tony Cosentino

VP and Research Director

Users of big data analytics are finally going public. At the Hadoop Summit last June, many vendors were still speaking of a large retailer or a big bank as users but could not publicly disclose their partnerships. Companies experimenting with big data analytics felt that their proofs of concept were so innovative that once they moved into production, they would yield a competitive advantage to the early mover. Now many companies are speaking openly about what they have been up to in their business laboratories. I look forward to attending the 2013 Hadoop Summit in San Jose to see how much things have changed in just a single year for Hadoop-centered big data analytics.

Our benchmark research into operational intelligence, which I argue is another name for real-time big data analytics, shows diversity in big data analytics use cases by industry. The goals of operational intelligence are an interesting mix: the research shows relative parity among managing performance (59%), detecting fraud and security issues (59%), complying with regulations (58%) and managing risk (58%), but drilling down into different industries reveals some interesting nuances. For instance, healthcare and banking are driven much more by risk and regulatory compliance, service industries such as retail are driven more by performance, and manufacturing is driven more by cost reduction. All of these make sense given the nature of the businesses. Let’s look at them in more detail.

The retail industry, driven by market forces and facing discontinuous change, is adopting big data analytics out of competitive necessity. The discontinuity comes in the form of online shopping and the need for traditional retailers to supplement their brick-and-mortar locations. JCPenney and Macy’s provide a sharp contrast in how two retailers approached this challenge. A few years ago, the two companies eyed a similar competitive space, but since then Macy’s has implemented systems based on big data analytics; it now sources locally for online transactions and can optimize pricing of its more than 70 million SKUs in just one hour using SAS High Performance Analytics. In Sun Tzu-like fashion, the Macy’s approach has turned the “showroom floor” disadvantage into a customer experience advantage. JCPenney, on the other hand, relied on gut-feel management decisions based on classic brand merchandising strategies and ended up alienating its customers, generating lawsuits and issuing a well-publicized apology to its customers. Other companies, including Sears, are doing similarly innovative work with suppliers such as Teradata and innovative startups like Datameer in data hub architectures built around Hadoop.

Healthcare is another interesting market for big data, but the dynamics that drive it are less about market forces and more about government intervention and compliance issues. Laws such as HIPAA, the recent Affordable Care Act, OC-10 and the HITECH Act of 2009 all have implications for how these organizations implement technology and analytics. Our recent benchmark research on governance, risk and compliance indicates that many companies have significant concerns about compliance issues: 53 percent of participants said they are concerned about them, and 42 percent said they are very concerned. Electronic health records (EHRs) are moving these organizations toward more patient-centric systems, and one goal of the Affordable Care Act is to use technology to produce better outcomes through what it calls meaningful use standards. Facing this tidal wave of change, companies including IBM analyze historical patterns and link them with real-time monitoring, helping hospitals save the lives of at-risk babies. This use case was made into a now-famous commercial about the so-called data babies by the advertising firm Ogilvy. IBM has also shown how cognitive question-and-answer systems such as Watson assist doctors in the diagnosis and treatment of patients.

Data blending, the ability to mash together different data sources without having to manipulate the underlying data models, is another analytical technique gaining significant traction. Kaiser Permanente uses tools from Alteryx, which I have assessed, to consolidate diverse data sources, including unstructured data, streamlining operations and improving customer service. The two organizations gave a joint presentation, similar to the one here, at Alteryx’s user conference in March.
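Mechanically, data blending amounts to joining sources on shared keys at analysis time rather than remodeling them in a warehouse. A minimal sketch on hypothetical data (this is not Alteryx’s implementation):

```python
import pandas as pd

# Hypothetical operational data from one system...
appointments = pd.DataFrame({
    "member_id": [101, 102, 103],
    "visits_last_year": [2, 7, 1],
})

# ...and satisfaction survey responses from another, including free text
surveys = pd.DataFrame({
    "member_id": [101, 103, 104],
    "score": [9, 4, 8],
    "comment": ["quick visit", "long wait times", "friendly staff"],
})

# Blend at analysis time: an outer join on the shared key, no remodeling
blended = appointments.merge(surveys, on="member_id", how="outer")

# Derive a simple flag from the unstructured column
blended["mentions_wait"] = blended["comment"].fillna("").str.contains("wait")
print(blended)
```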

Financial services, which my colleague Robert Kugel covers, is being driven by a combination of regulatory forces and competitive market forces on the sales end. Regulations produce a lag in the adoption of certain big data technologies, such as cloud computing, but areas such as fraud and risk management are being revolutionized by the ability, provided by in-memory systems, to look at every transaction rather than only the sample of transactions examined in traditional audit processes. Furthermore, the ability to pair advanced analytical algorithms with in-memory, real-time rules engines helps detect fraud as it occurs, so criminal activity can be stopped at the point of transaction. On a broader scale, new risk management frameworks are becoming the strategic and operational backbone for decision-making in financial services.
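The pairing of simple in-memory rules with a model score can be sketched in a few lines. This is a hypothetical illustration of the pattern, not any vendor’s fraud engine.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    account: str
    amount: float
    country: str
    merchant_risk: float  # hypothetical score from an offline model, 0-1

# In-memory rules evaluated per transaction as it arrives
RULES = [
    lambda t: t.amount > 10_000,          # unusually large transfer
    lambda t: t.country not in {"US"},    # outside the account's home country
    lambda t: t.merchant_risk > 0.8,      # model flags the merchant
]

def flag(transaction: Transaction, threshold: int = 2) -> bool:
    """Block when enough rules fire; pairs simple rules with a model score."""
    return sum(rule(transaction) for rule in RULES) >= threshold

stream = [
    Transaction("a1", 55.00, "US", 0.1),
    Transaction("a2", 14_500.00, "RO", 0.9),
]
for t in stream:
    print(t.account, "BLOCK" if flag(t) else "allow")
```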

On the retail banking side, copious amounts of historical customer data from multiple banking channels, combined with government data and social media data, are giving banks the opportunity to do microsegmentation and create unprecedented customer intimacy. Big data approaches to micro-targeting and pricing algorithms, which Rob recently discussed in his blog on Nomis, enable banks and retailers alike to target individuals and customize pricing based on an individual’s propensity to act. While partnerships in the financial services arena are still held close to the vest, the universal financial services providers – Bank of America, Citigroup, JPMorgan Chase and Wells Fargo – are making considerable investments in all of the above-mentioned areas of big data analytics.
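In practice, “propensity to act” is typically a score from a model trained on historical behavior. The sketch below, using scikit-learn on made-up data, shows the pattern of scoring customers and segmenting them by propensity; it is hypothetical and not any bank’s actual approach.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per customer: [balance_kUSD, branch_visits, web_logins]
X = np.array([
    [12, 0, 25], [3, 4, 2], [45, 1, 30], [8, 6, 1],
    [22, 0, 18], [5, 5, 3], [60, 2, 40], [2, 7, 0],
])
# 1 = accepted a past offer, 0 = did not
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

# Score new customers and segment by propensity to accept an offer
new_customers = np.array([[30, 1, 22], [4, 5, 1]])
propensity = model.predict_proba(new_customers)[:, 1]
segments = np.where(propensity > 0.5, "target with offer", "standard pricing")
print(list(zip(propensity.round(2), segments)))
```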

Industries other than retail, healthcare and banking are also seeing tangible value in big data analytics. Governments are using it to provide proactive monitoring and responses to catastrophic events. Product and design companies are leveraging big data analytics for everything from advertising attribution to crowdsourcing of new product innovation. Manufacturers are preventing downtime by studying interactions within systems and predicting machine failures before they occur. Airlines are recalibrating their flight routing systems in real time to avoid bad weather. From hospitality to telecommunications to entertainment and gaming, companies are publicizing their big data-related success stories.

Our research shows that until now big data analytics has primarily been the domain of larger, digitally advanced enterprises. However, as use cases make their way through businesses and their tangible value is accepted, I anticipate that activity around big data analytics will increase among small and midsize businesses as well. At this point, just about any company that is not considering how big data analytics may affect its business faces an unknown and uneasy future. What a difference a year makes, indeed.

Regards,

Tony Cosentino

VP and Research Director
