Organizations should consider multiple aspects of deploying big data analytics. These include the type of analytics to be deployed, how the analytics will be deployed technologically and who must be involved both internally and externally to enable success. Our recent big data analytics benchmark research assesses each of these areas. How an organization views these deployment considerations may depend on the expected benefits of the big data analytics program and the particular business case to be made, which I discussed recently.

According to the research, the most important capability of big data analytics is predictive analytics (64%), but among companies that have deployed big data analytics, descriptive analytic approaches of query and reporting (74%) and data discovery (64%) are more readily available than predictive capabilities (57%). Such statistics may be a function of big data technologies such as Hadoop, and their associated distributions having prioritized the ability to run descriptive statistics through standard SQL, which is the most common method for implementing analysis on Hadoop. Cloudera’s Impala, Hortonworks’ Stinger (an extension of Apache Hive), MapR’s Drill, IBM’s Big SQL, Pivotal’s HAWQ and Facebook’s open-source contribution of Presto SQL all focus on accessing data through an SQL paradigm. It is not surprising then that the technology research participants use most for big data analytics is business intelligence (75%) and that the most-used analytic methods — pivot tables (46%), classification (39%) and clustering (37%) — are descriptive and exploratory in nature. Similarly, participants said that visualization of big data allows analysts to perform faster analysis (49%), understand context better (48%), perform root-cause analysis (40%) and display multiple result sets (40%), but visualization does not provide more advanced analytic capabilities. While various vendors now offer approaches to run advanced analytics on big data, the research shows that in terms of big data, organizational capabilities still revolve around more basic analytic access.
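
To make the distinction concrete, here is a minimal sketch of what this descriptive and exploratory work typically looks like in practice: a pivot table and a simple clustering pass over a sample of data. The file, table and column names are hypothetical, and in a big data deployment the underlying query would run against a SQL-on-Hadoop engine rather than a local file.

```python
# Minimal sketch: the descriptive methods cited most often in the research
# (pivot tables, clustering) applied to a data sample. All names are
# hypothetical; in practice the sample would come back from a SQL query
# against Hive, Impala or a similar engine.
import pandas as pd
from sklearn.cluster import KMeans

orders = pd.read_csv("orders_sample.csv")  # stand-in for a SQL result set

# Descriptive: a pivot table of revenue by region and product line
pivot = orders.pivot_table(values="revenue", index="region",
                           columns="product_line", aggfunc="sum")

# Exploratory: cluster customers on simple behavioral features
features = orders.groupby("customer_id")[["revenue", "order_count"]].sum()
features["cluster"] = KMeans(n_clusters=4, n_init=10).fit_predict(features)

print(pivot.head())
print(features["cluster"].value_counts())
```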

For companies that are implementing advanced analytic capabilities on big data, there are further analytic process considerations, and many have not yet tackled those. Model building and model deployment should be manageable and timely, involve specialized personnel, and integrate into the broader enterprise architecture. While our research provides an in-depth look at adoption of the different types of in-database analytics, deployment of advanced analytic sandboxes, data mining, model management, integration with business processes and overall model deployment, that is beyond the topic here.

Beyond analytic considerations, a host of technological decisions must be made around big data analytics initiatives. One of these is the degree of customization necessary. As technology advances, customization is giving way to more packaged approaches to big data analytics. According to our research, the majority (54%) of companies that have already implemented big data analytics did custom builds using big data-specific languages and interfaces. Most of those that have not yet deployed are likely to purchase a dedicated or packaged application (44%), followed by a custom build (36%). We think that this pre- and post-deployment comparison reflects a maturing market.

The move from custom approaches to standardized ones has important implications for the skill sets needed for a big data analytics initiative. In comparing the skills that organizations said they currently have to the skills they need to be successful with big data analytics, it is clear that companies should spend more time building employees’ statistical, mathematical and visualization skills. On the flip side, organizations should make sure their tools can support skill sets that they already have, such as use of spreadsheets and SQL. This is consistent with other findings about training needs, which include applying analytics to business problems (54%), training on big data analytics tools (53%), analytic concepts and techniques (46%) and visualizing big data (41%). The data shows that as approaches become more standardized and the market focus shifts toward them from customized implementations, skill needs are shifting as well. This is not to say that demand is moving away from the data scientist completely. According to our research, organizations that involve cross-functional teams or data scientists in the deployment process are realizing the most significant impact. It is clear that multiple approaches for personnel, departments and current vendors play a role in deployments and that some approaches will be more effective than others.

Cloud computing is another key consideration with respect to deploying analytics systems as well as sandbox modeling and testing environments. For deployment of big data analytics, 27 percent of companies currently use a cloud-based method, while 58 percent said they do not and 16 percent do not know what is used. Not surprisingly, far fewer IT professionals (19%) than business users (40%) said they use cloud-based deployments for big data analytics. The flexibility and capability that cloud resources provide are particularly attractive for sandbox environments and for organizations that lack big data analytic expertise. However, for big data model building, the largest percentage of organizations (42%) still utilize a dedicated internal sandbox environment to build models, while fewer (19%) use a non-dedicated internal sandbox (that is, a container in a data warehouse used to build models) and others use a cloud-based sandbox either as a completely separate physical environment (9%) or as a hybrid approach (9%). From this last data we infer that business users are sometimes using cloud-based systems to do big data analytics without the knowledge of IT staff. Among organizations that are not using cloud-based systems for big data analytics, security (45%) is the primary reason that they do not.

Perhaps the most important consideration for big data analytics is choosing vendors to partner with to achieve organizational objectives. Given the move from custom technological approaches to more packaged ones and the types of analytics currently being implemented for big data, it is not surprising that a majority of research participants (52%) are looking to their business intelligence systems providers to supply their big data analytics solution. However, a significant number of companies said they will turn to a specialist analytics provider (35%) or their database provider (34%). When evaluating big data analytics, usability is the most important vendor consideration, but not by as wide a margin as in categories such as business intelligence. A look at criteria rated important and very important by research participants reveals usability is the highest ranked (94%), but functionality (92%) and reliability (90%) follow closely. Among innovative new technologies, collaboration is important (78%) while mobile access (46%) is much less so. Coupled with the finding that communication and knowledge sharing is an important benefit of big data analytics, it is clear that organizations are cognizant of the collaborative imperative when choosing a big data analytics product.

Deployment of big data analytics starts with forethought and a well-defined business case that includes the expected benefits I discussed in my previous analysis. Once the outcome-driven framework is established, organizations should consider the types of analytics needed, the enabling technologies and the people and processes necessary for implementation. To learn more about our big data analytics research, download a copy of the executive summary here.

Regards,

Tony Cosentino

VP & Research Director

We recently released our benchmark research on big data analytics, and it sheds light on many of the most important discussions occurring in business technology today. The study’s structure was based on the big data analytics framework that I laid out last year as well as the framework that my colleague Mark Smith put forth on the four types of discovery technology available. These frameworks view big data and analytics as part of a major change that includes a movement from designed data to organic data, the bringing together of analytics and data in a single system, and a corresponding move away from the technology-oriented three Vs of big data to the business-oriented three Ws of data. Our big data analytics research confirms these trends but also reveals some important subtleties and new findings with respect to this important emerging market. I want to share three of the most interesting and even surprising results and their implications for the big data analytics market.

First, we note that communication and knowledge sharing is a primary benefit of big data analytics initiatives, but it is a latent one. Among organizations planning to deploy big data analytics, the benefits most often anticipated are faster response to opportunities and threats (57%), improving efficiency (57%), improving the customer experience (48%) and gaining competitive advantage (43%). However, once a big data analytics system has moved into production, the benefits most often mentioned as achieved are better communication and knowledge sharing (51%), gaining competitive advantage (51%), improved efficiency in business processes (49%) and improved customer experience and satisfaction (46%). (The chart shows rankings of first choices as most important.) Although the last three of these benefits are predictable, it’s noteworthy that the benefit of communication and knowledge sharing, while not a priority before deployment, becomes one of the two most often cited later.

As for the implications, in our view, one reason why communication and knowledge sharing are more often seen as a key benefit after deployment rather than before is that agreement on big data analytics terminology is often lacking within organizations. Participants from fewer than half (44%) of organizations said that the people making business technology decisions mostly agree or completely agree on the meaning of big data analytics, while the same number said there are many different opinions about its meaning. To address this particular challenge, companies should pay more attention to setting up internal communication structures prior to the launch of a big data analytics project, and we expect collaborative technologies to play a larger role in these initiatives going forward.

A second finding of our research is that integration of distributed data is the most important enabler of big data analytics. Asked the meaning of big data analytics in terms of capabilities, the largest percentage (76%) of participants said it involves analyzing data from all sources rather than just one, while for 55 percent it means analyzing all of the data rather than just a sample of it. (We allowed multiple responses.) More than half (56%) told us they view big data as finding patterns in large and diverse data sets in Hadoop, which indicates the continuing influence of this original big data technology. A second tier of percentages emphasizes timeliness as an aspect of big data: doing real-time processing on streams of data (44%), visualizing large structured data sets in seconds (40%) and doing real-time scoring against a database record (36%).

The implication here is that the primary characteristic of big data analytics technology is the ability to analyze data from many data sources. This shows that companies today are focused first on bringing together multiple information sources and secondarily on being able to process all data rather than just a sample, as well as being able to do machine learning on especially large data sets. Fast processing and the ability to analyze streams of data are relegated to third position in these priorities. That suggests that the so-called three Vs of big data are confusing the discussion by prioritizing volume, velocity and variety all at once. For companies engaged in big data analytics today, sourcing and integration of various data sources in an expedient manner is the top priority, followed by the ideas of size and then speed of arrival of data.

Third, we found that usage is not relegated to particular industries, certain types of companies or certain functional areas. Of the 25 uses for big data analytics that participants said they are personally involved with, three of the four most often mentioned involve customers and sales: enabling cross-selling and up-selling (38%), understanding the customer better (32%) and optimizing pricing (28%). Meanwhile, optimizing IT operations ranked fifth (24%), though it was most often chosen by those in IT roles (76%). What is particularly fascinating, however, is that 17 of the 25 use cases were named by more than 10 percent of participants, which indicates how many uses there are for big data analytics.

The primary implication of this finding is that big data analytics is not following the famous technology adoption curves outlined in books such as Geoffrey Moore’s seminal work, “Crossing the Chasm.” That is, companies are not following a narrowly defined path that solves only one particular problem. Instead, they are creatively deploying technological innovations en route to a diverse set of outcomes. And this is occurring across organizational functions and industries, including conservative ones, which conflicts with conventional wisdom. For this reason, companies are more often looking across industries and functional disciplines as part of their due diligence on big data analytics to come up with unique applications that may yield competitive advantage or organizational efficiencies.

In summary, it has been difficult for companies to define what big data analytics actually means and how to prioritize their investments accordingly. Research such as ours can help organizations address this issue. While the above discussion outlines a few of the interesting findings of this research, it also yields many more insights, related to aspects as diverse as big data in the cloud, sandbox environments, embedded predictive analytics, the most important data sources in use, and the challenges of choosing an architecture and deploying big data analytic products. For a copy of the executive summary download it directly from the Ventana Research community.

Regards,

Ventana Research

Users of big data analytics are finally going public. At the Hadoop Summit last June, many vendors were still speaking of a large retailer or a big bank as users but could not publicly disclose their partnerships. Companies experimenting with big data analytics felt that their proof of concept was so innovative that once it moved into production, it would yield a competitive advantage to the early mover. Now many companies are speaking openly about what they have been up to in their business laboratories. I look forward to attending the 2013 Hadoop Summit in San Jose to see how much things have changed in just a single year for Hadoop-centered big data analytics.

Our benchmark research into operational intelligence, which I argue is another name for real-time big data analytics, shows diversity in big data analytics use cases by industry. The goals of operational intelligence are an interesting mix: the research shows relative parity among managing performance (59%), detecting fraud and security (59%), complying with regulations (58%) and managing risk (58%), but when we drill down into different industries some interesting nuances appear. For instance, healthcare and banking are driven much more by risk and regulatory compliance, services such as retail are driven more by performance, and manufacturing is driven more by cost reduction. All of these make sense given the nature of the businesses. Let’s look at them in more detail.

The retail industry, driven by market forces and facing discontinuous change, is adopting big data analytics out of competitive necessity. The discontinuity comes in the form of online shopping and the need for traditional retailers to supplement their brick-and-mortar locations. JCPenney and Macy’s provide a sharp contrast in how two retailers approached this challenge. A few years ago, the two companies eyed a similar competitive space, but since that time, Macy’s has implemented systems based on big data analytics; it is now sourcing locally for online transactions and can optimize pricing of its more than 70 million SKUs in just one hour using SAS High Performance Analytics. The Macy’s approach has, in Sun Tzu-like fashion, turned the “showroom floor” disadvantage into a customer experience advantage. JCPenney, on the other hand, relied on gut-feel management decisions based on classic brand merchandising strategies and ended up alienating its customers, generating lawsuits and issuing a well-publicized apology to its customers. Other companies, including Sears, are doing similarly innovative work with suppliers such as Teradata and startups like Datameer in data hub architectures built around Hadoop.

Healthcare is another interesting market for big data, but the dynamics that drive it are less about market forces and more about government intervention and compliance issues. Laws such as HIPAA, the Affordable Care Act, OC-10 and the HITECH Act of 2009 all have implications for how these organizations implement technology and analytics. Our recent benchmark research on governance, risk and compliance indicates that many companies have significant concerns about compliance issues: 53 percent of participants said they are concerned about them, and 42 percent said they are very concerned. Electronic health records (EHRs) are moving these organizations to more patient-centric systems, and one goal of the Affordable Care Act is to use technology to produce better outcomes through what it calls meaningful use standards. Facing this tidal wave of change, companies including IBM analyze historical patterns and link them with real-time monitoring, helping hospitals save the lives of at-risk babies. This use case was made into a now-famous commercial by advertising firm Ogilvy about the so-called data babies. IBM has also shown how cognitive question-and-answer systems such as Watson assist doctors in diagnosis and treatment of patients.

Data blending, the ability to mash together different data sources without having to manipulate the underlying data models, is another analytical technique gaining significant traction. Kaiser Permanente uses tools from Alteryx, which I have assessed, to consolidate diverse data sources, including unstructured data, to streamline operations and improve customer service. The two organizations made a joint presentation similar to the one here at Alteryx’s user conference in March.

Financial services, which my colleague Robert Kugel covers, is being driven by a combination of regulatory forces and competitive market forces on the sales end. Regulations produce a lag in the adoption of certain big data technologies, such as cloud computing, but areas such as fraud and risk management are being revolutionized by the ability, provided through in-memory systems, to look at every transaction rather than only a sampling of transactions through traditional audit processes. Furthermore, the ability to pair advanced analytical algorithms with in-memory real-time rules engines helps detect fraud as it occurs, and thus criminal activity may be stopped at the point of transaction. On a broader scale, new risk management frameworks are becoming the strategic and operational backbone for decision-making in financial services.
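
The sketch below illustrates the rules-plus-analytics pattern described above in a heavily simplified form: a hard business-rule layer evaluated first, backed by a pre-trained anomaly model that scores each transaction as it arrives. The thresholds, field names and model choice are illustrative assumptions, not any vendor’s actual implementation.

```python
# Hedged sketch: pairing simple rules with an anomaly model per transaction.
from sklearn.ensemble import IsolationForest
import numpy as np

rng = np.random.default_rng(0)
# Historical (amount, hour-of-day) pairs stand in for past transactions
history = rng.normal(loc=[50, 12], scale=[20, 6], size=(5000, 2))
model = IsolationForest(contamination=0.01, random_state=0).fit(history)

def score_transaction(txn):
    # Rule layer: hard business rules fire immediately
    if txn["amount"] > 10_000 or txn["country"] not in txn["home_countries"]:
        return "block"
    # Analytic layer: anomaly score from the trained model (-1 = outlier)
    if model.predict([[txn["amount"], txn["hour"]]])[0] == -1:
        return "review"
    return "approve"

print(score_transaction({"amount": 250.0, "hour": 3,
                         "country": "US", "home_countries": {"US"}}))
```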

On the retail banking side, copious amounts of historical customer data from multiple banking channels combined with government data and social media data are providing banks the opportunity to do microsegmentation and create unprecedented customer intimacy. Big data approaches to micro-targeting and pricing algorithms, which Rob recently discussed in his blog on Nomis, enable banks and retailers alike to target individuals and customize pricing based on an individual’s propensity to act. While partnerships in the financial services arena are still held close to the vest, the universal financial services providers – Bank of America, Citigroup, JPMorgan Chase and Wells Fargo – are making considerable investments into all of the above-mentioned areas of big data analytics.

Industries other than retail, healthcare and banking are also seeing tangible value in big data analytics. Governments are using it to provide proactive monitoring and responses to catastrophic events. Product and design companies are leveraging big data analytics for everything from advertising attribution to crowdsourcing of new product innovation. Manufacturers are preventing downtime by studying interactions within systems and predicting machine failures before they occur. Airlines are recalibrating their flight routing systems in real time to avoid bad weather. From hospitality to telecommunications to entertainment and gaming, companies are publicizing their big data-related success stories.

Our research shows that until now, big data analytics has primarily been the domain of larger, digitally advanced enterprises. However, as use cases make their way through business and their tangible value is accepted, I anticipate that the activity around big data analytics will increase with companies that reside in the small and midsize business market. At this point, just about any company that is not considering how big data analytics may impact its business faces an unknown and uneasy future. What a difference a year makes, indeed.

Regards,

Tony Cosentino

VP and Research Director

Our benchmark research into business technology innovation found that participants rank analytics as the most important new technology for improving their organization’s performance; they ranked big data only fifth out of six choices. This and other findings indicate that the best way for big data to contribute value to today’s organizations is to be paired with analytics. Recently, I wrote about what I call the four pillars of big data analytics on which the technology must be built. These pillars are big data and information optimization, predictive analytics, right-time analytics and the discovery and visualization of analytics. These components gave me a framework for looking at Teradata’s approach to big data analytics during the company’s analyst conference last week in La Jolla, Calif.

The essence of big data is to optimize the information used by the business for whatever type of need, which my colleague has identified as a key value of these investments. Data diversity presents a challenge to most enterprise data warehouse architectures. Teradata has been dealing with large, complex sets of data for years, but today’s different data types are forcing new modes of processing in enterprise data warehouses. Teradata is addressing this issue by focusing on a workload-specific architecture that aligns with MapReduce, statistics and SQL. Its Unified Data Architecture (UDA) incorporates the Hortonworks Hadoop distribution, the Aster Data platform and Teradata’s stalwart RDBMS EDW. The Big Data Analytics appliance that encompasses the UDA framework won our annual innovation award in 2012. The system is connected through InfiniBand and accesses Hadoop’s metadata layer directly through HCatalog. Bringing these pieces together represents the type of holistic thinking that is critical for handling big data analytics; at the same time there are some costs, as the system includes two MapReduce processing environments. For more on the UDA architecture, read my previous post on Teradata as well as my colleague Mark Smith’s piece.

Predictive analytics is another foundational piece of big data analytics and one of the top priorities in organizations. However, according to our big data research, it is not available in 41 percent of organizations today. Teradata is addressing it in a number of ways, and at the conference Stephen Brobst, Teradata’s CTO, likened big data analytics to a high-school chemistry classroom that has a chemical closet from which you pull out the chemicals needed to perform an experiment in a separate work area. In this analogy, Hadoop and the RDBMS EDW are the chemical closet, and Aster Data provides the sandbox where the experiment is conducted. With multiple algorithms currently written into the platform and many more promised over the coming months, this sandbox provides a promising big data lab environment. The approach is SQL-centric and as such has its pros and cons. The obvious advantage is that SQL is a declarative language that is easier to learn than procedural languages, and an established skills base exists within most organizations. The disadvantage is that SQL is not the native tongue of many business analysts and statisticians. While it may be easy to call a function within the context of the SQL statement, the same person who can write the statement may not know when and where to call the function. One way for Teradata to expediently address this need is through its existing partnerships with companies like Alteryx, which I wrote about recently. Alteryx provides a user-friendly analytical workflow environment and is establishing a solid presence on the business side of the house. Teradata already works with predictive analytics providers like SAS but should further expand with companies like Revolution Analytics, which I assessed, that are using R technology to support a new generation of tools.

Teradata is exploiting its advantage with algorithms such as nPath, which shows the path that a customer has taken to a particular outcome such as buying or not buying. According to our big data benchmark research, being able to conduct what-if analysis and predictive analytics are the two most desired capabilities not currently available with big data, as the chart shows. The algorithms that Teradata is building into Aster help address this challenge, but despite customer case studies shown at the conference, Teradata did not clearly demonstrate how this type of algorithm and others seamlessly integrate to address the overall customer experience or other business challenges. While presenters verbalized it in terms of improving churn and fraud models, and we can imagine how the handoffs might occur, the presentations were more technical in nature. As Teradata gains traction with these types of analytical approaches, it will behoove the company to show not just how the algorithm and SQL work but how they work in practice for business people and analysts who are not as technically savvy.
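
To give a feel for what path-to-outcome analysis produces, the sketch below counts event paths that end in a purchase versus an abandoned session using pandas. It is a conceptual stand-in only, not Aster’s nPath syntax, and the event log shown is invented for illustration.

```python
# Conceptual sketch of path analysis: which event sequences lead to "buy"?
import pandas as pd

events = pd.DataFrame({
    "customer": [1, 1, 1, 2, 2, 3, 3, 3],
    "step":     [1, 2, 3, 1, 2, 1, 2, 3],
    "event":    ["email", "visit", "buy", "ad", "visit",
                 "email", "visit", "abandon"],
})

# Build one ordered path string per customer
paths = (events.sort_values(["customer", "step"])
               .groupby("customer")["event"]
               .agg(" > ".join))

# Most common paths ending in a purchase vs. an abandoned session
print(paths[paths.str.endswith("buy")].value_counts())
print(paths[paths.str.endswith("abandon")].value_counts())
```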

Another key principle behind big data analytics is timeliness of the analytics. Given the nature of business intelligence and traditional EDW architectures, until now timeliness of analytics has been associated with how quickly queries run. This has been a strength of the Teradata MPP shared-nothing architecture, but other appliance architectures, such as those of Netezza and Greenplum, now challenge Teradata’s hegemony in this area. Furthermore, trends in big data make the situation more complex. In particular, with very large data sets, many analytical environments have replaced traditional row-level access with column access. Column access is a more natural way for data to be accessed for analytics, since it does not have to read through an entire row of data that may not be relevant to the task at hand. At the same time, column-level access has downsides, such as the reduced speed at which you can write to the system; also, as the data set used in the analysis expands to a high number of columns, it can become less efficient than row-level access. Teradata addresses this challenge by providing both row and column access through innovative proprietary access and computation techniques.
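
The trade-off is easy to see in miniature: the sketch below computes the same aggregate over a column-oriented layout (one contiguous array per column) and a row-oriented layout (one record per row). The data is synthetic and the example is only meant to show why analytic scans favor columns.

```python
# Illustrative only: an aggregate over one column touches a single array in
# a columnar layout, but walks every field of every record in a row layout.
import numpy as np

n = 1_000_000
# Column-oriented: one contiguous array per column
columns = {"amount": np.random.rand(n), "qty": np.random.randint(1, 10, n)}

# Row-oriented: one record (all fields together) per row
rows = list(zip(columns["amount"], columns["qty"]))

total_from_columns = columns["amount"].sum()          # reads one array only
total_from_rows = sum(amount for amount, _ in rows)   # reads every record
print(np.isclose(total_from_columns, total_from_rows))
```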

Exploratory analytics on large, diverse data sets also has a timeliness imperative. Hadoop promises the ability to conduct iterative analysis on such data sets, which is the reason that companies store big data in the first place according to our big data benchmark research. Iterative analysis is akin to the way the human brain naturally functions, as one question naturally leads to another question. However, methods such as Hive, which allows an SQL-like method to access Hadoop data, can be very slow, sometimes taking hours to return a query. Aster enables much faster access and therefore provides a more dynamic interface for iterative analytics on big data.

Timeliness also has to do with incorporating big data in a stream-oriented environment; only 16 percent of organizations are very satisfied with the timeliness of events, according to our operational intelligence benchmark research. In a use case such as fraud and security, rule-based systems work with complex algorithmic functions to uncover criminal activity. While Teradata itself does not provide the streaming or complex event processing (CEP) engines, it can provide the big data analytical sandbox and algorithmic firepower necessary to supply the appropriate algorithms for these systems. Teradata partners with major players in this space already, but would be well served to further partner with CEP and other operational intelligence vendors to expand its footprint. By the way, these vendors will be covered in our upcoming Operational Intelligence Value Index, which is based on our operational intelligence benchmark research. This same research showed that analyzing business and IT events together was very important in 45 percent of organizations.

The visualization and discovery of analytics is the last foundational pillar, and here Teradata is still a work in progress. While some of the big data visualizations Aster generates show interesting charts, they lack a context to help people interpret the chart. Furthermore, the visualization is not as intuitive and requires the writing and customization of SQL statements. To be fair, most visual and discovery tools today are relationally oriented, and Teradata is trying to visualize large and diverse sets of data. In addition, Teradata partners with companies including MicroStrategy and Tableau to provide more user-friendly interfaces. As Teradata pursues the big data analytics market, it will be important to demonstrate how it works with its partners to build a more robust and intuitive analytics workflow environment and visualization capability for the line-of-business user. Usability (63%) and functionality (49%) are the top two considerations when evaluating business intelligence systems, according to our research on next-generation business intelligence.

Like other large industry technology players, Teradata is adjusting to the changes brought by business technology innovation in just the last few years. Given its highly scalable databases and data modeling – areas that still represent the heart of most companies’ information architectures – Teradata has the potential to pull everything together and leverage its current deployed base. Technologists looking at Teradata’s new and evolving capabilities will need to understand the business use cases and share these with the people in charge of such initiatives. For business users, it is important to realize that big data is more than just visualizing disparate data sets and that greater value lies in setting up an efficient back-end process that applies the right architecture and tools to the right business problem.

Regards,

Tony Cosentino
VP and Research Director

Just about all the CIOs I speak with are at an inflection point in their careers. Some are just biding time before retirement, but many are emerging CIOs who are driven more by a business imperative than a technological one. Today, market and cultural pressures are forcing CIOs to move quickly and be flexible. In many ways, this is antithetical to the posture of IT, which can often be described as slow and methodical. This posture, however, is no longer sustainable in the era of the six forces of business technology innovation that Ventana Research tracks in our BTI benchmark research.

Well-read publications such as the Harvard Business Review, the New York Times and Wired Magazine are espousing the virtues of big data and analytics. CEOs are listening and demanding that their organizations be adaptive and flexible – and most have iPads. This fact should not be underestimated, because everything you do on an iPad is easy. To bring social, local and mobile intelligence together, organizations face the challenge of slow descriptive analytics and what my colleague Mark Smith rightly calls pathetic dashboard environments. Our next-generation business intelligence benchmark research shows organizations rate the importance of business intelligence as very high, but satisfaction levels are low and have a declining trend line. At the same time, usability is becoming the top buying criterion for business intelligence by an astounding margin over functionality (64% vs. 49%). The driver of these numbers is that expectations are being set at the consumer level by apps such as Google and Yelp!, and business intelligence applications are not living up to these expectations.

This is a difficult situation for CIOs because IT has heavy investments in business intelligence tools and in the SQL skills that often underlie support for them. At the same time, they need to think about what the business needs as much as what they currently have in their environment.

In the 1990s, CIOs faced a similar situation with ERP systems and the OLAP tools that were being deployed on the client side of organizations’ technology architectures. In that case, CIOs often needed to think from the perspective of the CFO and form a close partnership with the operational finance team. This was a natural partnership because both areas are number-driven and tools-oriented. In today’s environment, however, the CIO must partner with the CMO and iPad-carrying executives to drive competitive advantage through analytics and big data. The rapid revolution of big data technologies and what I have described as the four pillars of big data analytics are being adopted by business in many cases outside the scope of the CIO. Much to the chagrin of IT, those executives and marketers do not want spreadsheets and “pathetic” dashboards. They want visual discovery, they want search, they want prescriptive analytics, and they want results.  

Finally, CIOs must grapple with the fact that the business must be involved in building out IT, since they can no longer have tight centralized control of all technology. Organizations have many different applications sprouting up, from visual discovery tools to business analytics, many of them part of the growing use of cloud computing. CIOs cannot even get a baseline on the company’s current technology environment because they often have no idea what’s happening outside of the data center. CIOs need to start learning from business users about the new technologies they’re using and collaborate with them on how they can put the business tools together with what is running in the data center.

CIOs have been metaphorically barricaded in the data center, and the data center has been a cost center, not a profit center, and certainly not an investment center. CIOs must now reach out to business users to show value. To fulfill the value promise, they will need to help business users while making sure they can deliver on the table stakes: trusted data, secure data and governance around the data. This requires an organizational cadence that can result only from a marriage of the business side with the IT side.

Regards,

Tony Cosentino

VP and Research Director

Last week, IBM brought industry analysts to its famed Almaden Research Center, where the company outlined its big data analytics strategy and introduced a number of new innovations. Big data is no new topic to IBM, which has for decades helped organizations store and use data. But technology has changed over those decades, and IBM is working hard to ensure it is part of the future and not just the past. Our latest business technology innovation research into big data technology finds that retaining and analyzing more data is the first-ranked priority in 29 percent of organizations. From both an IT and a business perspective, big data is critical to IBM’s future success.

On the strategy side, there was much discussion at the event around use cases and the different patterns of deployment for big data analytics. Inhi Cho Suh, vice president of strategy, outlined five compelling use cases for big data analytics:

  1. Discovery and visualization. These types of exploratory analytics in a federated environment are a big part of big data analytics, since they can unlock patterns that can be useful in areas as diverse as determining a root cause of an airline issue or understanding relationships among buyers. IBM is working hard to ensure that products such as IBM Cognos Insight can evolve to support a new generation of visual discovery for big data.
  2. 360-degree view of the customer. By bringing together data sources and applying analytics to increase such things as customer loyalty and share-of-wallet, companies can gain more revenue and market share with fewer resources. IBM needs to ensure it can actually support a broad array of information about customers – not just transactional or social media data but also voice and mobile interactions that use text.
  3. Security and intelligence. This use case covers fraud detection and real-time cyber security, where companies leverage big data to predict anomalies and contain risk. IBM has been enhancing its ability to process real-time streams and transactions across any network. This is an important area for the company as it works to drive competitive advantage.
  4. Operational analysis. This is the ability to leverage networks of instrumented data sources to enable proactive monitoring through baseline analysis and real-time feedback mechanisms. The need for better operational analytics continues to increase. Our latest research on operational intelligence finds that organizations that use dedicated tools to handle this need will be more satisfied and gain better outcomes than those that do not.
  5. Data warehouse augmentation.  Big data stores can replace some traditional data stores and archival systems to allow larger sets of data to be analyzed, providing better information and leading to more precise decision-making capabilities. It should be no surprise that IBM has customers with some of the larger data warehouse deployments. The company can help customers evaluate their technology and improve or replace existing investments.

Prior to Inhi taking the stage, Dave Laverty, vice president of marketing, went through the new technologies being introduced. The first announcement was the BLU Accelerator – dynamic in-memory technology that promises to improve both performance and manageability on DB2 10.5. In tests, IBM says it achieved better than 10,000x performance on queries. The secret sauce lies in the ability to do column store data retrieval, maximize CPU processing, and provide skipping of data that is not needed for the particular analysis at hand. The benefits to the user are much faster performance across very large data sets and a reduction in manual SQL optimization. Our latest research into business technology innovation finds that in-memory technology is the technology most planned for use with big data in the next two years (22%), ahead of RDBMS (10%), data warehouse appliance (19%), specialized database (19%) and  Hadoop (20%).
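
Of the BLU techniques listed, data skipping is the easiest to picture. The sketch below keeps a min/max synopsis for each block of a column and skips any block that cannot contain matching values before scanning. This is a generic illustration of the idea, assuming fixed-size blocks over a sorted column, not IBM’s implementation.

```python
# Hedged sketch of "data skipping": per-block min/max synopses let a range
# query ignore most of the column without reading it.
import numpy as np

# A sorted column keeps each block's value range tight, so skipping pays off
values = np.sort(np.random.randint(0, 1_000_000, size=2_000_000))
blocks = values.reshape(-1, 10_000)                  # fixed-size blocks
synopsis = [(b.min(), b.max()) for b in blocks]      # per-block min/max

def count_matches(lo, hi):
    hits = 0
    for block, (bmin, bmax) in zip(blocks, synopsis):
        if bmax < lo or bmin > hi:
            continue                                 # skip the whole block
        hits += int(((block >= lo) & (block <= hi)).sum())
    return hits

print(count_matches(250_000, 250_500))
```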

An intriguing comment from one of IBM’s customers was “What is bad SQL in a world with BLU?” An important extension of that question might be “What is the future role for database administrators, given new advancements around databases, and how do we leverage that skill set to fill the big data analytics gap?” According to our business technology innovation research, staffing (79%) and training (77%) are the two biggest challenges to implementing big data analytics.

One of IBM’s answers to the question of the skills gap comes in the form of BigSQL. A newly announced feature of InfoSphere BigInsights 2.1, BigSQL layers on top of BigInsights to provide accessibility through industry-standard SQL and SQL-based applications. Providing access to Hadoop has been a sticking point for organizations, since they have traditionally needed to write procedural code to access Hadoop data. BigSQL is similar in function to Greenplum’s Pivotal, Teradata Aster and Cloudera’s Impala, where SQL is used to mine data out of Hadoop. All of these products aim to provide access for SQL-trained users and for SQL-based applications, which represent the majority of BI tools currently deployed in industry. The challenge for IBM, with a product portfolio that includes BigInsights and Cognos Insight, is to offer a clear message about which products meet which types of analytic needs for which types of business and IT professionals. IBM should also provide further clarity on when to use big data analytics software partners like Datameer, which was on an industry panel at the event and is part of the IBM global educational tour that I have also analyzed.
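
The appeal of the SQL-over-Hadoop pattern is that an analyst’s existing SQL habits carry over directly. The sketch below shows the general idea using the open source PyHive connector against a Hive endpoint as a stand-in; BigSQL itself is accessed through IBM’s own JDBC/ODBC drivers, and the host, table and column names here are hypothetical.

```python
# Generic SQL-over-Hadoop access via PyHive (stand-in, not BigSQL itself).
from pyhive import hive

conn = hive.Connection(host="hadoop-gateway.example.com", port=10000,
                       username="analyst")
cursor = conn.cursor()
cursor.execute("""
    SELECT region, COUNT(*) AS orders, SUM(revenue) AS revenue
    FROM weblog_orders
    GROUP BY region
    ORDER BY revenue DESC
""")
for region, orders, revenue in cursor.fetchall():
    print(region, orders, revenue)
```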

Another IBM announcement was the PureData System for Hadoop. This appliance approach to Hadoop provides a turnkey solution that can be up and running in a matter of hours. As you would expect in an appliance approach, it allows for consistent administration, workflow, provisioning and security with BigInsights. It also allows access to Hadoop through BigSheets, which presents summary information about the unstructured data in Hadoop, and which was already part of the BigInsights platform. Phil Francisco, vice president of big data product management and strategy, pointed out use cases around archival capabilities and the ability to do cold storage analysis as well as the ability to bring many unstructured sources together. The PureData System for Hadoop, due out in the second half of the year, adds a third version to the BigInsights lineup, which also includes the free web-based version and the Enterprise version. Expanding to support Hadoop with its appliances is critical as more organizations look to exploit the processing power of Hadoop technology for their database and information management needs.

Other announcements included new versions of InfoSphere Streams and Informix TimeSeries for reporting and analytics using smart meter and sensor technology. They help with real-time analytics and big data, depending on the business and architectural needs of an organization. The integration of database and streaming analytics is a key area where IBM differentiates itself in the market.

Late in the day, Les Rechan, general manager for business analytics, told the crowd that he and Bob Picciano, general manager for information management, had recently promised the company $20 billion in revenue. That statement is important because in the age of big data, information management and analytics must be considered together, and the company needs a strong relationship between these two leaders to meet this ambitious objective. In an interview, Rechan told me that the teams realize this and are working hand-in-glove across strategy, product development and marketing. The camaraderie between the gentlemen was clear during the event, and bodes well for the organization. Ultimately, IBM will need to articulate why it should be considered for big data, as our technology innovation research finds organizations today are less worried about validation of a vendor from a size perspective (23%) compared to usability of the technology (64%).

IBM’s big data platform seems to be less a specific offer and more of an ethos of how to think about big data and big data analytics in a common-sense way. The focus on five well-thought-out use cases provides customers a frame for thinking through the benefits of big data analytics and gives them a head start with their business cases. Given the confusion in the market around big data, that common-sense approach serves the market well, and it is very much aligned with our own philosophy of focusing on what we call the business-oriented Ws rather than the technology-oriented Vs.

Big data analytics, and in particular predictive analytics, is complex and difficult to integrate into current architectures. Our benchmark research into predictive analytics shows that architectural integration is the biggest inhibitor for 55 percent of companies, which should be a message IBM takes to heart about integration of its predictive analytics tools with its big data technology options. Predictive analytics is the most important capability (49%) for business analytics, according to our technology innovation research, and IBM needs to show more solutions that integrate predictive analytics with big data.

H.L. Mencken once said, “For every complex problem there is an answer that is clear, simple and wrong.” Big data analytics is a complex problem, and the market is still early. The latent benefit of IBM’s big data analytics strategy is that it allows IBM to continue to innovate and deliver without playing all of its chips at one time. In today’s environment, many supplier companies don’t have the same luxury.

As I pointed out in my blog post on the four pillars of big data analytics, our research and clients are moving toward addressing big data and analytics in a more holistic and integrated manner. The shift in focus is less about how organizations store or process information than about how they use it. Some may argue that IBM’s cadence is reflective of company size and is actually a competitive disadvantage, but I would argue that size and innovation leadership are not mutually exclusive. As companies grapple with the onslaught of big data and analytics, no one should underestimate IBM’s outcomes-based and services-driven approach, but in order to succeed IBM also needs to ensure it can meet the needs of organizations at a price they can afford.

Regards,

Tony Cosentino

VP and Research Director

SAS Institute held its 24th annual analyst summit last week in Steamboat Springs, Colorado. The 37-year-old privately held company is a key player in big data analytics, and company executives showed off their latest developments and product roadmaps. In particular, LASR Analytical Server and Visual Analytics 6.2, which is due to be released this summer, are critical to SAS’ ability to secure and expand its role as a preeminent analytics vendor in the big data era.

For SAS, the competitive advantage in big data rests in predictive analytics, and according to our benchmark research into predictive analytics, 55 percent of businesses say the challenge of architectural integration is a top obstacle to rolling out predictive analytics in the organization. Integration of analytics is particularly daunting in a big-data-driven world, since analytics processing has traditionally taken place on a platform separate from where the data is stored, but now they must come together. How data is moved into parallelized systems and how analytics are consumed by business users are key questions in the market today that SAS is looking to address with its LASR and Visual Analytics.

Jim Goodnight, the company’s founder and plainspoken CEO, says he saw the industry changing a few years ago. He speaks of a large bank doing a heavy analytical risk computation that took upwards of 18 hours, which meant that the results of the computation were not ready in time for the next trading day. To gain competitive advantage, the time window needed to be reduced, but running the analytics in a serialized fashion was a limiting factor. This led SAS to begin parallelizing the company’s workhorse procedures, some of which were first developed upwards of 30 years ago. Goodnight also discussed the fact that building these parallelized statistical models is no easy task. One of the biggest hurdles is getting the mathematicians and data scientists who are building these elaborate models to think in terms of the new parallelized architectural paradigm.
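
As a toy illustration of the shift Goodnight describes, the sketch below computes the same summary statistic serially and then in parallel across data partitions, using Python’s multiprocessing as a stand-in for an MPP engine. The modeling challenge lies in designing partial results that can be combined correctly; the data here is simulated.

```python
# Hedged sketch: serial vs. partitioned computation of mean and variance.
from multiprocessing import Pool
import numpy as np

def partial_stats(chunk):
    # Each worker returns partial sums that a coordinator can combine
    return chunk.sum(), (chunk ** 2).sum(), chunk.size

if __name__ == "__main__":
    data = np.random.randn(4_000_000)

    serial_mean = data.mean()                      # serial baseline

    # Parallel version: split into partitions, compute partials, combine
    with Pool(processes=8) as pool:
        partials = pool.map(partial_stats, np.array_split(data, 8))
    total, total_sq, n = (sum(p[i] for p in partials) for i in range(3))
    parallel_mean = total / n
    parallel_var = total_sq / n - parallel_mean ** 2

    print(np.isclose(serial_mean, parallel_mean), parallel_var)
```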

Its Visual Analytics software is a key component of the SAS big data analytics strategy. Our latest business technology innovation benchmark research [http://www.ventanaresearch.com/bti/] found that close to half (48%) of organizations present business analytics visually. Visual Analytics, which was introduced early last year, is a cloud-based offering running on the LASR in-memory analytic engine and the Amazon Web Services infrastructure. This web-based approach allows SAS to iterate quickly without worrying a great deal about revision management while giving IT a simpler server management scenario. Furthermore, the web-based approach provides analysts with a sandbox environment for working with and visualizing big data analytics in the cloud; the analytic assets can then be moved into a production environment. This approach will also eventually allow SAS to combine data integration capabilities with data analysis capabilities.

With descriptive statistics being the ante in today’s visual discovery world, SAS is focusing Visual Analytics to take advantage of the company’s predictive analytics history and capabilities. Visual Analytics 6.2 integrates predictive analytics and rapid predictive modeling (RPM) to do, among other things, segmentation, propensity modeling and forecasting. RPM makes it possible for models to be generated via sophisticated software that runs through multiple algorithms to find the best fit based on the data involved. This type of commodity modeling approach will likely gain significant traction as companies look to bring analytics into industrial processes and address the skills gap in advanced analytics. According to our BTI research, the skills gap is the biggest challenge facing big data analytics today, as participants identified staffing (79%) and training (77%) as the top two challenges.
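
The sketch below shows the commodity-modeling idea in generic form: fit several candidate algorithms and keep the one with the best cross-validated score. It is a scikit-learn illustration of the concept, not SAS’s RPM, and the synthetic data stands in for whatever the analyst is modeling.

```python
# Hedged sketch of "run through multiple algorithms and keep the best fit".
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Score each candidate with 5-fold cross-validation and pick the winner
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```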

Visual Analytics’ web-based approach is likely a good long-term bet for SAS, as it marries data integration and cloud strategies. These factors, coupled with the company’s installed base and army of loyal users, give SAS a head start in redefining the world of analytics. Its focus on integrating visual analytics for data discovery, integration and commodity modeling approaches also provides compelling time-to-value for big data analytics. In specific areas such as marketing analytics, the ability to bring analytics into the applications themselves and allow data-savvy marketers to conduct a segmentation and propensity analysis in the context of a specific campaign can be a real advantage. Many of SAS’ innovations cannibalize its own markets, but such is the dilemma of any major analytics company today.

The biggest threat to SAS today is the open source movement, which offers big data analytic approaches such as Mahout and R. For instance, the latest release of R includes facilities for building parallelized code. While academics working in R often still build their models in a non-parallelized, non-industrial fashion, the current and future releases of R promise more industrialization. As integration of Hadoop into today’s architectures becomes more common, staffing and skillsets are often a larger obstacle than the software budget. In this environment the large services companies loom larger because of their role in defining the direction of big data analytics. Currently, SAS partners with companies such as Accenture and Deloitte, but in many instances these companies have split loyalties. For this reason, the lack of a large in-house services and education arm may work against SAS.

At the same time, SAS possesses blueprints for major analytic processes across different industries as well as horizontal analytic deployments, and it is working to move these to a parallelized environment. This may prove to be a differentiator in the battle versus R, since it is unclear how quickly the open source R community, which is still primarily academic, will undertake the parallelization of R’s algorithms.

SAS partners closely with database appliance vendors such as Greenplum and Teradata, with which it has had longstanding development relationships. With Teradata, it integrates into the BYNET messaging system, allowing for optimized performance between Teradata’s relational database and the LASR Analytic Server. Hadoop is also supported in the SAS reference architecture. LASR accesses HDFS directly and can run as a thin memory layer on top of the Hadoop deployment. In this type of deployment, Hadoop takes care of everything outside the analytic processing, including memory management, job control and workload management.

These latest developments will be of keen interest to SAS customers. Non-SAS customers who are exploring advanced analytics in a big data environment should consider SAS LASR and its MPP approach. Visual Analytics follows the “freemium” model that is prevalent in the market, and since it is web-based, any instances downloaded today can be automatically upgraded when the new version arrives in the summer. For the price, the tool is certainly worth a test drive for analysts. For anyone looking into such tools who foresees the need to include predictive analytics, it should be of particular interest.

Regards,

Tony Cosentino
VP and Research Director

SiSense gained a lot of traction last week at the Strata conference in San Jose as it broke records in the so-called 10x10x10 Challenge – analyzing 10 terabytes of data in 10 seconds on a $10,000 commodity machine – and earned the company’s Prism product the Audience Choice Award. The Israel-based company, founded in 2005, has venture capital backing and is currently running at a profit, with customers in more than 50 countries and marquee customers such as Target and Merck. Prism, its primary product, provides the entire business analytics stack, from ETL capabilities through data analysis and visualization. From the demonstrations I’ve seen, the toolset appears relatively user-friendly, which is important because customers say usability is the top criterion in 63 percent of organizations, according to our next-generation business intelligence research.

Prism comprises three main components: ElastiCube Manager, Prism BI Studio and Prism Web. ElastiCube Manager provides a visual environment with which users can ingest data from multiple sources, define relationships between data and do transforms via SQL functions. Prism BI Studio provides a visual environment that lets customers build dashboards that link data and provide users with dynamic charts and interactivity. Finally, Prism Web provides web-based functionality for interacting with and sharing dashboards and managing user profiles and accounts.

At the heart of Prism is the ElastiCube technology, which can query large volumes of data quickly. ElastiCube uses a columnar approach, which allows for fast compression. With SiSense, queries are optimized by the CPU itself; that is, the system decides in an ad hoc manner the most efficient way to use both disk and memory. Most other approaches on the market lean either toward a pure-play in-memory system or toward a columnar approach.

The company’s approach to big data analytics reveals the chasm in understanding that exists between descriptive analytics and more advanced analytics such as we see with R, SAS and SPSS. When SiSense speaks of big data analytics, it is speaking of the ability to consume and explore very large data sets without predefining schemas. By doing away with schemas, the software does away with the need for a statistician, a data mining engineer or, for that matter, an IT person. Instead, organizations need analysts with a good understanding of the business, the data sources with which they are working and the basic characteristics of those data sets. SiSense does not do sophisticated predictive modeling or data mining, but rather root-cause and contextual analysis across diverse and potentially very large data sets.
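
A small example of what “explore without predefining a schema” means in practice: the sketch below flattens nested, semi-structured records into a table at read time and profiles them immediately, with columns appearing as the data dictates. The records are invented for illustration and the code is generic pandas, not SiSense’s engine.

```python
# Hedged sketch of schema-on-read exploration with made-up records.
import pandas as pd

raw = [
    {"user": {"id": 1, "country": "US"}, "event": "visit", "ms": 420},
    {"user": {"id": 2, "country": "DE"}, "event": "buy", "ms": 180,
     "basket": {"items": 3, "value": 59.0}},
]

flat = pd.json_normalize(raw)        # columns appear as the data dictates
print(flat.columns.tolist())         # nested fields become dotted columns
print(flat.groupby("event")["ms"].describe())
```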

SiSense today has a relatively small footprint and faces an uphill battle against entrenched BI and analytics players for enterprise deployments, but its easy-to-download-and-try approach will help it gain traction with analysts who are less loyal to the BI tools their IT departments have purchased. SiSense Vice President of Marketing Bruno Aziza, formerly with Microsoft’s Business Intelligence group, and CEO Amit Bendov have been in the industry for a fair amount of time and understand this challenge. Their platform’s road into the organization is more likely to run through business groups than through IT. For this reason, Tableau and QlikView are SiSense’s competitors on the discovery side, while products such as SAP HANA and Actuate’s BIRT Analytics, with their in-memory-plus-columnar approaches, are its closest competitors in terms of the technology used to access and visualize large data sets. This ability to access large data sets in a timely manner without the need for data scientists can help overcome the top challenges BI users face in staffing and real-time access, which we uncovered in our recent business technology innovation research.

SiSense has impressive technology and is getting good traction. It bears consideration by departmental and midmarket organizations that need to perform analytics across growing volumes of data without relying on an IT department for support.

Regards,

Tony Cosentino
VP and Research Director

The challenge with discussing big data analytics is in cutting through the ambiguity that surrounds the term. People often focus on the three Vs of big data – volume, variety and velocity – which provide a good lens for big data technology but get us only part of the way to understanding big data analytics, and provide even less guidance on how to take advantage of big data analytics to unlock business value.

Part of the challenge of defining big data analytics is a lack of clarity around the big data analytics value chain – from data sources, to analytic scalability, to analytic processes and access methods. Our recent research on big data finds that many capabilities are still not available, ranging from predictive analytics (41%) to visualization (37%). Moreover, organizations are unclear on how best to change the way they approach data analysis to take advantage of big data, and on what processes and technologies they ought to be using. The growing use of appliances, Hadoop and in-memory databases and the growing footprints of RDBMSes all add up to pressure for more intelligent analytics, but the most direct and cost-effective path from here to there is unclear. What is certain is that as business analytics and big data increasingly merge, the potential for increased value is raising expectations.

To understand the organizational chasm that exists with respect to big data analytics, it’s important to understand two foundational analytic approaches in use in organizations today. Former Census Bureau Director Robert Groves’ ideas about designed data and organic data give us a good jumping-off point for this discussion, especially as it relates to big data analytics.

In Groves’ estimation, the 20th century was about designed data, or what might be considered hypothesis-driven data. With designed data we engage in analytics by establishing a hypothesis and collecting data to prove or disprove it. Designed data is at the heart of confirmatory analytics, where we go out and collect data relevant to the assumptions we have already made. Designed data is often considered the domain of the statistician, but it is also at the heart of structured databases, since we assume that all of our data can fit into columns and rows and be modeled in a relational manner.

In contrast to the designed data approach of the 20th century, the 21st century is about organic data. Organic data is not limited by a specific frame of reference that we apply to it, and because of this it grows without limits and without any structure other than that provided by randomness and probability. Organic data represents all data in the world, but for pragmatic reasons we may think of it as all the data we are able to instrument. RFID, GPS data, sensor data, sentiment data and various types of machine data are all organic data sources that may be characterized by context or by attributes such as data sparsity (also known as low-density data). Much like the interpretation of silence in a conversation, analyzing big data is as much about interpreting what exists between the lines as it is about what we can put on the line itself.

These two types of data and the analytics associated with them reveal the chasm that exists within organizations and shed light on the skills gap that our predictive analytics benchmark research shows to be the primary challenge for analytics in organizations today. That research finds inadequate support in many areas, including product training (26%) and guidance on how to apply analytics to business problems (23%).

On one side of the chasm are the business groups and the analysts who are aligned with Groves’ idea of designed data. These groups may encompass domain experts in areas such as finance or marketing, advanced Excel users, and even Ph.D.-level statisticians. These analysts serve organizational decision-makers and are tied closely to actionable insights that lead to specific business outcomes. The primary way they get work done is through a flat-file environment, as my colleague Mark Smith outlined in some detail last week. In this environment, Excel is often the lowest common denominator.

On the other side of the chasm are the IT and database professionals, where a different analytical culture and mindset exist. The priority challenge for this group is dealing with the three Vs and simply organizing data into a legitimate enterprise data set. This group is often more comfortable with large data sets and the machine learning approaches that are the hallmark of the organic data of the 21st century. Their analytical environment is different from that of their business counterparts; rather than Excel, SQL is often the lowest common denominator.

As I wrote in a recent blog post, database professionals and business analytics practitioners have long lived in parallel universes. In technology, practitioners deal with tables, joins and the ETL process. In business analysis, practitioners deal with datasets, merges and data preparation. When you think about it, these are the same things, as the sketch below illustrates. The subtle difference is that database professionals have had a data mining mindset, or, as Groves calls it, an organic data mindset, while the business analyst has had a designed data, or statistics-driven, mindset. The bigger differences revolve around the cultural mindset and the tools used to carry out the analytical objectives. These differences represent the current conundrum for organizations.
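A small example makes the equivalence plain. Assuming a hypothetical customers-and-orders data set, the same aggregation is expressed first as the database professional’s SQL join and then as the analyst’s dataframe merge; the table and column names are invented for illustration.

```python
# A minimal sketch of the point above: a SQL join and an analyst's dataset merge
# are the same operation expressed in two different cultures. All names here are
# hypothetical.
import sqlite3
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3], "region": ["East", "West", "East"]})
orders = pd.DataFrame({"order_id": [10, 11, 12], "customer_id": [1, 1, 3], "amount": [250, 80, 125]})

# The database professional's view: tables and a join expressed in SQL.
con = sqlite3.connect(":memory:")
customers.to_sql("customers", con, index=False)
orders.to_sql("orders", con, index=False)
sql_result = pd.read_sql_query(
    """SELECT c.region, SUM(o.amount) AS total
       FROM orders o JOIN customers c ON o.customer_id = c.customer_id
       GROUP BY c.region""", con)

# The business analyst's view: datasets, a merge and an aggregation.
merged = orders.merge(customers, on="customer_id")
analyst_result = (merged.groupby("region", as_index=False)["amount"]
                        .sum().rename(columns={"amount": "total"}))

print(sql_result)      # East 455 (the West customer has no orders, so the inner join drops it)
print(analyst_result)  # identical result reached by a different route
```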

In a world of big data analytics, these two sides of the chasm are being pushed together in a shotgun wedding because the marriage of these groups is how competitive advantage is achieved. Both groups have critical contributions to make, but need to figure out how to work together before they can truly realize the benefits of big data analytics. The firms that understand that the merging of these different analytical cultures is the primary challenge facing the analytics organization, and that develop approaches that deal with this challenge, will take the lead in big data analytics. We already see this as a primary focus area for leading professional services organizations.

In my next analyst perspective on big data I will lay out some pragmatic approaches companies are using to address this big data analytics chasm; these also represent the focus of the benchmark research we’re currently designing to understand organizational best practices in big data analytics.

Regards,

Tony Cosentino

VP and Research Director

Revolution Analytics is a commercial provider of software and services related to enterprise implementations of the open source language R. At its base level, R is a programming language built by statisticians for statistical analysis, data mining and predictive analytics. In a broader sense, it is data analysis software used by data scientists to access data, develop and run statistical models and visualize data. The R community has a growing user base of more than two million people worldwide, and more than 4,000 available packages cover specific problem domains across industries. Both the R Project and Revolution Analytics have significant momentum in the enterprise and in academia.

Revolution Analytics provides value by taking the most recent release from the R community and adding scalability and other functionality so that R can be implemented and work seamlessly in a commercial environment. Revolution R provides a development environment in which data scientists can write and debug R code more effectively, as well as web service APIs so that R can integrate with business intelligence dashboards and visual discovery tools. In addition, Revolution Analytics makes money through professional and support services.

Companies are collecting enormous amounts of data, but few have active big data analytics strategies. Our big data benchmark research shows that more than 50 percent of companies in our sample maintain more than 10TB of data, but often they cannot analyze it due to issues of scale. Furthermore, our research into predictive analytics finds that integration into the current architecture is the biggest obstacle to implementing predictive analytics.

Revolution Analytics helps address these challenges in a few ways. It can perform file-based analytics, in which a single node orchestrates commands across a cluster of commodity servers and delivers the results back to the end user; this is an on-premises solution that runs on Linux clusters or Microsoft HPC clusters. A perhaps more exciting use case is alignment with the Hadoop MapReduce paradigm, where Revolution Analytics allows for direct manipulation of the HDFS file system, can submit a job directly to the Hadoop JobTracker and can directly define and manipulate analytical data frames through HBase database tables. When front-ended with a visualization tool such as Tableau, this ability to work with data directly in Hadoop becomes a powerful tool for big data analytics. A third use case involves parallelizing computations within the database itself. This in-database approach is gaining a lot of traction for big data analysis, primarily because it is the most efficient way to do analytics on very large structured data sets without moving a lot of data. For example, IBM’s PureData System for Analytics (the new name for its MPP Netezza appliance) uses the in-database approach with an R instance running on each processing unit in the database, each connected to an R server via ODBC. The analytics are invoked as the data is served up to the processor, so that the algorithms run in parallel across all of the data.
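The common thread across these deployment modes is moving the computation to the data and returning only small partial results to a coordinating node. The following sketch is conceptual rather than Revolution Analytics code; Python’s multiprocessing module stands in for the processing units of an MPP appliance or the nodes of a Hadoop cluster.

```python
# Conceptual sketch (not Revolution Analytics code): push computation to where
# each slice of data lives and return only tiny partial results to a coordinator.
from multiprocessing import Pool

def partial_stats(partition):
    """Runs next to one partition of the data; returns small summaries, not rows."""
    n = len(partition)
    total = sum(partition)
    sum_sq = sum(x * x for x in partition)
    return n, total, sum_sq

def combine(partials):
    """The coordinator merges the partial results into global statistics."""
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    sum_sq = sum(p[2] for p in partials)
    mean = total / n
    variance = sum_sq / n - mean ** 2
    return mean, variance

if __name__ == "__main__":
    # Four partitions stand in for data already distributed across four units.
    partitions = [list(range(i, i + 250_000)) for i in range(0, 1_000_000, 250_000)]
    with Pool(4) as pool:
        partials = pool.map(partial_stats, partitions)
    print(combine(partials))  # global mean and variance without moving raw rows
```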

In the big data analytics context, speed and scale are critical drivers of success, and Revolution R delivers on both. It is built with the Intel Math Kernel Library, so processing is streamlined for multithreading at the processor level and can leverage multiple cores simultaneously. In test cases on a single node, open source R was able to scale to only about 400,000 observations in a linear regression model, while Revolution R Enterprise scaled into the millions. With respect to speed, Revolution R 6.1 was able to conduct a principal component analysis in about 11 seconds, versus 142 seconds for open source R 2.14.2. As similar tests are performed across multiple processors in a parallelized fashion, the observed performance difference increases.
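The scaling difference comes largely from external-memory algorithms that never require the whole data set in RAM at once. As a rough illustration of the idea (not Revolution R’s actual implementation), the sketch below fits a linear regression by accumulating the sufficient statistics X'X and X'y one chunk at a time; the simulated chunks and their sizes are hypothetical.

```python
# A minimal sketch of how an external-memory algorithm can fit a regression on
# far more observations than fit in memory: accumulate X'X and X'y chunk by
# chunk, then solve once. Illustrative only; the simulated data is hypothetical.
import numpy as np

def chunked_linear_regression(chunks):
    """chunks yields (X, y) blocks; only small k-by-k matrices stay in memory."""
    xtx, xty = None, None
    for X, y in chunks:
        X = np.column_stack([np.ones(len(X)), X])  # add intercept column
        xtx = X.T @ X if xtx is None else xtx + X.T @ X
        xty = X.T @ y if xty is None else xty + X.T @ y
    return np.linalg.solve(xtx, xty)               # coefficients from X'X b = X'y

def simulated_chunks(n_chunks=20, chunk_rows=100_000, seed=0):
    """Stand-in for reading blocks from disk; 2 million rows never coexist in RAM."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_rows, 2))
        y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=chunk_rows)
        yield X, y

print(chunked_linear_regression(simulated_chunks()))  # approximately [3.0, 1.5, -2.0]
```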

So why does all of this matter? Our benchmark research into predictive analytics shows that companies able to score and update their models more efficiently show higher maturity and gain greater competitive advantage. From an analytical perspective, we can now squeeze more value out of our large data sets. We can analyze all of the data and take a census approach instead of a sampling approach, which in turn allows us to better understand the errors in our models and to identify outliers and patterns that are not linear in nature. Along with prediction, the ability to identify outliers is probably the most important capability, since spotting data anomalies often leads to the biggest insights and competitive advantage. Most important, from a business perspective, we can apply models to understand things such as individual customer behavior, a company’s risk profile or how to take advantage of market uncertainty for tactical advantage.
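A quick illustration of the census-versus-sample point: rare but important anomalies are routinely missed by a modest sample yet always surface in a full scan. The data and threshold below are fabricated solely for demonstration.

```python
# Illustrative sketch: rare anomalies are easily missed by a sample but always
# found by a full scan. The data and threshold are made up for demonstration.
import numpy as np

rng = np.random.default_rng(42)
transactions = rng.normal(loc=100, scale=15, size=5_000_000)
transactions[rng.choice(transactions.size, size=50, replace=False)] += 10_000  # 50 rare anomalies

threshold = 1_000
sample = rng.choice(transactions, size=10_000, replace=False)

print("anomalies in full census:", int((transactions > threshold).sum()))  # 50
print("anomalies in 10k sample: ", int((sample > threshold).sum()))        # usually 0
```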

I’ve heard that Revolution R isn’t always the easiest software to use and that the experience isn’t exactly seamless, but it can be argued that in the cutting-edge field of big data analytics a challenging environment is to be expected. If Revolution Analytics can address some of these usability challenges, it may find its business growing even faster than it is now. Regardless, I anticipate that Revolution Analytics will continue its fast growth (its headcount is already doubling year over year). Furthermore, I anticipate that in-database analytics (an area where R really shines) will become the de facto approach to big data analytics and that companies that take full advantage of that trend will reap the benefits.

Regards,

Tony Cosentino

VP and Research Director
