
Our benchmark research into business technology innovation shows that analytics ranks first or second as a business technology innovation priority in 59 percent of organizations. Businesses are moving budgets and responsibilities for analytics closer to sales operations, often in the form of so-called shadow IT organizations that report into decentralized and autonomous business units rather than a central IT organization. New technologies such as in-memory systems (50%), Hadoop (42%) and data warehouse appliances (33%) are the top back-end technologies being used to acquire a new generation of analytic capabilities. They are enabling new possibilities including self-service analytics, mobile access, more collaborative interaction and real-time analytics. In 2014, Ventana Research helped lead the discussion around topics such as information optimization, data preparation, big data analytics and mobile business intelligence. In 2015, we will continue to cover these topics while adding new areas of innovation as they emerge.

Three key topics lead our 2015 business analytics research agenda. The first focuses on cloud-based analytics. In our benchmark research on information optimization, nearly all (97%) organizations said it is important or very important to simplify information access for both their business and their customers. Part of the challenge in optimizing an organization’s use of information is to integrate and analyze data that originates in the cloud or has been moved there. This issue has important implications for information presentation, where analytics are executed and whether business intelligence will continue to move to the cloud in more than a piecemeal fashion. We are currently exploring these topics in our new benchmark research called analytics and data in the cloud. Coupled with the issue of cloud use is the proliferation of embedded analytics and the imperative for organizations to provide scalable analytics within the workflow of applications. A key question we’ll try to answer this year is whether the competitive advantage will go to companies that have focused primarily on operational cloud applications, at the expense of developing their analytics portfolios, or to those that have focused more on analytics.

The second research agenda item is advanced analytics. It may be useful to divide this category into machine learning and predictive analytics, which I have discussed and covered in our benchmark research on big data analytics. Predictive analytics has long been available in some sectors of the business world, and two-thirds (68%) of organizations that use it, according to our research, said it provides a competitive advantage. Programming languages such as R, the use of Predictive Model Markup Language (PMML), inclusion of social media data in prediction, massive-scale simulation, and right-time integration of scoring at the point of decision-making are all important advances in this area. Machine learning has also been around for a long time, but it wasn’t until the instrumentation of big data sources and advances in technology that it made sense to use outside academic environments. At the same time as the technology landscape is evolving, it is getting more fragmented and complex; to simplify it, software designers will need innovative uses of machine learning to mask the underlying complexity through layers of abstraction. A technology such as Spark, out of the AMPLab at UC Berkeley, is still immature, but it promises to enable increasing use of machine learning on big data. Areas such as sourcing and preparing data for analysis must be simplified so analysts are not overwhelmed by big data.
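To make the idea of right-time scoring at the point of decision more concrete, here is a minimal sketch assuming scikit-learn; the feature names, training data and churn scenario are hypothetical, and a real model would be trained and validated on historical data before being used this way.

```python
# Minimal sketch: train a predictive model offline, then score a single
# record at the point of decision (e.g., during a customer interaction).
# Assumes scikit-learn; feature names and data below are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Historical training data: [tenure_months, support_tickets, monthly_spend]
X_train = np.array([
    [3, 5, 20.0],
    [24, 1, 80.0],
    [12, 2, 45.0],
    [2, 7, 15.0],
    [36, 0, 120.0],
    [6, 4, 30.0],
])
y_train = np.array([1, 0, 0, 1, 0, 1])  # 1 = customer churned

model = LogisticRegression().fit(X_train, y_train)

# "Right-time" scoring: evaluate one incoming record as the decision is made.
new_customer = np.array([[4, 6, 25.0]])
churn_probability = model.predict_proba(new_customer)[0, 1]
print(f"Churn risk: {churn_probability:.2f}")
```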

Our third area of focus is the user experience in business intelligence tools. Simplification and optimization of information in a context-sensitive manner are paramount. An intuitive user experience can advance the people and process dimensions of business, which have lagged technology innovation according to our research in multiple areas. New approaches coming from business end users, especially in the tech-savvy millennial generation, are pushing the envelope here. In particular, mobility and collaboration are enabling new user experiences in both business organizations and society at large. Adding to this is data collected in more forms, such as location data (on which we have done research), individual and societal relationships, and information about popular brands. How business intelligence tools incorporate such information and make it easy to prepare, design and consume for different organizational personas is not just an agenda focus but also one focus of our 2015 Analytics and Business Intelligence Value Index, to be published in the first quarter of the year.

This shapes up as an exciting year. I welcome any feedback you have on this research agenda and look forward to providing research, collaboration and education in 2015.

Regards,

Ventana Research

Our recently released benchmark research on information optimization shows that 97 percent of organizations find it important or very important to make information available to the business and customers, yet only 25 percent are satisfied with the technology they use to provide that access. This wide gap between importance and satisfaction reflects the complexity of preparing and presenting information in a world where users need to access many forms of data that exist across distributed systems.

Information optimization is a new focus in the enterprise software market. It builds on existing investments in business applications, business intelligence and information management, and it benefits from recent advances in business analytics and big data, lifting information to higher levels of use and greater value in organizations. It also extends information management and information applications, areas Ventana Research has previously researched. For more on the background and definition of information optimization, please see my colleague Mark Smith’s foundational analysis.

The drive to improve information availability derives from a need for greater operational efficiency, according to two-thirds (67%) of organizations. The imperative is so strong that 43 percent of all organizations currently are making changes to how they design and deploy information, while another 37 percent plan to make changes in the next 12 months. The pressure for such change is being directed toward the IT group, which is involved with the task of optimizing information in more than four-fifths of organizations, with or without line-of-business support. IT, however, is in an untenable position, as demands are far outstripping its available resources and technology to deal with the problem, which leads to dissatisfaction with the IT department in two out of five organizations, according to our research. Internally, many organizations try to optimize information using manual spreadsheet processes, and 73 percent are confident in their ability to get by this way. But when the focus turns to making information available to partners or customers, an increasingly important capability in today’s information-driven economy, confidence drops dramatically to 62 percent and 55 percent, respectively.

A large part of the information optimization challenge is users’ different requirements. For instance, the top needs of analysts are extracting information, designing and integrating metrics, and developing access policies. In contrast, the top needs of business users are drilling into information (37%), search capabilities (36%) and collaboration (27%). IT must also consider multiple points of integration such as security frameworks and information modeling, as well as integration with operational and content management systems. This is complicated further by multiple new standards coming into play as customer and financial data – still the most important information in the organization – are appended with less structured sources of data that add context and value. SQL is still the dominant standard for information platforms, but less structured approaches such as XML and JSON are emerging fast. Furthermore, innovations in the collaborative and mobile workforce are driving standards such as HTML5 that must be considered carefully when optimizing information. Platform considerations are also affected by the increasing use of analytic databases, in-memory approaches and Hadoop. Traditional approaches such as an RDBMS on standard hardware and flat files are still the most common, but the most growth is in in-memory systems and Hadoop. This is interesting because these technologies allow for multiple new approaches to analysis, such as visual discovery and machine learning on large data sets. Adding to the impetus for change is that organizations using an RDBMS on standard hardware and flat files are less satisfied than those using the more innovative approaches to big data.
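As a simple illustration of appending less structured data to structured records, the sketch below blends a relational customer table with nested JSON context. It assumes pandas and uses an in-memory SQLite table as the relational source; all table names, field names and values are hypothetical.

```python
# Minimal sketch: append semi-structured JSON context to structured
# customer records. Table and field names are hypothetical.
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, revenue REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Acme Corp", 125000.0), (2, "Globex", 98000.0)])

customers = pd.read_sql("SELECT * FROM customers", conn)

# Semi-structured context, e.g., from a social or web source.
context = [
    {"id": 1, "sentiment": {"score": 0.8, "mentions": 42}},
    {"id": 2, "sentiment": {"score": 0.3, "mentions": 7}},
]
context_df = pd.json_normalize(context)  # flattens nested fields into columns

# Blend the two sources on the shared key.
blended = customers.merge(context_df, on="id", how="left")
print(blended)
```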

Information optimization also encounters challenges associated with data preparation and data presentation. In our research, 47 percent of organizations said that they spend the largest portion of their time in data preparation, but less than half said they are satisfied with their process of creating information. Contributing to this dissatisfaction are lack of resources, lack of flexibility and slow integration. Lack of resources and speed of integration tend to move together: when more financial and human resources are dedicated to integration efforts, satisfaction is higher. Adding more human and financial resources does not necessarily increase flexibility, however. Flexibility is a function of both tools and processes, and we see its absence reflected in divergent data preparation workflows within organizations. One is a more structured approach that follows traditional ETL paths; it can deliver timely integration of data once everything is defined and the system is in place, but it is less flexible. Another approach is to merge internal and external information on the fly, in a sandbox environment or in response to sudden market challenges. These different information flows ultimately have to support specific forms of information presentation for users, whether that is the creation of an analytic data set for a complex statistical procedure by a data scientist or a single number with qualitative context for an executive on a mobile device.
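The contrast between the two workflows can be sketched in a few lines. The example below is illustrative only, assuming pandas and hypothetical column names: a fixed ETL-style function alongside an ad hoc sandbox blend that ends in a single number for an executive.

```python
# Sketch contrasting a fixed ETL-style step with an ad hoc sandbox blend.
# All column names and figures are hypothetical.
import pandas as pd

def etl_pipeline(raw: pd.DataFrame) -> pd.DataFrame:
    """Structured path: pre-defined transformations applied the same way on
    every run; timely once in place, but less flexible."""
    cleaned = raw.dropna(subset=["order_id"]).copy()
    cleaned["amount"] = cleaned["amount"].astype(float)
    return cleaned.groupby("region", as_index=False)["amount"].sum()

raw_orders = pd.DataFrame({
    "order_id": [101, 102, None],
    "region": ["East", "West", "East"],
    "amount": ["100.0", "80.0", "25.0"],
})
summary = etl_pipeline(raw_orders)

# Sandbox path: blend the internal extract with external data on the fly
# to answer a sudden question, without redefining the pipeline.
external_forecast = pd.DataFrame({"region": ["East", "West"],
                                  "forecast": [120.0, 70.0]})
adhoc = summary.merge(external_forecast, on="region")
adhoc["gap"] = adhoc["forecast"] - adhoc["amount"]

# Either flow ends in a presentation-ready form; here, one number with
# context for an executive on a mobile device.
print(f"Revenue shortfall vs. forecast: {adhoc['gap'].clip(lower=0).sum():.1f}")
```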

Thus it is clear that information optimization is a critical focus for organizations; it is also an important area of study for Ventana Research in 2014. Our latest benchmark research shows that the challenges are complex and involve the entire organization. As new technologies come to market and information processes are aligned with the needs of lines of business and functional roles, companies that are able to simplify access to information and analytics through the information optimization approaches discussed above will gain an edge on competitors.

Regards,

Tony Cosentino

VP & Research Director

Paxata, a new data and analytics software provider, says it wants to address one of the most pressing challenges facing today’s analysts: simplifying data preparation. This trend toward simplification is well aligned with the market’s desire for improved usability, which our benchmark research into Next-Generation Business Intelligence shows is a primary buying consideration in two-thirds (64%) of companies. This trend is driving significant adoption of business-friendly, front-end visual and data discovery tools and is part of my research agenda for 2014.

On the back end, however, there is still considerable complexity. Non-traditional systems such as Hadoop and big data appliances address the need to store, and to some degree query, massive amounts of structured and unstructured data. But the ability to efficiently and effectively blend these data sources, along with any third-party cloud-based data, is still a challenge.

To address this challenge, the front-end analytics tools being adopted by analysts and the multitude of back-end database systems must be integrated to deliver high-quality analytic data sets. Today, this is no easy task. My recently released benchmark research into Information Optimization finds that when companies create and deploy information, the largest portions of time are spent preparing data for analysis (49%) and reviewing data for quality and consistency issues (47%). In fact, our research shows that analysts consistently spend anywhere from 40 percent to 60 percent of their time in the data preparation phase that precedes actual analysis of the data.

Paxata, with its Adaptive Data Preparation platform, aims to solve the challenge of data preparation by improving the data aggregation, enrichment, quality and governance processes. It does this using a spreadsheet paradigm, a choice of approach that should resonate well with business analysts; our research into spreadsheet use in today’s enterprises finds that the majority of them (56%) are resistant to a move away from spreadsheets.

In Paxata’s design, once the data is loaded, the software displays the combined data set in a spreadsheet format, and the user then manipulates the rows and columns to accomplish the various data preparation tasks. For instance, to profile the data, the analyst can first use a search box and an autocomplete query to find the data of interest and then use color-coded cells and visualization techniques to highlight patterns in the data. For data that may include multiple duplicate records, such as addresses, the company includes services that help sort through these records and suggest which records to combine. This last task may be of particular interest to marketers attempting to combine multiple third-party data sources that list several addresses and names for the same individual.
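As a generic illustration of this kind of duplicate-detection suggestion (not Paxata’s implementation), the following sketch scores pairs of contact records for similarity using only the Python standard library; the sample records and the threshold are hypothetical.

```python
# Generic illustration of suggesting near-duplicate records to combine,
# e.g., multiple variants of the same contact from third-party sources.
# Not Paxata's implementation; records and threshold are hypothetical.
from difflib import SequenceMatcher
from itertools import combinations

records = [
    {"id": 1, "name": "Jon Smith",  "address": "12 Main St, Springfield"},
    {"id": 2, "name": "John Smith", "address": "12 Main Street, Springfield"},
    {"id": 3, "name": "Jane Doe",   "address": "98 Oak Ave, Riverton"},
]

def similarity(a: dict, b: dict) -> float:
    """Average string similarity across the fields of interest."""
    fields = ("name", "address")
    return sum(SequenceMatcher(None, a[f], b[f]).ratio() for f in fields) / len(fields)

# Suggest pairs above a chosen threshold as merge candidates.
THRESHOLD = 0.85
for a, b in combinations(records, 2):
    score = similarity(a, b)
    if score >= THRESHOLD:
        print(f"Possible duplicate: records {a['id']} and {b['id']} (score {score:.2f})")
```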

Another key aspect of Paxata’s software is a history function that allows users to return to any step in the data preparation process and make changes on the fly. This ability to explore the lineage of the data enables another interesting function: “Paxata Share.” This capability enables multiple users to collaboratively evaluate the differences between data sets by looking at the different assumptions that went into processing the data. It is particularly interesting because it has the potential to solve the challenge of “battling boardroom facts” – the situation in which people come to a meeting with different versions of the truth based on the same data sources but different data preparation assumptions.
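The underlying idea of a preparation history can be sketched simply: record each transformation step so any intermediate state can be replayed and compared. The example below is a simplified, hypothetical illustration assuming pandas, not how Paxata implements it.

```python
# Simplified sketch of a preparation history: record each step so any
# intermediate state can be reproduced and compared. Generic illustration
# only, not Paxata's implementation.
import pandas as pd

class PrepHistory:
    def __init__(self, data: pd.DataFrame):
        self.initial = data
        self.steps = []  # list of (label, function) applied in order

    def apply(self, label, func):
        self.steps.append((label, func))
        return self

    def state_at(self, step_index: int) -> pd.DataFrame:
        """Replay the recorded steps up to and including step_index."""
        data = self.initial.copy()
        for _, func in self.steps[: step_index + 1]:
            data = func(data)
        return data

history = (
    PrepHistory(pd.DataFrame({"amount": [10, None, 30]}))
    .apply("fill missing", lambda df: df.fillna(0))
    .apply("add tax", lambda df: df.assign(with_tax=df["amount"] * 1.08))
)

# Two analysts can compare results under different preparation assumptions
# by inspecting the data as it stood after each recorded step.
print(history.state_at(0))  # after filling missing values only
print(history.state_at(1))  # after filling and adding tax
```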

Under the covers, Paxata’s offering boasts a cloud-based multi-tenant architecture hosted on Rackspace and leveraging the OpenStack platform. The company says its product can comfortably handle big data, processing millions of rows (or about a terabyte) of data in real time. If data sets are larger than this, a batch process can replace the real-time analysis.

In my view, the main value of Paxata’s technology lies in the data analyst time it potentially can save. Much of the functionality it offers involves data discovery driven by the kinds of machine learning algorithms that my colleague Mark Smith discussed in his analysis of four types of discovery technology. For instance, the Paxata software will recommend data and metric definitions based on the business context in which the analyst is working – a customer versus a supply chain context, for example – and these recommendations will sharpen as more data runs through the system.

Paxata is off to a great start, though the data connectors its product currently offers are limited; this will improve as it builds out connectors for more data sources. The company will also need to stand out in a very noisy marketplace of companies that provide similar services, on-premises or in the cloud, all of which are adapting their messages to address the data preparation challenge. On its website, Paxata lists Cloudera, Qlik Technologies and Tableau as technology partners. The company also lists dozens of information enrichment partners, including government organizations and data companies such as Acxiom, DataSift and Esri. The list of information partners is extensive, which reflects a thoughtful focus on the value of third-party data sources.

Utilizing efficient cloud computing technology, Paxata is able to come out of the gate with aggressive pricing, listed on the company’s site at about $300 per month, which is a small amount relative to the analyst time it can save on a daily, weekly and monthly basis. Such pricing should help adoption, especially among the business analysts the company targets. Organizations that are struggling with the time they put into the data preparation phase of analytics, and those looking to leverage outside data sources in new and innovative ways, should look into Paxata.

Regards,

Tony Cosentino

VP and Research Director
