You are currently browsing the tag archive for the ‘information discovery’ tag.

Like every large technology corporation today, IBM faces an innovator’s dilemma in at least some of its business. That phrase comes from Clayton Christensen’s seminal work, The Innovator’s Dilemma, originally published in 1997, which documents the dynamics of disruptive markets and their impacts on organizations. Christensen makes the key point that an innovative company can succeed or fail depending on what it does with the cash generated by continuing operations. In the case of IBM, it puts around US$6 billion a year into research and development; in recent years much of this investment has gone into research on big data and analytics, two of the hottest areas in 21st century business technology. At the company’s recent Information On Demand (IOD) conference in Las Vegas, presenters showed off much of this innovative portfolio.

At the top of the list is Project Neo, which will go into beta release early in 2014. Its purpose to fill the skills gap related to big data analytics, which our benchmark research into big data shows is held back most by lack of knowledgeable staff (79%) and lack of training (77%). The skills situation can be characterized as a three-legged stool of domain knowledge (that is, line-of-business knowledge), statistical knowledge and technological knowledge. With Project Neo, IBM aims to reduce the technological and statistical demands on the domain expert and empower that person to use big data analytics in service of a particular outcome, such as reducing customer churn or presenting the next best offer. In particular, Neo focuses on multiple areas of discovery, which my colleague Mark Smith outlined. Most of the industry discussion about simplifying analytics has revolved around visualization rather than data discovery, which applies analytics that go beyond visualization, or information discovery, which addresses how we find and access information in a highly distributed environment. These areas are the next logical steps after visualization for software vendors to address, and IBM takes them seriously with Neo.

At the heart of Neo are the same capabilities found in IBM’s SPSSUntitled 1 Analytic Catalyst, which won the 2013 Ventana Research Innovation Award for analytics and which I wrote about. It also includes IBM’s BLU acceleration against the DB2 database, an in-memory optimization technique, which I have discussed as well, that provides access to the analysis of large data sets. The company’s Vivisimo acquisition, which is now called InfoSphere Data Explorer, adds information discovery capabilities. Finally, the Rapid Adaptive Visualization Engine (RAVE), which is IBM’s visualization approach across its portfolio, is layered on top for fast, extensible visualizations. Neo itself is a work in progress currently offered only over the cloud and back-ended by the DB2 database. However, following the acquisition earlier this year of SoftLayer, which provides a cloud infrastructure platform. I would expect to also have IBM make Neo to allow it to access more sources than just loaded data into IBM DB2.

IBM also recently started shipping SPSS Modeler 16.0. IBM bought SPSS in 2009 and has invested in Modeler heavily. Modeler Untitled 2(formerly SPSS Clementine) is an analytic workflow tool akin to others in the market such as SAS Enterprise Miner, Alteryx and more recent entries such as SAP Lumira. SPSS Modeler enables analysts at multiple levels to interact on analytics and do both data exploration and predictive analytics. Analysts can move data from multiple sources and integrate it into one analytic workflow. These are critical capabilities as our predictive analytics benchmark research shows: The biggest challenges to predictive analytics are architectural integration (for 55% of organizations) and lack of access to necessary source data (35%).

IBM has made SPSS the centerpiece of its analytic portfolio and offers it at three levels, Professional, Premium and Gold. With the top-level Gold edition, Modeler 16.0 includes capabilities that are ahead of the market: run-time integration with InfoSphere Streams (IBM’s complex event processing product), IBM’s Analytics Decision Management (ADM) and the information optimization capabilities of G2, a skunks-works project by led by Jeff Jonas, chief scientist of IBM’s Entity Analytics Group.

Integration with InfoSphere Streams that won a Ventana Research Technology Innovation award in 2013 enables event processing to occur in an analytic workflow within Modeler. This is a particularly compelling capability as the so-called “Internet of things” begins to evolve and the ability to correlate multiple events in real time becomes crucial. In such real-time environments, often quantified in milliseconds, events cannot be pushed back into a database and wait to be analyzed.

Decision management is another part of SPSS Modeler. Once models are built, users need to deploy them, which often entails steps such as integrating with rules and optimizing parameters. In a next best offer situation in a retail banking environment, for instance, a potential customer may score highly on propensity want to take out a mortgage and buy a house, but other information shows that the person would not qualify for the loan. In this case, the model itself would suggest telling the customer about mortgage offers, but the rules engine would override it and find another offer to discuss. In addition, there are times when optimization exercises are needed such as Monte Carlo simulations to help to figure out parameters such as risk using “what-if” modelling. In many situations, to gain competitive advantage, all of these capabilities must be rolled into a production environment where individual records are scored in real time against the organization’s database and integrated with the front-end system such as a call center application. The net capability that IBM’s ADM  brings is the ability to deploy analytical models into the business without consuming significant resources.

G2 is a part of Modeler and developed in IBM’s Entity Analytics Group. The group is garnering a lot of attention both internally and externally for its work around “entity analytics” – the idea that each information entity has characteristics that are revealed only in contextual information – charting innovative methods in the areas of data integration and privacy. In the context of Modeler this has important implications for bringing together disparate data sources that naturally link together but otherwise would be treated separately. A core example is that an individual may have multiple email addresses in different databases, has changed addresses or changed names perhaps due to a new marital status. Through machine-learning processes and analysis of the surrounding data, G2 can match records and attach them with some certainty to one individual. The system also strips out personally identifiable information (PII) to meet privacy and compliance standards. Such capabilities are critical for business as our latest benchmark research on information optimization shows that two in five organizations have more than 10 different data sources that they need to integrate and that the ability to simplify access to these systems is important to virtually all organizations (97%).

With the above capabilities, SPSS Modeler Gold edition achieves  market differentiation, but IBM still needs to show the advantage of base editions such as Modeler Professional. The marketing issue for SPSS Modeler is that it is considered a luxury car in a market being infiltrated by compacts and kit cars. In the latter case there is the R programming language, which is open-source and ostensibly free, but the challenge here is that companies need R programmers to run everything. SPSS Modeler and other such visually oriented tools (many of which integrate with open source R) allow easier collaboration on analytics, and ultimately the path to value is shorter. Even at its base level Modeler is an easy-to-use and capable statistical analysis tool that allows for collaborative workgroups and is more mature than many others in the market.

Companies must consider predictive analytics capabilities or Untitledrisk being left behind. Our research into predictive analytics shows that two-thirds of companies see predictive analytics as providing competitive advantage (68%) and particularly important in revenue-generating functions such as marketing (for 70%) and forecasting (72%). Companies currently looking into discovery analytics may want to try Neo, which will be available in beta in early 2014. Those interested in predictive analytics should consider the different levels of SPSS 16.0 as well as IBM’s flagship Signature Solutions, which I have covered. IBM has documented use cases that can give users guidance in terms of leading-edge deployment patterns and leveraging analytics for competitive advantage. If you have not taken a look at the depth of the analytic technology portfolio at IBM, I would make sure to do so, as you might miss some fundamental advancements to the processing of data and analytics to provide the valuable insights required to operate effectively in the global marketplace.

Regards,

Tony Cosentino

VP and Research Director

A few months ago, I wrote an article on the four pillars of big data analytics. One of those pillars is what is called discovery analytics or where visual analytics and data discovery combine together to meet the business and analyst needs. My colleague Mark Smith subsequently clarified the four types of discovery analytics: visual discovery, data discovery, information discovery and event discovery. Now I want to follow up with a discussion of three trends that our research has uncovered in this space. (To reference how I’m using these four discovery terms, please refer to Mark’s post.)

The most prominent of these trends is that conversations about visual discovery are beginning to include data discovery, and vendors are developing and delivering such tool sets today. It is well-known that while big data profiling and the ability to visualize data give us a broader capacity for understanding, there are limitations that can be vr_predanalytics_predictive_analytics_obstaclesaddressed only through data mining and techniques such as clustering and anomaly detection. Such approaches are needed to overcome statistical interpretation challenges such as Simpson’s paradox. In this context, we see a number of tools with different architectural approaches tackling this obstacle. For example, Information Builders, Datameer, BIRT Analytics and IBM’s new SPSS Analytic Catalyst tool all incorporate user-driven data mining directly with visual analysis. That is, they combine data mining technology with visual discovery for enhanced capability and more usability. Our research on predictive analytics shows that integrating predictive analytics into the existing architecture is the most pressing challenge (for 55% or organizations). Integrating data mining directly into the visual discovery process is one way to overcome this challenge.

The second trend is renewed focus on information discovery (i.e., search), especially among large enterprises with widely distributed systems as well as the big data vendors serving this market. IBM acquired Vivisimo and has incorporated the technology into its PureSystems and big data platform. Microsoft recently previewed its big data information discovery tool, Data Explorer. Oracle acquired Endeca and has made it a key component of its big data strategy. SAP added search to its latest Lumira platform. LucidWorks, an independent information discovery vendor that provides enterprise support for open source Lucene/Solr, adds search as an API and has received significant adoption. There are different levels of search, from documents to social media data to machine data,  but I won’t drill into these here. Regardless of the type of search, in today’s era of distributed computing, in which there’s a need to explore a variety of data sources, information discovery is increasingly important.

The third trend in discovery analytics is a move to more embeddable system architectures. In parallel with the move to the cloud, architectures are becoming more service-oriented, and the interfaces are hardened in such a way that they can integrate more readily with other systems. For example, the visual discovery market was born on the client desktop with Qlik and Tableau, quickly moved to server-based apps and is now moving to the cloud. Embeddable tools such as D3, which is essentially a visualization-as-a-service offering, allow vendors such as Datameer to include an open source library of visualizations in their products. Lucene/Solr represents a similar embedded technology in the information discovery space. The broad trend we’re seeing is with RESTful-based architectures that promote a looser coupling of applications and therefore require less custom integration. This move runs in parallel with the decline in Internet Explorer, the rise of new browsers and the ability to render content using JavaScript Object Notation (JSON). This trend suggests a future for discovery analysis embedded in application tools (including, but not limited to, business intelligence). The environment is still fragmented and in its early stage. Instead of one cloud, we have a lot of little clouds. For the vendor community, which is building more platform-oriented applications that can work in an embeddable manner, a tough question is whether to go after the on-premises market or the cloud market. I think that each will have to make its own decision on how to support customer needs and their own business model constraints.

Regards,

Tony Cosentino

VP and Research Director

RSS Tony Cosentino’s Analyst Perspectives at Ventana Research

  • An error has occurred; the feed is probably down. Try again later.

Tony Cosentino – Twitter

Error: Twitter did not respond. Please wait a few minutes and refresh this page.

Stats

  • 73,582 hits
%d bloggers like this: