Archive for the ‘Enterprise Search’ Category

Pawel Wroblewski

Search stuffed up with GIS

February 3 - 2012 | Pawel Wroblewski

When I browsed through marketing brochures of GIS (Geographic Information System) vendors, I noticed that the message is quite similar to that of search analytics. In general it is about integrating various separate sources into an analysis based on geo-visualizations. I have recently seen a quite nice and powerful combination of search and GIS technologies, so I would like to describe it a little. Let us start with the basics.

Search result visualization

It is quite natural to use a map instead of a simple list of results to visualize what was returned for an entered query. This technique is frequently used in online search applications, especially in directory services such as yellow pages or real estate web sites. The list of things required to do this is pretty short:

- geolocalization of items – this means assigning accurate geo-coordinates to location names, addresses, zip codes or whatever else is expected to be shown on the map; geolocalization services are offered more or less for free by Google or Bing Maps.

- background map – this is a necessity and is also provided by Google or Bing; there are also plenty of vendors for more specialized mapping applications

- returned results with geo-coordinates as metadata – to put them on the map

Normally this kind of basic GIS visualisation delivers basic map operations like zooming, panning and different views, plus some additional data such as traffic, parks, shops etc. Results are usually shown as pins [Bing] or drops [Google].

Querying / filtering with the map

A further step in integrating search and GIS is to use the map as a tool for defining the search query. One way is to define an area of interest by drawing a circle, rectangle or polygon on the map. In the simplest case, the current map window can serve as the query area. With this approach the full-text query is refined to include only results located within the defined area.

Apart from the map, all other query refinement tools should be available as well, such as date-time sliders or any kind of navigation and fielded queries.
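To make the "current window view as the query area" idea a bit more concrete, here is a minimal sketch in Python. It assumes the full-text engine has already returned hits carrying latitude/longitude metadata; the names Hit, BoundingBox and filter_by_viewport are made up for this illustration and do not belong to any particular search or mapping product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """A search result with geo-coordinates as metadata (hypothetical shape)."""
    title: str
    lat: float
    lon: float


@dataclass
class BoundingBox:
    """The visible map window, in degrees (hypothetical shape)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        if not (self.south <= lat <= self.north):
            return False
        if self.west <= self.east:
            return self.west <= lon <= self.east
        # The window crosses the 180th meridian, so longitude wraps around.
        return lon >= self.west or lon <= self.east


def filter_by_viewport(hits: List[Hit], box: BoundingBox) -> List[Hit]:
    """Keep only the full-text hits that fall inside the map window."""
    return [h for h in hits if box.contains(h.lat, h.lon)]


hits = [
    Hit("Office Stockholm", 59.33, 18.07),
    Hit("Office Gothenburg", 57.71, 11.97),
    Hit("Office Warsaw", 52.23, 21.01),
]
window = BoundingBox(south=55.0, west=10.0, north=60.0, east=20.0)
visible = filter_by_viewport(hits, window)
```

In a real installation the filter would more likely be pushed down to the search engine as a spatial or range query rather than applied to the result list afterwards; the post-filter above only shows the logic.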

Simple geo-spatial analysis

Sometimes it is important to sort query results by distance from a reference point, for example to see the nearest Chinese restaurants in the neighborhood. I would also count as simple geo-spatial analysis the grouping of search results into GIS layers, e.g. density heatmaps or hot spots, using geographical and other information stored in the results’ metadata.
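Both ideas can be sketched in a few lines of Python. The example below assumes results are plain dictionaries with "lat" and "lon" keys, uses the haversine formula on a spherical Earth for the distance, and approximates a density heatmap by counting results per grid cell. The function names and the one-degree cell size are illustrative assumptions, not part of any GIS product.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sort_by_distance(results, ref_lat, ref_lon):
    """Order results by distance from the reference point, nearest first."""
    return sorted(
        results,
        key=lambda r: haversine_km(ref_lat, ref_lon, r["lat"], r["lon"]),
    )


def density_grid(results, cell_deg=1.0):
    """Count results per grid cell; the counts can feed a heatmap layer."""
    return Counter(
        (math.floor(r["lat"] / cell_deg), math.floor(r["lon"] / cell_deg))
        for r in results
    )


restaurants = [
    {"name": "A", "lat": 59.33, "lon": 18.07},
    {"name": "B", "lat": 59.34, "lon": 18.06},
    {"name": "C", "lat": 57.71, "lon": 11.97},
]
nearest_first = sort_by_distance(restaurants, 57.70, 11.97)
cells = density_grid(restaurants)
```

The grid count is a deliberately crude stand-in for a heatmap; a proper GIS layer would typically use kernel density estimation and an equal-area projection.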

Advanced geo-spatial analysis

More advanced query definition and refinement would involve geo-spatial computations. Depending on real needs, it could be possible, for example, to refine search results by the line-of-sight area from a chosen reference point, or to select filtering areas such as those inside the borders of specific cities, districts, countries etc.

So the idea is to use relevant output from advanced GIS analysis as an input for query refinement. In this way all the power of GIS can be used to get to the unstructured data through a search process.
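As a small illustration of using GIS output as input for query refinement, the sketch below takes a polygon (for example a district border, or a line-of-sight area computed by a GIS tool) and keeps only the results located inside it. It uses the classic ray-casting test and treats longitude/latitude as flat x/y coordinates, which is an assumption that only holds reasonably for small areas away from the poles. The function names and the data shape are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def refine_by_area(results, polygon):
    """Keep only results whose coordinates fall inside the polygon."""
    return [r for r in results if point_in_polygon(r["lon"], r["lat"], polygon)]


# A made-up rectangular "district" and two made-up results.
district = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
results = [
    {"title": "inside", "lon": 5.0, "lat": 5.0},
    {"title": "outside", "lon": 15.0, "lat": 5.0},
]
refined = refine_by_area(results, district)
```

A production setup would normally delegate this to a spatial index or a GIS library, but the principle is the same: the GIS analysis produces a geometry, and the search layer uses it as a filter.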

What kind of applications do you think could take advantage of search stuffed with really advanced GIS? Looking forward to your comments on this post.

Kristian Norling

Text Analytics in Enterprise Search

January 11 - 2012 | Kristian Norling

A presentation given by Daniel Ling at Apache Lucene Eurocon in Barcelona, October 2011.

We think this is the first of many forthcoming presentations.

We also want to get more involved in the community in the future by doing presentations, sponsoring and contributing code. We hope to bring more news on this subject in the next few weeks. Enjoy the presentation:

Text Analytics in Enterprise Search, Daniel Ling, Findwise, Eurocon 2011 from Lucene Revolution on Vimeo.

Leonard Saers

Analytics and BigData at IBM Information On Demand 2011

December 20 - 2011 | Leonard Saers

The big trend these days is BigData and how you can analyze large amounts of information in order to gain important insights, and from those insights be able to take the right action. This trend was a hot topic at the IBM Information On Demand (IOD) conference in Las Vegas earlier this year. IBM has a very strong position in this field; it’s hard to have missed how their computer system Watson recently challenged the top players of all time in Jeopardy, and won! Read more about Watson

Now IBM has taken the technology behind Watson and started to apply it in their different analytics products, where one specific area that is being targeted is healthcare. For this area IBM released a new product during IOD called IBM Content and Predictive Analytics for Healthcare, which can for example be used as a tool for physicians to support them in their diagnosis of patients.

In April this year IBM merged two of their products: their search engine OmniFind and Content Analytics, their product for analyzing large amounts of unstructured information. The new product is called IBM Content Analytics with Enterprise Search, and it too is based on much of the same technology that is used in Watson; more specifically, it utilizes the same Natural Language Processing techniques. This means that it has the ability to understand text on a level just as sophisticated as that of Watson.

Content Analytics with Enterprise Search scales very well to many millions of documents. However, for analyzing really enormous data sets, in the magnitude of petabytes or even exabytes, IBM has developed what they call their BigData platform. This platform mainly revolves around two products, InfoSphere Streams and InfoSphere BigInsights, and it builds on a foundation of open source software, such as Apache Hadoop and Apache Lucene. InfoSphere Streams is used for real-time analysis of information in motion. This helps you understand what’s happening right at this moment in your organization and supports you in taking appropriate action as things are happening. InfoSphere BigInsights, on the other hand, lets you analyze and draw insight from massive amounts of already existing data.

Studies have shown how organizations that fall short in this area are overtaken by those who understand how to use the power of analytics.

IBM has surely chosen an interesting path when merging Analytics with Findability.

Christian Ubbesen

Inspiration from the Enterprise Search Europe conference

November 11 - 2011 | Christian Ubbesen

A couple of weeks ago, some of my colleagues and I attended the Enterprise Search Europe conference in London. We’re very grateful to the organizer Martin White at IntranetFocus for arranging the event, and for having us as one of the gold sponsors.

For me it was the first time in years I attended a conference like this, and while it was “same old, same old” for many of the attendees, for me it was enlightening to meet up with the industry and have a discussion on where we are as an industry.

The attendees were mainly software vendors and professional services/consultants, as well as a few customers or actual users of enterprise search… and I think the consensus of the two days was that we in the industry STILL haven’t really figured out what we should do with the enterprise search concept, and how to make it valuable for our customers. We at Findwise are not alone with this challenge; rather, it is an industry challenge. There are some vendors who seem to be doing good work delivering real value to customers, and there are also a few of our colleagues in the industry who do good professional services/consultant work. At first it was a bit of a downer to realize that we haven’t progressed more during the 10 years I’ve been in the business, but at the same time it was very inspirational to see that we at Findwise, together with a few other players, seem to be on the right track with our hard work, and that we are in a position to solve some of the real industry challenges we’re facing.

As I see it, if we gather our forces and make a focused “push forward” together now, we will be able to take the industry to a new maturity level where we better solve real business challenges with enterprise search (or search-driven Findability solutions, as we like to call them).

My simple analysis of all the discussions at the conference is that we need to do two things:

  1. Manage the whole “full picture” of enterprise search – from strategy to organizational governance, involving necessary competencies to cover all aspects of a successful Findability solution.
  2. Break down the customer challenge into manageable chunks, and solve actual business problems, not just solving the traditional “finding stuff when needed” challenge.

I think we are on the right track, and it’s going to be a very interesting journey from here on!

Caroline Abrahamsson

Enterprise search – market overview 2011

September 26 - 2011 | Caroline Abrahamsson

A few weeks ago Forrester Research released a report with an overview of the 12 leading Enterprise search vendors on the global market (Attivio, Autonomy, Coveo, Endeca, Exalead, Fabasoft, Google, IBM, ISYS Search, Microsoft, Sinequa and Vivisimo).

When I wrote about the Gartner report, readers commented on the fact that open source solutions were not part of the scope, even though their market share is increasing rapidly. The Forrester report has the same approach, except it includes vendors offering their products stand-alone as well as those with products integrated in portal/ECM solutions.

So why the exclusion of open source? Well, it appears difficult to decide on how to evaluate open source, especially when it comes to more advanced appliances.

Looking at the Forrester report, it includes some familiar conclusions but also a few new insights. Leslie Owen from Forrester concludes that “Google, Autonomy, and Microsoft are the most well-known names; they own a large portion of the existing market”. Hence, these vendors are still standing strong, even though they are challenged in various areas.

More surprisingly, some niche players get higher scores than the giants in core areas such as “Indexing and connectivity”, “Interface flexibility” and “Social and collaborative features”.

Vivisimo is seen as somewhat of a leader (with a slightly lower score on Mobile support and Semantics/text analysis). In the Gartner report, Vivisimo was excluded from the information access evaluation due to the fact that they were “focusing on specialized application categories, such as customer service”.

Search vendor overview

An interesting reflection from Forrester is that “in the next few years, we expect prices to rise as specialized vendors wax poetic on the transformative power of search in order to distinguish their products from Google and Microsoft FAST Search for SharePoint”. On the Nordic market, we have not seen a shift to such a strategy, but rather the opposite, since open source (with zero license fees) is becoming accepted in an Enterprise environment to a larger extent.

The vendors that provide integrated solutions (to CMS/WCM etc) still remain strong, whereas the stand-alone solutions become exposed to competition in new ways. It will be interesting to follow the US and Nordic markets to see how this evolves within the next year. It might be that the markets differ when it comes to open source adoption.

If you wish to read the full report it can be downloaded from Vivisimo through a simple registration.
To get a complete overview of vendors, I recommend reading both the Gartner and Forrester reports.

Svetoslav Marinov

ECIR 2011 in retrospect

April 27 - 2011 | Svetoslav Marinov

The European Conference on Information Retrieval (ECIR) 2011 took place in Dublin last week, 18-21 April. In this blog post I will try to highlight some of the papers and talks from the conference which caught my attention, and back this up with what other attendees said about them.

First, I was intrigued by the session on evaluation for IR and especially the topic of Crowdsourcing. In my opinion, the paper A Methodology for Evaluating Aggregated Search Results, which also got the prize for best student paper, was among the most pedagogically presented ones. It deals with the task of incorporating search results from a number of different sources, called verticals, into Web search results. Using a small number of human judgements for a given query, the authors present a way to evaluate any possible permutation of verticals in the result presentation. I think that this methodology should be adopted in the world of Enterprise search, since it is exactly there that we crawl, index and present information from a number of different sources – Web, databases, fileshares, etc. The prerequisites are really minimal and low cost but the return value, the user experience, seems quite high.

Amazon Mechanical Turk, or the Artificial Artificial Intelligence, which is the marketplace for Crowdsourcing, provides a way to perform evaluation, relevance assessment or any task for which you would need humans to give you some judgements, for a ridiculously small sum of money. Leaving aside ethical issues, two papers in the conference presented ways in which you can utilize this service for some IR tasks.

Evgeniy Gabrilovich from Yahoo! Research, who won the Karen Sparck Jones Award for 2010, gave a very interesting keynote talk on Computational Advertising. Until now, it had never struck me how hard advertising in Information Retrieval systems actually is. I liked one of his points on the future of Ads – by using product feeds, one can automatically create product descriptions via Text Summarization and Natural Language Generation and index these, thus avoiding bid words.

Another interesting and very pedagogically presented paper was about the gensim package by Radim Řehůřek. I definitely think we can use it in some of our projects. In general, text categorization and IR for social networks were the dominant tracks. In one of the social network tracks, Oscar Täckström presented a neat way of discovering fine-grained sentiment where some coarse-grained supervision is available. It really got me hooked on trying it for any of our customers where sentiment analysis is required.

Thorsten Joachims, the last of the keynote speakers, gave a very inspiring talk on The Value of User Feedback. He put forward the idea of designing retrieval systems for feedback. Instead of just looking at the click logs after the fact, one can think of a system which uses click feedback to learn, thus creating a better ranker for a given query and a given user need. In a single session, we can use click feedback to disambiguate the query and deliver results on the fly which are of immediate benefit to the users.
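To give a feeling for what "designing for feedback" could mean in practice, here is a deliberately naive sketch in Python. It is not the method from the keynote; it simply counts clicks per query and document and adds a small boost to the engine's original score. The class name, the boost value and the shape of the scored results are all assumptions made for this illustration.

```python
from collections import defaultdict


class ClickFeedbackRanker:
    """Naive re-ranker: boosts documents that were clicked for the same query."""

    def __init__(self, boost=0.1):
        self.boost = boost              # score added per recorded click (arbitrary)
        self.clicks = defaultdict(int)  # (query, doc_id) -> click count

    def record_click(self, query, doc_id):
        self.clicks[(query, doc_id)] += 1

    def rerank(self, query, scored_docs):
        """scored_docs is a list of (doc_id, score) pairs from the search engine."""
        def adjusted(item):
            doc_id, score = item
            return score + self.boost * self.clicks.get((query, doc_id), 0)
        return sorted(scored_docs, key=adjusted, reverse=True)


ranker = ClickFeedbackRanker()
engine_results = [("doc-a", 1.00), ("doc-b", 0.95)]
ranker.record_click("jaguar", "doc-b")
reranked = ranker.rerank("jaguar", engine_results)
```

Raw click counts are known to be biased towards whatever was already shown at the top, so a real system would need to account for position bias; the sketch only shows where the feedback loop plugs in.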

Unfortunately, I have probably missed other interesting presentations, but with two parallel sessions and several workshops there was a limit to what I could devour. What surprised me, though, was that there were very few papers from the industry. We do try to solve exactly the same problems and tackle the same issues as academia. We at Findwise have constantly flagged the huge benefit of good, relevant metadata for achieving better search performance, which was also touched upon in the paper “Topic Classification in Social Media using Metadata from Hyperlinked Objects”.

It was really great to visit Dublin and attend ECIR 2011. It was an inspiring conference and I do believe that at the next ECIR we, from Findwise, can be on the podium, sharing our knowledge and hands-on experience on Enterprise search and IR.

Sláinte!

Caroline Abrahamsson

Search conferences 2011

March 3 - 2011 | Caroline Abrahamsson

During 2011 a large number of search conferences will take place all over the world. Some of them are dedicated to search, whereas others discuss the topic related to specific products, information management, usability etc.

Here are a few that might be of interest for those of you looking to be inspired and broaden your knowledge. Within a few weeks we will compile all the research related conferences – there are quite a few of them out there!
If there is anything you miss, please post a comment.

March
IntraTeam Event Copenhagen 2011
Main focus: Social intranets, SharePoint and Enterprise Search
March 1, 2 and 3, 2011, Copenhagen, Denmark

Webcoast
Main focus: A web event that is an unconference, meaning that the attendees themselves create the program by presenting on topics of their own expertise and interest.
March 18-20, Gothenburg, Sweden

Info360
Main focus: Business productivity, Enterprise Content Management, SharePoint 2010
March 21-24, Walter E. Washington Convention Center, Washington, USA

April
International Search Summit Munich
Main focus: International search and social media.
4th April 2011, Hilton Munich Park Hotel, Germany

ECIR 2011: European Conference on Information Retrieval
Main focus: Presentation of new research results in the field of Information Retrieval
April 18-21, Dublin, Ireland

May
Enterprise Search Summit Spring 2011
Main focus: Develop, implement and enhance cutting-edge internal search capabilities
May 10-11, New York, USA

International Search Summit: London
Main focus: International search and social media
May 18th, Millennium Gloucester Hotel, London, England

Lucene Revolution
Main focus: The world’s largest conference dedicated to open source search.
May 25-26, San Francisco Airport Hyatt Regency, USA

SharePoint Fest – Denver 2011
Main focus: In search track: Enterprise Search, Search & Records Management, & FAST for SharePoint
May 19-20, Colorado Convention Center, USA

June
International Search Summit Seattle
Main focus: International search and social media
June 9th, Bell Harbor Conference Center, Seattle, USA

2011 Semantic Technology Conference
Main focus: Semantic technologies – including Search, Content Management, Business Intelligence
June 5-9, Hilton Union Square, San Francisco, USA

October
SharePoint Conference 2011
Main focus: SharePoint and related technologies
October 3-6, Anaheim, California, USA

November
Enterprise Search Summit Fall Nov 1-3
Main focus: How to implement, manage, and enhance search in your organization
Integrated with the KMWorld Conference, SharePoint Symposium and Taxonomy Bootcamp

KM-world
(Co-locating with Enterprise Search Summit Fall, Taxonomy Boot Camp and Sharepoint Symposium)
Main focus: Knowledge creation, publishing, sharing, finding, mining, reuse etc
November 1 – 3, Washington Marriott Wardman Park, Washington DC, USA

Gilbane group Boston
Main focus: Within search: semantic, mobile, SharePoint, social search
November 29 – December 1, Boston, USA

Caroline Abrahamsson

Gartner and the magic quadrants – crowning the leaders of Enterprise Search

January 25 - 2011 | Caroline Abrahamsson

For years Gartner, the research and advisory company, has been publishing their magic quadrants – and their verdict on everything from ECM systems to Data Warehouse and E-commerce plays a big role in many companies’ decisions when choosing the right tools.
Simply put, the vendors are presented in a matrix measuring the different players by ability to execute (product, overall viability, customer experience etc.) and the completeness of their vision (offering strategy, innovation etc.). The vendors are then positioned as niche players (a rather crowded spot), visionaries, challengers and leaders.

At the end of last year Gartner decided to retire their old “Information Access Quadrant” and introduce “Enterprise Search MarketScope” due to a more mature market. A number of vendors (such as Vivisimo and Recommind) were removed, in order to exclude those whose businesses were not entirely search driven.

The evaluation criteria for MarketScope cover: offering (product) strategy, Innovation, Overall viability (business unit, financial, strategy, and organization), Customer experience, Market understanding and business model.
To summarize: the criteria are to a large extent the same, but the two areas “overall viability” and “customer experience” are weighted higher than the rest. This is most likely a result of the last years’ discussion around user-friendly interfaces, easier administration and the fact that some customers have suffered quite badly when vendors do not survive (one example in Northern Europe is the Danish vendor that went bankrupt for some time).

The yearly fight between the three leaders, Microsoft, Endeca and Autonomy, has been somewhat disrupted, and Microsoft, Endeca and Google are now seen as the leaders.
Microsoft has a very broad product line, which stretches from low-price products with less functionality to Enterprise Search built on the former FAST technology. Endeca follows the same trend; as Gartner puts it, their “products (are) intended to serve organizations seeking to develop general search installations..(..) broadly applicable for a variety of different search challenges”.
In the old quadrant, Google remained a “challenger” for quite some time – but never made it to the “leaders” corner. “Ease of administration” and “user friendly” are two phrases that keep being repeated. That, in combination with a profit of $7.29 billion during the last quarter of 2010, makes Google a player that can easily continue to develop their Enterprise business.

Gartner's MarketScope for Enterprise Search

 

Autonomy should still not be disregarded; the main reason for it falling a bit behind the three others seems to be conquerable problems with support and pricing transparency. It will be interesting to see how Autonomy chooses to handle these issues during 2011.

In short: the new MarketScope is good reading with quite few surprises. If you wish to get a better understanding of the development going on at the different vendors, start with Gartner and continue to search among our blog posts.

Tobias Berg

The difference between Search and Find

January 23 - 2011 | Tobias Berg

Is “Findability” only a buzzword to describe the same thing as before when talking about search solutions, or does it bring something new to the discussion? I’d like to think the latter and this week I read a blog post describing the difference between search and findability in a very good way. I couldn’t have written it better myself :)

For the lazy ones, I’ve picked a quote that is the key element in the post:

Findability: introducing the robot waiter

Imagine you’re in a futuristic restaurant and when the robot waiter approaches, you ask for ‘ham and cheese omelette’. In response he just shrugs his robotic shoulders and says ‘not found – please try again.’ You then have to keep guessing until you find a match for something you’d like to order.

Now imagine a second futuristic restaurant where the robot waiter says ‘Mr Grimes, how lovely to see you, the last time you visited you had A and B and gave them a 5 star rating. People who ordered x, also ordered y and found that the wines a, b and c went really well with it.’ At the first restaurant the menu was searchable (though regrettably the ‘ham and cheese omelette’ query didn’t match anything), at the second restaurant the menu was findable.

To me, this analogy is spot-on. I dare to say that making content searchable is more of a technical issue, while reaching great findability requires understanding of the business. Why is that?

Well, to make a content repository searchable you “only” need to hook up a connector, index the repository and display a search box to the users. To succeed with this, it doesn’t matter if the content is movie reviews, user manuals, recipes, a product catalog or whatever. What you need to know is the format of the repository (is it a SQL database, filesystem, ECM, etc.).

But if you want your users to find what they want in your repositories, business knowledge is a requirement. It’s true that you help your users find information by implementing technical stuff like query completion, facets, did-you-mean, synonym dictionaries, etc. But if they are to be of any help you need to present facets that are useful, populate the synonym dictionary with terms used in your organisation, etc. For example, a good synonym file targeted towards nurses and doctors would be very different compared to one targeted at employees at an insurance company.
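A tiny Python sketch may illustrate why the synonym dictionary has to come from the business. The code for expanding a query is identical for both audiences; only the dictionaries differ. The two dictionaries below are invented examples (they are not taken from any real synonym file), and expand_query is a hypothetical helper, not a function of any search engine.

```python
def expand_query(query, synonyms):
    """Return, per query term, the term followed by its synonyms (sorted)."""
    expanded = []
    for term in query.lower().split():
        variants = [term] + sorted(synonyms.get(term, []))
        expanded.append(variants)
    return expanded


# Invented example dictionaries: same abbreviation, different organisations.
HEALTHCARE_SYNONYMS = {
    "mi": {"myocardial infarction", "heart attack"},
}
INSURANCE_SYNONYMS = {
    "mi": {"mortgage insurance"},
}

for_nurses = expand_query("MI treatment", HEALTHCARE_SYNONYMS)
for_insurers = expand_query("MI claims", INSURANCE_SYNONYMS)
```

The technical part stays the same in both cases. What makes the search helpful is who filled in the dictionary and whether the terms reflect how people in that organisation actually talk.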

Ludvig Johansson

Search is a journey not a destination

December 2 - 2010 | Ludvig Johansson

Two weeks ago Christopher Wallström and I, Ludvig Johansson, attended KMWorld’s quadruple conference in Washington D.C. The event consisted of four different conferences: KMWorld, Enterprise Search Summit, Taxonomy Bootcamp and SharePoint Symposium. I focused on Enterprise Search Summit and SharePoint Symposium, and Christopher mainly covered Taxonomy Bootcamp as well as the Enterprise Search Summit. (Christopher will soon write a blog post about this as well.)

During the conferences there was some good-quality content; however, most of it was old news, with speakers mainly focusing on the outputs of their own products. This was disappointing since I had hoped to see the newest and coolest solutions within my area. Speakers presented systems from their corporations, where the newest and coolest functionality they described was shallow filters on a Google Search Appliance. From my perspective this is not new or cool. I would rather consider this standard functionality in today’s search solutions.

However, some sessions were really good. Daniel W. Rasmus talked about the Evolution of Search in quite a fun and thoughtful way. One thing he wanted to see in the near future was more personalization of search. Search needs to know the user and adapt to him/her and not simply use a standardized algorithm. As Rasmus said: “my search engine is not that into me”. This is, as I would put it, spot on how we see it at Findwise. Today’s customer wants standard search with components that have existed for years now. It’s time for search to take the next step in the evolution and for us to start delivering Findability solutions adapted to your needs as an individual. In line with this, Rasmus ended with another good quote: “Don’t let your search vendors set your expectations too low”. I think this more or less speaks for itself. If we want contextual search then we should push the vendors out there to start delivering!

Another good session was delivered by Ellen Feaheny on how to utilize both old and new systems smarter. It was from this session that the title of this post originates: “It’s a journey not a destination”. I think this sums up what we feel every day in our projects. It’s common that customers want projects to have a clear start and end. However, with search and Findability we see it as a journey. I can even go as far as to say it’s a journey without an end. We have customers coming and complaining about their search, saying “It doesn’t work anymore” or “The content is old”, to give two examples. The problem is that search is not a one-time problem that you solve and then never have to think about again. If you don’t work with your search solution and treat search as a journey, continually improving relevance and content and investing time in search analytics, your solution will soon get dusty and not deliver what your employees or customers want.

Search is a journey not a destination.