Archive for the ‘Search’ Category

Pawel Wroblewski

Search stuffed up with GIS

February 3 - 2012 | Pawel Wroblewski

When I browsed through marketing brochures of GIS (Geographic Information System) vendors, I noticed that the message is quite similar to that of search analytics. In general, it is about integrating various separate sources into analysis based on geo-visualizations. I recently saw a nice and powerful combination of search and GIS technologies, so I would like to describe it a little. Let us start with the basics.

Search result visualization

Using a map instead of a simple list of results is an obvious way to visualize what was returned for a query. This technique is frequently used in online search applications, especially in directory services like yellow pages or real estate web sites. The list of things required to do this is pretty short:

- geolocalization of items – this means assigning accurate geo-coordinates to location names, addresses, zip codes or whatever is expected to be shown on the map; geolocalization services are offered more or less for free by Google or Bing Maps

- background map – this is a necessity and is also provided by Google or Bing; there are also plenty of vendors for more specialized mapping applications

- returned results with geo-coordinates as metadata – to put them on the map

Normally this kind of basic GIS visualisation delivers basic map operations like zooming, panning, different views and additionally some more data like traffic, parks, shops etc. Results are usually pins [Bing] or drops [Google].

Querying / filtering with the map

The next step in integrating search and GIS is to use the map as a tool for defining the search query. One way is to draw an area of interest on the map as a circle, rectangle or polygon. In the simplest case, the current map window itself can serve as the query area. With this approach, the full-text query is refined to include only results that fall within the defined area.

Apart from the map, all other query refinement tools should be available as well, such as date-time sliders or any kind of navigation and fielded queries.
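The simplest case above, using the current map window as the query area, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the Hit class and the function names are made up for the example, the window is given as south/west/north/east edges in degrees, and windows that cross the 180° meridian are not handled.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A search result carrying geo-coordinates as metadata (hypothetical shape)."""
    title: str
    lat: float
    lon: float


def in_window(hit, south, west, north, east):
    """True if the hit lies inside the rectangular map window."""
    return south <= hit.lat <= north and west <= hit.lon <= east


def refine_by_map_window(hits, south, west, north, east):
    """Keep only the full-text hits that fall inside the visible map window."""
    return [h for h in hits if in_window(h, south, west, north, east)]


hits = [
    Hit("Office in Gothenburg", 57.71, 11.97),
    Hit("Office in Stockholm", 59.33, 18.07),
]
# A window roughly covering the Gothenburg area.
visible = refine_by_map_window(hits, south=57.5, west=11.5, north=58.0, east=12.5)
```

A circle or polygon would need a distance or point-in-polygon test instead of the rectangle check, but the refinement step stays the same: run the text query, then keep only what falls inside the drawn area.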

Simple geo-spatial analysis

Sometimes it is important to sort query results by distance from a reference point, for example to see the nearest Chinese restaurants in the neighborhood. I would also count as simple geo-spatial analysis the grouping of search results into GIS layers, e.g. density heatmaps or hot spots, using geographical and other information stored in the results' metadata.
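Sorting by distance from a reference point can be sketched as follows. This is a minimal example, assuming each result is a dictionary with "lat" and "lon" keys (my own choice for the illustration) and using the haversine formula with a spherical Earth, which is accurate enough for "nearest restaurant" style queries.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sort_by_distance(results, ref_lat, ref_lon):
    """Return results ordered from nearest to farthest from the reference point."""
    return sorted(
        results,
        key=lambda r: haversine_km(ref_lat, ref_lon, r["lat"], r["lon"]),
    )


restaurants = [
    {"name": "Stockholm place", "lat": 59.3293, "lon": 18.0686},
    {"name": "Gothenburg place", "lat": 57.7000, "lon": 11.9800},
]
nearest_first = sort_by_distance(restaurants, 57.7089, 11.9746)
```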

Advanced geo-spatial analysis

More advanced query definition and refinement would involve geo-spatial computations. Depending on real needs, it could be possible, for example, to refine search results by the area visible along a line of sight from a chosen reference point, or to select filtering areas such as those inside the borders of specific cities, districts, countries etc.

So the idea is to use relevant output from advanced GIS analysis as an input for query refinement. In this way all the power of GIS can be used to get to the unstructured data through a search process.

What kind of applications do you think could take advantage of search stuffed with really advanced GIS? Looking forward to your comments on this post.

Christian Ubbesen

Inspiration from the Enterprise Search Europe conference

November 11 - 2011 | Christian Ubbesen

A couple of weeks ago, some of my colleagues and I attended the Enterprise Search Europe conference in London. We’re very grateful to the organizer, Martin White at IntranetFocus, for arranging the event and having us as one of the gold sponsors.

For me it was the first time in years I attended a conference like this, and while it was “same old, same old” for many of the attendees, for me it was enlightening to meet up with the industry and have a discussion on where we are as an industry.

There were mainly software vendors and professional services/consultants there, as well as a few customers or actual users of enterprise search… and I think the consensus of the two days was that we in the industry STILL haven’t really figured out what we should do with the enterprise search concept, and how to make it valuable for our customers. We at Findwise are not alone with this challenge; rather, it is an industry challenge. There are some vendors who seem to be doing good work delivering real value to customers, and there are also a few of our peers in the industry that do good professional services/consultant work. At first it was a bit of a downer to realize that we haven’t progressed more during the 10 years I’ve been in the business, but at the same time it was very inspirational to see that we at Findwise, together with a few other players, seem to be on the right track with our hard work, and that we are in a position to solve some of the real industry challenges we’re facing.

As I see it, if we gather our forces and make a focused “push forward” together now, we will be able to take the industry to a new maturity level where we better solve real business challenges with enterprise search (or search-driven Findability solutions, as we like to call them).

My simple analysis of all the discussions at the conference is that we need to do two things:

  1. Manage the whole “full picture” of enterprise search – from strategy to organizational governance, involving necessary competencies to cover all aspects of a successful Findability solution.
  2. Break down the customer challenge into manageable chunks, and solve actual business problems, not just solving the traditional “finding stuff when needed” challenge.

I think we are on the right track, and it’s going to be a very interesting journey from here on!

Björn Klockljung Johansson

Book Review: Search Analytics for Your Site

September 14 - 2011 | Björn Klockljung Johansson

Lou Rosenfeld is the founder and publisher of Rosenfeld Media and also the co-author (with Peter Morville) of the best-selling book Information Architecture for the World Wide Web, which is considered one of the best books about information management.

In Lou Rosenfeld’s latest book he lets us know how to successfully work with Site Search Analytics (SSA). With SSA you analyse the saved search logs of what your users are searching for to try to find emerging patterns. This information can be a great help to figure out what users want and need from your site.  The search terms used on your site will offer more clues to why the user is on your site compared to search queries from Google (which reveal how they get to your site).
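To give a feel for what such an analysis can look like in practice, here is a small Python sketch that counts the most frequent queries and the queries that returned no hits. It is my own illustration, not taken from the book, and it assumes the search log has already been parsed into (query, hit count) pairs.

```python
from collections import Counter


def summarize_search_log(entries):
    """Count all queries and zero-hit queries from (query, hit_count) pairs.

    Queries are lower-cased and stripped so that trivial variants are
    counted together.
    """
    normalized = [(query.strip().lower(), hits) for query, hits in entries]
    all_queries = Counter(query for query, _ in normalized)
    zero_hit_queries = Counter(query for query, hits in normalized if hits == 0)
    return all_queries, zero_hit_queries


log = [
    ("Pension ", 12),
    ("pension", 9),
    ("PENSION", 9),
    ("vacation form", 4),
    ("vacaton", 0),
    ("vacaton", 0),
]
all_queries, zero_hit_queries = summarize_search_log(log)
# all_queries.most_common(10) lists the top queries;
# zero_hit_queries shows where users found nothing at all.
```

Frequent zero-hit queries are often misspellings or missing synonyms, which makes them a natural starting point for improvements.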

So what’s in the book?

Part I – Introducing Site Search Analytics

In part one the reader gets a great example of why to use SSA and an introduction to what SSA is. In the first chapters you follow John Ferrara, who worked at a company called Vanguard, as he analysed search logs to prove that a newly bought search engine performed poorly, whilst using the same statistics to improve it. This is a great real-world example of how to use SSA for measuring the quality of search AND for setting up goals for improvement.

a word cloud is one way to play with the data

Part II – Analysing the data

In this part Lou gets hands-on with user logs and shows you how to analyse the data. He makes it fun and emphasizes the need to play with user data. Without this emphasis on playing, the task of analysing user data may seem daunting. Also, with real-world examples from different companies and institutions it is easy to understand the different methods for analysis. Personally, I feel the use of real data in the book makes the subject easier (and more interesting) to understand.

From which pages do users search?

Part III – Improving your site

In the third part of the book, Rosenfeld shows how to apply the findings from your analysis. If you’ve worked with SSA before, most of it will be familiar (improving best bets, zero hits, query completion and synonyms), but even for experienced professionals there is good information about how to improve everything from site navigation to site content, and even how to connect your SSA to your site KPIs.

Conclusion

Search Analytics For Your Site shows how easy it is to get started with SSA, but also its depth and usefulness. The book is easy to read and also quite funny. It is quite short, which in this day and age isn’t a negative. This book reminded me of the importance of search analytics, and I really hope more companies and sites take its lessons to heart and focus on search analytics.

Tobias Berg

Google Search Appliance 6.10 released

May 4 - 2011 | Tobias Berg

Last week, Google released version 6.10 of the software to their Google Search Appliance (GSA).

This is a minor update and the focus of the Google teams has been bug fixes and increased stability. Looking at the release notes, there are indeed plenty of bugs that have been fixed.

However, there are also some new features in this release. Some of the more interesting, in my opinion, are:

Multiple front-end configuration for Dynamic Navigation
Since the 6.8 release, the GSA has been able to provide facets, or Dynamic Navigation as Google calls it. However, the facets have been global, so you couldn’t have two front ends with different facets. This is now possible.

More feeds statistics and Adjust PageRank in feeds
More statistics on what’s happening with the feeds you push into the GSA is a very welcome feature. The possibility to adjust PageRank allows for some more control over relevancy in feeds.

Indexing Crawl time Kerberos support and Indexing large files
Google is working hard on security and every release since 6.0 has included some security improvements. Nice to see that it continues. Since the beginning, the GSA has simply dropped files bigger than 30 MB. Now it will index larger files (you can configure how large), but still only the first 2.5 MB of the content will be indexed.

Stopword lists for different languages

Scalability Centralized configuration
For a multi-node GSA setup, you can now specify the configuration on the master and it is propagated to the slaves.

For a complete list of new features, see the New and Changed Features page in the documentation.


Svetoslav Marinov

ECIR 2011 in retrospect

April 27 - 2011 | Svetoslav Marinov

The European Conference on Information Retrieval (ECIR) 2011 took place in Dublin last week, 18-21 April. In this blog post I will try to highlight some of the papers and talks from the conference that caught my attention, and back this up with what other attendees said about them.

First, I was intrigued by the session on evaluation for IR and especially the topic of Crowdsourcing. In my opinion, the paper A Methodology for Evaluating Aggregated Search Results, which also got the prize for best student paper, was among the most pedagogically presented ones. It deals with the task of incorporating search results from a number of different sources, called verticals, into Web search results. Using a small number of human judgements for a given query, the authors present a way to evaluate any possible permutation of verticals in the result presentation. I think that this methodology should be adopted in the world of Enterprise search, since it is exactly there that we crawl, index and present information from a number of different sources – Web, databases, fileshares, etc. The prerequisites are really minimal and low cost, but the return value, the user experience, seems quite high.

Amazon Mechanical Turk, or the Artificial Artificial Intelligence, which is the marketplace for Crowdsourcing, provides a way to perform evaluation, relevance assessment or any task for which you would need human judgements, for a ridiculously small sum of money. Leaving aside ethical issues, two papers at the conference presented ways to utilize this service for some IR tasks.

Evgeniy Gabrilovich from Yahoo! Research, who won the Karen Spärck Jones Award for 2010, gave a very interesting keynote talk on Computational Advertising. Up to now, it had never struck me how hard advertising in Information Retrieval systems actually is. I liked one of his points on the future of ads: by using product feeds, one can automatically create product descriptions via Text Summarization and Natural Language Generation and index these, thus avoiding bid words.

Another interesting and very pedagogically presented paper was about the gensim package by Radim Řehůřek. I definitely think we can use it in some of our projects. In general, text categorization and IR for social networks were the dominant tracks. In one of the social network tracks, Oscar Täckström presented a neat way of discovering fine-grained sentiment where some coarse-grained supervision is available. It really made me want to try it for any of our customers where sentiment analysis is required.

Thorsten Joachims, the last of the keynote speakers, gave a very inspiring talk on The Value of User Feedback. He put forward the idea of designing retrieval systems for feedback. Instead of just looking at the click logs after the fact, one can think of a system that uses click feedback to learn, thus creating a better ranker for a given query and a given user need. Within a single session, we can use click feedback to disambiguate the query and deliver results on the fly that are of immediate benefit to the users.

Unfortunately, I have probably missed other interesting presentations, but with two parallel sessions and several workshops there was a limit to what I could devour. What surprised me, though, was that there were very few papers from the industry. We do try to solve exactly the same problems and tackle the same issues as academia. We at Findwise have constantly flagged the huge benefit of good, relevant metadata for achieving better search performance, which was also touched upon in the paper “Topic Classification in Social Media using Metadata from Hyperlinked Objects”.

It was really great to visit Dublin and attend ECIR 2011. It was an inspiring conference and I do believe that at the next ECIR we from Findwise can be on the podium, sharing our knowledge and hands-on experience of Enterprise search and IR.

Sláinte!

Delivering information where it’s needed

April 7 - 2011 | David Ronnqvist

I recently started working at Findwise after having finished my thesis on location-based information delivery in a mobile phone. The purpose of my thesis was to:

  • Investigate how location-based information (as opposed to fixed locations) could be connected to search results
  • Improve quality of location-based information by considering the course and velocity of the user

To start with, I created an iPhone application with a location-based reminder system. The reminders described location constraints and users could create reminders with single locations (at home) or groups of locations (at any pharmacy). To find these groups of locations, the system searched for locations with associated information (like nearby pharmacies) and delivered this information without users having to click Search repeatedly.

This is an unusual approach to search, as the user is passive and the system performs the searches on the user's behalf. However, to make search results relevant one has to add contextual constraints to describe when, where and to whom a piece of information is relevant. When all constraints are met, the information should be relevant. If not, the system lacks some crucial contextual constraints.

When search is automated, the importance of relevant search results increases, and the more you know of the user’s world, the better you can adjust the results. However, traditional search can also benefit from contextual information. It can be used as a filter, where search results that are irrelevant in the current context are removed. Alternatively, it could be a part of the relevance model, improving search results by reordering them according to context. Hence, although automatic information delivery is probably undesirable for many types of information, contextual constraints can still be of good use!
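The two ways of using context described above, as a filter and as a part of the relevance model, can be contrasted with a small Python sketch. This is a simplified illustration and not the code from my thesis: the result dictionaries, the "score" and "distance_km" fields and the distance weight are all assumptions made for the example.

```python
def filter_by_context(results, max_distance_km):
    """Context as a filter: drop results that are too far away to be relevant."""
    return [r for r in results if r["distance_km"] <= max_distance_km]


def rerank_by_context(results, distance_weight=0.1):
    """Context in the relevance model: penalise the text score by distance."""
    return sorted(
        results,
        key=lambda r: r["score"] - distance_weight * r["distance_km"],
        reverse=True,
    )


results = [
    {"title": "Pharmacy across town", "score": 2.0, "distance_km": 30.0},
    {"title": "Pharmacy around the corner", "score": 1.5, "distance_km": 1.0},
]
nearby_only = filter_by_context(results, max_distance_km=10.0)
reordered = rerank_by_context(results)
```

The filter removes the distant result completely, while the reranking keeps it but moves it down the list, which is the safer choice when the context information is uncertain.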

The people who tested my application created 25% of their reminders as groups of locations and found it useful as it helped them find places they weren’t aware of, facilitating opportunistic behavior. The course and velocity information reduced the number of false-positive information deliveries. Overall, the system worked well as a niche product.


Tobias Berg

Solr 3.1 released

April 5 - 2011 | Tobias Berg

Last Friday, Solr 3.1 was released along with Lucene 3.1. This might seem like a big step from the previous version, 1.4.1, but it is an effect of the merge of Solr and Lucene development that took place a year ago. The Solr version now reflects the Lucene version that is used.

For a complete list of new features and enhancements, you can read the release notes. Some of the most interesting features, though, are:

  • Extended dismax (edismax) query parser. It’s an enhancement over dismax, supports full Lucene query syntax etc.
  • Spatial search (i.e., we can now enable geo-search; sort by distance, boost by distance etc.)
  • Numeric range facets.
  • Lots of optimizations and performance improvements, including better Unicode and 64-bit JVM support.
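As a rough illustration of the spatial search feature, the Python sketch below builds a request URL that filters by radius and sorts by distance, using the geofilt filter and the geodist() function with the sfield, pt and d parameters. The Solr URL and the field name "store" are assumptions for the example (the field would need to be a location type in your schema), so treat it as a starting point rather than a tested configuration.

```python
from urllib.parse import urlencode


def build_geo_query(base_url, text, lat, lon, radius_km, field="store"):
    """Build a Solr select URL that filters by radius and sorts by distance."""
    params = {
        "q": text,
        "defType": "edismax",      # the extended dismax parser mentioned above
        "sfield": field,           # spatial field (hypothetical name)
        "pt": f"{lat},{lon}",      # reference point as "lat,lon"
        "d": radius_km,            # radius in kilometres
        "fq": "{!geofilt}",        # keep only documents within d km of pt
        "sort": "geodist() asc",   # nearest documents first
    }
    return f"{base_url}/select?{urlencode(params)}"


url = build_geo_query("http://localhost:8983/solr", "pharmacy", 57.7, 11.97, 5)
```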

Update: There’s a good list of features and enhancements at Sematext’s blog.

I’m really keen on the Spatial Search, which opens up a new set of applications, especially for Mobile Search where you have the advantage of knowing the position of the user.

I’m glad the community pulled off this release after the merge with Lucene and it will be fun to start working with it. What’s your favorite feature in 3.1? Drop a comment!


Daniel Ling

Open source tools for text analytics

March 21 - 2011 | Daniel Ling

Recently, both clients of Findwise and the Enterprise Search community in general have been showing increasing interest in text analytics as a way to get higher business value out of their (often large) volumes of unstructured information.

Text Analytics merges techniques from linguistics, computer science, machine learning and statistics, and many of the central algorithms in this field are publicly available as open source tools and packages with easily accessible APIs. While many customers of commercial Enterprise Search solutions, such as Autonomy, IBM Omnifind, Microsoft FAST ESP, etc., have long benefitted from some sort of Text Analytics (e.g. Entity Extraction, Keyword Extraction and document summarization), the open source components have now come a long way in providing alternative, free-of-charge solutions with similar performance and feature sets.
As every modern enterprise search architecture today has some kind of document processing that is extensible by additional stages or APIs (for example the Open Pipeline with Solr or the pipeline that comes with Microsoft FAST), the opportunity for plugging new text analytics stages into existing search implementations is open and ready for new innovation.

Among the most popular applications of text analytics that have emerged lately are customized entity extraction, sentiment analysis and document classification, each with a set of open source alternatives (such as Balie, OpenNLP and GATE) readily available for customization and implementation in your document processing.

Regardless of your industry domain, these techniques open up a wide variety of new ways to interpret the content and discover new trends in your unstructured textual data, be it through sentiment analysis to support the decision-making process, trend analysis or the relevance model of search, entity extraction in order to navigate your content by entities (such as company name or person), the enhancement of your texts by metadata tagging, or finding similar and related content.

How are you taking advantage of modern text analytics?

Mahmood Ahmad

Findability on an e-commerce site

March 13 - 2011 | Mahmood Ahmad

Findability on any e-commerce site is a beast all on its own. What if visitors’ searches return no results? Will they continue to search or did you lose your chance at a sale?

While product findability is a key factor of success in e-commerce, it is predominantly enabled by simple search alone. And while simple search usually doesn’t fulfill complex needs among users, website developers and owners still regard advanced search as just another boring to-do item during development. Owners won’t go so far as to leave it out, because every e-commerce website has some kind of advanced search functionality, but they probably do not believe it brings in much revenue.

Research shows:
-    50% of online buyers go straight to the search function
-    34% of visitors leave the site if they can’t find an (available) product
-    Buyers are more likely than Browsers to use search (91%)

What can’t be found, can’t be bought:
-    Search is often mission critical in e-commerce
-    Users don’t know how to spell
-    Users often don’t even know how to describe it

First of all, Findability can accelerate the sales process. And faster sales can increase conversions, because you will not be losing customers who give up trying to find products. Furthermore, fast, precise and successful searches increase your customers’ trust.

On both e-commerce and shopping comparison sites, users can find products in two different ways: searching and browsing. Searching obviously means using the site search, whilst browsing involves drilling down through the categories provided by the website. The most common location for a site search on e-commerce sites is at the top of the page, generally on the right side. Many e-commerce sites have the site search, user login, and shopping cart info all located in the same general area. Keeping the site search in this common location makes it easier to find for visitors who are accustomed to the convention.

Faceted search should be the de facto standard for an e-commerce website. The user performs a simple search first and then, on the results page, can narrow the search through a drill-down link (for a single choice) or a check box selection (for multiple non-overlapping choices). The structure of the search results page must also be crystal clear. The results must be ranked by relevance in a logical order (i.e. logical for the user, not for you). Users should be able to scan and comprehend the results easily. Queries should be easy to refine and resubmit, and the search results page should show the query itself.
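The drill-down idea can be sketched in a few lines of Python. This is a simplified illustration with made-up product data and function names; a real search engine computes the facet counts for you as part of the query.

```python
from collections import Counter


def facet_counts(products, field):
    """Count how many products in the result set share each value of a field."""
    return Counter(p[field] for p in products if field in p)


def drill_down(products, field, value):
    """Narrow the result set to products matching the chosen facet value."""
    return [p for p in products if p.get(field) == value]


search_results = [
    {"name": "Trail shoe", "brand": "Acme", "color": "red"},
    {"name": "Road shoe", "brand": "Acme", "color": "blue"},
    {"name": "Court shoe", "brand": "Zenith", "color": "red"},
]
brands = facet_counts(search_results, "brand")       # shown as drill-down links
acme_only = drill_down(search_results, "brand", "Acme")
```

Showing the count next to each facet value tells the user up front how many products remain after the click, so a drill-down never leads to an empty page.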

Spell-check is also crucial. Many products have names that are hard to remember or type correctly. Users might think to correct their misspelling when they find poor results, but they will be annoyed at having to do that… or worse, they might think that the website either doesn’t work properly or does not have their product.
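A very basic "did you mean" suggestion can be sketched with Python's standard difflib module. The catalogue of product names is made up for the example, and a production spell-checker would normally be built from the search index and query logs rather than a fixed list.

```python
import difflib


def suggest(query, known_terms, cutoff=0.6):
    """Return the closest known term to the query, or None if nothing is close."""
    matches = difflib.get_close_matches(query.lower(), known_terms, n=1, cutoff=cutoff)
    return matches[0] if matches else None


catalog = ["iphone", "ipad", "headphones"]
did_you_mean = suggest("ipohne", catalog)
```

With a suggestion like this the site can either show a "did you mean" link or run the corrected query directly, so the user never has to retype the product name.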

Query completion can reduce the problems caused by mistyping or not knowing the proper terminology. Completion works from the first characters of a query, so it is crucial that those characters can be entered unambiguously.
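Prefix-based query completion can be sketched with a sorted list of terms and a binary search. This is a minimal illustration with a made-up term list; real implementations usually also rank the suggestions by popularity.

```python
import bisect


def complete(prefix, sorted_terms, limit=5):
    """Return up to `limit` terms from a sorted list that start with the prefix."""
    prefix = prefix.lower()
    start = bisect.bisect_left(sorted_terms, prefix)  # first term >= prefix
    suggestions = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match either
        suggestions.append(term)
        if len(suggestions) == limit:
            break
    return suggestions


terms = sorted(["ipod", "iphone", "headset", "ipad", "headphones"])
suggestions = complete("ip", terms)
```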

Search analytics, contextual advertising and behavioral targeting are about more than just finding a page or a product. When people search, they tell you something about their interests, time, location and what is in demand right now. They also say something about search quality by the way they navigate and click in result pages and, finally, by what they do after they have found what they were looking for.

A good e-commerce solution uses search technology to:

-    Dynamically tailor a site to suit the visitors’ interests
-    Help the user to find and explore
-    Relate information and promote up- and cross sales
-    Improve visitor satisfaction
-    Increase stickiness
-    Increase sales of related products or accessories
-    Inspire visitors to explore new products/areas
-    Provide increased understanding of visitor needs/preferences

–> Convert visitors into returning customers!

Caroline Abrahamsson

Search conferences 2011

March 3 - 2011 | Caroline Abrahamsson

During 2011 a large number of search conferences will take place all over the world. Some of them are dedicated to search, whereas others discuss the topic related to specific products, information management, usability etc.

Here are a few that might be of interest for those of you looking to be inspired and broaden your knowledge. Within a few weeks we will compile all the research-related conferences – there are quite a few of them out there!
If there is anything you miss, please post a comment.

March
IntraTeam Event Copenhagen 2011
Main focus: Social intranets, SharePoint and Enterprise Search
March 1, 2 and 3, 2011, Copenhagen, Denmark

Webcoast
Main focus: A web event that is an unconference, meaning that the attendees themselves create the program by presenting on topics of their own expertise and interest.
March 18-20 , Gothenburg, Sweden

Info360
Main focus: Business productivity, Enterprise Content Management, SharePoint 2010
March 21-24, Walter E. Washington Convention Center, Washington, USA

April
International Search Summit Munich
Main focus: International search and social media.
4th April 2011, Hilton Munich Park Hotel, Germany

ECIR 2011: European Conference on Information Retrieval
Main focus: Presentation of new research results in the field of Information Retrieval
April 18-21, Dublin, Ireland

May
Enterprise Search Summit Spring 2011
Main focus: Develop, implement and enhance cutting-edge internal search capabilities
May 10-11, New York, USA

International Search Summit: London
Main focus: International search and social media
May 18th, Millennium Gloucester Hotel, London, England

Lucene Revolution
Main focus: The world’s largest conference dedicated to open source search.
May 25-26, San Francisco Airport Hyatt Regency, USA

SharePoint Fest – Denver 2011
Main focus: In search track: Enterprise Search, Search & Records Management, & FAST for SharePoint
May 19-20, Colorado Convention Center, USA

June
International Search Summit Seattle
Main focus: International search and social media
June 9th, Bell Harbor Conference Center, Seattle, USA

2011 Semantic Technology Conference
Main focus: Semantic technologies – including Search, Content Management, Business Intelligence
June 5-9, Hilton Union Square, San Francisco, USA

October
SharePoint Conference 2011
Main focus: SharePoint and related technologies
October 3-6, Anaheim, California, USA

November
Enterprise Search Summit Fall
Main focus: How to implement, manage, and enhance search in your organization
November 1-3, integrated with the KMWorld Conference, SharePoint Symposium and Taxonomy Boot Camp

KM-world
(Co-located with Enterprise Search Summit Fall, Taxonomy Boot Camp and SharePoint Symposium)
Main focus: Knowledge creation, publishing, sharing, finding, mining, reuse etc
November 1 – 3, Washington Marriott Wardman Park, Washington DC, USA

Gilbane Group Boston
Main focus: Within search: semantic, mobile, SharePoint, social search
November 29 – December 1, Boston, USA