Archive for the ‘Information quality’ Category

Tobias Larsson Hult

Metadata: What is it and what is it good for?

September 3, 2010 | Tobias Larsson Hult
After reading a blog post explaining the word “stemming”, I started thinking about other words that are commonly used in a Findability solution and might need some explanation. The word that first came to my mind was “Metadata”. It’s inevitable to talk about Metadata when you’re talking about Findability. So what is Metadata and why do we need it?

According to Wikipedia, metadata is defined as data about data. That might sound a bit abstract, but what it means is that metadata provides a bit more information about some content, whether it’s a piece of text, an image, a video or something else. For a text, metadata can be the file format it’s stored in (plain text, Word, PDF, etc.), and for an image, metadata can be the resolution of the image.

Metadata can be divided into different types. Exactly what the types are is not set, but I like to think of metadata as either a) technical or b) descriptive.

Technical metadata represents “hard” types assigned automatically by systems, like file type, file size, creation date, encoding etc. Descriptive metadata represents more “soft” metadata assigned by humans, like author, title, summary, keywords, category etc.

Technical metadata is often a finite set that can be common across organisations, whereas descriptive metadata is more related to the organisation’s needs and structure.

So with all this talk about metadata, why do we need to worry about it in a findability solution? Well, since metadata tells us a bit more about our content, we should use it to help our users find their information quicker. I like to think that metadata can be used in at least three ways in a findability solution: relevance influence, navigation, and result presentation.

So if you define descriptive metadata that makes sense to the users in your organisation, they are very likely to assign it to the content they are creating. When content has a high degree of metadata assigned, you can use this to help users navigate to the content by using the metadata instead of a fixed folder-like structure. When searching, you can tune the relevance so that if the user’s query matches content in the metadata of the document, it is ranked higher than other documents.
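To make the relevance tuning a bit more concrete, here is a minimal sketch in Python. It is not how any particular search engine does it; the `Document` class, the field names and the weights in `FIELD_WEIGHTS` are all made up for illustration. The idea is simply that a query term found in the title or keywords counts for more than the same term found in the body text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    body: str
    keywords: list[str] = field(default_factory=list)


# Hypothetical weights: a hit in descriptive metadata counts more than a body hit.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}


def score(doc: Document, query: str) -> float:
    """Sum weighted term occurrences over title, keywords and body."""
    terms = query.lower().split()
    fields = {
        "title": doc.title.lower().split(),
        "keywords": [k.lower() for k in doc.keywords],
        "body": doc.body.lower().split(),
    }
    total = 0.0
    for name, tokens in fields.items():
        hits = sum(tokens.count(term) for term in terms)
        total += FIELD_WEIGHTS[name] * hits
    return total


def rank(docs: list[Document], query: str) -> list[Document]:
    """Return documents ordered by descending score."""
    return sorted(docs, key=lambda d: score(d, query), reverse=True)
```

With these assumed weights, a document that has “travel” and “policy” in its title and keywords ends up above one that only mentions the words in passing in the body.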

The important thing about metadata is that if you can make users assign it to their content it can be used in many different ways and applications to help people find their content quickly.

Caroline Abrahamsson

Search and content quality – ways of improving your intranet

March 28, 2010 | Caroline Abrahamsson

If you have 6 minutes to spare I would recommend that you watch this interview with Gabriel Olsson from Tetra Pak. Over the last few years Tetra Pak has been working strategically on turning their intranet into something truly end-user-centric.

By actually asking the employees what they expect to find and what sort of information would make their everyday work (tasks) more efficient, Tetra Pak has managed to create a navigation structure based on facts reflecting these needs. The method used is Gerry McGovern’s Task based Customer Carewords.
And the result?
The ones that scream the loudest are not the most important – the needs of the employees are.

Gabriel also talks about the importance of following up on search with key matches and synonyms.
This, together with content quality initiatives, helps create a solid foundation for search, the simple reasons being:

Use metadata to filter search results (note, not a Tetra Pak picture)

  • If the quality of the information is good (clear headings, good metadata, frequent keywords), the information found through search will be good as well. If you have a lot of old content and duplicates, this will be just as visible, making it hard for the users to determine what is of good quality and trustworthy. Good quality will also make it possible to group and categorize information.
  • Synonyms make it easy to adjust the corporate language to the one used by the employees. Let people search for “report” when they want to find a “bulletin”. A simple synonym list, based on search statistics, will make users find what they want without thinking about how to phrase the query. The synonyms can be used in the background (without the users’ knowledge) or as “did you mean” suggestions:

    Synonyms used for “Did you mean” functionality (note, not a Tetra Pak picture)

  • Key matches (also referred to as sponsored links, best bets or editor’s picks) are used to manually force the first hit in the search result list to refer to a specific page or document. By following up on search statistics and knowing what sort of information is most frequently asked for, it is easy to adjust the search result list. However, this takes time and effort to follow up.
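As a rough illustration of how synonyms and key matches can work together, here is a small Python sketch. The synonym table, the key match table and the URL are hypothetical examples, and the matching is deliberately naive (whole words only). A real search platform has its own configuration for both features.

```python
# Hypothetical synonym list: what users type -> terms used in the content.
SYNONYMS = {"report": ["bulletin"]}

# Hypothetical key matches: exact query -> page an editor wants on top.
KEY_MATCHES = {"lunch menu": "https://intranet.example.com/canteen/menu"}


def expand_query(query: str) -> list[str]:
    """Return the query terms plus any synonyms, in order."""
    terms = []
    for word in query.lower().split():
        terms.append(word)
        terms.extend(SYNONYMS.get(word, []))
    return terms


def search(query: str, documents: dict[str, str]) -> list[str]:
    """Return URLs of matching documents, with any key match pinned first."""
    terms = expand_query(query)
    hits = [
        url
        for url, text in documents.items()
        if any(term in text.lower().split() for term in terms)
    ]
    pinned = KEY_MATCHES.get(query.lower().strip())
    if pinned:
        hits = [pinned] + [h for h in hits if h != pinned]
    return hits
```

Here the synonyms are applied in the background. Showing them as “did you mean” suggestions instead would mean returning the expanded terms to the user interface rather than searching with them directly.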

Tetra Pak is not alone when it comes to adjusting their intranet to true end-user needs. During the spring there will be a number of conferences where our customers will be sharing experiences from their initiatives, among others Ability Partner and the recently completed IntraTeam.

Apart from this, our own breakfast seminars are, as always, announced on our homepage and on Twitter.
Looking forward to seeing you!

Caroline Abrahamsson

How to create better search – VGR leads the way

January 11, 2010 | Caroline Abrahamsson

I realise we are a bit late. Fredrik Wackå, a senior IT-strategist, has already written an excellent article on his blog (in Swedish). He has, among other things, interviewed Kristian Norling (on Twitter), who has been working with portal strategies and search for many years at Västra Götalands regionen.
However, for all our non-Swedish speaking guests, here is a short summary:

Over the last few months, Findwise has been working on a new search solution for Västra Götalands regionen. The two main goals have been to deliver a search experience that feels both fast and accurate.
The result?
Today a search at VGR takes about 0.1-0.2 seconds, faster than a Google search on the web.

Furthermore, there was a need for context. Large amounts of information require ways to filter and sort – otherwise the users will drown in the result list.
By giving the end-users the ability to filter and sort the search result, the users can look for general information within an area as well as quickly narrow down to a specific piece (for example, in two clicks, see only the PDF files created in 2009). The filters (and thereby the metadata standard) include:

• Information type
• Where the document resides
• Where it belongs in the organization
• What source it has
• When it was last changed
• Who has written it
• What format it resides in
• Keywords that have been created
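The “two clicks” example can be sketched like this in Python. This is only an illustration of the principle, not the VGR implementation; the `Hit` class and its field names are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hit:
    title: str
    file_format: str
    created: date
    author: str


def filter_hits(hits, **facets):
    """Keep hits matching every facet; the special facet 'year' matches created.year."""
    def matches(hit):
        for name, wanted in facets.items():
            actual = hit.created.year if name == "year" else getattr(hit, name)
            if actual != wanted:
                return False
        return True

    return [hit for hit in hits if matches(hit)]


def facet_counts(hits, name):
    """Count hits per value of one metadata field, e.g. to show next to each filter."""
    return Counter(getattr(hit, name) for hit in hits)
```

Each click adds one more keyword argument, so “PDF” followed by “2009” becomes `filter_hits(hits, file_format="pdf", year=2009)`. Note that this only works if the metadata is actually there on every document, which is why the filters and the metadata standard go hand in hand.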

VGR


The search solution also includes a metadata service. Like so many others, VGR has been struggling with getting the metadata in place.
Apart from the metadata supported by the system (where Dublin Core is being used), the metadata service does two things:
• Analyses the content of the text, compares it to a taxonomy and gives the writer suggestions of keywords that he/she can use
• Gives the writer the ability to add additional keywords

Apart from this, the end-users will be able to add labels (tags). These will be compared with two lists. If a tag appears in the “white list” it will be published right away; if it is in the “blacklist” it will be deleted. Anything in between is reviewed before it is published.
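The tag handling described above boils down to a three-way decision. Here is a minimal Python sketch of it; the function and list names are invented for the example, and checking the blacklist first (in case a tag ends up in both lists) is my own assumption, not something stated about the VGR solution.

```python
from enum import Enum


class Decision(Enum):
    PUBLISH = "publish"
    DELETE = "delete"
    REVIEW = "review"


def moderate(tag: str, white_list: set[str], black_list: set[str]) -> Decision:
    """Decide what happens to a user-supplied tag."""
    normalized = tag.strip().lower()
    # Assumption: the blacklist wins if a tag is in both lists.
    if normalized in black_list:
        return Decision.DELETE
    if normalized in white_list:
        return Decision.PUBLISH
    # Anything in between goes to manual review before publishing.
    return Decision.REVIEW
```

Over time, reviewed tags can be moved into one of the two lists, so the share that needs manual review shrinks.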

To conclude: a lot of effort has been put into creating a good search experience and VGR continues to deliver functionality and solutions that are light-years ahead of many others. The combination of supporting systems and using the ”collected intelligence” of the writers and end-users will make it even better over time.
Search is about both supporting systems, content and people.

Read more in Fredrik Wackå’s blog

Tobias Larsson Hult

To Crawl or not to Crawl

December 11, 2009 | Tobias Larsson Hult

With an Enterprise Search Engine, there are basically two ways of getting content into the index: using a web crawler or a connector. Both methods have their advantages and disadvantages. In this post I’ll try to pinpoint the differences between the two methods.

Web crawler

Most systems today have a web interface. Be it your time reporting system, intranet or document management system, you’ll probably access it with your web browser. Because of this, it’s very easy to use a web crawler to index this content as well.

The web crawler indexes the pages by starting at one page. From there, it follows all outbound links and indexes those pages. From those pages, it follows all links, and so on. This process continues until all links at a web site have been followed and the pages have been indexed. The crawler thus uses the same technique as a human: visiting a page and clicking the links.
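The link-following process is essentially a breadth-first traversal. The Python sketch below shows the principle on a tiny in-memory “site”; the `get_links` callback is a stand-in I made up for the part that would fetch a page and extract its links, so no real crawler API is implied.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(start: str, get_links: Callable[[str], Iterable[str]]) -> list[str]:
    """Visit every page reachable from start exactly once, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        page = queue.popleft()
        visited.append(page)  # this is where a real crawler would index the page
        for link in get_links(page):
            if link not in seen:  # never follow the same link twice
                seen.add(link)
                queue.append(link)
    return visited


# A hypothetical four-page site: page -> outbound links.
SITE = {
    "/": ["/news", "/about"],
    "/news": ["/", "/news/1"],
    "/about": ["/"],
    "/news/1": [],
}

pages = crawl("/", lambda page: SITE.get(page, []))
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, which they almost always do.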

Most Enterprise Search Engines are bundled with a web crawler. Thus, it’s usually very easy to get started. Just enter a start page and within minutes you’ll have searchable content in your index. No extra installation or license fee is required. For some sources, this may also be the only option, e.g. if you’re indexing external sources that your company has no control over.

The main disadvantage, though, is that web pages are designed for humans, not crawlers. This means that there is a lot of extra information for presentation purposes, such as navigation menus, sticky information messages, headers and footers and so on. All of this makes for a more pleasant experience for the user and makes the page easier to navigate. The crawler, on the other hand, has no use for this information when retrieving pages. It actually reduces information quality in the index. For example, a navigation menu will be displayed on every page, so the crawler will index the navigation content for all pages. So if you have a navigation item called “Customers” and a user searches for customers, he/she will get a hit on ALL pages in the index.

There are ways to get around this, but it requires either altering the produced HTML or making adjustments in the search engine. Also, if the design of the site changes, you have to make these adjustments again.
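One such adjustment is to strip the presentation parts before indexing. Below is a minimal Python sketch using the standard library’s `html.parser`. It assumes the site marks up its navigation with elements like `<nav>`, `<header>` and `<footer>`, which is far from guaranteed; on many sites you would have to key on site-specific `id` or `class` attributes instead, and that is exactly the part that breaks when the design changes.

```python
from html.parser import HTMLParser

# Assumed set of elements that carry presentation rather than content.
SKIP_TAGS = {"nav", "header", "footer", "script", "style"}


class ContentExtractor(HTMLParser):
    """Collects text that is not inside any of the SKIP_TAGS elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_content(html: str) -> str:
    """Return the indexable text of a page with navigation and footers removed."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

With this in place, a page whose menu contains “Customers” no longer matches a search for customers unless the word also appears in the actual content.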

Connector

Even though the majority of systems have a web interface, the content is stored in a data source of some format. It might be a database, a structured file system, etc. By using a connector, you connect either to the underlying data source or to the system directly through its programming API.

Using a connector, the search engine does not get any presentation information but only the pure content, making the information quality in the index better. The connector can also retrieve all metadata associated with the information which further increases the quality. Often, you’ll also have more fine-grained control over what will be indexed with a connector than a web crawler.

However, using a connector requires more configuration. It might also cost some extra money to buy one for your system, and require additional hardware. Once set up, though, it’s likely to produce more relevant results than a web crawler.

The bottom line is that it’s a trade-off between quality and cost, like most decisions in life :)

Caroline Abrahamsson

Do you know something I don’t? The art of benchmarking

December 1, 2009 | Caroline Abrahamsson

During the autumn we have been trying to keep our customers and others up to date with the search world by hosting breakfast seminars.
By sharing experiences and discussing with others the participants have taken giant leaps in understanding what search can deliver in true value.
The same goes for sharing experiences between companies, where you often find yourself struggling with the same problems, regardless of business or company size.

We have been discussing how Enterprise search can help intranets, extranets, external sites and support centers to capitalize on their knowledge.
Some of the things that have been discussed:

…Business Cases:
How can search help companies save 100 million SEK/year?
How do you count return on investment (ROI) for search?

…Search functionality:
How and why should you work with:
Key Matches to promote certain content (similar to Google’s sponsored links on the web)
Synonyms (to make sure that the end-users’ language corresponds to the corporate language without having to change the information)
Query completion and suggestion to give the user an overview of what other people have been searching for when they start to type (similar to Apple’s web site search).

…End-user experience
How can different interfaces serve different information needs and user-groups?
How does your user interface serve your end-users?

…Information Quality
Do taxonomies and folksonomies help us find information faster?
Can search be used to improve the quality of your content?

During the spring we will continue to hold seminars, keeping you up to date. If you’re not on our mailing list, please send us an e-mail and we’ll make sure you get an invitation.

During Wednesday and Thursday this week we will be attending the Ability conference to discuss search. Hope to see you there!

Christopher Wallstrom

Enterprise Search 2.0?

November 30, 2009 | Christopher Wallstrom

While visiting Enterprise Search Summit in San Jose I realized that enabling Enterprise 2.0 within enterprise search is the hottest trend at the moment.

Andrew McAfee, who coined the term Enterprise 2.0 and has released a book on the subject, spoke about how to use altruism to develop the enterprise. People are wired to help, and if we stop obsessing about the risks and lower the bar for how people can help each other, it is possible to make this work within a corporate environment.

He also spoke about process control and workflow control: how much do we really need? Make it easy to correct mistakes instead of making it hard to make them. With regard to innovation, he pointed out that we need to question credentialism and build communities that people want to join. To leverage the intelligence within the enterprise we should explore and experiment with collective intelligence, such as prediction markets and open peer review processes. All in all, make it easy for people to interconnect.

Very high improvement in access to knowledge, internal experts, satisfaction, increased innovation and customer satisfaction.

I also recommend reading Price Waterhouse Coopers’ Technology Forecast Summer 2008 to get a good overview of the available tools and technologies.

So how does this impact enterprise search? Search can be made the facilitator for Enterprise 2.0. Of course it is possible to index all blogs, wikis, tweets (Yammer), online communities and social networks and make them searchable, but that is only one way to make this new environment more findable. If someone tweets or blogs about information, we should use that to influence the search results and ranking. We could also track user behavior on a site to make certain information more visible based on implicitly expressed interests.

Karl Jansson

Findwise releases Open Pipeline plugins

October 9, 2009 | Karl Jansson

Findwise is proud to announce that we have now released our first publicly available plugins for the Open Pipeline crawling and document processing framework. A list of all available plugins can be found on the Open Pipeline Plugins page, and the ones Findwise has created can be downloaded from our Findwise Open Pipeline Plugins page.

(Read more…)

Mickel Gronroos

Six Simple Steps to Superior Search

January 8, 2009 | Mickel Gronroos

Do you have your search application up and running but it still doesn’t quite seem to do the trick? Here are six simple steps to boost the search experience.

Avoid the Garbage in-Garbage out Syndrome

Fact 1: A search application is only as good as the content it makes findable.

If you have a news search service that only provides yesterday’s news, the search bit does not add any value to your offering.

If your Intranet search service provides access to a catalog of employee competencies, but this catalog does not cover all co-workers or contain updated contact details, then search is not the means it should be to help users get in touch with the right people.

If your search service gives access to a lot of different versions of the same document and there is no metadata available as to single out which copy is the official one, then users might end up spending unnecessary time reviewing irrelevant search results. And still you cannot rule out the risk that they end up using old or even flawed versions of documents.

The key learning here is that there is no plug and play when it comes to accurate and well thought out information access. Sure, you can make everything findable by default. But you will annoy your users while doing so unless you take a moment and review your data.

Focus on Frequent Queries

Fact 2: Users tend to search for the same things over and over again.

It is not unusual that 20 % of the full query volume is made up of less than 1 % of all query strings. In other words, people tend to use search for a rather fixed set of simple information access tasks over and over again. Typical tasks include finding the front page of a site or application on the Intranet, finding the lunch menu at the company canteen or finding the telephone number to the company helpdesk.

In other words, you would be well advised to make sure your search application works for these highly frequent (often naïve) information access tasks. An efficient way of doing so is to keep an analytic eye on the log file of your search application and take appropriate action on frequent queries that do not return any results whatsoever or return weird or unexpected results.
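Keeping an eye on the log does not have to be complicated. The Python sketch below assumes the log has already been parsed into `(query, number_of_hits)` pairs, which is a format I made up for the example; real log formats differ between search platforms. It answers three questions: what are the most frequent queries, how much of the volume do they account for, and which frequent queries return nothing.

```python
from collections import Counter


def _normalize(query: str) -> str:
    return query.strip().lower()


def top_queries(log, n):
    """Return the n most frequent queries with their counts."""
    counts = Counter(_normalize(query) for query, _ in log)
    return counts.most_common(n)


def volume_share(log, n):
    """Fraction of all searches made up by the n most frequent queries."""
    counts = Counter(_normalize(query) for query, _ in log)
    total = sum(counts.values())
    top = sum(count for _, count in counts.most_common(n))
    return top / total if total else 0.0


def zero_result_queries(log, min_count=2):
    """Queries that returned no hits at least min_count times, most frequent first."""
    counts = Counter(_normalize(query) for query, hits in log if hits == 0)
    return [query for query, count in counts.most_common() if count >= min_count]


# Hypothetical log entries: (query string, number of hits returned).
LOG = [
    ("lunch menu", 0),
    ("Lunch menu", 0),
    ("helpdesk", 12),
    ("lunch menu", 0),
    ("vacation policy", 3),
]
```

In this toy log, “lunch menu” is both the most frequent query and one that never returns anything, which makes it the first thing to fix.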

The key learning here is that you should focus on providing relevant results for frequent queries. This is the least expensive way to get more benefit from your search application.

Make the Information People Often Need Searchable

Fact 3: Users do not know what information is available through search.

Users often believe that a search application gives them access to information that really isn’t available through search. Say your users are frequently searching for ”lunch menu”, ”canteen” and ”today’s lunch”, what do you do if you do not have the menu available at all on your Intranet or Web site?

In the best of worlds, you will make frequently requested information available through search. In other words, you would add the lunch menu to your site and make it searchable. If that is not an option, you might consider informing your users that the lunch menu—or some other popular information people tend to request—is not available in the search application and provide them with a hard-coded link to the canteen contractor or some other related service as a so called “best bet” (or sponsored link as in Google web search).

The key learning here is to monitor what users frequently search for and make sure the search application can tackle user expectations properly.

Adapt to the User’s Language

Fact 4: Users do not know your company jargon.

People describe things using different words. Users regularly search for terms which are synonymous with, but not the same as, the terms used in the content being searched. Say your users are frequently looking for a “travel expense form” on your Intranet search service, but the term used in your official company jargon is “travel expenses template”. In cases like this you can build a glossary of synonyms that maps the common language terms people frequently search for to the official company terms. That way you satisfy your users’ frequent information needs better without having to deviate from company terminology. Another way of handling the problem is to provide hand-crafted best bets (or sponsored links as in Google web search) that are triggered by certain common search terms.

Furthermore, research suggests that Intranet searches often contain company-specific abbreviations. A study of the query log of a search installation at one of Findwise’s customers showed that abbreviations (query strings consisting of two, three or four letters) accounted for as much as 18 % of all queries. In other words, it might be worthwhile for the search application to add the spelled-out form to a query for a frequently used abbreviation. Users searching for “cp” on the Intranet would, for example, in effect see the results of the query “cp OR collaboration portal”.
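A minimal Python sketch of that query rewrite could look like this. The abbreviation table is hypothetical apart from the “cp” example, and the `OR` syntax simply mirrors the example above; the actual operator syntax depends on your search platform.

```python
# Hypothetical table of frequent abbreviations and their spelled-out forms.
ABBREVIATIONS = {
    "cp": "collaboration portal",
    "hr": "human resources",
}


def expand_abbreviation(query: str) -> str:
    """If the whole query is a known abbreviation, add its spelled-out form."""
    key = query.strip().lower()
    full_form = ABBREVIATIONS.get(key)
    if full_form is None:
        return query  # leave all other queries untouched
    return f"{key} OR {full_form}"
```

The table itself is best built from the query log: look at the most frequent two- to four-letter queries and add the ones that turn out to be company abbreviations.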

The lesson to learn here is that you should use your query log to learn the terminology the users are using and adapt the search application accordingly, not the other way around!

Help Users With Spelling

Fact 5: Users do not know how to spell.

Users make spelling mistakes, and lots of them. Research suggests that 10-25 % of all queries sent to a search engine contain spelling mistakes. So turn on spellchecking in your search platform if you haven’t already! And while you are at it, make sure your search platform can handle queries containing inflected forms (e.g. “menu”, “menus”, “menu’s”, “menus’”). There are your quick wins to boost the search experience.
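Most search platforms ship with their own spellchecker, so you normally just switch it on. To show the principle, here is a small Python sketch built on the standard library’s `difflib`. The vocabulary and the similarity cutoff of 0.8 are assumptions for the example; a real spellchecker would use the terms in the index and something smarter than plain string similarity.

```python
import difflib

# Hypothetical vocabulary; in practice this would come from the search index.
VOCABULARY = ["menu", "canteen", "helpdesk", "travel", "expenses"]


def suggest(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Return a corrected query for a "did you mean" link, or None if nothing changes."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Pick the single closest known word, if any is similar enough.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    suggestion = " ".join(corrected)
    return suggestion if suggestion != query.lower() else None
```

Returning `None` when nothing changes makes it easy for the user interface to decide whether to show a suggestion at all.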

Keep Your Search Solution Up-To-Date

Fact 6: Your search application requires maintenance.

Information sources change, and so should your search application. There is a fairly widespread misconception that a search application will maintain itself once you’ve got it up and running. The truth is you need to monitor and maintain your search solution like any other business-critical IT application.

A real-life example is a fairly large enterprise that decided to perform a total makeover of its internal communication process, shifting focus from the old Intranet, which was built on a web content management system, in favor of a more “Enterprise 2.0 approach” using a collaboration platform for active projects and daily communication and a document management system for closed projects and archived information.

The shift had many advantages, but it was a disaster for the Enterprise Search application that was only monitoring the old Intranet being phased out. Employees looking for information using the search tool would in other words only find outdated information.

The lesson to learn here is that the fairly large investment in efficient Findability requires maintenance in order for the search application to meet the requirements placed on it now and in the future.

References

100 Most Often Mispelled Misspelled Words in English – http://www.yourdictionary.com/library/misspelled.html

Definition of “sponsored link” – http://encyclopedia2.thefreedictionary.com/Sponsored+link

Maria Johansson

What differentiates a good search engine from a bad one?

November 28, 2007 | Maria Johansson

That was one of the questions the UIE research group asked themselves when conducting a study of on-site search. One of the things they discovered was that the choice of search engine was not as important as the implementation. Most of the big search vendors were found in both the top sites and the bottom sites.

So even though the choice of vendor influences what functionality you can achieve and the control you have over your content there are other things that matter, maybe even more. Because the best search engine in the world will not work for you unless you configure it properly.

(Read more…)

Daniel Johansson

Search as a tool for information quality assurance

October 25, 2007 | Daniel Johansson

Feedback from stakeholders in ongoing projects has highlighted the real need for a supporting tool to assist in the analysis of large amounts of content.
This would introduce a phase where super users and information owners have the possibility to go through a quality assurance process across the information silos, before releasing information directly to end users.
(Read more…)