<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Findability blog &#187; Information quality</title>
	<atom:link href="http://findabilityblog.se/category/information-quality/feed/" rel="self" type="application/rss+xml" />
	<link>http://findabilityblog.se</link>
	<description>the search and findability blog</description>
	<lastBuildDate>Fri, 03 Feb 2012 11:49:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Metadata: What is it and what is it good for?</title>
		<link>http://findabilityblog.se/metadata-what-is-it-and-what-is-it-good-for/</link>
		<comments>http://findabilityblog.se/metadata-what-is-it-and-what-is-it-good-for/#comments</comments>
		<pubDate>Fri, 03 Sep 2010 12:50:17 +0000</pubDate>
		<dc:creator>Tobias Berg</dc:creator>
				<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[Findability]]></category>
		<category><![CDATA[Information management]]></category>
		<category><![CDATA[Information quality]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://findabilityblog.se/?p=2242</guid>
		<description><![CDATA[After reading a blog post explaining the word stemming, I started thinking about other words that are commonly used in a Findability solution and might need some explanation. The word that first came to my mind was &#8220;Metadata&#8221;. It&#8217;s inevitable to talk about Metadata when you&#8217;re talking about Findability. So what is Metadata and why do [...]]]></description>
			<content:encoded><![CDATA[<div>After reading a blog post explaining the word <a href="http://www.enterprisesearchblog.com/2010/08/-todays-search-term-stemming.html">stemming</a>, I started thinking about other words that are commonly used in a Findability solution and might need some explanation. The word that first came to my mind was &#8220;Metadata&#8221;. It&#8217;s inevitable to talk about Metadata when you&#8217;re talking about Findability. So what is Metadata and why do we need it?</div>
<p>According to Wikipedia, metadata is defined as <a href="http://en.wikipedia.org/wiki/Metadata">data about data</a>. That might sound a bit abstract, but what it means is that metadata provides a bit more information about some content, whether it&#8217;s a piece of text, an image, a video or something else. For a text, metadata can be the file format it&#8217;s stored in (plain text, Word, PDF, etc.), and for an image, metadata can be the resolution of the image.</p>
<p>Metadata can be divided into <a href="http://en.wikipedia.org/wiki/Metadata#Metadata_types">different types</a>. Exactly what the types are is not set, but I like to think of metadata as either a) technical or b) descriptive.</p>
<p>Technical metadata represents &#8220;hard&#8221; types assigned automatically by systems, such as file type, file size, creation date, encoding etc. Descriptive metadata represents more &#8220;soft&#8221; metadata assigned by humans, such as author, title, summary, keywords, category etc.</p>
<p>Technical metadata is often a finite set that can be common across organisations, whereas descriptive metadata is more related to the organisation&#8217;s needs and structure.</p>
<p>So with all this talk about metadata, why do we need to worry about it in a findability solution? Well, since metadata tells us a bit more about our content, we should use it to help our users find their information quicker. I like to think that metadata can be used in at least three ways in a findability solution: relevance influence, navigation, and result presentation.</p>
<p>So if you define descriptive metadata that makes sense to the users in your organisation, they are very likely to assign it to the content they create. When content has a high degree of metadata assigned, you can use this to help users navigate to the content by using the metadata instead of a fixed folder-like structure. When searching, you can tune the relevance so that if the user&#8217;s query matches content in the metadata of the document, it is ranked higher than other documents.</p>
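<p>The relevance tuning described above could be sketched roughly as follows. This is only an illustration, not how any particular search engine does it: the field names, the weights and the functions <code>score</code> and <code>rank</code> are assumptions made up for the example.</p>

```python
# Hypothetical field weights: a match in descriptive metadata counts
# more than a match in the body text. The numbers are illustrative only.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}


def score(document, query):
    """Sum the weights of the fields in which every query term occurs."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = document.get(field, "").lower().split()
        if all(term in words for term in terms):
            total += weight
    return total


def rank(documents, query):
    """Return the matching documents, highest score first."""
    scored = [(score(doc, query), doc) for doc in documents]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in hits]
```

<p>With these weights, a document that has the query term in its title and keywords ends up above a document that only mentions it in the body text.</p>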
<p>The important thing about metadata is that if you can make users assign it to their content it can be used in many different ways and applications to help people find their content quickly.</p>
]]></content:encoded>
			<wfw:commentRss>http://findabilityblog.se/metadata-what-is-it-and-what-is-it-good-for/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search and content quality &#8211; ways of improving your intranet</title>
		<link>http://findabilityblog.se/search-and-content-quality-ways-of-improving-your-intranet/</link>
		<comments>http://findabilityblog.se/search-and-content-quality-ways-of-improving-your-intranet/#comments</comments>
		<pubDate>Sun, 28 Mar 2010 12:53:23 +0000</pubDate>
		<dc:creator>Caroline Abrahamsson</dc:creator>
				<category><![CDATA[Information quality]]></category>
		<category><![CDATA[Intranet]]></category>
		<category><![CDATA[Relevancy]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://findabilityblog.se/?p=1903</guid>
		<description><![CDATA[If you have 6 minutes to spare I would recommend you to watch this interview with Gabriel Olsson from Tetra Pak. During the last years Tetra Pak has been working strategically with turning their intranet into something true end user-centric. By actually asking the employees what they expect to find and what sort of information [...]]]></description>
			<content:encoded><![CDATA[<p>If you have 6 minutes to spare, I recommend watching <a title="Interview Gabriel Olsson, Tetra Pak" href="http://my.intrateam.dk/gb/node/3539" target="_blank">this interview</a> with Gabriel Olsson from Tetra Pak. Over the last few years Tetra Pak has been working strategically on turning their intranet into something truly end-user-centric.</p>
<p>By actually asking the employees what they expect to find and what sort of information would make their everyday work (tasks) more efficient, Tetra Pak has managed to create a navigation structure based on facts reflecting these needs. The method used is Gerry McGovern&#8217;s <a title="Gerry McGovern Custome Carewords" href="http://www.customercarewords.com/what-it-is.html" target="_blank">Task based Customer Carewords</a>.<br />
...and the result?<br />
The ones that scream the loudest are not the most important – the needs of the employees are.</p>
<p>Gabriel also talks about the importance of following up on search with key matches and synonyms.<br />
This, together with content quality initiatives, helps create a solid foundation for search, the simple reasons being:</p>
<div id="attachment_1906" class="wp-caption alignright" style="width: 144px"><a href="http://media.findabilityblog.se/2010/03/navigators2111.jpg"><img class="size-full wp-image-1906 " title="Navigators" src="http://media.findabilityblog.se/2010/03/navigators2111.jpg" alt="" width="134" height="194" /></a><p class="wp-caption-text">Use metadata to filter search results (note, not a Tetra Pak picture)</p></div>
<ul>
<li>If the quality of the information is good (clear headings, good metadata, frequent keywords), the information found through search will be good as well. If you have a lot of old content and duplicates, this will be just as visible, making it hard for the users to determine what is of high quality and trustworthy. Good quality will also make it possible to group and categorize information.</li>
</ul>
<ul>
<li>Synonyms make it easy to adjust the corporate language to the one used by the employees. Let people search for “report” when they want to find a &#8220;bulletin&#8221;. A simple synonym list, based on search statistics, will make users find what they want without thinking about how to phrase the query. The synonyms can be used in the background (without the user&#8217;s knowledge) or as &#8216;did you mean&#8217; suggestions:
<div id="attachment_1904" class="wp-caption aligncenter" style="width: 310px"><a href="http://media.findabilityblog.se/2010/03/did-you-mean11.jpg"><img class="size-medium wp-image-1904 " title="Did you mean" src="http://media.findabilityblog.se/2010/03/did-you-mean1-300x37.jpg" alt="" width="300" height="37" /></a><p class="wp-caption-text">Synonyms used for &#39;Did you mean&#39; functionality (note, not a Tetra Pak picture)</p></div>
<p style="text-align: center;">
</li>
</ul>
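<p>Both uses of a synonym list mentioned above, silent expansion in the background and &#8216;did you mean&#8217; suggestions, can be sketched in a few lines. The synonym list and the function names are hypothetical and only serve as an illustration; a real list would be built from your own search statistics.</p>

```python
# Hypothetical synonym list, for example derived from search statistics.
SYNONYMS = {"report": ["bulletin"], "bulletin": ["report"]}


def expand_query(query):
    """Background use: return each query term together with its synonyms."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + SYNONYMS.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def did_you_mean(query):
    """Visible use: alternative queries with one term swapped for a synonym."""
    terms = query.lower().split()
    suggestions = []
    for i, term in enumerate(terms):
        for synonym in SYNONYMS.get(term, []):
            suggestions.append(" ".join(terms[:i] + [synonym] + terms[i + 1:]))
    return suggestions
```

<p>The expanded term list would be sent to the search engine as alternatives, while the suggestions would be shown to the user above the result list.</p>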
<ul>
<li>Key matches (also referred to as sponsored links, best bets or editor’s pick) are used to manually force the first hit in the search result list to refer to a specific page or document. By following up on search statistics and knowing what sort of information is most frequently asked for, it is easy to adjust the search result list. However, this takes time and effort to follow up.</li>
</ul>
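<p>A key match can be thought of as a small, manually maintained lookup table that is consulted before the ordinary result list is shown. The sketch below assumes a hypothetical table <code>KEY_MATCHES</code> and an example URL; it is not taken from any specific product.</p>

```python
# Hypothetical editor-maintained key matches: query -> promoted URL.
KEY_MATCHES = {"lunch menu": "http://intranet.example.com/canteen/menu"}


def apply_key_matches(query, organic_results):
    """Put the promoted page first, without listing it twice."""
    promoted = KEY_MATCHES.get(query.strip().lower())
    if promoted is None:
        return list(organic_results)
    return [promoted] + [url for url in organic_results if url != promoted]
```

<p>The table itself is where the time and effort goes: it only stays useful if someone keeps comparing it with the search statistics.</p>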
<p>Tetra Pak is not alone when it comes to adjusting their intranet to true end-user needs. During the spring there will be a number of conferences where our customers will be sharing experiences from their initiatives. Among them are <a title="Ability partner conference" href="http://www.abilitypartner.se/intranatdagarna.aspx" target="_blank">Ability Partner</a> and the recently completed <a title="Intrateam conference" href="http://www.intrateam.se" target="_blank">IntraTeam</a>.</p>
<p>Apart from this, our own breakfast seminars are, as always, announced on <a title="Findwise Homepage" href="http://www.findwise.se" target="_blank">our homepage</a> and on <a title="Findwise on Twitter" href="http://twitter.com/Findwise" target="_blank">twitter</a>.<br />
Looking forward to seeing you!</p>
]]></content:encoded>
			<wfw:commentRss>http://findabilityblog.se/search-and-content-quality-ways-of-improving-your-intranet/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to create better search &#8211; VGR leads the way</title>
		<link>http://findabilityblog.se/how-to-create-better-search-vgr-leads-the-way/</link>
		<comments>http://findabilityblog.se/how-to-create-better-search-vgr-leads-the-way/#comments</comments>
		<pubDate>Mon, 11 Jan 2010 22:26:13 +0000</pubDate>
		<dc:creator>Caroline Abrahamsson</dc:creator>
				<category><![CDATA[Future development]]></category>
		<category><![CDATA[Information quality]]></category>
		<category><![CDATA[Internet search]]></category>
		<category><![CDATA[Intranet]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=1328</guid>
		<description><![CDATA[I realise we are a bit late. Fredrik Wackå, a senior IT-strategist, has already written an excellent article on his blog (in Swedish). He has, among other things, been interviewing Kristian Norling (at Twitter), who has been working with portal strategies and search for many years at Västra Götalands regionen. Although, for all our non-Swedish speaking [...]]]></description>
			<content:encoded><![CDATA[<p>I realise we are a bit late. Fredrik Wackå, a senior IT-strategist, has already written an excellent article on <a title="Fredrik Wackås blogg" href="http://www.wpr.se/2010/01/snabbhet-grunden-metadata-forfiningen-nar-vg-regionen-skapade-sokmotor/" target="_blank">his blog</a> (in Swedish). He has, among other things, been interviewing <a title="Kristian Norling" href="http://se.linkedin.com/in/kristiannorling" target="_blank">Kristian Norling</a> (at <a title="Kristian Norling at Twitter" href="https://twitter.com/kristiannorling" target="_blank">Twitter</a>), who has been working with portal strategies and search for many years at Västra Götalands regionen.<br />
However, for all our non-Swedish-speaking guests, here is a short summary:</p>
<p>Over the last few months Findwise has been working on a new search solution for Västra Götalands regionen. The two main goals have been to deliver a search experience that is perceived as both fast and accurate.<br />
The result?<br />
Today a search at VGR takes about 0.1-0.2 seconds, faster than a Google search on the web.</p>
<p>Furthermore, there was a need for context. Large amounts of information require ways to filter and sort – otherwise the users will drown in the result list.<br />
By giving the end-users the ability to filter and sort the search result, the users can look for general information within an area as well as quickly narrow down to a specific piece (for example, seeing only the PDF files created in 2009 in two clicks). The filters (and thereby the metadata standard) include:</p>
<p>• Information type<br />
• Where the document resides<br />
• Where it belongs in the organization<br />
• What source it has<br />
• When it was last changed<br />
• Who has written it<br />
• What format it resides in<br />
• Keywords that have been created</p>
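<p>Filtering on metadata like this amounts to keeping only the hits whose metadata fields match the chosen values. The following is a minimal sketch of the idea; the field names <code>format</code> and <code>year</code> and the function <code>filter_results</code> are assumptions for the example and say nothing about how the VGR solution is actually implemented.</p>

```python
def filter_results(results, **facets):
    """Keep only the hits whose metadata matches every selected facet."""
    return [
        doc for doc in results
        if all(doc.get(field) == value for field, value in facets.items())
    ]


# Hypothetical search hits with two metadata fields each.
hits = [
    {"title": "Care plan", "format": "pdf", "year": 2009},
    {"title": "Budget", "format": "doc", "year": 2009},
    {"title": "Old care plan", "format": "pdf", "year": 2007},
]

# Two "clicks": one on the format filter, one on the year filter.
pdfs_from_2009 = filter_results(hits, format="pdf", year=2009)
```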
<div id="attachment_1329" class="wp-caption alignleft" style="width: 310px"><a href="http://None"><img class="size-medium wp-image-1329" title="Västra Götalands regionen" src="http://www.findwise.se/wp/wp-content/vgr-300x192.jpg" alt="VGR" width="300" height="192" /></a><p class="wp-caption-text">VGR</p></div>
<p>The search solution also includes a metadata service. Like so many others, VGR has been struggling with getting the metadata in place.<br />
Apart from the metadata supported by the system (where <a title="Dublin core" href="http://www.dublincore.org/" target="_blank">Dublin Core</a> is being used), the metadata service does two things:<br />
• Analyses the content of the text, compares it to a taxonomy and gives the writer suggestions for keywords that he/she can use<br />
• Gives the writer the ability to add additional keywords</p>
<p>Apart from this, the end-users will be able to add labels (tags). These will be compared with two lists. If a tag appears in the “white list” it will be published right away; if it is in the “blacklist” it will be deleted. Anything in between is reviewed before it is published.</p>
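<p>The tag handling described above is essentially a three-way decision. A minimal sketch, with made-up list contents and a hypothetical function name, could look like this. Checking the blacklist first is an assumption for the case where a tag ends up on both lists.</p>

```python
# Hypothetical list contents, for illustration only.
WHITE_LIST = {"cardiology", "vaccination"}
BLACK_LIST = {"spam"}


def moderate_tag(tag):
    """Decide what happens to a tag submitted by an end-user."""
    normalised = tag.strip().lower()
    if normalised in BLACK_LIST:
        return "delete"
    if normalised in WHITE_LIST:
        return "publish"
    return "review"  # anything in between waits for a manual check
```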
<p>To conclude: a lot of effort has been put into creating a good search experience and VGR continues to deliver functionality and solutions that are light-years ahead of many others. The combination of supporting systems and using the &#8220;collected intelligence&#8221; of the writers and end-users will make it even better over time.<br />
Search is about supporting systems, content and people alike.</p>
<p>Read more in <a title="Fredrik Wackås blogg" href="http://www.wpr.se/2010/01/snabbhet-grunden-metadata-forfiningen-nar-vg-regionen-skapade-sokmotor/" target="_blank">Fredrik Wackå&#8217;s blog</a></p>
]]></content:encoded>
			<wfw:commentRss>http://findabilityblog.se/how-to-create-better-search-vgr-leads-the-way/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>To Crawl or not to Crawl</title>
		<link>http://findabilityblog.se/to-crawl-or-not-to-crawl/</link>
		<comments>http://findabilityblog.se/to-crawl-or-not-to-crawl/#comments</comments>
		<pubDate>Fri, 11 Dec 2009 08:37:03 +0000</pubDate>
		<dc:creator>Tobias Berg</dc:creator>
				<category><![CDATA[Connector]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Information quality]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=1248</guid>
		<description><![CDATA[Having an Enterprise Search Engine, there are basically two ways of getting content into the index: using a web crawler or a connector. Both methods have their advantages and disadvantages. In this post I&#8217;ll try to pinpoint the differences between the two methods. Web crawler Most systems of today have a web-interface. Let it be [...]]]></description>
			<content:encoded><![CDATA[<p>Having an Enterprise Search Engine, there are basically two ways of getting content into the index: using a web crawler or a connector. Both methods have their advantages and disadvantages. In this post I&#8217;ll try to pinpoint the differences between the two methods.</p>
<p><strong>Web crawler</strong></p>
<p>Most systems of today have a web-interface. Whether it is your time reporting system, intranet or document management system, you probably access it with your web browser. Because of this, it&#8217;s very easy to use a web crawler to index this content as well.</p>
<p>The web crawler indexes the pages by starting at one page. From there, it follows all outbound links and indexes those pages. From those pages, it follows all links, and so on. This process continues until all links on the web site have been followed and the pages indexed. The crawler thus uses the same technique as a human: visiting a page and clicking the links.</p>
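<p>The link-following process can be sketched as a breadth-first traversal. To keep the example self-contained, the &#8220;web site&#8221; below is just a hypothetical dictionary of pages and their links; a real crawler would fetch each page over HTTP, extract the links from the HTML and hand the content to the indexer.</p>

```python
from collections import deque


def crawl(start_url, get_links):
    """Visit every page reachable from start_url, breadth first."""
    seen = {start_url}
    queue = deque([start_url])
    indexed = []
    while queue:
        url = queue.popleft()
        indexed.append(url)  # a real crawler would fetch and index here
        for link in get_links(url):
            if link not in seen:  # never follow the same link twice
                seen.add(link)
                queue.append(link)
    return indexed


# Hypothetical site: page -> outbound links.
SITE = {
    "/": ["/news", "/about"],
    "/news": ["/", "/news/1"],
    "/about": [],
    "/news/1": ["/news"],
}

pages = crawl("/", lambda url: SITE.get(url, []))
```

<p>The <code>seen</code> set is what makes the process stop: pages that link back to each other are only visited once.</p>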
<p>Most Enterprise Search Engines are bundled with a web crawler. Thus, it&#8217;s usually very easy to get started. Just enter a start page and within minutes you&#8217;ll have searchable content in your index. No extra installation or license fee is required. For some sources, this may also be the only option, e.g. if you&#8217;re indexing external sources that your company has no control of.</p>
<p>The main disadvantage, though, is that web pages are designed for humans, not crawlers. This means that there is a lot of extra information for presentation purposes, such as navigation menus, sticky information messages, headers and footers and so on. All of this makes it a more pleasant experience for the user and makes it easier to navigate on the page. The crawler, on the other hand, has no use for this information when retrieving pages. It actually reduces the information quality in the index. For example, a navigation menu will be displayed on every page, so the crawler will index the navigation content for all pages. So if you have a navigation item called &#8220;Customers&#8221; and a user searches for customers, he/she will get a hit on ALL pages in the index.</p>
<p>There are ways to get around this, but it requires either altering the produced HTML or making adjustments in the search engine. Also, if the design of the site changes, you have to make these adjustments again.</p>
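<p>One possible adjustment on the search engine side is to drop text blocks that occur on most pages before indexing. The sketch below is only one way of doing it and rests on assumptions: each page has already been split into text blocks, and the threshold <code>max_share</code> is an arbitrary example value.</p>

```python
from collections import Counter


def strip_repeated_blocks(pages, max_share=0.5):
    """Remove text blocks that occur on more than max_share of the pages."""
    counts = Counter()
    for blocks in pages.values():
        counts.update(set(blocks))  # count each block once per page
    limit = max_share * len(pages)
    return {
        url: [block for block in blocks if counts[block] <= limit]
        for url, blocks in pages.items()
    }


# Hypothetical crawled pages, already split into text blocks.
crawled = {
    "/a": ["Customers", "Quarterly results"],
    "/b": ["Customers", "Travel policy"],
    "/c": ["Customers", "Our customers in Norway"],
}
cleaned = strip_repeated_blocks(crawled)
```

<p>After cleaning, a search for &#8220;customers&#8221; would only match the page that actually talks about customers, not every page carrying the menu item.</p>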
<p><strong>Connector</strong></p>
<p>Even though the majority of systems have a web-interface, the content is stored in a data source of some format. It might be a database, a structured file system, etc. By using a connector, you connect either to the underlying data source or to the system directly through its programming API.</p>
<p>Using a connector, the search engine does not get any presentation information but only the pure content, making the information quality in the index better. The connector can also retrieve all metadata associated with the information, which further increases the quality. Often, you&#8217;ll also have more fine-grained control over what will be indexed with a connector than with a web crawler.</p>
<p>However, using a connector requires more configuration. It might also cost some extra money to buy one for your system, and require additional hardware. Once set up, though, it&#8217;s most likely to produce more relevant results compared to a web crawler.</p>
<p>The bottom line is that it&#8217;s a trade-off between quality and cost, like most decisions in life <img src='http://findabilityblog.se/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://findabilityblog.se/to-crawl-or-not-to-crawl/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Do you know something I don’t? The art of benchmarking</title>
		<link>http://findabilityblog.se/do-you-know-something-i-don%e2%80%99t-the-art-of-benchmarking/</link>
		<comments>http://findabilityblog.se/do-you-know-something-i-don%e2%80%99t-the-art-of-benchmarking/#comments</comments>
		<pubDate>Tue, 01 Dec 2009 17:22:58 +0000</pubDate>
		<dc:creator>Caroline Abrahamsson</dc:creator>
				<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Information quality]]></category>
		<category><![CDATA[Internet search]]></category>
		<category><![CDATA[Intranet]]></category>
		<category><![CDATA[Market trends]]></category>
		<category><![CDATA[Relevancy]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=1253</guid>
		<description><![CDATA[During the autumn we have been trying to keep our customers and others up to date with the search world by hosting breakfast seminars. By sharing experiences and discussing with others the participants have taken giant leaps in understanding what search can deliver in true value. The same goes for sharing experiences between companies, where [...]]]></description>
			<content:encoded><![CDATA[<p>During the autumn we have been trying to keep our customers and others up to date with the search world by hosting breakfast seminars.<br />
By sharing experiences and discussing with others the participants have taken giant leaps in understanding what search can deliver in true value.<br />
The same goes for sharing experiences between companies, where you often find yourself struggling with the same problems, regardless of business or company size.</p>
<p>We have been discussing how Enterprise search can help intranets, extranets, external sites and support centers to capitalize on their knowledge.<br />
Some of the things that have been discussed:</p>
<p><strong>…Business Cases:</strong><br />
How can search help companies save 100 million SEK/year?<br />
How do you count return on investment (ROI) for search?</p>
<p><strong>…Search functionality:</strong><br />
How and why should you work with:<br />
<strong>Key Matches</strong> to promote certain content (similar to Google’s sponsored links on the web)<br />
<strong>Synonyms</strong> (to make sure that the end-users’ language corresponds to the corporate language without having to change the information)<br />
<strong>Query completion and suggestion</strong> to give the user an overview of what other people have been searching for when they start to type (similar to <a title="Apples web site search" href="http://www.apple.com/" target="_blank">Apple’s web site search</a>).</p>
<p><strong>…End-user experience</strong><br />
How can different interfaces serve different information needs and user-groups?<br />
How does your user interface serve your end-users?</p>
<p><strong>…Information Quality</strong><br />
Do taxonomies and folksonomies help us find information faster?<br />
Can search be used to improve the quality of your content?</p>
<p>During the spring we will continue to hold seminars, keeping you up to date. If you’re not on our mailing list, please send us <a href="info@findwise.se">an e-mail</a> and we’ll make sure you get an invitation.</p>
<p>During Wednesday and Thursday this week we will be attending the <a title="Ability konferens" href="http://www.abilitypartner.se/intranat-2_0-och-verksamhetsportaler.aspx" target="_blank">Ability conference</a> to discuss search. Hope to see you there!</p>
]]></content:encoded>
			<wfw:commentRss>http://findabilityblog.se/do-you-know-something-i-don%e2%80%99t-the-art-of-benchmarking/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Enterprise Search 2.0?</title>
		<link>http://findabilityblog.se/enterprise-search-20/</link>
		<comments>http://findabilityblog.se/enterprise-search-20/#comments</comments>
		<pubDate>Mon, 30 Nov 2009 16:15:01 +0000</pubDate>
		<dc:creator>Christopher Wallstrom</dc:creator>
				<category><![CDATA[Enterprise 2.0]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Information management]]></category>
		<category><![CDATA[Information quality]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=1250</guid>
		<description><![CDATA[While visiting Enterprise Search Summit in San Jose I realized that enabling Enterprise 2.0 within enterprise search is the hottest trend at the moment. Andrew McAfee who coined the term Enterprise 2.0 and has released a book on the subject, spoke about how to use altruism to develop the enterprise. People are wired to help [...]]]></description>
			<content:encoded><![CDATA[<p>While visiting Enterprise Search Summit in San Jose I realized that enabling Enterprise 2.0 within enterprise search is the hottest trend at the moment.</p>
<p><a href="http://andrewmcafee.org">Andrew McAfee</a>, who coined the term Enterprise 2.0 and has released a book on the subject, spoke about how to use altruism to develop the enterprise. People are wired to help, and if we stop obsessing about the risks and lower the bar for how people can help each other, it is possible to make this work within a corporate environment.</p>
<p>He also spoke about process control and workflow control: how much do we really need? Make it easy to correct mistakes instead of making it hard to make them. With regard to innovation, he pointed out that we need to question credentialism and build communities that people want to join. To leverage the intelligence within the enterprise, we should explore and experiment with collective intelligence such as prediction markets and open peer review processes. All in all, make it easy for people to interconnect.</p>
<p>The benefits mentioned were very high improvements in access to knowledge and internal experts, satisfaction, innovation and customer satisfaction.</p>
<p>I also recommend reading <a href="http://www.pwc.com/en_US/us/technology-forecast/assets/pwc-tech-forecast-summer-2008.pdf">Price Waterhouse Coopers Technology Forecast Summer 2008</a> to get a good overview of the available tools and technologies.</p>
<p>So how does this impact enterprise search? Search can be made the facilitator for Enterprise 2.0. Of course it is possible to index all blogs, wikis, tweets (yammer), online communities and social networks and make them searchable, but that is only one way to make this new environment more findable. If someone tweets or blogs about information, we should use that to influence the search results and ranking. We could also track user behavior on a site to make certain information more visible based on implicitly expressed interests.</p>
]]></content:encoded>
			<wfw:commentRss>http://findabilityblog.se/enterprise-search-20/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Findwise releases Open Pipeline plugins</title>
		<link>http://findabilityblog.se/findwise-releases-open-pipeline-plugins/</link>
		<comments>http://findabilityblog.se/findwise-releases-open-pipeline-plugins/#comments</comments>
		<pubDate>Fri, 09 Oct 2009 06:54:57 +0000</pubDate>
		<dc:creator>Karl Jansson</dc:creator>
				<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Future development]]></category>
		<category><![CDATA[Information quality]]></category>
		<category><![CDATA[Open source]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=1141</guid>
		<description><![CDATA[Findwise is proud to announce that we now have released our first publicly available plugins to the Open Pipeline crawling and document processing framework. A list of all available plugins can be found on the Open Pipeline Plugins page and the ones Findwise have created can be downloaded on our Findwise Open Pipeline Plugins page. [...]]]></description>
			<content:encoded><![CDATA[<p>Findwise is proud to announce that we have now released our first publicly available plugins for the Open Pipeline crawling and document processing framework. A list of all available plugins can be found on the <a href="http://www.openpipeline.org/plugins/">Open Pipeline Plugins page</a> and the ones Findwise has created can be downloaded on our <a href="http://www.findwise.se/findwise-open-pipeline">Findwise Open Pipeline Plugins page.</a></p>
<p><span id="more-1141"></span></p>
<p>OpenPipeline is open source software for crawling, parsing, analyzing and routing documents. It ties together otherwise incomplete solutions for enterprise search and document processing. OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules to distribute documents across a network. It includes a job scheduler and a full UI with a point-and-click interface.</p>
<p>Findwise has been using this framework in a number of customer projects with great success. It works particularly well together with Apache Solr, not only because it is open source but, most importantly, because it provides something Solr lacks &#8211; an easy-to-use framework for developing document processors and connectors. However, we are not using it for Solr only. A number of plugins for the Google Search Appliance have also been made, and we have started investigating how Open Pipeline can be integrated with the IBM Omnifind search engine as well.</p>
<p>The best thing about this framework is that it is very flexible and customizable but still easy to use AND, maybe most importantly for me as a developer, easy to work with and develop against. It has a simple yet powerful enough API to handle all that you need. And because it is an open source framework, any shortcomings and limitations that we find along the way can be investigated in detail and a better solution can be proposed to the Open Pipeline team for inclusion in future releases.</p>
<p>We have in fact already contributed a great deal to the development of the project by using it, testing it, reporting bugs and suggesting improvements on their forums. The response from the team has been very good &#8211; some of our suggested improvements have already been included and some are on the way in the new 0.8 version. We are also in the process of deepening the collaboration by signing a contributor agreement so that we will eventually be able to contribute code as well.</p>
<p>So how do our customers benefit from this?</p>
<p>First, it lets us develop and deliver search and index solutions to our customers more quickly and with better quality. This is because more developers can work with the same framework as a base, so the overall code base is used more, tested more and is thus of better quality. We can also reuse good, well-tested components so that several customers share the costs of development and thus get a better service/product for less money, which is always a good thing of course!</p>
]]></content:encoded>
			<wfw:commentRss>http://findabilityblog.se/findwise-releases-open-pipeline-plugins/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Six Simple Steps to Superior Search</title>
		<link>http://findabilityblog.se/six-simple-steps-to-superior-search/</link>
		<comments>http://findabilityblog.se/six-simple-steps-to-superior-search/#comments</comments>
		<pubDate>Thu, 08 Jan 2009 13:11:10 +0000</pubDate>
		<dc:creator>Mickel Gronroos</dc:creator>
				<category><![CDATA[Information quality]]></category>
		<category><![CDATA[Information seeking behaviour]]></category>
		<category><![CDATA[Knowledge management]]></category>
		<category><![CDATA[data quality]]></category>
		<category><![CDATA[frequent queries]]></category>
		<category><![CDATA[jargon]]></category>
		<category><![CDATA[maintenance]]></category>
		<category><![CDATA[spelling]]></category>
		<category><![CDATA[synonyms]]></category>
		<category><![CDATA[user expectations]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=496</guid>
		<description><![CDATA[Do you have your search application up and running but it still doesn’t quite seem to do the trick? Here are six simple steps to boost the search experience.]]></description>
			<content:encoded><![CDATA[<p><strong style="font-size: 1.2em;">Do you have your search application up and running but it still doesn’t quite seem to do the trick? Here are six simple steps to boost the search experience.</strong></p>
<h2 style="font-size: 1.4em; font-weight: bold;">Avoid the Garbage in-Garbage out Syndrome</h2>
<p><strong>Fact 1</strong>: <strong>A search application is only as good as the content it makes findable. </strong></p>
<p>If you have a news search service that only provides yesterday’s news, the search bit does not add any value to your offering.</p>
<p>If your Intranet search service provides access to a catalog of employee competencies, but this catalog does not cover all co-workers or contain updated contact details, then search is not the means it should be to help users get in touch with the right people.</p>
<p>If your search service gives access to a lot of different versions of the same document and there is no metadata available to single out which copy is the official one, then users might end up spending unnecessary time reviewing irrelevant search results. And you still cannot rule out the risk that they end up using old or even flawed versions of documents.</p>
<p>The key learning here is that there is no plug and play when it comes to accurate and well thought out information access. Sure, you can make everything findable by default. But you will annoy your users while doing so unless you <strong>take a moment and review your data</strong>.</p>
<h2 style="font-size: 1.4em; font-weight: bolder;">Focus on Frequent Queries</h2>
<p><strong>Fact 2: Users tend to search for the same things over and over again.</strong></p>
<p>It is not unusual for 20 % of the full query volume to be made up of less than 1 % of all query strings. In other words, people tend to use search for a rather fixed set of simple information access tasks over and over again. Typical tasks include finding the front page of a site or application on the Intranet, finding the lunch menu at the company canteen or finding the telephone number for the company helpdesk.</p>
<p>You would therefore be well advised to make sure your search application works for these highly frequent (often naïve) information access tasks. An efficient way of doing so is to keep an analytic eye on the log file of your search application and take appropriate action on frequent queries that return no results whatsoever or return weird or unexpected results.</p>
<p>The key learning here is that you should <strong>focus on providing relevant results for frequent queries</strong>. This is the least expensive way to get boosted benefit from your search application. </p>
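<p>As a rough illustration of how to check this in your own query log, the sketch below measures how much of the total query volume is covered by the most frequent query strings. It assumes the log has already been reduced to a plain list of query strings; the function name <code>head_share</code>, the sample data and the normalization (trimming and lower-casing) are assumptions made for this example, not features of any particular search platform.</p>
<pre>
```python
from collections import Counter

def head_share(queries, top_fraction=0.01):
    """Share of total query volume covered by the most frequent query strings."""
    # Normalize so that "Lunch Menu" and "lunch menu " count as the same query.
    normalized = [q.strip().lower() for q in queries if q.strip()]
    counts = Counter(normalized)
    if not counts:
        return 0.0
    # Number of distinct query strings that make up the "head" of the log.
    top_n = max(1, int(len(counts) * top_fraction))
    top_volume = sum(n for _, n in counts.most_common(top_n))
    return top_volume / len(normalized)

# Hypothetical sample log: 10 queries, 4 distinct query strings.
sample_log = ["lunch menu"] * 6 + ["helpdesk"] * 2 + ["travel expense form", "cp"]
share = head_share(sample_log, top_fraction=0.25)
print(f"Top 25 % of query strings cover {share:.0%} of the volume")
```
</pre>
<p>If a small head of query strings covers a large share of the volume, those are the queries to review by hand first.</p>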
<h2 style="font-size: 1.4em; font-weight: bolder;">Make the Information People Often Need Searchable</h2>
<p><strong>Fact 3: Users do not know what information is available through search.</strong></p>
<p>Users often believe that a search application gives them access to information that really isn’t available through search. Say your users are frequently searching for ”lunch menu”, ”canteen” and ”today’s lunch”, what do you do if you do not have the menu available at all on your Intranet or Web site?</p>
<p>In the best of worlds, you will make frequently requested information available through search. In other words, you would add the lunch menu to your site and make it searchable. If that is not an option, you might consider informing your users that the lunch menu, or some other popular information people tend to request, is not available in the search application and provide them with a hard-coded link to the canteen contractor or some other related service as a so-called “best bet” (or sponsored link as in Google web search).</p>
<p>The key learning here is to monitor what users frequently search for and <strong>make sure the search application can tackle user expectations properly</strong>.</p>
<h2 style="font-size: 1.4em; font-weight: bolder;">Adapt to the User’s Language</h2>
<p><strong>Fact 4: Users do not know your company jargon.</strong></p>
<p>People describe things using different words. Users regularly search for terms that are synonymous with, but not the same as, the terms used in the content being searched. Say your users are frequently looking for a ”travel expense form” on your Intranet search service, but the term used in your official company jargon is ”travel expenses template”. In cases like this you can build a glossary of synonyms that maps the common language terms people frequently search for to official company terms. This satisfies your users’ frequent information needs better without having to deviate from company terminology. Another way of handling the problem is to provide hand-crafted best bets (or sponsored links as in Google web search) that are triggered by certain common search terms.</p>
<p>Furthermore, research suggests that Intranet searches often contain company-specific abbreviations. A study of the query log of a search installation at one of Findwise’s customers showed that abbreviations (query strings consisting of two, three or four letters) accounted for as much as 18 % of all queries. It might therefore be worthwhile for the search application to add the spelled-out form to a query for a frequently used abbreviation. Users searching for “cp” on the Intranet would, for example, in effect see the results of the query “cp OR collaboration portal”.</p>
<p>The lesson to learn here is that you should use your query log to <strong>learn the terminology the users are using and adapt the search application accordingly</strong>, not the other way around!</p>
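<p>The glossary idea can be sketched in a few lines of Python. This is only a minimal illustration: the two dictionaries, the function name <code>expand_query</code> and the plain <code>OR</code> syntax are assumptions for the example, and a real search platform will have its own synonym configuration and query syntax (multi-word terms usually need phrase quoting, for instance).</p>
<pre>
```python
# Hypothetical glossaries; the entries mirror the examples in the text.
ABBREVIATIONS = {"cp": "collaboration portal"}
SYNONYMS = {"travel expense form": "travel expenses template"}

def expand_query(query):
    """Append the official or spelled-out term to a query, joined with OR."""
    key = query.strip().lower()
    alternatives = [key]
    for glossary in (ABBREVIATIONS, SYNONYMS):
        if key in glossary:
            alternatives.append(glossary[key])
    # Queries without a glossary entry are passed through unchanged.
    return " OR ".join(alternatives)

print(expand_query("cp"))
print(expand_query("Travel expense form"))
```
</pre>
<p>Keeping the user’s original term in the expanded query means that documents using the everyday wording are still found alongside those using the official term.</p>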
<h2 style="font-size: 1.4em; font-weight: bolder;">Help Users With Spelling</h2>
<p><strong>Fact 5: Users do not know how to spell.</strong></p>
<p>Users make spelling mistakes, and lots of them. Research suggests that 10–25 % of all queries sent to a search engine contain spelling mistakes. So <strong>turn on spellchecking</strong> in your search platform if you haven’t already! And while you are at it, make sure your search platform can <strong>handle queries containing inflected forms</strong> (e.g. “menu”, “menus”, “menu’s”, “menus’”). Those are your quick wins to boost the search experience.</p>
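<p>To give a feel for what a “did you mean” suggestion involves, here is a minimal sketch using <code>get_close_matches</code> from Python’s standard <code>difflib</code> module. The vocabulary, the function name <code>suggest</code> and the similarity cutoff are assumptions chosen for the example; the spellchecker built into a search platform will typically work from the indexed terms and use its own matching method.</p>
<pre>
```python
from difflib import get_close_matches

# Hypothetical vocabulary; in practice it might be drawn from the indexed terms.
VOCABULARY = ["menu", "canteen", "helpdesk", "template"]

def suggest(word, vocabulary=VOCABULARY, cutoff=0.7):
    """Return the closest known term, or None if the word is known or nothing is close."""
    key = word.strip().lower()
    if key in vocabulary:
        return None  # spelled correctly, nothing to suggest
    matches = get_close_matches(key, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(suggest("cantene"))
```
</pre>
<p>A higher cutoff gives fewer but safer suggestions; a lower one catches more typos at the risk of suggesting unrelated terms.</p>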
<h2 style="font-size: 1.4em; font-weight: bolder;">Keep Your Search Solution Up-To-Date</h2>
<p><strong>Fact 6: Your search application requires maintenance.</strong></p>
<p>Information sources change, and so should your search application. There is a fairly widespread misconception that a search application will maintain itself once you’ve got it up and running. The truth is you need to monitor and maintain your search solution like any other business-critical IT application.</p>
<p>A real-life example is a fairly large enterprise that decided to perform a total makeover of its internal communication process, shifting focus from the old Intranet, which was built on a web content management system, in favor of a more “Enterprise 2.0 approach” using a collaboration platform for active projects and daily communication and a document management system for closed projects and archived information.</p>
<p>The shift had many advantages, but it was a disaster for the Enterprise Search application that was only monitoring the old Intranet being phased out. Employees looking for information using the search tool would in other words only find outdated information.</p>
<p>The lesson to learn here is that <strong>the fairly large investment in efficient Findability requires maintenance</strong> in order for the search application to meet the requirements placed on it now and in the future.</p>
<h2 style="font-weight: bolder;">References</h2>
<p>100 Most Often Mispelled Misspelled Words in English &#8211; <a href="http://www.yourdictionary.com/library/misspelled.html">http://www.yourdictionary.com/library/misspelled.html</a></p>
<p>Definition of “sponsored link” &#8211; <a href="http://encyclopedia2.thefreedictionary.com/Sponsored+link">http://encyclopedia2.thefreedictionary.com/Sponsored+link</a></p>
]]></content:encoded>
			<wfw:commentRss>http://findabilityblog.se/six-simple-steps-to-superior-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What differentiates a good search engine from a bad one?</title>
		<link>http://findabilityblog.se/what-differentiates-a-good-search-engine-from-a-bad-one/</link>
		<comments>http://findabilityblog.se/what-differentiates-a-good-search-engine-from-a-bad-one/#comments</comments>
		<pubDate>Wed, 28 Nov 2007 10:43:07 +0000</pubDate>
		<dc:creator>Maria Johansson</dc:creator>
				<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Information quality]]></category>
		<category><![CDATA[Internet search]]></category>
		<category><![CDATA[Intranet]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Usability]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=52</guid>
		<description><![CDATA[That was one of the questions the UIE research group asked themselves when conducting a study of on-site search. One of the things they discovered was that the choice of search engine was not as important as the implementation. Most of the big search vendors were found in both the top sites and the bottom [...]]]></description>
			<content:encoded><![CDATA[<p>That was one of the questions the <a href="http://www.uie.com">UIE</a> research group asked themselves when conducting a study of <a href="http://www.uie.com/brainsparks/2007/11/26/usability-tools-podcast-on-site-search/">on-site search</a>. One of the things they discovered was that the choice of search engine was not as important as the implementation. Most of the big search vendors were found in both the top sites and the bottom sites.</p>
<p>So even though the choice of vendor influences what functionality you can achieve and the control you have over your content, there are other things that matter, maybe even more. The best search engine in the world will not work for you unless you configure it properly.</p>
<p><span id="more-52"></span>According to Jared Spool there are four kinds of search results:</p>
<ul>
<li> ‘Match relevant results’ &#8211;  returns the exact thing you were looking for.</li>
<li> ‘Zero results’ &#8211; no relevant results found.</li>
<li> ‘Related results’ &#8211; e.g. you search for a sweater and also get results for a cardigan. (If you know that a cardigan is a type of sweater, you are satisfied. Otherwise you just get frustrated and wonder why you got a result for a cardigan when you searched for a sweater.)</li>
<li> ‘Wacko results’ &#8211; the results seem to have nothing in common with your query.</li>
</ul>
<p>So what did the best sites do according to Jared Spool and his colleagues?<br />
They returned match relevant results, and they did not return 0 results for searches.</p>
<p>So how do you achieve that then? We have previously written about the importance of <a href="http://www.findwise.se/?cat=19#jump">content refinement</a> and <a href="http://www.findwise.se/?p=50#jump">information quality</a>. But what do you do when trying to achieve good search results with your search engine? And what if you do not have the time or knowledge to do a proper content tuning process?</p>
<p>Well, the search logs are a good way to start. Start looking at them to identify the 100 most common searches and the results they return. Are they match relevant results? It is also a good idea to look at the searches that return zero results and see if there is anything that can be done to improve those searches as well.</p>
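<p>A first pass over the search log could look something like the sketch below. It assumes each log entry can be reduced to a pair of query string and number of results returned; that format, the function name <code>review_log</code> and the sample entries are assumptions for the example, since every search engine logs differently.</p>
<pre>
```python
from collections import Counter

def review_log(entries, top_n=100):
    """Return the most common queries and the queries that returned zero results."""
    volume = Counter()
    zero_hits = Counter()
    for query, result_count in entries:
        key = query.strip().lower()
        volume[key] += 1
        if result_count == 0:
            zero_hits[key] += 1
    # Both lists are ordered from most to least frequent.
    return volume.most_common(top_n), zero_hits.most_common()

# Hypothetical log entries: (query, number of results returned).
sample = [
    ("sweater", 12), ("Sweater", 12), ("sweater", 12),
    ("cardigan", 3),
    ("jumper", 0), ("jumper", 0),
]
top_queries, zero_result_queries = review_log(sample, top_n=2)
print(top_queries)
print(zero_result_queries)
```
</pre>
<p>The first list tells you which result pages to inspect by hand, and the second tells you where users currently leave empty-handed.</p>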
<p>Jared Spool and his colleagues at UIE mostly talk about site search for e-commerce sites. For e-commerce sites, bad search results mean loss of revenue, while good search results hopefully give an increase in revenue (if other things such as checkout do not fail). When working with intranet search, the implications are a bit different.</p>
<p>With intranet search solutions the searches can be more complex, since information, not items, is what users are searching for. It might not be as easy to just add synonyms or group similar items to achieve better search results. I believe that in such a complex information universe, proper content tuning is the key to success. But looking at the search logs is a good way for you to start. And my colleagues and I here at Findwise can always help you get the most out of your search solution.</p>
]]></content:encoded>
			<wfw:commentRss>http://findabilityblog.se/what-differentiates-a-good-search-engine-from-a-bad-one/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search as a tool for information quality assurance</title>
		<link>http://findabilityblog.se/information-quality-assurance-through-search/</link>
		<comments>http://findabilityblog.se/information-quality-assurance-through-search/#comments</comments>
		<pubDate>Thu, 25 Oct 2007 15:22:42 +0000</pubDate>
		<dc:creator>Daniel Johansson</dc:creator>
				<category><![CDATA[Company]]></category>
		<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Information quality]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=50</guid>
		<description><![CDATA[Feedback from stakeholders in ongoing projects has highlighted the real need for a supporting tool to assist in the analysis of large amounts of content. This would introduce a phase where super users and information owners have the possibility to go through a quality assurance process across the information silos, before releasing information directly to [...]]]></description>
			<content:encoded><![CDATA[<p>Feedback from stakeholders in ongoing projects has highlighted the real need for a supporting tool to assist in the analysis of large amounts of content.<br />
This would introduce a phase where super users and information owners have the possibility to go through a quality assurance process across the information silos, before releasing information directly to end users.<br />
<span id="more-50"></span><br />
Using standard features contained within enterprise search platforms, great value can be delivered as well as time saved in extracting essential information. Furthermore, you have the possibility to detect key information objects that are hidden by a lack of a holistic view.</p>
<p>In this way, adapted applications can easily be built on top to support process-specific analysis needs, e.g. through entity extraction (automatic detection and extraction of names, places, dates etc.) and cross-referencing of unstructured and structured sources. The time is here to gain control of your enterprise information and turn it into knowledge.</p>
]]></content:encoded>
			<wfw:commentRss>http://findabilityblog.se/information-quality-assurance-through-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

