<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Findability blog &#187; Content refinement</title>
	<atom:link href="http://findabilityblog.se/category/content-refinement/feed/" rel="self" type="application/rss+xml" />
	<link>http://findabilityblog.se</link>
	<description>the search and findability blog</description>
	<lastBuildDate>Fri, 03 Feb 2012 11:49:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Solr Processing Pipeline</title>
		<link>http://findabilityblog.se/solr-processing-pipeline/</link>
		<comments>http://findabilityblog.se/solr-processing-pipeline/#comments</comments>
		<pubDate>Mon, 19 Apr 2010 13:07:25 +0000</pubDate>
		<dc:creator>Max Charas</dc:creator>
				<category><![CDATA[Connector]]></category>
		<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://findabilityblog.se/?p=1952</guid>
		<description><![CDATA[Hi again Internet, For once I have had time to do some thinking. Why is there no powerful data processing layer between the Lucene Connector Framework and Solr? I've been looking into the Apache Commons Processing Pipeline. It seems like a likely candidate to do some cool stuff. Look at the diagram below. What I'm thinking [...]]]></description>
			<content:encoded><![CDATA[<p>Hi again Internet,</p>
<p>For once I have had time to do some thinking. Why is there no powerful data processing layer between the <a title="the Lucene Connector Framework" href="http://incubator.apache.org/connectors/" target="_blank">Lucene Connector Framework</a> and Solr? I've been looking into the <a title=" the Apache Commons Processing Pipeline" href="http://commons.apache.org/sandbox/pipeline/" target="_blank">Apache Commons Processing Pipeline</a>. It seems like a likely candidate to do some cool stuff. Look at the diagram below.</p>
<div id="attachment_1953" class="wp-caption aligncenter" style="width: 310px"><a href="http://media.findabilityblog.se/2010/04/Drawing11.jpg"><img class="size-medium wp-image-1953  " src="http://media.findabilityblog.se/2010/04/Drawing1-300x148.jpg" alt="" width="300" height="148" /></a><p class="wp-caption-text">A schematic drawing of a Solr Pipeline concept. (Click to enlarge)</p></div>
<p>What I'm thinking of is a transparent Solr pipeline that speaks the Solr REST protocol on each end. This means that you would be able to use SolrJ or any other client API to communicate with the pipeline, exactly as if you were talking to Solr itself.</p>
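<p>To make the idea a bit more concrete, here is a minimal sketch in Python of what the processing core of such a pipeline might look like. It assumes that documents arrive as a JSON array of field dictionaries and leave in the same shape; the HTTP layer on each end is left out. The stage functions (trim_whitespace, add_word_count), the run_pipeline helper and the field names are made up for illustration and are not part of Solr, the Connector Framework or Commons Pipeline.</p>
<pre>
```python
import json


# Hypothetical stages: each takes one document (a dict of fields)
# and returns the processed document.
def trim_whitespace(doc):
    """Strip leading and trailing whitespace from all string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in doc.items()}


def add_word_count(doc):
    """Add a word_count field computed from the 'body' field, if present."""
    out = dict(doc)
    if isinstance(doc.get("body"), str):
        out["word_count"] = len(doc["body"].split())
    return out


def run_pipeline(update_body, stages):
    """Parse a JSON update body (a list of documents), run every document
    through each stage in order, and return a JSON body of the same shape,
    ready to be forwarded to the real update endpoint."""
    docs = json.loads(update_body)
    for stage in stages:
        docs = [stage(doc) for doc in docs]
    return json.dumps(docs)


incoming = json.dumps([{"id": "1", "title": "  Hello  ", "body": "a small test"}])
outgoing = run_pipeline(incoming, [trim_whitespace, add_word_count])
print(outgoing)
```
</pre>
<p>Because the input and output have the same shape, a client would not need to know whether it is talking to the pipeline or directly to Solr, which is the "transparent" part of the concept.</p>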
<p>Has anyone attempted this before? If you’re interested in chatting about the pipeline, drop me a mail or just grab me at Eurocon in Prague this year.</p>
]]></content:encoded>
			<wfw:commentRss>http://findabilityblog.se/solr-processing-pipeline/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Findwise releases Open Pipeline plugins</title>
		<link>http://findabilityblog.se/findwise-releases-open-pipeline-plugins/</link>
		<comments>http://findabilityblog.se/findwise-releases-open-pipeline-plugins/#comments</comments>
		<pubDate>Fri, 09 Oct 2009 06:54:57 +0000</pubDate>
		<dc:creator>Karl Jansson</dc:creator>
				<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Future development]]></category>
		<category><![CDATA[Information quality]]></category>
		<category><![CDATA[Open source]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=1141</guid>
		<description><![CDATA[Findwise is proud to announce that we have now released our first publicly available plugins for the Open Pipeline crawling and document processing framework. A list of all available plugins can be found on the Open Pipeline Plugins page and the ones Findwise have created can be downloaded on our Findwise Open Pipeline Plugins page. [...]]]></description>
			<content:encoded><![CDATA[<p>Findwise is proud to announce that we have now released our first publicly available plugins for the Open Pipeline crawling and document processing framework. A list of all available plugins can be found on the <a href="http://www.openpipeline.org/plugins/">Open Pipeline Plugins page</a> and the ones Findwise have created can be downloaded on our <a href="http://www.findwise.se/findwise-open-pipeline">Findwise Open Pipeline Plugins page.</a></p>
<p><span id="more-1141"></span></p>
<p>OpenPipeline is open source software for crawling, parsing, analyzing and routing documents. It ties together otherwise incomplete solutions for enterprise search and document processing. OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules to distribute documents across a network. It includes a job scheduler and a full UI with a point-and-click interface.</p>
<p>Findwise have been using this framework in a number of customer projects with great success. It fits particularly well with Apache Solr, not only because it is open source but most importantly because it fills a gap in Solr's functionality &#8211; an easy-to-use framework for developing document processors and connectors. However, we are not using it for Solr only. A number of plugins for the Google Search Appliance have also been made, and we have started investigating how Open Pipeline can be integrated with the IBM Omnifind search engine as well.</p>
<p>The best thing about this framework is that it is very flexible and customizable but still easy to use AND, maybe most importantly for me as a developer, easy to work with and develop against. It has a simple yet powerful enough API to handle all that you need. And because it is an open source framework, any shortcomings and limitations that we find along the way can be investigated in detail, and a better solution can be proposed to the Open Pipeline team for inclusion in future releases.</p>
<p>We have in fact already contributed a great deal to the development of the project by using it, testing it, reporting bugs and suggesting improvements on their forums. And the response from the team has been very good &#8211; some of our suggested improvements have already been included and some are on the way in the new 0.8 version. We are also in the process of deepening the collaboration further by signing a contributor agreement so that we can eventually contribute code as well.</p>
<p>So how do our customers benefit from this?</p>
<p>First, it lets us develop and deliver search and index solutions to our customers more quickly and with better quality. This is because more developers can work with the same framework as a base, so the overall code base is used more, tested more and is thus of better quality. We can also reuse good, well-tested components, so that several customers together can share the costs of development and thus get a better service or product for less money, which is always a good thing of course!</p>
]]></content:encoded>
			<wfw:commentRss>http://findabilityblog.se/findwise-releases-open-pipeline-plugins/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What differentiates a good search engine from a bad one?</title>
		<link>http://findabilityblog.se/what-differentiates-a-good-search-engine-from-a-bad-one/</link>
		<comments>http://findabilityblog.se/what-differentiates-a-good-search-engine-from-a-bad-one/#comments</comments>
		<pubDate>Wed, 28 Nov 2007 10:43:07 +0000</pubDate>
		<dc:creator>Maria Johansson</dc:creator>
				<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Information quality]]></category>
		<category><![CDATA[Internet search]]></category>
		<category><![CDATA[Intranet]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Usability]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=52</guid>
		<description><![CDATA[That was one of the questions the UIE research group asked themselves when conducting a study of on-site search. One of the things they discovered was that the choice of search engine was not as important as the implementation. Most of the big search vendors were found in both the top sites and the bottom [...]]]></description>
			<content:encoded><![CDATA[<p>That was one of the questions the <a href="http://www.uie.com">UIE</a> research group asked themselves when conducting a study of <a href="http://www.uie.com/brainsparks/2007/11/26/usability-tools-podcast-on-site-search/">on-site search</a>. One of the things they discovered was that the choice of search engine was not as important as the implementation. Most of the big search vendors were found in both the top sites and the bottom sites.</p>
<p>So even though the choice of vendor influences what functionality you can achieve and the control you have over your content there are other things that matter, maybe even more. Because the best search engine in the world will not work for you unless you configure it properly.</p>
<p><span id="more-52"></span>According to Jared Spool there are four kinds of search results:</p>
<ul>
<li> ‘Match relevant results’ &#8211; returns the exact thing you were looking for.</li>
<li> ‘Zero results’ &#8211; no relevant results found.</li>
<li> ‘Related results’ &#8211; e.g. you search for a sweater and also get results for a cardigan. (If you know that a cardigan is a type of sweater, you are satisfied. Otherwise you just get frustrated and wonder why you got a result for a cardigan when you searched for a sweater.)</li>
<li> ‘Wacko results’ &#8211; the results seem to have nothing in common with your query.</li>
</ul>
<p>So what did the best sites do according to Jared Spool and his colleagues?<br />
They returned match relevant results, and they did not return 0 results for searches.</p>
<p>So how do you achieve that then? We have previously written about the importance of <a href="http://www.findwise.se/?cat=19#jump">content refinement</a> and <a href="http://www.findwise.se/?p=50#jump">information quality</a>. But what do you do when trying to achieve good search results with your search engine? And what if you do not have the time or knowledge to do a proper content tuning process?</p>
<p>Well, the search logs are a good way to start. Start looking at them to identify the 100 most common searches and the results they return. Are they match relevant results? It is also a good idea to look at the searches that return zero results and see if there is anything that can be done to improve those searches as well.</p>
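<p>As a rough illustration of that first pass over the logs, here is a small Python sketch. It assumes the log has already been parsed into (query, result count) pairs, which will look different for every search engine; the function name summarize_search_log and the sample data are invented for the example.</p>
<pre>
```python
from collections import Counter


def summarize_search_log(entries, top_n=100):
    """entries: iterable of (query, result_count) pairs.

    Returns two lists of (query, frequency) pairs: the most common
    queries overall and the most common queries that gave zero results.
    """
    all_queries = Counter()
    zero_hits = Counter()
    for query, result_count in entries:
        # Treat "Sweater" and " sweater " as the same query.
        normalized = " ".join(query.lower().split())
        if not normalized:
            continue
        all_queries[normalized] += 1
        if result_count == 0:
            zero_hits[normalized] += 1
    return all_queries.most_common(top_n), zero_hits.most_common(top_n)


# Invented sample data, standing in for a parsed search log.
log = [
    ("Sweater", 12),
    ("sweater ", 12),
    ("cardigan", 3),
    ("vacation form", 0),
    ("Vacation  form", 0),
    ("xyzzy", 0),
]
top, zero = summarize_search_log(log, top_n=2)
print(top)
print(zero)
```
</pre>
<p>The first list tells you which result pages to review by hand for match relevant results, and the second list is a starting point for fixing the zero-result searches.</p>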
<p>Jared Spool and his colleagues at UIE mostly talk about site search for e-commerce sites. For e-commerce sites, bad search results mean loss of revenue, while good search results hopefully give an increase in revenue (if other things, such as checkout, do not fail). With intranet search, the implications are a bit different.</p>
<p>With intranet search solutions, the searches can be more complex, since information, not items, is what users are searching for. It might not be as easy to just add synonyms or group similar items to achieve better search results. I believe that in such a complex information universe, proper content tuning is the key to success. But looking at the search logs is a good way for you to start. And my colleagues and I here at Findwise can always help you get the most out of your search solution.</p>
]]></content:encoded>
			<wfw:commentRss>http://findabilityblog.se/what-differentiates-a-good-search-engine-from-a-bad-one/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search as a tool for information quality assurance</title>
		<link>http://findabilityblog.se/information-quality-assurance-through-search/</link>
		<comments>http://findabilityblog.se/information-quality-assurance-through-search/#comments</comments>
		<pubDate>Thu, 25 Oct 2007 15:22:42 +0000</pubDate>
		<dc:creator>Daniel Johansson</dc:creator>
				<category><![CDATA[Company]]></category>
		<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Information quality]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=50</guid>
		<description><![CDATA[Feedback from stakeholders in ongoing projects has highlighted the real need for a supporting tool to assist in the analysis of large amounts of content. This would introduce a phase where super users and information owners have the possibility to go through a quality assurance process across the information silos, before releasing information directly to [...]]]></description>
			<content:encoded><![CDATA[<p>Feedback from stakeholders in ongoing projects has highlighted the real need for a supporting tool to assist in the analysis of large amounts of content.<br />
This would introduce a phase where super users and information owners have the possibility to go through a quality assurance process across the information silos, before releasing information directly to end users.<br />
<span id="more-50"></span><br />
Using standard features contained within enterprise search platforms, great value can be delivered as well as time saved in extracting essential information. Furthermore, you have the possibility to detect key information objects that are hidden by a lack of a holistic view.</p>
<p>In this way, adapted applications can easily be built on top to support process-specific analysis needs, e.g. through entity extraction (automatic detection and extraction of names, places, dates etc.) and cross-referencing of unstructured and structured sources. The time is here to gain control of your enterprise information and turn it into knowledge.</p>
]]></content:encoded>
			<wfw:commentRss>http://findabilityblog.se/information-quality-assurance-through-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search-driven process to increase content quality</title>
		<link>http://findabilityblog.se/search-driven-process-to-increase-content-quality/</link>
		<comments>http://findabilityblog.se/search-driven-process-to-increase-content-quality/#comments</comments>
		<pubDate>Mon, 09 Jul 2007 07:44:05 +0000</pubDate>
		<dc:creator>Daniel Johansson</dc:creator>
				<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Future development]]></category>
		<category><![CDATA[Intranet]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=22</guid>
		<description><![CDATA[Experience from recent and ongoing search and retrieval projects has shown that enterprises gain a better and deeper insight into their content when deploying a new search platform. Not only in unstructured content repositories, but also in structured sources. As information is indexed and is visualized in a more user friendly way it doesn’t [...]]]></description>
			<content:encoded><![CDATA[<p>Experience from recent and ongoing search and retrieval projects has shown that enterprises gain a better and deeper insight into their content when deploying a new search platform, not only in unstructured content repositories but also in structured sources. As information is indexed and visualized in a more user-friendly way, it doesn’t take long before the people responsible find content issues that are brought to light: for example content that is misplaced or wrongly tagged, or documents with poorly defined security information. These are issues that were earlier hidden due to the lack of a holistic view of content. <span id="more-22"></span></p>
<p>It has been said that before deploying an enterprise search solution, an enterprise should get a completely clear picture of all its content; but maybe one should reformulate this and also think of an enterprise search solution as a supporting tool in the process of improving the content.<br />
Taking it a step further would be to allow write-backs from the search engine to content sources to enrich and improve quality and completeness of stored information.<br />
Tune search quality and content quality at the same time!</p>
]]></content:encoded>
			<wfw:commentRss>http://findabilityblog.se/search-driven-process-to-increase-content-quality/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

