Archive for the ‘Content refinement’ Category

Max Charas

Solr Processing Pipeline

april 19 - 2010 | Max Charas

Hi again Internet,

For once I have had time to do some thinking. Why is there no powerful data processing layer between the Lucene Connector Framework and Solr? I´ve been looking into the Apache Commons Processing Pipeline. It seems like a likely candidate to do some cool stuff.  Look at the diagram below.

A schematic drawing of a Solr Pipeline concept. (Click to enlarge)

What I´m thinking of is to make a transparent Solr pipeline that speaks the Solr REST protocol on each end. This means that you would be able to use SolrJ or any other API to communicate with the Pipeline.

Has anyone attempted this before?  If you’re interested in chatting about the pipeline drop me a mail or just grab me at Eurocon in Prague this year.

Karl Jansson

Findwise releases Open Pipeline plugins

oktober 9 - 2009 | Karl Jansson

Findwise is proud to announce that we now have released our first publicly available plugins to the Open Pipeline crawling and document processing framework. A list of all available plugins can be found on the Open Pipeline Plugins page and the ones Findwise have created can be downloaded on our Findwise Open Pipeline Plugins page.

(Läs mer…)

Maria Johansson

What differentiates a good search engine from a bad one?

november 28 - 2007 | Maria Johansson

That was one of the questions the UIE research group asked themselves when conducting a study of on-site search. One of the things they discovered was that the choice of search engine was not as important as the implementation. Most of the big search vendors were found in both the top sites and the bottom sites.

So even though the choice of vendor influences what functionality you can achieve and the control you have over your content there are other things that matter, maybe even more. Because the best search engine in the world will not work for you unless you configure it properly.

(Läs mer…)

Daniel Johansson

Search as a tool for information quality assurance

oktober 25 - 2007 | Daniel Johansson

Feedback from stakeholders in ongoing projects has highlighted the real need for a supporting tool to assist in the analysis of large amounts of content.
This would introduce a phase where super users and information owners have the possibility to go through a quality assurance process across the information silos, before releasing information directly to end users.
(Läs mer…)

Daniel Johansson

Search-driven process to increase content quality

juli 9 - 2007 | Daniel Johansson

Experience from recent and ongoing search and retrieval projects have shown that enterprises have got a better and deeper insight in their content when deploying a new search platform. Not only in unstructured content repositories, but also in structured sources. As information is indexed and is visualized in a more user friendly way it doesn’t take much time before the people responsible find content issues that are brought out in the light. Content that e.g. is misplaced, tagged wrongly, documents with poorly defined security information etc. Issues that earlier were hidden due to lack of a holistic view of content. (Läs mer…)