Archive for the ‘Enterprise Search’ Category

Maria Johansson

Findability and the Google experience

September 2 - 2010 | Maria Johansson

In almost every project we work on, users ask us why finding information on their intranet is not as easy as finding information on Google. One of my team members told me he was once asked:

“If Google can search the whole internet in less than a second, how come you can’t search our internal information, which is only a few million documents?”

I don’t remember his answer but I do remember what he said he would have wanted to answer:

“Google doesn’t have to handle rigorous security. We do. Google has got millions of servers all around the world. We have got one.”

The truth is, you get the search experience you deserve. Google delivers an excellent user experience to millions of users because it has thousands of employees working hard to achieve it. So do the other players in the search market: all the search engine vendors are continuously working to improve the experience for their users. It is possible to achieve good things without a huge budget. But I can guarantee that just installing any of the search platforms on the market and then doing nothing will not give your users a good experience. So the question is: what is your company doing to achieve a good search experience?

Jeff Carr from Earley & Associates recently published a two-part article about this desire to duplicate the Google experience, and why it won’t succeed. I recommend that you read it. Hopefully it will not only help you answer your users’ questions and meet their expectations, but also help you improve the search experience for them.

Enterprise Search and why we can’t just get Google.

Caroline Abrahamsson

Search and Business Intelligence?

July 9 - 2010 | Caroline Abrahamsson

BI and search is a never-ending story.
A number of years ago Gartner coined “Biggle”, an expression for BI meeting Google. Back then a number of BI vendors, among them Cognos and SAS, claimed that they were working with search strategically (e.g. by becoming Google One-box partners). Search vendors such as FAST, Autonomy and IBM also started to cooperate with companies like Cognos. “The Adaptive Warehouse” and “BI for the masses” soon became buzzwords that spread through the industry.

The skeptics claimed that Enterprise Search would never be good at numbers, and that BI would never be good at text.
Since then a lot has happened, and today the major Enterprise Search vendors all claim to have BI solutions that can be fully integrated (and vice versa: BI solutions that can integrate with Enterprise Search).

The aim is the same now as back then: to provide unified access to both structured (database) and unstructured (content) corporate information. As FAST wrote in a number of its ‘Special Focus’ publications: “Users should have access to a wide variety of data from just one, simple search interface, covering reports, analysis, scorecards, dashboards and other information from the BI side, along with documents, e-mail and other forms of unstructured information”.

And of course, this seems appealing to customers. But does access to all information really make us more likely to make the right decisions in terms of Business Intelligence? Gartner has its doubts.
Nigel Rayner, research vice president at Gartner Inc, says: “The problem isn’t that they (users) don’t have access to information or tools; they already have too much information, and that’s just in the structured BI world. Now you want to couple it with unstructured data? That’s a whole load of garbage coming from the outside world.” But he also states that search can be used as one part of BI: “Part of the problem with traditional BI is that it’s very focused on structured information. Search can help with getting access to the vast amount of structured information you have.”

Looking at the discussions going on in forums, in blogs and in the research domain, most people seem to agree with Gartner’s view: search and BI make a powerful combination, but the integration needs to be done with a number of things in mind:

Data quality
As mentioned before, if unstructured and structured information is to be made available as a complement to BI, it needs to be of good quality. Knowing that the information found is the latest version and was written by someone with knowledge of the area is essential. Poor information quality is a threat to an Enterprise Search solution; to a combined BI and search solution it can be devastating. Having Content Lifecycles in place (reviewing, deleting, archiving etc.) is a fundamental prerequisite.

Data analysis
Business Intelligence is traditionally built on predefined ideas of what data the users need, whereas search gives ad-hoc access to all information.
Combining the two requires a structured way of analyzing the data. If unstructured information is taken out of its context, there is a risk that decisions are based on assumptions rather than facts.

BI for the masses?
The old buzzwords are still alive, but the question mark remains. If everyone is to be given access to BI data, the purpose has to be clear. Giving people context, for example by combining the latest sales statistics with searches for information about ongoing marketing activities, serves a purpose and improves findability. Just making numbers available does not.


BI and search in a combined dashboard: vision or reality in the near future?

So, to conclude: Gartner’s vision of “Biggle” has not yet been fulfilled. There are a number of interesting opportunities for businesses to create Findability solutions that combine BI and search, but strategies for adopting them need to be developed in order to create the really interesting cases.

Have you come across any successful search and BI integrations? What is your vision? Do you think the integration between the two is a likely scenario?
Please let us know by posting your comments.

It’s soon time for us to go on summer vacation.

If you are Swedish, Nicklas Lundblad from Google had an interesting program about search (Sommar i P1) the other day, which is available as a podcast.

Have a nice summer all of you!

Caroline Abrahamsson

Search in SharePoint 2010

May 15 - 2010 | Caroline Abrahamsson

This week there has been a lot of buzz about Microsoft’s launch of SharePoint 2010 and Office 2010. Since SharePoint 2007 has been the fastest-growing server product in the history of Microsoft, the expectations for SharePoint 2010 are tremendous.

Apart from a great many possibilities for content creation, collaboration and networking, easy business intelligence etc., the launch also holds another promise: even better search capabilities (through the integration of FAST).

Since Microsoft acquired FAST in 2008, there has been a lot of speculation about what future SharePoint versions may include in terms of search. And since Microsoft announced that it will drop the Linux and UNIX versions of FAST in order to innovate faster, Microsoft customers are expecting something more than the usual. Early on, it was also clear that Microsoft is eager to take market share in the growing internet business market.

So, simply put, the search solutions that Microsoft now provides fall into two groups: solutions for Business Productivity (where the truly sophisticated search capabilities are available if you have Enterprise CAL licenses, i.e. you pay for the number of users you have) and solutions for Internet Sites (where pricing is based on the number of servers). These can then be used in a number of scenarios, all depending on the business and end-user needs.
Microsoft has chosen to describe it like this:

  • “Foundation” is, briefly put, basic SharePoint search (Site Search).
  • “Standard” adds collaboration features to the “Foundation” edition and allows it to tie into repositories outside of SharePoint.
  • “Enterprise” adds a number of capabilities previously only available through FAST licenses, such as contextual search (recognition of departments, names, geographies etc.), the ability to add metadata tags to unstructured content, greater scalability etc.

I’m not going to go into detail, but simply conclude that the more Microsoft technology a company or organization already uses, the more it will gain from investing in SharePoint search capabilities.

And just to be clear: non-SharePoint (stand-alone) versions of FAST are still available, even though they are not promoted as intensely as the SharePoint ones.

Beyond Microsoft’s overview above, Microsoft TechNet provides a more in-depth description of the features and functionality from both an end-user and an administrator point of view.

We look forward to describing the features and functions in more detail in our upcoming customer cases. If you have any questions for our SharePoint or FAST search specialists, don’t hesitate to post them here on the blog. We’ll make sure you get all the answers.

Tobias Larsson Hult

Real time search in the Enterprise

May 10 - 2010 | Tobias Larsson Hult

Real-time search is generating a lot of buzz on the Internet. Major search engines like Google and Bing now provide users with real-time search results from Facebook, Twitter, blogs and other social media sites. Real-time search means that as soon as content is created or updated, it is immediately searchable. This might seem obvious, even a basic requirement, but anyone working with search knows that most of the time it is not the case. Looking inside the firewall, in the enterprise, I dare say that real-time search is far from common. Sometimes content does not change very frequently, so there is no need to make it instantly searchable. In many cases, though, it is the technical architecture that prevents a real-time search implementation.

The most common way of indexing content is by using a web crawler or a connector. Either way, you schedule them to go out and fetch new, updated and deleted content at specific intervals during the day. This is the basic architecture of search platforms today. The advantage of this approach is that the content systems do not need to adapt to the search platform; they just deliver content through their ordinary APIs during indexing. The drawback is that new or updated content is not available until the next scheduled indexing run, which, depending on the system, might be several hours away. For several reasons, mostly performance, you do not want to schedule connectors or web crawlers to fetch content too often. Instead, to provide real-time search you have to turn it around: let the content system push content to the search platform.

Most systems have some sort of event system that triggers an event when content is created, updated or deleted. By listening for these events, the system can send the content to the search platform at the same time as it is stored in the content system. The search platform can immediately index the pushed content and make it searchable. This requires adapting the content system to the search platform, but in this case I think the advantages outweigh the disadvantages. Modern content systems provide (or should provide) a plug-in architecture, so it should be fairly easy to plug in this kind of code. These plug-ins could also be provided by the search platform vendors, just as ordinary connectors are today.
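To make the push model a bit more concrete, here is a minimal Python sketch under stated assumptions. `ContentSystem` stands in for a content system with an event hook, `SearchIndex` is a toy in-memory inverted index standing in for the search platform, and `make_push_plugin` is the plug-in that forwards every save or delete. All three names are hypothetical; this is not any vendor's API.

```python
from collections import defaultdict


class SearchIndex:
    """Toy in-memory inverted index standing in for the search platform (hypothetical)."""

    def __init__(self):
        self.docs = {}                    # doc_id -> indexed text
        self.postings = defaultdict(set)  # term -> set of doc_ids

    def upsert(self, doc_id, text):
        self.delete(doc_id)  # drop stale terms from any earlier version
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old is not None:
            for term in old.lower().split():
                self.postings[term].discard(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


class ContentSystem:
    """Hypothetical content system that fires an event on every save or delete."""

    def __init__(self):
        self.store = {}
        self.listeners = []

    def on_change(self, listener):
        self.listeners.append(listener)

    def save(self, doc_id, text):
        self.store[doc_id] = text
        for listener in self.listeners:
            listener("saved", doc_id, text)

    def remove(self, doc_id):
        self.store.pop(doc_id, None)
        for listener in self.listeners:
            listener("deleted", doc_id, None)


def make_push_plugin(index):
    """Plug-in that pushes each change to the index as soon as it is stored."""
    def handle(event, doc_id, text):
        if event == "saved":
            index.upsert(doc_id, text)
        elif event == "deleted":
            index.delete(doc_id)
    return handle


index = SearchIndex()
cms = ContentSystem()
cms.on_change(make_push_plugin(index))
cms.save("news-1", "New travel policy published")
print(index.search("policy"))  # ['news-1'], searchable at once, no scheduled crawl
cms.remove("news-1")
print(index.search("policy"))  # []
```

In a real deployment the listener would call the search platform's own feeding interface, preferably through a queue so that a slow indexer never blocks the content system.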

Do you agree, or have I been living in a cave for the past few years? I’d love to hear your comments on this subject!

Tobias Larsson Hult

Relevance is important

March 24 - 2010 | Tobias Larsson Hult

A couple of weeks ago I read an interesting blog post comparing the relevance of three different search engines. It made me think about relevance and how it is sometimes overlooked when choosing or implementing a search engine in a findability solution. A common misconception is that if we just install a search engine, we will get splendid search results out of the box. While it’s true that the results will be better than with an existing database-based search solution, the amount of configuration needed to get splendid results depends on how good the relevance is from the start. And as the blog post shows, that can differ quite a bit between search engines.

So what is relevance, and why does it differ between search engines? Computing relevance is the core of a search engine. Essentially, the goal is to deliver the most relevant set of results for your search query. When you submit your query, the search engine uses a number of algorithms to find, among all indexed content, the documents or pages that best correspond to the query. Each search engine uses its own set of algorithms, and that is why we get different results.
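Each engine's actual algorithms are its own, but as a hedged illustration of what "computing relevance" can mean, here is classic TF-IDF scoring in Python. It is a textbook technique, not the method of any particular engine: a query term counts more when it is frequent in a document and rare across the collection. The documents are made up.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def rank(query, docs):
    """Rank docs (doc_id -> text) against the query with plain TF-IDF; best first."""
    n = len(docs)
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for c in counts.values():
        df.update(c.keys())

    scores = {}
    for doc_id, c in counts.items():
        length = sum(c.values()) or 1
        score = 0.0
        for term in tokenize(query):
            if c[term]:
                tf = c[term] / length               # frequent in this doc -> higher
                idf = math.log(n / df[term]) + 1.0  # rare across docs -> higher
                score += tf * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "a": "travel policy for employees",
    "b": "expense policy and travel policy",
    "c": "canteen menu",
}
print(rank("travel policy", docs))  # ['b', 'a']
```

Swapping in a different weighting (for example BM25, or extra weight for title matches) would reorder the same documents, which is exactly why different engines can return different results for the same query.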

Since relevance is computed from the content, it will also differ from company to company. That’s why we can’t say that one search engine has better relevance than another, only that they differ. To know which one performs best, you have to try them on your own content. The best way to choose a search engine for your findability solution is thus to compare a couple and see which yields the best results. After comparing the results, the next step is to look at how easy it is to tune the relevance algorithms, to what extent it is possible, and how much tuning you need. Depending on how good the relevance is from the start, you might not need much relevance tuning, and thus you don’t need the “advanced relevance tuning functionality” that might cost extra money.
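One simple way to make "try it on your own content" measurable: collect real queries, let someone mark which documents are relevant for each, and compute precision at k (the share of the top k hits that are relevant) for every candidate engine. A minimal Python sketch; the result lists and judgments below are invented, and in practice they would come from each engine's results over the same indexed content.

```python
def precision_at_k(ranked, relevant, k):
    """Share of the top-k results that were judged relevant."""
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k


def mean_precision_at_k(run, judgments, k=5):
    """run: query -> ranked doc ids from one engine;
    judgments: query -> set of doc ids a person marked as relevant."""
    scores = [precision_at_k(run.get(q, []), rel, k) for q, rel in judgments.items()]
    return sum(scores) / len(scores)


# Made-up judgments and results, for illustration only.
judgments = {"travel policy": {"d1", "d4"}, "vacation form": {"d7"}}
engine_a = {"travel policy": ["d1", "d2", "d4"], "vacation form": ["d9", "d7", "d3"]}
engine_b = {"travel policy": ["d3", "d1", "d5"], "vacation form": ["d2", "d8", "d9"]}

for name, run in [("engine A", engine_a), ("engine B", engine_b)]:
    print(name, round(mean_precision_at_k(run, judgments, k=3), 2))
# engine A 0.5
# engine B 0.17
```

Running the same measurement again after tuning shows how much the tuning actually bought you.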

In the end, the best search engine is not the one with the most functionality. The best one is the one that gives you the most relevant results, and by choosing a search engine with good relevance for your content, some initial requirements might become obsolete, which will save you time and money.

Findwise

Welcome to the search and findability blog!

March 2 - 2010 | Findwise

As some of you already know, Findwise has been blogging at findwise.se for several years now. However, we thought it was time to separate the blog from our web site and create a forum dedicated to the exciting area of findability. From Findwise’s perspective, findability is the art of making information easy to find using (enterprise) search technology, regardless of when the information is needed or where it is stored.

Here we invite you to learn more about findability, and we welcome your feedback and a dialogue with us. We will, among other things, keep you updated on relevant research within the findability area, exciting search functionality and news about enterprise search vendors.

Our new blog includes features that were not available in our previous blog: RSS subscription, the Findwise Twitter feed and the possibility to share information via other social media. We hope and believe our readers will appreciate these features, and we look forward to discussing findability and search with you!

Tobias Larsson Hult

To Crawl or not to Crawl

December 11 - 2009 | Tobias Larsson Hult

With an Enterprise Search Engine, there are basically two ways of getting content into the index: using a web crawler or a connector. Both methods have their advantages and disadvantages. In this post I’ll try to pinpoint the differences between the two methods.

Web crawler

Most systems today have a web interface. Whether it is your time reporting system, intranet or document management system, you probably access it with your web browser. Because of this, it’s very easy to use a web crawler to index this content as well.

The web crawler indexes pages by starting at one page. From there, it follows all outbound links and indexes those pages. From those pages, it follows all links, and so on. This process continues until all links on a web site have been followed and the pages have been indexed. The crawler thus uses the same technique as a human: visiting a page and clicking the links.
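The link-following loop can be sketched as a breadth-first traversal in Python. The `fetch_links` callback and the in-memory `site` are assumptions standing in for HTTP fetching and HTML link extraction, so the example runs offline; real crawlers also restrict themselves to allowed hosts, honour robots.txt and pace their requests.

```python
from collections import deque


def crawl(start_url, fetch_links, max_pages=1000):
    """Breadth-first crawl: visit a page, queue its unseen links, repeat.

    fetch_links(url) must return the URLs linked from that page
    (hypothetical callback; a real crawler would download and parse HTML).
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)  # a real crawler would hand the page to the indexer here
        for link in fetch_links(url):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited


# A tiny in-memory "web site": page -> outbound links.
site = {
    "/": ["/news", "/customers"],
    "/news": ["/", "/news/1"],
    "/customers": ["/"],
    "/news/1": [],
}
print(crawl("/", lambda url: site.get(url, [])))
# ['/', '/news', '/customers', '/news/1']
```

The `seen` set is what keeps the crawler from looping forever on links back to the start page, and `max_pages` is a simple safety limit.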

Most Enterprise Search Engines are bundled with a web crawler, so it’s usually very easy to get started. Just enter a start page and within minutes you’ll have searchable content in your index. No extra installation or license fee is required. For some sources, this may also be the only option, e.g. if you’re indexing external sources that your company has no control over.

The main disadvantage, though, is that web pages are designed for humans, not crawlers. This means that there is a lot of extra information for presentation purposes, such as navigation menus, sticky information messages, headers and footers and so on. All of this makes the experience more pleasant for the user and makes it easier to navigate the page. The crawler, on the other hand, has no use for this information when retrieving pages; it actually reduces the information quality in the index. For example, a navigation menu is displayed on every page, so the crawler will index the navigation content for all pages. So if you have a navigation item called “Customers” and a user searches for customers, he/she will get a hit on ALL pages in the index.

There are ways to get around this, but they require either altering the produced HTML or making adjustments in the search engine. Also, if the design of the site changes, you have to make these adjustments again.
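As a sketch of the search-engine-side adjustment, the Python below drops everything inside navigation, header and footer elements before the text is indexed. It assumes the site marks its page chrome with the HTML5 `<nav>`, `<header>` and `<footer>` elements; many sites need site-specific rules instead (and an `<article>` can legitimately contain a `<header>`), which is exactly the kind of rule that has to be revisited when the design changes.

```python
from html.parser import HTMLParser

# Assumption: the site wraps navigation, headers and footers in these elements.
SKIP_TAGS = {"nav", "header", "footer", "script", "style"}


class ContentExtractor(HTMLParser):
    """Collects visible text, ignoring anything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_indexable_text(html):
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = (
    "<html><body><nav><a href='/customers'>Customers</a></nav>"
    "<main><h1>Travel policy</h1><p>Book trips via the portal.</p></main>"
    "<footer>Contact</footer></body></html>"
)
print(extract_indexable_text(page))  # Travel policy Book trips via the portal.
```

With the menu stripped, a search for “customers” no longer matches this page just because of its navigation.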

Connector

Even though the majority of systems have a web interface, the content is stored in some kind of data source. It might be a database, a structured file system, etc. Using a connector, you connect either to the underlying data source or directly to the system through its programming API.

Using a connector, the search engine does not get any presentation information but only the pure content, making the information quality in the index better. The connector can also retrieve all metadata associated with the information which further increases the quality. Often, you’ll also have more fine-grained control over what will be indexed with a connector than a web crawler.
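A connector can be sketched in Python against a SQLite database. The `documents` table and its columns are a hypothetical schema; a real connector maps the system's own schema or API. The point is that the connector yields pure content plus metadata, with no page chrome, and that a `WHERE` clause gives fine-grained control over what (and how much) gets fetched.

```python
import sqlite3


def connector_documents(conn, modified_since=None):
    """Yield documents straight from the data source: content plus metadata.

    Assumes a hypothetical `documents` table; a real connector would map the
    system's own schema or programming API instead.
    """
    sql = "SELECT id, title, body, author, modified FROM documents"
    params = ()
    if modified_since is not None:  # fine-grained control: only fetch changed rows
        sql += " WHERE modified > ?"
        params = (modified_since,)
    sql += " ORDER BY id"
    for doc_id, title, body, author, modified in conn.execute(sql, params):
        yield {
            "id": doc_id,
            "title": title,
            "body": body,
            "metadata": {"author": author, "modified": modified},
        }


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents "
    "(id INTEGER PRIMARY KEY, title TEXT, body TEXT, author TEXT, modified TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Travel policy", "Book trips via the portal.", "HR", "2009-11-02"),
        (2, "Customer list", "Key accounts for 2010.", "Sales", "2009-12-01"),
    ],
)
for doc in connector_documents(conn, modified_since="2009-11-15"):
    print(doc["id"], doc["title"], doc["metadata"])
# 2 Customer list {'author': 'Sales', 'modified': '2009-12-01'}
```

The dates are stored as ISO 8601 strings so that the text comparison in the `WHERE` clause also orders them chronologically.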

However, using a connector requires more configuration. It might also cost extra money to buy one for your system, and require additional hardware. But once set up, it will most likely produce more relevant results than a web crawler.

The bottom line is that it’s a trade-off between quality and cost, like most decisions in life :)

Caroline Abrahamsson

Do you know something I don’t? The art of benchmarking

December 1 - 2009 | Caroline Abrahamsson

During the autumn we have been trying to keep our customers and others up to date with the search world by hosting breakfast seminars.
By sharing experiences and discussing with others the participants have taken giant leaps in understanding what search can deliver in true value.
The same goes for sharing experiences between companies, where you often find yourself struggling with the same problems, regardless of business or company size.

We have been discussing how Enterprise search can help intranets, extranets, external sites and support centers to capitalize on their knowledge.
Some of the things that have been discussed:

…Business Cases:
How can search help companies save 100 million SEK/year?
How do you count return on investment (ROI) for search?

…Search functionality:
How and why should you work with:
Key Matches to promote certain content (similar to Google’s sponsored links on the web)
Synonyms (to make sure that the end-users’ language corresponds to the corporate language without having to change the information)
Query completion and suggestion, to give the user an overview of what other people have been searching for as they start to type (similar to Apple’s web site search).

…End-user experience
How can different interfaces serve different information needs and user-groups?
How does your user interface serve your end-users?

…Information Quality
Do taxonomies and folksonomies help us find information faster?
Can search be used to improve the quality of your content?
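Two of the search functions on the seminar list, synonyms and query completion, are simple enough to sketch in Python. This is a hedged illustration only: the synonym table and query log are invented, and search platforms typically offer these as configurable features rather than custom code.

```python
from collections import Counter

# Hypothetical synonym list mapping end-user words to corporate vocabulary.
SYNONYMS = {
    "vacation": {"holiday", "leave"},
    "holiday": {"vacation", "leave"},
}


def expand_query(query, synonyms):
    """Turn each query term into an OR-group of the term and its synonyms."""
    return [[term] + sorted(synonyms.get(term, set()) - {term})
            for term in query.lower().split()]


def suggest(prefix, query_log, limit=3):
    """Query completion: the most frequent earlier queries that start with the prefix."""
    counts = Counter(q.strip().lower() for q in query_log)
    prefix = prefix.lower()
    matches = [q for q in counts if q.startswith(prefix)]
    matches.sort(key=lambda q: (-counts[q], q))  # most popular first, then alphabetical
    return matches[:limit]


print(expand_query("Vacation form", SYNONYMS))
# [['vacation', 'holiday', 'leave'], ['form']]
log = ["travel policy", "travel policy", "travel expenses", "Travel Policy", "vacation form"]
print(suggest("tra", log))  # ['travel policy', 'travel expenses']
```

Expanding the query rather than rewriting the documents is what lets users search in their own words “without having to change the information”.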

During the spring we will continue to hold seminars to keep you up to date. If you’re not on our mailing list, please send us an e-mail and we’ll make sure you get an invitation.

On Wednesday and Thursday this week we will be attending the Ability conference to discuss search. Hope to see you there!

Christopher Wallstrom

Enterprise Search 2.0?

November 30 - 2009 | Christopher Wallstrom

While visiting the Enterprise Search Summit in San Jose, I realized that enabling Enterprise 2.0 within enterprise search is the hottest trend at the moment.

Andrew McAfee, who coined the term Enterprise 2.0 and has published a book on the subject, spoke about how to use altruism to develop the enterprise. People are wired to help, and if we stop obsessing about the risks and lower the bar for how people can help each other, it is possible to make this work within a corporate environment.

He also spoke about process control and workflow control: how much do we really need? Make it easy to correct mistakes instead of making it hard to make them. With regard to innovation, he pointed out that we need to question credentialism and build communities that people want to join. To leverage the intelligence within the enterprise, we should explore and experiment with collective intelligence such as prediction markets and open peer review processes. All in all, make it easy for people to interconnect.

The promised payoff: much better access to knowledge and internal experts, higher satisfaction, increased innovation and greater customer satisfaction.

I also recommend reading PricewaterhouseCoopers’ Technology Forecast, Summer 2008, for a good overview of the available tools and technologies.

So how does this impact enterprise search? Search can be made the facilitator for Enterprise 2.0. Of course it is possible to index all blogs, wikis, tweets (Yammer), online communities and social networks and make them searchable, but that is only one way to make this new environment more findable. If someone tweets or blogs about a piece of information, we should use that to influence the search results and ranking. We could also track user behavior on a site to make certain information more visible based on implicitly expressed interests.
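The idea of letting tweets, blog posts and user behavior influence ranking can be sketched in Python as a re-ranking step after the engine's own scoring. The multiplicative boost, the logarithmic damping and the weights are all assumptions chosen for illustration, not a method presented at the summit.

```python
import math


def rerank(results, mentions, clicks, social_weight=0.3, click_weight=0.2):
    """Boost engine scores with social and behavioral signals.

    results:  list of (doc_id, base_score) from the search engine
    mentions: doc_id -> times the document was tweeted or blogged about
    clicks:   doc_id -> times users clicked it in earlier searches
    log1p damps large counts so one viral post cannot swamp text relevance.
    """
    def boosted(item):
        doc_id, base = item
        boost = (1
                 + social_weight * math.log1p(mentions.get(doc_id, 0))
                 + click_weight * math.log1p(clicks.get(doc_id, 0)))
        return base * boost

    ordered = sorted(results, key=lambda item: (-boosted(item), item[0]))
    return [doc_id for doc_id, _ in ordered]


results = [("wiki/onboarding", 1.0), ("blog/new-crm", 0.9), ("news/q3", 0.8)]
print(rerank(results, mentions={"blog/new-crm": 12}, clicks={"news/q3": 3}))
# ['blog/new-crm', 'news/q3', 'wiki/onboarding']
```

Because the boost multiplies the engine's score, a document nobody has mentioned or clicked keeps its original relevance; the social signals only reorder results the engine already found.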