An indexing search engine with Nutch and Solr

Go Find It

Lead Image © Dmitry Naumov, 123RF.com

Article from Issue 186/2016

Build your own search engine using Apache's Nutch web crawler and Solr search platform.

CMS, wikis, text files … modern companies store important data in many different places, and that data must be accessible down to the tiniest detail through a single search. Commercial software vendors such as Google [1] offer tools that will index the data and store the index on an external server. But many organizations prefer to keep control of the search capabilities – for security and privacy reasons, but also to add flexibility and promote innovation and customization.

A handy constellation of open source tools from the Apache project will help you build your own search index for the assorted documents and data on your network: Nutch, Solr, Apache, and Lucene.

Nutch [2] is a powerful web crawler, and Apache Solr [3] is a search engine based on Apache Lucene [4]. You can combine Nutch with Solr to create a complete search engine – a miniature Google, if you like.
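To make the division of labor concrete, here is a minimal Python sketch of the search side: once Nutch has crawled your documents and pushed them into Solr, a client can query Solr over HTTP through its standard /select handler. The host and port (localhost:8983), the core name (nutch), and the url and title fields are assumptions for illustration; your installation and schema may differ.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10):
    """Return a URL for Solr's standard /select query handler."""
    # q = query string, rows = maximum number of hits, wt = response format
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def extract_hits(response_text):
    """Pull (url, title) pairs out of a Solr JSON response body."""
    # Solr wraps matching documents in response -> docs.
    # The "url" and "title" field names are assumed, not guaranteed.
    docs = json.loads(response_text)["response"]["docs"]
    return [(doc.get("url"), doc.get("title")) for doc in docs]


if __name__ == "__main__":
    # Hypothetical local Solr instance and core name
    print(build_select_url("http://localhost:8983/solr", "nutch", "open source"))
```

Fetching the generated URL, for example with urllib.request.urlopen, and passing the response body to extract_hits would then yield the matching pages, assuming the index was populated with those fields.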

[...]
