Alternative metasearch engines

Forgetful Search

© Image by Gerd Altmann from Pixabay

Article from Issue 291/2025
Author(s):

Alternative open source metasearch engines offer more privacy than mainstream search engines and can sometimes yield better results. While SearXNG is the best-known open source metasearch engine, 4get is a capable alternative worth checking out.

An issue most privacy advocates don't think much about is that only three high-end search engines in the world maintain their own index. Google, Bing, and Yandex are about the only authoritative general-purpose search engines capable of giving usable results to common queries. Most alternatives (such as DuckDuckGo or Startpage) just act as front ends to these. Smaller alternative search engines that have their own web crawlers and indexes either don't have big enough databases to give good results or are too specialized (see the "Why Not an Open Source Big Index" box).

Why Not an Open Source Big Index

Readers may be wondering why nobody from the free and open source software (FOSS) community has come up with an open source search engine that indexes the Internet at the scale of Google, Bing, and Yandex. The answer is that people are actually trying, and failing [1] [2].

The first obstacle for a new engine is that content delivery networks (CDNs) typically throttle alternative search engines, making it hard for them to crawl and map sites. The second obstacle is that big search engines are closing deals with popular sites to ensure competitors are banned from crawling them [3]. Finally, running a large-scale search engine is hard: To provide usable results, you need an index worth many petabytes of data, and a monolithic index of this size is expensive to sustain. The partially open source Gigablast was the engine that got closest to good results by using FOSS technology. While it was usable without being fantastic, Gigablast silently ceased operations in 2023 and was replaced by counterfeit sites capitalizing on its reputation.

The FOSS crowd, therefore, is focusing its resources on alternative methods. The YaCy [4] search engine, for example, is distributed, so instead of a big index in a server room, each user maintains a small piece of a larger index that is searchable by every user. Unfortunately, the quality of YaCy's results doesn't look promising.

The three big search engines, therefore, comfortably dominate the market. This is a problem for a number of reasons. First of all, if you use one of these engines directly (i.e., you perform Google searches from your computer), the engine gets to know what you search for and, eventually, will produce an accurate profile of your interests. Because mainstream engines get to see the queries of most Internet citizens, they can build a profile of nearly everyone. Using a front end such as Startpage alleviates the problem: Google does not get to correlate your searches with your identity, because the alternative front end acts as an anonymizing proxy of sorts. However, you still have to trust that the alternative front end is playing fair and not profiling you itself.

Secondly, whether you use a big search engine directly or not, at the end of the day you only get the results that the big indexes want you to find. If you use DuckDuckGo, you are indirectly getting Bing results. If there is an article Microsoft does not want you to read, the article won't feature high in the list of results, if at all. Because most netizens only use big search engines, these big search engines get to decide what most people find.

Finally, consumers don't have much of a chance to find better alternatives if they dislike Google, Bing, and Yandex. In fact, the quality of results from the big three engines has been declining for a while [5], and lookups done in mainstream search engines often result in a flood of spam, fake reviews, and low-quality AI-generated clickbait.

The Solution

The traditional way to mitigate these problems is to use a reputable metasearch engine. A metasearch engine is just a front end that relays your queries to multiple search engines and combines the results offered by all of them into a single list. In theory, this maximizes your chance of finding something even if it is not present on some of the engines for whatever reason. In addition, by using a reputable middleman, the search engines being queried in the back end can't profile you.
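The relay-and-combine idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: The back ends are hypothetical stub functions rather than real search engine APIs, and the reciprocal-rank scoring used to merge the lists is one simple choice, not the method used by SearXNG or any other particular metasearcher.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical back end: a function that takes a query and returns ranked URLs.
Backend = Callable[[str], List[str]]


def metasearch(query: str, backends: Dict[str, Backend]) -> List[str]:
    """Relay the query to every back end and merge the results into one list."""
    scores: Dict[str, float] = defaultdict(float)
    for name, search in backends.items():
        try:
            results = search(query)
        except Exception:
            # A failing back end should not break the whole search.
            continue
        for rank, url in enumerate(results, start=1):
            # URLs ranked highly by several engines accumulate a larger score.
            scores[url] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Stub back ends standing in for real engines (assumption, for illustration).
def engine_a(query: str) -> List[str]:
    return ["https://example.org/a", "https://example.org/b"]


def engine_b(query: str) -> List[str]:
    return ["https://example.org/b", "https://example.org/c"]


merged = metasearch("scone recipe", {"a": engine_a, "b": engine_b})
print(merged)
```

In this toy run, the URL returned by both stubs ends up first in the merged list, while a URL known to only one engine still appears further down, which is the point of querying several sources.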

The most popular FOSS solution is using a SearXNG [6] instance. SearXNG, a fork of searx, is a self-hostable metasearcher with support for multiple back ends (Figure 1). Lots of instances, run by volunteers, are available for public use; you can even host your own instance if so desired. That said, FOSS is all about avoiding monocultures. Because searx and SearXNG are already popular enough, maybe it is time to talk about a different option.

Figure 1: SearXNG provides an amazing number of back ends for performing web searches.

Enter 4get

4get is a metasearcher released under the GNU Affero General Public License (AGPL). You can either host it on your own server (see the "Self-Hosting Your Instance" box) or use any of the available public instances (Figure 2). Many of the public instances are well maintained. As long as you trust the public instance operators not to analyze your search profile, I find self-hosting offers little benefit when compared to the drawbacks. The main obstacle when using a public instance is the necessity of solving an (easy) CAPTCHA for every 100 queries. The developer, who goes by the handle lolcat, intends to introduce a paid tier for his instance, which will allow verified subscribers to bypass the CAPTCHA.

Self-Hosting Your Instance

4get's Git repository offers all the information you need to install your own instance on a server you control [7]. 4get is a PHP application that requires no dedicated database, and configuration examples are provided for use with Apache 2 and NGINX. I have also successfully installed 4get on OpenBSD 7.5 using OpenHttpd as my WWW daemon, but this process is unofficial and not for the faint of heart. For the impatient, 4get can also be deployed as a Docker container.

Services such as 4get are run either as public or private instances. A public instance is available to any user and is typically listed in a directory. The advantage of running your own public instance is that your queries are mixed with other users' searches: If you look up a scone recipe using the Google back end and a different user searches for the best price on hunting rifles, Google won't be able to build a good profile of your searching habits because its data pool will be confusing. On the other hand, search engines may impose rate limits on your instance or ban it outright if you have too many users and generate too much traffic.

A private instance is intended for your personal use only, is not listed publicly, and will probably have some access control in place to ensure only you and your friends use it. It is unlikely such an instance will be rate limited, but your searches won't be as anonymous as far as the back-end engines are concerned. If you are the only person using an instance, any search proxied to Google through it must be yours.

Figure 2: A respectable list of public 4get instances you can use for your searches.

4get's usage model is different from that of most other metasearchers. While metasearchers such as SearXNG combine results from multiple sources, 4get offers a drop-down menu to switch between back ends on demand. The idea is that accumulating a mix of bad results from multiple sources does not offer any advantage. The developer thinks it is better to use one search engine at a time, switching to the next engine once you have tried and obtained no results from the current one. It could be argued that 4get is not a true metasearch engine, but I feel such a distinction is just a matter of semantics.
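The one-engine-at-a-time workflow can be illustrated with a short Python sketch. Note the assumptions: 4get leaves the switching to the user via its drop-down menu, whereas this example automates the same idea, and the back ends are hypothetical stub functions, not 4get's actual code or API.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical back end: a function that takes a query and returns result URLs.
Backend = Callable[[str], List[str]]


def search_with_fallback(
    query: str, backends: Sequence[Tuple[str, Backend]]
) -> Tuple[Optional[str], List[str]]:
    """Try one back end at a time and stop at the first one with results."""
    for name, search in backends:
        try:
            results = search(query)
        except Exception:
            # Treat a broken upstream engine like one with no results.
            continue
        if results:
            # Results from different engines are never mixed together.
            return name, results
    return None, []


# Stub back ends standing in for real engines (assumption, for illustration).
def empty_engine(query: str) -> List[str]:
    return []


def working_engine(query: str) -> List[str]:
    return ["https://example.org/result"]


used, results = search_with_fallback(
    "scone recipe", [("first", empty_engine), ("second", working_engine)]
)
print(used, results)
```

Unlike a merging metasearcher, this approach returns the results of exactly one engine, so the ranking you see is always that engine's own.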

Available Engines

4get supports searching websites, images, videos, news, and music. A comprehensive list of supported back ends is shown in Table 1.

Table 1: 4get Back Ends

| Back End | Type | Notes |
| --- | --- | --- |
| DuckDuckGo | Web, images, videos, news | Metasearch engine that uses Bing as its main source; it claims to operate a small index and use more than 400 secondary sources |
| Brave | Web, images, videos, news | Search engine with its own index; it uses Bing as a secondary source |
| Yandex | Web, images, videos | Popular Russian search engine |
| Google | Web, images, videos, news | The most popular search engine |
| Startpage | Web, images, videos, news | A privacy conscious proxy for Google |
| Qwant | Web, images, videos, news | Search engine that supplements its own index with Bing results |
| Yep | Web, images | Search engine whose business model includes paying content creators |
| Greppr | Web | Simple search engine with its own index, for keyword-based searches |
| CrowdView | Web | Forum search engine with an AI foundation for analyzing and categorizing forum posts |
| Mwmbl | Web | Open source, community-driven nonprofit search engine |
| Mojeek | Web, news | Search engine with independent index |
| Marginalia | Web | Search engine specializing in noncommercial and hobby content |
| Wiby | Web | Search engine specializing in hobby and retro content, with a deliberately weak ranking algorithm |
| Curlie | Web | Human-curated web directory |
| Imgur | Images | Image sharing and hosting service |
| FindThatMeme | Images | Meme search engine |
| YouTube | Videos | Most popular video site |
| Soundcloud | Music | Internet audio streaming service |

As you may expect, the big search engines are supported (Figure 3). The fact that Yandex is included is noteworthy, because this search engine is not available on SearXNG instances. SearXNG's number of supported engines dwarfs 4get's, but, on the other hand, 4get carries a number of smaller specialist engines SearXNG lacks. In addition, 4get allows passing engine-specific options to each of its back ends, whereas SearXNG is much less granular in this regard.

Figure 3: 4get offers multiple selectable engines to carry out your searches, in different categories.

Some novel search engines available from 4get deserve mention. Marginalia [8] is an interesting example. This search engine is focused on personal, noncommercial sites, and features its own index and crawler. When you use Marginalia as the back end, you have the option to outright exclude sites featuring advertisements, affiliate links, cookies, trackers, or JavaScript (or any combination thereof). There is also a focus on excluding AI-generated sites. Marginalia's source code itself is available under AGPL 3.0, with some parts licensed as MIT.

Wiby [9] is similar in its focus, and its goal is to serve as a portal to the small Internet (meaning personal, noncommercial sites). It doesn't feature as many options as Marginalia, but that is because Wiby's indexing policies are very strict. No site will be included in Wiby's index if it contains intrusive advertisements or bloated code. Wiby is also open source and released under GPLv2. Its main drawback is that its ranking algorithm is especially basic by design, because Wiby is intended to let you find random interesting stuff rather than answers to specific questions.

Many of the search engines are also prone to failure because they are hobby projects or are highly experimental. At the time of writing, Curlie [10], a human-curated web directory, had a broken search engine, so its related 4get back end returned no results. CrowdView [11], the AI-powered forum search engine, also returned only empty results. The same happened with Mwmbl [12]. None of these problems was specific to 4get, because it was the upstream engines that were failing.
