Alternative metasearch engines
Forgetful Search

© Image by Gerd Altmann from Pixabay
Alternative open source metasearch engines offer more privacy than mainstream search engines and can sometimes yield better results. While SearXNG is the best-known open source metasearch engine, 4get is a capable alternative worth checking out.
An issue most privacy advocates don't think much about is that there are only three high-end search engines in the world that maintain their own index. Google, Bing, and Yandex are about the only authoritative general-purpose search engines capable of giving usable results to common queries. Most other alternatives (such as DuckDuckGo or Startpage) just act as front ends to these. Smaller alternative search engines that have their own web crawlers and indexes either don't have big enough databases to give good results or are too specialized (see the "Why Not an Open Source Big Index" box).
Why Not an Open Source Big Index
Readers may be wondering why nobody from the free open source software (FOSS) community has come up with an open source search engine for indexing the Internet at a scale like Google, Bing, and Yandex do. The answer is people are actually trying and failing [1] [2].
The first obstacle for a new engine is that content delivery networks (CDNs) typically throttle alternative search engines, making it hard for them to crawl and map sites. The second obstacle is big search engines are closing deals with popular sites in order to ensure competitors are banned from crawling these sites [3]. Finally, running a large-scale search engine is hard; in order to provide usable results, you'll need an index worth many petabytes of data. It is expensive to sustain a monolithic index this size. The partially open source Gigablast was the engine that got closest to good results by using FOSS technology. While it was usable without being fantastic, Gigablast silently ceased operations in 2023 and was replaced by counterfeit sites capitalizing on its reputation.
The FOSS crowd, therefore, is focusing their resources on alternative methods. The YaCy [4] search engine, for example, is distributed, so instead of a big index in a server room, each user maintains a small piece of a larger index that is searchable by every user. Unfortunately, YaCy's quality doesn't look promising.
The three big search engines, therefore, comfortably dominate the market. This is a problem for a number of reasons. First of all, if you use one of these engines directly (i.e., you perform Google searches from your computer), the engine learns what you search for and will eventually build an accurate profile of your interests. Because mainstream engines see the queries of most Internet citizens, they can profile nearly everyone. Using a front end such as Startpage alleviates the problem: Google does not get to correlate your searches with your identity, because the alternative front end acts as an anonymizing proxy of sorts. However, you still have to trust that the alternative front end is playing fair and not profiling you itself.
Secondly, whether you use a big search engine directly or not, at the end of the day you only get the results that the big indexes want you to find. If you use DuckDuckGo, you are indirectly getting Bing results. If there is an article Microsoft does not want you to read, the article won't feature high in the list of results, if at all. Because most netizens only use big search engines, these big search engines get to decide what most people find.
Finally, consumers don't have much of a chance to find better alternatives if they dislike Google, Bing, and Yandex. In fact, the quality of results from the big three engines has been declining for a while [5], and lookups done in mainstream search engines often result in a flood of spam, fake reviews, and low-quality AI-generated clickbait.
The Solution
The traditional way to mitigate these problems is to use a reputable metasearch engine. A metasearch engine is just a front end that relays your queries to multiple search engines and combines the results offered by all of them into a single list. In theory, this maximizes your chance of finding something even if it is not present on some of the engines for whatever reason. In addition, by using a reputable middleman, the search engines being queried in the back end can't profile you.
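To make the relay-and-combine idea concrete, here is a minimal sketch in Python. It is an illustration only, not how SearXNG or any other metasearcher is actually implemented: the back ends are hypothetical stand-in functions that return hard-coded URLs, and the merge strategy (round-robin interleaving with duplicate URLs dropped) is an assumption chosen for simplicity.

```python
from itertools import zip_longest


def merge_results(result_lists):
    """Interleave ranked result lists round-robin, dropping duplicate URLs."""
    merged, seen = [], set()
    # zip_longest pads shorter lists with None so every list is fully consumed.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


def metasearch(query, backends):
    """Relay the query to every back end and combine the answers."""
    return merge_results([search(query) for search in backends.values()])


# Hypothetical stand-ins for real back ends; a real front end would
# issue HTTP requests here instead of returning canned lists.
def engine_a(query):
    return ["https://example.org/a", "https://example.org/shared"]


def engine_b(query):
    return ["https://example.org/shared", "https://example.org/b"]


if __name__ == "__main__":
    for url in metasearch("scone recipe", {"a": engine_a, "b": engine_b}):
        print(url)
```

A result that only one engine knows about still makes it into the combined list, which is the point of querying several engines at once.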
The most popular FOSS solution is using a SearXNG [6] instance. SearXNG, a fork of searx, is a self-hostable metasearcher with support for multiple back ends (Figure 1). Lots of instances, run by volunteers, are available for public use, and you can even host your own instance if desired. That said, FOSS is all about avoiding monocultures. Because searx and SearXNG are already popular enough, maybe it is time to talk about a different option.
Enter 4get
4get is a metasearcher released under the GNU Affero General Public License (AGPL). You can either host it on your own server (see the "Self-Hosting Your Instance" box) or use any of the available public instances (Figure 2). Many of the public instances are well maintained. As long as you trust the public instance operators not to analyze your search profile, I find self-hosting offers little benefit when compared to the drawbacks. Certainly the main obstacle when using a public instance is the necessity of solving an (easy) CAPTCHA for every 100 queries. The developer, who goes by the lolcat handle, intends to introduce a paid tier for his instance, which will allow verified subscribers to bypass the CAPTCHA.
Self-Hosting Your Instance
4get's Git repository offers all the information you need to install your own instance on a server you control [7]. 4get is a PHP application that requires no dedicated database, and configuration examples are provided for use with Apache 2 and NGINX. I have also successfully installed 4get on OpenBSD 7.5 using OpenHttpd as my WWW daemon, but this process is unofficial and not for the faint of heart. For the impatient, 4get can also be deployed as a Docker container.
Services such as 4get are run either as public or private instances. A public instance is available to any user and is typically listed in a directory. The advantage of running your own public instance is that your queries are mixed with other users' searches: If you look up a scone recipe using the Google back end and a different user searches for the best price on hunting rifles, Google won't be able to build a good profile of your searching habits because its data pool will be muddled. On the other hand, search engines may impose rate limits on your instance or ban it outright if you have too many users and generate too much traffic.
A private instance is intended for your personal use only, is not listed publicly, and will probably have some access control in place to ensure that only you and your friends use it. It is unlikely that such an instance will be rate limited, but your searches won't be as anonymous as far as the back-end engines are concerned. If you are the only person using an instance, any search proxied to Google through it must be yours.
4get's usage model is different from that of most other metasearchers. While metasearchers such as SearXNG combine results from multiple sources, 4get offers a drop-down menu to switch between back ends on demand. The idea is that accumulating a mix of bad results from multiple sources does not offer any advantage. The developer thinks it is better to use one search engine at a time, switching to the next engine once you have tried the current one and obtained no results. It could be argued that 4get is not a true metasearch engine, but I feel such a distinction is just a matter of semantics.
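The one-engine-at-a-time habit can be sketched in a few lines of Python. Keep in mind that in 4get the user switches back ends by hand through the drop-down menu; the loop below merely automates that manual routine for illustration and does not reflect 4get's PHP code. The back-end functions and their names are hypothetical stand-ins.

```python
def search_with_fallback(query, backends, order):
    """Try back ends one at a time; return the first non-empty result list."""
    for name in order:
        results = backends[name](query)
        if results:
            # Stop at the first engine that finds something; results from
            # different engines are never mixed together.
            return name, results
    return None, []


# Hypothetical stand-ins for real back ends.
def big_engine(query):
    return []  # pretend the big engine found nothing useful


def small_engine(query):
    return ["https://example.org/hobby-page"]


if __name__ == "__main__":
    backends = {"big": big_engine, "small": small_engine}
    engine, results = search_with_fallback(
        "obscure topic", backends, ["big", "small"]
    )
    print(engine, results)
```

Compared with merging everything into one list, this approach keeps each engine's ranking intact, at the price of extra round trips when the first choices come up empty.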
Available Engines
4get supports searching websites, images, videos, news, and music. A comprehensive list of supported back ends is shown in Table 1.
Table 1
4get Back Ends
Back End | Type | Notes |
---|---|---|
DuckDuckGo | Web, images, videos, news | Metasearch engine that uses Bing as its main source; it claims to operate a small index and use more than 400 secondary sources |
Brave | Web, images, videos, news | Search engine with its own index; it uses Bing as a secondary source |
Yandex | Web, images, videos | Popular Russian search engine |
Google | Web, images, videos, news | The most popular search engine |
Startpage | Web, images, videos, news | A privacy-conscious proxy for Google |
Qwant | Web, images, videos, news | Search engine that supplements its own index with Bing results |
Yep | Web, images | Search engine whose business model includes paying content creators |
Greppr | Web | Simple search engine with its own index, for keyword-based searches |
CrowdView | Web | Forum search engine with an AI foundation for analyzing and categorizing forum posts |
Mwmbl | Web | Open source, community-driven nonprofit search engine |
Mojeek | Web, news | Search engine with independent index |
Marginalia | Web | Search engine specializing in noncommercial and hobby content |
Wiby | Web | Search engine specializing in hobby and retro content, with a deliberately weak ranking algorithm |
Curlie | Web | Human-curated web directory |
Imgur | Images | Image sharing and hosting service |
FindThatMeme | Images | Meme search engine |
YouTube | Videos | Most popular video site |
Soundcloud | Music | Internet audio streaming service |
As you may expect, the big search engines are supported (Figure 3). The fact that Yandex is included is noteworthy, because this search engine is not available on SearXNG instances. The number of engines SearXNG supports dwarfs 4get's, but, on the other hand, 4get carries a number of smaller specialist engines that SearXNG lacks. In addition, 4get allows passing engine-specific options to each of its back ends, whereas SearXNG is much less granular in this regard.

Some novel search engines available from 4get deserve mention. Marginalia [8] is an interesting example. This search engine focuses on personal, noncommercial sites and features its own index and crawler. When you use Marginalia as the back end, you have the option to outright exclude sites featuring advertisements, affiliate links, cookies, trackers, or JavaScript (or any combination thereof). There is also a focus on excluding AI-generated sites. Marginalia's source code is available under the AGPL 3.0, with some parts licensed as MIT.
Wiby [9] is similar in its focus, and its goal is to serve as a portal to the small Internet (meaning personal, noncommercial sites). It doesn't feature as many options as Marginalia, but that is because Wiby's indexing policies are very strict. No site will be included in Wiby's index if it contains intrusive advertisements or bloated code. Wiby is also open source and released under the GPLv2. Its main drawback is that its ranking algorithm is especially basic by design, because Wiby is intended to let you find random interesting stuff rather than answers to specific questions.
Many of the search engines are also prone to failure because they are hobby projects or are highly experimental. At the time of writing, Curlie [10], a human-curated web directory, had a broken search engine – its related 4get back end returned no results. CrowdView [11], the AI-powered forum search engine, also only returned empty results. The same happened with Mwmbl [12]. None of these problems was specific to 4get, because it was the upstream engines that were failing.