Sorting and Searching
Doghouse – Algorithms and Books
A look at the history of computer memory and a classic algorithm text.
When I started programming in 1969, even mainframe computers from IBM might have had only 1MB of main memory and two or three disk drives measured in hundreds of megabytes, with multitrack tape drives to supplement that storage. Whatever parallelism there was in programming typically went toward keeping the data you needed flowing into main memory while you were also getting the processed data out to "stable storage," either to a disk or directly to a magnetic tape.
Some of the first IBM computers I programmed used an operating system called MFT (Multiprogramming with a Fixed number of Tasks), which had up to 16 "partitions" in memory, each of a fixed size set when the operating system was configured.
When a job was started by the operators, the Job Control Language (JCL) told the operating system how much main memory the job would take, and the job was placed into a partition large enough to hold it. If the creator of the JCL specified too much memory, memory was wasted. Too small an amount, and the program might not finish, or even start.
Later the operating system was changed to Multiprogramming with a Variable number of Tasks (MVT), which allowed the programmer (who was typically the one who generated the JCL) to specify the exact size of one of the 16 possible slots. The problem with MVT was that it could create a series of small, unusable holes in the physical main memory of the system, and only when several of those small holes were joined together could a larger program run, a rather crude form of garbage collection.
If this sounds a bit complex, and even unbelievable, in these days of demand-paged virtual memory systems, I will throw in the additional information that the address space of these IBM mainframes was originally only 24 bits (instead of 32 or 64), meaning that any program could address a maximum of 16 million bytes of main memory at a time.
Of course, while these machines were incredibly slow, and logically small, by today's standards, we thought they were "magic" when we were programming them.
Still, we gave great thought to how much storage was needed for each piece of data and how much computation would be given to each calculation. Sometimes this worked against us in the long run, and some of those decisions became pretty famous, such as the "Y2K" problem, where we thought two digits, and not four, were enough for the year.
Many people will remember that, as we moved toward the year 2000, panic started to erupt about what would happen near or at midnight on December 31, 1999, as we turned to the next century. Fortunately, people started thinking about this early enough and working to negate the effects.
However, some years earlier, a series of books named The Art of Computer Programming, by Donald E. Knuth, was being published. Volume 3 of that work, Sorting and Searching, was first published in 1973, when I started programming IBM mainframes, and slightly before I started my master's degree in computer science.
The book carefully explained the different methods of sorting and searching and how they could take advantage of the levels of storage from disk to CPU registers and everything in between (main memory, cache, etc.). It pointed out that with the 24-bit address space limitations of that day (often 16 bits or fewer in minicomputers), hash tables and hash searching techniques were often inefficient due to the number of collisions a search might have.
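To make the collision point concrete, here is a small sketch in C; it is not from Knuth's book or from anything described in this column, and the hash function, key source, and table sizes are purely illustrative assumptions. It hashes the same 10,000 random keys into a cramped table and into a roomier one and counts how often a key lands in a slot that is already occupied.

#include <stdio.h>
#include <stdlib.h>

#define NKEYS 10000

/* Multiplicative hashing using Knuth's 2^32/phi constant. */
static unsigned hash_key(unsigned key, unsigned table_size)
{
    return (key * 2654435761u) % table_size;
}

static unsigned count_collisions(unsigned table_size)
{
    unsigned *slots = calloc(table_size, sizeof *slots);
    unsigned collisions = 0;

    srand(42);                            /* same key sequence for every table size */
    for (unsigned i = 0; i < NKEYS; i++) {
        unsigned h = hash_key((unsigned)rand(), table_size);
        if (slots[h]++)                   /* slot already occupied: a collision */
            collisions++;
    }
    free(slots);
    return collisions;
}

int main(void)
{
    /* A table constrained by a small address space versus one with room to spare. */
    printf("%5u slots: %u collisions for %d keys\n", 4096u, count_collisions(4096), NKEYS);
    printf("%5u slots: %u collisions for %d keys\n", 32768u, count_collisions(32768), NKEYS);
    return 0;
}

The smaller table produces far more collisions for the same keys, which is exactly the situation a 16- or 24-bit address space could force on you.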
However, it is the combination of knowing the efficiency of the algorithm, the size of the data to be sorted, and the architecture of the machine that lets you avoid drastic consequences.
I once took a program that was supposed to sort 1,206 32-byte records and that ran on a PDP-11/70 computer (with a 64KB data address space) under the RSTS/E operating system; the sort took 10.5 hours to finish. The program used a bubble sort, and if all of the data could have been held in main memory that would not have been so bad, but, unfortunately, it used a virtual array that was implemented on disk. The program made over 13 million comparisons and 700,000 accesses to the disk.
Rewriting the program to use a tree sort and a sort-merge as described in Knuth's book, I broke the 1,206 records into seven files of 173 records, each of which could reside and be sorted in main memory. Finally, I merged the seven sorted files into the resulting output file. The total time was reduced from 10.5 hours to three minutes.
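Below is a minimal sketch in C of that chunk-and-merge approach, under stated assumptions: the 32-byte record layout, the random keys standing in for the real input, and the use of the C library's qsort() for the in-memory pass (instead of the tree sort mentioned above) are mine, not the original program's. It sorts seven memory-sized runs, spills each one to a temporary file, and then performs a seven-way merge of the runs.

#include <stdio.h>
#include <stdlib.h>

#define NREC    1206                 /* total records, as in the example above */
#define CHUNK    173                 /* records per in-memory chunk (seven chunks) */
#define NCHUNKS ((NREC + CHUNK - 1) / CHUNK)

struct rec { long key; char payload[24]; };   /* assumed 32-byte record layout (64-bit long) */

static int cmp_rec(const void *a, const void *b)
{
    long ka = ((const struct rec *)a)->key, kb = ((const struct rec *)b)->key;
    return (ka > kb) - (ka < kb);
}

int main(void)
{
    FILE *run[NCHUNKS];
    struct rec buf[CHUNK], cur[NCHUNKS];
    int have[NCHUNKS];

    /* Phase 1: fill each chunk (random keys stand in for the real input),
     * sort it in main memory, and spill it to its own temporary file. */
    srand(1);
    for (int c = 0, left = NREC; c < NCHUNKS; c++, left -= CHUNK) {
        int n = left < CHUNK ? left : CHUNK;
        for (int i = 0; i < n; i++)
            buf[i].key = rand();
        qsort(buf, n, sizeof buf[0], cmp_rec);
        if (!(run[c] = tmpfile())) { perror("tmpfile"); return 1; }
        fwrite(buf, sizeof buf[0], n, run[c]);
        rewind(run[c]);
        have[c] = fread(&cur[c], sizeof cur[c], 1, run[c]) == 1;
    }

    /* Phase 2: k-way merge -- repeatedly emit the smallest key among the
     * front records of the sorted runs, refilling from the run we took from. */
    long emitted = 0, last = -1;
    for (;;) {
        int best = -1;
        for (int c = 0; c < NCHUNKS; c++)
            if (have[c] && (best < 0 || cur[c].key < cur[best].key))
                best = c;
        if (best < 0)
            break;                      /* all runs exhausted */
        if (cur[best].key < last) {     /* sanity check: output must be ordered */
            fprintf(stderr, "out of order!\n");
            return 1;
        }
        last = cur[best].key;           /* a real program would write the record here */
        emitted++;
        have[best] = fread(&cur[best], sizeof cur[best], 1, run[best]) == 1;
    }
    printf("merged %ld records from %d sorted runs\n", emitted, NCHUNKS);
    return 0;
}

Only one chunk's worth of records, plus one record per run during the merge, ever has to be resident at a time, which is what let the approach fit into the PDP-11's 64KB data space.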
Knuth is not the only good author on the study of algorithms, but the depth of his analysis makes these books far more timeless than the editions of other popular computer books.