Linus Torvalds Upset over Ext3 and Ext4
Linus Torvalds, Ted Ts'o, Alan Cox, Ingo Molnar, Andrew Morton and other Linux kernel developers are embroiled in a contentious discussion over the sense -- or nonsense -- of journaling and delayed allocation before a commit in the ext3 and ext4 filesystems. Harsh words are flying.
It all started with a request for help from Jesper Krogh in one of the first responses to Torvalds's March 24 announcement of kernel 2.6.29 on the gmane.linux.kernel mailing list. Krogh reported significant delays when writing cached data to disk with the ext3 filesystem, despite faster hardware and ample RAM. Was there a way to autotune it? Ingo Molnar opined that Krogh's wait time of 10 minutes was totally unacceptable: "it is the year 2009, not 1959." His personal "pain threshold" is about one second: "the historic limit for the hung tasks check was 10 seconds, then 60 seconds."
Ted Ts'o, a leading figure in the filesystem's development, joined the thread. Only recently, users had confronted him over data they lost after installing their apps on the new ext4 filesystem. Ts'o dug into the problem in depth, researching its cause and explaining it in detail. Once again he described the delay in writing data to disk. Synchronization in ext3 occurs every five seconds, whereas ext4 normally writes from cache every two minutes. Ts'o got pretty defensive: "People can call file system developers idiots if it makes them feel better --- sure, OK, we all suck. If someone wants to try to create a better file system, show us how to do better, or send us some patches."
Torvalds, for one, didn't seem too excited about the delayed synchronization. He wrote on the mailing list, "Doesn't at least ext4 default to the insane model of 'data is less important than metadata, and it doesn't get journalled'? And ext3 with 'data=writeback' does the same, no? Both of which are -- as far as I can tell -- total brain damage. At least with ext3 it's not the default mode." To avoid the synchronization problem, Ts'o had recommended moving only a few separate systems to ext4, at least for the time being. Torvalds called this "crappy" advice and said that "we might as well go back to ext2 then."
In his response, Ts'o fell back on the performance benefits of delayed allocation, a behavior that POSIX has long permitted. In his experience, the difference between five seconds and three minutes "wasn't that big of a deal" in practice, "at least in the days when people were proud of their Linux systems having 2-3 year uptimes." Plus there was a remedy: "For precious files, applications that use fsync() will be safe." Anyone for whom this was a problem could "turn off delayed allocation with the nodelalloc mount option."
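The fsync() remedy Ts'o mentions can be illustrated with a minimal Python sketch. It is not taken from the mailing list thread: the function name durable_write and the temporary-file-plus-rename pattern are assumptions chosen for illustration, showing one common way an application might protect a "precious" file on a POSIX system.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace the file at `path` with `data` (bytes).

    Hypothetical helper for illustration: after a crash, the file should
    hold either the complete old contents or the complete new contents.
    """
    directory = os.path.dirname(os.path.abspath(path))

    # Create the temporary file in the same directory so that the final
    # rename never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # move Python's buffer into the kernel
            os.fsync(tmp.fileno())   # ask the kernel to write the data to disk
        os.replace(tmp_path, path)   # atomically put the new file in place
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

    # Sync the directory too, so the rename itself reaches the disk.
    # Opening a directory this way is assumed to work (it does on Linux).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling durable_write("settings.conf", b"key=value\n") would write the new contents and force them to disk before the old file is replaced. Note that tempfile.mkstemp creates the file with owner-only permissions, so a real application might need to restore the original mode.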
Kernel chief Torvalds was hardly convinced by these arguments. In his view, "if you write your metadata earlier (say, every 5 sec) and the real data later (say, every 30 sec), you're actually more likely to see corrupt files than if you try to write them together... This is why I absolutely detest the idiotic ext3 writeback behavior. It literally does everything the wrong way around -- writing data later than the metadata that points to it. Whoever came up with that solution was a moron. No ifs, buts, or maybes about it."
Comments
You might as well use XFS
What FS Does Linus Use/Like?
THANKS
Ext3/4 reliability
Mechanism for ext3/ext4 data loss?
I can see that if there is a power failure when data is in memory and it hasn't been written to a journal somewhere - it can be lost. A fairly old fix to this is to write the journal to a battery backed memory on the disk controller. If this write can be done before the main power supply capacitors are depleted, there shouldn't be any loss. Maybe there is a less expensive way to do it.
Delayed sync
Unless it is a day or time when your machine is busy.
But imagine a fresh install or the copying of huge amounts of data: I can't help feeling that the caching system is going to be a terrible bottleneck.
Run an rsync in the background while you are editing pictures: when exactly is the sync going to catch up?
I know the ext4 guys are getting hot under the collar, but surely they can understand that people are going to wonder how good ext4 is at deciding on the best action on the fly. Journal now or journal after a delay?
As for those data losses, why respond with "If you know a better way tell us..."? Well, I know one that might be better: don't lose data.