Linus Torvalds Upset over Ext3 and Ext4
Linus Torvalds, Ted Ts'o, Alan Cox, Ingo Molnar, Andrew Morton and other Linux kernel developers are embroiled in a contentious discussion over the sense -- or nonsense -- of journaling and delayed allocation before a commit in the ext3 and ext4 filesystems. Harsh words are flying.
It all started with a request for help from Jesper Krogh in one of the first responses to Torvalds's March 24 announcement of kernel 2.6.29 on the gmane.linux.kernel mailing list. Krogh reported a significant delay when writing from cache with the ext3 filesystem, despite faster hardware and plenty of RAM. Was there a way to autotune it? Ingo Molnar opined that Krogh's wait time of 10 minutes was totally unacceptable: "it is the year 2009, not 1959." His personal "pain threshold" is about one second: "the historic limit for the hung tasks check was 10 seconds, then 60 seconds."
Ted Ts'o, a pioneer in the filesystem's development, joined the thread. Only recently he had been confronted by users over data loss after installing their apps on the new ext4 filesystem. Ts'o threw himself into the problem, investigating its source and explaining it in detail. Once again he described the delay in writing data: synchronization in ext3 occurs every five seconds, whereas ext4 normally writes from cache every two minutes. Ts'o got pretty defensive: "People can call file system developers idiots if it makes them feel better --- sure, OK, we all suck. If someone wants to try to create a better file system, show us how to do better, or send us some patches."
Torvalds, for one, didn't seem too excited about the delayed synchronization. He wrote on the mailing list, "Doesn't at least ext4 default to the insane model of 'data is less important than metadata, and it doesn't get journalled'? And ext3 with 'data=writeback' does the same, no? Both of which are -- as far as I can tell -- total brain damage. At least with ext3 it's not the default mode." To avoid the synchronization problem, Ts'o had recommended limiting ext4, at least temporarily, to a few separate systems. Torvalds called this advice "crappy" and added that "we might as well go back to ext2 then."
In his response, Ts'o fell back on the performance benefits of delayed allocation, which he noted POSIX had long allowed. In his experience, the difference between five seconds and three minutes "wasn't that big of a deal" in practice, "at least in the days when people were proud of their Linux systems having 2-3 year uptimes." Plus there was a remedy: "For precious files, applications that use fsync() will be safe." If this were a problem for some, they could "turn off delayed allocation with the nodelalloc mount option."
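The fsync() remedy Ts'o mentions can be sketched from the application side. What follows is a minimal Python sketch, assuming a POSIX-like system. The function name durable_write and the write-to-a-temporary-file-then-rename pattern are illustrative choices that are not taken from the mailing list thread.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Replace `path` with `data`, forcing the data to disk before the rename.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the rename stays
    # within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # hand Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to write the data out
        os.replace(tmp_path, path)  # swap in the new contents atomically
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Also sync the directory so the rename itself reaches the disk
    # (opening a directory this way is a POSIX assumption).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

An application would call durable_write("settings.conf", b"...") instead of overwriting the file in place, so that a crash leaves either the old contents or the new ones on disk rather than an empty or partial file.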
Kernel chief Torvalds was hardly convinced by these arguments. In his view, "if you write your metadata earlier (say, every 5 sec) and the real data later (say, every 30 sec), you're actually more likely to see corrupt files than if you try to write them together... This is why I absolutely detest the idiotic ext3 writeback behavior. It literally does everything the wrong way around -- writing data later than the metadata that points to it. Whoever came up with that solution was a moron. No ifs, buts, or maybes about it."
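Torvalds's ordering argument can be illustrated with a deliberately simplified toy model in Python. It is not how ext3 or ext4 are implemented. The function crash_outcome and its two-step view of a file update (one "data" write, one "metadata" write) are assumptions made purely for illustration.

```python
def crash_outcome(write_order, crash_after):
    """Describe what survives a crash in a two-step toy model of a file update.

    write_order: the order in which "data" and "metadata" reach the disk.
    crash_after: how many of those writes completed before power was lost.
    """
    on_disk = set(write_order[:crash_after])
    if "metadata" not in on_disk:
        # Nothing on disk points at the new blocks yet.
        return "old file intact"
    if "data" not in on_disk:
        # The metadata is on disk, but the blocks it points to are not.
        return "corrupt: metadata points at unwritten data"
    return "new file intact"


# Compare metadata-first with data-first ordering at every crash point.
for order in (("metadata", "data"), ("data", "metadata")):
    for crash_after in range(len(order) + 1):
        print(order, crash_after, crash_outcome(order, crash_after))
```

In this model only the metadata-first ordering has a crash point at which the file is corrupt, and the longer the gap between the two writes, the longer that window stays open.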
Comments
You might as well use XFS
What FS Does Linus Use/Like?
THANKS
Ext3/4 reliability
Mechanism for ext3/ext4 data loss?
I can see that if there is a power failure while data is in memory and hasn't yet been written to a journal somewhere, it can be lost. A fairly old fix for this is to write the journal to battery-backed memory on the disk controller. If this write can be done before the main power supply capacitors are depleted, there shouldn't be any loss. Maybe there is a less expensive way to do it.
Delayed sync
Unless it is a day or time when your machine is busy.
But imagine the situation on a fresh install or when copying huge amounts of data. I can't help feeling that the caching system is going to be a terrible bottleneck.
Run an rsync in the background while you are editing pictures: when exactly is the sync going to catch up?
I know the ext4 guys are getting hot under the collar, but surely they can understand that people are going to wonder how good ext4 is at deciding on the best action on the fly. Journal now or journal after a delay?
As for those data losses, why respond with "If you know a better way tell us..."? Well, I know one that might be better: don't lose data.