Gentoo Forums
New Reiser4 feature - Different transaction model per volume
dusanc
Apprentice


Joined: 19 Sep 2005
Posts: 248
Location: Serbia

Posted: Tue Mar 11, 2014 6:39 am    Post subject: New Reiser4 feature - Different transaction model per volume

So as you all know, there are currently two types of filesystems by transaction model: journaling and COW.
Journaling decreases fragmentation, so it's better on HDDs; COW decreases I/Os, so it's better for SSDs.
Now, with the same filesystem, you can choose (per volume, with only a mount option) whether you want COW, journaling, or a hybrid type.

Reiser4 is slowly coming back, now with 3 active devs, maybe it's time for some benchmarks :)

Quote:
Hi all,

I am glad to announce a new unique feature of simple reiser4 volumes.

As you probably know, all other file systems implement only a single
transaction model. That is, they all are either only journalling
(ext3/4, ReiserFS(v3), XFS, jfs, ...), or only "write-anywhere"
(ZFS, Btrfs, etc).

However, journalling file systems are not the best choice for SSD
drives, as they issue a larger number of IOs because of double
writes: first you write to the journal, and then to the permanent
location on disk. As you can guess, a larger number of IOs means a
performance drop and reduced life of SSD drives.

As to "write-anywhere" file systems: they work badly with HDD drives.
Indeed, in accordance with this transaction model you can not
overwrite blocks on disk. Instead, you should write the modified
buffers to different location, and after making sure that they have
been written successfully, deallocate old blocks (sometimes this
transaction model is called "Copy-on-Write", but we will use the
historically first name "Write-Anywhere"). Such mandatory relocations
lead to rapid external fragmentation, especially when you perform a
lot of overwrites at random offsets. As a result, performance
degrades rapidly. To improve the situation you need to run
defragmentation tools constantly.

Reiser4 users can now choose the transaction model that is most
suitable for their devices. This is very simple: just specify it with
the respective mount option. With the patch applied you will have 3
options:

1) Journalling (mount option "txmod=journal").

In this mode all overwritten buffers (nodes) will be committed via
the journal (recall that instead of obsolete "journal block devices"
Reiser4 uses the more advanced technique of wandering logs).

This mode is for HDD users who have complained about fragmentation of
reiser4 volumes. I expect that this is not a 100% panacea against
fragmentation, but it is better than nothing: in this mode
fragmentation should be no worse than in ReiserFS(v3)!
Alas, the 100% panacea (reiser4 repacker) is still a long-term todo.

2) Write-Anywhere, aka Copy-on-Write (mount option "txmod=wa")

All modified nodes in this mode will get a new location on disk (like
in ZFS, Btrfs, etc). In this mode reiser4 doesn't make active
attempts to defragment atoms. It will issue a minimal number of IOs,
but reiser4 volumes will fragment rapidly.
This option is only for SSD users.

3) Hybrid transaction model (mount option "txmod=hybrid")

This is the default model suggested by Hans Reiser and Josh MacDonald
in ~2002. This model uses an advanced feature of reiser4 transaction
manager, so-called "compound checkpoints", which means that a part of
dirty nodes is committed via journal (overwrite), and another part is
committed via write-anywhere technique (i.e. gets another location on
disk). All relocate-overwrite decisions in this mode are results of
attempts to defragment locality of atoms that are to be committed.
Clean nodes of this locality can also be involved in the commit
process (their location on disk will be changed if that provides
excellent results).

In this model the number of issued IOs is not as large as in the
traditional Journalling model, and fragmentation is not as rapid as
in the traditional Write-Anywhere (CoW) model.

However, such local defragmentation doesn't help much for some
workloads, and I periodically get complaints from users about
degradation of reiser4 volumes. So this model is for HDD users who
don't perform a lot of random overwrites. Once the repacker is ready,
I'll recommend this mode for all HDD users (simply because pure
journalling is suboptimal for HDD drives anyway).


WARNING!!! WARNING!!! WARNING!!!


Only the default (hybrid) mode is safe. The other ones (Journalling
and Write-Anywhere) need more testing - don't use them for important
data for now.


Implementation details


We introduce a new layer/interface, TXMOD (Transaction MODel), called
at flush time for reiser4 atoms. Every plugin of this interface is
a high-level block allocator, which assigns block numbers to dirty
nodes and thereby decides how those nodes will be committed.

Every dirty node of a reiser4 atom can be committed in either of the
following two ways:
1) via journal;
2) using "write-anywhere" technique.

If the allocator doesn't change the on-disk location of a node, then
this node will be committed using the journalling technique
(overwrite). Otherwise, it will be committed via the write-anywhere
technique (relocate).

relocate <---- allocate ----> overwrite

So, in our interpretation the two traditional "classic" strategies in
committing transactions (journalling and "write-anywhere") are just
two boundary cases: 1) when all nodes are overwritten, and 2) when all
nodes are relocated.

Besides those 2 boundary cases we can implement an infinite set of
their various combinations, so that users can choose what really
suits their needs.
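
To make the "allocator decides the commit method" idea concrete, here is a minimal sketch in Python. It is not reiser4 code: the Node class, the function names and the improves_locality predicate are all hypothetical and exist only for illustration. Each transaction model is just a function that assigns a block number to a dirty node; whether the node ends up overwritten or relocated follows from that assignment alone.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Iterator, List

@dataclass
class Node:
    old_block: int        # on-disk location before the flush
    new_block: int = -1   # location assigned by the allocator at flush time

# A "txmod" here is any function that assigns new_block to a dirty node.
Txmod = Callable[[Node, Iterator[int]], None]

def journal(node: Node, free: Iterator[int]) -> None:
    node.new_block = node.old_block      # location kept: overwrite

def write_anywhere(node: Node, free: Iterator[int]) -> None:
    node.new_block = next(free)          # location changed: relocate

def make_hybrid(improves_locality: Callable[[Node], bool]) -> Txmod:
    # improves_locality is a hypothetical stand-in for the real heuristics
    def hybrid(node: Node, free: Iterator[int]) -> None:
        node.new_block = next(free) if improves_locality(node) else node.old_block
    return hybrid

def commit_method(node: Node) -> str:
    # Unchanged location means commit via journal, otherwise write-anywhere.
    return "overwrite" if node.new_block == node.old_block else "relocate"

def flush(nodes: List[Node], txmod: Txmod, free: Iterator[int]) -> List[str]:
    for node in nodes:
        txmod(node, free)
    return [commit_method(node) for node in nodes]

print(flush([Node(23), Node(25)], journal, count(188)))         # ['overwrite', 'overwrite']
print(flush([Node(23), Node(25)], write_anywhere, count(188)))  # ['relocate', 'relocate']
```

The two boundary cases fall out as the policies that always keep or always change the location; anything built with make_hybrid sits somewhere in between.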


How it looks in practice


Let's create a large enough file on a reiser4 partition (let it be a
645K /etc/services):

# mkfs.reiser4 -o create=reg40 /dev/sdb5
# mount /dev/sdb5 /mnt
# cp /etc/services /mnt/.
# umount /mnt
# debugfs.reiser4 -t /dev/sdb5

NODE (23) LEVEL=2 ITEMS=2 SPACE=3968 MKFS ID=0x4ed8c6de FLUSH=0x0
#0 NPTR (nodeptr40): [29:1(SD):0:2a:0] OFF=28, LEN=8, flags=0x0 [24]
------------------------------------------------------------------------------
#1 EXTENT (extent40): [2a:4(FB):73657276696365:10000:0] OFF=36, LEN=16, flags=0x0
UNITS=1 [25(162)]
==============================================================================

We can see that the file data is represented by a single extent of 162
blocks starting at block #25. Let's overwrite the first 100K of this
file in the journalling transaction model:

# mount /dev/sdb5 -o txmod=journal /mnt
# dd if=/dev/zero of=/mnt/services bs=100K count=1 conv=notrunc
# umount /mnt
# debugfs.reiser4 -t /dev/sdb5

NODE (23) LEVEL=2 ITEMS=2 SPACE=3968 MKFS ID=0x4ed8c6de FLUSH=0x0
#0 NPTR (nodeptr40): [29:1(SD):0:2a:0] OFF=28, LEN=8, flags=0x0 [24]
------------------------------------------------------------------------------
#1 EXTENT (extent40): [2a:4(FB):73657276696365:10000:0] OFF=36, LEN=16, flags=0x0
UNITS=1 [25(162)]
==============================================================================

We can see that the overwritten nodes occupy the same location on disk,
and our extent hasn't been destroyed (fragmented). Moreover, the
modified parent node occupies the same location on disk (block #23).

Let's now overwrite the first 100K of this file in the Write-Anywhere
(Copy-on-Write) transaction model:

# mount /dev/sdb5 -o txmod=wa /mnt
# dd if=/dev/zero of=/mnt/services bs=100K count=1 conv=notrunc
# umount /mnt
# debugfs.reiser4 -t /dev/sdb5

NODE (213) LEVEL=2 ITEMS=2 SPACE=3952 MKFS ID=0x4ed8c6de FLUSH=0x0
#0 NPTR (nodeptr40): [29:1(SD):0:2a:0] OFF=28, LEN=8, flags=0x0 [187]
------------------------------------------------------------------------------
#1 EXTENT (extent40): [2a:4(FB):73657276696365:10000:0] OFF=36, LEN=32, flags=0x0
UNITS=2 [188(25) 50(137)]
==============================================================================

We can see that the first 100K (25 blocks) has been relocated in
accordance with the "Write-Anywhere" transaction model: the initial
extent has been split into 2 units: the first unit consists of 25
relocated blocks, which start at block #188, and the second unit
consists of 137 blocks, which occupy the same location on disk. The
modified parent also got a new location (block #213 - was #23).
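
The numbers in the two dumps can be reproduced with a little arithmetic. The sketch below is only an illustration in Python, not anything taken from reiser4: it assumes 4 KiB blocks (which is what makes 100K come out as 25 blocks), and the helper names are made up. An extent is written as (start, length), matching the [25(162)] notation of debugfs.reiser4.

```python
BLOCK_SIZE = 4096  # assumption: 4 KiB blocks, consistent with 100K -> 25 blocks

def blocks_for(nbytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed to hold nbytes (ceiling division)."""
    return -(-nbytes // block_size)

def relocate_prefix(extent, nblocks, new_start):
    """Split (start, length) when its first nblocks are moved to new_start."""
    start, length = extent
    if nblocks <= 0:
        return [extent]                       # nothing relocated
    if nblocks >= length:
        return [(new_start, length)]          # whole extent relocated
    return [(new_start, nblocks), (start + nblocks, length - nblocks)]

print(blocks_for(100 * 1024))                 # 25
print(relocate_prefix((25, 162), 25, 188))    # [(188, 25), (50, 137)]
```

Reading "645K" as 645 KiB, the same ceiling division also gives the 162 blocks of the original extent.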

Let's calculate the total number of IOs issued when overwriting the
file in different modes:

1) Journalling

50 blocks were submitted for data modification (25 were
written to the journal, and 25 to the permanent location);
2 blocks were submitted to modify the parent (block #23 in the dump)
(1 to journal, and 1 to permanent location);
2 blocks were submitted to modify the bitmap (1 to journal, and 1 to
permanent location);
2 blocks were submitted to modify the superblock (1 to journal, and
1 to permanent location).
--------------------
Total: 56 blocks.

2) Write-Anywhere (Copy-on-Write)

25 blocks were submitted (relocated) for data modifications;
1 block was submitted to modify parent, which got new location #213;
2 blocks were submitted to modify bitmap (1 to journal, and 1 to
permanent location);
2 blocks were submitted to modify superblock (1 to journal, and 1 to
permanent location);
NOTE: system blocks (bitmaps, superblock, etc) can not be relocated in
reiser4, so we always commit them via journal.
---------------------
Total: 30 blocks.

So we have 56 IOs issued in journalling mode against 30 IOs in
Write-Anywhere. However, fragmentation is the price of the smaller
number of IOs in Write-Anywhere mode (see the last dump, where we have
2 extents). So this transaction model is only for SSD drives, as they
are not sensitive to external fragmentation. Again, "journal" is for
HDD, and "wa" is for SSD; please don't confuse them!
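
The two totals can be checked with a small calculation. This is just a sketch of the counting rule described above, in Python, with a made-up function name: a journalled block is written twice (journal plus permanent location), a relocated block once, and system blocks (bitmap, superblock) always go through the journal.

```python
def blocks_submitted(data: int, parents: int, system: int, relocate: bool) -> int:
    """Count blocks submitted for one overwrite under the rule above."""
    # Data and parent blocks: one write if relocated, two if journalled.
    writes_per_tree_block = 1 if relocate else 2
    # System blocks cannot be relocated, so they always cost two writes.
    return (data + parents) * writes_per_tree_block + system * 2

journalling = blocks_submitted(data=25, parents=1, system=2, relocate=False)
write_anywhere = blocks_submitted(data=25, parents=1, system=2, relocate=True)
print(journalling, write_anywhere)  # 56 30
```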

----------------------------------------------------------------------
MOUNT OPTION    INTENDED FOR                                 DEFAULT
----------------------------------------------------------------------
txmod=journal   HDD users                                    no
----------------------------------------------------------------------
txmod=wa        SSD users                                    no
----------------------------------------------------------------------
txmod=hybrid    HDD users, who don't perform                 yes
                a lot of random overwrites
----------------------------------------------------------------------
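
As a rough illustration of how the table could be applied, here is a Python sketch. It is not part of the patch, and both function names are hypothetical. It assumes the Linux sysfs attribute /sys/block/<disk>/queue/rotational (1 for rotating media, 0 otherwise) is a good enough HDD/SSD test; note that it takes the whole-disk name (sdb), not a partition (sdb5). It only restates the table, so the warning above still applies: journal and wa need more testing.

```python
from pathlib import Path

def is_rotational(disk: str, sysfs: Path = Path("/sys/block")) -> bool:
    """True if the kernel reports the disk (e.g. 'sdb') as rotating media."""
    return (sysfs / disk / "queue" / "rotational").read_text().strip() == "1"

def txmod_for(rotational: bool, many_random_overwrites: bool) -> str:
    """Map the table's 'intended for' column to a txmod value."""
    if not rotational:
        return "wa"                      # SSD users
    # HDD users: hybrid unless the workload has many random overwrites
    return "journal" if many_random_overwrites else "hybrid"

print(txmod_for(rotational=True, many_random_overwrites=False))  # hybrid
```

The result would then be passed as the mount option, e.g. -o txmod=hybrid.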

Please, find the patch against reiser4-for-3.13.1 here:
http://sourceforge.net/projects/reiser4/files/patches/

As usual, bug reports, comments, questions, experiences (and not only
negative ones) are welcome.

Thank you for choosing Reiser4!

Edward.

_________________
Reiser4 Gentoo FAQ [25Dec2016]
Bones McCracker
Veteran


Joined: 14 Mar 2006
Posts: 1611
Location: U.S.A.

Posted: Fri Mar 21, 2014 1:17 am

Excellent! Glad to see they are still innovating.
_________________
patrix_neo wrote:
The human thought: I cannot win.
The ratbrain in me : I can only go forward and that's it.
kernelOfTruth
Watchman


Joined: 20 Dec 2005
Posts: 6111
Location: Vienna, Austria; Germany; hello world :)

Posted: Fri Mar 28, 2014 6:07 pm

the reiser3 & reiser4 filesystems were badass (== innovative and reliable) right from the start and will always be - at least from experience

seems like I have an additional filesystem to test on this /var/tmp zram partition :)


thanks for posting dusanc,

Thanks Edward & everyone else contributing ! 8)
_________________
https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa

Hardcore Gentoo Linux user since 2004 :D
Bones McCracker
Veteran


Joined: 14 Mar 2006
Posts: 1611
Location: U.S.A.

Posted: Fri Mar 28, 2014 9:41 pm

I have a box still running reiserfs. It's been running for eight years now without problems.
Anon-E-moose
Watchman


Joined: 23 May 2008
Posts: 6097
Location: Dallas area

Posted: Fri Mar 28, 2014 9:51 pm

I have been running reiser3 for quite a few years now, well, except for /boot which is ext2.

I looked at ext4 for my backup raid, the performance was better than r3 but it took more disk space for itself, so I left it reiser3.

I hope that they continue to improve reiser4 and that the performance can increase substantially over r3.
_________________
PRIME x570-pro, 3700x, 6.1 zen kernel
gcc 13, profile 17.0 (custom bare multilib), openrc, wayland
All times are GMT