Gentoo Forums
Reiser4 Gentoo FAQ [25Sep2016]
sPHERE911
n00b


Joined: 23 Mar 2008
Posts: 50

PostPosted: Tue Nov 05, 2013 7:02 pm    Post subject: Reply with quote

Just wanted to share my experience with this filesystem.

I purchased an Asus UX51vz with 2x256GB SSDs (SandForce). I spoke directly to Edward by mail, and he advised me to use the default mount options and to be sure to align the partitions properly. Since the SandForce controller already compresses data on the fly, I disabled the compression plugin with "mkfs.reiser4 -o create=reg40".

I set up the partitions with software raid0 via mdadm, and it has worked flawlessly so far.

I have used btrfs for the last couple of months, and I must say that I have had many more problems with btrfs than with reiser4 (actually none so far, and fsck.reiser4 works great too!)


Running kernel 3.11.1 with pf-patchset and reiser4 patches.

dusanc
Apprentice


Joined: 19 Sep 2005
Posts: 235
Location: Serbia

PostPosted: Tue May 06, 2014 7:40 am    Post subject: Reply with quote

Great news for Reiser4 SSD users: reiser4-for-3.14.1.patch is out with COW support included (https://forums.gentoo.org/viewtopic-t-986096.html), and patches for trim support (https://forums.gentoo.org/viewtopic-t-990364.html) have been published on the m-l, so I've updated the FAQ.

Now where are those benchmarks when you need them..... ;)
_________________
Reiser4 Gentoo FAQ [25Sep2016]

dusanc
Apprentice


Joined: 19 Sep 2005
Posts: 235
Location: Serbia

PostPosted: Sun Nov 02, 2014 9:39 am    Post subject: Reply with quote

So reiser4-for-3.17 is out and the old undeletable directory bug is now fixed. You can use trim and COW on your SSDs, and reiser4progs-1.0.9 can discard all blocks of a partition that is about to be formatted as reiser4 (mkfs option -d).
And I've updated the FAQ :)

Quote:
Reiser4-for-3.17.2: Ivan Shapovalov . Space grabbing fixes
Reiser4-for-3.17: . Port for Linux-3.17
Reiser4-for-3.16.2: Ivan Shapovalov . Add basic discard support for SSD devices
Reiser4-for-3.16.1: Edward Shishkin . Port for Linux-3.16; . Fix the problem of non-deletable directories.

_________________
Reiser4 Gentoo FAQ [25Sep2016]

WWWW
Tux's lil' helper


Joined: 30 Nov 2014
Posts: 143

PostPosted: Tue Dec 09, 2014 9:32 pm    Post subject: Reply with quote

Interesting filesystem, with COW and crypt. ZFS open source can't do crypt.

I should give it a try. Does it perform well under virtualization?

Does R4 compare to btrfs and zfs, or is it more like traditional xfs/ext4/etc filesystems?

The strong points of btrfs/zfs are the automatic backups and such. Even LVM doesn't have automatic backup.

What would a complete solution with this fs look like? LVM+R4+rsync?

grazzie

dusanc
Apprentice


Joined: 19 Sep 2005
Posts: 235
Location: Serbia

PostPosted: Wed Dec 10, 2014 8:33 am    Post subject: Reply with quote

WWWW wrote:
Interesting filesystem, with COW and crypt. ZFS open source can't do crypt.

Well, crypt support is still not usable in R4; only the compression part is.

WWWW wrote:

I should give it a try. Does it perform well under virtualization?

TBH I don't use it under VM so I don't know


WWWW wrote:
Does R4 compare to btrfs and zfs, or is it more like traditional xfs/ext4/etc filesystems?


Interesting question. Let's say it's between them.
btrfs and zfs pack all the bells and whistles into the filesystem, with all the pros and cons that go with it.
xfs/extX/jfs etc. use external code for additional features, with all the pros and cons that go with it.

R4 is more like the latter ones, but it has compression, partial checksums etc. in the FS code. The idea is to code in only those features that you have to.
There's one feature in R4 that none of the others have: you can choose the type of FS you'd like (journaling, COW or hybrid) per partition, so you can have an SSD with COW R4 and HDDs with hybrid or journaling R4 filesystems in the same machine.

WWWW wrote:

The strong points of btrfs/zfs are the automatic backups and such. Even LVM doesn't have automatic backup.

What would a complete solution with this fs look like? LVM+R4+rsync?

grazzie


Well I use R4+ Rsnapshot for backup/snapshots. IMHO a more robust solution.

I'll add your questions to the FAQ, thanks.
_________________
Reiser4 Gentoo FAQ [25Sep2016]

dusanc
Apprentice


Joined: 19 Sep 2005
Posts: 235
Location: Serbia

PostPosted: Tue Feb 17, 2015 9:52 pm    Post subject: New Reiser4 feature Reply with quote

Reiser4 just got one new feature, again unique among Linux FSes: Precise real-time discard

When executing online discard/TRIM, FSes generate garbage over time, since erase units in general don't coincide with file system blocks. Precise real-time discard for Reiser4 doesn't generate garbage.

Quote:
Precise real-time discard in Reiser4
for SSD devices


Efficient implementation of real-time discard which
doesn't lead to accumulation of garbage on disk (set of
erase units which are marked as free in the file system
space map, but for which discard requests weren't issued),
and, hence, removes the need to periodically run fstrim
(batch discard) on the device



Introduction



Real-time discard support(*) means that file system issues discard
requests, i.e. informs the block layer about extents of freed space.

Currently all Linux file systems with announced feature of real-time
discard support issue so-called "lazy" (or "non-precise") discard
requests. It means that such file systems report exactly the blocks
that were freed. Since erase units in general don't coincide with file
system blocks, such "lazy" technique leads to accumulation of garbage
on disk.

DEFINITION. Garbage is a set of erase units on disk, which are marked
free in the file system space map, but discard requests for them were
not issued.

Indeed, for example, if the erase unit is larger than the file system
block, then it can happen that a "lazy" discard request contains
partial erase units, so that the block layer will round up the start
and round down the end of such discard request. This is because, on
the one hand, the trim operation is defined only for whole erase
units. On the other hand, the block layer doesn't know the status of
an erase unit which is freed only partially, and hence it makes an
assumption that its other part is "busy" in the file system's space
map (the alternative assumption can lead to data corruption). Note,
however, that if such "forced" assumption is incorrect, and the whole
erase unit becomes free, then such erase unit will become garbage.

With lazy discard policy user needs to run special tools to clean up
the accumulated garbage.

So, it would be nice to check the status of partially freed erase
units and issue discard request for such unit, if its other part is
also free (marked as free in the file system's space map). The block
layer is not able to perform such checks for obvious reasons: this is
a business of the file system. Below we prove that such checking of
partially freed erase units and issuing discard requests for the
padded extents (we'll call it "precise discard requests") doesn't lead
to accumulation of garbage.

Issuing precise discard requests efficiently, without performance drop
and ugly workarounds, is possible only if the file system possesses an
advanced transaction manager like the one of Reiser4.

The initial idea of precise discard and its implementation of complexity
N_u (where N_u is the total number of erase units (including partial
ones) in the resulting set of sorted and merged discard requests)
belongs to Ivan Shapovalov.

Edward Shishkin suggested an implementation of complexity 2*N_e, where
N_e is the total number of extents in such resulting set.

(*) For more details about trim/discard see
http://en.wikipedia.org/wiki/Trim_(computing)



1. (De)allocation, discard units and alignment.
Non-precise and precise coordinates



The minimal unit of all (de)allocation operations in a file system is
a file system block of blk_size.

The minimal unit of all discard operations is a so-called erase unit
of EUS size.

Every file system block can be addressed by its (block) number.
In this case we'll speak of addressing in the system of non-precise
coordinates 0Y.

In contrast with non-precise coordinates we'll also consider a system
0X of precise coordinates, where every individual byte on the disk can
be addressed.

In the system 0Y we'll consider (non-precise) extents of blocks (U,V),
where U is the number of the start block, and V is the width of the
extent (in blocks).

In the system 0X we'll consider precise extents of bytes [AB], where A
(A < B) is offset of the first byte and (B-1) is offset of the last
byte of the extent. So, the length of such segment is B-A.

Erase unit size in bytes (EUS) is a property of the SSD drive.
Generally erase units don't coincide with file system blocks, so
we'll address erase units in the system 0X of precise coordinates
by precise extents. In particular, every erase unit of some SSD
partition is represented in precise coordinates as extent
[EUO + N * EUS, EUO + (N+1)*EUS] for some natural N, where
EUO is the offset of the first complete erase unit (0 <= EUO < EUS).
That is, EUO is a property of individual partitions of SSD drives.
EUO is also called "alignment".
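
To make the notation concrete, here is a minimal Python sketch of the two coordinate systems. The function names and the toy values (8-byte blocks, EUS = 9, EUO = 3) are assumptions chosen for illustration only; they are not taken from the patch.

```python
def erase_unit_extent(n, eus, euo):
    """Precise extent [start, end) of the n-th complete erase unit.

    eus is the erase unit size in bytes, euo the alignment
    (offset of the first complete erase unit).
    """
    if not 0 <= euo < eus:
        raise ValueError("alignment must satisfy 0 <= EUO < EUS")
    return (euo + n * eus, euo + (n + 1) * eus)


def block_extent(u, v, blk_size):
    """Convert a non-precise extent (U, V) of blocks to byte offsets [A, B)."""
    return (u * blk_size, (u + v) * blk_size)


# Hypothetical toy geometry: 8-byte blocks, 9-byte erase units, alignment 3.
first_unit = erase_unit_extent(0, eus=9, euo=3)
third_unit = erase_unit_extent(2, eus=9, euo=3)
freed = block_extent(2, 5, blk_size=8)
print(first_unit, third_unit, freed)
```

With these toy values the erase unit boundaries never line up with block boundaries, which is exactly the situation the following sections deal with.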



3. Lazy (non-precise) discard policy.
Accumulation of garbage on disk



The policy of lazy (non-precise) discard is rather simple: if any
extent of blocks (U,V) is freed by the file system, then we issue
discard request for the extent [U * blk_size, (U+V) * blk_size].

Suppose now that EUS != blk_size, or EUO != 0

Suppose the file system deallocates an extent (2,5) and issues the
respective "lazy" discard request [2 * blk_size, 7 * blk_size],
see the picture below. The block layer assumes that the neighboring
blocks #1 and #7 are busy, and, hence, issues discard request for
the smaller segment [AB].

Note, however, that if this assumption is incorrect, and blocks #1
and (or) #7 were actually free, then after freeing the extent (2,5)
we'll have that the whole erase units [A - EUS, A] and [B, B + EUS]
are marked as free in the file system space map, and hence, will
add to the garbage. So, the lazy (non-precise) discard policy
leads to accumulation of garbage on disk.



*       *       *       *       *       *       *       *       *  > Y
0       1       2       3       4       5       6       7       8
0       blk_size        3*blk_size
*-------*-------*-------*-------*-------*-------*-------*-------*--> X
---+--------+--------+--------+--------+--------+--------+--------+> X
0  EUO      A-EUS    A                          B        B+EUS



Comment. There are 2 independent "sources" of garbage in "lazy"
discard policy:

1) "bad" values of erase unit size (EUS != blk_size);
2) "bad" values of alignment (EUO != 0);
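
The rounding that the block layer applies to a lazy request can be sketched as follows. This is only an illustration of the arithmetic described above, not kernel code; the function names and the toy geometry (8-byte blocks, EUS = 9, EUO = 3) are assumptions.

```python
def round_to_erase_units(start, end, eus, euo):
    """Shrink the byte range [start, end) to whole erase units.

    Returns None when the range does not cover any complete erase unit.
    """
    a = euo + -(-(start - euo) // eus) * eus   # round the start up
    b = euo + ((end - euo) // eus) * eus       # round the end down
    return (a, b) if a < b else None


def lazy_discard(u, v, blk_size, eus, euo):
    """Discard range that survives rounding for a freed extent (U, V)."""
    return round_to_erase_units(u * blk_size, (u + v) * blk_size, eus, euo)


# Freed extent (2,5) in the hypothetical toy geometry.
request = lazy_discard(2, 5, blk_size=8, eus=9, euo=3)
print(request)   # the segment [A, B] from the example
```

In this toy geometry the freed bytes are [16, 56), but only [21, 48) is discarded. If blocks #1 and #7 happen to be free, the erase units [12, 21) and [48, 57) are free in the space map and never discarded, so they become garbage.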



4. Precise discard



The idea is to check all "partially deallocated" erase units. If the
whole such unit is marked as free in the file system space map, then
we (file system) issue a discard request for the whole unit. That is,
in contrast with the lazy discard policy, the file system provides
correct status of every partially deallocated discard unit and issues
precise discard request for the larger (padded) extents.

Let's consider the previous example. In accordance with the precise
discard policy file system checks the status of blocks #1 and #7.

If both blocks #1 and #7 are free, then file system issues a discard
request [A - EUS, B + EUS].

If block #1 is free and block #7 is busy, then file system issues
a discard request [A - EUS, B].

If block #1 is busy and block #7 is free, then the file system will
issue discard request [A, B + EUS].

Finally, if both blocks #1 and #7 are busy, then the file system
issues discard request [A, B].

Note that block layer won't restrict such "precise" discard requests,
and, moreover, the following statement takes place:

THEOREM. The policy of precise discard doesn't lead to accumulation
of garbage on disk.

Proof (sketch). Indeed, suppose that the disk doesn't contain "garbage".
That is, a discard request was issued for every erase unit which is
marked as free in the file system space map.

Suppose, the file system deallocates extent (2, 5).

If the block #1 is busy and block #7 is free, then, in accordance with
precise discard policy, the file system issues "precise" discard
request [A, B + EUS]. Note that we must not discard the unit
[A - EUS, A], since it contains bytes of the busy block #1. Also,
note that we don't need to discard other units due to the assumption,
that before deallocation disk didn't contain garbage.
Thus, we have that discard requests have been issued for every
erase unit, which is marked as free in the file system space map.

In a similar way we can prove that "precise" discard policy doesn't
leave garbage on disk in the other 3 cases (when block #1 is free and
block #7 is busy, both blocks #1 and #7 are free, and both blocks #1
and #7 are busy).
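
The four cases of the precise discard policy can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Reiser4 implementation: the function name, the is_free callback standing in for the space map, the nblocks device size, and the toy geometry (8-byte blocks, EUS = 9, EUO = 3) are all hypothetical.

```python
def precise_discard(u, v, blk_size, eus, euo, nblocks, is_free):
    """Padded discard request for a freed extent (U, V) of blocks.

    is_free(block) tells whether a neighbouring block is free in the
    space map; nblocks is the device size in blocks.
    """
    start, end = u * blk_size, (u + v) * blk_size
    lo = euo + ((start - euo) // eus) * eus        # round the start down
    hi = euo + -(-(end - euo) // eus) * eus        # round the end up
    head = range(lo // blk_size, u)                # blocks sharing the head unit
    tail = range(u + v, -(-hi // blk_size))        # blocks sharing the tail unit
    # Drop the head (tail) unit if it is incomplete or partly busy.
    if lo < euo or not all(is_free(b) for b in head):
        lo += eus
    if hi > nblocks * blk_size or not all(is_free(b) for b in tail):
        hi -= eus
    return (lo, hi) if lo < hi else None


# Freed extent (2,5); here A = 21, B = 48, A - EUS = 12, B + EUS = 57.
results = {}
for free in ({1, 7}, {1}, {7}, set()):
    results[frozenset(free)] = precise_discard(2, 5, 8, 9, 3, 9, free.__contains__)
    print(sorted(free), results[frozenset(free)])
```

Each line of output corresponds to one of the four cases listed above, from [A - EUS, B + EUS] when both neighbours are free down to [A, B] when both are busy.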



5. Implementation of precise discard



The straightforward solution is to check the status of partially
deallocated erase units in the file system's space map. However,
efficient implementation of such solution requires an advanced
transaction manager and not less advanced block allocator.
In particular, you need to make sure that nobody will occupy the
other parts of your partially deallocated erase units while you are
issuing precise discard requests for them (otherwise, data
corruption is possible).

Reiser4 block allocator manages the following in-memory
data-structures:

. working space map (W)
. commit space map (C)
. deallocation set (D)

Allocation in Reiser4 always takes place in the working space map:

(1) W' = alloc(W, R); - allocate a set R of block numbers in the
working space map W.

Deallocation is a bit more complicated: all freed block numbers at
first are recorded in a special data structure - deallocation set D:

(2) D' = dealloc(D, R);

Before committing a transaction we update the commit space
map C at so-called pre_commit_hook():

(3) C' = apply(C, D);

After committing the transaction, that is after issuing all write
requests (including the commit space map C) we prepare and
issue discard requests in so-called post_write_back_hook():

(4) prepare_and_issue_discard_requests();

After issuing discard requests we update the working space map:

(5) W' = apply(W, D');


Handling paddings of partial erase units


When preparing discard set at stage (4) we check head (tail) padding
of every partial erase unit. If it is free, we allocate it at the
working space map:

W" = alloc(W', R');

At the same time we record the allocated paddings to the deallocation
set:

D" = dealloc(D', R');

Updating the working space map at the stage (5) automatically
deallocates the paddings:

apply(W", D") = W" \ R' = (W' + R')\ R'= W'.
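
The bookkeeping above can be checked with a small set-based model. It assumes, purely for illustration, that a space map is the set of busy block numbers, so that alloc and dealloc are set unions and apply is a set difference; the block numbers are the toy example used earlier, not real data.

```python
def alloc(space_map, blocks):
    """Mark blocks as busy in a space map (modelled as a set of busy blocks)."""
    return space_map | blocks


def dealloc(dealloc_set, blocks):
    """Record freed blocks in the deallocation set."""
    return dealloc_set | blocks


def apply(space_map, dealloc_set):
    """Release every block recorded in the deallocation set."""
    return space_map - dealloc_set


W = frozenset({0, 2, 3, 4, 5, 6, 8})     # working map: blocks 1 and 7 are free
D1 = dealloc(frozenset(), frozenset({2, 3, 4, 5, 6}))   # stage (2)

paddings = frozenset({1, 7})             # R': free paddings found at stage (4)
W2 = alloc(W, paddings)                  # W": nobody else can take them now
D2 = dealloc(D1, paddings)               # D": they are released at stage (5)

final_map = apply(W2, D2)
print(sorted(final_map))
```

While the discard requests are in flight the paddings are busy in the working map, and after stage (5) the map is the same as if the paddings had never been touched.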



6. How to test



Apply the patch against reiser4-for-3.17.3:
http://sourceforge.net/projects/reiser4/files/patches/3.17.3-reiser4-precise-discard-support.patch.gz

Format a reiser4 partition with reiser4progs-1.0.9. Use mkfs option
-d to "discard" the whole partition on your SSD drive at format time.

We recommend using compression for SSD drives (it is turned on by
default).

Mount a reiser4 partition with mount option "discard".
Find a kernel message about discard support:
reiser4: sdX: enable discard support (erase unit Y bytes,
alignment Z bytes)

We recommend using the Write-Anywhere (AKA Copy-On-Write) transaction
model for SSD drives (mount option "txmod=wa").

We also recommend using the mount option "noatime" for SSD drives.
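
When testing on several machines it can be handy to pull the erase unit size and alignment out of the kernel log. The sketch below assumes the message has exactly the shape quoted above; the device name and the numbers in the sample line are made up, and the real log line may differ in whitespace.

```python
import re

# Pattern modelled on the kernel message quoted above (an assumption).
DISCARD_MSG = re.compile(
    r"reiser4: (?P<dev>\S+): enable discard support "
    r"\(erase unit (?P<eus>\d+) bytes,\s+alignment (?P<euo>\d+) bytes\)"
)


def parse_discard_message(line):
    """Return (device, erase unit size, alignment), or None if no match."""
    m = DISCARD_MSG.search(line)
    if m is None:
        return None
    return m["dev"], int(m["eus"]), int(m["euo"])


# Hypothetical sample line; the values are invented for the example.
sample = ("reiser4: sda5: enable discard support "
          "(erase unit 4096 bytes, alignment 0 bytes)")
parsed = parse_discard_message(sample)
print(parsed)
```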

_________________
Reiser4 Gentoo FAQ [25Sep2016]

ulenrich
Veteran


Joined: 10 Oct 2010
Posts: 1358

PostPosted: Tue Feb 17, 2015 10:30 pm    Post subject: Reply with quote

@dusanc
very interesting, but a bit over my level of knowledge. Why does some space that is registered as freed become garbage at first?
Is it because, in contrast to spinning disks, SSD technology needs to erase a block before it can be reused? If this is the case, perhaps the older lazy trim technique has a performance advantage? For example, think of an actual write operation hitting the same block where an R4 real-time discard happens. Has anyone already measured performance using this feature?
_________________
fun2gen2

dusanc
Apprentice


Joined: 19 Sep 2005
Posts: 235
Location: Serbia

PostPosted: Tue Feb 17, 2015 10:55 pm    Post subject: Reply with quote

ulenrich wrote:
@dusanc
very interesting, but a bit over my level of knowledge. Why does some space that is registered as freed become garbage at first?
Is it because, in contrast to spinning disks, SSD technology needs to erase a block before it can be reused?

Well yes, SSDs need to first erase the block before writing to it. But the problem is more complicated than that as write amplification may happen and make things much slower.
ulenrich wrote:
If this is the case, perhaps the older lazy trim technique has a performance advantage?

And what if you use up all the reserved empty space before you do a lazy trim? Here's a nice article on Anandtech about the performance drop that occurs when an SSD uses up all its free blocks and has to perform a read-modify-write for all subsequent writes (write amplification goes up, performance goes down).
_________________
Reiser4 Gentoo FAQ [25Sep2016]

ulenrich
Veteran


Joined: 10 Oct 2010
Posts: 1358

PostPosted: Wed Feb 18, 2015 12:58 am    Post subject: Reply with quote

dusanc wrote:
But the problem is more complicated than that as write amplification may happen and make things much slower.
Wow, I need a month to study the technology described at wikipedia. At first sight it seems to me a filesystem should recognize the specifics of the actual SSD model and handle it specially :( ... unbelievable ...
_________________
fun2gen2

dusanc
Apprentice


Joined: 19 Sep 2005
Posts: 235
Location: Serbia

PostPosted: Wed Feb 18, 2015 5:38 am    Post subject: Reply with quote

ulenrich wrote:
dusanc wrote:
But the problem is more complicated than that as write amplification may happen and make things much slower.
Wow, I need a month to study the technology described at wikipedia. At first sight it seems to me a filesystem should recognize the specifics of the actual SSD model and handle it specially :( ... unbelievable ...

It's specific to how NAND flash works, so it's common to all SSDs. That's why the controller in an SSD is so important: it tries to minimise the penalties.
But the controller doesn't know what happens inside the FS, and that's why FSes need additional features like the above, aka SSD support.
_________________
Reiser4 Gentoo FAQ [25Sep2016]

dusanc
Apprentice


Joined: 19 Sep 2005
Posts: 235
Location: Serbia

PostPosted: Fri Aug 21, 2015 11:05 am    Post subject: New Reiser4 feature Reply with quote

Reiser4 got 2 new features: Reiser4 (meta)data checksums and Auto-punching holes on commit


Quote:
Reiser4 (meta)data checksums


1. Why protect (meta)data?


We want to be protected against hardware problems such as data rot in
memory and decay of storage media. We want to be sure that our data
structures are consistent, because working with corrupted data
structures is dangerous.

Strictly speaking, such protection is not a business of a file system.
It would be more logical to assume that this is a business of the
upper and the lower subsystems. To be precise, protection against
data rot in memory is a business of the memory controller, and
protection against decay of storage media is a business of the block
device controller/driver.

However, frequently the mentioned subsystems don't provide such
protection for various reasons. As the result the file system suffers
(becomes corrupted, inconsistent), and poor users start to blame file
system developers.


2. Why "inline" checksums?


Reiser4 stores per-node checksum right in the node that we want to
protect. This is much more efficient than using dedicated data
structures for checksums, as we don't need to launch expensive search
procedures every time when we need to access a checksum. Using
dedicated data structures to store checksums is a design mistake.


3. When do we check/update per-node checksums?


Let's start from protection against storage media decay. If someone
wants protection against data rot in memory, then let me know.

Since we implement protection against storage media decay, it is
enough to check [update] a checksum right after IO completion [before
submitting IO request]. We don't need to update a checksum after every
modification. So, updating checksums in Reiser4 is a delayed action.
Reiser4 updates per-node checksum at commit time right before writing
the node to disk. At the moment of checksum update any process
modifying this node will be blocked on an attempt to acquire an
exclusive access:

longterm_lock_znode -> try_capture_block

Thus, updated checksum won't be "spoiled" before hitting the disk.

Checksum verification is going right after read IO completion in the
->parse() method of node plugin.


4. How we handle corruptions


If a node's checksum verification failed, then further work with such
a node is dangerous. Currently the user has 2 options for online
handling of this situation:

1) kernel panic (default behavior);
2) remount reiser4 partition as read-only (if mount option
"onerror=remount-ro" was specified).
In both cases user should repair his partition offline by fsck.

TODO: Online failover mode is in plans.

For this mode we need to support mirror(s). Every in-memory replica
gets updated at the moment of the checksum update. At the finish of
transaction commit all replicas have to be written to the mirror.
If checksum verification failed, then we issue a read IO request for
the replica block of the mirror.
Comment. Mirrors can be internal (when we allocate replicas on the
same partition) and external (when we allocate replicas on different
device).


5. Why use crc32c for checksums?


Modern CPUs have instructions that allow computing a full 32-bit
CRC step in 3 cycles.
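
As an illustration of an inline per-node checksum that is updated right before write and verified right after read, here is a small Python sketch. The crc32c function is a plain table-driven software implementation of the Castagnoli CRC. The node layout (a 4-byte little-endian checksum at offset 0, followed by the payload) is a hypothetical one chosen for the example and is not the node41 on-disk format.

```python
import struct

CSUM_SIZE = 4   # hypothetical: checksum field occupies the first 4 bytes


def _make_table():
    """Lookup table for CRC-32C (reflected polynomial 0x82F63B78)."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data):
    """Software CRC-32C over a bytes-like object."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def seal_node(node):
    """Store the checksum of the payload in the node itself (before write)."""
    struct.pack_into("<I", node, 0, crc32c(node[CSUM_SIZE:]))


def verify_node(node):
    """Compare the stored checksum with a fresh one (after read)."""
    (stored,) = struct.unpack_from("<I", node, 0)
    return stored == crc32c(node[CSUM_SIZE:])


node = bytearray(64)
node[CSUM_SIZE:CSUM_SIZE + 5] = b"hello"
seal_node(node)
ok_before = verify_node(node)
node[20] ^= 0x01                 # simulate a single flipped bit on the media
ok_after = verify_node(node)
print(ok_before, ok_after)
```

Because the checksum lives in the node, verification needs no extra lookup: the block that was just read contains everything required.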


6. How to protect data?


Currently we don't support checksums for unformatted blocks, where
bodies of large files are stored.
If you want to protect your data (not only metadata), then you have
3 options:

1) Make sure that reiser4 stores bodies of your files in fragments
(i.e. "inline" data chunks). Fragments are always stored in formatted
nodes, which are protected by checksums.

It is possible with mkfs option "formatting=tails" for files managed
by unix_file plugin (if you don't use compression) or
"compressMode=latt" for files managed by cryptcompress plugin (if you
use compression).

NOTE. This option will lead to performance degradation (especially for
delete operations).

2) Protect your data by yourself.

If a file system guarantees consistency of metadata, then data
protection can be successfully implemented in user space. Indeed,
since file body is uniquely determined by extent pointers, which are
guaranteed to be consistent, then checking consistency of the file's
body in user space is always a correct operation. So, feel free to
check your data in user space: we have provided basis for this.

3) Implement checksums for unformatted nodes in reiser4.

This option requires a new format for extent pointers (which will
include a 32-bit field for checksum), and, respectively, a new item
plugin (extent-pointer-with-checksum, or so).
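
For option 2, a user-space check can be as simple as a manifest of file digests that is rebuilt and compared later. The sketch below is one possible approach, not something shipped with reiser4progs; the choice of SHA-256, the function names and the temporary demo file are all assumptions.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(paths):
    """Map every path to the digest of its current contents."""
    return {p: file_digest(p) for p in paths}


def changed_files(manifest):
    """Paths whose contents no longer match the recorded digest."""
    return [p for p, digest in manifest.items() if file_digest(p) != digest]


# Demo on a throwaway file (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(b"payload")
    manifest = build_manifest([path])
    clean = changed_files(manifest)
    with open(path, "wb") as f:
        f.write(b"pay1oad")          # simulate silent corruption
    damaged = changed_files(manifest)
print(clean, len(damaged))
```

The manifest itself has to be stored somewhere trustworthy, for example together with the backups.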


7. How to enable checksum support in reiser4


Specify mkfs.reiser4 option "-o node=node41" when formatting your
partition and mount as usual. We recommend using the mount option
"onerror=remount-ro", so reiser4 won't panic on failed checksum
verification.


8. Compatibility with other features


Checksums are compatible with all reiser4 features.

Adding a checksum support is a great example of how reiser4 resists
the problem of creeping featurism. We just added a new node plugin,
which manages nodes of a new format (node41) with a 32-bit field for
the checksum. The new plugin mostly reuses methods of the old one
(node40) as you can see from the following patches:

http://marc.info/?l=reiserfs-devel&m=142359111509525&w=2
http://marc.info/?l=reiserfs-devel&m=142359112409527&w=2


9. TODO


A. Failover via mirroring (see section 4 for implementation hints).

B. Maintain checksums for the superblock and bitmap blocks.

Comment. We already have such support for bitmap blocks, however, it
uses adler32 and checksums update/verification is not invoked for some
historical reasons. I suggest to replace adler32 with crc32c and
trigger the update/verification.

Comment. For superblock protection we need to add a 32-bit field to
the disk superblock and update/verify it like in the case of formatted
nodes.


Quote:
Auto-punching holes on commit


Storing zeros on disk is a rather stupid business. Indeed, right before
writing data to disk we can convert zeros to holes (these are abstract
objects described in POSIX), and, hence, save a lot of disk space.

Compressing zeros before storing them on disk is an even more stupid
business: checking for zeros is a less expensive procedure than the
compression transform, so in addition we can save a lot of CPU
resources.

I'll remind you how reiser4 implements holes.
The unix file plugin represents them via extent pointers marked in
a special way. The situation with the cryptcompress file plugin is
simpler: it represents holes as literal holes (that is, absence of any
items of specific keys). It means that we can simply check and remove
all items which represent a logical chunk filled with zeros. This is
exactly what we do now at flush time right before commit.

The best time for such check is atom's flush, which is to complete all
delayed actions. Specifically, it calls a static machine ->convert_node()
for all dirty formatted nodes. This machine scans all items of a node
and calls ->convert() method of every such item.

We used this framework for transparent compression on commit
(specifically to replace old fragments that compose a compressed file's
body with the new ones). Now we also use it to punch holes at logical
chunks filled with zeros. That is, instead of replacing old items, we
just remove them from the tree. Think of hole punching as one more
delayed action.
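
The idea of representing zero-filled logical chunks as "absence of items" can be sketched like this. It is a toy model under stated assumptions: a dict keyed by cluster index stands in for the items in the tree, and the cluster size of 4 bytes is chosen only to keep the example readable.

```python
def punch_holes(data, cluster_size):
    """Split data into logical clusters and keep only the non-zero ones."""
    stored = {}
    for offset in range(0, len(data), cluster_size):
        chunk = data[offset:offset + cluster_size]
        if any(chunk):                      # cheap zero check, no compression
            stored[offset // cluster_size] = chunk
    return stored


def read_back(stored, length, cluster_size):
    """Rebuild the file body; missing clusters (holes) read as zeros."""
    out = bytearray(length)
    for index, chunk in stored.items():
        start = index * cluster_size
        out[start:start + len(chunk)] = chunk
    return bytes(out)


# Hypothetical file body: data, two zero-filled clusters, then a short tail.
body = b"abcd" + bytes(8) + b"ef"
items = punch_holes(body, cluster_size=4)
restored = read_back(items, len(body), cluster_size=4)
print(sorted(items), restored == body)
```

Only clusters 0 and 3 are stored; the two zero-filled clusters in between cost nothing, and reading the file still returns the original bytes.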

I have implemented hole punching only for cryptcompress plugin. It also
can be implemented for "classic" unix-file plugin, which doesn't compress
data. However, it will be more complicated because of more complicated
format of holes. Finally, I think that having such feature only for one
file plugin is enough.


Solved Problems:


When flushing modified dirty pages, the process should be able to find
in the tree a respective item group to be replaced with new data. So we
should handle possible races when one process checks/creates the items
and the flushing process deletes those items during hole punching
procedure. To avoid this situation we maintain a special "economical"
counter of checked-in modifications for every logical cluster in struct
jnode. If the counter is greater than 1, then we simply don't punch a
hole.


Mount option "dont_punch_holes"


Since hole punching is a useful feature for both HDD and SSD, I enabled it
by default. To turn it off use the mount option "dont_punch_holes". The
changes are backward and forward compatible, so no new format is needed.

_________________
Reiser4 Gentoo FAQ [25Sep2016]

kernelOfTruth
Watchman


Joined: 20 Dec 2005
Posts: 6069
Location: Vienna, Austria; Germany; hello world :)

PostPosted: Mon Aug 31, 2015 4:42 pm    Post subject: Reply with quote

Looks like reiser4 might be ripe for some testing fun :D


Reiser4: Format 4.0.1: Meta(data) checksums


Edward Shishkin wrote:
Hi all,

This is the first release of new software versions of reiser4 kernel
module and reiser4progs, which invoke the builtin mechanism of
tracking compatibility based on versions of reiser4 plugin library.
More about this can be found here:
https://reiser4.wiki.kernel.org/index.php/Reiser4_development_model

Reiser4progs of the new software release version number 4.0.1
are provided by the package reiser4progs-1.1.0. Please find here:
https://sourceforge.net/projects/reiser4/files/reiser4-utils/reiser4progs

Reiser4 kernel module of the new software release version number
4.0.1 is provided by the patch reiser4-for-4.1.5. Please find here:
https://sourceforge.net/projects/reiser4/files/reiser4-for-linux-4.x/

At mount time disk format version of your partition's superblock will
be upgraded by the kernel, and you will be advised to complete the
upgrade offline by fsck:

# dmesg | grep [rR]eiser4
Loading Reiser4 (format release: 4.0.1).
reiser4: sda5: found disk format 4.0.0.
reiser4: sda5: upgrading disk format to 4.0.1.
reiser4: sda5: use 'fsck.reiser4 --fix' to complete disk format upgrade.

You may ignore this suggestion and continue to work as usual.
However, I would suggest finding time to complete the upgrade.

NOTE: after mount in the new kernel you won't be able to
check your partition by reiser4progs-1.0.9 and older versions, so
please upgrade your reiser4progs BEFORE. Everyone who relies on
liveCDs with reiser4progs-1.0.9, or older versions: be careful!

More about reiser4 (meta)data checksums can be found here:
https://reiser4.wiki.kernel.org/index.php/Reiser4_checksums

Thanks,
Edward.

_________________
https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa

Hardcore Gentoo Linux user since 2004 :D

dusanc
Apprentice


Joined: 19 Sep 2005
Posts: 235
Location: Serbia

PostPosted: Sun Sep 25, 2016 9:53 am    Post subject: Reply with quote

New feature Logical Volumes:
http://marc.info/?l=reiserfs-devel&m=147475719301215&w=4

Quote:


Reiser4 will support logical (compound) volumes. For now we have
implemented the simplest ones - mirrors. As a supplement to existing
checksums it will provide a failover - an important feature, which
will reduce number of cases when your volume needs to be repaired by
fsck.

Reiser4 subvolume is a component of logical volume. Subvolume is
always associated with a physical, or logical (built of RAID, LVM,
etc means) block device. Every subvolume possesses:

. volume ID;
. subvolume ID;
. mirror ID;
. number of replicas.

mirror ID is a serial number from 0 to 65535. The subvolume with mirror
ID 0 has a special name - original. Other ones are called replicas.
We say "original A has a replica B" (or "B replicates A",
which is the same), iff A and B possess the same subvolume ID.
An original with all its replicas are called "mirrors".
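
The terminology can be summarised in a small data model. This is a sketch for orientation only: the class, the field names and the sample IDs are hypothetical and do not reflect the format41 on-disk layout.

```python
from dataclasses import dataclass

MAX_MIRROR_ID = 65535


@dataclass(frozen=True)
class Subvolume:
    volume_id: str
    subvolume_id: str
    mirror_id: int
    num_replicas: int

    def __post_init__(self):
        if not 0 <= self.mirror_id <= MAX_MIRROR_ID:
            raise ValueError("mirror ID must be in 0..65535")

    @property
    def is_original(self):
        """Mirror ID 0 marks the original; everything else is a replica."""
        return self.mirror_id == 0


def replicates(b, a):
    """True iff B replicates A: A is an original, B a replica of the same subvolume."""
    return a.is_original and not b.is_original and a.subvolume_id == b.subvolume_id


# Hypothetical IDs for illustration.
original = Subvolume("vol-1", "sub-1", mirror_id=0, num_replicas=1)
replica = Subvolume("vol-1", "sub-1", mirror_id=1, num_replicas=1)
print(original.is_original, replicates(replica, original))
```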

For subvolumes we have introduced a special disk format plugin
"format41". In accordance with Reiser4 development model it means
forward incompatibility. We have introduced it intentionally, for
protection. Indeed, for clear reasons users must not be able
to RW-mount separate replicas (without originals).
The multi-device extension is backward compatible: all volumes of the
old format (format40) are supported as logical volumes composed of
only one (original) subvolume.


Registration and activation of subvolumes


For now every Reiser4 logical volume has only one original subvolume.
The number of replicas can be 0 or more. A logical volume can be mounted
by the usual mount command. Simply specify any of its subvolumes (the
original, or one of its replicas). The only condition is that the original
and all its replicas should be registered in the system. If the original,
or one of its replicas, is not registered, then mount will fail with a
respective kernel message.

Currently there is no tool to register a specified subvolume (TBD).
However, the mount command always tries to register the specified device.
The registration policy is "sticky": your device won't be unregistered
after umount, or after a failed mount. (You will be able to unregister
it explicitly with a special tool - TBD.)

The registration procedure reads the master super-block of the
subvolume and puts the subvolume header on a special list of
registered subvolumes.

Mounting a logical volume activates all its registered components.
The activation procedure reads the format super-block of the subvolume
and performs other actions such as initialization of space maps,
transaction replay, etc., as specified by the ->init_format() method of
the respective disk format plugin. A pointer to an activated subvolume
is placed in a special table of active subvolumes.


Mirror operations


So the original and its replicas actually represent RAID1 on the
filesystem level.

COMMENT. We aren't engaged in the marketing fraud of collecting all
features of the block layer's RAID and LVM. Reiser4 mirrors implement
a failover that the block layer's RAID1 is not able to provide.

It will be possible to "upgrade" or "downgrade" a reiser4 array of
mirrors by attaching / detaching one or more replicas online with
special user-space tools (mirror.reiser4, TBD). With those tools it
will also be possible to swap the original with any of its replicas, or
to make a new original from any replica if the old one is lost for some
reason.

Fsck will refuse to check/repair a replica. Fsck is supposed to work only
with original subvolumes. After mounting an fsck-ed original, the kernel
will automatically run a special online background procedure (scrub)
in order to synchronize the repaired original with all its replicas.

Once in a while the user should check the array of mirrors by running
scrub in background mode.

WARNING: Bear in mind once and forever: Replica is not a backup!!!


Technical Notes


1. The Reiser4 Transaction Design document carries over to logical
volumes without modification, with one small addition: an atom is
now composed of per-subvolume components.

2. By design all mirrors differ only in their mirror IDs, which are
stored in the master super-block. The format super-blocks of mirrors are
identical. This approach provides the best performance and full
parallelism in issuing IO requests for mirrors. The downside is a small
compromise in design: the master super-block doesn't participate in
transactions. This means that mirror operations (upgrading, degrading,
swapping) cannot spawn usual transactions, which could be committed
and (re)played using the existing transaction manager. That is, mirror
operations won't survive a system crash. If a system crash happens
during a mirror operation, then the mirror structure should be
checked/fixed offline by the mirror tools (the kernel will refuse to
mount an unchecked array of mirrors). Fortunately, all critical mirror
operations issue a small number of IO requests, so the probability of
their interruption is close to zero.

3. We don't commit transactions on all mirrors, only on the original
subvolume (this is the only functional difference between the original
and its replicas). Transaction (re)play, of course, happens on all
mirrors, using the wandering maps/blocks of the original subvolume.


How to test the new features


Check out the branch "format41" of the upstream reiser4 and reiser4progs
git repos at https://github.com/edward6 and build and install as usual.

Mirrors can be created with the mkfs.reiser4 option -m. If this option is
specified, then the first listed device will be the original and the
other ones replicas. All devices of an array should have the same size.
This restriction will be lifted later.
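
A minimal sketch, not part of the announcement, for checking the equal-size requirement before running mkfs. The helper names size_of and same_size are made up for illustration; the only tools assumed are blockdev --getsize64 (util-linux) for block devices and GNU stat -c %s for plain image files.

```shell
#!/bin/sh
# size_of PATH: print the size in bytes of a block device or a regular file.
# (Hypothetical helper, not part of reiser4progs.)
size_of() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"   # block device: usually needs root
    else
        stat -c %s "$1"             # regular file (GNU stat), e.g. a test image
    fi
}

# same_size DEV...: succeed only if all arguments have the same size.
same_size() {
    first=$(size_of "$1") || return 2
    for dev in "$@"; do
        cur=$(size_of "$dev") || return 2
        if [ "$cur" != "$first" ]; then
            echo "size mismatch: $1=$first $dev=$cur" >&2
            return 1
        fi
    done
    return 0
}
```

Possible use with the partitions from the example: same_size /dev/sda7 /dev/sda8 && echo "sizes match".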

IMPORTANT: when creating mirrors, specify the node41 plugin (with checksum
support). Otherwise, your mirrors won't be more useful than the block
layer's RAID1.

Register all your mirrors by trying to "mount" them one by one, in any
order. If you have N mirrors (i.e. one original and N-1 replicas),
then the first N-1 mount commands will fail. Of course, this is not very
graceful, but it is a temporary solution. The N-th "attempt" should
succeed. Have fun. Unmount as usual.
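
The "mount until it works" procedure can be sketched as a small loop. This is only an illustration: mount_mirrors and the MOUNT_CMD override are hypothetical names (the override exists so the loop can be tried without real devices), and the sketch assumes, as described above, that a failed mount attempt still leaves the device registered.

```shell
#!/bin/sh
# MOUNT_CMD is a hypothetical hook for dry runs; by default plain mount(8) is used.
: "${MOUNT_CMD:=mount}"

# mount_mirrors MOUNTPOINT DEV...
# Try every subvolume in turn. With N mirrors the first N-1 attempts are
# expected to fail (they only register the device); the last should succeed.
mount_mirrors() {
    mnt=$1
    shift
    for dev in "$@"; do
        if $MOUNT_CMD "$dev" "$mnt" 2>/dev/null; then
            echo "mounted via $dev"
            return 0
        fi
        echo "mount via $dev failed, trying next subvolume" >&2
    done
    echo "no attempt succeeded; check dmesg" >&2
    return 1
}
```

With the two partitions from the example this would be called as: mount_mirrors /mnt /dev/sda7 /dev/sda8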


Example


Suppose we have 2 partitions /dev/sda7 and /dev/sda8 of equal
size. Let's create an array of 2 mirrors:

# mkfs.reiser4 -my -o node=node41 /dev/sda7 /dev/sda8

Take a look at original subvolume:

# debugfs.reiser4 /dev/sda7

Take a look at replica:

# debugfs.reiser4 /dev/sda8

Find differences ;)

Register the original subvolume

# mount /dev/sda7 /mnt
mount: wrong fs type, bad option, bad superblock blablabla....
# dmesg
reiser4[mount(20914)]: check_active_replicas
(fs/reiser4/init_volume.c:268)[edward-1750]:
WARNING: /dev/sda7 requires replicas, which are not registered.

Register the replica and mount the array:

# mount /dev/sda8 /mnt
# dmesg

reiser4: registered subvolume (/dev/sda8)
reiser4 (sda8): found disk format 4.0.1.
reiser4 (/dev/sda7): using Hybrid Transaction Model.

Let's copy a file /etc/services to our array of mirrors:

# cp /etc/services /mnt/.

Unmount the array:

# umount /mnt

Find the root block: it comes first in the tree dump:

# debugfs.reiser4 -t /dev/sda7

In our case the root block has blocknumber #79

Let's now take a look at how our failover works. The death-defying
act: we erase the root block of the original subvolume:

# dd if=/dev/zero of=/dev/sda7 bs=4096 count=1 seek=79
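
For anyone who wants to see what that dd call touches without risking a real partition, here is a sketch on a scratch file. With bs=4096 and seek=79 the write starts at byte 79 * 4096 = 323584 and covers exactly one block. The variable names and the 100-block image are assumptions for the demo; conv=notrunc is added only because the target here is a regular file, which dd would otherwise truncate after the written block.

```shell
#!/bin/sh
# Demo on a temporary image file, NOT on a real device.
BLOCK_SIZE=4096   # matches bs=4096 in the example
ROOT_BLOCK=79     # matches seek=79 in the example

# dd starts writing at seek * bs bytes from the start of the target.
offset=$((ROOT_BLOCK * BLOCK_SIZE))
echo "byte offset: $offset"

img=$(mktemp)
# Fill 100 blocks with 'x' so that a zeroed block stands out.
dd if=/dev/zero bs=$BLOCK_SIZE count=100 2>/dev/null | tr '\0' 'x' > "$img"

# Erase one block. conv=notrunc keeps the rest of the regular file intact.
dd if=/dev/zero of="$img" bs=$BLOCK_SIZE count=1 seek=$ROOT_BLOCK \
   conv=notrunc 2>/dev/null

# Count the bytes in block 79 that are no longer 'x', and the total size.
zeroed=$(dd if="$img" bs=$BLOCK_SIZE skip=$ROOT_BLOCK count=1 2>/dev/null \
         | tr -d 'x' | wc -c)
size=$(wc -c < "$img")
echo "zeroed bytes in block $ROOT_BLOCK: $zeroed, image size: $size"
rm -f "$img"
```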

We know that the mount procedure loads the root block. Let's try to
mount our array with the corrupted root block:

# mount /dev/sda8 /mnt

Everything works..
Take a look at kernel messages:

# dmesg
reiser4[mount(21224)]: parse_node41
(fs/reiser4/plugin/node/node41.c:79)[edward-1645]:
WARNING: block 79 (/dev/sda7): bad checksum. Please, scrub the volume.
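
Until scrub exists, the affected blocks can at least be collected from the kernel log. A rough sketch: bad_checksum_blocks is a made-up helper, and its pattern is derived only from the single warning line shown above, so it may not match the messages of other kernel versions.

```shell
#!/bin/sh
# bad_checksum_blocks: read kernel log text on stdin and print
# "DEVICE BLOCK" for every reiser4 bad-checksum warning found.
# (Hypothetical helper; pattern taken from the sample message only.)
bad_checksum_blocks() {
    sed -n 's/.*block \([0-9][0-9]*\) (\([^)]*\)): bad checksum.*/\2 \1/p'
}
```

Typical use would be: dmesg | bad_checksum_blocks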


TODO


1) Mirror tools (upgrade/downgrade a mirror array, swap original and
specified replica, convert replica to an original, visualization of
mirror arrays, etc);
2) Scrub (online background checking and synchronization of mirrors);
3) Checksumming format super-block;
4) Issuing discard requests for replicas on SSD devices.

All items are very simple to implement. If anyone cares, then I'll
provide details.

_________________
Reiser4 Gentoo FAQ [25Sep2016]
kernelOfTruth
Watchman


Joined: 20 Dec 2005
Posts: 6069
Location: Vienna, Austria; Germany; hello world :)

Posted: Sun Sep 25, 2016 10:14 am

Don't forget the reiser4 github repository :)

https://github.com/edward6/reiser4


and the reiser4progs , libaal repos (https://github.com/edward6?tab=repositories)
dusanc
Apprentice


Joined: 19 Sep 2005
Posts: 235
Location: Serbia

Posted: Sun Sep 25, 2016 8:43 pm

kernelOfTruth wrote:
Don't forget the reiser4 github repository :)

https://github.com/edward6/reiser4


and the reiser4progs , libaal repos (https://github.com/edward6?tab=repositories)


Thanks, added to FAQ.
All times are GMT