-

CVE-2025-68358

In the Linux kernel, the following vulnerability has been resolved:

btrfs: fix racy bitfield write in btrfs_clear_space_info_full()

From the memory-barriers.txt document regarding memory barrier ordering
guarantees:

 (*) These guarantees do not apply to bitfields, because compilers often
     generate code to modify these using non-atomic read-modify-write
     sequences.  Do not attempt to use bitfields to synchronize parallel
     algorithms.

 (*) Even in cases where bitfields are protected by locks, all fields
     in a given bitfield must be protected by one lock.  If two fields
     in a given bitfield are protected by different locks, the compiler's
     non-atomic read-modify-write sequences can cause an update to one
     field to corrupt the value of an adjacent field.

btrfs_space_info has a bitfield sharing an underlying word consisting of
the fields full, chunk_alloc, and flush:

struct btrfs_space_info {
        struct btrfs_fs_info *     fs_info;              /*     0     8 */
        struct btrfs_space_info *  parent;               /*     8     8 */
        ...
        int                        clamp;                /*   172     4 */
        unsigned int               full:1;               /*   176: 0  4 */
        unsigned int               chunk_alloc:1;        /*   176: 1  4 */
        unsigned int               flush:1;              /*   176: 2  4 */
        ...

Therefore, to be safe from parallel read-modify-writes losing a write to
one of the bitfield members protected by a lock, all writes to all the
bitfields must use the lock. They almost universally do, except for
btrfs_clear_space_info_full() which iterates over the space_infos and
writes out found->full = 0 without a lock.

Imagine that we have one thread completing a transaction in which we
finished deleting a block_group and are thus calling
btrfs_clear_space_info_full() while simultaneously the data reclaim
ticket infrastructure is running do_async_reclaim_data_space():

          T1                                             T2
btrfs_commit_transaction
  btrfs_clear_space_info_full
  data_sinfo->full = 0
  READ: full:0, chunk_alloc:0, flush:1
                                              do_async_reclaim_data_space(data_sinfo)
                                              spin_lock(&space_info->lock);
                                              if(list_empty(tickets))
                                                space_info->flush = 0;
                                                READ: full: 0, chunk_alloc:0, flush:1
                                                MOD/WRITE: full: 0, chunk_alloc:0, flush:0
                                                spin_unlock(&space_info->lock);
                                                return;
  MOD/WRITE: full:0, chunk_alloc:0, flush:1

and now data_sinfo->flush is 1 but the reclaim worker has exited. This
breaks the invariant that flush is 0 iff there is no work queued or
running. Once this invariant is violated, future allocations that go
into __reserve_bytes() will add tickets to space_info->tickets but will
see space_info->flush is set to 1 and not queue the work. After this,
they will block forever on the resulting ticket, as it is now impossible
to kick the worker again.

I also confirmed by looking at the assembly of the affected kernel that
it is doing RMW operations. For example, to set the flush (3rd) bit to 0,
the assembly is:
  andb    $0xfb,0x60(%rbx)
and similarly for setting the full (1st) bit to 0:
  andb    $0xfe,-0x20(%rax)

So I think this is really a bug on practical systems.  I have observed
a number of systems in this exact state, but am currently unable to
reproduce it.

Rather than leaving this footgun lying around for the future, take
advantage of the fact that there is room in the struct anyway, and that
it is already quite large and simply change the three bitfield members to
bools. This avoids writes to space_info->full having any effect on
---truncated---
Verknüpft mit AI von unstrukturierten Daten zu bestehenden CPE der NVD
Diese Information steht angemeldeten Benutzern zur Verfügung. Login Login
Daten sind bereitgestellt durch das CVE Programm von einer CVE Numbering Authority (CNA) (Unstrukturiert).
HerstellerLinux
Produkt Linux
Default Statusunaffected
Version < db4ae18e1b31e0421fb5312e56aefa382bbc6ece
Version 957780eb2788d8c218d539e19a85653f51a96dc1
Status affected
Version < 6f442808a86eef847ee10afa9e6459494ed85bb3
Version 957780eb2788d8c218d539e19a85653f51a96dc1
Status affected
Version < 742b90eaf394f0018352c0e10dc89763b2dd5267
Version 957780eb2788d8c218d539e19a85653f51a96dc1
Status affected
Version < 38e818718c5e04961eea0fa8feff3f100ce40408
Version 957780eb2788d8c218d539e19a85653f51a96dc1
Status affected
HerstellerLinux
Produkt Linux
Default Statusaffected
Version 4.8
Status affected
Version < 4.8
Version 0
Status unaffected
Version <= 6.12.*
Version 6.12.68
Status unaffected
Version <= 6.17.*
Version 6.17.13
Status unaffected
Version <= 6.18.*
Version 6.18.2
Status unaffected
Version <= *
Version 6.19-rc1
Status unaffected
Zu dieser CVE wurde keine CISA KEV oder CERT.AT-Warnung gefunden.
EPSS Metriken
Typ Quelle Score Percentile
EPSS FIRST.org 0.03% 0.063
CVSS Metriken
Quelle Base Score Exploit Score Impact Score Vector String
Es wurden noch keine Informationen zu CWE veröffentlicht.