bcachefs format on a big-endian (s390x) machine crashes down in the
rhashtable code imported from the kernel. The reason this occurs
lies within the rht_lock() -> bit_spin_lock() code, the latter of
which casts bitmaps down to 32-bits to satisfy the requirements of
the futex interface.
The specific problem here is that a 64 -> 32 bit cast doesn't refer
to the lower 8 bytes on a big endian machine, which means setting
bit number 0 in the 32-bit map actually corresponds to bit 32 in the
64-bit map. The rhashtable code specifically uses bit zero of the
bucket pointer for exclusion and uses native bitops elsewhere (i.e.
__rht_ptr()) to identify NULL pointers. If bit 32 of the pointer is
set by the locking code instead of bit 0, an otherwise NULL pointer
looks like a non-NULL value and results in a segfault.
The bit spinlock code imported by the kernel is originally intended
to work with unsigned long. The kernel code is able to throttle the
cpu directly when under lock contention, while the userspace
implementation relies on the futex primitives to emulate reliable
blocking. Since the futex interface introduces the 32-bit
requirement, isolate the associated userspace hack to that
particular code.
Restore the native bitmap behavior of the bit spinlock code to
address the rhashtable problem described above. Since this is not
compatible with the futex interface, create a futex wrapper
specifically to convert the native bitmap type to a 32-bit virtual
address and mask value for the purposes of waiting/waking when under
lock contention.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Hi
Here I'm sending a small compile fix for bcachefs-tools.
Without this patch, I get this error:
cargo build --release --manifest-path rust-src/Cargo.toml
Compiling bch_bindgen v0.1.0
(/usr/src/git/bcachefs-tools/rust-src/bch_bindgen)
error: failed to run custom build command for `bch_bindgen v0.1.0
(/usr/src/git/bcachefs-tools/rust-src/bch_bindgen)`
Caused by:
process didn't exit successfully:
`/usr/src/git/bcachefs-tools/rust-src/target/release/build/bch_bindgen-733e88995ce9eab7/build-script-build`
(exit status: 101)
--- stderr
warning: optimization flag '-fkeep-inline-functions' is not supported
[-Wignored-optimization-argument]
../../include/linux/bit_spinlock.h:20:3: error: call to undeclared
function 'futex'; ISO C99 and later do not support implicit function
declarations [-Wimplicit-function-declaration]
../../include/linux/bit_spinlock.h:28:2: error: call to undeclared
function 'futex'; ISO C99 and later do not support implicit function
declarations [-Wimplicit-function-declaration]
../../include/linux/bit_spinlock.h:39:2: error: call to undeclared
function 'futex'; ISO C99 and later do not support implicit function
declarations [-Wimplicit-function-declaration]
The futex() function is declared in
/usr/include/x86_64-linux-gnu/urcu/futex.h
It is not declared in linux/futex.h, so we need to include urcu/futex.h
Mikulas
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
cpu_relax() is supposed to be a compiler barrier - this fixes a bug with
btree_write_buffer_flush() getting stuck.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- Spin up a background thread to call the shrinkers every 1 second
- Memory allocations will only call reclaim after a failed allocation,
not every single time
This will be a major performance boost on allocation intensive
workloads.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The musl C library does not define __attribute_const__. Add it to
include/linux/compiler.h with a guard to fix build against musl libc.
Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>
This would correspond to GFP_RECLAIM in the kernel - but we don't
distinguish between different types of reclaim here.
This solves a deadlock in the btree node memory allocation path - we
allocate with the btree node cache lock held but without GFP_KERNEL set.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
printk_ratelimited was behind an #ifdef CONFIG_PRINTK, which we don't
define, so it was a complete noop - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We're reading device metadat in mostly sequential order - buffered IO
will be faster than O_DIRECT.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
After memory allocation failure, don't rely on /proc/meminfo to figure
out how much memory we should free - instead unconditionally free 1/8th
of each cache.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bcachefs assumes kmalloc & krealloc give out allocations that are
naturally aligned, like the kernel allocators do.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
posix_memalign doesn't have the restriction that size must be a multiply
of alignment.
This also reverts the fix in commit f3fdbbfa92.
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
* Add missing linux/stddef.h includes
* Explicitly cast PAGE_SIZE to size_t. PAGE_SIZE is defined without UL
suffix in musl
* Musl doesn't define PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP, so
initialize the mutexes with pthread_once.
The ktime_get_coarse_real_ts64() implementation for userspace was using
CLOCK_MONOTONIC instead of CLOCK_REALTIME_COARSE. This resulted in
files being timestamped with a time close to the unix epoch.
Signed-off-by: Justin Husted <sigstop@gmail.com>