Commit Graph

1324513 Commits

Author SHA1 Message Date
Kent Overstreet
36e5b64d0d bcachefs: fix trace_copygc
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:49 -05:00
Kent Overstreet
3987c7c064 bcachefs: Internal reads can now correct errors
Rework the read path so that BCH_READ_NODECODE reads now also self-heal
after a read error and a successful retry - prerequisite for scrub.

- __bch2_read_endio() now handles a read that's both BCH_READ_NODECODE
  and a bounce.

  Normally, we don't want a BCH_READ_NODECODE read to ever allocate a
  split bch_read_bio: we want to maintain the relationship between the
  bch_read_bio and the data_update it's embedded in.

  But correcting read errors requires allocating a split/bounce rbio
  that's embedded in a promote_op. We do still have a 1-1 relationship,
  i.e. we only allocate a single split/bounce if it's a
  BCH_READ_NODECODE, so things hopefully don't get too crazy.

- __bch2_read_extent() now is allowed to allocate the promote_op for
  rewriting after a failed read, even if it's BCH_READ_NODECODE.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:49 -05:00
Kent Overstreet
93f9132789 bcachefs: Don't self-heal if a data update is already rewriting
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:49 -05:00
Kent Overstreet
98dbb4c946 bcachefs: Don't start promotes from bch2_rbio_free()
we don't want to block completion of the read - starting a promote calls
into the write path, which will block.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:49 -05:00
Kent Overstreet
f4f4fcfccb bcachefs: Bail out early on alloc_nowait data updates
If a data update doesn't want to block on allocations (promotes, self
healing on read error) - check if the allocation would fail before
kicking off the data update and calling into the write path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:49 -05:00
Kent Overstreet
03c79816fb bcachefs: Rework init order in bch2_data_update_init()
Initialize the write op first, so that in the next patch we can check if
the allocator would block (for BCH_WRITE_alloc_nowait ops) and bail out
before taking nocow locks/dev refs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:49 -05:00
Kent Overstreet
6af747d93f bcachefs: Self healing writes are BCH_WRITE_alloc_nowait
If a drive is failing and we're moving data off of it, we can't
necessairly depend on capacity/disk reservation calculations to avoid
deadlocking/blocking on the allocator.

And, we don't want to queue up infinite self healing moves anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:49 -05:00
Kent Overstreet
28db3b1ff5 bcachefs: Promotes should use BCH_WRITE_only_specified_devs
Promotes, like most other internal moves, should only go to the
specified target and not fall back to allocating from the full
filesystem.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:49 -05:00
Kent Overstreet
bb4e6bb2cc bcachefs: Be stricter in bch2_read_retry_nodecode()
Now that data_update embeds bch_read_bio, BCH_READ_NODECODE means that
the read is embedded in a a data_update - and we can check in the retry
path if the extent has changed and bail out.

This likely fixes some subtle bugs with read errors and data moves.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:49 -05:00
Kent Overstreet
69f2a1f4cd bcachefs: cleanup redundant code around data_update_op initialization
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:49 -05:00
Kent Overstreet
9c5f636502 bcachefs: bch2_update_unwritten_extent() no longer depens on wbio
Prep work for improving bch2_data_update_init().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:49 -05:00
Kent Overstreet
14df8b7256 bcachefs: promote_op uses embedded bch_read_bio
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:49 -05:00
Kent Overstreet
652f84c589 bcachefs: data_update now embeds bch_read_bio
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:49 -05:00
Kent Overstreet
39978c54a2 bcachefs: rbio_init() cleanup
Move more initialization to rbio_init(), to assist in further cleanups.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:49 -05:00
Kent Overstreet
4ea76e0604 bcachefs: rbio_init_fragment()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:49 -05:00
Kent Overstreet
1c13830967 bcachefs: Rename BCH_WRITE flags fer consistency with other x-macros enums
The uppercase/lowercase style is nice for making the namespace explicit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:48 -05:00
Kent Overstreet
201db6cdd7 bcachefs: x-macroize BCH_READ flags
Will be adding a bch2_read_bio_to_text().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:48 -05:00
Kent Overstreet
a9f0387e12 bcachefs: Avoid holding btree locks when blocking on IO
Read retries are done synchronously, so we definitely shouldn't be
holding any locks (even the srcu lock for btree key cache reclaim) when
submitting the IO.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:48 -05:00
Kent Overstreet
e710f00775 bcachefs: kill bch_read_bio.devs_have
Dead code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:48 -05:00
Kent Overstreet
3b0256021b bcachefs: bch2_moving_ctxt_to_text() -> bch2_moving_ctxt_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:48 -05:00
Kent Overstreet
f917016f69 bcachefs: Reduce stack frame size of __bch2_str_hash_check_key()
We don't need all the helpers inlined here.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:57:48 -05:00
Kent Overstreet
a858175227 bcachefs: Fix btree_trans_peek_key_cache()
BTREE_ITER_cached_nofill has some tricky corner cases; it's used
internally for iterators that aren't walking the key cache, but need to
be coherent with the key cache.

It tells traverse to look up and lock the key cache entry if present,
but don't create one if it doesn't exist.

That means we have to have a BTREE_ITER_UPTODATE path (because after
traverse the path has to be UPTODATE, or we pop assertions) that doesn't
point to anything (which is the less bad option, taken by the previous
fix).

The previous fix for this path missed an issue that can happen in
bch2_trans_peek_key_cache(): we can't set should_be_locked on a path
that doesn't point to anything and doesn't hold locks.

Fixes: bd5b09727f ("bcachefs: Don't set btree_path to updtodate if we don't fill")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:26:25 -05:00
Kent Overstreet
ff0b7ed607 bcachefs: Fix check_inode_hash_info_matches_root()
Can't use memcmp() when the struct contains padding.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-15 15:28:23 -05:00
Kent Overstreet
a4e11cea27 bcachefs: Document issue with bch_stripe layout
We've got a problem with bch_stripe that is going to take an on disk
format rev to fix - we can't access the block sector counts if the
checksum type is unknown.

Document it for now, there are a few other things to fix as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 10:45:31 -05:00
Kent Overstreet
78423deb51 bcachefs: Fix self healing on read error
We were incorrectly checking if there'd been an io error.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 10:45:31 -05:00
Alan Huang
5dd21b2712 bcachefs: Pop all the transactions from the abort one
The transaction is going to abort, so there will be no cycle involving
this transaction anymore.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 10:45:25 -05:00
Alan Huang
b169138d48 bcachefs: Only abort the transactions in the cycle
When the cycle doesn't involve the initiator of the cycle detection,
we might choose a transaction that is not involved in the cycle to abort.
It shouldn't be that since it won't break the cycle, this patch
therefore chooses the transaction in the cycle to abort.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 10:45:18 -05:00
Alan Huang
6853a5e5d4 bcachefs: Introduce lock_graph_pop_from
This patch introduces a helper function called lock_graph_pop_from,
it pops the graph from i.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 10:45:13 -05:00
Alan Huang
b5c3dcd0db bcachefs: Convert open-coded lock_graph_pop_all to helper
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 10:45:08 -05:00
Alan Huang
0ef9ab34f4 bcachefs: Do not allow no fail lock request to fail
If the transaction chose itself as a victim before and restarted, it
might request a no fail lock request this time. But it might be added to
others' lock graph and be chose as the victim again, it's no longer safe
without additional check. We can also convert the cycle detector to be
fully RCU-based to solve that unsoundness, but the latency added to trans_put
and additional memory required may not worth it.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 10:45:08 -05:00
Alan Huang
cdc419dbf2 bcachefs: Merge the condition to avoid additional invocation
If the lock has been acquired and unlocked, we don't have to do clear
and wakeup again, though harmless since we hold the intent lock. Merge
the condition might be clearer.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 10:45:08 -05:00
Alan Huang
9c13cc9c7d Revert "bcachefs: Fix bch2_btree_node_upgrade()"
This reverts commit 62448afee7.

six_lock_tryupgrade fails only if there is an intent lock held,
it won't fail no matter how many read locks are held.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 10:45:08 -05:00
Hongbo Li
c72deb03ff bcachefs: bcachefs_metadata_version_directory_size
This adds another metadata version for accounting directory size.
For the new version of the filesystem, when new subdirectory items
are created or deleted, the parent directory's size will change
accordingly. For the old version of the existed file system, running
fsck will automatically upgrade the metadata version, and it will
do the check and recalculationg of the directory size.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-13 14:58:38 -05:00
Hongbo Li
e614a6c52d bcachefs: make directory i_size meaningful
The isize of directory is 0 in bcachefs if the directory is empty.
With more child dirents created, its size ought to change. Many
other filesystems changed as that (ie. xfs and btrfs). And many of
them changed as the size of child dirent name. Although the directory
size may not seem to convey much, we can still give it some meaning.

The formula of dentry size as follow:
    occupied_size = 40 + ALIGN(9 + namelen, 8)

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-13 14:58:38 -05:00
Kent Overstreet
4204e3bf63 bcachefs: check_unreachable_inodes is not actually PASS_ONLINE yet
check_unreachable_inodes does work in online mode, with the one caveat
that it assumes check_dirents has also run - and check_dirents is not
PASS_ONLINE yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
ae153f2e11 bcachefs: Don't use BTREE_ITER_cached when walking alloc btree during fsck
No need to pull the whole alloc btree into the btree key cache.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
15734b5e6f bcachefs: Check for dirents to overwritten inodes
This fixes various "dirent to missing inode" errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
d3d0fac57d bcachefs: bch2_btree_iter_peek_slot() handles navigating to nonexistent depth
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
bd5b09727f bcachefs: Don't set btree_path to updtodate if we don't fill
This fixes various locking asserts, and a null ptr deref in
bch2_btree_iter_peek_path().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
cf67f46641 bcachefs: __bch2_btree_pos_to_text()
Factor out a version of bch2_btree_pos_to_text() that doesn't take a
pointer to a in-memory btree node, to be used for btree node scrub.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
0a46ea9d46 bcachefs: printbuf_reset() handles tabstops
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
5906dcb993 bcachefs: Silence read-only errors when deleting snapshots
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
8b1f46bff3 bcachefs: Dropped superblock write is no longer a fatal error
Just emit a warning if errors=continue or fix_safe.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
8cfdc6ce1f bcachefs: bch2_trans_node_drop()
Factor out a small common helper.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
0971a72c3d bcachefs: bch2_trans_unlock_write()
New helper for dropping all write locks; which is distinct from the
helper the transaction commit path uses, which is faster and only
touches updates.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
e1911d7a69 bcachefs: btree_node_unlock() can now drop write locks
Prep work for reworking btree node locking during interior btree
updates.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
9a5232ef0a bcachefs: six locks: write locks can now be held recursively
This is needed for the interior update locking rework, where we'll be
holding node write locks for the duration of the update - which is
needed for synchronizing with online check_allocations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
8f3aaa5d5d bcachefs: bch2_fs_btree_gc_init()
Now returns errors, prep work for check_allocations_done_lock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
cb3f34982c bcachefs: Assert that btree write buffer only touches the right btrees
More asserts, more better.

Also, clean up the per-btree flags a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
bdedae70f5 bcachefs: bch2_inum_path() now crosses subvolumes correctly
The dirent that points to a subvolume root is in the parent subvolume.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00