add Documentation directory

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-12-08 00:00:12 +03:00 · 2025-12-04 12:25:53 -05:00 · 2025-12-04 12:25:53 -05:00 · 9694ddaba1
commit 9694ddaba1
parent 7d5817d9c2
6 changed files with 546 additions and 0 deletions
--- a/Documentation/CodingStyle.rst
+++ b/Documentation/CodingStyle.rst
@@ -0,0 +1,186 @@
 .. SPDX-License-Identifier: GPL-2.0
 bcachefs coding style
 =====================
 Good development is like gardening, and codebases are our gardens. Tend to them
 every day; look for little things that are out of place or in need of tidying.
 A little weeding here and there goes a long way; don't wait until things have
 spiraled out of control.
 Things don't always have to be perfect - nitpicking often does more harm than
 good. But appreciate beauty when you see it - and let people know.
 The code that you are afraid to touch is the code most in need of refactoring.
 A little organizing here and there goes a long way.
 Put real thought into how you organize things.
 Good code is readable code, where the structure is simple and leaves nowhere
 for bugs to hide.
 Assertions are one of our most important tools for writing reliable code. If in
 the course of writing a patchset you encounter a condition that shouldn't
 happen (and will have unpredictable or undefined behaviour if it does), or
 you're not sure if it can happen and not sure how to handle it yet - make it a
 BUG_ON(). Don't leave undefined or unspecified behavior lurking in the codebase.
 By the time you finish the patchset, you should understand better which
 assertions need to be handled and turned into checks with error paths, and
 which should be logically impossible. Leave the BUG_ON()s in for the ones which
 are logically impossible. (Or, make them debug mode assertions if they're
 expensive - but don't turn everything into a debug mode assertion, so that
 we're not stuck debugging undefined behaviour should it turn out that you were
 wrong).
 Assertions are documentation that can't go out of date. Good assertions are
 wonderful.
 Good assertions dramatically reduce the amount of testing required to shake
 out bugs.
 Good assertions are based on state, not logic. To write good assertions, you
 have to think about what the invariants on your state are.
 Good invariants and assertions will hold everywhere in your codebase. This
 means that you can run them in only a few places in the checked in version, but
 should you need to debug something that caused the assertion to fail, you can
 quickly shotgun them everywhere to find the codepath that broke the invariant.
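 As a rough illustration of the points above, here is a minimal userspace
 sketch. Everything in it is hypothetical: `struct ring` and its helpers are
 not bcachefs code, and `BUG_ON()` is emulated with `assert()`. The invariant
 is expressed on state (the ring never holds more than `RING_SIZE` items), a
 condition that can legitimately happen (full or empty ring) is a check with an
 error path, and the invariant check is cheap enough to call from any codepath
 while debugging.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's BUG_ON(); illustrative only. */
#define BUG_ON(cond) assert(!(cond))

#define RING_SIZE 4

/* Hypothetical fixed-size ring buffer, not a bcachefs structure. */
struct ring {
	size_t	head;	/* total number of items ever pushed */
	size_t	tail;	/* total number of items ever popped */
	int	data[RING_SIZE];
};

/* State invariant: the ring never holds more than RING_SIZE items. */
static void ring_check_invariants(const struct ring *r)
{
	BUG_ON(r->head - r->tail > RING_SIZE);
}

/*
 * A full ring can legitimately happen, so it is a check with an error
 * path rather than an assertion.
 */
static int ring_push(struct ring *r, int v)
{
	ring_check_invariants(r);

	if (r->head - r->tail == RING_SIZE)
		return -ENOSPC;

	r->data[r->head++ % RING_SIZE] = v;

	ring_check_invariants(r);
	return 0;
}

/* Likewise, an empty ring is an expected condition, not a bug. */
static int ring_pop(struct ring *r, int *v)
{
	ring_check_invariants(r);

	if (r->head == r->tail)
		return -ENOENT;

	*v = r->data[r->tail++ % RING_SIZE];
	return 0;
}
```

 Because `ring_check_invariants()` only looks at state, it can be dropped into
 any function that touches the ring when hunting for the codepath that broke
 it.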
 A good assertion checks something that the compiler could check for us, and
 elide - if we were working in a language with embedded correctness proofs that
 the compiler could check. This is something that exists today, but it'll likely
 still be a few decades before it comes to systems programming languages. But we
 can still incorporate that kind of thinking into our code and document the
 invariants with runtime checks - much like the way people working in
 dynamically typed languages may add type annotations, gradually making their
 code statically typed.
 Looking for ways to make your assertions simpler - and higher level - will
 often nudge you towards making the entire system simpler and more robust.
 Good code is code where you can poke around and see what it's doing -
 introspection. We can't debug anything if we can't see what's going on.
 Whenever we're debugging, and the solution isn't immediately obvious, if the
 issue is that we don't know where the issue is because we can't see what's
 going on - fix that first.
 We have the tools to make anything visible at runtime, efficiently - RCU and
 percpu data structures among them. Don't let things stay hidden.
 The most important tool for introspection is the humble pretty printer - in
 bcachefs, this means `*_to_text()` functions, which output to printbufs.
 Pretty printers are wonderful, because they compose and you can use them
 everywhere. Having functions to print whatever object you're working with will
 make your error messages much easier to write (therefore they will actually
 exist) and much more informative. And they can be used from sysfs/debugfs, as
 well as tracepoints.
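 The composability is easiest to see in a small sketch. The code below is a
 userspace approximation, not the bcachefs printbuf API: `struct textbuf`,
 `textbuf_printf()` and the `pos`/`extent` types are all made up for
 illustration. The point is only that a printer for a larger object is built
 from the printers of its parts.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size output buffer standing in for a printbuf. */
struct textbuf {
	char	buf[128];
	size_t	pos;
};

static void textbuf_printf(struct textbuf *out, const char *fmt, ...)
{
	size_t remaining;
	va_list args;
	int n;

	assert(out->pos < sizeof(out->buf));
	remaining = sizeof(out->buf) - out->pos;

	va_start(args, fmt);
	n = vsnprintf(out->buf + out->pos, remaining, fmt, args);
	va_end(args);

	if (n < 0)
		return;
	/* On truncation, keep what fit and stop at the terminating nul. */
	if ((size_t) n >= remaining)
		out->pos = sizeof(out->buf) - 1;
	else
		out->pos += (size_t) n;
}

/* Hypothetical objects to print. */
struct pos {
	unsigned long long inode, offset;
};

struct extent {
	struct pos	start;
	unsigned	sectors;
};

static void pos_to_text(struct textbuf *out, const struct pos *p)
{
	textbuf_printf(out, "%llu:%llu", p->inode, p->offset);
}

/* Composes with pos_to_text() instead of re-formatting the position. */
static void extent_to_text(struct textbuf *out, const struct extent *e)
{
	textbuf_printf(out, "extent at ");
	pos_to_text(out, &e->start);
	textbuf_printf(out, " sectors %u", e->sectors);
}
```

 An error path that already has an extent in hand can then call
 `extent_to_text()` into the same buffer it is building its message in, which
 is what makes informative error messages cheap to write.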
 Runtime info and debugging tools should come with clear descriptions and
 labels, and good structure - we don't want files with a list of bare integers,
 like in procfs. Part of the job of the debugging tools is to educate users and
 new developers as to how the system works.
 Error messages should, whenever possible, tell you everything you need to debug
 the issue. It's worth putting effort into them.
 Tracepoints shouldn't be the first thing you reach for. They're an important
 tool, but always look for more immediate ways to make things visible. When we
 have to rely on tracing, we have to know which tracepoints we're looking for,
 and then we have to run the troublesome workload, and then we have to sift
 through logs. This is a lot of steps to go through when a user is hitting
 something, and if it's intermittent it may not even be possible.
 The humble counter is an incredibly useful tool. They're cheap and simple to
 use, and many complicated internal operations with lots of things that can
 behave weirdly (anything involving memory reclaim, for example) become
 shockingly easy to debug once you have counters on every distinct codepath.
 Persistent counters are even better.
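 A minimal sketch of "a counter on every distinct codepath", in userspace C.
 The event names, `struct object` and `try_reclaim()` are invented for the
 example, and plain integers are used where kernel code would more likely use
 percpu counters.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical codepaths through a reclaim-style operation. */
enum reclaim_event {
	RECLAIM_FREED,
	RECLAIM_SKIPPED_DIRTY,
	RECLAIM_SKIPPED_PINNED,
	RECLAIM_EVENT_NR,
};

static const char * const reclaim_event_names[RECLAIM_EVENT_NR] = {
	[RECLAIM_FREED]			= "freed",
	[RECLAIM_SKIPPED_DIRTY]		= "skipped_dirty",
	[RECLAIM_SKIPPED_PINNED]	= "skipped_pinned",
};

struct reclaim_counters {
	unsigned long v[RECLAIM_EVENT_NR];
};

struct object {
	int dirty;
	int pinned;
};

/* Every exit from this function bumps exactly one counter. */
static int try_reclaim(struct reclaim_counters *c, const struct object *o)
{
	if (o->pinned) {
		c->v[RECLAIM_SKIPPED_PINNED]++;
		return 0;
	}

	if (o->dirty) {
		c->v[RECLAIM_SKIPPED_DIRTY]++;
		return 0;
	}

	c->v[RECLAIM_FREED]++;
	return 1;
}

/* Labelled output, one counter per line, rather than bare integers. */
static void reclaim_counters_print(FILE *f, const struct reclaim_counters *c)
{
	for (int i = 0; i < RECLAIM_EVENT_NR; i++) {
		assert(reclaim_event_names[i]);
		fprintf(f, "%-16s %lu\n", reclaim_event_names[i], c->v[i]);
	}
}
```

 When reclaim "isn't making progress", the printed counters show at a glance
 whether objects are being skipped as dirty or as pinned, without tracing.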
 When debugging, try to get the most out of every bug you come across; don't
 rush to fix the initial issue. Look for things that will make related bugs
 easier the next time around - introspection, new assertions, better error
 messages, new debug tools, and do those first. Look for ways to make the system
 better behaved; often one bug will uncover several other bugs through
 downstream effects.
 Fix all that first, and then the original bug last - even if that means keeping
 a user waiting. They'll thank you in the long run, and when they understand
 what you're doing you'll be amazed at how patient they're happy to be. Users
 like to help - otherwise they wouldn't be reporting the bug in the first place.
 Talk to your users. Don't isolate yourself.
 Users notice all sorts of interesting things, and by just talking to them and
 interacting with them you can benefit from their experience.
 Spend time doing support and helpdesk stuff. Don't just write code - code isn't
 finished until it's being used trouble free.
 This will also motivate you to make your debugging tools as good as possible,
 and perhaps even your documentation, too. Like anything else in life, the more
 time you spend at it the better you'll get, and you the developer are the
 person most able to improve the tools to make debugging quick and easy.
 Be wary of how you take on and commit to big projects. Don't let development
 become product-manager focused. Oftentimes an idea is a good one but needs to
 wait for its proper time - but you won't know if it's the proper time for an
 idea until you start writing code.
 Expect to throw a lot of things away, or leave them half finished for later.
 Nobody writes all perfect code that all gets shipped, and you'll be much more
 productive in the long run if you notice this early and shift to something
 else. The experience gained and lessons learned will be valuable for all the
 other work you do.
 But don't be afraid to tackle projects that require significant rework of
 existing code. Sometimes these can be the best projects, because they can lead
 us to make existing code more general, more flexible, more multipurpose and
 perhaps more robust. Just don't hesitate to abandon the idea if it looks like
 it's going to make a mess of things.
 Complicated features can often be done as a series of refactorings, with the
 final change that actually implements the feature as a quite small patch at the
 end. It's wonderful when this happens, especially when those refactorings are
 things that improve the codebase in their own right. When that happens there's
 much less risk of wasted effort if the feature you were going for doesn't work
 out.
 Always strive to work incrementally. Always strive to turn the big projects
 into little bite sized projects that can prove their own merits.
 Instead of always tackling those big projects, look for little things that
 will be useful, and make the big projects easier.
 The question of what's likely to be useful is where junior developers most
 often go astray - doing something because it seems like it'll be useful often
 leads to overengineering. Knowing what's useful comes from many years of
 experience, or talking with people who have that experience - or from simply
 reading lots of code and looking for common patterns and issues. Don't be
 afraid to throw things away and do something simpler.
 Talk about your ideas with your fellow developers; oftentimes the best things
 come from relaxed conversations where people aren't afraid to say "what if?".
 Don't neglect your tools.
 The most important tools (besides the compiler and our text editor) are the
 tools we use for testing. The shortest possible edit/test/debug cycle is
 essential for working productively. We learn, gain experience, and discover the
 errors in our thinking by running our code and seeing what happens. If your
 time is being wasted because your tools are bad or too slow - don't accept it,
 fix it.
 Put effort into your documentation, commit messages, and code comments - but
 don't go overboard. A good commit message is wonderful - but if the information
 was important enough to go in a commit message, ask yourself if it would be
 even better as a code comment.
 A good code comment is wonderful, but even better is the comment that didn't
 need to exist because the code was so straightforward as to be obvious;
 organized into small clean and tidy modules, with clear and descriptive names
 for functions and variables, where every line of code has a clear purpose.
--- a/Documentation/SubmittingPatches.rst
+++ b/Documentation/SubmittingPatches.rst
@@ -0,0 +1,105 @@
 Submitting patches to bcachefs
 ==============================
 Here are suggestions for submitting patches to the bcachefs subsystem.
 Submission checklist
 --------------------
 Patches must be tested before being submitted, either with the xfstests suite
 [0]_, or the full bcachefs test suite in ktest [1]_, depending on what's being
 touched. Note that ktest wraps xfstests and will be an easier way of running
 it for most users; it includes single-command wrappers for all the mainstream
 in-kernel local filesystems.
 Patches will undergo more testing after being merged (including
 lockdep/kasan/preempt/etc. variants); the submitter is not generally required
 to run these - but do put some thought into what you're changing and which
 tests might be relevant. Are you dealing with tricky memory layout work? Then
 run kasan. Are you doing locking work? Then run lockdep. ktest includes
 single-command variants for the debug build types you'll most likely need.
 The exception to this rule is incomplete WIP/RFC patches: if you're working on
 something nontrivial, it's encouraged to send out a WIP patch to let people
 know what you're doing and make sure you're on the right track. Just make sure
 it includes a brief note as to what's done and what's incomplete, to avoid
 confusion.
 Rigorous checkpatch.pl adherence is not required (many of its warnings are
 considered out of date), but try not to deviate too much without reason.
 Focus on writing code that reads well and is organized well; code should be
 aesthetically pleasing.
 CI
 --
 Instead of running your tests locally, when running the full test suite it's
 preferable to let a server farm do it in parallel, and then have the results
 in a nice test dashboard (which can tell you which failures are new, and
 presents results in a git log view, avoiding the need for most bisecting).
 That exists [2]_, and community members may request an account. If you work for
 a big tech company, you'll need to help out with server costs to get access -
 but the CI is not restricted to running bcachefs tests: it runs any ktest test
 (which generally makes it easy to wrap other tests that can run in qemu).
 Other things to think about
 ---------------------------
 - How will we debug this code? Is there sufficient introspection to diagnose
  when something starts acting wonky on a user machine?
  We don't necessarily need every single field of every data structure visible
  with introspection, but having the important fields of all the core data
  types wired up makes debugging drastically easier - a bit of thoughtful
  foresight greatly reduces the need to have people build custom kernels with
  debug patches.
  More broadly, think about all the debug tooling that might be needed.
 - Does it make the codebase more or less of a mess? Can we also try to do some
  organizing, too?
 - Do new tests need to be written? New assertions? How do we know and verify
  that the code is correct, and what happens if something goes wrong?
  We don't yet have automated code coverage analysis or easy fault injection -
  but for now, pretend we did and ask what they might tell us.
  Assertions are hugely important, given that we don't yet have a systems
  language that can do ergonomic embedded correctness proofs. Hitting an assert
  in testing is much better than wandering off into undefined behaviour la-la
  land - use them. Use them judiciously, and not as a replacement for proper
  error handling, but use them.
 - Does it need to be performance tested? Should we add new performance counters?
  bcachefs has a set of persistent runtime counters which can be viewed with
  the 'bcachefs fs top' command; this should give users a basic idea of what
  their filesystem is currently doing. If you're doing a new feature or looking
  at old code, think if anything should be added.
 - If it's a new on disk format feature - have upgrades and downgrades been
  tested? (Automated tests exist but aren't in the CI, due to the hassle of
  disk image management; coordinate to have them run.)
 Mailing list, IRC
 -----------------
 Patches should hit the list [3]_, but much discussion and code review happens
 on IRC as well [4]_; many people appreciate the more conversational approach
 and quicker feedback.
 Additionally, we have a lively user community doing excellent QA work, which
 exists primarily on IRC. Please make use of that resource; user feedback is
 important for any nontrivial feature, and documenting it in commit messages
 would be a good idea.
 .. rubric:: References
 .. [0] git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
 .. [1] https://evilpiepirate.org/git/ktest.git/
 .. [2] https://evilpiepirate.org/~testdashboard/ci/
 .. [3] linux-bcachefs@vger.kernel.org
 .. [4] irc.oftc.net#bcache, #bcachefs-dev
--- a/Documentation/casefolding.rst
+++ b/Documentation/casefolding.rst
@@ -0,0 +1,108 @@
 .. SPDX-License-Identifier: GPL-2.0
 Casefolding
 ===========
 bcachefs has support for case-insensitive file and directory
 lookups using the regular `chattr +F` (`S_CASEFOLD`, `FS_CASEFOLD_FL`)
 casefolding attributes.
 The main usecase for casefolding is compatibility with software written
 against other filesystems that rely on casefolded lookups
 (e.g. NTFS and Wine/Proton).
 Taking advantage of file-system level casefolding can lead to great
 loading time gains in many applications and games.
 Casefolding support requires a kernel with `CONFIG_UNICODE` enabled.
 Once a directory has been flagged for casefolding, a feature bit
 is enabled on the superblock which marks the filesystem as using
 casefolding.
 When the feature bit for casefolding is enabled, it is no longer possible
 to mount that filesystem on kernels without `CONFIG_UNICODE` enabled.
 On the lookup/query side: casefolding is implemented by allocating a new
 string of `BCH_NAME_MAX` length and using the `utf8_casefold` function to
 casefold the query string into it.
 On the dirent side: casefolding is implemented by ensuring the `bkey`'s
 hash is made from the casefolded string and storing the cached casefolded
 name with the regular name in the dirent.
 The structure looks like this:
 * Regular:    [dirent data][regular name][nul][nul]...
 * Casefolded: [dirent data][reg len][cf len][regular name][casefolded name][nul][nul]...
 (Do note, the number of NULs here is merely for illustration; their count can
 vary per-key, and they may not even be present if the key is aligned to
 `sizeof(u64)`.)
 This is efficient as it means that for all file lookups that require casefolding,
 it has identical performance to a regular lookup:
 a hash comparison and a `memcmp` of the name.
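 The casefolded layout can be sketched in userspace C as follows. This is an
 illustration of the layout described above, not the bcachefs implementation:
 the helper names are hypothetical, the dirent data preceding the name block is
 omitted, and the two lengths are assumed to be 16-bit little-endian values
 occupying the first 4 bytes of the block.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round up to a multiple of sizeof(uint64_t); the padding bytes are NULs. */
static size_t round_up_u64(size_t n)
{
	return (n + sizeof(uint64_t) - 1) & ~(sizeof(uint64_t) - 1);
}

static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = v & 0xff;
	p[1] = v >> 8;
}

static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t) (p[0] | (p[1] << 8));
}

/*
 * Pack [reg len][cf len][regular name][casefolded name][nul padding] into
 * dst. Returns the padded size, or 0 if the names or dst are the wrong size.
 */
static size_t cf_name_block_pack(uint8_t *dst, size_t dst_size,
				 const char *name, const char *cf_name)
{
	size_t name_len	= strlen(name);
	size_t cf_len	= strlen(cf_name);
	size_t used	= 4 + name_len + cf_len;
	size_t padded	= round_up_u64(used);

	if (name_len > UINT16_MAX || cf_len > UINT16_MAX || padded > dst_size)
		return 0;

	put_le16(dst,     (uint16_t) name_len);
	put_le16(dst + 2, (uint16_t) cf_len);
	memcpy(dst + 4, name, name_len);
	memcpy(dst + 4 + name_len, cf_name, cf_len);
	memset(dst + used, 0, padded - used);
	return padded;
}

/*
 * Lookup-side comparison: the caller has already casefolded the query, so
 * this is just a length check and a memcmp against the cached name.
 */
static int cf_name_block_matches(const uint8_t *block,
				 const char *cf_query, size_t query_len)
{
	uint16_t name_len = get_le16(block);
	uint16_t cf_len	  = get_le16(block + 2);

	return cf_len == query_len &&
	       !memcmp(block + 4 + name_len, cf_query, cf_len);
}
```

 Packing "BLAH" with its casefolded form "blah" uses 12 bytes and is padded to
 16; a block whose contents already end on an 8-byte boundary gets no trailing
 NULs at all, matching the note above.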
 Rationale
 ---------
 Several designs were considered for this system:
 One was to introduce a dirent_v2, however that would be painful especially as
 the hash system only has support for a single key type. This would also need
 `BCH_NAME_MAX` to change between versions, and a new feature bit.
 Another option was to store the names without the two lengths, and take half
 of the combined length of the regular and casefolded names as the length of
 each. This would assume that the regular length == casefolded length, but that
 is not guaranteed: the uppercase unicode glyph may have a UTF-8 encoding of a
 different length than the lowercase unicode glyph.
 It would be possible to disregard the casefold cache for those cases, but it was
 decided to simply encode the two string lengths in the key to avoid random
 performance issues if this edgecase was ever hit.
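 A concrete instance of that edge case, as a small C sketch: U+023A (LATIN
 CAPITAL LETTER A WITH STROKE) encodes to 2 bytes in UTF-8, while its
 casefolded form U+2C65 encodes to 3. The `halved_length()` helper is
 hypothetical and models the rejected scheme only.

```c
#include <assert.h>
#include <string.h>

/* Two copies of U+023A, UTF-8 encoded: 2 bytes each. */
static const char name_upper[] = "\xC8\xBA\xC8\xBA";

/* Two copies of U+2C65, the casefolded form: 3 bytes each. */
static const char name_folded[] = "\xE2\xB1\xA5\xE2\xB1\xA5";

/* The rejected scheme: infer each name's length as half the combined length. */
static size_t halved_length(const char *name, const char *cf_name)
{
	return (strlen(name) + strlen(cf_name)) / 2;
}
```

 Here the regular name is 4 bytes and the casefolded name is 6, so halving the
 combined 10 bytes yields 5, which is wrong for both names. Storing the two
 lengths explicitly avoids having to special-case such names.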
 The option settled on was to use a free bit in d_type to mark a dirent as having
 a casefold cache, and then treat the first 4 bytes of the name block as lengths.
 You can see this in the `d_cf_name_block` member of the union in `bch_dirent`.
 The feature bit was used to allow casefolding support to be enabled for the majority
 of users, while still allowing users who have no need for the feature to use bcachefs:
 `CONFIG_UNICODE` can increase the kernel size by a significant amount due to the tables
 used, which may be the deciding factor in using bcachefs on e.g. embedded platforms.
 Other filesystems like ext4 and f2fs have a super-block level option for casefolding
 encoding, but bcachefs currently does not provide this. ext4 and f2fs do not expose
 any encodings other than a single UTF-8 version. If other encodings become desirable,
 they can be added easily using the opts mechanism.
 dentry/dcache considerations
 ----------------------------
 Currently, in casefolded directories, bcachefs (like other filesystems) will not cache
 negative dentries.
 This is because currently doing so presents a problem in the following scenario:
 - Lookup file "blAH" in a casefolded directory
 - Creation of file "BLAH" in a casefolded directory
 - Lookup file "blAH" in a casefolded directory
 The second lookup would wrongly fail if negative dentries were cached.
 This is slightly suboptimal, but could be fixed in future with some vfs work.
 References
 ----------
 (from Peter Anvin, on the list)
 It is worth noting that Microsoft has basically declared their
 "recommended" case folding (upcase) table to be permanently frozen (for
 new filesystem instances in the case where they use an on-disk
 translation table created at format time.)  As far as I know they have
 never supported anything other than 1:1 conversion of BMP code points,
 nor normalization.
 The exFAT specification enumerates the full recommended upcase table,
 although in a somewhat annoying format (basically a hex dump of
 compressed data):
 https://learn.microsoft.com/en-us/windows/win32/fileio/exfat-specification
--- a/Documentation/errorcodes.rst
+++ b/Documentation/errorcodes.rst
@@ -0,0 +1,30 @@
 .. SPDX-License-Identifier: GPL-2.0
 bcachefs private error codes
 ----------------------------
 In bcachefs, as a hard rule we do not throw or directly use standard error
 codes (-EINVAL, -EBUSY, etc.). Instead, we define private error codes as needed
 in fs/bcachefs/errcode.h.
 This gives us much better error messages and makes debugging much easier. Any
 direct uses of standard error codes you see in the source code are simply old
 code that has yet to be converted - feel free to clean it up!
 Private error codes may subtype another error code; this allows for grouping of
 related errors that should be handled similarly (e.g. transaction restart
 errors), as well as specifying which standard error code should be returned at
 the bcachefs module boundary.
 At the module boundary, we use bch2_err_class() to convert to a standard error
 code; this also emits a trace event so that the original error code can be
 recovered even if it wasn't logged.
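 The subtyping idea can be sketched with an x-macro table. The following is a
 self-contained illustration for a made-up filesystem, not the contents of
 fs/bcachefs/errcode.h: every `MYFS_*` and `myfs_*` name, the specific codes,
 the choice of parents and the start value of 2048 are assumptions, and the
 trace event is omitted.

```c
#include <assert.h>
#include <errno.h>

/* Assumed to be above every standard errno value. */
#define MYFS_ERR_START	2048

/* x(parent, name): each private code names the code it subtypes. */
#define MYFS_ERRCODES()						\
	x(ENOSPC,		ENOSPC_disk_reservation)	\
	x(ENOSPC,		ENOSPC_bucket_alloc)		\
	x(EAGAIN,		restart)			\
	x(MYFS_ERR_restart,	restart_relock)			\
	x(MYFS_ERR_restart,	restart_split_race)

enum myfs_errcode {
	MYFS_ERR_START_ = MYFS_ERR_START,
#define x(parent, name)	MYFS_ERR_##name,
	MYFS_ERRCODES()
#undef x
	MYFS_ERR_MAX
};

/* Parent of each private code, indexed by (code - MYFS_ERR_START). */
static const int myfs_err_parent[] = {
#define x(parent, name)	[MYFS_ERR_##name - MYFS_ERR_START] = parent,
	MYFS_ERRCODES()
#undef x
};

static int myfs_err_is_private(int e)
{
	return e > MYFS_ERR_START && e < MYFS_ERR_MAX;
}

/* Does err equal cls, or subtype it directly or transitively? */
static int myfs_err_matches(int err, int cls)
{
	int e = err < 0 ? -err : err;

	while (e != cls && myfs_err_is_private(e))
		e = myfs_err_parent[e - MYFS_ERR_START];
	return e == cls;
}

/* Walk up to the standard errno to return at the module boundary. */
static int myfs_err_class(int err)
{
	int e = err < 0 ? -err : err;

	while (myfs_err_is_private(e))
		e = myfs_err_parent[e - MYFS_ERR_START];

	assert(e < MYFS_ERR_START);
	return -e;
}
```

 Internal callers can test `myfs_err_matches(ret, MYFS_ERR_restart)` to handle
 a whole group of errors the same way, while the two distinct `ENOSPC_*` codes
 still identify which allocation failed before both collapse to `-ENOSPC` at
 the boundary.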
 Do not reuse error codes! Generally speaking, a private error code should only
 be thrown in one place. That means that when we see it in a log message we can
 see, unambiguously, exactly which file and line number it was returned from.
 Try to give error codes names that are as reasonably descriptive of the error
 as possible. Frequently, the error will be logged at a place far removed from
 where the error was generated; good names for error codes mean much more
 descriptive and useful error messages.
--- a/Documentation/future/idle_work.rst
+++ b/Documentation/future/idle_work.rst
@@ -0,0 +1,79 @@
 Idle/background work classes design doc
 =======================================
 Right now, our behaviour at idle isn't ideal: it was designed for servers that
 would be under sustained load, to keep pending work at a "medium" level, to
 let work build up so we can process it in more efficient batches, while also
 giving headroom for bursts in load.
 But for desktops or mobile - scenarios where work is less sustained and power
 usage is more important - we want to operate differently, with a "rush to
 idle" so the system can go to sleep. We don't want to be dribbling out
 background work while the system should be idle.
 The complicating factor is that there are a number of background tasks, which
 form a hierarchy (or a digraph, depending on how you divide it up) - one
 background task may generate work for another.
 Thus proper idle detection needs to model this hierarchy.
 - Foreground writes
 - Page cache writeback
 - Copygc, rebalance
 - Journal reclaim
 When we implement idle detection and rush to idle, we need to be careful not
 to disturb too much the existing behaviour that works reasonably well when the
 system is under sustained load (or perhaps improve it in the case of
 rebalance, which currently does not actively attempt to let work batch up).
 SUSTAINED LOAD REGIME
 ---------------------
 When the system is under continuous load, we want these jobs to run
 continuously - this is perhaps best modelled with a P/D controller, where
 they'll be trying to keep a target value (i.e. fragmented disk space,
 available journal space) roughly in the middle of some range.
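 For readers unfamiliar with the term, a P/D (proportional/derivative)
 controller can be sketched in a few lines. This is a generic textbook form
 with made-up names and gains, not an existing bcachefs interface: the output
 grows with how far the measured value is above the target, and with how fast
 it is rising.

```c
#include <assert.h>

struct pd_controller {
	double kp;		/* proportional gain */
	double kd;		/* derivative gain */
	double target;		/* setpoint, e.g. the middle of the allowed range */
	double last_error;	/* error seen at the previous update */
};

/*
 * Returns a work rate >= 0: higher when the measured value is above the
 * target (proportional term) or rising quickly (derivative term).
 */
static double pd_update(struct pd_controller *pd, double measured, double dt)
{
	double error, derivative, out;

	assert(dt > 0);

	error		= measured - pd->target;
	derivative	= (error - pd->last_error) / dt;
	pd->last_error	= error;

	out = pd->kp * error + pd->kd * derivative;
	return out > 0 ? out : 0;
}
```

 With `measured` being, say, the percentage of fragmented space, the job backs
 off to zero when the value is below target and falling, and ramps up early
 when the value is climbing fast even if it has not yet crossed the target by
 much.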
 The goal under sustained load is to balance our ability to handle load spikes
 without running out of x resource (free disk space, free space in the
 journal), while also letting some work accumulate to be batched (or become
 unnecessary).
 For example, we don't want to run copygc too aggressively, because then it
 will be evacuating buckets that would have become empty (been overwritten or
 deleted) anyways, and we don't want to wait until we're almost out of free
 space because then the system will behave unpredictably - suddenly we're doing
 a lot more work to service each write and the system becomes much slower.
 IDLE REGIME
 -----------
 When the system becomes idle, we should start flushing our pending work
 quicker so the system can go to sleep.
 Note that the definition of "idle" depends on where in the hierarchy a task
 is - a task should start flushing work more quickly when the task above it has
 stopped generating new work.
 e.g. rebalance should start flushing more quickly when page cache writeback is
 idle, and journal reclaim should only start flushing more quickly when both
 copygc and rebalance are idle.
 It's important to let work accumulate when more work is still incoming and we
 still have room, because flushing is always more efficient if we let it batch
 up. New writes may overwrite data before rebalance moves it, and tasks may be
 generating more updates for the btree nodes that journal reclaim needs to flush.
 On idle, how much work we do at each interval should be proportional to the
 length of time we have been idle for. If we're idle only for a short duration,
 we shouldn't flush everything right away; the system might wake up and start
 generating new work soon, and flushing immediately might end up doing a lot of
 work that would have been unnecessary if we'd allowed things to batch more.
 To summarize, we will need:
 - A list of classes for background tasks that generate work, which will
   include one "foreground" class.
 - Tracking for each class - "Am I doing work, or have I gone to sleep?"
 - And each class should check the class above it when deciding how much work to issue.
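 One possible shape for those three pieces, as a userspace sketch. Nothing here
 exists in bcachefs today: the class names, the level numbers, the time units
 and the `FULL_FLUSH_IDLE_TIME` constant are all assumptions chosen to mirror
 the hierarchy listed earlier.

```c
#include <assert.h>
#include <stdint.h>

enum work_class {
	WORK_FOREGROUND,
	WORK_WRITEBACK,
	WORK_COPYGC,
	WORK_REBALANCE,
	WORK_JOURNAL_RECLAIM,
	WORK_CLASS_NR,
};

/* Position in the hierarchy; lower levels generate work for higher ones. */
static const unsigned work_class_level[WORK_CLASS_NR] = {
	[WORK_FOREGROUND]	= 0,
	[WORK_WRITEBACK]	= 1,
	[WORK_COPYGC]		= 2,
	[WORK_REBALANCE]	= 2,
	[WORK_JOURNAL_RECLAIM]	= 3,
};

struct work_state {
	uint64_t last_active[WORK_CLASS_NR];	/* when each class last did work */
};

static void work_class_mark_active(struct work_state *s, enum work_class c,
				   uint64_t now)
{
	s->last_active[c] = now;
}

/* How long has every class above c been asleep? */
static uint64_t idle_time_above(const struct work_state *s, enum work_class c,
				uint64_t now)
{
	uint64_t latest = 0;

	for (int i = 0; i < WORK_CLASS_NR; i++)
		if (work_class_level[i] < work_class_level[c] &&
		    s->last_active[i] > latest)
			latest = s->last_active[i];

	assert(now >= latest);
	return now - latest;
}

/* Idle time (arbitrary units) after which a class flushes everything. */
#define FULL_FLUSH_IDLE_TIME	1000

/* Percentage of pending work to issue this interval, proportional to idle time. */
static unsigned flush_percent(const struct work_state *s, enum work_class c,
			      uint64_t now)
{
	uint64_t idle = idle_time_above(s, c, now);

	return idle >= FULL_FLUSH_IDLE_TIME
		? 100
		: (unsigned) (idle * 100 / FULL_FLUSH_IDLE_TIME);
}
```

 Journal reclaim, at the bottom of the hierarchy, only ramps up once copygc,
 rebalance and everything above them have all been quiet for a while; a short
 pause in foreground writes barely changes how much work is issued.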
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -0,0 +1,38 @@
 .. SPDX-License-Identifier: GPL-2.0
 ======================
 bcachefs Documentation
 ======================
 Subsystem-specific development process notes
 --------------------------------------------
 Development notes specific to bcachefs. These are intended to supplement
 the :doc:`general kernel development handbook </process/index>`.
 .. toctree::
   :maxdepth: 1
   :numbered:
   CodingStyle
   SubmittingPatches
 Filesystem implementation
 -------------------------
 Documentation for filesystem features and their implementation details.
 At this moment, only a few of these are described here.
 .. toctree::
   :maxdepth: 1
   :numbered:
   casefolding
   errorcodes
 Future design
 -------------
 .. toctree::
   :maxdepth: 1
   future/idle_work