Use ip4h_dscp() instead of reading iph->tos directly.
ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use ip4h_dscp() instead of reading iph->tos directly.
ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use ip4h_dscp() instead of reading iph->tos directly.
ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmcr+e0ACgkQ1w0aZmrP
KyFS9RAAoB5S0NZBNf60fpG6b5ZlfDEfoOUXfp91CRT1WY7KaCekH19aLf5O5HAg
iJNjdOs7R4YCMfUfbWzDj0ZYnayz0h618Nin/EufIJbAMoOnBMnb12r5DUhnMpnC
M9noDQJzyXPhlE1gR3py7I9VgwdqfRa3+EfS1uTm1NL9tv7MLuej+Z6nnmQ2Sw2/
tUfuhLyKvs3iIiegeojrsTGix4YnMNrIrQUqpJXq7jCfXHFPz10MmlR7fZuBK99s
QuphQ9Onf7SXow2bxkdhB2cS4i3+BK5fLFXHDW9cLROL+PKPenAsdnKcsXe5ViL5
Ck1/N4KxydXt3djmDWfvXFF2/BaxMHnO5S9p+1ZAE18auCz6KKzefjBo134rftgz
GN8Fu8A2OQ9pZzod/21M0m2BDAoOG3kKRq6MjoUydfc8/T0q/FVtInprjh1twIvg
3uZludlQUKANVeH3bRR16fE0Z5k8fdTX3BXxRgQNo9hrDjpnmAyJmh6XTCuS9XWJ
L0VR94QLnN/yi7U7EWdEk2VV944Pfj6aeNjFC8AWHbhP+DzvELFFZvDvQ7AHaiK1
fOZfuX4nO4i2WrnCHeVLhYxdkvGixyWxorMYcDNmRDSQirQLaARMaAMFfgI4TKkm
R/wjJKXitntN9cLiiHuParkVQZAiFH1AwuvxQgCI5uaN6Gxpivo=
=TkS8
-----END PGP SIGNATURE-----
Merge tag 'nf-next-24-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following series contains Netfilter updates for net-next:
1) Make legacy xtables configs user selectable, from Breno Leitao.
2) Fix a few sparse warnings related to percpu, from Uros Bizjak.
3) Use strscpy_pad, from Justin Stitt.
4) Use nft_trans_elem_alloc() in catchall flush, from Florian Westphal.
5) A series of 7 patches to fix false positive with CONFIG_RCU_LIST=y.
Florian also sees possible issue with 10 while module load/removal
when requesting an expression that is available via module. As for
patch 11, object is being updated so reference on the module already
exists so I don't see any real issue.
Florian says:
"Unfortunately there are many more errors, and not all are false positives.
First patches pass lockdep_commit_lock_is_held() to the rcu list traversal
macro so that those splats are avoided.
The last two patches are real code change as opposed to
'pass the transaction mutex to relax rcu check':
Those two lists are not protected by transaction mutex so could be altered
in parallel.
This targets nf-next because these are long-standing issues."
netfilter pull request 24-11-07
* tag 'nf-next-24-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: must hold rcu read lock while iterating object type list
netfilter: nf_tables: must hold rcu read lock while iterating expression type list
netfilter: nf_tables: avoid false-positive lockdep splats with basechain hook
netfilter: nf_tables: avoid false-positive lockdep splats in set walker
netfilter: nf_tables: avoid false-positive lockdep splats with flowtables
netfilter: nf_tables: avoid false-positive lockdep splats with sets
netfilter: nf_tables: avoid false-positive lockdep splat on rule deletion
netfilter: nf_tables: prefer nft_trans_elem_alloc helper
netfilter: nf_tables: replace deprecated strncpy with strscpy_pad
netfilter: nf_tables: Fix percpu address space issues in nf_tables_api.c
netfilter: Make legacy configs user selectable
====================
Link: https://patch.msgid.link/20241106234625.168468-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This option makes legacy Netfilter Kconfig user selectable, giving users
the option to configure iptables without enabling any other config.
Make the following KConfig entries user selectable:
* BRIDGE_NF_EBTABLES_LEGACY
* IP_NF_ARPTABLES
* IP_NF_IPTABLES_LEGACY
* IP6_NF_IPTABLES_LEGACY
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We need to init l3mdev unconditionally, else main routing table is searched
and incorrect result is returned unless strict (iif keyword) matching is
requested.
Next patch adds a selftest for this.
Fixes: 2a8a7c0eaa ("netfilter: nft_fib: Fix for rpath check with VRF devices")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1761
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If CONFIG_BRIDGE_NETFILTER is not enabled, which is the case for x86_64
defconfig, then building nf_reject_ipv4.c and nf_reject_ipv6.c with W=1
using gcc-14 results in the following warnings, which are treated as
errors:
net/ipv4/netfilter/nf_reject_ipv4.c: In function 'nf_send_reset':
net/ipv4/netfilter/nf_reject_ipv4.c:243:23: error: variable 'niph' set but not used [-Werror=unused-but-set-variable]
243 | struct iphdr *niph;
| ^~~~
cc1: all warnings being treated as errors
net/ipv6/netfilter/nf_reject_ipv6.c: In function 'nf_send_reset6':
net/ipv6/netfilter/nf_reject_ipv6.c:286:25: error: variable 'ip6h' set but not used [-Werror=unused-but-set-variable]
286 | struct ipv6hdr *ip6h;
| ^~~~
cc1: all warnings being treated as errors
Address this by reducing the scope of these local variables to where
they are used, which is code only compiled when CONFIG_BRIDGE_NETFILTER
enabled.
Compile tested and run through netfilter selftests.
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Closes: https://lore.kernel.org/netfilter-devel/20240906145513.567781-1-andriy.shevchenko@linux.intel.com/
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we are allocating an array, using kmemdup_array() to take care about
multiplication and possible overflows.
Also it makes auditing the code easier.
Signed-off-by: Yan Zhen <yanzhen@vivo.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmbHtrsACgkQ1V2XiooU
IOTZ8g/+OW8W468NmBHA2zrTWei19irA1iCBLvXMPakice0+ADU51eqVp6uUrCeP
iBUZGMtCq4WzFBrAEBePK3UNxxMHquWvAsA0kO/XW95KVM++s9ykF62q89jugMb3
CADEv/TxgJrkzpLWclxHNTCWMKpURijlkjT+kCMR4fKbeQnB6e/jI+2sdl7l5iRG
tHHm8ieewNNKE+jlSUJUrPEIM3tXRaZh9+JmbClfsF6wUw7qLmT/7P92aHBX4Owp
tpqi/Xc5/2k+Ud96a8u1NYrLLG+L70uz3SaeE7PvhaRavFuYftk2XLB4L2umtEfb
ZCZO/lCadH23XrVAUs5EtCDk4Tu3rZdTDsKYm2qS66uBsh/e6hg+j/cIPSO8jsNq
5Zbs/XzPFJ1PUpXVy8Sfs9vxH+cDuiqhy9nfKrbQotsqtoW+z52UoFH4WAjfmpqb
XMI+yeSTXYl1KIo2LV408VFRFRGcstBvXE7bOn7ufSrltRZcFdx7wqQBgVbh1zvA
1NTzIguZ+Lf2hPcNLPQd/f2vghKRTI7gUwzlDRw6so6NOWUM6/yV5KyoiZKekHjC
S6+M8cdiyMH8DmSsvAb46YKtDxYuIHqLVxuVqjfBHrMo1hLIo5smMCeCRA1Vabd5
/E4DTwpN5tVWX+HZl1wcAtQpXhcTktWM0qGSPnlRS11gwbAGWvk=
=O8tu
-----END PGP SIGNATURE-----
Merge tag 'nf-next-24-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following batch contains Netfilter updates for net-next:
Patch #1 fix checksum calculation in nfnetlink_queue with SCTP,
segment GSO packet since skb_zerocopy() does not support
GSO_BY_FRAGS, from Antonio Ojea.
Patch #2 extend nfnetlink_queue coverage to handle SCTP packets,
from Antonio Ojea.
Patch #3 uses consume_skb() instead of kfree_skb() in nfnetlink,
from Donald Hunter.
Patch #4 adds a dedicate commit list for sets to speed up
intra-transaction lookups, from Florian Westphal.
Patch #5 skips removal of element from abort path for the pipapo
backend, ditching the shadow copy of this datastructure
is sufficient.
Patch #6 moves nf_ct_netns_get() out of nf_conncount_init() to
let users of conncoiunt decide when to enable conntrack,
this is needed by openvswitch, from Xin Long.
Patch #7 pass context to all nft_parse_register_load() in
preparation for the next patch.
Patches #8 and #9 reject loads from uninitialized registers from
control plane to remove register initialization from
datapath. From Florian Westphal.
* tag 'nf-next-24-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: don't initialize registers in nft_do_chain()
netfilter: nf_tables: allow loads only when register is initialized
netfilter: nf_tables: pass context structure to nft_parse_register_load
netfilter: move nf_ct_netns_get out of nf_conncount_init
netfilter: nf_tables: do not remove elements if set backend implements .abort
netfilter: nf_tables: store new sets in dedicated list
netfilter: nfnetlink: convert kfree_skb to consume_skb
selftests: netfilter: nft_queue.sh: sctp coverage
netfilter: nfnetlink_queue: unbreak SCTP traffic
====================
Link: https://patch.msgid.link/20240822221939.157858-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In a similar fashion to the iptables rpfilter match, unmask the upper
DSCP bits of the DS field of the currently tested packet so that in the
future the FIB lookup could be performed according to the full DSCP
value.
No functional changes intended since the upper DSCP bits are masked when
comparing against the TOS selectors in FIB rules and routes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240821125251.1571445-6-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The rpfilter match performs a reverse path filter test on a packet by
performing a FIB lookup with the source and destination addresses
swapped.
Unmask the upper DSCP bits of the DS field of the tested packet so that
in the future the FIB lookup could be performed according to the full
DSCP value.
No functional changes intended since the upper DSCP bits are masked when
comparing against the TOS selectors in FIB rules and routes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240821125251.1571445-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As part of its functionality, the nftables FIB expression module
performs a FIB lookup, but unlike other users of the FIB lookup API, it
does so without masking the upper DSCP bits. In particular, this differs
from the equivalent iptables match ("rpfilter") that does mask the upper
DSCP bits before the FIB lookup.
Align the module to other users of the FIB lookup API and mask the upper
DSCP bits using IPTOS_RT_MASK before the lookup.
No regressions in nft_fib.sh:
# ./nft_fib.sh
PASS: fib expression did not cause unwanted packet drops
PASS: fib expression did drop packets for 1.1.1.1
PASS: fib expression did drop packets for 1c3::c01d
PASS: fib expression forward check with policy based routing
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
syzbot reports:
general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
[..]
RIP: 0010:nf_tproxy_laddr4+0xb7/0x340 net/ipv4/netfilter/nf_tproxy_ipv4.c:62
Call Trace:
nft_tproxy_eval_v4 net/netfilter/nft_tproxy.c:56 [inline]
nft_tproxy_eval+0xa9a/0x1a00 net/netfilter/nft_tproxy.c:168
__in_dev_get_rcu() can return NULL, so check for this.
Reported-and-tested-by: syzbot+b94a6818504ea90d7661@syzkaller.appspotmail.com
Fixes: cc6eb43385 ("tproxy: use the interface primary IP address as a default value for --on-ip")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
At the beginning in 2009 one patch [1] introduced collecting drop
counter in nf_conntrack_in() by returning -NF_DROP. Later, another
patch [2] changed the return value of tcp_packet() which now is
renamed to nf_conntrack_tcp_packet() from -NF_DROP to NF_DROP. As
we can see, that -NF_DROP should be corrected.
Similarly, there are other two points where the -NF_DROP is used.
Well, as NF_DROP is equal to 0, inverting NF_DROP makes no sense
as patch [2] said many years ago.
[1]
commit 7d1e04598e ("netfilter: nf_conntrack: account packets drop by tcp_packet()")
[2]
commit ec8d540969 ("netfilter: conntrack: fix dropping packet after l4proto->packet()")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In my recent commit, I missed that do_replace() handlers
use copy_from_sockptr() (which I fixed), followed
by unsafe copy_from_sockptr_offset() calls.
In all functions, we can perform the @optlen validation
before even calling xt_alloc_table_info() with the following
check:
if ((u64)optlen < (u64)tmp.size + sizeof(tmp))
return -EINVAL;
Fixes: 0c83842df4 ("netfilter: validate user input for expected length")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://lore.kernel.org/r/20240409120741.3538135-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap reports arptables build failure:
arp_tables.c:(.text+0x20): undefined reference to `xt_find_table'
... because recent change removed a 'select' on the xtables core.
Add a "depends" clause on arptables to resolve this.
Kernel test robot reports another build breakage:
iptable_nat.c:(.text+0x8): undefined reference to `ipt_unregister_table_exit'
... because of a typo, the nat table selected ip6tables.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/netfilter-devel/d0dfbaef-046a-4c42-9daa-53636664bf6d@infradead.org/
Fixes: a9525c7f62 ("netfilter: xtables: allow xtables-nft only builds")
Fixes: 4654467dc7 ("netfilter: arptables: allow xtables-nft only builds")
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Florian Westphal <fw@strlen.de>
Add hidden IP(6)_NF_IPTABLES_LEGACY symbol.
When any of the "old" builtin tables are enabled the "old" iptables
interface will be supported.
To disable the old set/getsockopt interface the existing options
for the builtin tables need to be turned off:
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_FILTER is not set
CONFIG_IP_NF_NAT is not set
CONFIG_IP_NF_MANGLE is not set
CONFIG_IP_NF_RAW is not set
CONFIG_IP_NF_SECURITY is not set
Same for CONFIG_IP6_NF_ variants.
This allows to build a kernel that only supports ip(6)tables-nft
(iptables-over-nftables api).
In the future the _LEGACY symbol will become visible and the select
statements will be turned into 'depends on', but for now be on safe side
so "make oldconfig" won't break things.
Signed-off-by: Florian Westphal <fw@strlen.de>
Allows to build kernel that supports the arptables mangle target
via nftables' compat infra but without the arptables get/setsockopt
interface or the old arptables filter interpreter.
IOW, setting IP_NF_ARPFILTER=n will break arptables-legacy, but
arptables-nft will continue to work as long as nftables compat
support is enabled.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Phil Sutter <phil@nwl.cc>
An skb can be added to a neigh->arp_queue while waiting for an arp
reply. Where original skb's skb->dev can be different to neigh's
neigh->dev. For instance in case of bridging dnated skb from one veth to
another, the skb would be added to a neigh->arp_queue of the bridge.
As skb->dev can be reset back to nf_bridge->physindev and used, and as
there is no explicit mechanism that prevents this physindev from been
freed under us (for instance neigh_flush_dev doesn't cleanup skbs from
different device's neigh queue) we can crash on e.g. this stack:
arp_process
neigh_update
skb = __skb_dequeue(&neigh->arp_queue)
neigh_resolve_output(..., skb)
...
br_nf_dev_xmit
br_nf_pre_routing_finish_bridge_slow
skb->dev = nf_bridge->physindev
br_handle_frame_finish
Let's use plain ifindex instead of net_device link. To peek into the
original net_device we will use dev_get_by_index_rcu(). Thus either we
get device and are safe to use it or we don't get it and drop skb.
Fixes: c4e70a87d9 ("netfilter: bridge: rename br_netfilter.c to br_netfilter_hooks.c")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is a preparation patch for replacing physindev with physinif on
nf_bridge_info structure. We will use dev_get_by_index_rcu to resolve
device, when needed, and it requires net to be available.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Current release - regressions:
- sched: fix SKB_NOT_DROPPED_YET splat under debug config
Current release - new code bugs:
- tcp: fix usec timestamps with TCP fastopen
- tcp_sigpool: fix some off by one bugs
- tcp: fix possible out-of-bounds reads in tcp_hash_fail()
- tcp: fix SYN option room calculation for TCP-AO
- bpf: fix compilation error without CGROUPS
- ptp:
- ptp_read() should not release queue
- fix tsevqs corruption
Previous releases - regressions:
- llc: verify mac len before reading mac header
Previous releases - always broken:
- bpf:
- fix check_stack_write_fixed_off() to correctly spill imm
- fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
- check map->usercnt after timer->timer is assigned
- dsa: lan9303: consequently nested-lock physical MDIO
- dccp/tcp: call security_inet_conn_request() after setting IP addr
- tg3: fix the TX ring stall due to incorrect full ring handling
- phylink: initialize carrier state at creation
- ice: fix direction of VF rules in switchdev mode
Misc:
- fill in a bunch of missing MODULE_DESCRIPTION()s, more to come
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmVNRnAACgkQMUZtbf5S
IrsYaA/+IUoYi96/oLtvvrET6HIbXeMaLKef0UlytEicQKy8h5EWlhcTZPhQEY0g
dtaKOemQsO0dQTma4eQBiBDHeCeSkitgD9p7fh0i+//QFYWSFqHrBiF2mlToc/ZQ
T1p4BlVL7D2Xsr1Lki93zk+EhFGEy2KroYgrWbZc9TWE5ap9PtSVF9eqeHAVCmZ7
ocre/eo4pqUM9rAHIAyhoL+0xtVQ59dBevbJC0qYcmflhafr82Gtdveo6pBBKuYm
GhwbRrAXER3Neav9c6NHqat4zsMwGpC27SiN9dYWm6dlkeS9U9t2PUu71OkJGVfw
VaSE+utkC/WmzGbuiUIjqQLBrRe372ItHCr78BfSRMshS+RBTHtoK7njeH8Iv67E
RsMeCyVNj9dtGlOQG5JAv8IoCQ1WbMw9B36Yzw3ip/MmDX/ntXz7Dcr4ZMZ6VURS
CHhHFZPnmMykMXkT6SIlxeAg2r8ELtESzkvLimdTVFPAlk3cPkibKJbh3F/tEqXS
PDb3y0uoEgRQBAsWXXx9FQEvv9rTL6YrzbMhmJBIIEoNxppQYQ7FZBJX9utAVp5B
1GdyqhR6IRTaKb9cMRj/K1xPwm2KgCw9xj9pjKdAA7QUMslXbFp8blv1rIkFGshg
hiNXmPcI8wo0j+0lZYktEcIERL5y6c8BgK2NnPU6RULua96tuQ4=
=k6Wk
-----END PGP SIGNATURE-----
Merge tag 'net-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter and bpf.
Current release - regressions:
- sched: fix SKB_NOT_DROPPED_YET splat under debug config
Current release - new code bugs:
- tcp:
- fix usec timestamps with TCP fastopen
- fix possible out-of-bounds reads in tcp_hash_fail()
- fix SYN option room calculation for TCP-AO
- tcp_sigpool: fix some off by one bugs
- bpf: fix compilation error without CGROUPS
- ptp:
- ptp_read() should not release queue
- fix tsevqs corruption
Previous releases - regressions:
- llc: verify mac len before reading mac header
Previous releases - always broken:
- bpf:
- fix check_stack_write_fixed_off() to correctly spill imm
- fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
- check map->usercnt after timer->timer is assigned
- dsa: lan9303: consequently nested-lock physical MDIO
- dccp/tcp: call security_inet_conn_request() after setting IP addr
- tg3: fix the TX ring stall due to incorrect full ring handling
- phylink: initialize carrier state at creation
- ice: fix direction of VF rules in switchdev mode
Misc:
- fill in a bunch of missing MODULE_DESCRIPTION()s, more to come"
* tag 'net-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
net: ti: icss-iep: fix setting counter value
ptp: fix corrupted list in ptp_open
ptp: ptp_read should not release queue
net_sched: sch_fq: better validate TCA_FQ_WEIGHTS and TCA_FQ_PRIOMAP
net: kcm: fill in MODULE_DESCRIPTION()
net/sched: act_ct: Always fill offloading tuple iifidx
netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses
netfilter: xt_recent: fix (increase) ipv6 literal buffer length
ipvs: add missing module descriptions
netfilter: nf_tables: remove catchall element in GC sync path
netfilter: add missing module descriptions
drivers/net/ppp: use standard array-copy-function
net: enetc: shorten enetc_setup_xdp_prog() error message to fit NETLINK_MAX_FMTMSG_LEN
virtio/vsock: Fix uninit-value in virtio_transport_recv_pkt()
r8169: respect userspace disabling IFF_MULTICAST
selftests/bpf: get trusted cgrp from bpf_iter__cgroup directly
bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg
net: phylink: initialize carrier state at creation
test/vsock: add dobule bind connect test
test/vsock: refactor vsock_accept
...
API:
- Add virtual-address based lskcipher interface.
- Optimise ahash/shash performance in light of costly indirect calls.
- Remove ahash alignmask attribute.
Algorithms:
- Improve AES/XTS performance of 6-way unrolling for ppc.
- Remove some uses of obsolete algorithms (md4, md5, sha1).
- Add FIPS 202 SHA-3 support in pkcs1pad.
- Add fast path for single-page messages in adiantum.
- Remove zlib-deflate.
Drivers:
- Add support for S4 in meson RNG driver.
- Add STM32MP13x support in stm32.
- Add hwrng interface support in qcom-rng.
- Add support for deflate algorithm in hisilicon/zip.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmVB3vgACgkQxycdCkmx
i6dsOBAAykbnX8BpnpnOXYywE9ZWrl98rAk51MK0N9olZNfg78zRPIv7fFxFdC20
SDJrDSNPmn0Qvaa5e0EfoAdklsm0k2GkXL/BwPKMKWUsyIoJVYI3WrBMnjBy9xMp
yfME+h0bKoXJCZKnYkIUSGUejmUPSyRlEylrXoFlH/VWYwAaii/x9zwreQoF+0LR
KI24A1q8AYs6Dw9HSfndaAub9GOzrqKYs6fSaMG+77Y4UC5aoi5J9Bp2G3uVyHay
x/0bZtIxKXS9wn+LeG/3GspX23x/I5VwBOdAoMigrYmAIaIg5qgyMszudltTAs4R
zF1Kh7WsnM5+vpnBSeigzo+/GGOU3QTz8y3tBTg+3ZR7GWGOwQLiizhOYqCyOfAH
pIm6c++sZw/OOHiL69Nt4HeLKzGNYYWk3s4X/B/6cqoouPfOsfBaQobZNx9zfy7q
ZNEvSVBjrFX/L6wDSotny1LTWLUNjHbmLaMV5uQZ/SQKEtv19fp2Dl7SsLkHH+3v
ldOAwfoJR6QcSwz3Ez02TUAvQhtP172Hnxi7u44eiZu2aUboLhCFr7aEU6kVdBCx
1rIRVHD1oqlOEDRwPRXzhF3I8R4QDORJIxZ6UUhg7yueuI+XCGDsBNC+LqBrBmSR
IbdjqmSDUBhJyM5yMnt1VFYhqKQ/ZzwZ3JQviwW76Es9pwEIolM=
=IZmR
-----END PGP SIGNATURE-----
Merge tag 'v6.7-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Add virtual-address based lskcipher interface
- Optimise ahash/shash performance in light of costly indirect calls
- Remove ahash alignmask attribute
Algorithms:
- Improve AES/XTS performance of 6-way unrolling for ppc
- Remove some uses of obsolete algorithms (md4, md5, sha1)
- Add FIPS 202 SHA-3 support in pkcs1pad
- Add fast path for single-page messages in adiantum
- Remove zlib-deflate
Drivers:
- Add support for S4 in meson RNG driver
- Add STM32MP13x support in stm32
- Add hwrng interface support in qcom-rng
- Add support for deflate algorithm in hisilicon/zip"
* tag 'v6.7-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (283 commits)
crypto: adiantum - flush destination page before unmapping
crypto: testmgr - move pkcs1pad(rsa,sha3-*) to correct place
Documentation/module-signing.txt: bring up to date
module: enable automatic module signing with FIPS 202 SHA-3
crypto: asymmetric_keys - allow FIPS 202 SHA-3 signatures
crypto: rsa-pkcs1pad - Add FIPS 202 SHA-3 support
crypto: FIPS 202 SHA-3 register in hash info for IMA
x509: Add OIDs for FIPS 202 SHA-3 hash and signatures
crypto: ahash - optimize performance when wrapping shash
crypto: ahash - check for shash type instead of not ahash type
crypto: hash - move "ahash wrapping shash" functions to ahash.c
crypto: talitos - stop using crypto_ahash::init
crypto: chelsio - stop using crypto_ahash::init
crypto: ahash - improve file comment
crypto: ahash - remove struct ahash_request_priv
crypto: ahash - remove crypto_ahash_alignmask
crypto: gcm - stop using alignmask of ahash
crypto: chacha20poly1305 - stop using alignmask of ahash
crypto: ccm - stop using alignmask of ahash
net: ipv6: stop checking crypto_ahash_alignmask
...
Per section 4.c. of the IETF Trust Legal Provisions, "Code Components"
in IETF Documents are licensed on the terms of the BSD-3-Clause license:
https://trustee.ietf.org/documents/trust-legal-provisions/tlp-5/
The term "Code Components" specifically includes ASN.1 modules:
https://trustee.ietf.org/documents/trust-legal-provisions/code-components-list-3/
Add an SPDX identifier as well as a copyright notice pursuant to section
6.d. of the Trust Legal Provisions to all ASN.1 modules in the tree
which are derived from IETF Documents.
Section 4.d. of the Trust Legal Provisions requests that each Code
Component identify the RFC from which it is taken, so link that RFC
in every ASN.1 module.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
These checks assume that the caller only returns NF_DROP without
any errno embedded in the upper bits.
This is fine right now, but followup patches will start to propagate
such errors to allow kfree_skb_drop_reason() in the called functions,
those would then indicate 'errno << 8 | NF_STOLEN'.
To not break things we have to mask those parts out.
Signed-off-by: Florian Westphal <fw@strlen.de>
IP_NODEFRAG socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We want to be able to enable/disable IP packet defrag from core
bpf/netfilter code. In other words, execute code from core that could
possibly be built as a module.
To help avoid symbol resolution errors, use glue hooks that the modules
will register callbacks with during module init.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/f6a8824052441b72afe5285acedbd634bd3384c1.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
icmp/icmp6 matches are baked into ip(6)_tables.ko.
This means that even if iptables-nft is used, a rule like
"-p icmp --icmp-type 1" will load the ip(6)tables modules.
Move them to xt_tcpdudp.ko instead to avoid this.
This will also allow to eventually add kconfig knobs to build kernels
that support iptables-nft but not iptables-legacy (old set/getsockopt
interface).
Signed-off-by: Florian Westphal <fw@strlen.de>
The xtables packet traverser performs an unconditional local_bh_disable(),
but the nf_tables evaluation loop does not.
Functions that are called from either xtables or nftables must assume
that they can be called in process context.
inet_twsk_deschedule_put() assumes that no softirq interrupt can occur.
If tproxy is used from nf_tables its possible that we'll deadlock
trying to aquire a lock already held in process context.
Add a small helper that takes care of this and use it.
Link: https://lore.kernel.org/netfilter-devel/401bd6ed-314a-a196-1cdc-e13c720cc8f2@balasys.hu/
Fixes: 4ed8eb6570 ("netfilter: nf_tables: Add native tproxy support")
Reported-and-tested-by: Major Dávid <major.david@balasys.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix broken listing of set elements when table has an owner.
2) Fix conntrack refcount leak in ctnetlink with related conntrack
entries, from Hangyu Hua.
3) Fix use-after-free/double-free in ctnetlink conntrack insert path,
from Florian Westphal.
4) Fix ip6t_rpfilter with VRF, from Phil Sutter.
5) Fix use-after-free in ebtables reported by syzbot, also from Florian.
6) Use skb->len in xt_length to deal with IPv6 jumbo packets,
from Xin Long.
7) Fix NETLINK_LISTEN_ALL_NSID with ctnetlink, from Florian Westphal.
8) Fix memleak in {ip_,ip6_,arp_}tables in ENOMEM error case,
from Pavel Tikhomirov.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: x_tables: fix percpu counter block leak on error path when creating new netns
netfilter: ctnetlink: make event listener tracking global
netfilter: xt_length: use skb len to match in length_mt6
netfilter: ebtables: fix table blob use-after-free
netfilter: ip6t_rpfilter: Fix regression with VRF interfaces
netfilter: conntrack: fix rmmod double-free race
netfilter: ctnetlink: fix possible refcount leak in ctnetlink_create_conntrack()
netfilter: nf_tables: allow to fetch set elements when table has an owner
====================
Link: https://lore.kernel.org/r/20230222092137.88637-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Here is the stack where we allocate percpu counter block:
+-< __alloc_percpu
+-< xt_percpu_counter_alloc
+-< find_check_entry # {arp,ip,ip6}_tables.c
+-< translate_table
And it can be leaked on this code path:
+-> ip6t_register_table
+-> translate_table # allocates percpu counter block
+-> xt_register_table # fails
there is no freeing of the counter block on xt_register_table fail.
Note: xt_percpu_counter_free should be called to free it like we do in
do_replace through cleanup_entry helper (or in __ip6t_unregister_table).
Probability of hitting this error path is low AFAICS (xt_register_table
can only return ENOMEM here, as it is not replacing anything, as we are
creating new netns, and it is hard to imagine that all previous
allocations succeeded and after that one in xt_register_table failed).
But it's worth fixing even the rare leak.
Fixes: 71ae0dff02 ("netfilter: xtables: use percpu rule counters")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We are not allowed to return an error at this point.
Looking at the code it looks like ret is always 0 at this
point, but its not.
t = find_table_lock(net, repl->name, &ret, &ebt_mutex);
... this can return a valid table, with ret != 0.
This bug causes update of table->private with the new
blob, but then frees the blob right away in the caller.
Syzbot report:
BUG: KASAN: vmalloc-out-of-bounds in __ebt_unregister_table+0xc00/0xcd0 net/bridge/netfilter/ebtables.c:1168
Read of size 4 at addr ffffc90005425000 by task kworker/u4:4/74
Workqueue: netns cleanup_net
Call Trace:
kasan_report+0xbf/0x1f0 mm/kasan/report.c:517
__ebt_unregister_table+0xc00/0xcd0 net/bridge/netfilter/ebtables.c:1168
ebt_unregister_table+0x35/0x40 net/bridge/netfilter/ebtables.c:1372
ops_exit_list+0xb0/0x170 net/core/net_namespace.c:169
cleanup_net+0x4ee/0xb10 net/core/net_namespace.c:613
...
ip(6)tables appears to be ok (ret should be 0 at this point) but make
this more obvious.
Fixes: c58dd2dd44 ("netfilter: Can't fail and free after table replacement")
Reported-by: syzbot+f61594de72d6705aea03@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
iptables/nftables support responding to tcp packets with tcp resets.
The generated tcp reset packet passes through both output and postrouting
netfilter hooks, but conntrack will never see them because the generated
skb has its ->nfct pointer copied over from the packet that triggered the
reset rule.
If the reset rule is used for established connections, this
may result in the conntrack entry to be around for a very long
time (default timeout is 5 days).
One way to avoid this would be to not copy the nf_conn pointer
so that the rest packet passes through conntrack too.
Problem is that output rules might not have the same conntrack
zone setup as the prerouting ones, so its possible that the
reset skb won't find the correct entry. Generating a template
entry for the skb seems error prone as well.
Add an explicit "closing" function that switches a confirmed
conntrack entry to closed state and wire this up for tcp.
If the entry isn't confirmed, no action is needed because
the conntrack entry will never be committed to the table.
Reported-by: Russel King <linux@armlinux.org.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Marked as 'to be removed soon' since kernel 4.1 (2015).
Functionality was superseded by the 'cluster' match, added in kernel
2.6.30 (2009).
clusterip_tg_check still has races that can give
proc_dir_entry 'ipt_CLUSTERIP/10.1.1.2' already registered
followed by a WARN splat.
Remove it instead of trying to fix this up again.
clusterip uapi header is left as-is for now.
Signed-off-by: Florian Westphal <fw@strlen.de>
nf_conn:mark can be read from and written to in parallel. Use
READ_ONCE()/WRITE_ONCE() for reads and writes to prevent unwanted
compiler optimizations.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a 'reset' flag just like with nft_object_ops::dump. This will be
useful to reset "anonymous stateful objects", e.g. simple rule counters.
No functional change intended.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently netfilter's rpfilter and fib modules implicitely initialise
->flowic_uid with 0. This is normally the root UID. However, this isn't
the case in user namespaces, where user ID 0 is mapped to a different
kernel UID. By initialising ->flowic_uid with sock_net_uid(), we get
the root UID of the user namespace, thus keeping the same behaviour
whether or not we're running in a user namepspace.
Note, this is similar to commit 8bcfd0925e ("ipv4: add missing
initialization for flowi4_uid"), which fixed the rp_filter sysctl.
Fixes: 622ec2c9d5 ("net: core: add UID to flows, rules, and routes")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use the introduced field for correct operation with VRF devices instead
of conditionally overwriting flowic_oif. This is a partial revert of
commit b575b24b8e ("netfilter: Fix rpfilter dropping vrf packets by
mistake"), implementing a simpler solution.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Analogous to commit b575b24b8e ("netfilter: Fix rpfilter
dropping vrf packets by mistake") but for nftables fib expression:
Add special treatment of VRF devices so that typical reverse path
filtering via 'fib saddr . iif oif' expression works as expected.
Fixes: f6d0cbcf09 ("netfilter: nf_tables: add fib expression")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
Commit 91a178258a ("netfilter: rpfilter: Convert
rpfilter_lookup_reverse to new dev helper") removed the need for the
'ret' variable. This went unnoticed because of the __maybe_unused
annotation.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
We will soon introduce an optional per-netns ehash.
This means we cannot use tcp_hashinfo directly in most places.
Instead, access it via net->ipv4.tcp_death_row.hashinfo.
The access will be valid only while initialising tcp_hashinfo
itself and creating/destroying each netns.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>