Merge tag 'cgroup-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - The locking around cpuset hotplug processing has always been a bit of
   a mess, which was worked around by making hotplug processing
   asynchronous. The asynchronicity isn't great and led to other issues.

   We tried to make the behavior synchronous a while ago, but that led to
   lockdep splats. Waiman took another stab at cleaning it up and making
   it synchronous. The patch has been in -next for well over a month and
   there haven't been any complaints, so fingers crossed.

 - Tracepoints added to help understand rstat lock contention.

 - A bunch of minor changes - doc updates, code cleanups and selftests.
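To illustrate what a lock-contention tracepoint can capture, here is a minimal user-space sketch, assuming POSIX threads rather than kernel spinlocks: the helper tries the lock first and, only if that fails, marks the acquisition as contended before blocking. The names stat_lock_acquire(), nr_locked, nr_contended, hammer() and run_demo() are hypothetical, and the counters merely stand in for tracepoints. The kernel's cgroup_rstat_lock helpers appear to follow a similar trylock-then-block pattern, but this is not their code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: a lock helper that counts contended acquisitions.
 * nr_locked and nr_contended stand in for tracepoints; the names are
 * illustrative, not kernel APIs.
 */
static pthread_mutex_t stat_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long nr_locked;    /* every acquisition */
static unsigned long nr_contended; /* acquisitions that had to block */

static void stat_lock_acquire(void)
{
	/* Fast path: pthread_mutex_trylock() returns 0 if the lock was free. */
	bool contended = pthread_mutex_trylock(&stat_lock) != 0;

	if (contended)
		pthread_mutex_lock(&stat_lock); /* slow path: block until free */

	/* Counters are updated while holding the lock, so no atomics needed. */
	nr_locked++;
	if (contended)
		nr_contended++;
}

static void stat_lock_release(void)
{
	pthread_mutex_unlock(&stat_lock);
}

static void *hammer(void *arg)
{
	int iters = *(const int *)arg;

	for (int i = 0; i < iters; i++) {
		stat_lock_acquire();
		stat_lock_release();
	}
	return NULL;
}

/* Run up to 16 threads that each take the lock iters times. */
static int run_demo(int nthreads, int iters)
{
	pthread_t tids[16];
	int created = 0, rc = 0;

	assert(nthreads > 0 && nthreads <= 16);
	for (; created < nthreads; created++) {
		if (pthread_create(&tids[created], NULL, hammer, &iters) != 0) {
			rc = -1;
			break;
		}
	}
	/* Join every thread that started, even after a creation failure. */
	for (int i = 0; i < created; i++)
		pthread_join(tids[i], NULL);
	return rc;
}
```

Calling run_demo(4, 10000) from main() should leave nr_locked at 40000; nr_contended depends on scheduling and will vary between runs.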

* tag 'cgroup-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (24 commits)
  cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints
  selftests/cgroup: Drop define _GNU_SOURCE
  docs: cgroup-v1: Update page cache removal functions
  selftests/cgroup: fix uninitialized variables in test_zswap.c
  selftests/cgroup: cpu_hogger init: use {} instead of {NULL}
  selftests/cgroup: fix clang warnings: uninitialized fd variable
  selftests/cgroup: fix clang build failures for abs() calls
  cgroup/cpuset: Remove outdated comment in sched_partition_write()
  cgroup/cpuset: Fix incorrect top_cpuset flags
  cgroup/cpuset: Avoid clearing CS_SCHED_LOAD_BALANCE twice
  cgroup/cpuset: Statically initialize more members of top_cpuset
  cgroup: Avoid unnecessary looping in cgroup_no_v1()
  cgroup, legacy_freezer: update comment for freezer_css_offline()
  docs, cgroup: add entries for pids to cgroup-v2.rst
  cgroup: don't call cgroup1_pidlist_destroy_all() for v2
  cgroup_freezer: update comment for freezer_css_online()
  cgroup/rstat: desc member cgrp in cgroup_rstat_flush_release
  cgroup/rstat: add cgroup_rstat_lock helpers and tracepoints
  cgroup/pids: Remove superfluous zeroing
  docs: cgroup-v1: Fix description for css_online
  ...
Linus Torvalds
2024-05-15 17:06:08 -07:00
27 changed files with 358 additions and 223 deletions


@@ -1208,52 +1208,6 @@ void __init cpuhp_threads_init(void)
 	kthread_unpark(this_cpu_read(cpuhp_state.thread));
 }
 
-/*
- *
- * Serialize hotplug trainwrecks outside of the cpu_hotplug_lock
- * protected region.
- *
- * The operation is still serialized against concurrent CPU hotplug via
- * cpu_add_remove_lock, i.e. CPU map protection. But it is _not_
- * serialized against other hotplug related activity like adding or
- * removing of state callbacks and state instances, which invoke either the
- * startup or the teardown callback of the affected state.
- *
- * This is required for subsystems which are unfixable vs. CPU hotplug and
- * evade lock inversion problems by scheduling work which has to be
- * completed _before_ cpu_up()/_cpu_down() returns.
- *
- * Don't even think about adding anything to this for any new code or even
- * drivers. It's only purpose is to keep existing lock order trainwrecks
- * working.
- *
- * For cpu_down() there might be valid reasons to finish cleanups which are
- * not required to be done under cpu_hotplug_lock, but that's a different
- * story and would be not invoked via this.
- */
-static void cpu_up_down_serialize_trainwrecks(bool tasks_frozen)
-{
-	/*
-	 * cpusets delegate hotplug operations to a worker to "solve" the
-	 * lock order problems. Wait for the worker, but only if tasks are
-	 * _not_ frozen (suspend, hibernate) as that would wait forever.
-	 *
-	 * The wait is required because otherwise the hotplug operation
-	 * returns with inconsistent state, which could even be observed in
-	 * user space when a new CPU is brought up. The CPU plug uevent
-	 * would be delivered and user space reacting on it would fail to
-	 * move tasks to the newly plugged CPU up to the point where the
-	 * work has finished because up to that point the newly plugged CPU
-	 * is not assignable in cpusets/cgroups. On unplug that's not
-	 * necessarily a visible issue, but it is still inconsistent state,
-	 * which is the real problem which needs to be "fixed". This can't
-	 * prevent the transient state between scheduling the work and
-	 * returning from waiting for it.
-	 */
-	if (!tasks_frozen)
-		cpuset_wait_for_hotplug();
-}
-
 #ifdef CONFIG_HOTPLUG_CPU
 #ifndef arch_clear_mm_cpumask_cpu
 #define arch_clear_mm_cpumask_cpu(cpu, mm) cpumask_clear_cpu(cpu, mm_cpumask(mm))
@@ -1494,7 +1448,6 @@ out:
 	 */
 	lockup_detector_cleanup();
 	arch_smt_update();
-	cpu_up_down_serialize_trainwrecks(tasks_frozen);
 	return ret;
 }
 
@@ -1728,7 +1681,6 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
 out:
 	cpus_write_unlock();
 	arch_smt_update();
-	cpu_up_down_serialize_trainwrecks(tasks_frozen);
 	return ret;
 }
 
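The removed comment describes the old scheme: cpusets finished hotplug processing in a worker, and the hotplug path waited for that worker unless tasks were frozen, because a frozen worker would never complete. The sketch below is a hypothetical user-space model of that reasoning, assuming POSIX threads; online_mask, effective_mask, cpu_up_async(), cpu_up_sync() and the other helpers are illustrative names, not kernel APIs. It shows the inconsistent window that opens when the wait is skipped, and how doing the update inline (the synchronous direction this pull takes) closes it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code: the "hotplug core" sets a bit in
 * online_mask, and the cpuset side mirrors it into effective_mask.
 */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static unsigned long long online_mask;    /* CPUs marked online */
static unsigned long long effective_mask; /* CPUs assignable via cpuset */
static bool work_pending, worker_frozen, worker_stop;
static pthread_t worker;

/* Worker: applies pending updates, but does nothing while frozen. */
static void *cpuset_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	while (!worker_stop) {
		if (work_pending && !worker_frozen) {
			effective_mask = online_mask;
			work_pending = false;
			pthread_cond_broadcast(&cond); /* wake hotplug waiters */
		} else {
			pthread_cond_wait(&cond, &lock);
		}
	}
	pthread_mutex_unlock(&lock);
	return NULL;
}

static int start_worker(void)
{
	return pthread_create(&worker, NULL, cpuset_worker, NULL);
}

static void stop_worker(void)
{
	pthread_mutex_lock(&lock);
	worker_stop = true;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
	pthread_join(worker, NULL);
}

static void set_worker_frozen(bool frozen)
{
	pthread_mutex_lock(&lock);
	worker_frozen = frozen;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
}

/* Old scheme: queue the cpuset update, then wait unless tasks are frozen. */
static void cpu_up_async(int cpu, bool tasks_frozen)
{
	assert(cpu >= 0 && cpu < 64);
	pthread_mutex_lock(&lock);
	online_mask |= 1ULL << cpu;
	work_pending = true;
	pthread_cond_broadcast(&cond);
	/* Stand-in for cpuset_wait_for_hotplug(): skipped when frozen,
	 * because a frozen worker would never finish. */
	while (!tasks_frozen && work_pending)
		pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);
}

/* New scheme: the cpuset update happens before "hotplug" returns. */
static void cpu_up_sync(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	pthread_mutex_lock(&lock);
	online_mask |= 1ULL << cpu;
	effective_mask = online_mask;
	pthread_mutex_unlock(&lock);
}

/* Block until the worker has drained any pending update. */
static void wait_for_worker(void)
{
	pthread_mutex_lock(&lock);
	while (work_pending)
		pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);
}

/* Snapshot both masks consistently. */
static void read_masks(unsigned long long *online, unsigned long long *eff)
{
	pthread_mutex_lock(&lock);
	*online = online_mask;
	*eff = effective_mask;
	pthread_mutex_unlock(&lock);
}
```

For example, after start_worker() and set_worker_frozen(true), calling cpu_up_async(2, true) leaves bit 2 set in online_mask but not in effective_mask until the worker is thawed, which mirrors the "inconsistent state" the removed comment warns about. cpu_up_sync() never exposes that window because nothing is deferred.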