Reduce ResolvedTargetData lock contention by sharding targets map

In the parallel writing phase of 'gn gen', multiple threads access a
single 'std::shared_mutex' and 'UniqueVector' inside ResolvedTargetData
at high frequency. Although shared locks (read locks) are used, the
atomic reference count operations within the shared mutex cause massive
cache line bouncing (lock contention) on multi-core systems, severely
degrading parallel performance.

To resolve this, this CL shards the shared targets map and its mutex
into 128 independent shards. We also use std::hash to distribute targets
evenly across shards and avoid pointer alignment biases (which would
happen with simple pointer bit shifts).

We conducted hyperfine benchmarks on Chromium 'out/Default' across
different shard counts to find the optimal value, compared to the
original baseline:
- Base (No sharding, 1 lock): 2.716s ± 0.059s
- 8 shards: 2.723s ± 0.056s (Lock contention unresolved)
- 16 shards: 2.755s ± 0.096s
- 32 shards: 2.495s ± 0.011s (approx. 8.1% faster than Base)
- 64 shards: 2.531s ± 0.062s
- 128 shards: 2.443s ± 0.089s (approx. 10.0% faster than Base / -0.27s, Optimal spot)
- 256 shards: 2.507s ± 0.064s
- 512 shards: 2.523s ± 0.093s
- 1024 shards: 2.534s ± 0.073s

128 shards is chosen as it provides the best performance scaling for
high-core workstations while keeping minimal memory overhead.

Cumulative CPU time of 'file_write_ninja' phase: approx. 33% reduction.

Bug: 502431091
Change-Id: I366057a87e17e3438f028e1142928ba55e47dc69
Reviewed-on: https://gn-review.googlesource.com/c/gn/+/22801
Reviewed-by: Matt Stark <msta@google.com>
Commit-Queue: Takuto Ikuta <tikuta@google.com>
diff --git a/src/gn/resolved_target_data.cc b/src/gn/resolved_target_data.cc
index 628e6a4..db5a8ed 100644
--- a/src/gn/resolved_target_data.cc
+++ b/src/gn/resolved_target_data.cc
@@ -8,20 +8,22 @@
 
 ResolvedTargetData::TargetInfo* ResolvedTargetData::GetTargetInfo(
     const Target* target) const {
+  size_t shard_idx = GetShardIndex(target);
+  Shard& shard = shards_[shard_idx];
   {
-    std::shared_lock<std::shared_mutex> lock(map_mutex_);
-    size_t index = targets_.IndexOf(target);
+    std::shared_lock<std::shared_mutex> lock(shard.mutex);
+    size_t index = shard.targets.IndexOf(target);
     if (index != UniqueVector<const Target*>::kIndexNone) {
-      return infos_[index].get();
+      return shard.infos[index].get();
     }
   }
 
-  std::unique_lock<std::shared_mutex> lock(map_mutex_);
-  auto ret = targets_.PushBackWithIndex(target);
+  std::unique_lock<std::shared_mutex> lock(shard.mutex);
+  auto ret = shard.targets.PushBackWithIndex(target);
   if (ret.first) {
-    infos_.push_back(std::make_unique<TargetInfo>(target));
+    shard.infos.push_back(std::make_unique<TargetInfo>(target));
   }
-  return infos_[ret.second].get();
+  return shard.infos[ret.second].get();
 }
 
 void ResolvedTargetData::ComputeLibInfo(TargetInfo* info) const {
diff --git a/src/gn/resolved_target_data.h b/src/gn/resolved_target_data.h
index a184e3d..0046481 100644
--- a/src/gn/resolved_target_data.h
+++ b/src/gn/resolved_target_data.h
@@ -357,9 +357,25 @@
   // on demand (hence the mutable qualifier). Implemented with a
   // UniqueVector<> and a parallel vector of unique TargetInfo
   // instances for best performance.
-  mutable std::shared_mutex map_mutex_;
-  mutable UniqueVector<const Target*> targets_;
-  mutable std::vector<std::unique_ptr<TargetInfo>> infos_;
+  // We shard the TargetInfo map to reduce lock contention under the
+  // high-concurrency parallel writing phase of 'gn gen'. 128 shards is chosen
+  // as the sweet spot based on benchmarking, providing optimal scaling for
+  // high-core workstation counts (up to 128 threads) with negligible memory
+  // overhead from empty shards.
+  static constexpr size_t kNumShards = 128;
+  struct Shard {
+    mutable std::shared_mutex mutex;
+    UniqueVector<const Target*> targets;
+    std::vector<std::unique_ptr<TargetInfo>> infos;
+  };
+
+  // We use std::hash to distribute targets evenly across shards and avoid
+  // pointer alignment biases.
+  size_t GetShardIndex(const Target* target) const {
+    return std::hash<const Target*>()(target) % kNumShards;
+  }
+
+  mutable Shard shards_[kNumShards];
 };
 
 #endif  // TOOLS_GN_RESOLVED_TARGET_DATA_H_