Update and add metadata documentation

Change-Id: Ic3cadb3ad5523301245ad8565a1bc9a12b79b02d
Reviewed-on: https://gn-review.googlesource.com/c/gn/+/5640
Reviewed-by: Brett Wilson <brettw@chromium.org>
Commit-Queue: Julie Hockett <juliehockett@google.com>
diff --git a/docs/reference.md b/docs/reference.md
index 3a98061..1601d01 100644
--- a/docs/reference.md
+++ b/docs/reference.md
@@ -154,6 +154,7 @@
     *   [input_conversion: Processing input from exec_script and read_file.](#io_conversion)
     *   [label_pattern: Matching more than one label.](#label_pattern)
     *   [labels: About labels.](#labels)
+    *   [metadata_collection: About metadata and its collection.](#metadata_collection)
     *   [ninja_rules: How Ninja build rules are named.](#ninja_rules)
     *   [nogncheck: Annotating includes for checking.](#nogncheck)
     *   [output_conversion: Specifies how to transform a value to output.](#io_conversion)
@@ -6079,13 +6080,13 @@
 ### <a name="var_walk_keys"></a>**walk_keys**: Key(s) for managing the metadata collection walk.
 
 ```
-  Defaults to [].
+  Defaults to [""].
 
   These keys are used to control the next step in a collection walk, acting as
   barriers. If a specified key is defined in a target's metadata, the walk will
   use the targets listed in that value to determine which targets are walked.
 
-  If no walk_keys are specified for a generated_file target (i.e. "[]"), the
+  If no walk_keys are specified for a generated_file target (i.e. "[""]"), the
   walk will touch all deps and data_deps of the specified target recursively.
 
   See "gn help generated_file".
@@ -6822,6 +6823,147 @@
     //net  ->  //net:net
     //tools/gn  ->  //tools/gn:gn
 ```
+### <a name="metadata_collection"></a>**Metadata Collection**
+
+```
+  Metadata is information attached to targets throughout the dependency tree. GN
+  allows for the collection of this data into files written during the generation
+  step, enabling users to expose and aggregate this data based on the dependency
+  tree.
+```
+
+#### **generated_file targets**
+
+```
+  Similar to the write_file() function, the generated_file target type
+  creates a file in the specified location with the specified content. The
+  primary difference between the function and the target type is that the
+  write_file function does the file write at parse time, while the
+  generated_file target type writes at target resolution time. See
+  "gn help generated_file" for more detail.
+
+  When written at target resolution time, the generated_file enables GN to
+  collect and write aggregated metadata from dependencies.
+
+  A generated_file target can declare either 'contents' (to write statically
+  known contents to a file) or 'data_keys' (to aggregate metadata and write the
+  result to a file). It can also specify 'walk_keys' (to restrict the metadata
+  collection), 'output_conversion', and 'rebase'.
+```
+
+#### **Collection and Aggregation**
+
+```
+  Targets can declare a 'metadata' variable containing a scope, and this
+  metadata is collected and written to file by generated_file aggregation
+  targets. The 'metadata' scope must contain only list values, since the
+  aggregation step collects a list of these values.
+
+  During the target resolution, generated_file targets will walk their
+  dependencies recursively, collecting metadata based on the specified
+  'data_keys'. 'data_keys' is specified as a list of strings, used by the walk
+  to identify which variables in dependencies' 'metadata' scopes to collect.
+
+  The walk begins with the listed dependencies of the 'generated_file' target,
+  checking each one's "metadata" scope for any of the "data_keys". If present,
+  the data in those variables is appended to the aggregate list. Note that this
+  means that if more than one data key is specified, the data in all of them
+  will be aggregated into one list. From there, the walk will then recurse into
+  the dependencies of each target it encounters, collecting the specified
+  metadata for each.
+
+  For example:
+
+    group("a") {
+      metadata = {
+        doom_melon = [ "enable" ]
+        my_files = [ "foo.cpp" ]
+        my_extra_files = [ "bar.cpp" ]
+      }
+
+      deps = [ ":b" ]
+    }
+
+    group("b") {
+      metadata = {
+        my_files = [ "baz.cpp" ]
+      }
+    }
+
+    generated_file("metadata") {
+      outputs = [ "$root_build_dir/my_files.json" ]
+      data_keys = [ "my_files", "my_extra_files" ]
+
+      deps = [ ":a" ]
+    }
+
+  The above will produce the following file data:
+
+    foo.cpp
+    bar.cpp
+    baz.cpp
+
+  The dependency walk can be limited by using "walk_keys". This is a list of
+  metadata keys whose values list the labels to include in the walk. All labels
+  listed there should also be in one of the deps lists. These keys act as
+  barriers, where the walk will only recurse into the targets listed. An empty
+  list in all specified barriers will end that portion of the walk.
+
+    group("a") {
+      metadata = {
+        my_files = [ "foo.cpp" ]
+        my_files_barrier = [ ":b" ]
+      }
+
+      deps = [ ":b", ":c" ]
+    }
+
+    group("b") {
+      metadata = {
+        my_files = [ "bar.cpp" ]
+      }
+    }
+
+    group("c") {
+      metadata = {
+        my_files = [ "doom_melon.cpp" ]
+      }
+    }
+
+    generated_file("metadata") {
+      outputs = [ "$root_build_dir/my_files.json" ]
+      data_keys = [ "my_files", "my_extra_files" ]
+      walk_keys = [ "my_files_barrier" ]
+      deps = [ ":a" ]
+    }
+
+  The above will produce the following file data (note that `doom_melon.cpp` is
+  not included):
+
+    foo.cpp
+    bar.cpp
+
+  A common example of this sort of barrier is in builds that have host tools
+  built as part of the tree, but do not want the metadata from those host tools
+  to be collected with the target-side code.
+```
+
+#### **Common Uses**
+
+```
+  Metadata can be used to collect information about the different targets in the
+  build, and so a common use is to provide post-build tooling with a set of data
+  necessary to do aggregation tasks. For example, if each test target specifies
+  the output location of its binary to run in a metadata field, that can be
+  collected into a single file listing the locations of all tests in the
+  dependency tree. A local build tool (or continuous integration infrastructure)
+  can then use that file to know which tests exist, and where, and run them
+  accordingly.
+
+  Another use is in image creation, where a post-build image tool needs to know
+  various pieces of information about the components it should include in order
+  to put together the correct image.
+```
 ### <a name="ninja_rules"></a>**Ninja build rules**
 
 #### **The "all" and "default" rules**
diff --git a/tools/gn/command_help.cc b/tools/gn/command_help.cc
index 78c9161..7145da4 100644
--- a/tools/gn/command_help.cc
+++ b/tools/gn/command_help.cc
@@ -13,6 +13,7 @@
 #include "tools/gn/input_conversion.h"
 #include "tools/gn/label.h"
 #include "tools/gn/label_pattern.h"
+#include "tools/gn/metadata.h"
 #include "tools/gn/ninja_build_writer.h"
 #include "tools/gn/output_conversion.h"
 #include "tools/gn/parser.h"
@@ -83,6 +84,8 @@
   PrintShortHelp("label_pattern: Matching more than one label.",
                  "label_pattern");
   PrintShortHelp("labels: About labels.", "labels");
+  PrintShortHelp("metadata_collection: About metadata and its collection.",
+                 "metadata_collection");
   PrintShortHelp("ninja_rules: How Ninja build rules are named.",
                  "ninja_rules");
   PrintShortHelp("nogncheck: Annotating includes for checking.", "nogncheck");
@@ -93,7 +96,8 @@
                  "runtime_deps");
   PrintShortHelp("source_expansion: Map sources to outputs for scripts.",
                  "source_expansion");
-  PrintShortHelp("switches: Show available command-line switches.", "switch_list");
+  PrintShortHelp("switches: Show available command-line switches.",
+                 "switch_list");
 }
 
 void PrintSwitchHelp() {
@@ -108,7 +112,8 @@
   Do "gn help --the_switch_you_want_help_on" for more. Individual commands may
   take command-specific switches not listed here. See the help on your specific
   command for more.
-)", "switch_list");
+)",
+                "switch_list");
 
   if (is_markdown)
     OutputString("```\n", DECORATION_NONE);
@@ -195,6 +200,7 @@
   PrintLongHelp(kInputOutputConversion_Help, "io_conversion");
   PrintLongHelp(kLabelPattern_Help, "label_pattern");
   PrintLongHelp(kLabels_Help, "labels");
+  PrintLongHelp(kMetadata_Help, "metadata_collection");
   PrintLongHelp(kNinjaRules_Help, "ninja_rules");
   PrintLongHelp(kNoGnCheck_Help, "nogncheck");
   PrintLongHelp(kRuntimeDeps_Help, "runtime_deps");
@@ -335,6 +341,9 @@
   };
   random_topics["label_pattern"] = []() { PrintLongHelp(kLabelPattern_Help); };
   random_topics["labels"] = []() { PrintLongHelp(kLabels_Help); };
+  random_topics["metadata_collection"] = []() {
+    PrintLongHelp(kMetadata_Help);
+  };
   random_topics["ninja_rules"] = []() { PrintLongHelp(kNinjaRules_Help); };
   random_topics["nogncheck"] = []() { PrintLongHelp(kNoGnCheck_Help); };
   random_topics["runtime_deps"] = []() { PrintLongHelp(kRuntimeDeps_Help); };
diff --git a/tools/gn/metadata.cc b/tools/gn/metadata.cc
index 5091645..bc4c272 100644
--- a/tools/gn/metadata.cc
+++ b/tools/gn/metadata.cc
@@ -6,6 +6,143 @@
 
 #include "tools/gn/filesystem_utils.h"
 
+const char kMetadata_Help[] =
+    R"(Metadata Collection
+
+  Metadata is information attached to targets throughout the dependency tree. GN
+  allows for the collection of this data into files written during the generation
+  step, enabling users to expose and aggregate this data based on the dependency
+  tree.
+
+generated_file targets
+
+  Similar to the write_file() function, the generated_file target type
+  creates a file in the specified location with the specified content. The
+  primary difference between the function and the target type is that the
+  write_file function does the file write at parse time, while the
+  generated_file target type writes at target resolution time. See
+  "gn help generated_file" for more detail.
+
+  When written at target resolution time, the generated_file enables GN to
+  collect and write aggregated metadata from dependencies.
+
+  A generated_file target can declare either 'contents' (to write statically
+  known contents to a file) or 'data_keys' (to aggregate metadata and write the
+  result to a file). It can also specify 'walk_keys' (to restrict the metadata
+  collection), 'output_conversion', and 'rebase'.
+
+
+Collection and Aggregation
+
+  Targets can declare a 'metadata' variable containing a scope, and this
+  metadata is collected and written to file by generated_file aggregation
+  targets. The 'metadata' scope must contain only list values, since the
+  aggregation step collects a list of these values.
+
+  During the target resolution, generated_file targets will walk their
+  dependencies recursively, collecting metadata based on the specified
+  'data_keys'. 'data_keys' is specified as a list of strings, used by the walk
+  to identify which variables in dependencies' 'metadata' scopes to collect.
+
+  The walk begins with the listed dependencies of the 'generated_file' target,
+  checking each one's "metadata" scope for any of the "data_keys". If present,
+  the data in those variables is appended to the aggregate list. Note that this
+  means that if more than one data key is specified, the data in all of them
+  will be aggregated into one list. From there, the walk will then recurse into
+  the dependencies of each target it encounters, collecting the specified
+  metadata for each.
+
+  For example:
+
+    group("a") {
+      metadata = {
+        doom_melon = [ "enable" ]
+        my_files = [ "foo.cpp" ]
+        my_extra_files = [ "bar.cpp" ]
+      }
+
+      deps = [ ":b" ]
+    }
+
+    group("b") {
+      metadata = {
+        my_files = [ "baz.cpp" ]
+      }
+    }
+
+    generated_file("metadata") {
+      outputs = [ "$root_build_dir/my_files.json" ]
+      data_keys = [ "my_files", "my_extra_files" ]
+
+      deps = [ ":a" ]
+    }
+
+  The above will produce the following file data:
+
+    foo.cpp
+    bar.cpp
+    baz.cpp
+
+  The dependency walk can be limited by using "walk_keys". This is a list of
+  metadata keys whose values list the labels to include in the walk. All labels
+  listed there should also be in one of the deps lists. These keys act as
+  barriers, where the walk will only recurse into the targets listed. An empty
+  list in all specified barriers will end that portion of the walk.
+
+    group("a") {
+      metadata = {
+        my_files = [ "foo.cpp" ]
+        my_files_barrier = [ ":b" ]
+      }
+
+      deps = [ ":b", ":c" ]
+    }
+
+    group("b") {
+      metadata = {
+        my_files = [ "bar.cpp" ]
+      }
+    }
+
+    group("c") {
+      metadata = {
+        my_files = [ "doom_melon.cpp" ]
+      }
+    }
+
+    generated_file("metadata") {
+      outputs = [ "$root_build_dir/my_files.json" ]
+      data_keys = [ "my_files", "my_extra_files" ]
+      walk_keys = [ "my_files_barrier" ]
+      deps = [ ":a" ]
+    }
+
+  The above will produce the following file data (note that `doom_melon.cpp` is
+  not included):
+
+    foo.cpp
+    bar.cpp
+
+  A common example of this sort of barrier is in builds that have host tools
+  built as part of the tree, but do not want the metadata from those host tools
+  to be collected with the target-side code.
+
+Common Uses
+
+  Metadata can be used to collect information about the different targets in the
+  build, and so a common use is to provide post-build tooling with a set of data
+  necessary to do aggregation tasks. For example, if each test target specifies
+  the output location of its binary to run in a metadata field, that can be
+  collected into a single file listing the locations of all tests in the
+  dependency tree. A local build tool (or continuous integration infrastructure)
+  can then use that file to know which tests exist, and where, and run them
+  accordingly.
+
+  Another use is in image creation, where a post-build image tool needs to know
+  various pieces of information about the components it should include in order
+  to put together the correct image.
+)";
+
 bool Metadata::WalkStep(const BuildSettings* settings,
                         const std::vector<std::string>& keys_to_extract,
                         const std::vector<std::string>& keys_to_walk,
diff --git a/tools/gn/metadata.h b/tools/gn/metadata.h
index 06bd495..7b6168a 100644
--- a/tools/gn/metadata.h
+++ b/tools/gn/metadata.h
@@ -11,6 +11,8 @@
 #include "tools/gn/scope.h"
 #include "tools/gn/source_dir.h"
 
+extern const char kMetadata_Help[];
+
 // Metadata about a particular target.
 //
 // Metadata is a collection of keys and values relating to a particular target.
diff --git a/tools/gn/variables.cc b/tools/gn/variables.cc
index b35443d..74781aa 100644
--- a/tools/gn/variables.cc
+++ b/tools/gn/variables.cc
@@ -2042,13 +2042,13 @@
 const char kWalkKeys_Help[] =
     R"(walk_keys: Key(s) for managing the metadata collection walk.
 
-  Defaults to [].
+  Defaults to [""].
 
   These keys are used to control the next step in a collection walk, acting as
   barriers. If a specified key is defined in a target's metadata, the walk will
   use the targets listed in that value to determine which targets are walked.
 
-  If no walk_keys are specified for a generated_file target (i.e. "[]"), the
+  If no walk_keys are specified for a generated_file target (i.e. "[""]"), the
   walk will touch all deps and data_deps of the specified target recursively.
 
   See "gn help generated_file".