I had an interesting use case as part of my work wherein we needed to generate Nested JSON out of table definitions (i.e. from the INFORMATION_SCHEMA.COLUMNS) that was then used by other system for further processing. Any changes being done on the schema, say adding or removal of column or data type change, we were doing this manually to the JSON file. This process was error prone. As the number of tables to maintain increased, the vulnerability of manually modifying the JSON as and when changes being done at the table level happened.
Looking for simpler solution when I looked around google BigQuery had some ready made operators up to the task namely –
- ARRAY_AGG – This returns an ARRAY of expression values specified and can also work with aggregation.
- STRUCT – Constructs a container of ordered fields i.e. like a list in python. Returns an ARRAY object.
Let me illustrate the use case with an example. Say we have following dataset (data taken from here)-