BigQuery TableReference Initialization Error: Too Many Arguments Provided

I now tend to agree with my friend’s recent observation that the Google Cloud documentation is a wreck and doesn’t provide proper examples for some of the basic things in its SDK.

I needed to use the TableReference class. When you search for it, you land on the Class TableReference page, where it’s not clear how to use it.

Here is the example code I was using:

from google.cloud import bigquery

# Create a client object.
client = bigquery.Client()

# Get the project ID.
project_id = "my-project-id"

# Get the dataset ID.
dataset_id = "my-dataset"

# Get the table ID.
table_id = "my-table"

# Create a table reference (three positional arguments, which triggers the error below).
table_ref = bigquery.TableReference(
    project_id,
    dataset_id,
    table_id,
)

print(table_ref)

This is the error I was getting:

TypeError: TableReference.__init__() takes 3 positional arguments but 4 were given

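The fix is pretty simple: build the reference from a fully qualified table ID string. Note the f prefix, without which the placeholders are not filled in.

# Create a table reference using from_string
table_ref = bigquery.TableReference.from_string(f"{project_id}.{dataset_id}.{table_id}")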
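
For anyone wondering why the error shows up at all: the TableReference constructor takes a dataset reference and a table ID, not three separate strings. The sketch below uses small stand-in classes (hypothetical, written only for this post, not the real library) that copy the shape of that signature so it can run without a Google Cloud setup. With the real library, the equivalent call should be bigquery.TableReference(bigquery.DatasetReference(project_id, dataset_id), table_id).

```python
# Stand-in classes (hypothetical, NOT the real library). They only mimic the
# constructor shapes: DatasetReference(project, dataset_id) and
# TableReference(dataset_ref, table_id).
class DatasetReference:
    def __init__(self, project, dataset_id):
        self.project = project
        self.dataset_id = dataset_id


class TableReference:
    def __init__(self, dataset_ref, table_id):
        self.dataset_ref = dataset_ref
        self.table_id = table_id

    def __str__(self):
        ref = self.dataset_ref
        return f"{ref.project}.{ref.dataset_id}.{self.table_id}"


project_id, dataset_id, table_id = "my-project-id", "my-dataset", "my-table"

# Three positional values plus the implicit self make four arguments,
# while __init__ accepts only three, hence the TypeError.
try:
    TableReference(project_id, dataset_id, table_id)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Wrapping project and dataset in a DatasetReference matches the signature.
table_ref = TableReference(DatasetReference(project_id, dataset_id), table_id)

print(error_message)
print(table_ref)
```

Either route (from_string or an explicit dataset reference) should end up with the same table reference; from_string is just shorter when you already have the three IDs.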

Generate Nested JSON using BigQuery

I had an interesting use case at work: we needed to generate nested JSON from table definitions (i.e. from INFORMATION_SCHEMA.COLUMNS), which another system then used for further processing. Whenever the schema changed, say a column was added or removed or a data type changed, we updated the JSON file by hand. This process was error-prone, and as the number of tables to maintain grew, so did the risk of mistakes from manually editing the JSON every time a table changed.

Looking around for a simpler solution, I found that Google BigQuery has some ready-made operators that are up to the task, namely:

  1. ARRAY_AGG – An aggregate function that returns an ARRAY of the specified expression values, one element per row in the group.
  2. STRUCT – Constructs a container of ordered fields, each with a type and an optional name, somewhat like a named tuple in Python. Returns a STRUCT value, not an ARRAY.
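
To get a feel for how the two combine, here is a rough pure-Python analogue of grouping column metadata into one nested record per table. The rows are made-up sample data shaped like INFORMATION_SCHEMA.COLUMNS output, and nest_columns is a hypothetical helper written just for this illustration; the comment at the top shows the kind of BigQuery query it imitates, as a sketch rather than the exact query I ran.

```python
import json
from collections import defaultdict

# Sketch of the BigQuery query this imitates (project and dataset are placeholders):
#   SELECT table_name,
#          ARRAY_AGG(STRUCT(column_name, data_type) ORDER BY ordinal_position) AS columns
#   FROM `my-project-id.my_dataset.INFORMATION_SCHEMA.COLUMNS`
#   GROUP BY table_name

# Made-up sample rows, shaped like INFORMATION_SCHEMA.COLUMNS output.
rows = [
    {"table_name": "heroes", "column_name": "height", "data_type": "FLOAT64", "ordinal_position": 2},
    {"table_name": "heroes", "column_name": "name", "data_type": "STRING", "ordinal_position": 1},
    {"table_name": "publishers", "column_name": "publisher", "data_type": "STRING", "ordinal_position": 1},
]


def nest_columns(rows):
    """Group rows by table; each table gets an ordered list of column records."""
    grouped = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r["table_name"], r["ordinal_position"])):
        # The dict plays the role of STRUCT(column_name, data_type);
        # the list it is appended to plays the role of ARRAY_AGG.
        grouped[row["table_name"]].append(
            {"column_name": row["column_name"], "data_type": row["data_type"]}
        )
    return [{"table_name": name, "columns": cols} for name, cols in grouped.items()]


nested = nest_columns(rows)
print(json.dumps(nested, indent=2))
```

Each dict stands in for a STRUCT and each per-table list for the ARRAY that ARRAY_AGG builds, which is why the result serializes straight into nested JSON.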

Let me illustrate the use case with an example. Say we have the following dataset (data taken from here):

Marvel and DC Superhero character info