Time flies. Looking back at the first one of these I wrote, almost 5 years ago, I feel it served its purpose. Just looking at my resume and what I had done in past roles and responsibilities doesn’t give the complete picture. It’s always best to have your accomplishments and the things you achieved jotted down in the time-tested STAR fashion.
The circumstances of this post definitely aren’t dire, but given there have been two shifts since my last post, I figured it’s best I put down a brain dump. Without further ado, here it goes.
Woolworths Group
It was complete happenstance through which I got the job at Woolies. In this day and age, I am not sure my profile would have been picked up; I will be writing a complete new post on that. This was my first taste of the world of cloud platforms. Up until then, the profile I had built predominantly featured Alteryx, SQL, Python and databases like Amazon Redshift and Postgres.
My job at Woolies was mostly in the world of Airflow and SQL, with a little sprinkling of Cloud Functions and Cloud Build thrown in. Some projects were challenging, some were gruelling to work on (there were bunches of SQL scripts, each containing 1000 lines of code, all getting executed via DAGs), and some were outright contentious (where you begin to wonder if you ever had a voice). With all that said, everything is an experience. Here it goes..
- Config File Generation
- Challenge – Tableau dashboards were getting refreshed via Tableau Data Sources (tds) which were generated via Airflow DAGs. The file generation process involved a config file for each data source to be published, in the form of a nested JSON file: essentially a bunch of metadata attributes (dashboard name, dashboard link), plus a subsection holding the column names and data types.
Any change to the schema, say adding or removing a column or changing a data type, meant manually editing the JSON file. This process was error prone, and as the number of tables to maintain increased, so did the risk of manually modifying the JSON every time something changed at the table level.
I once made the mistake of updating the wrong file with the column changes, which led to a late-night debugging call with at least 8 people online weeding out the root cause. It felt bad, and it was then I decided this needed to change.
- Action Taken – I set up a Google Sheet containing just the header attributes and the name of each table whose schema details were to be generated, and created an external table on top of it. I then built an Airflow DAG that iterates through each entry of the Google Sheet, generates the required JSON from the INFORMATION_SCHEMA.TABLES data plus the fixed attributes, and dumps the file in the desired destination (a rough sketch follows after this list).
- Result – The entire system became robust, fully automated and scalable. Whatever schema change comes along (be it removing a column, adding a new one, etc.) is simply handled automatically.
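To make the mechanics concrete, here is a minimal sketch of the idea, not the production DAG. It reads the sheet entries from the external table, pulls column names and data types from BigQuery’s INFORMATION_SCHEMA.COLUMNS view (the column-level view, which I’m using here since it carries the data types), builds the nested JSON and writes it to GCS. The project, dataset, bucket, sheet attributes and file paths are all placeholders.

```python
# Sketch only: assumes a BigQuery external table over the Google Sheet and
# placeholder project / bucket / column names.
import json
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from google.cloud import bigquery, storage

PROJECT = "my-gcp-project"                         # placeholder
CONFIG_SHEET_TABLE = "metadata.tds_config_sheet"   # external table over the sheet
OUTPUT_BUCKET = "tds-config-files"                 # placeholder destination


def generate_configs(**_):
    bq = bigquery.Client(project=PROJECT)
    bucket = storage.Client(project=PROJECT).bucket(OUTPUT_BUCKET)

    # Each sheet row carries the fixed metadata attributes plus the target table.
    rows = bq.query(
        f"SELECT dashboard_name, dashboard_link, dataset_name, table_name "
        f"FROM `{PROJECT}.{CONFIG_SHEET_TABLE}`"
    ).result()

    for row in rows:
        # Column names and data types come from INFORMATION_SCHEMA, so any
        # schema change on the table flows into the config automatically.
        cols = bq.query(
            f"SELECT column_name, data_type "
            f"FROM `{PROJECT}.{row.dataset_name}.INFORMATION_SCHEMA.COLUMNS` "
            f"WHERE table_name = @table",
            job_config=bigquery.QueryJobConfig(
                query_parameters=[
                    bigquery.ScalarQueryParameter("table", "STRING", row.table_name)
                ]
            ),
        ).result()

        config = {
            "dashboard_name": row.dashboard_name,
            "dashboard_link": row.dashboard_link,
            "columns": [{"name": c.column_name, "type": c.data_type} for c in cols],
        }

        # Dump the generated config to the destination bucket.
        bucket.blob(f"configs/{row.table_name}.json").upload_from_string(
            json.dumps(config, indent=2), content_type="application/json"
        )


with DAG(
    dag_id="tds_config_generation",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="generate_configs", python_callable=generate_configs)
```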