AWS – Developer Associate Certification learnings

It’s been almost 2 weeks since I passed the certification exam and I wanted to pen down the high level details of all the components that I have studied to pass the AWS Developer Associate exam.

First off, I would like to thank Stephane Maarek and his wonderful Udemy course – Ultimate AWS Certified Developer Associate, without which I am not sure I could have even inched past the preliminary set pieces.

Like every software developer worth their salt, my fascination with cloud technologies began a few years back. I started learning with the help of Pluralsight courses. The course, as usual, was of excellent calibre, but one teeny-tiny detail that wasn’t mentioned was the need to monitor the bill. I was of the opinion that 750 hours of free tier would last a lifetime.

I drifted off the course for a while and forgot to turn off the EC2 instances and voila! One fine day, in my mailbox I saw a bill of AUD $160. I immediately contacted the support centre and had the account suspended.

It really scared me off for a while and I put off learning about it for quite a bit of time.

Motivation – I
On and off after that experience, I just dabbled with S3 storage and static websites, trying to programmatically load some images using the Amazon SDK. As part of my Udacity Nanodegree experience, I worked on small ETL batch jobs using Python modules, first loading the data on to S3 and then on to Redshift as the final destination. The whole program left me with a bad taste, though, with one of the worst support systems and sub-par course quality, even if I managed to create some portfolio projects.

As I started off my job search, I realised it’s hard to convince people that I am well acquainted with AWS technologies and know how to work with them. Though I don’t work on AWS directly in my current role, I am quite aware that the Cloudera offering we have is deployed across a multitude of EC2 clusters and that we are not using the out-of-the-box EMR provided by Amazon.

Additionally, I have quite often been asked whether I at least have a certification.

Motivation – II
When I started searching through the certification offerings from AWS, I realised the one I really want to take is AWS Certified Data Analytics – Specialty, as I aim to become a Big Data Engineer/Developer. That certification explores the whole gamut of technologies that one can utilise for Big Data analytics –

Collection (Kinesis, Database Migration Services (DMS))
Storage (S3, DynamoDB)
Processing (Glue, Lambda, Hive, Spark, Hue, HBase)
Analysis (Redshift, Athena)
Visualisation (QuickSight)
Security (STS, KMS)

The ones highlighted are those that I have worked with or am working with. AWS mandates that I need to have an Associate certificate before I can attempt a Specialty certificate. I chose ‘Certified Developer – Associate’ out of the three options. Asking around among friends and colleagues, I could see that the Udemy course was a strong first choice, followed by an ACloudGuru subscription. I took the former. It was an intense 4 weeks of preparation that ultimately bore results. So, without further ado, here is a recap of the suite of products that I have learnt –

# | Product Name | Description
1. IAM (Identity and Access Management)
Access management forms the heart and soul of the AWS ecosystem. IAM is a global service, and all permissions are governed by policies written in JSON. Access is managed through three kinds of entities: Users, Groups and Roles.
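
To make the "policies written in JSON" point concrete, here is a minimal sketch of a read-only S3 policy built in Python and serialised to JSON. The bucket name, the `Sid` labels and the helper function are hypothetical and only there for illustration; the overall shape (`Version`, `Statement`, `Effect`, `Action`, `Resource`) follows the standard IAM policy document layout.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build a policy document allowing read-only access to a single bucket.

    The bucket name is a made-up example, not a real resource.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


policy_json = json.dumps(read_only_bucket_policy("my-example-bucket"), indent=2)
print(policy_json)
```

A policy like this would then be attached to a User, Group or Role; nothing is granted until it is attached.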
2. EC2 – Elastic Compute Cloud
EC2 provides virtual servers in the cloud. AWS gives you a whole gamut of instance types that differ along 5 distinct characteristics – RAM, CPU, I/O, Network, GPU.
Additionally, you can choose between different purchasing options –
On-Demand Instances – short workloads
Reserved Instances – minimum 1 year
Spot Instances – short workloads, less reliable, as AWS can reclaim the instance
Dedicated Instances – exclusive access to the hardware, which is not shared with anyone
Dedicated Hosts – booking of an entire physical server, with control over instance placement etc.
3. ELB – Elastic Load Balancer
Load balancers are servers that forward internet traffic to multiple EC2 servers and essentially spread the load across the downstream instances. Three types of load balancer are available –
Classic Load Balancer
Application Load Balancer (v2)
Network Load Balancer (v2)
4. ASG – Auto Scaling Group
The purpose of an ASG is to scale out (add EC2 instances) to match increased load or scale in (remove instances) to match decreased load. It goes hand in hand with ELBs. Scaling can be triggered on CPU, network or even custom metrics. Various types of scaling are available – step scaling, scheduled scaling etc.
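
The idea behind step scaling can be sketched in a few lines of plain Python. This is a simplified model under stated assumptions, not how AWS implements it: the CPU thresholds, the step sizes and the function names are all hypothetical.

```python
from typing import List, Optional, Tuple

# (lower bound, upper bound or None for "no limit", change in instance count)
Step = Tuple[float, Optional[float], int]

# Hypothetical scale-out steps keyed on average CPU utilisation (%).
SCALE_OUT_STEPS: List[Step] = [
    (70.0, 85.0, 1),  # moderately busy: add one instance
    (85.0, None, 3),  # very busy: add three instances
]


def step_adjustment(cpu_percent: float, steps: List[Step]) -> int:
    """Return the capacity change of the first step whose range contains the metric."""
    for lower, upper, change in steps:
        if cpu_percent >= lower and (upper is None or cpu_percent < upper):
            return change
    return 0  # metric is below every step: no change


def desired_capacity(current: int, cpu_percent: float,
                     min_size: int, max_size: int) -> int:
    """Apply the step adjustment, then clamp to the group's min/max size."""
    target = current + step_adjustment(cpu_percent, SCALE_OUT_STEPS)
    return max(min_size, min(max_size, target))


print(desired_capacity(current=2, cpu_percent=75.0, min_size=1, max_size=6))
```

The clamp at the end mirrors the fact that an ASG never goes below its minimum or above its maximum size, whatever the scaling policy asks for.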
5. EBS – Elastic Block Store; Instance Store; EFS – Elastic File System
EBS is a network drive you can attach to EC2 instances while they run, and it retains data even if the instance crashes. Volumes are locked to an AZ. Depending on need, various volume types are available (from small to large, from high latency to low latency etc). An EBS volume can be attached to only one EC2 instance at a time.

Instance Store, unlike EBS, is like a USB drive attached to EC2: the storage is available directly from the machine. On the flip side, you will lose all the data if the instance crashes.

Elastic File System is a highly scalable but expensive storage that is available across multiple AZs. EFS can be attached to multiple EC2 instances.
6. RDS – Relational Database Service; Aurora; ElastiCache
RDS is a managed database service from the AWS stable that provides automated provisioning, continuous backup, read replicas, scaling (both vertical and horizontal), OS patching and so on.

Aurora is a database engine from AWS (with a serverless option) that is akin to RDS on steroids, i.e. AWS claims up to 5 times the performance of MySQL on RDS.

ElastiCache is to in-memory caches what RDS is to relational databases: a managed in-memory store that sits in front of the DB. It gives you the ability to cache requests and reduce the hits going to the DB. Remember, on the cloud every read/write counts towards the cost. Two types –
Redis – backup and restore features
Memcached – non-persistent
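
The "cache requests and reduce the hits going to the DB" idea is usually applied as a cache-aside (lazy loading) pattern. Below is a minimal sketch using plain dictionaries as stand-ins for both the cache and the database; the class name, the fake data and the hit counter are hypothetical, and a real setup would talk to Redis or Memcached through a client library instead.

```python
from typing import Callable, Dict


class CacheAside:
    """Look in the cache first; fall back to the database only on a miss."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._load_from_db = load_from_db
        self._cache: Dict[str, str] = {}  # stand-in for an in-memory cache
        self.db_hits = 0                  # counts how often the DB was queried

    def get(self, key: str) -> str:
        if key in self._cache:
            return self._cache[key]       # cache hit: no DB read
        self.db_hits += 1                 # cache miss: read from the DB ...
        value = self._load_from_db(key)
        self._cache[key] = value          # ... and remember the result
        return value


# Stand-in for a relational table keyed by user id.
FAKE_DB = {"user-1": "Alice", "user-2": "Bob"}

store = CacheAside(load_from_db=FAKE_DB.__getitem__)
first = store.get("user-1")   # miss, goes to the DB
second = store.get("user-1")  # hit, served from the cache
print(first, second, store.db_hits)
```

The sketch leaves out expiry and invalidation, which are the hard parts in practice: a cached value goes stale as soon as the underlying row changes.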
7. Route 53
A DNS service akin to traffic police redirecting road traffic. A record can redirect to another domain name (CNAME) or to another Amazon resource (Alias). Various types of routing are available –
Multi Value Routing
Geolocation Routing
Failover Routing
Weighted Routing
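
Of the routing types listed, weighted routing is the easiest to picture in code: each record gets a share of the traffic proportional to its weight. The sketch below only simulates that proportional selection in Python; the record names and weights are hypothetical, and it is not Route 53's actual implementation.

```python
import random
from collections import Counter
from typing import Dict


def pick_weighted(records: Dict[str, int], rng: random.Random) -> str:
    """Pick one record name with probability weight / sum(weights)."""
    names = list(records)
    weights = [records[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical records: send roughly 70% of traffic to "blue", 30% to "green".
RECORDS = {"blue": 70, "green": 30}

rng = random.Random(42)  # fixed seed so the simulation is repeatable
counts = Counter(pick_weighted(RECORDS, rng) for _ in range(10_000))
blue_share = counts["blue"] / 10_000
print(counts)
```

Over many simulated lookups the share of "blue" answers settles close to 0.7, which is the behaviour that makes weighted routing handy for gradually shifting traffic to a new version.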
8. VPC – Virtual Private Cloud
VPC isn’t covered extensively in the Developer Associate exam, so high-level knowledge should suffice. It’s a private network in which to deploy resources, and within it public and private subnets can be set up.

NAT Gateways and Internet Gateways are used to communicate with the internet.
9. Amazon S3 – Simple Storage Service; Athena
S3 is one of the major building blocks of AWS: an effectively infinite storage layer for a wide variety of data. Data is stored in buckets (directories). Versioning can be enabled.
One of the most interesting things I found is the range of storage classes, from General Purpose to Glacier Deep Archive (minimum 180 days of storage).

Athena is a serverless service to perform analytics directly against files in S3.
10. CloudFront
Content Delivery Network that improves read performance, provides DDoS protection etc. It serves content from a global network of edge locations; great for static content that must be available everywhere.
11. ECS – Elastic Container Service; Fargate
Container management service for running Docker containers. ECS clusters are logical groupings of EC2 instances.

Fargate provides serverless management of containers, offering high scalability without manual intervention.
12. Elastic Beanstalk
Developer-centric view of deploying applications on AWS. It has three main components – Application, Application Version and Environment (dev, test, prod etc.).
Provides highly flexible deployment modes – All at Once; Rolling; Rolling with Additional Batches; Immutable.
The CLI can be used to manage everything via code.
13. AWS CICD
DevOps on AWS can be done using these components, which together provide CI/CD –
CodeCommit
CodePipeline
CodeBuild
CodeDeploy
14. CloudFormation
Infrastructure as Code. I absolutely LOVE this feature. It’s just mind-blowing in every sense. It’s a declarative way of outlining AWS infrastructure.
Create a template of the infrastructure that you desire. It’s then just a matter of creating and removing infrastructure at the click of a button.
I will be focusing more on this from now on to enrich my learning.
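
To give a feel for what "declarative" means here, below is a minimal sketch of a template describing a single versioned S3 bucket, built as a Python dictionary and dumped to JSON (CloudFormation accepts both JSON and YAML templates). The logical name `NotesBucket` and the description text are hypothetical; the template only states what should exist, not the steps to create it.

```python
import json

# A minimal template: one S3 bucket with versioning switched on.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Example stack with a single versioned S3 bucket",
    "Resources": {
        # "NotesBucket" is a made-up logical name used only inside the template.
        "NotesBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "VersioningConfiguration": {"Status": "Enabled"},
            },
        },
    },
    "Outputs": {
        # Ref on a bucket resource resolves to the generated bucket name.
        "BucketName": {"Value": {"Ref": "NotesBucket"}},
    },
}

template_body = json.dumps(template, indent=2)
print(template_body)
```

Creating a stack from this template provisions the bucket, and deleting the stack removes it again, which is the create-and-remove convenience described above.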
Monitoring – CloudWatch, X-Ray, CloudTrail
Applications send their logs to CloudWatch. Alarms can be set to notify you of unexpected metrics.
The X-Ray service provides automated trace analysis and a central service map visualisation, enabling request tracking across distributed systems.
CloudTrail audits API calls made by users, services and the AWS console. It is useful for detecting unauthorized calls or finding the root cause of changes.
20. AWS Integration & Messaging – SQS, SNS, Kinesis
SQS is a highly scalable queue service: consumers poll for messages, and a message is deleted once it has been read and processed.
SNS pushes messages to subscribers (up to 10M subscribers) and integrates easily with SQS for the fan-out pattern.
Kinesis is used for streaming data, where the data is distributed across multiple shards. Records are immutable once written, which makes it possible to run multiple analyses over the same stream.
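
How data ends up "distributed across multiple shards" can be sketched as follows. Kinesis Data Streams takes the MD5 hash of a record's partition key as a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch assumes the shards split the hash space evenly, which holds for a freshly created stream but not necessarily after resharding; the function name and the sample keys are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # The last shard absorbs any remainder left by the integer division.
    return min(hash_key // range_size, shard_count - 1)


for key in ("user-1", "user-2", "user-3"):
    print(key, "->", shard_for_key(key, shard_count=4))
```

Because the mapping depends only on the partition key, all records with the same key land on the same shard, which is what preserves ordering per key.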

Alteryx – Get SheetNames from Excel file

Once the base macro is set, we now need another macro which can spit out the sheet names from a given Excel file. Perform the following steps to create this macro –
1. Drag in an ‘Input Tool’ and connect it to any existing Excel file, say ‘Movies.xlsx’. On connection, choose the option – Import only the list of sheet names. Additionally, in the configuration pane of the tool, set ‘Output File Name as Field’ to ‘Full Path’.

2. Drag in a ‘Formula’ tool and create a new field ‘FullPath’ with the following formula
TrimRight([FileName],'<List of Sheet Names>')+"'"+[Sheet Names]+"$'"
3. Drag in a ‘Select’ tool and deselect everything but the ‘FullPath’ field.
4. Drag in a ‘Macro Output’.
Here is how it should look –

Run the workflow and ensure full path is being shown –