According to Payscale.com, someone with the skills validated by the AWS Certified Solutions Architect - Associate certification makes on average $130,883 in the US and Canada.
Getting certified isn't too hard, but it will require an AWS free tier account and some effort to study the materials, because nothing in AWS is intuitive, in my humble opinion. These are my notes, and in the spirit of notes, they're a WIP. I'd love to hear your thoughts in the comment section below!
Table of Contents:
Designing Resilient Architectures
- Tell me more about the exam
- Choose reliable/resilient storage
- Determine how to design decoupling mechanisms using AWS services
- Determine how to design a multi-tier architecture solution
- Determine how to design high availability and/or fault tolerant solutions.
- Example questions
- Test Axioms
I think the hardest part about writing these is creating a table of contents. I should find a way to automate this!
Tell me more about the exam!
The exam will cover the following areas:
Domain | % of exam
---|---
Design Resilient Architectures | 30%
Define Performant Solutions | 28%
Specify Secure Applications and Architectures | 24%
Design Cost-optimized Architectures | 18%
Total | 100%
You can sign up for a test at https://www.aws.training/certification?src=arc-assoc, and you will need a proctor, either in person or, more likely with COVID, online.
The test will:
- validate that you possess skills to create infrastructure on AWS.
- provide a certification that will open job opportunities
- Validate that you can define a solution using architectural design principles based on customer requirements, and that you can provide implementation guidance based on best practices throughout the lifecycle of the project.
- It is a prerequisite for the Professional certification, which averages $148,456/year.
- Questions are multiple choice
- No penalties for guessing
- Time limit is 130 minutes
- You can mark a question to return to later
- 65 questions on the exam
- Before submitting you can review all your answers.
- Cost to take the exam is $130 USD
Choose reliable/resilient storage
This way when you have a disaster, you don't lose storage or state.
Here are some of the storage types:
EC2 Instance Store
- Instance store is ephemeral. When the instance is stopped or terminated, the data is lost.
- Only available on certain EC2 instance types
- Fixed capacity
- Disk type and capacity depends on EC2 instance type
- Application-level durability.
💡 Generally you want to use the instance store for caching or for temporary data that you are also storing somewhere else
Elastic Block Store (EBS)
- Some EBS volumes can be configured for IOPS
- A volume can only be attached to one EC2 instance at a time.
- Different types
- Encryption
- Snapshots
- Provisioned capacity
- Lifecycle independent of the EC2 instance
- Multiple volumes can be striped together to create larger volumes
EBS hardware types:
- SSD
- Higher IOPS, lower throughput (good for random access)
- More Expensive
- HDD
- Higher throughput but lower IOPS (good for sequential access)
- Cheaper
💡 Think of EBS as durable, attachable storage for your EC2 instances
Amazon EFS
- File storage in the AWS Cloud
- Shared storage
- Petabyte-scale file system
- Elastic capacity (grows and shrinks with your needs)
- Supports NFS v4.0 and 4.1 (NFSv4) protocol
- Compatible with Linux-based AMIs for Amazon EC2; not supported on Windows
Amazon S3
- Consistency model
- Storage classes & durability
  - S3 Standard - cheaper for upload/download
  - S3 Standard-Infrequent Access (IA) - more expensive for access
💡 To give you unlimited storage, S3 is implemented as a distributed system, and you should always ask what a distributed system's consistency model is: strongly consistent or eventually consistent? Historically, S3 offered read-after-write consistency for new objects but was only eventually consistent for overwrites and deletes: if you updated a file and read it back immediately, you might get the old version of the object. (S3 has since moved to strong read-after-write consistency, but the older model is worth understanding.)
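To make eventual consistency concrete, here's a minimal toy model (not how S3 actually works internally, just an illustration): writes land on a primary replica and only reach the read replica when a background sync runs, so a read right after an overwrite can return the stale version.

```python
class EventuallyConsistentStore:
    """Toy model of eventual consistency: writes go to a primary
    replica, reads are served by a lagging replica, and data only
    propagates when replicate() runs."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def put(self, key, value):
        self.primary[key] = value          # write lands on the primary only

    def get(self, key):
        return self.replica.get(key)       # reads served by the replica

    def replicate(self):
        self.replica.update(self.primary)  # background sync catches up

store = EventuallyConsistentStore()
store.put("report.csv", "v1")
store.replicate()                   # v1 has fully propagated
store.put("report.csv", "v2")       # overwrite the object
stale = store.get("report.csv")     # read before replication: still "v1"
store.replicate()
fresh = store.get("report.csv")     # after replication: "v2"
```

The exam questions care about exactly this window between `put` and `replicate`: a read that lands inside it sees the old object.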
- Encryption (data at rest)
- SSE-S3
- SSE-KMS
- SSE-C
- Encryption (data in transit) - HTTPS
- Versioning - lets you see previous versions of files and protects against accidental deletion.
- Access control
- Multi-part upload
- Internet-API accessible
- Virtually unlimited capacity
- Regional availability
- Highly durable - 99.999999999% (a lot of nines)
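To get a feel for what eleven nines means in practice, here's some back-of-the-envelope arithmetic. This is illustrative only; AWS doesn't publish a per-object loss model, and the object count is made up.

```python
# 99.999999999% durability, treated naively as a per-object,
# per-year survival probability (an illustrative simplification).
durability = 0.99999999999
annual_loss_probability = 1 - durability

objects = 10_000_000  # suppose you store ten million objects
expected_losses_per_year = objects * annual_loss_probability

# Even with ten million objects, you'd expect to lose roughly
# one ten-thousandth of an object per year.
print(expected_losses_per_year)
```

In other words, at this durability level you'd statistically wait thousands of years to lose a single object out of ten million.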
Amazon Glacier
Ideal for long-term storage
- Data backup and archive storage
- Vaults and archives
- Retrievals
- expedited
- standard
- bulk
- Encryption by default
- An Amazon S3 object lifecycle policy can move data into Glacier automatically after a set period of time.
- Regional availability
- Highly durable - 99.999999999%
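A lifecycle policy is just a JSON document attached to a bucket. Here's a sketch of one, built as a Python dict; the shape follows the S3 lifecycle configuration API, but the rule ID, prefix, and day counts are made-up examples.

```python
import json

# Hypothetical lifecycle rule: objects under the "logs/" prefix
# transition to Glacier after 90 days and are deleted after 365.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-old-logs",          # illustrative name
            "Filter": {"Prefix": "logs/"},     # which objects it applies to
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"}
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

print(json.dumps(lifecycle_config, indent=2))
```

You'd apply a configuration like this to a bucket via the console, the CLI, or an SDK; the point here is just the rule structure: filter, transition, expiration.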
Determine how to design decoupling mechanisms using AWS services
- If you lose one mechanism, you don't lose others.
- In tightly coupled systems, a loss of one service forces the whole system to be down. This presents problems upon failure or if maintenance must be performed.
- The asynchronous operation of SQS is an example of this decoupling: messages queue up, and processing picks back up where it left off once the other services are back online.
- Services that scale automatically are especially helpful.
- Load balancers are especially helpful in reducing the volume of requests hitting any single service.
- Elastic IPs are another example: they let you remap a public IP address from one instance to another. This decouples the IP from the identity of the server, improving reliability.
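The SQS-style decoupling above can be sketched with Python's standard-library `queue` as a stand-in for SQS: the producer keeps enqueueing work even while the consumer is "down", and nothing is lost when the consumer comes back.

```python
import queue

# Toy stand-in for SQS (not the real service or its API).
q = queue.Queue()

# Producer keeps working even though no consumer is running yet:
# in a tightly coupled design, these requests would simply fail.
for order_id in range(5):
    q.put({"order": order_id})

# Consumer "comes back online" later and drains the backlog,
# picking up exactly where things left off.
processed = []
while not q.empty():
    processed.append(q.get()["order"])
```

The real SQS adds durability, visibility timeouts, and retries on top, but the core decoupling idea is the same: the queue absorbs the outage.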
Determine how to design a multi-tier architecture solution
"High Availability"
"EVERYTHING FAILS, ALL THE TIME" -- Werner Vogels, CTO AMAZON.COM
- Scaling helps to avoid issues and takes advantage of decoupled services
CloudFormation
CloudFormation is a tool that allows you to describe your infrastructure as JSON (or YAML) templates and then converts that template into a stack. This blurs the line between hardware and software.
- Declarative programming language for deploying AWS resources.
- Uses templates and stacks to provision resources.
- Create, update, and delete a set of resources as a single unit (stack).
- This promotes resilience.
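Here's what a minimal template looks like, built as a Python dict and dumped to JSON. The AMI IDs are placeholders; the `Mappings`/`Fn::FindInMap` pattern is how one template resolves a different AMI per region (which also comes up in the example questions below).

```python
import json

# Minimal CloudFormation template sketch. The AMI IDs are fake
# placeholders; real ones differ per region, which is exactly why
# the Mappings section exists.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Mappings": {
        "RegionMap": {
            "us-east-1": {"AMI": "ami-11111111"},
            "eu-west-1": {"AMI": "ami-22222222"},
        }
    },
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": "t3.micro",
                # Look up the AMI for whatever region the stack runs in.
                "ImageId": {
                    "Fn::FindInMap": [
                        "RegionMap",
                        {"Ref": "AWS::Region"},
                        "AMI",
                    ]
                },
            },
        }
    },
}

print(json.dumps(template, indent=2))
```

Deploying this same document in two regions creates two stacks, each resolving its own AMI, which is the declarative "single unit" idea in the bullets above.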
Determine how to design high availability and/or fault tolerant solutions.
- In the face of faults, your services will still be available.
This section is a WIP
Example questions
- A database is running on an EC2 instance. The database software has a backup feature that requires block storage. What storage option would be the lowest cost option for the backup data?
A. Amazon Glacier
B. EBS Cold HDD Volume
C. Amazon S3
D. EBS Throughput Optimized HDD Volume
💡 Glacier and S3 are not block storage, so we can rule out A and C. We can rule out D because it is not the lowest-cost volume type. The correct answer is B.
Which of the following AWS services facilitate the implementation of loosely coupled architectures? (Select two)
A. AWS CloudFormation
B. Amazon Simple Queue Service
C. AWS CloudTrail
D. Elastic Load Balancing
E. Amazon Elastic MapReduce

💡 A can be ruled out because it's a service for standing up stacks of components. C, AWS CloudTrail, is a logging service and can be ruled out. Lastly, E, Amazon Elastic MapReduce, can be ruled out because it's managed Hadoop. Two answers remain: B & D are the correct answers.
Your web service has a performance SLA (Service Level Agreement) to respond to 99% of requests in <1 second. Under normal and heavy operations, distributing requests over 4 (four) instances meets performance requirements. What architecture ensures cost efficient high availability of your service if an Availability Zone (AZ) becomes unreachable?
A. Deploy the service on 4 (four) servers in a single Availability Zone
B. Deploy the service on 6 (six) servers in a single Availability Zone
C. Deploy the service on 4 (four) servers across 2 (two) Availability Zones
D. Deploy the service on 8 (eight) servers across 2 (two) Availability Zones

💡 The key words here are "under heavy operations, 4 instances meets requirements" and (paraphrasing) "how can we cost-effectively manage that when 1 AZ is no longer available?" If we have 4 or 6 servers in a single AZ and it isn't responding, we will not meet our SLA, so we can rule out the first two options: they are not highly available. We can rule out D because 8 servers is more compute than our requirements call for, so it is not cost efficient. That leaves C: if one AZ fails, the service stays available (if briefly degraded), so C is the correct answer.
- Your Web Service has a performance SLA to respond to 99% of requests in <1 second. Under normal and heavy operations, distributing requests over 4 (four) instances meets performance requirements. What architecture ensures cost efficient fault-tolerant operation of your service if an Availability Zone becomes unreachable?
A. Deploy the service on 4 (four) servers in a single Availability Zone
B. Deploy the service on 6 (six) servers in a single Availability Zone
C. Deploy the service on 4 (four) servers across 2 (two) Availability Zones
D. Deploy the service on 8 (eight) servers across 2 (two) Availability Zones

💡 This question is just like the last one, but instead of asking for high availability, it asks for fault tolerance. Fault tolerance is a higher degree of availability: if there is a fault, the end user must not be affected. With 8 servers across 2 AZs, losing an entire AZ still leaves 4 servers, which meets the SLA. The answer is D.
- You are planning to use CloudFormation to deploy a Linux EC2 instance in two different regions using the same base Amazon Machine Image (AMI). How can you do this using CloudFormation?
A. Use 2 different CloudFormation templates, since CloudFormation templates are region specific.
B. Use Mappings to specify the base AMI, since AMI IDs are different in each region.
C. Use Parameters to specify the base AMI, since AMI IDs are different in each region.
D. AMI IDs are identical across Regions.

💡 A can be ruled out because templates are not region specific. You can rule out C because parameters are inputs from users, and AMI IDs are difficult for users to enter; while you could do it, it'd be fragile. D can be ruled out because AMI IDs are not identical across regions (https://cloud-images.ubuntu.com/locator/ec2/), which leads us to answer B.
- How can I access the output of print statements from Lambda?
A. SSH into Lambda and look at system logs
B. Lambda writes all output to Amazon S3
C. CloudWatch Logs
D. Print statements are ignored in Lambda

💡 We can rule out A because Lambda doesn't allow SSH access. B is not true at all. Lastly, D can be ruled out because print statements are not ignored in Lambda; in fact, they get written to CloudWatch Logs, which are invaluable for debugging Lambda. The answer is C.
- You are running an EC2 instance which uses EBS for storing its state. You take an EBS snapshot every day. When the system crashes, it takes you 10 minutes to bring it up again from the snapshot. What are your RTO and RPO going to be?
A. RTO will be 1 day, RPO will be 10 minutes.
B. RTO will be 10 minutes, RPO will be 1 day.
C. RTO and RPO will be 10 minutes.
D. RTO and RPO will be 1 day.

💡 Knowing what RTO and RPO mean is key to this question. RTO (Recovery Time Objective) is how long it takes for the system to be restored, measured in minutes/hours/days. RPO (Recovery Point Objective) is how much data you can afford to lose, measured as the time since the last backup. The answer is B: restoring from the snapshot takes 10 minutes (RTO), and up to a day of data since the last daily snapshot can be lost (RPO).
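The RTO/RPO reasoning above reduces to two numbers you can write down directly. A minimal sketch, using the hypothetical figures from the question:

```python
from datetime import timedelta

# Figures from the example question (hypothetical).
snapshot_interval = timedelta(days=1)   # an EBS snapshot is taken daily
restore_time = timedelta(minutes=10)    # time to rebuild from a snapshot

# RTO: how long the system is down while being restored.
rto = restore_time

# RPO: worst-case data loss is everything since the last snapshot.
rpo = snapshot_interval
```

If you wanted a tighter RPO, you'd snapshot more often; if you wanted a tighter RTO, you'd speed up (or automate) the restore: the two objectives are tuned independently.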
Test Axioms:
💡 Test axioms are key takeaways from answers on the test.
- Expect "Single AZ" will never be the right answer.
- Using AWS managed services should always be preferred.
- Fault tolerant and high availability are not the same things.
- Expect that everything will fail at some point and design accordingly.
Drew is a seasoned DevOps Engineer with a rich background that spans multiple industries and technologies. With foundational training as a Nuclear Engineer in the US Navy, Drew brings a meticulous approach to operational efficiency and reliability. His expertise lies in cloud migration strategies, CI/CD automation, and Kubernetes orchestration. Known for a keen focus on facts and correctness, Drew is proficient in a range of programming languages including Bash and JavaScript. His diverse experiences, from serving in the military to working in the corporate world, have equipped him with a comprehensive worldview and a knack for creative problem-solving. Drew advocates for streamlined, fact-based approaches in both code and business, making him a reliable authority in the tech industry.