moosakhalid@prod-box:~
Initializing system profile
Loading kernel modules: kubernetes aws terraform
Mounting incident-response.service
Starting 24x7 on-call daemon
Connecting to prod cluster: netflix-global
300M+ subscribers in scope — handle with care
Profile loaded. Welcome back.
  ███╗   ███╗ ██████╗  ██████╗ ███████╗ █████╗
  ████╗ ████║██╔═══██╗██╔═══██╗██╔════╝██╔══██╗
  ██╔████╔██║██║   ██║██║   ██║███████╗███████║
  ██║╚██╔╝██║██║   ██║██║   ██║╚════██║██╔══██║
  ██║ ╚═╝ ██║╚██████╔╝╚██████╔╝███████║██║  ██║
  ╚═╝     ╚═╝ ╚═════╝  ╚═════╝ ╚══════╝╚═╝  ╚═╝

  ██╗  ██╗██╗  ██╗ █████╗ ██╗     ██╗██████╗
  ██║ ██╔╝██║  ██║██╔══██╗██║     ██║██╔══██╗
  █████╔╝ ███████║███████║██║     ██║██║  ██║
  ██╔═██╗ ██╔══██║██╔══██║██║     ██║██║  ██║
  ██║  ██╗██║  ██║██║  ██║███████╗██║██████╔╝
  ╚═╝  ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝╚══════╝╚═╝╚═════╝
// Senior Site Reliability Engineer
moosa@prod:~/sre $
htop — system metrics
12+ yrs_experience
300M+ subscribers
$3M+ cloud_savings
24×7 on_call
8 certifications
5 courses_published
cat ./summary.txt
$ about

Senior infrastructure engineer with 12+ years of experience building and operating large-scale distributed systems that support hundreds of millions of subscribers. Expertise in cloud-native architecture (AWS, Kubernetes), multi-region systems, GitOps, stress testing and tuning of application configurations, and production debugging across complex service graphs.

cat ./experience.log
$ experience
Netflix
Senior Site Reliability Engineer — Critical Operations & Reliability Engineering
CURRENT 2024 – Present
  • Diagnosed and resolved complex cross-service production failures spanning the application, storage, and networking layers of a large-scale microservices architecture, including critical streaming paths.
  • Served on a 24×7 global on-call rotation and directed org-wide incident response strategy for high-severity events affecting 300M+ subscribers, leading response, mitigation, and recovery.
  • Standardized incident response mechanisms and reliability patterns across global engineering teams. Set up paved-road incident response for remote teams in Poland, Australia, and CORE US.
  • Led incident command, post-incident reviews (PIRs), and toil-reduction initiatives, driving systemic fixes and long-term reliability improvements.
  • Identified long-tail reliability and security risks from deprecated Java libraries and planned a company-wide deprecation and migration campaign spanning thousands of applications.
  • Collaborated with Trust & Safety and Security Engineering teams to identify impactful DDoS attacks, analyze affected call paths, and tune mitigation measures to preserve streaming continuity.
  • Created and maintained critical KPI dashboards; tuned metric-based alerts to reduce pager fatigue.
microservices incident-command observability DDoS-mitigation PIR java
Workday
Principal Site Reliability Engineer
2023 – 2024
  • Led the design and documentation of Workday's migration from legacy deployment processes to GitOps-based CI/CD using Terraform.
  • Spearheaded cost-saving initiatives across the Public Cloud pillar and introduced FinOps best practices across AWS and GCP.
  • Optimized systems and workflows by improving code, runbooks, automation, CI/CD, and observability.
  • Owned the full lifecycle of containerized workloads on production Kubernetes clusters running on AWS across global regions.
  • Provided technical mentorship and supported the L&D team with interviewing and recruiting.
GitOps Terraform Kubernetes FinOps AWS GCP
Workday
Senior Site Reliability Engineer
2021 – 2023
  • Led and executed multi-million-dollar cost optimization projects on public cloud (AWS), achieving annual savings of more than $3M.
  • Led deployment, upgrades, and go-live of Workday applications in new global regions, from code commit through validation.
  • Directly contributed to company earnings by enabling efficient cloud deployments that supported expansion into Europe, Australia, and US FedRAMP environments.
  • Wrote automation that consolidated manual tasks and centralized credential storage across global cloud environments.
Kubernetes Helm AWS Linux MySQL Python Ansible Terraform Jenkins
A Cloud Guru
AWS / DevOps Architect
2019 – 2021
  • Conceptualized, designed, and implemented AWS and DevOps projects. Published 5 courses on acloudguru.com covering Terraform, Ansible, AWS, AWS Serverless, and AWS SSM.
  • Infrastructure as Code (IaC) SME: built public-facing, project-based courses, including the HashiCorp Certified Terraform Associate course.
  • Mentored colleagues through certification exams and contributed to knowledge sharing.
Terraform Ansible AWS Serverless course-author
Opsview Inc.
Sr. Technical Infrastructure Consultant
2018 – 2019
  • Independently designed and implemented a SaaS offering on AWS using Terraform, Ansible, Packer, and core AWS services (IAM, RDS Aurora Serverless, Route 53, ACM, ELB, VPC, EC2, S3).
  • Owned bugs in cloud monitoring plugins and submitted fixes in Python and Golang.
  • Used Packer to apply CIS security hardening to CentOS cloud images for automated deployments.
  • Planned and executed migrations from on-premises environments to the cloud (AWS, Azure).
AWS Terraform Ansible Packer Python Golang
Opsview Inc.
Customer Success Engineer
2017 – 2018
  • Helped enterprise customers deploy, implement, and operate Opsview's unified monitoring solution. Debugged plugins and submitted patches across cloud, Linux, database, and network modules in Python, Bash, Golang, and Perl.
Python Bash Golang monitoring
Ericsson — Managed Services RMEA
Technical Services Consultant, 2nd Level Support
2013 – 2016
  • Maintained telecom operators' end-to-end infrastructure and resolved issues across it. Accelerated service deployment, contributing to a 25% increase in new product launches and a ~$20K revenue increase.
Linux Oracle MySQL telecom
ls -la ./certifications/
$ certifications
Certified Kubernetes Administrator (CKA)
Certified Kubernetes Application Developer (CKAD)
AWS Certified Solutions Architect – Professional
AWS Certified DevOps Engineer – Professional
AWS Certified Security – Specialty
AWS Certified Database – Specialty
AWS Certified Advanced Networking – Specialty
AWS Certification Subject Matter Expert
cat /etc/education.conf
$ education
NUCES – FAST, Lahore, Pakistan
BS Telecommunication
2008 – 2012
git log --oneline ./writing-and-projects/