Armstrong Uzoagwa

Profile

Site Reliability and Cloud Platform Engineer with 5 years of hands-on experience building, automating, and operating production infrastructure on AWS and GCP. Comfortable across the full stack of platform work: from writing Kubernetes cluster configurations and Terraform modules to debugging overnight incidents and documenting the runbooks that prevent the next one. Has worked with global teams across multiple time zones and supported over 200 customers on cloud infrastructure decisions. Focused on building systems that stay up, recover fast, and cost less to run.

Experience

Senior Platform Engineer / Site Reliability Engineer Jan 2024 to Jun 2025

Silvershell Integrated Consulting

Designed and deployed a full observability stack using Prometheus, Grafana, and Loki across production environments, cutting mean time to recovery by 40% and enabling the team to catch issues before they became incidents.
Wrote Python automation scripts integrated with REST APIs that reduced environment provisioning time by 50%, replacing a manual process that had been a consistent bottleneck across teams.
Managed Kubernetes clusters running microservice workloads at 99.9% uptime, implementing automated health checks, rollback strategies, and capacity planning to keep services stable under load.
Built enterprise CI/CD pipelines in Harness and Jenkins with SAST and DAST security scanning baked in, improving release velocity by 35% while keeping production deployments safe through canary rollouts.
Led the organisation-wide migration from GitLab to GitHub, creating reusable workflow templates that reduced pipeline maintenance effort across multiple teams.
Worked alongside CloudOps, Security, and Compliance teams across time zones to define DevSecOps governance standards and translate those requirements into practical infrastructure decisions.
Designed a Kubernetes migration strategy for hybrid workloads, improving deployment consistency between cloud and on-premises clusters using Terraform and Ansible automation.

DevOps Engineer (SRE Focus) Mar 2021 to Jul 2023

Darey.io / Remote

Built and maintained over 30 CI/CD pipelines using Jenkins and GitLab, integrating SonarQube for code quality gates and Artifactory for artifact management across multiple environments.
Set up Prometheus and Grafana monitoring infrastructure with actionable alerting dashboards, giving the operations team platform-wide visibility it previously lacked.
Reduced average incident response time by 20% by automating diagnostic steps, writing structured runbooks, and standardising how the team approached troubleshooting.
Deployed and managed Kubernetes microservices using Helm charts, configuring resource limits, autoscaling policies, and monitoring across development, staging, and production clusters.
Provided technical guidance and cloud infrastructure support to over 200 customers on AWS and GCP, handling requirements gathering and translating business needs into working infrastructure.
Developed Python automation scripts for deployment workflows, infrastructure provisioning, and routine operational tasks to reduce manual effort across the team.

Junior DevOps Engineer Feb 2020 to Jan 2021

Cloud DXA / Contract

Automated AWS EC2 provisioning with Terraform and Python, taking setup time from one hour down to ten minutes and removing a category of manual configuration errors entirely.
Reduced AWS infrastructure costs by 20% by identifying idle resources and implementing automated monitoring to flag waste before it accumulated.
Built GitLab CI/CD pipelines with automated testing and deployment workflows, improving how quickly the development team could ship and validate code.
Deployed Prometheus and Grafana monitoring with dashboards covering CPU, memory, disk, and network across the infrastructure.

Linux Systems Administrator May 2018 to Jan 2021

Asendia UK / Part-Time

Administered Linux (Ubuntu), VMware, and Windows Server environments in enterprise production, keeping systems reliable and compliant with security requirements.
Automated routine administration tasks using Ansible playbooks and roles, reducing the manual workload and making deployments consistent and repeatable.
Monitored system performance and capacity metrics to stay ahead of resource constraints, resolving issues before they affected uptime.
Coordinated patch management and troubleshooting across production to maintain security compliance and minimise unplanned downtime.

Personal Projects

LiwoxDotNet: Digital services platform covering web development, cloud infrastructure, DevOps, and automation. Lighthouse score of 95+, full observability stack, automated CI/CD.

Astro, TypeScript, Tailwind, Cloudflare, Terraform

Forex Delight: Live algorithmic trading platform with a custom MQL5 Expert Advisor running a multi-timeframe stochastic strategy. Profit-sharing model, user dashboard. Author of Scaling Micro Accounts in Forex.

MQL5, Expert Advisor, Astro, Cloudflare

Smart Living: Lifestyle and home content platform, independently built and deployed to Cloudflare's edge network.

Astro, TypeScript, Tailwind, Cloudflare

Construction: Content platform serving the building and property trades sector.

Astro, TypeScript, Tailwind, Cloudflare

Music: Original music productions and curated inspirational playlists.

Astro, TypeScript, Tailwind, Cloudflare

Key Results

40%

MTTR Reduction

Via Prometheus, Grafana, and Loki observability stack across production.

99.9%

Production Uptime

Sustained on Kubernetes workloads through proactive monitoring and automated failover.

85%

Faster Provisioning

EC2 setup reduced from 60 minutes to under 10 minutes using Terraform and Python.

35%

Release Velocity

Faster deployments through automated CI/CD with safety controls built in.

20%

Cost Reduction

AWS spend reduced through automated resource monitoring and budget controls.

200+

Customers Supported

Cloud infrastructure guidance on AWS and GCP across multiple industries.