CloudOps Engineer


The CloudOps Engineer will support solutions hosted on AWS, including Linux/Windows servers running on EC2. They will be responsible for all aspects of the production lifecycle, maintenance, and administration, including but not limited to: infrastructure automation, continuous integration and deployment, product release and support, running a scalable production environment for hosting the ARCOS platform, maintaining application/database availability, and ensuring continuous 24x7 production uptime of our services.

This is a remote role, and you should be based in Poland.

What you'll do:

  • Design, develop and maintain scalable AWS solutions and infrastructure, including but not limited to: EC2, RDS, S3, DynamoDB, ElastiCache, and Route 53.
  • Develop tooling and processes to automate the deployment of SaaS-based applications and their underlying operating systems and infrastructure.
  • Perform PostgreSQL and Oracle database administration, including maintenance, troubleshooting, tuning, optimization, installation, upgrades, backup/recovery, and data migration. 
  • Partner with Engineering, Development, Quality Assurance, Professional Services, and Technical Support to ensure the success of the assigned product offerings and schedules. 
  • Engage in Agile team practices such as daily standups, backlog refinement, release planning and sprint planning. 
  • Coordinate configuration changes, installs, and upgrades with appropriate development teams and product owners while following company change control procedures. 
  • Participate in 24x7 on-call responsibilities, maintaining the availability and performance of all customer-facing production services. 
  • Triage and participate in the resolution of complex problems, including network connectivity issues, that span multiple tiers of application/infrastructure.  
  • Actively monitor supported systems and respond promptly to security or usability concerns. 
  • Review application logs and analyze events using cloud-native services (e.g. CloudWatch, CloudTrail) or third-party SIEM tools (e.g. Splunk).
  • Upgrade systems and processes as required for enhanced functionality and security compliance. 
  • Accurately document all processes and procedures for routine and non-routine tasks.  
  • Perform other duties and responsibilities as assigned.

Requirements

  • Bachelor's degree in Computer Science or related field, or equivalent work experience. 
  • 4-5 years of system administration experience, ideally in global management and operations of highly trafficked production applications. Experience working in a 24x7 SaaS environment is preferred. 
  • 4-5 years of experience designing solutions for and managing AWS services, including but not limited to: EC2, RDS, S3, DynamoDB, ElastiCache, WAF/Shield, Route 53, IAM and Directory Service, ECS, EKS, ECR, DNS, Parameter Store, and ALB.
  • 2 years of experience with CI/CD technologies and best practices using AWS CodePipeline, CodeBuild, GitHub Actions or Bitbucket Pipelines.
  • 2 years of experience with PostgreSQL, Oracle, SQL Server. 
  • Experience with hosting and supporting ESRI ArcGIS Server and FME Data Integration tools is a plus.
  • Experience with Linux and Windows system administration, automation and performance tuning. 
  • Experience with configuration management and infrastructure as code tools such as Ansible and Terraform. 
  • Experience with Apache, Nginx, Tomcat, Node.js/PM2.
  • Experience with scripting languages, including Bash, Python and PowerShell.
  • Knowledge of Kubernetes, Docker, Jira, Confluence. 
  • Advanced knowledge of system vulnerability management and security best practices. 
  • Solid understanding of observability, networking concepts and troubleshooting. 
  • Proven ability to work effectively with highly reliable, highly available, mission-critical technologies, with attention to detail and demonstrated results while meeting deadlines.
  • Ability to manage deployment automation, SaaS operations, internal and external SaaS infrastructure, security, and cost management.
  • Solid understanding of technical issues and opportunities related to modern cloud infrastructure and operations.
  • Excellent written and verbal communication skills.

 

Production Support/On-Call Duties: 
As a key member of our engineering team, you will address escalated production issues from customer support. Your responsibilities will include: 

  • Participating in a rotational on-call schedule to handle significant production issues. 
  • Rapidly diagnosing and resolving technical challenges that arise in production. 
  • Collaborating with customer support and engineering teams for seamless issue resolution. 
  • Maintaining clear communication and documentation during and after incidents. 
  • Leveraging these experiences to contribute to continuous process improvement.