
Site Reliability Engineer - Kubernetes/Terraform/Python

Company: Pangea
Location: Palo Alto
Posted on: August 6, 2022

Job Description:

Pangea is a well-funded Series A rocketship led and backed by veterans of the security industry. We are a product-led company whose mission is to deliver an amazing product built specifically for developers. We are hiring talented software engineers to build a collection of cloud-agnostic security services. Engineers who are passionate about innovating in the security space and driven to deliver exceptional product experiences for developers are an ideal fit for this role.

Responsibilities:

  • Ensure the quality of orchestration and integration of the tools needed to support daily operations for cloud applications and infrastructure.
  • Implement cloud provider capabilities and services, especially as they relate to deployment, monitoring, and incident/alert response.
  • Implement cloud capabilities that support the SLAs of the entire platform and product, including 24x7 availability of services.
  • Automate and orchestrate various parts of the CI/CD lifecycle.
  • Lead testing certification efforts, and implement performance and scale testing of the product and its microservices.
  • Proficient in networking and service mesh technologies like Envoy and Istio.
  • Proficient in at least one compliance standard (SOC 2, ISO 27001, PCI, HIPAA, FedRAMP, etc.) and able to implement compliance controls.
  • Proficient in infrastructure management and monitoring for delivering reliable services that meet the required SLOs, SLAs, and SLIs.
  • Document the design of implemented systems.
  • Experience with Total Cost of Ownership (TCO) & Cost of Goods Sold (COGS) analysis and benchmarking.
  • Coordinate systems design and deployment with the greater engineering team.
  • Build infrastructure as code using Terraform, Ansible, and Kubernetes.
  • Partner with developers and quality engineering teams to automate the monitoring, alerting, availability and scalability of our applications and systems.
  • Follow SRE best practices and procedures.

Required skills:
    • Experience in Go and/or Python
    • Scaling and maintaining production systems on AWS and/or GCP
    • Managing Kubernetes in a large-scale production environment
    • Extensive background in developing and operating large-scale cloud-based distributed applications
    • Direct experience developing/running applications on AWS and Google Cloud.
    • A laser focus on, and the ability to design, infrastructure solutions for scalability, reliability, high availability, performance, software maintainability, and operational excellence
    • Well versed in infrastructure-as-code tools (e.g., Terraform, Google Cloud Deployment Manager, AWS CloudFormation).
    • Experience with serverless architecture (e.g., AWS Lambda, Google Cloud Run) is preferred.
    • 5 years' experience with continuous integration practices and tools (Jenkins, Travis CI, CircleCI, etc.)
    • Linux administration in a large-scale SaaS environment.
    • Experience with monitoring solutions such as CloudWatch, Stackdriver, Prometheus, Graphite, Grafana, ELK, SignalFx, Splunk, Alert Logic, and Datadog.
    • Experience with Kafka, Mesos, Spark, Storm, Cassandra, Elasticsearch, PostgreSQL, Redis, ZooKeeper, and Nginx.

If you like building simple, intuitive products for developers and enjoy solving extremely complex problems, please submit your application; we would love to speak with you.

Different people approach problems differently. We need that. Pangea is committed to diversity and inclusion. We are an Equal Opportunity workplace and an Affirmative Action employer. We do not discriminate in employment decisions on the basis of race, color, religion, gender (including pregnancy), national origin, political affiliation, sexual orientation, gender identity or expression, marital status, disability, genetic information, age, veteran status, or any other applicable legally protected characteristic. All employment decisions are made on the basis of individual qualifications, merit, and business needs.

