PaloAltoRecruiter Since 2001
the smart solution for Palo Alto jobs

Site Reliability Engineer

Company: DFINITY
Location: Palo Alto
Posted on: May 3, 2021

Job Description:

DFINITY is reimagining the Internet as a public network that hosts secure software and services. The Internet Computer is a new technology stack that will be unhackable, fast, and able to scale to billions of users around the world, and that supports a new kind of autonomous software that promises to reverse Big Tech's monopolization of the internet. DFINITY was founded in 2016 by Dominic Williams and is backed by top-tier institutions including Polychain Capital and Andreessen Horowitz.

As a Site Reliability Engineer, you will provide operational support for the Internet Computer components at the application layer. This includes ongoing development of systems that monitor the Internet Computer's health and corrective actions in case of incidents.

Responsibilities:

- Select, design, build, deploy, and maintain the services used to ensure high availability of DFINITY's product
- Identify opportunities to automate or improve processes by writing code, and then write the code
- Bake reliability and operability into the product from the start by participating in design and code reviews and identifying risks, problems, and mitigations
- Work with other engineering and security teams to define processes that preserve the goals of the Internet Computer while remaining operationally feasible and automatable
- Work with product owners to set SLOs, then implement SLOs in code and observability infrastructure
- Be on-call for production services: 12/7 (on-call is split across two sites), roughly 1 week in 6. Because issues may be caused by problems in wildly different areas of the code, the chief responsibility is to coordinate the response to the issue and ensure it is resolved, pulling in engineers from other teams as necessary. On-call work is compensated with generous time off
- Operate, troubleshoot, and deploy software to Unix systems

This is not a team that exists to be on-call. This is a team that elects to be on-call because it helps do the job better. Being on-call makes it easier and more motivating to identify opportunities to reduce the number of alerts the system generates.

Requirements:

- Think about things in a systemic, methodical way, especially when troubleshooting
- Know when "This is good enough for the next 12 months" is appropriate
- Coordinate incident response across multiple teams -- clearly understanding and communicating what is going on, next steps, who is responsible for what, and so on
- Write code. We use Rust -- you don't need to know Rust already, and there will be opportunities to learn, but experience designing and writing moderate-sized applications (up to ~10 KLOC) is necessary. Identifying opportunities to automate or improve processes by writing code, and then writing that code, is key.

Within 1 month you will:

- Understand DFINITY's infrastructure and production environment
- Have picked a suitable starter project
- Have submitted improvements to our documentation and processes based on what you noticed during onboarding

Within 3 months you will:

- Have delivered the starter project
- Have shadowed other team members on-call, and be ready to join the on-call rotation from month 4 onwards
- Have proactively identified other improvements and proposed projects to deliver them

All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.

Keywords: DFINITY, Palo Alto, Site Reliability Engineer, Other, Palo Alto, California
