- Empower large enterprises to run AI and ML at scale, leveraging the best in modern distributed systems and automation technology
- Join a company that has always been remote-friendly – work anywhere in the US or Canada, including your sofa, the beach, or our Seattle waterfront office
- Experience rapid growth at the first AI startup to be funded by Google
Algorithmia is the enterprise machine learning operations (MLOps) platform. We enable ML and operations teams to work together on complex machine learning applications in one central location. We make it seamless to deploy ML at scale, with tools to connect data sources and orchestration engines, use any major framework, platform, or ML language, and govern and secure your ML architecture. Over 130,000 engineers and data scientists have used Algorithmia’s platform to date, including the United Nations, government intelligence agencies, and Fortune 500 companies.
We’re looking for a talented Site Reliability Engineer (SRE) to join a passionate, distributed team and have a massive impact by enabling Algorithmia engineering to scale its product development and delivery capabilities. This pivotal role spans a wide breadth of technologies in the DevSecOps space, offering an unparalleled opportunity to learn, grow, and make an impact on the most important financial institutions, intelligence agencies, and private companies in the country.
As an SRE at Algorithmia, you will:
- Build, operate, and maintain the infrastructure, tools, and processes that enable Algorithmia to deliver higher quality software at a higher velocity
- Work cross-team to ensure Algorithmia product development can efficiently and securely leverage modern, enterprise-grade infrastructure such as storage, networking, and compute resources
- Automate your role out of existence – then do something even more amazing
- Analyze and gain a deep understanding of the highly complex infrastructure requirements of Algorithmia’s top-tier enterprise customers, and deliver solutions that meet their needs
- Have a real career plan, with mentorship and opportunities in technical leadership, people management, or wherever your interests lie
And we might make a great match if you:
- Are fluent in managing containerized distributed systems (Docker and Kubernetes) and driving best practices for them; Certified Kubernetes Administrator (CKA) certification is desirable
- Feel comfortable working at a basic level across a wide breadth of technologies, including languages (Java, Scala, Go, Python, Bash, etc.), monitoring tools (Prometheus, Grafana), and cloud and virtualization platforms (AWS, Azure, GCP, VMware, OpenStack, etc.)
- Have experience with configuration management and provisioning tools (Ansible, Terraform) and an understanding of the infrastructure-as-code approach to infrastructure management
- Have experience operating data solutions that power production systems (MySQL, Redis, etcd, Elasticsearch, RabbitMQ)
- Want to work with modern cloud technologies across multiple platforms and complex distributed systems
- Geek out on topics like chaos engineering and think proactively about reliability, high availability, and disaster recovery
- Are passionate about automation, and believe nothing should ever be done manually twice, especially when it comes to provisioning infrastructure
- Enjoy working behind the scenes to deliver solutions that enable engineering organizations to scale by empowering engineers
- Bonus points if you love data science, have any kind of AI/ML experience, have interesting public code, or have implemented something cool on our AI marketplace (hint: free trial!)
- Are authorized to work in the US or Canada without sponsorship