We are seeking a passionate Site Reliability Engineer (SRE) to help create the next big thing in data analysis and search solutions. You will join our SRE team, supporting our developers in taking care of the AlphaSense platform. The ideal candidate is a highly skilled engineer with knowledge of code and automation. You will work as part of the SRE team to champion and establish SRE culture at AlphaSense.

  • Mission: Elevate our product’s reliability to the level of precision associated with Swiss watch brands, targeting 99.99% uptime. Additionally, enhance existing systems and processes for optimal performance.
  • Collaboration: Work closely with our engineering teams to understand their product requirements and help improve their application build, test, and deploy processes.
  • Responsiveness: Participate in an on-call rotation, promptly addressing AlphaSense availability incidents, and offering support for application engineers during customer incidents.
  • Documentation: Thoroughly document actions to transform findings into repeatable processes and, subsequently, into automation.
  • Troubleshooting: Debug production issues across various services and stack levels.
  • Performance Engineering: Spearhead efforts to enhance system performance by conducting performance testing and implementing resilient monitoring solutions for continuous tracking of system performance metrics. Assess and strategize for system scalability, proactively anticipating future resource demands.
  • Release Engineering: Oversee the end-to-end release process, coordinating the planning, execution, and deployment of software releases. Implement best practices for version control, branching strategies, and release automation to ensure efficient and reliable software delivery. Collaborate with cross-functional teams to streamline release workflows, conduct pre-release testing, and facilitate seamless deployment, contributing to the overall stability and success of the software release lifecycle.

Requirements:

Must-Have:

  • Technical Proficiency: Strong experience in Kubernetes, Helm, Prometheus, Fluentd, Grafana, and other Cloud-Native solutions.
  • System Proficiency:
    • You can identify opportunities to improve system reliability, such as utilization, scalability, and efficiency, and drive their implementation.
    • Your adeptness in operating systems, networks, or hardware empowers you to quickly diagnose complex issues and identify critical system bottlenecks.
    • You have mastered the craft of automation for reducing toil and are an expert in troubleshooting complex system issues that impact reliability.
    • You know how complex distributed systems fail and look for ways to protect the software and system.
    • You can navigate a full-stack application and build proficiency with the right tools to dig deep into system issues.
  • System Design Proficiency:
    • You have strong experience in building effective processes for continuous improvement of Service Level Objectives (SLOs), mean time to identification (MTTI), mean time to resolution (MTTR), and mean time to failure (MTTF).
    • You can proactively identify issues with technical dependencies of projects that are owned by other teams and surface them.
    • You are able to decompose reliability problems or business scenarios into solutions composed of multiple software or systems components interacting with each other.
    • You can improve system design and architecture by recognizing key risk areas (failure domains) and prioritizing improvements, distinguishing between issues that can be overlooked and those that yield the greatest benefit for the company.
  • Collaboration Proficiency:
    • You are a true partner to stakeholders, executing against the spirit, not just the letter, of requirements.
    • You can take ownership of creating documentation at the RFC (Request for Comments) level, which requires a high degree of collaboration.
  • Programming Language Proficiency: Experience in one or more of the following: Java, Go, NodeJS, React, Python

Nice to Have:

  • Experience working with public cloud providers – AWS/GCP
  • Experience working with on-call Incident Response solutions

AlphaSense is an equal opportunity employer. We are committed to a work environment that supports, inspires, and respects all individuals. All employees share in the responsibility for fulfilling AlphaSense’s commitment to equal employment opportunity. AlphaSense does not discriminate against any employee or applicant on the basis of race, color, sex (including pregnancy), national origin, age, religion, marital status, sexual orientation, gender identity, gender expression, military or veteran status, disability, or any other non-merit factor. This policy applies to every aspect of employment at AlphaSense, including recruitment, hiring, training, advancement, and termination.

In addition, it is the policy of AlphaSense to provide reasonable accommodation to qualified employees who have protected disabilities to the extent required by applicable laws, regulations, and ordinances where a particular employee works.

Base Compensation Range*:  $173,000 – $191,000
