At Handshake, we are assembling a diverse team of dynamic engineers who are passionate about creating high-quality, impactful products. As an ML Infrastructure Engineer, you will play a key role in driving the architecture, implementation, and evolution of our rapidly growing Machine Learning platform. Your technical expertise and leadership will be instrumental in helping millions of students discover meaningful careers, irrespective of their educational background, network, or financial resources.

Our primary focus is building scalable ML infrastructure that empowers our Machine Learning and Relevance teams to iterate quickly and deploy ML solutions across the surfaces of our product stack.

Your experience

  • Strong software engineering foundations: Production experience with object-oriented and functional programming languages (Python, Golang), software design patterns, testing strategies, and DevOps practices (CI/CD).
  • Cloud platform expertise: Hands-on experience with cloud-based technologies, including BigQuery, Cloud Storage, infrastructure-as-code tooling (Terraform), and an ML stack (Vertex, SageMaker, Ray, or similar) for handling machine learning workflows.
  • ML aptitude: Experience building product features with ML frameworks such as scikit-learn, PyTorch, and TensorFlow, and with big data tooling such as Spark.
  • Problem-solving prowess: Outstanding problem-solving skills, with the ability to navigate complex machine learning infrastructure challenges and propose innovative, effective solutions.
  • Teamwork oriented: A collaborative approach to work, the ability to communicate complex machine learning concepts effectively to both technical and non-technical stakeholders, and a willingness to take input from Relevance stakeholders to guide implementation details.

Bonus areas of expertise

  • ML infrastructure experience: Experience designing, implementing, and managing components of production ML infrastructure including:
      • Real-time/batch ML model prediction and inference
      • Cluster management and deployment, including deployment of GPU-based training jobs
      • Familiarity with vector search tooling (Elasticsearch, Pinecone, etc.)
      • Developing Feature Store components (offline/online stores and feature registry technologies)
      • Building ML search or ML recommendations systems
  • Containerization and orchestration: Familiarity with containerization technologies like Docker and container orchestration platforms like Kubernetes.
  • Generative AI (LLMs): Familiarity working with large language models such as ChatGPT, LLaMA, or Bard for text generation and Natural Language Processing (NLP) tasks.
  • Streaming data processing: Experience with streaming data processing platforms such as Apache Beam or Apache Flink.

Compensation range

$150,000-$190,000
