Bayesian Health, Inc. - Infrastructure Engineer
Requirements
• 5+ years of experience building and operating production cloud infrastructure on AWS as a DevOps, Infrastructure, or Site Reliability Engineer, or in a similar role.
• Proficient with Kubernetes, preferably with EKS, including cluster bootstrapping and day-2 ops.
• Strong operational knowledge of relational databases such as PostgreSQL/MySQL (backups, failover, performance tuning).
• Deep expertise in Terraform (or equivalent IaC) and an eye for building clean and scalable modules.
• Familiarity with observability tools, particularly Datadog.
• Experience building infrastructure for sensitive data that contains PHI/PII.
• Knowledge of CI/CD pipelines, preferably with CircleCI.
• Excellent communication skills and a proven ability to collaborate with cross-functional teams (e.g., engineering, data science) to translate requirements into robust technical solutions.
• Experience handling ambiguity and uncertainty in a startup.
• Experience using AI agents to optimize infrastructure management or DevOps workflows.
• Experience with disaster recovery or business continuity plans.
• Experience with multi-account, multi-cluster topologies.
• Experience building systems in healthcare, life sciences, or similarly regulated industries.
• Chaos engineering or game-day facilitation.
• Experience implementing and maintaining a GitOps framework.
Responsibilities
• As an Infrastructure Engineer, you will build and maintain the networking and infrastructure for the Bayesian platform and develop CI/CD pipelines that help other team members, such as software engineers and data scientists, accelerate their development. This role is crucial to driving the expansion of our clinical AI/ML module offerings, health system enterprise-wide implementations, and revenue growth.
• Design cost-optimized, fault-tolerant infrastructure for scale: Propose enhancements to our infrastructure design that enable us to expand our client base and deploy new products on our platform while managing cloud costs and ensuring reliability.
• Streamline development and deployment: Define a branching and promotion strategy that allows us to comply with the regulatory change control process. Build and maintain CI/CD pipelines using GitHub for automated testing and deployment.
• Establish and evangelize infrastructure best practices: Create infrastructure guidelines and templates, such as Terraform modules, and train team members to use them.
• Infrastructure support and maintenance: Continuously monitor system performance and reliability, and apply software upgrades accordingly. Collaborate with other team members to troubleshoot infrastructure issues and optimize performance.
• Secure infrastructure: Partner with the SecOps engineer to implement security best practices that comply with HIPAA, HITRUST, FDA, and client requirements.
• AI Ops Platform Architecture: Architect and build a secure, internal AI Ops platform to safely host and manage AI/ML agents for infrastructure and DevOps optimization.