Bloomreach

Bloomreach - Senior Site Reliability Engineer for Datacraft team

Remote - Slovakia · €41.6 - €52/hour + Equity · 1w ago
Remote, Senior, EMEA, Cloud Computing, Artificial Intelligence, Site Reliability Engineer, Kafka, Redis, Apache Spark, SQL, Kubernetes, GCP, Prometheus, Airflow, Terraform, Sentry, Jira, Claude, Databricks, Confluence, Grafana, Cursor, Gemini, Snowflake, Data Quality, Go, Python

Requirements

• Languages: Python (primary), Go, SQL
• Messaging & streaming: Apache Kafka
• Storage & databases: Databricks, BigQuery, Apache Iceberg, GCS, Mongo, Redis
• Data processing & orchestration: Apache Spark, DataFlow, Airflow / Cloud Composer
• Infrastructure: GCP, Kubernetes, Terraform
• AI / Agentic: LLM APIs, MCP, agent orchestration frameworks
• Observability: Grafana, Prometheus, Victoria Metrics, PagerDuty, Sentry, OpenTelemetry
• CI/CD & tooling: GitLab, Jira, Confluence
• AI coding agents: Cursor, Claude Code
• Impact
  • You can articulate how your contributions transformed the way engineers work and fostered a strong SRE/DevOps culture.
  • You can demonstrate how impactful reliability work connects to business success and customer outcomes.
• Ownership
  • You embrace the "you build it, you run it" principle: you love owning what you ship.
  • You are cost-aware: you keep cloud spend in check through effective vertical and horizontal autoscaling and detailed telemetry insights.
• Systematic approach
  • Infrastructure as Code is the only thing that brings stability into chaos.
  • You design for failure: SLOs, error budgets, and runbooks are first-class artifacts, not afterthoughts.
• Data-driven
  • You use telemetry and metrics to give engineers actionable feedback on how applications and services behave.
  • You can navigate complex data platform architectures using distributed tracing and debugging.
• Solid hands-on experience with GCP (BigQuery, DataProc, Cloud Composer, GCS) and Kubernetes.
• Experience with Python; Go is a strong advantage.
• Familiarity with data pipeline technologies (Kafka, Airflow/Cloud Composer, Spark, Iceberg): you don't need to write ETL code, but you need to operate it reliably and know when something is wrong.
• Fluent use of AI coding agents (Cursor, Claude Code, Copilot, Gemini CLI, or similar): you already use these tools daily to accelerate work.
• Comfortable with on-call rotation and 24/7 incident response.
• Remote-first mindset: you know how to be effective in distributed teams.
• You are able to learn and adapt, which is essential when exploring new tech or navigating our growing codebase.
• Strongly preferred
  • Experience operating single-DWH environments (Snowflake, Databricks, or BigQuery).
  • Familiarity with agentic/LLM workloads: API reliability, latency SLOs, trace observability for AI systems.
  • Experience with open table formats (Iceberg, Delta Lake) in production environments.
  • Exposure to data security and compliance in the context of customer-facing DWH integrations (consent, data retention, PII handling).
• Personal qualities
  • Ownership & accountability: you take issues from detection through to resolution and follow-up prevention.
  • Systematic thinking: you identify root causes, not symptoms, and document your findings so the team learns.
  • Collaboration & communication: you explain trade-offs and constraints clearly to both engineers and non-engineers.
  • Bias for reliability: operational excellence (SLOs, on-call friendliness, proactive alerting) is not a chore, it's your craft.
  • Continuous improvement mindset: you are comfortable iterating, revisiting assumptions, and improving incrementally.
  • Comfortable operating remote-first in a distributed team across Central Europe.
• Your success story

Responsibilities

• a. Platform reliability & observability
  • Build and maintain the reliability ecosystem where engineers can safely develop, debug, and operate DataCraft services running on GCP and Kubernetes (DataProc, Cloud Composer, BigQuery, Snowflake/Databricks connectors).
  • Ensure end-to-end observability across the full data platform (from Kafka ingest through GCS/Iceberg staging and Airflow orchestration to Databricks and BigQuery destinations), enabling the team to catch missing loads, SLA breaches, and data drifts before customers notice or costs drift.
  • Drive scalability so services can scale vertically and horizontally based on operational and telemetric data (OpenTelemetry, Prometheus, Victoria Metrics).
  • Maintain team health dashboards and alerting (Grafana, PagerDuty, Sentry).
• b. Infrastructure as Code & deployments
  • Own and evolve Terraform-based infrastructure for DataCraft services.
  • Automate deployments, instance setup, and operational runbooks to eliminate manual/semi-manual steps.
  • Maintain CI/CD pipelines (GitLab) with linters, security scans, code quality checks, and AI code reviews, enabling engineers to produce high-quality MRs.
• c. Security & compliance
  • Help the team fulfill security requirements for ISO and SOC2 audits by enforcing security principles: key distribution, key rotation, authorization & authentication at the service level, data encryption in transit, data isolation, resource limitations, and audit logs.
  • Ensure data access controls are properly enforced across multi-DWH environments (BigQuery, Snowflake, Databricks).
• d. Incident management & L3 support
  • Participate in and drive L3 on-call rotation and incident resolution for DataCraft services.
  • Contribute tooling for debugging, troubleshooting, and performance testing of data pipelines and orchestration layers.
  • Use telemetry data and distributed tracing to navigate complex, distributed service architectures.
• e. Agentic platform reliability
  • Ensure reliability and observability of the Loomi Analytics Agent data infrastructure: LLM API gateway performance, MCP server health, and evaluation pipeline availability.
  • Monitor and alert on data quality issues that could introduce inconsistencies or hallucinations in Loomi's responses, making the agent's data access patterns reliable and debuggable.

• Get to know the DataCraft team, the company, and the most important processes.
• Set up your local and GCP development environment and complete the Engagement engineering onboarding.
• Understand the current state of DataCraft services: pipelines, orchestration, observability gaps, and on-call runbooks.
• Start contributing to the L3 on-call rotation, handling incidents, troubleshooting, and debugging, which will sharpen your understanding of the platform and surface fresh improvement ideas.
• Deliver your first meaningful reliability improvement: an observability enhancement, a deployment automation, or an SLO definition for a key DataCraft service.
• Own the reliability posture of at least one DataCraft domain end-to-end, able to independently design, operate, and continuously improve it.
• Drive measurable improvements in MTTR, alert signal-to-noise ratio, or deployment confidence across the team.
• Be a trusted reliability partner in architecture discussions: your input shapes how new DataCraft services are designed for operability from day one.

• The pay range actually offered will take into account a variety of potential factors considered in compensation, including but not limited to skills, qualifications, geographic location, accomplishments, experience, credentials, internal equity and business needs, and may vary from the range listed above.

Benefits

• €41.600 - €52.000 EUR
• More things you'll like about Bloomreach:
• Culture:
  • A great deal of freedom and trust. At Bloomreach we don't clock in and out, and we have neither corporate rules nor long approval processes. This freedom goes hand in hand with responsibility. We are interested in results from day one.
  • We have defined our 5 values and the 10 underlying key behaviors that we strongly believe in. We can only succeed if everyone lives these behaviors day to day. We've embedded them in our processes like recruitment, onboarding, feedback, personal development, performance review and internal communication.
  • We believe in flexible working hours to accommodate your working style.
  • We work virtual-first with several Bloomreach Hubs available across three continents.
  • We organize company events to experience the global spirit of the company and get excited about what's ahead.
  • We encourage and support our employees to engage in volunteering activities: every Bloomreacher can take 5 paid days off to volunteer.*
  • The Bloomreach Glassdoor page elaborates on our stellar 4.6/5 rating. The Bloomreach Comparably page Culture score is even higher at 4.9/5.
• Personal Development:
  • We have a People Development Program: employees participate in personal development workshops on various topics run by experts from inside the company. We are continuously developing & updating competency maps for select functions.
  • Our resident communication coach Ivo Večeřa is available to help navigate work-related communications & decision-making challenges.*
  • Our managers are strongly encouraged to participate in the Leader Development Program to develop in the areas we consider essential for any leader. The program includes regular comprehensive feedback, consultations with a coach and follow-up check-ins.
  • Bloomreachers utilize the $1,500 professional education budget on an annual basis to purchase education products (books, courses, certifications, etc.).*
• Well-being:
  • The Employee Assistance Program (with counselors) is available for non-work-related challenges.*
  • Subscription to Calm, a sleep and meditation app.*
  • We organize 'DisConnect' days where Bloomreachers globally enjoy one additional day off each quarter, allowing us to unwind together and focus on activities away from the screen with our loved ones.
  • We facilitate sports, yoga, and meditation opportunities for each other.
  • Extended parental leave up to 26 calendar weeks for Primary Caregivers.*
• Restricted Stock Units or Stock Options are granted depending on a team member's role, seniority, and location.*
• Everyone gets to participate in the company's success through the company performance bonus.*
• We offer an employee referral bonus of up to $3,000 paid out immediately after the new hire starts.
• We reward & celebrate work anniversaries (Bloomversaries!).*
• (*Subject to employment type. Interns are exempt from marked benefits, usually for the first 6 months.)
• Excited? Join us and transform the future of commerce experiences!
• If this position doesn't suit you, but you know someone who might be a great fit, share it; we will be very grateful!
• Any unsolicited resumes/candidate profiles submitted through our website or to personal email accounts of employees of Bloomreach are considered property of Bloomreach and are not subject to payment of agency fees.
