ZeroRisk - Software Engineer – Support & Operations

Dublin, Ireland (posted 2mo ago)

In Office, EMEA, Diagnostics, Cloud Computing, Software Engineer, Application Support Engineer, Angular, Java, Team Management, AWS, Git

Requirements

Essential

• Approximately three years of professional software engineering experience with practical exposure to both Java and Angular.
• Experience debugging issues in a cloud-hosted or SaaS environment, comfortable working with incomplete information under time pressure.
• Working knowledge of AWS and confidence navigating cloud infrastructure for investigative purposes.
• Strong log analysis skills: able to construct queries, correlate events across services, and draw diagnostic conclusions.
• Clear written communication: able to explain a production issue and its resolution to both engineers and non-technical stakeholders.
• Familiarity with Git-based workflows and standard code review practices.

Desirable

• Experience building internal tooling or automation to support operational workflows.
• Exposure to observability or incident management platforms such as Datadog, PagerDuty, or Grafana.
• Understanding of relational database query analysis: able to identify slow or problematic queries as part of an investigation.
• Familiarity with containerised deployment environments.

Ways of Working

• Diagnose before fixing. Understanding the root cause before touching code, and documenting that understanding, is what reduces incidents over time rather than managing them indefinitely.
• Every incident is a learning opportunity. A playbook update or post-incident note is part of the job, not an optional extra.
• Build things that scale. Recurring manual investigation steps are a signal to build a tool, not a reason to repeat the same steps indefinitely.
• Escalate early. If an investigation is not progressing, raise it to the Senior Engineer promptly. A delayed escalation is a worse outcome than an early one.

Reporting Structure

• Reports to the Scrum Master / Team Manager.
• Works closely with Senior Engineers for code review and escalation of complex fixes.
• Escalates systemic issues to the Staff Engineer.

This role offers a clear development path toward a Senior Engineer position for engineers who broaden their full stack and infrastructure skills, or toward a specialist Site Reliability Engineer (SRE) track for those with a stronger infrastructure and observability focus.

Responsibilities

Production Investigation & Bug Fixing

• Triage and investigate production issues: querying logs, correlating events across services, and identifying root causes rather than surface symptoms.
• Navigate both Angular front end and Java back end codebases to trace issues end to end, from a reported UI behaviour through to service logic and data layer.
• Implement targeted code fixes for confirmed bugs, ensuring all changes are covered by tests, submitted via pull request, and reviewed before merging.
• Escalate fixes that require architectural change or touch high-risk areas of the codebase to the Senior Engineer before proceeding.
• Produce clear post-incident summaries covering what happened, root cause, resolution, and steps being taken to prevent recurrence.

Internal Tooling

• Build and maintain internal tools that help the team investigate and manage recurring production issues more efficiently, such as log query utilities, diagnostic dashboards, and automation scripts.
• Identify manual or repetitive investigation steps that are candidates for tooling and prioritise building solutions that save meaningful time across incidents.
• Maintain existing tooling to ensure it remains accurate and useful as the platform evolves.

Technology & Tooling Evaluation

• Proactively research and evaluate new tools and technologies that could improve operational efficiency, such as observability platforms, incident management tooling, and log analysis tools.
• Produce concise assessments of candidate tools covering capability, integration effort, cost, and recommendation, sharing findings with the Senior Engineer and Staff Engineer.
• Stay current with relevant developments in the SaaS operations and observability space.

AWS Environment

• Navigate the AWS environment to support production investigations, reviewing logs, metrics, and infrastructure state to identify environment-level contributors to issues.
• Work with the DevOps function where infrastructure changes are required as part of issue resolution.

Playbooks & Documentation

• Build and maintain a library of debugging playbooks and how-to guides covering common production issues, step-by-step enough that any engineer can follow them.
• Update playbooks after every significant incident to incorporate new learnings.
• Identify gaps in the playbook library and prioritise filling them based on incident frequency and impact.