Atmosera - Senior Cloud Application Support Engineer (Remote)
Requirements
• Expert-level proficiency in Dynatrace and Azure Insights, with a focus on advanced configuration and environment optimization.
• Advanced technical expertise in correlating metrics, traces, and logs to perform deep-dive root cause analysis.
• Deep understanding of SRE principles and proven experience managing critical P1 incidents under strict SLAs.
• Strong leadership and communication skills to take point on P1/P2 tickets and coordinate with high-level stakeholders.
• Ability to evaluate existing support documentation and establish new standards for governance and operating procedures.
• Experience automating manual reporting processes and translating telemetry into actionable business insights.
• Strategic analytical skills to detect subtle patterns of instability and prevent potential service disruptions.
• Ability to guide and mentor junior team members on technical best practices and APM tool interpretation.
• Proven track record of collaborating on product strategy and managing complex client cloud environments.
• Bachelor’s degree in computer science or a related technical field, or equivalent professional experience in high-level cloud operations.
• 5+ years of technical experience, with a strong background in senior systems administration at managed service providers or in cloud hosting environments.
• Bilingual proficiency is required to collaborate effectively across our distributed teams and client base.
• Advanced certifications in Dynatrace or other APM platforms are highly preferred as a demonstration of expert-level observability skills.
• Microsoft Azure certifications, selected based on current certificates and skill level, are required within 90 days of employment.
• Technical certificates in Azure, Windows, O365, SQL, Linux, VMware, Cisco, Palo Alto, AWS, GCP, Terraform, Dynatrace, or DevOps are a plus.
Responsibilities
• Execute expert-level real-time monitoring and incident dispositioning for critical client applications by leveraging deep technical knowledge of Dynatrace and Azure Insights.
• Correlate complex data across metrics, traces, and logs to perform deep-dive root cause analysis and identify performance bottlenecks in distributed environments.
• Lead the triage of complex alerting environments to filter noise and ensure that high-priority incidents are identified and managed with surgical precision.
• Analyze high-level metrics and daily reports to detect subtle system variations, proactively identifying potential problems to avoid service disruptions.
• Evaluate the quality of existing runbooks and spearhead the creation of new standards for operating procedures, governance, and management of client environments.
• Act as the primary technical point of contact for P1 incidents, ensuring high-level communication and coordination between all technical and business stakeholders.
• Drive the automation of manual reporting processes to improve operational efficiency and provide more accurate insights into environment health and performance.
• Enforce SRE best practices and SLA compliance, guiding the team on the proper management of incidents and the strategic creation of problem records.
• Mentor junior staff on the execution of complex procedures and the interpretation of APM telemetry to foster a culture of technical excellence and proactivity.
• Collaborate on product strategy and the implementation of best practices to optimize the performance and stability of global client environments.