
OKX - Staff/Senior Staff Engineer, Kubernetes

Singapore · 5d ago
In Office, Staff, APAC, Cloud Computing, Artificial Intelligence, Staff Engineer, Kubernetes, Reporting, AWS, Alibaba Cloud, Shell, Jenkins, Python, Governance, Resource Allocation, Linux, Docker, containerd, Istio, Prometheus, Terraform, Envoy, ELK, Helm, Ansible, Jaeger, Grafana


Responsibilities

• K8s cluster lifecycle management: Own the build-out, scaling, version upgrades, daily operations, fault diagnosis, and performance tuning of large-scale production Kubernetes clusters; ensure 24/7 high availability and stable operations to support continuous business iteration.
• Alibaba Cloud & AWS multi-cloud operations (core responsibility): Operate, govern, and optimize Alibaba Cloud and AWS resources across a dual-cloud environment, including container services, networking, storage, IAM, load balancing, databases, and object storage; manage configuration changes, cost optimization, and disaster recovery to achieve unified multi-cloud governance.
• Cloud-native architecture and optimization: Lead the operational rollout of containerization and microservices; optimize Pod scheduling, resource quotas, network policies, image management, and log monitoring; resolve cluster resource fragmentation, business adaptation, and network interoperability challenges.
• Stability and security: Build comprehensive monitoring, alerting, logging, and distributed tracing for K8s clusters; define operations runbooks, change management processes, and incident response plans; strengthen cluster security controls by removing high-risk permissions and hardening container runtime environments to protect infrastructure and business data.
• Automated operations and DevOps: Develop operations automation scripts in Shell/Python; integrate Jenkins, GitLab CI, and ArgoCD to build automated release, inspection, and backup systems; apply Infrastructure as Code (IaC) principles to improve efficiency and reduce human error.
• Incident management and post-mortem optimization: Lead production incident response, conduct root cause analysis, and write post-mortem reports; continuously improve cluster architecture, resource allocation, monitoring strategy, and long-term stability mechanisms.
• Technical knowledge sharing and team empowerment: Track developments in cloud-native and public cloud technology; document operations best practices and technical specifications; help the team strengthen its multi-cloud K8s operations capabilities.

What We Look For In You

• Bachelor's degree or higher in computer science or a related field; 4+ years of hands-on experience operating production Kubernetes clusters; deep understanding of K8s core principles and components, including Pods, Deployments, StatefulSets, Services, Ingress, CRDs, controllers, scheduling strategies, the network model, and volume mounting; able to independently resolve complex cluster failures and performance bottlenecks.
• Proficient in Alibaba Cloud and AWS dual-cloud operations, with hands-on experience independently running dual-cloud production environments:
  • Alibaba Cloud: ACK (Container Service for Kubernetes), ECS, SLB, VPC, RAM, RDS, OSS, CloudMonitor, security groups, and snapshot backups.
  • AWS: EKS, EC2, S3, VPC, IAM, Transit Gateway (TGW), load balancing, CloudWatch, and security policies; practical experience with overseas cloud deployment, operations, and disaster recovery.
• Proficient in Linux system administration, including system tuning, permission control, process management, log analysis, and production troubleshooting.
• Familiar with mainstream container runtimes (containerd/Docker); solid understanding of K8s networking (CNI plugins such as Calico/Flannel), storage (CSI), and multi-cluster management; familiar with the Istio/Envoy service mesh, east-west traffic governance, canary (gray) releases, and network interoperability.
• Strong Shell and Python automation skills; experience with CI/CD pipelines (Jenkins, GitLab CI, ArgoCD); familiar with IaC tools (Terraform, Ansible, Helm); experience with observability stacks (Prometheus, Grafana, ELK/EFK, Jaeger, SkyWalking).
• Preferred: experience in large-scale public cloud environments (100+ nodes); multi-cloud cost optimization; K8s security hardening (OPA/Gatekeeper, Pod Security Standards, Falco); Kubernetes CKA/CKS certification; experience with AI/LLM workload scheduling (GPU scheduling, distributed training).

Benefits

• L&D programs and an education subsidy to support employees' growth and development
• Various team-building programs and company events
• Wellness and meal allowances
• Comprehensive healthcare schemes for employees and dependants
• More that we'd love to tell you about along the way!
• All official OKX vacancies are published on this website. While roles may appear on selected third-party platforms from time to time, information on other sites may be inaccurate or outdated. If in doubt, please apply directly through our official careers website.
• Information collected and processed as part of the recruitment process for any job application you choose to submit is subject to OKX's Candidate Privacy Notice.

Similar Jobs

Sardine - Data Engineer · 4d ago
Remote - United States · $150k - $205k/year + Equity
Remote, NA, Senior, Cloud Computing, Data Analytics, Data Engineer, Documentation, Team Leadership, Product Marketing, KPI Tracking, Python, Fivetran, SQL, dbt, Airflow, Salesforce, Snowflake, Amplitude, AWS, GCP, Kubernetes, Docker, Tableau, Looker, Data Visualization, Segment, Mixpanel, B2B, Stakeholder Management, Close, Data Quality, Governance
Galaxy - VP, Protocol / Backend Engineer · 4d ago
Remote - USA
Remote, NA, VP, Cryptocurrency, Cloud Computing, Backend Engineer, Perl, Python, Terraform, Rust, C++, Java, Go, Chainlink, Linux, AWS, GCP, Azure, Docker, Prometheus, ELK, Datadog, Grafana, Kubernetes
Affinity.co - Engineering Manager, Relationship Intelligence · 4d ago
Remote - Canada (Remote) · $176k - $220k/year + Equity
Remote, NA, Staff, Cloud Computing, Software, Engineering Manager, Team Management, Coaching, React, Ruby, TypeScript, AWS, Kubernetes, Full Stack
Stack AV - Senior Cyber Security Engineer · 4d ago
Remote - Pittsburgh, PA or Remote
Remote, NA, Senior, Cybersecurity, Software, Security Engineer, Cybersecurity Engineer, Splunk, Linux, Terraform, Python, Docker, Kubernetes, Ansible, Prisma, Google Workspace
GuidePoint Security - Senior Application Security Engineer - Southeast region (Remote) · 4d ago
Remote - USA *
Remote, NA, Senior, Cybersecurity, Cloud Computing, Application Security Engineer, Advisor, Client Consulting, AWS, Azure, GCP, Kubernetes, Greenhouse, Zoom, Wagmi, CPC