DevOps Engineer

ABOUT CLIENT

Our client is a leading research company specializing in technology innovation.

JOB DESCRIPTION

Managing and developing our Kubernetes platform across multiple clusters and environments, including production, development, on-premises, and public cloud.
Designing and overseeing hybrid cloud infrastructure across on-premises and public clouds (such as GCP and AWS), including workload placement, cross-cloud networking, and unified resource management.
Taking responsibility for the end-to-end CI/CD and GitOps process, including container build pipelines, image optimization, and progressive delivery using tools like ArgoCD/FluxCD.
Taking charge of the observability stack to provide a comprehensive view across all clusters using tools like Grafana, Mimir, Tempo, Loki, Pyroscope, OnCall, Prometheus, and supporting agent-assisted SRE workflows.
Managing and enhancing our inference platform, including vLLM serving and AIBrix for multi-model orchestration and autoscaling with a fleet of NVIDIA GPUs.
Operating platform services such as Kafka, Redis, PostgreSQL, and OpenSearch.
Managing identity and access management with Keycloak integrated with Google Workspace, strengthening SSO, RBAC, and secrets management across the platform.
Strengthening network security across private load balancers, firewalls, and VPC segmentation, and designing and maintaining hub-and-spoke/multi-AZ topologies.
Supporting training infrastructure with self-service VM provisioning, RunPod burst capacity, and Weights and Biases integration.
Driving infrastructure reliability, cost efficiency, and capacity planning as the platform scales.

JOB REQUIREMENTS

Strong production experience with Kubernetes, including workloads and controllers, networking, storage, RBAC, and autoscaling. Familiarity with both cloud-managed and on-premises/self-managed Kubernetes is a plus.
Design-level networking experience, including the ability to defend tradeoffs in real network topologies such as hub-and-spoke, multi-AZ/multi-VPC, and equivalent enterprise patterns. Comfortable with VPCs, firewalls, load balancers, private cluster architecture, DNS, and routing. On-premises networking experience is a strong plus.
Proficiency in building and optimizing Dockerfiles and owning full CI/CD pipelines. Experience adapting CI/CD pipelines for deployments to Kubernetes is a bonus.
Previous experience in setting up and operating a full observability stack in production, including metrics, logs, traces, and alerting. Familiarity with the Grafana stack is a strong plus.
Comfort with SSO and identity, including integrating tools with a central IdP.
Strong Linux proficiency, infrastructure-as-code, and configuration management skills.
An ownership mindset and familiarity with operating in high-ownership environments.
Hands-on experience with Kafka, Redis, PostgreSQL, or OpenSearch at production scale is optional but valuable.
Bonus points for experience with OpenStack and KVM virtualization, familiarity with vLLM internals, a background in AI/ML infrastructure or GPU cluster operations, experience with KEDA or event-driven autoscaling patterns, prior open-source contributions, and kernel-level Linux debugging and performance tuning.

WHAT'S ON OFFER

Join a renowned research team to work on impactful projects
Take ownership of the core training code infrastructure used by the team
Engage with real models, data, and scale, rather than small-scale problems
Contribute to bridging the gap between research velocity and engineering quality
Enjoy a flexible work environment with a culture that values depth, clarity, and curiosity

CONTACT

PEGASI – IT Recruitment Consultancy | Email: recruit@pegasi.com.vn | Tel: +84 28 3622 8666
We are PEGASI – IT Recruitment Consultancy in Vietnam. If you are looking for a new opportunity in your career, please visit our website, www.pegasi.com.vn. Thank you!

JOB SUMMARY

Company Type: Product
Technical Skills: DevOps, Kubernetes, Network
Location: Others - Viet Nam
Working Policy: Onsite
Job ID: J02107
Status: Active
