DevOps Engineer

ABOUT CLIENT

Our client is a leading research company specializing in technology innovation.

JOB DESCRIPTION

Managing and developing our Kubernetes platform across multiple clusters and environments, including production, development, on-premises, and public cloud.
Designing and overseeing hybrid cloud infrastructure across on-premises and public clouds (such as GCP and AWS), including workload placement, cross-cloud networking, and unified resource management.
Taking responsibility for the end-to-end CI/CD and GitOps process, including container build pipelines, image optimization, and progressive delivery using tools like ArgoCD/FluxCD.
Taking charge of the observability stack to provide a comprehensive view across all clusters using tools like Grafana, Mimir, Tempo, Loki, Pyroscope, OnCall, Prometheus, and supporting agent-assisted SRE workflows.
Managing and enhancing our inference platform, including vLLM serving and AIBrix for multi-model orchestration and autoscaling with a fleet of NVIDIA GPUs.
Operating platform services such as Kafka, Redis, PostgreSQL, and OpenSearch.
Managing identity and access management with Keycloak integrated with Google Workspace, strengthening SSO, RBAC, and secrets management across the platform.
Strengthening network security across private load balancers, firewalls, and VPC segmentation, and designing and maintaining hub-and-spoke/multi-AZ topologies.
Supporting training infrastructure with self-service VM provisioning, RunPod burst capacity, and Weights and Biases integration.
Driving infrastructure reliability, cost efficiency, and capacity planning as the platform scales.

JOB REQUIREMENTS

Strong production experience with Kubernetes, including workloads and controllers, networking, storage, RBAC, and autoscaling. Familiarity with both cloud-managed and on-premises/self-managed Kubernetes is a plus.
Design-level networking experience, including the ability to defend tradeoffs in real network topologies such as hub-and-spoke, multi-AZ/multi-VPC, and equivalent enterprise patterns. Comfortable with VPCs, firewalls, load balancers, private cluster architecture, DNS, and routing. On-premises networking experience is a strong plus.
Proficiency in building and optimizing Dockerfiles and owning full CI/CD pipelines. Experience adapting CI/CD pipelines for deployments to Kubernetes is a bonus.
Previous experience in setting up and operating a full observability stack in production, including metrics, logs, traces, and alerting. Familiarity with the Grafana stack is a strong plus.
Comfort with SSO and identity, including integrating tools with a central IdP.
Strong Linux proficiency, infrastructure-as-code, and configuration management skills.
An ownership mindset and familiarity with operating in high-ownership environments.
Hands-on experience with Kafka, Redis, PostgreSQL, or OpenSearch at production scale is optional but valuable.
Bonus points for experience with OpenStack and KVM virtualization, familiarity with vLLM internals, a background in AI/ML infrastructure or GPU cluster operations, experience with KEDA or event-driven autoscaling patterns, prior open-source contributions, and kernel-level Linux debugging and performance tuning.

WHAT'S ON OFFER

Join a renowned research team to work on impactful projects
Take ownership of the core training code infrastructure used by the team
Engage with real models, data, and scale, rather than small-scale problems
Contribute to bridging the gap between research velocity and engineering quality
Enjoy a flexible work environment with a culture that values depth, clarity, and curiosity

CONTACT

PEGASI – IT Recruitment Consultancy | Email: recruit@pegasi.com.vn | Tel: +84 28 3622 8666
We are PEGASI – IT Recruitment Consultancy in Vietnam. If you are looking for a new opportunity on your career path, kindly visit our website www.pegasi.com.vn for more information. Thank you!

Job Summary

Company Type: Product
Technical Skills: DevOps, Kubernetes, Network
Location: Others - Viet Nam
Working Policy: Onsite
Job ID: J02107
Status: Active
