DevOps Engineer

ABOUT CLIENT

Our client is a leading research company specializing in technology innovation.

JOB DESCRIPTION

Managing and developing our Kubernetes platform across multiple clusters and environments including production, development, on-premises and public cloud.
Designing and overseeing hybrid cloud infrastructure across on-premises and public clouds (such as GCP, AWS), including workload placement, cross-cloud networking, and unified resource management.
Taking responsibility for the end-to-end CI/CD and GitOps process, including container build pipelines, image optimization, and progressive delivery using tools like ArgoCD/FluxCD.
Taking charge of the observability stack to provide a comprehensive view across all clusters using tools like Grafana, Mimir, Tempo, Loki, Pyroscope, OnCall, Prometheus, and supporting agent-assisted SRE workflows.
Managing and enhancing our inference platform, including vLLM serving and AIBrix for multi-model orchestration and autoscaling with a fleet of NVIDIA GPUs.
Operating platform services such as Kafka, Redis, PostgreSQL, OpenSearch.
Managing identity and access management with Keycloak integrated with Google Workspace, strengthening SSO, RBAC, and secrets management across the platform.
Strengthening network security across private load balancers, firewalls, and VPC segmentation, and designing and maintaining hub-and-spoke/multi-AZ topologies.
Supporting training infrastructure with self-service VM provisioning, RunPod burst capacity, and Weights and Biases integration.
Driving infrastructure reliability, cost efficiency, and capacity planning as the platform scales.

JOB REQUIREMENTS

Strong production experience with Kubernetes, including workloads and controllers, networking, storage, RBAC, and autoscaling. Familiarity with both cloud-managed and on-premises/self-managed Kubernetes is a plus.
Design-level networking experience, including the ability to defend tradeoffs in real network topologies such as hub-and-spoke, multi-AZ/multi-VPC, and equivalent enterprise patterns. Comfortable with VPCs, firewalls, load balancers, private cluster architecture, DNS, and routing. On-premises networking experience is a strong plus.
Proficiency in building and optimizing Dockerfiles and owning full CI/CD pipelines. Experience with CI/CD changes when deploying to Kubernetes is a bonus.
Previous experience in setting up and operating a full observability stack in production, including metrics, logs, traces, and alerting. Familiarity with the Grafana stack is a strong plus.
Comfort with SSO and identity, including integrating tools with a central IdP.
Strong Linux proficiency, infrastructure-as-code, and configuration management skills.
An ownership mindset and familiarity with operating in high-ownership environments.
Hands-on experience with Kafka, Redis, PostgreSQL, or OpenSearch at production scale is optional but valuable.
Bonus points for experience with OpenStack and KVM virtualization, familiarity with vLLM internals, a background in AI/ML infrastructure or GPU cluster operations, experience with KEDA or event-driven autoscaling patterns, prior open-source contributions, and kernel-level Linux debugging and performance tuning.

WHAT'S ON OFFER

Join a renowned research team to work on impactful projects
Take ownership of the core training code infrastructure used by the team
Engage with real models, data, and scale, rather than small-scale problems
Contribute to bridging the gap between research velocity and engineering quality
Enjoy a flexible work environment with a culture that values depth, clarity, and curiosity

CONTACT

PEGASI – IT Recruitment Consultancy | Email: recruit@pegasi.com.vn | Tel: +84 28 3622 8666
We are PEGASI – IT Recruitment Consultancy in Vietnam. If you are looking for a new opportunity in your career, please visit our website www.pegasi.com.vn. Thank you!

Job Summary

Company Type: Product
Technical Skills: Devops, Kubernetes, Network
Location: Others - Viet Nam
Working Policy: Onsite
Salary: Negotiation
Job ID: J02107
Status: Active
