Senior Machine Learning Engineer

ABOUT CLIENT

Our client is a global technology company that specializes in providing innovative IT solutions for the financial services industry.

JOB DESCRIPTION

Creating the V1 Evaluation Platform: You will be responsible for designing and building the core backend systems for our new LLM Evaluation Platform, using Arize Phoenix as the basis for traces, evaluations, and experiments.
Implementing Production Observability: You will architect and implement the observability backbone for our AI services by integrating Phoenix with OpenTelemetry, establishing a centralized system for logging, tracing, and evaluating LLM behavior in production.
Standardizing LLM Deployment Pipeline: You will be in charge of designing and implementing the CI/CD framework for versioning, testing, and deploying prompt-based logic and LLM configurations, ensuring reproducible and auditable deployments across all AI features.
Providing Practical Solutions: Your role will involve making pragmatic technical decisions that prioritize business value and speed of delivery, in line with our early-stage startup environment.
Collaborating with Other Teams: You will work closely with the Data Science team to understand their workflow and ensure that the platform you build meets their core needs for experiment tracking and validation.
Establishing Core Patterns: You will help establish and document the initial technical patterns for MLOps and model evaluation that will serve as the foundation for future development.

JOB REQUIREMENTS

At least 5 years of software engineering experience, particularly in backend or platform systems.
Proficiency in Python and a track record of constructing reliable, testable, and maintainable production systems.
Substantial hands-on experience in productionizing and assessing modern ML/LLM systems, including familiarity with evaluation frameworks such as Arize Phoenix, LangSmith, or similar platforms.
Practical knowledge of modern observability frameworks, particularly OpenTelemetry.
Ability to apply practical and effective solutions to problems, emphasizing robustness while avoiding over-engineering.
Hands-on expertise with Arize Phoenix for LLM observability, including the setup of custom evaluators, dataset management, and implementation of trace-based debugging.
Familiarity with core AWS services in a platform context (Kubernetes/EKS, RDS, S3, IAM).
Previous experience in a startup environment and willingness to work in a fast-paced and ambiguous setting.

WHAT'S ON OFFER

We offer a professional and enjoyable working atmosphere.
We prioritize your long-term development.
We are dedicated to creating a future-ready digital bank platform.
Competitive salary
13th-month salary guarantee
Performance bonus
Access to professional English courses
Premium health insurance
Generous annual leave allowance

CONTACT

PEGASI – IT Recruitment Consultancy | Email: recruit@pegasi.com.vn | Tel: +84 28 3622 8666
We are PEGASI – IT Recruitment Consultancy in Vietnam. If you are looking for a new opportunity in your career, please visit our website at www.pegasi.com.vn. Thank you!

Job Summary

Company Type: Outsource
Technical Skills: Machine Learning
Location: Ho Chi Minh, Ha Noi - Viet Nam
Working Policy: Hybrid
Salary: Negotiable
Job ID: J01914
Status: Closed
