Senior Data Engineer, Log Analytics

ABOUT CLIENT

Our client is a leading technology company specializing in graphics processing units (GPUs) and artificial intelligence (AI).

JOB DESCRIPTION

Take charge of creating cutting-edge technologies and strategies, focusing on visualizing and analyzing log data effectively.
Precisely analyze Syslog and telemetry data to predict and categorize errors and job failures.
Work with team members to set new technology directions and deploy advanced log analytics solutions.
Collaborate with teams from different areas to identify effective data analysis strategies and methods.

JOB REQUIREMENT

Doctorate in Computer Science, Engineering, or a related field.
Extensive experience in data analysis, specifically log analysis for at least 10 years.
Proficiency in data analytics and log analysis, with practical skills in Syslog/telemetry data analysis.
Successful track record in predictive and classification analytics.
Strong problem-solving and collaboration skills.
Demonstrated ability to adapt to new analytic technology and methodologies.
Previous experience in a management or leadership role is beneficial for higher-level positions.
Proficiency in utilizing Large Language Models (LLMs) for advanced syslog analysis and mining.
Proven success in working with NVIDIA systems or similar GPU-accelerated platforms.
Demonstrated excellence in creating innovative data models or algorithms to enhance log analytics efficiency.
Advanced proficiency in developing and implementing AI solutions tailored for large-scale data processing tasks.
Prior experience leading projects that required interdisciplinary collaboration and cutting-edge technology deployment.

WHAT'S ON OFFER

CONTACT

PEGASI – IT Recruitment Consultancy | Email: recruit@pegasi.com.vn | Tel: +84 28 3622 8666
We are PEGASI – IT Recruitment Consultancy in Vietnam. If you are looking for new opportunity for your career path, kindly visit our website www.pegasi.com.vn for your reference. Thank you!

Job Summary

Company Type:

Product

Technical Skills:

Data Engineering

Location:

Ho Chi Minh - Viet Nam

Working Policy:

Onsite

Salary:

Negotiation

Job ID:

J01970

Status:

Close

Related Job:

Partner Implementation Engineer (Security & Digital Trust)

Ha Noi - Viet Nam


Outsource

  • System

Đóng vai trò là người thực hiện triển khai chủ chốt, chịu trách nhiệm triển khai, cấu hình và tích hợp các giải pháp Security & Digital Trust (PKI, Chữ ký số, Mã hóa, MFA) vào hệ thống thực tế của khách hàng, đảm bảo hệ thống vận hành ổn định, bảo mật và đúng thiết kế. Triển khai hệ thống (Implementation) Chuẩn bị môi trường: kiểm tra hạ tầng (Server, Hệ điều hành, Cơ sở dữ liệu, Mạng) Cài đặt & cấu hình giải pháp: PKI / CA / Chữ ký số / MFA / Mã hóa Thiết lập chính sách bảo mật, quy trình nghiệp vụ Kết nối với thiết bị bảo mật (HSM, Quản lý Khóa) Triển khai trên nền tảng Cloud / Container (nếu có) Triển khai hệ thống trên Kubernetes / OpenShift Cấu hình tài nguyên (YAML: Pod, Dịch vụ, Ingress, Bản đồ Cấu hình, Bí mật) Thiết lập lưu trữ (Khối Lưu trữ Không gian); mạng nội bộ Áp dụng các chính sách bảo mật cho container Tích hợp hệ thống (Integration) Hỗ trợ tích hợp với: Trang web/ Ứng dụng/ Giao diện lập trình ứng dụng và IAM / SSO / AD / LDAP Hướng dẫn sử dụng API/SDK Kiểm tra luồng dữ liệu & bảo mật giao tiếp Phối hợp với nhóm khách hàng (Phát triển / Cơ sở hạ tầng / Bảo mật) Kiểm thử & nghiệm thu (QA/UAT) Thực hiện kiểm thử kỹ thuật & kịch bản vận hành Hỗ trợ UAT với khách hàng Kiểm tra tính đúng đắn của: Chữ ký số; Chứng thư và Luồng xác thực Vận hành & hỗ trợ Giám sát hệ thống, phân tích log, xử lý sự cố Hỗ trợ sau triển khai (L2/L3) Đảm bảo hệ thống hoạt động ổn định & HA Tài liệu & chuyển giao Xây dựng tài liệu triển khai (cấu trúc, cấu hình) Hướng dẫn vận hành cho khách hàng Đào tạo kỹ thuật cơ bản

Negotiation

View details

DevOps Engineer

Others - Viet Nam


Product

  • Devops
  • Kubernetes
  • Network

Managing and developing our Kubernetes platform across multiple clusters and environments including production, development, on-premises and public cloud. Designing and overseeing hybrid cloud infrastructure across on-premises and public clouds (such as GCP, AWS), including workload placement, cross-cloud networking, and unified resource management. Taking responsibility for the end-to-end CI/CD and GitOps process, including container build pipelines, image optimization, and progressive delivery using tools like ArgoCD/FluxCD. Taking charge of the observability stack to provide a comprehensive view across all clusters using tools like Grafana, Mimir, Tempo, Loki, Pyroscope, OnCall, Prometheus, and supporting agent-assisted SRE workflows. Managing and enhancing our inference platform, including vLLM serving and AIBrix for multi-model orchestration and autoscaling with a fleet of NVIDIA GPUs. Operating platform services such as Kafka, Redis, PostgreSQL, OpenSearch. Managing identity and access management with Keycloak integrated with Google Workspace, strengthening SSO, RBAC, and secrets management across the platform. Strengthening network security across private load balancers, firewalls, and VPC segmentation and designing and maintaining hub-and-spoke/multi-AZ topologies. Supporting training infrastructure with self-service VM provisioning, RunPod burst capacity, and Weights and Biases integration. Driving infrastructure reliability, cost efficiency, and capacity planning as the platform scales.

Negotiation

View details

Platform Engineer

Ho Chi Minh - Viet Nam


Product

  • Backend
  • Devops
  • Data Engineering

Create and maintain distributed infrastructure responsible for handling telemetry, sensory, and control data in cloud and edge environments Develop and operate data ingestion and streaming pipelines connecting robot fleets to the cloud in real-time, including video, joint states, audio, and LiDAR Build and manage backend services and APIs for the developer-facing platform, prioritizing reliability and excellent developer experience Oversee and improve cloud-native infrastructure utilizing Kubernetes, Docker, and infrastructure as code tooling Ensure platform reliability through monitoring, alerting, autoscaling, failover, and incident response Provide support to ML and robotics teams with data infrastructure for training pipelines, policy rollout, and hardware-in-the-loop simulation Implement secure APIs with access control, rate limiting, and usage metering to accommodate scaling efforts

Negotiation

View details