Platform Lead
Others - Singapore
Product
- Backend
- Devops
- Data Engineering
Develop and expand distributed systems to handle large volumes of sensory, telemetry, and control data across cloud and edge environments, facilitating real-time connections for fleets of robots. Create the API Platform with a focus on high reliability, exceptional developer experience, and robust multimodal AI capabilities accessible through user-friendly APIs and SDKs. Establish extensive training and inference platforms for foundation models used in robot autonomy, teleoperation, and developer integrations. Devise data ingestion and streaming pipelines for real-time connectivity of robot fleets to the cloud, covering various data inputs such as video, LiDAR, joint states, and audio. Oversee and advance a modern cloud-native infrastructure stack employing Kubernetes, Docker, and infrastructure-as-code tools. Ensure platform reliability through telemetry, monitoring, alerting, autoscaling, failover, and disaster recovery measures. Make infrastructure decisions pertaining to distributed storage, consensus protocols, GPU orchestration, network reliability, and API security. Foster collaboration across ML, robotics, and product teams to facilitate hardware-in-the-loop simulation, policy rollout, continuous learning, and CI/CD workflows. Implement secure APIs featuring fine-grained access control, usage metering, rate limiting, and billing integration to accommodate a growing user base.
Negotiation
Senior System Software Engineer - AI Data Platform - Inference Factory
Ho Chi Minh - Viet Nam
Product
- Devops
- C/C++
- Python
- ...
Create infrastructure and tools to automate complex software processes effectively. Improve performance: Deploy advanced test harnesses, benchmarking frameworks, and analytical tools to thoroughly evaluate and enhance the performance and efficiency of software and hardware platforms. Utilize expertise in operating systems, kernel internals, device drivers, memory management, storage, networking, and high-speed interconnects to construct and troubleshoot high-performance systems. Collaborate with engineering teams to comprehend requirements and deliver efficient solutions. Establish performance objectives, assess feedback, analyze data, and continually enhance system reliability. Shape technical strategies: Contribute to developing technical strategies and roadmaps for platform automation initiatives to ensure they are in line with company goals and industry best practices.
Negotiation
Senior DevOps Engineer
Ho Chi Minh - Viet Nam
Product
- Devops
- Cloud
- Kubernetes
- AWS
Oversee the Management of VM/Cloud Infrastructure: Ensure the stability and optimal performance of web servers and cloud services. Manage infrastructure on-prem and in the cloud according to DevSecOps best practices, including network design and segmentation. Create and Maintain Scripts and Tools: Develop and maintain scripts (Bash, Python) to automate tasks and enhance system efficiency. Contribute to our Monitoring System: Establish and oversee monitoring systems using Prometheus and Grafana to track system performance and send alerts. Set up CI/CD Pipelines: Implement and manage automated deployment pipelines using GitLab CI, ArgoCD, and FluxCD. Infrastructure Design and Optimization: Design and optimize infrastructure to ensure stability, minimize downtime, and improve overall performance. Management of Web Servers and Platforms: Handle web server configurations and security, including Nginx, Kubernetes ingress, load balancers, DNS, WAF, and firewall rules, to ensure high availability and secure operations. Collaborate with Development Teams: Work closely with development and production teams to streamline deployment processes and address system and security-related issues.
Negotiation
Storage System Engineer (Linux)
Ho Chi Minh - Viet Nam
Outsource
- System
Monitoring storage performance, capacity, and availability for optimal performance and reliability. Troubleshooting storage-related issues and providing timely resolutions to users. Developing and maintaining scripts and automation tools for storage administration tasks. Performing regular data backup and recovery procedures to ensure data availability.
Negotiation
Operation Engineer (Python, English)
Ho Chi Minh - Viet Nam
Outsource
- Python
- Devops
Address and resolve operational issues via the JIRA Service Desk from internal teams and researchers. For more complex matters, you’ll coordinate with the DevOps team while still ensuring ownership and resolution. Enhance operational efficiency by analyzing recurring issues, suggesting improved workflows, and fostering clear communication between teams. Create automation tools to streamline operations, including the exploration of bots for repetitive tasks and the development of scripts to auto-categorize or manage tickets in JIRA. Collaborate with DevOps, Engineering, and Research teams to uphold stable and scalable system operations.
Negotiation
Director Engineering – Software Engineering and AI Inferencing Platforms
Ho Chi Minh, Ha Noi - Viet Nam
Product
- Management
- Backend
- ...
Lead and expand engineering teams in Vietnam across system software, data science, and AI platforms. Drive the creation, structure, and delivery of high-performance system software platforms that support AI products and services. Collaborate with global teams across Machine Learning, Inference Services, and Hardware/Software integration to guarantee performance, reliability, and scalability. Oversee the development and optimization of AI delivery platforms in Vietnam, including NIMs, Blueprints, and other flagship services. Collaborate with open-source and enterprise data and workflow ecosystems to advance accelerated AI factory, data science, and data engineering workloads. Promote continuous integration, continuous delivery, and engineering best practices across multi-site R&D Centers. Work with product management and other stakeholders to ensure enterprise readiness and customer impact. Establish and implement standard processes for large-scale, distributed system testing including stress, scale, failover, and resiliency testing. Ensure security and compliance testing aligns with industry standards for cloud and data center products. Mentor and develop talent within the organization, fostering a culture of quality and continuous improvement.
Negotiation
Senior DevOps Engineer
Ho Chi Minh, Ha Noi - Viet Nam
Product
- Devops
- AI
- Cloud
Contribute to the development and maintenance of advanced machine learning software and frameworks with a focus on performance and scalability. Improve CI/CD pipelines to make the development, testing, and deployment of large-scale machine learning models more efficient. Set up and manage cloud infrastructure for continuous integration, delivery, and deployment, ensuring high availability and scalability. Work closely with teams from various departments to enhance development workflows and software delivery speed and quality. Address and resolve complex issues related to software development, containerization, and cloud infrastructure in production environments. Create and update detailed documentation for development and deployment processes. Effectively communicate with both technical and non-technical stakeholders to align expectations and provide transparency throughout the release and deployment process. Oversee code reviews, testing, and debugging to maintain high-quality code and streamline workflows. Provide mentorship and guidance to junior engineers to support their professional growth and improve team capabilities.
Negotiation
Chief Technology Officer
Ho Chi Minh - Viet Nam
Product
- Backend
- Frontend
- Devops
Technology Leadership & System Scalability: Oversee and optimize the entire tech stack and architecture to ensure high availability, security, and resilience. Lead infrastructure scaling to support a high volume of daily transactions across multiple services. Implement advanced AI and automation solutions to enhance platform performance. Improve system observability, monitoring, and disaster recovery strategies. Drive cost-efficient CapEx planning, balancing performance against budget. Team & Culture Building: Manage and mentor a cross-functional tech team of more than 100 members (Backend, Mobile, SRE, QA, Data science…). Cultivate a high-speed, ownership-driven culture that promotes integrity, collaboration, and accountability. Develop technical leadership pipelines, ensuring top talent development and retention. Cultivate an engineering culture that embraces innovation, continuous learning, and rapid iteration. Future-Readiness & Technology Planning: Define a 3-year technology roadmap aligned with business goals and market evolution. Evaluate emerging tech trends (AI, cloud, blockchain, automation) and implement relevant solutions. Collaborate with business and operations teams to align tech investments with growth strategies. Execution & Engineering Excellence: Establish world-class engineering practices (CI/CD, DevOps, microservices, cloud architecture). Optimize for high performance, low latency, and fault tolerance in real-time operations. Lead technical transformations, re-architecture projects, and innovation initiatives.
Negotiation
Senior DevOps (Data Platform)
Ho Chi Minh - Viet Nam
Product
- Devops
- Spark
Managing workloads on EC2 clusters using Databricks/EMR for efficient data processing. Collaborating with stakeholders to implement a Data Mesh architecture for multiple closely related enterprise entities. Utilizing Infrastructure as Code (IaC) tools for defining and managing data platform user access. Implementing role-based access control (RBAC) mechanisms to enforce least privilege principles. Collaborating with cross-functional teams to design, implement, and optimize data pipelines and workflows. Utilizing distributed engines such as Spark for efficient data processing and analysis. Establishing operational best practices for data warehousing tools. Managing storage technologies to meet business requirements. Troubleshooting and resolving platform-related issues. Staying updated on emerging technologies and industry trends. Documenting processes, configurations, and changes for comprehensive system documentation.
Negotiation
Python Developer (Operations Team)
Ho Chi Minh - Viet Nam
Outsource
- Python
- Devops
The role requires managing and resolving issues, with more complex problems being escalated to the DevOps team. The support engineer will maintain ownership even after escalation. The position also involves devising solutions to enhance issue resolution efficiency. Examples include automating common tasks using bots and writing scripts to categorize and manage JIRA Service Desk tickets programmatically.
Negotiation
Distributed Systems Engineer
Others - Viet Nam
Product
- Data Engineering
- Devops
- Golang
- ...
Design and create distributed systems capable of handling large amounts of sensory, telemetry, and control data across cloud and edge environments. Plan and implement data ingestion and streaming pipelines to connect groups of robots to the cloud in real-time (video, LiDAR, joint states, audio). Construct platforms for extensive training and inference to support robot autonomy and teleoperation using foundation models. Work closely with ML and Robotics engineers to assist in hardware-in-the-loop simulation, policy rollout, and continuous learning initiatives. Create internal observability systems to monitor fleet performance, reliability, and tuning. Take the lead on infrastructure decisions such as distributed storage, consensus protocols, GPU orchestration, and network reliability.
Negotiation
MLOps Engineer
Ho Chi Minh - Viet Nam
Product
- Machine Learning
- Devops
Develop and maintain training and inference pipelines using PyTorch, which includes DDP support, mixed precision, checkpointing, experiment versioning, and reproducible evaluation workflows. Take ownership of and advance inference serving infrastructure using vLLM and SGLang, with a focus on debugging issues in inference stacks like tool call parsers and reasoning parsers, and optimizing for throughput and latency. Create and sustain robust tooling in Python and C++ to aid the complete training lifecycle, from data ingestion to model release. Optimize compute workloads for bare-metal environments, encompassing CPU/GPU utilization, memory bandwidth, and I/O throughput. Address low-level networking issues, distributed training errors, and hardware bottlenecks across NCCL, MPI, and high-speed interconnects like InfiniBand and RoCE. Set up and manage ML environments, covering containers, package management, GPU drivers, and runtime configurations. Establish CI/CD patterns for AI workloads, encompassing training, evaluation, quantization, and model release workflows. Integrate monitoring, alerting, anomaly detection, and incident response for both training jobs and inference services. Contribute to shared platform capabilities across reliability, observability, and cost management. Develop and maintain scalable runtime infrastructure for model-backed services and APIs, including support for LLM-backed APIs, MCP servers, and agentic systems.