SRE Lead/Manager (DevOps, AWS)

JOB DESCRIPTION

As the Support Site Reliability Engineering (SRE) leader, you will lead our efforts to establish a Support SRE team that works closely with The Company's Product SRE team to increase productivity. The ideal candidate will use leadership and technical skills to streamline the operational tasks that affect Product SRE team efficiency, collaborating with SRE teams located in Japan and Vietnam.

Design and execute the Support SRE team's strategic roadmap.

Collaborate with The Company's Product SRE teams to identify opportunities for improving operational efficiency and reducing toil.

Mentor and coach team members to foster their growth and development in technical and collaboration areas.

Drive a culture of continuous improvement and knowledge sharing within the team.

Design and implement automation solutions to standardize operational tasks, reducing manual effort and improving efficiency.

Develop and maintain tools, scripts and processes to automate routine operational tasks.

Build, maintain, and improve our infrastructure, including monitoring, diagnosing, and resolving incidents promptly.

Participate in incident response, on-call rotations, and post-mortem analysis.

JOB REQUIREMENT

At least 5 years of experience as a DevOps Engineer or in a similar role (experience with on-premises environments is a plus).

3+ years of hands-on experience with AWS or other cloud platforms. Experience with managed AWS services is a plus.

Solid understanding of CI/CD pipelines and best practices.

Working understanding of containerization technologies (Docker and Kubernetes).

Experience with monitoring and logging solutions.

Proficiency with IaC (e.g., Terraform).

Deep understanding and hands-on experience with MySQL or similar relational databases.

Proven track record in training and educating team members, promoting a culture of continuous learning.

Strong ownership and responsibility, with a proactive and solutions-oriented mindset.

Experience in developing and operating web applications built in Go or Ruby is a plus.

Project management experience.

English language proficiency at a professional working level.

People management or team leadership experience is a plus.

WHAT'S ON OFFER

Caring Mental & Physical Recreation:

Hybrid working: 2 days at the office and 3 days WFH

Working hours: flexible start between 8 AM and 9 AM, Monday to Friday

Full salary during probation

Insurance (applied from the probation period):

Social Insurance, Health Insurance, Unemployment Insurance (on 100% salary)

Private health insurance and accident insurance; from manager level, extra coverage for family members

Bonus: 13th month salary

17 - 24 paid days off and more

Paternity leave: Extra 5 days

Annual company trip; Quarterly team building

Billiards & Running club

Annual health check

Well-equipped facility: MacBook Pro, additional monitor, etc.

Caring Career & Development:

Clear Career path

Sponsorship for foreign language and international technology-related certifications

External & internal training courses

Soft-skill workshops

Tech seminars

Monthly and biannual Recognition Awards

Performance & salary review: twice/year (Jun & Dec)

CONTACT

PEGASI – IT Recruitment Consultancy | Email: recruit@pegasi.com.vn | Tel: +84 28 3622 8666

We are PEGASI – IT Recruitment Consultancy in Vietnam. If you are looking for a new opportunity on your career path, kindly visit our website www.pegasi.com.vn for your reference. Thank you!

Job Summary

Company Type: Product

Technical Skills: DevOps, AWS

Location: Ha Noi - Viet Nam

Job ID: J01508
