Senior Site Reliability Engineer


Ensure zero downtime incidents
Design solutions to increase system uptime
System request tracing and visualization
Cooperate with functional and other related teams to propose solutions for and resolve incidents
Help with problem resolution and incident firefighting
Cross system monitoring automation and anomaly detection
Ensure high availability of our solutions, with backup processes where needed (e.g., Scoring OTP, automated backup processes)
Chaos engineering
Help solve cross-team and cross-department technical issues
System stability: automate fallback approaches (feature toggles)
System monitoring alerts (API/DB…)
System capacity planning (from servers to DB/infrastructure)
Site stability statistics and actions: traffic count, response time, capacity, error rate and specific errors, fallback strategy (proper use of Hystrix/fail-fast)
What You'll Do:
Work with functional teams to provide system observability, stability, and visualization
You are responsible for keeping our infrastructure humming as new releases and maintenance updates are rolled out
You will help organize, secure, and automate existing infrastructure and deployments
You will work closely with developers to provide feedback and drive operational improvements within our products and operations infrastructure
You will be responsible for ensuring that our platform is stable and balanced
Maintain high site uptime while embracing rapid change and growth
Scale infrastructure to meet increasing demand and evolving technology
Help the dev teams working on our codebases achieve zero-downtime deployments
Develop and improve operational practices and procedures
You will coordinate and participate in on-call rotations


Good knowledge of the technologies (or related technologies) in the categories below:
At least 5 years of experience with Linux server-based technologies
Excellent English communication and documentation skills
Any certification in related fields (e.g., k8s admin, Azure) or equivalent experience is a plus
Strong Bash shell scripting skills
Experience with CI/CD tools: Jenkins, GitLab
Production-grade Kubernetes (K8s) administration
Infrastructure as Code: Azure Resource Manager, Terraform 
System configuration tools: Ansible, Chef 
Container & Container Orchestration: Kubernetes, Docker 
Monitoring & Logging: Prometheus, Grafana, Elasticsearch, Splunk
Middleware & Cache: Kafka, RabbitMQ, Redis Cache
Ability to work independently and under pressure
Independent problem-solving, self-direction
Ability to concentrate and pay close attention to detail
Friendly and team-oriented
Accountable for your work and open to giving feedback
Ethics and integrity.
Welcome challenges and willing to learn and apply new technologies
You are "lazy" and would love to automate everything
Professional English communication, both verbal and written
Up to date with new technology trends (DevOps, cloud computing, big data)
Intermediate reporting skills
Good documentation skills
Ability to join projects as a hands-on contributor
Experience with Python or Java is a plus
Experience participating in the full SDLC
Experience with microservices design patterns
Strong service mentality, “can-do” attitude, strong drive to succeed
Ability to work in a dynamic environment and provide recommendations to improve operations


Monthly allowance (600k)
13th-month Salary and performance-based KPI Bonus (1-3 months)
15+ days of Annual Leave
Flexible working hours, Work-from-home policies
Full Social Insurance, 24/7 Accidental Insurance, Annual Medical Check-up, Premium PTI 24/7
Team Building and CSR activities: Year-end Party, New-year Party, Company trip, Charity activities, Blood donation
Learning workshops: Free Udemy E-learning, English courses, Senior management development training programs


PEGASI – IT Recruitment Consultancy | Email: | Tel: +84 28 7109 9077
We are PEGASI – IT Recruitment Consultancy in Vietnam. If you are looking for a new opportunity in your career path, kindly visit our website for reference. Thank you!

Job Summary

Ho Chi Minh - Viet Nam