Minor Food

Site Reliability Engineer (SRE)

Digital Product - Minor Food
Entry Level: Full-time
Bangkok, Thailand

Company Description

Minor Food is one of Asia's largest casual dining and quick-service restaurant companies. We are aggressively accelerating business growth in both domestic and global markets.

We operate more than 1,600 restaurants in 21 countries under The Pizza Company, Swensen's, Sizzler, Dairy Queen, Burger King, The Coffee Club, Bonchon, Ribs and Rumps, Riverside, Penang Street, and Poulet.

Job Description

You will be working with a passionate and talented team at Minor Food Group. As a Site Reliability Engineer, you will have the opportunity to work with exciting brands and channels and to optimize the end-to-end operations of the system.

 

As a key member of the team, your main responsibilities are:

 

●        Join an on-call duty rotation to respond to availability incidents on Minor’s digital channels and to support service engineers with customer incidents.

●        Use your on-call shift to prevent incidents from ever happening.

●        Make monitoring and alerting fire on symptoms, not on outages.

●        Document every action so that your findings turn into repeatable actions, and then into automation.

●        Improve the deployment process to make it as boring as possible.

●        Design, build, and maintain core infrastructure pieces that allow scaling to support hundreds of thousands of concurrent users.

●        Debug production issues across services and levels of the stack.

●        Run the production environment by monitoring availability and taking a holistic view of system health.

●        Build software and systems to manage platform infrastructure and applications.

●        Improve reliability, quality, and time-to-market of our suite of software solutions.

●        Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.

●        Provide primary operational support and engineering for the squad.

●        Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding.

●        Partner with development teams to improve services through rigorous testing and release procedures.

●        Participate in system design consulting, platform management, and capacity planning.

●        Create sustainable systems and services through automation and uplifts.

●        Balance feature development speed and reliability with well-defined service level objectives.

Qualifications

●     Bachelor’s degree in Computer Science or another highly technical or scientific discipline

●     Working experience with AWS and GCP

●     In-depth knowledge of networking and distributed computing

●     Experience with application monitoring and telemetry

●     Experience conducting postmortems and driving process improvements

●     Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, C/C++, Ruby, or JavaScript

●     Experience with distributed storage technologies such as NFS and S3, as well as dynamic resource management frameworks (Kubernetes, ECS)

●     A proactive approach to spotting problems, areas for improvement, and performance bottlenecks

●     Comfortable working with CI/CD pipelines using Jenkins, Bitbucket, SonarQube, or related tooling
