An opportunity to join a top-tier investment bank. Our client is the first in the industry to create a Site Reliability Operations team modeled on Google's well-known SRE model. Your responsibility will be to ensure that the clients' experience of the technology platform is world-class.
Manage our production services by measuring and monitoring availability, latency and overall system health.
Automate tasks and push for changes that improve reliability and velocity.
Practice sustainable incident response and blameless postmortems.
Identify and build improvements to system behavior, control and monitoring tools.
Lead projects that continue to improve the availability, stability and performance of the system (e.g. disaster recovery, stress/capacity and exchange testing).
Help plan and execute new releases of software with development teams.
Bachelor's degree in Computer Science, a related technical field that involves programming, or equivalent practical experience.
3 years of technology experience in a commercial environment.
Solid analytical and problem-solving skills with an appreciation of technical risk.
Experience supporting applications in a Linux or Unix environment, with sound knowledge of incident and release management processes.
Project management of tasks, issue resolution, and escalation
Experience developing test cases and ensuring appropriate test coverage through unit and automated testing
Proactive problem solver with a sense of ownership and drive.
The Asia Pacific team is spread across Australia, Hong Kong, Japan, China and India. The role requires working as part of the larger global team and collaborating with counterparts across the region on both local and global initiatives.