The Operations Engineer religiously checks for alerts, validates them against standard limits or thresholds, and provides clear communication and rapid response for every business-impacting, service-related event. He or she tunes alerts so that they are relevant, descriptive, and actionable for the appropriate team, and applies these changes within a relevant timeframe to minimize negative impact on consumers.
Responsibilities:
- Troubleshoot complex problems, provide software fault diagnosis, and resolve operational issues and performance bottlenecks.
- Collaborate with Global SRE, Product Delivery, Product Engineering, and Customer Care teams in delivering a true Cloud SaaS experience to our customers 24x7.
- Ensure consistent service availability by monitoring our environments’ stability and performance using the right metrics and tooling.
- Incident and Problem Management – Execute incident response plays, lead major incident bridges, and participate in the post-incident review process for incident prevention.
- Develop and manage automation to reduce manual processes and tasks and realize operational efficiencies.
- Drive capacity planning by monitoring system resource utilization, errors, and alert trends.
- Document system architectures, systems configurations, and technical operational processes and policies.
- Work within one of our 24x7 schedules (Sunday – Thursday or Tuesday – Saturday) and shifts (morning, mid, or night).
- Participate in maintenance activities and on-call rotations as required.
- Execute disaster recovery plans and report on metrics related to those activities.
Required Skills and Experience:
- Bachelor's degree in Computer Science or an equivalent field.
- 1-2 years of experience with any of the following operations systems: AppDynamics, Splunk, PRTG, SolarWinds DPA, Nagios, New Relic, PagerDuty, Opsgenie, Jira Service Management
- 1-2 years of experience applying an automation-first approach to problem solving, leveraging configuration management tools and scripting (e.g., Bash, Python, PowerShell)
- Experience with Incident Management and ITIL service operations
- Passionate and curious about ways to leverage technology, with a habit of self-directed learning
- Must be detail-oriented, results-driven, and have excellent English communication skills.
- Ability to work effectively in a team environment, both inside and outside the organization, to accomplish goals and objectives and to identify and resolve problems.
Bonus Skills and Experience:
- Experience managing and operating enterprise-grade Windows, Linux, Azure, AWS, and VMware production environments
- Experience with any of the following operations systems: AppDynamics, Splunk, PRTG, SolarWinds DPA, Nagios, New Relic, PagerDuty, Opsgenie, Jira Service Management
- Understanding of high availability and disaster recovery strategies.
- Understanding of software development lifecycle (SDLC) and agile development.
- Experience with configuration management and orchestration (e.g., Terraform, CloudFormation, Ansible).
- Experience with continuous integration tools (e.g., GitHub, Azure DevOps)
- Experience with the AWS CLI.
- Experience with Web Services, cXML, EDI, or other integration methods preferred
- Experience with infrastructure-as-code, self-healing, and security automation patterns is an advantage.
- Change Management
- Business Analysis
- Incident Management
- Information Technology
- Computer Science
- Computer Engineering
- Job Level: Associate / Supervisor
- Job Category: IT and Software
- Educational Requirement: Bachelor's degree graduate
- Office Address: Ground Floor ELJ Building, ABS-CBN Compound, Mother Ignacia Avenue, 1100 Metro Manila, Quezon City, Metro Manila, Philippines
- Entertainment / Film / News and Current Affairs / Public Service / Publishing / Digital, etc.
- 1 opening