MCS Group are proud to continue their relationship with a market leader in the FinTech sector, assisting in their search for a Site Reliability Engineer to join the team. Our client operates a 100% remote working policy and, to support this, offers a generous allowance to set up your home office as well as a monthly allowance to cover the costs of working from home.
Our client's HQ is based in the US, but the company has a significant global presence, with specialist teams throughout the world. It has invested heavily in its Northern Ireland team, which it sees as an integral part of the organisation. The NI team has gone through a major period of expansion in recent times, with further growth planned for the coming years, offering candidates both security in a new role and the excitement of the organisation's huge potential.
This is a great opportunity to play a crucial role in the design and development of our client's tooling, monitoring, control, self-service reporting and analysis. Duties and responsibilities include:
- Architect and develop solutions and road maps for monitoring the various systems that make up our client's operating environment, and use that telemetry for alert response and troubleshooting.
- Liaise with other teams across the organisation to develop innovative solutions that achieve high availability, scalability and reliability.
- Carry out hands-on scripting, tooling and automation work to support continuous operations.
- Detect incidents based on monitoring tools, notifications, and log files.
- Develop new monitors and modify existing ones as needed.
- Triage incidents and follow documented resolution steps when a known error is identified.
- Log incidents in the Incident Tracking system, clearly documenting the symptoms others need to investigate them.
- Work with all groups as needed to narrow investigative efforts and resolve incidents.
- Monitor running jobs for operational impact.
- Identify scheduled job failures.
- Maintain critical documentation assets, such as customer contact lists, escalation procedures, scheduled job inventories, and operational "run-books."
The Ideal Candidate:
- Demonstrated experience developing and deploying monitoring, ideally for mission-critical 24x7 environments.
- An appreciation of monitoring fundamentals associated with SNMP, WMI and synthetic transaction engines, and experience with various commercial, open source and homegrown monitoring packages and methods (e.g., Splunk, Nagios, Zabbix, OneSite, Gomez, CA, HP OpenView, etc.).
- A good understanding of scripting languages such as PowerShell, Perl or Python.
- Solid understanding of networking, including network devices, subnets, and routing protocols; ability to take and interpret packet captures (Ethereal, etc.).
- Solid understanding of systems, including server hardware, Windows and Linux operating systems, iSCSI/FC SAN/NAS/DAS storage, and hypervisors/virtualisation (VMware, Hyper-V).
- Proficiency in AD/DNS/DHCP.
- The ability to build tools and to implement and test significant features and capabilities independently, as well as to work with other team members on complex site issues.
Full details, including salary and the excellent full benefits package, will be provided upon application.
To speak in absolute confidence about this opportunity, please send an up-to-date CV via the link provided or contact Timothy McNeill, Specialist Consultant at MCS Group.
Even if this position is not right for you, we may have others that are. Please visit MCS Group to view a wide selection of our current jobs.
Not all agencies are the same... MCS Group are passionate about providing a first-class service to all our customers and have an independent review rating of 4.9 stars on Google.