Systems Reliability Engineer (SRE)

Location: New York, NY
Position Overview
As a Systems Reliability Engineer (SRE) working on critical services, your mission will be to ensure that the company's services are fast, highly available, scalable, and able to withstand unprecedented increases in load. In this role you will be at the heart of solving production problems, with a scope that runs from the kernel to the application. The position requires the flexibility to take a holistic approach to troubleshooting and the ability to delve deeply into technical details. The SRE will build automation tools for system health, write production acceptance tests to validate production changes, and ensure the system is well instrumented and highly fault tolerant.

Responsibilities
  • Manage the availability, latency, scalability, and efficiency of applications by instilling reliability engineering into the development life cycle, with a focus on fault-tolerant approaches.
  • Respond to and resolve unexpected and potential service problems and write software to prevent problem recurrence.
  • Drive capacity planning, performance analysis, instrumentation and other non-functional systems requirements.
  • Review and influence ongoing design, architecture, standards and methods for improving operating services.
  • Manage system releases, write production software acceptance tests and coordinate all aspects of the release including coverage and communication plans.
Requirements
  • 3+ years of experience as a Software Engineer or Developer of customer-facing, high-availability, large-scale distributed applications.
  • Experience with C, C++, or Java technologies.
  • Experience with PHP, Python, Ruby or other scripting languages.
  • Extensive experience with Linux/Unix.
  • Bachelor's degree in Computer Science or equivalent experience.
  • Prior successful experience as a systems performance or site/systems reliability engineer.
  • Strong leadership skills.
  • Extensive experience with fault-tolerant approaches in large-scale distributed environments and high-performance systems.
  • Demonstrated experience working in large, complex systems environments.
  • Deep understanding of internet and networking protocols.
  • An engineering mindset and a passion for performance excellence and robustness.
  • Experience in a high-volume or critical production service environment.
  • Knowledge of IP networking, and experience analyzing network performance and application issues with standard tools such as tcpdump.
About MGRS Group
MGRS Group specializes in the placement of information technologists, with a very specific focus on object-oriented programmers. We effectively recruit for C/C++, C# and Java Software Engineers as well as Ruby on Rails, PHP, Python and Perl Developers. We work with the most talented developers in this business, at all levels of their careers and in areas such as Research and Development (R&D), Market Data, Trading Systems, HFT, Low Latency, Data Science, Machine Learning, Natural Language Processing (NLP), Artificial Intelligence (AI), Business Intelligence (BI), Big Data Analytics, Cloud Computing, Mobile Development, WebSphere Portal and Business Integration.