Main Purpose of the Role
We're building and operating the platform as a service (PaaS) and toolkit that enables Truphone's engineering community to develop and deploy products and services at a global scale.
Our hosted infrastructure and global connectivity backbone, integrated with public clouds, aims to support the most demanding requirements for scalability, availability and security.
We design, build and manage our infrastructure as code (IaC) and operate highly available environments. We give every engineering team at Truphone the internal systems and tools they need to build, deploy and operate the services under their remit autonomously, ranging from continuous integration, delivery and deployment (CI / CD) tools and pipelines to observability stacks and distributed big data platforms.
We love GitOps and automation, and we strive for absolute resilience in all our mission-critical services. We truly care about our customers and partner with other teams to achieve the best possible outcomes in a highly collaborative and diverse environment.
The DevOps Engineer will:
work closely with all our engineering teams to support the adoption and use of our platform services and tools, advocating for best practice and improving the developer experience;
tackle technical challenges and troubleshoot complex issues with our different products and services, contributing improvements to the platform and tools, as well as to products and services, at every opportunity;
identify operational, performance, security and availability shortcomings and bottlenecks, providing advice to guide product evolution;
help our engineering teams improve speed and effectiveness, automating routine work (toil) and eliminating human intervention from repetitive activity;
assist our engineering teams in debugging customer issues and tackling production outages;
support the definition and adoption of monitoring / alerting best practices and the implementation of service level objectives;
create and maintain the relevant documentation for peers and users (e.g. handbooks, tutorials);
support our commitment to high availability and reliability, taking part in 24x7 on-call production support;
develop your skills at all levels, being part of a team that owns each solution from design to operation and takes ownership of its outcomes.
Skills and Experience Required:
Strong knowledge and experience in software engineering / development;
Knowledge of best practices, standards and patterns for building and running reliable software across distributed systems;
Proficiency with a set of programming and scripting languages (e.g. Java, .NET, Go);
Comfortable in and around Linux, Docker, Kubernetes and distributed systems;
Experience with data modelling and relational databases;
Experience implementing REST interfaces and CLIs;
Excellent working knowledge of CI / CD tools and pipelines for software lifecycle and infrastructure provisioning (e.g. GitLab, Nexus);
Experience with logging, metrics, monitoring and tracing technologies and systems (e.g. Prometheus, Grafana, Telegraf, OpenTracing);
Positive and solution-oriented mindset, curious to experiment with new technology and always eager to learn new skills;
Comfortable making decisions with a high degree of autonomy, while working within a multi-disciplinary team;
Be an awesome team player, share our values and work by those values.