Main Purpose of the Role
We're building and operating the platform as a service (PaaS) and toolkit that enables Truphone's engineering community to develop and deploy products and services at a global scale.
Our hosted infrastructure and global connectivity backbone, integrated with public clouds, aims to support the most demanding requirements for scalability, availability and security.
We design, build and manage our infrastructure as code (IaC) and operate highly available environments. We equip every engineering team at Truphone with the internal systems and tools they need to autonomously build, deploy and operate the services under their remit, ranging from continuous integration, delivery and deployment (CI/CD) tools and pipelines to observability stacks and distributed big data platforms.
We love GitOps and automation, and we strive for absolute resilience in all our mission-critical services. We truly care about our customers and partner with other teams to achieve the best possible outcomes in a highly collaborative and diverse environment.
The Infrastructure Engineer is expected to:
Design, deploy and operate our global sites & IaaS (infrastructure as a service);
Manage our infrastructure resources as code (compute, networking, storage), and expose the provisioning of those resources via defined interfaces (CLIs, APIs);
Perform systems hardening, working with our security team and using CIS standards to meet or surpass security audit requirements;
Maintain optimal observability over the availability and performance of systems and applications (logs, metrics, alerts);
Perform regular maintenance and housekeeping, verifying the integrity and availability of all hardware resources, systems, data and applications / processes, as well as the successful completion of any automated jobs (e.g. backups);
Maintain software patches and upgrades, ensuring the systems are kept secure and compliant with best practices;
Investigate issues to their root cause and resolution, being available to support the team's services on a 24x7 on-call rota;
Think about scale, reliability and cost efficiency for our on-premises infrastructure;
Make decisions with a high degree of autonomy and collaborate within a multi-disciplinary team;
Set high standards for documentation of designs and systems.
Skills and Experience Required:
Experience in administration of Red Hat Linux / CentOS;
Experience in systems hardening and security management, IPTables, SELinux;
Experience in administration of virtualized environments based on VMware ESXi;
Administration of storage systems like NetApp;
Working knowledge of Infrastructure as Code and related tools such as Terraform, Ansible or Puppet;
Management and use of CI/CD systems and pipelines like GitLab;
Containerizing applications and services with Docker;
Managing logging systems like Graylog or ELK;
Monitoring technology like Prometheus, OpenNMS, Nagios;
Management of proxies, web servers and load balancers, like Squid, NGINX and HAProxy;
Working knowledge of scripting languages;
Experience managing IAM and directory systems like OpenLDAP and Active Directory, as well as DNS with BIND;
Understanding of essential TCP/IP protocols: TCP, UDP, ICMP, SMTP, SNMP, LDAP, DNS, NTP and others;
Understanding of fundamental hardware concepts: disk, memory, CPU, NIC, HBA, firmware, RAID, performance concepts (I/O, paging, swapping, system calls), iSCSI, Fibre Channel, Ethernet;
Positive and solution-oriented mindset, curious to experiment with new technology and always eager to learn new skills;
Comfortable making decisions with a high degree of autonomy, while working within a multi-disciplinary team;
Be an awesome team player, share our values and work by those values.