Data Engineer, Data & Analytics, Global IT
DSV
Lisbon, PT
1 day ago

Job Req Number: 40103

  • At DSV we are looking for a Data Engineer to be part of our Data & Analytics team in Global IT. The focus of our team is to build advanced end-to-end solutions that create direct business value for DSV’s divisions, including, for example: customs declaration automation; vendor invoice automation; address validation; ETA prediction; and many more to come.

    The word advanced is used to underline that the use cases we solve tend to have a high degree of complexity, requiring non-deterministic problem solving (i.e. the use of ML/AI), near real-time data processing, a need for high availability, vertical and horizontal scalability, and a very high volume of transactions.

    However, fancy technologies and accurate ML models do not solve the issues at hand on their own; we strive to combine our competencies to build holistic solutions where the underlying complexity is hidden from the user, creating simple and value-adding experiences.

    As a Data Engineer, your main responsibilities and activities will be:

  • Sourcing data from systems in DSV via the appropriate pattern for the use case (e.g. REST services, event streaming), or in the beginning of projects before the real integration is established (e.g. simple file dumps on FTP servers)

  • Building near real-time event streaming data flows with separated microservices for different functional purposes (e.g. file upload, image extraction, file extraction, etc.); see the sketch after this list

  • Re-configuring the event streaming broker for optimization
  • Establishing domain-based ontologies for data to ensure semantic interoperability between different systems
  • Making data available in the form needed by ML models and by business applications for frontend visualization, and collecting the users' inputs from the frontend as data enrichment
  • Ensuring scalability of data flows by removing bottlenecks
  • Working together with the logging team, who set guidelines for how to structure your log outputs
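
    To make the event streaming bullet above concrete, here is a minimal sketch in Python of one such microservice, using the confluent-kafka client: it consumes events from an upstream topic, performs a single functional step and publishes the result downstream. The broker address, topic names and payload shape are illustrative assumptions, not DSV's actual setup.

    import json
    from confluent_kafka import Consumer, Producer

    BROKER = "localhost:9092"       # placeholder broker address
    IN_TOPIC = "file-uploaded"      # hypothetical upstream topic
    OUT_TOPIC = "file-extracted"    # hypothetical downstream topic

    consumer = Consumer({
        "bootstrap.servers": BROKER,
        "group.id": "file-extraction-service",
        "auto.offset.reset": "earliest",
    })
    producer = Producer({"bootstrap.servers": BROKER})
    consumer.subscribe([IN_TOPIC])

    def extract(event: dict) -> dict:
        """Placeholder for this service's single functional purpose."""
        event["extracted"] = True
        return event

    try:
        while True:
            msg = consumer.poll(1.0)   # near real-time: short poll loop
            if msg is None or msg.error():
                continue
            event = json.loads(msg.value())
            producer.produce(OUT_TOPIC, key=msg.key(), value=json.dumps(extract(event)))
            producer.poll(0)           # serve delivery callbacks
    finally:
        producer.flush()
        consumer.close()
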
  • We expect you to have experience with most of the following technologies:

  • Event streaming: Confluent Kafka, Kafka Streams, KSQL
  • Logging & Monitoring: Elastic, Kibana, Grafana, Logstash, FluentD (a structured-logging sketch follows after this list)
  • Coding languages: Python, Java, Scala
  • Storage technologies such as:
  • SQL database: we use MySQL
  • NoSQL database: we use MongoDB
  • File systems: e.g. Azure Files, Filestore, etc.
  • Blob storage: e.g. Azure Blob Storage, S3, etc.
  • Version control: Git (we use Atlassian Bitbucket as a wrapper on top of Git)
  • Containerization: Docker / containerd
  • Container orchestration: Kubernetes
  • OS: Linux (CentOS / RHCOS) and Windows
  • BI tools: Power BI, Qlik, etc.
  • Moderate experience with cloud platforms: AWS, Azure, T-platforms, GCP
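
    As a hedged illustration of the structured log output that Logstash or FluentD can ship into Elastic without extra parsing (see the Logging & Monitoring line above), here is a small Python sketch. The field names, service name and flow / document identifiers are assumptions for illustration, not the logging team's actual guidelines.

    import json
    import logging
    import sys
    from datetime import datetime, timezone

    class JsonFormatter(logging.Formatter):
        def format(self, record: logging.LogRecord) -> str:
            payload = {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "level": record.levelname,
                "service": "file-extraction-service",   # hypothetical service name
                "message": record.getMessage(),
                # fields passed via `extra=` become attributes on the record
                "flow_id": getattr(record, "flow_id", None),
                "document_id": getattr(record, "document_id", None),
            }
            return json.dumps(payload)

    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("dataflow")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    # Each processing step logs the identifiers needed to trace lineage end to end.
    logger.info("file extracted", extra={"flow_id": "abc-123", "document_id": "doc-42"})
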
  • It is a bonus if you also have experience with some of the other technologies that our team works with, such as:

  • Other data processing frameworks: e.g. Spark or Ray
  • ML model serving: TensorFlow Serving, TorchServe
  • Authentication: OpenID Connect (we use Red Hat Keycloak as identity broker)
  • CI / CD pipelines: Jenkins (our templates are written in Groovy) and Azure DevOps
  • Load balancing: NGINX
  • Installation scripts: Ansible
  • Requirements: Jira
  • Documentation: Confluence
  • Frontend technologies: React JS, Material UI, JavaScript / TypeScript, Redux
  • Test framework: Jest
  • ML frameworks: TensorFlow / PyTorch
  • Languages:

  • Proficient level of English (spoken & written)
  • You like to:

  • Learn from some of the most knowledgeable colleagues, and to teach them about stuff you know
  • Fine-tune partitioning, replicas and offsets to improve performance (see the sketch after this list)
  • Build scalable and robust data flows that can process large volumes of data in near real-time
  • Analyze statistics of your data flows to identify and remove bottlenecks
  • Log key aspects of your data flows to ensure observability and lineage
  • Keep up to date with the new technologies and employ them in your work if they can add value
  • Use well-structured naming conventions
  • Deliver working software with high quality and test coverage
  • Find pragmatic solutions that balance what is theoretically optimal with what is possible within the constraints set by project deadlines, current data availability, etc.
  • Think about tomorrow’s requirements when designing a data flow
  • Take a lot of responsibility both for exciting R&D work that pushes the boundaries of data processing and for doing the necessary nitty-gritty work in the data flow
  • Have a holistic view by understanding the business context, helping the data scientists get the best data for their models and the application developers get the data they need for the UI
  • Automate testing and evaluation of the performance of your data pipeline
  • Break down the solutions into iterations so they can deliver value quickly in MVP versions before they are enriched with more nice-to-have functionality in later iterations
  • Reach out for help or clarification whenever you need it, and to ensure alignment with others
  • Make realistic mockups of data to allow you to test things swiftly on synthetic data before you get access to production data
  • Ensure semantic interoperability between different systems by standardizing data definitions
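
    As a small, hedged sketch of the partition and replica tuning mentioned in the list above, the snippet below uses confluent-kafka's AdminClient. The broker address, topic name, counts and retention setting are illustrative assumptions, not DSV's real configuration.

    from confluent_kafka.admin import AdminClient, NewPartitions, NewTopic

    admin = AdminClient({"bootstrap.servers": "localhost:9092"})   # placeholder broker

    # Create a topic with enough partitions for the expected consumer parallelism
    # and a replication factor that survives a single broker failure.
    topic = NewTopic(
        "customs-declarations",                                # hypothetical topic
        num_partitions=12,
        replication_factor=3,
        config={"retention.ms": str(7 * 24 * 3600 * 1000)},    # keep events for 7 days
    )
    for name, future in admin.create_topics([topic]).items():
        future.result()                                         # raises if creation failed

    # If throughput grows later, the partition count can be raised (it cannot be lowered).
    for name, future in admin.create_partitions(
        [NewPartitions("customs-declarations", new_total_count=24)]
    ).items():
        future.result()
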
  • We are an ambitious team with a flat hierarchy and a mix of young and very experienced people, who work according to the following principles:

  • We celebrate victories together
  • We take responsibility for mistakes and learn from them
  • We design for scale but build only for the near future
  • We value working software and informal alignment over tedious documentation
  • We make decisions based on knowledge and insight rather than hierarchical structures
  • Decisions are the product of conversations between people with different competencies (not one person)
  • Everyone can speak their honest opinion
  • We have all the needed competencies to build awesome products inside the team: Product owner, Business analysts, Application developers (frontend + backend), Data engineers, Data scientists, ML engineers, DevOps engineers.

    Job / Environment

  • Your job location will be Portugal (Lisboa, Saldanha) and you will be part of DSV Global IT, with peers working in remote teams across the globe
  • International environment
  • Permanent contract with 35h/week
  • DSV Global Transport and Logistics
