Job Req Number: 40103
At DSV we are looking for a Data Engineer to join our Data & Analytics team in Global IT. The focus of our team is to build advanced end-to-end solutions that create direct business value for DSV's divisions, for example: customs declaration automation;
vendor invoice automation; address validation; ETA prediction; and many more to come.
The word advanced underlines that the use cases we solve tend to have a high degree of complexity, requiring non-deterministic problem solving (i.e. the use of ML / AI), near real-time data processing, a need for high availability, vertical and horizontal scalability, and a very high volume of transactions.
However, fancy technologies and accurate ML models do not solve the issues at hand on their own; we strive to combine our competencies to build holistic solutions where the underlying complexity is hidden from the user, creating simple and value-adding experiences.
As a Data Engineer, your main responsibilities and activities will be:
Sourcing data from systems in DSV via the appropriate pattern for the use case (e.g. REST services, event streaming), or in the early stages of projects before the real integration is established (e.g. simple file dumps on FTP servers)
Building near real-time event streaming data flows with separate microservices for different functional purposes (e.g. file upload, image extraction, file extraction, etc.)
Reconfiguring the event streaming broker for optimization
Establishing domain-based ontologies for data to ensure semantic interoperability between different systems
Making data available in the form needed by ML models and by business applications for visualization in a frontend, and collecting user inputs from the frontend as data enrichment
Ensuring scalability of data flows by removing bottlenecks
Working together with the logging team, who set guidelines for how to structure your log outputs
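To make the microservice-based streaming flow above concrete, here is a minimal sketch in Python of one functionally separated stage. The envelope fields, stage names, and topic names are illustrative assumptions, not DSV's actual schema; in production the handler would be wired between a Kafka consumer and producer (e.g. via the confluent-kafka client), but it is kept pure here so the processing logic can be tested without a broker.

```python
import json
from datetime import datetime, timezone

def handle_event(raw: bytes) -> bytes:
    """Consume one upstream event, enrich it, and emit the result.

    Hypothetical example of a single stage (here "file extraction")
    in an event streaming flow. In production this function would sit
    between a Kafka consumer (reading e.g. a 'files.uploaded' topic)
    and a producer (writing e.g. a 'files.extracted' topic).
    """
    event = json.loads(raw)
    result = {
        "document_id": event["document_id"],
        "stage": "file-extraction",
        "processed_at": datetime.now(timezone.utc).isoformat(),
        # Downstream stages (e.g. image extraction) read this payload.
        "payload": {"pages": event.get("pages", 0)},
    }
    return json.dumps(result).encode("utf-8")
```

Keeping each stage's transformation pure like this makes it easy to unit-test and lets one microservice be scaled or redeployed independently of the rest of the flow.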
We expect you to have experience with most of the following technologies:
Event streaming: Confluent Kafka, Kafka Streams, KSQL
Logging & monitoring: Elastic, Kibana, Grafana, Logstash, FluentD
Coding languages: Python, Java, Scala
Storage technologies such as:
SQL database: we use MySQL
NoSQL database: we use MongoDB
File systems: e.g. Azure Files, Filestore, etc.
Blob storage: e.g. Azure Blob Storage, S3, etc.
Version control: Git (we use Atlassian Bitbucket on top of Git)
Containerization: Docker / containerd
Container orchestration: Kubernetes
Operating systems: Linux (CentOS / RHCOS) and Windows
BI tools: Power BI, Qlik, etc.
Moderate experience with cloud platforms: AWS, Azure, T-platforms, GCP
It is a bonus if you also have experience with some of the other technologies that our team works with, such as:
Other data processing frameworks: e.g. Spark or Ray
ML model serving: TensorFlow Serving, TorchServe
Authentication: OpenID Connect (we use Red Hat Keycloak as identity broker)
CI/CD pipelines: Jenkins (our templates are written in Groovy) and Azure DevOps
Load balancing: NGINX
Installation scripts: Ansible
Requirements: Jira
Documentation: Confluence
Test framework: Jest
ML frameworks: TensorFlow / PyTorch
Proficient Level of English (spoken & written)
You like to:
Learn from some of the most knowledgeable colleagues, and to teach them about stuff you know
Fine-tune partitioning, replicas and offsets to improve performance
Build scalable and robust data flows that can process large volumes of data in near real-time
Analyze statistics of your data flows to identify bottlenecks and remove them to improve throughput
Log key aspects of your data flows to ensure observability and lineage
Keep up to date with the new technologies and employ them in your work if they can add value
Use well-structured naming conventions
Deliver working software with high quality and test coverage
Find pragmatic solutions that balance what is theoretically optimal with what is possible within the constraints set by project deadlines, current data availability, etc.
Think about tomorrow's requirements when designing a data flow
Take responsibility both for exciting R&D work that pushes the boundaries of data processing and for the necessary nitty-gritty work in the data flow
Have a holistic view by understanding the business context, helping the data scientists get the best data for their models and the application developers get the data they need for the UI
Automate testing and evaluation of the performance of your data pipeline
Break down the solutions into iterations so they can deliver value quickly in MVP versions before they are enriched with more nice-to-have functionality in later iterations
Reach out to others for help or clarifications whenever you need it and to ensure alignment with others
Make realistic mockups of data to allow you to test things swiftly on synthetic data before you get access to production data
Ensure semantic interoperability between different systems by standardizing data definitions
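To make the partition-tuning point concrete, the sketch below illustrates key-based partition assignment. This is an illustration only: Kafka's default partitioner uses murmur2 hashing rather than the MD5 used here, but the principle is the same, and it shows why partition counts deserve careful tuning.

```python
import hashlib

def assign_partition(key: str, num_partitions: int) -> int:
    """Map an event key (e.g. a shipment ID) to a partition.

    A deterministic hash of the key modulo the partition count means
    all events for one key land on one partition, preserving their
    relative order, while different keys spread across partitions
    for parallelism. (Kafka's real default partitioner uses murmur2;
    MD5 is used here purely for a self-contained illustration.)
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Note that increasing the partition count of an existing topic changes this key-to-partition mapping, which is one reason repartitioning and replica settings should be planned rather than changed ad hoc.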
We are an ambitious team with a flat hierarchy and a mix of young and very experienced people, working according to the following principles:
We celebrate victories together
We take responsibility for mistakes and learn from them
We design for scale but build only for the near future
We value working software and informal alignment over tedious documentation
We make decisions based on knowledge and insight rather than hierarchical structures
Decisions are the product of conversations between people with different competencies (not one person)
Everyone can speak their honest opinion
We have all the competencies needed to build awesome products inside the team: Product Owner, Business Analysts, Application Developers (frontend + backend), Data Engineers, Data Scientists, ML Engineers, DevOps Engineers.
Job / Environment
Your job location will be Portugal (Lisboa, Saldanha), and you will be part of DSV Global IT, with peers working in remote teams across the globe.
Permanent contract, 35 hours / week
DSV Global Transport and Logistics