Together with Porphyrio, Sentigrate developed a data pipeline that reliably and accurately ingests a wide variety of data into Porphyrio's system.
A combination of NiFi and Airflow allowed us to define the data flow and its steps programmatically in Python. This way the data can be transformed, manipulated and cleaned in a testable and extensible way, with a clear separation of concerns between the steps.
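
To illustrate what a clear separation of concerns per step can look like, here is a minimal sketch in plain Python. It is not the actual Porphyrio pipeline and does not use the NiFi or Airflow APIs; the step names (`strip_whitespace`, `drop_empty_values`), the `run_steps` helper and the sample records are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical record shape: a flat mapping of field name to raw string value.
Record = Dict[str, str]
Step = Callable[[List[Record]], List[Record]]


def strip_whitespace(records: List[Record]) -> List[Record]:
    """Cleaning step: trim surrounding whitespace from every field value."""
    return [{key: value.strip() for key, value in record.items()} for record in records]


def drop_empty_values(records: List[Record]) -> List[Record]:
    """Filtering step: discard records whose 'value' field is missing or empty."""
    return [record for record in records if record.get("value")]


def run_steps(records: List[Record], steps: List[Step]) -> List[Record]:
    """Apply each step in order; every step only sees the previous step's output."""
    for step in steps:
        records = step(records)
    return records


# Illustrative input only, not real data.
raw = [
    {"sensor": " a1 ", "value": " 3.5 "},
    {"sensor": "b2", "value": "  "},
]
cleaned = run_steps(raw, [strip_whitespace, drop_empty_values])
```

Because each step is a small function without side effects, it can be unit tested on its own, and a new step can be added to the list without touching the existing ones.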
The results of each intermediate step were saved to a separate table in a Postgres database, which made debugging easier and allowed us to resume the data flow when necessary.
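
The idea of one table per intermediate step can be sketched as follows. This is an assumption-laden illustration rather than the real implementation: it uses Python's built-in `sqlite3` as a stand-in for Postgres so that it runs without a database server, and the table names, the `run_pipeline` helper and the example steps are all hypothetical.

```python
import json
import sqlite3
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Step = Tuple[str, Callable[[List[Row]], List[Row]]]


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Check whether a step's result table is already present."""
    query = "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?"
    return conn.execute(query, (name,)).fetchone() is not None


def save_step(conn: sqlite3.Connection, name: str, rows: List[Row]) -> None:
    """Store a step's output in its own table, one JSON payload per row.

    Table names come from the hard-coded step list below, never from user input.
    """
    conn.execute(f'CREATE TABLE "{name}" (payload TEXT)')
    conn.executemany(
        f'INSERT INTO "{name}" (payload) VALUES (?)',
        [(json.dumps(row),) for row in rows],
    )
    conn.commit()


def load_step(conn: sqlite3.Connection, name: str) -> List[Row]:
    """Read a previously stored step output back in insertion order."""
    cursor = conn.execute(f'SELECT payload FROM "{name}" ORDER BY rowid')
    return [json.loads(payload) for (payload,) in cursor]


def run_pipeline(conn: sqlite3.Connection, rows: List[Row], steps: List[Step]):
    """Run the steps in order, skipping any step whose table already exists."""
    executed = []
    for table_name, func in steps:
        if table_exists(conn, table_name):
            rows = load_step(conn, table_name)  # resume from the stored result
        else:
            rows = func(rows)
            save_step(conn, table_name, rows)
            executed.append(table_name)
    return rows, executed


# Hypothetical steps and input, for illustration only.
def parse_values(rows: List[Row]) -> List[Row]:
    return [{"value": float(row["value"])} for row in rows]


def keep_positive(rows: List[Row]) -> List[Row]:
    return [row for row in rows if row["value"] > 0]


steps = [("step_1_parsed", parse_values), ("step_2_positive", keep_positive)]
conn = sqlite3.connect(":memory:")
raw = [{"value": "1.5"}, {"value": "-2"}]

result, executed = run_pipeline(conn, raw, steps)
# A second run finds both tables and recomputes nothing.
result_again, executed_again = run_pipeline(conn, raw, steps)
```

Each intermediate table can be queried directly when debugging a step, and a run that stopped halfway only recomputes the steps whose tables are missing. A production version would also need to handle partially written tables, which this sketch leaves out.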