Hi all, and welcome back to the site – I appreciate it has been an unexpectedly long time since I last posted… in fact, my last post was around this time last year. Hopefully I can get back on the “treadmill” and churn out some articles at a somewhat faster rate than one a year over the next couple of months! Well, that’s my aim anyway.
Ok, so this post will cover how to build and structure what is often referred to as a “data pipeline”; essentially, it is the part of the overall project workflow concerned with gathering and storing data, along with any wrangling/munging/pre-processing/transforming of that data for later use.
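As a rough, runnable illustration of those stages (the function names and data here are hypothetical and just for demonstration, not from any particular project), a minimal pipeline might separate the gathering, transforming, and storing steps like so:

```python
import sqlite3
from datetime import date

# Hypothetical three-stage pipeline: gather -> transform -> store.
# In a real project the "gather" step would call a data vendor's API or
# read files; here it returns hard-coded rows so the sketch is runnable.

def gather():
    """Gather raw data (stub standing in for an API call or file download)."""
    return [
        {"date": "2024-01-02", "ticker": "ABC", "close": "101.5"},
        {"date": "2024-01-03", "ticker": "ABC", "close": None},  # missing value
        {"date": "2024-01-04", "ticker": "ABC", "close": "103.0"},
    ]

def transform(rows):
    """Wrangle/clean: parse types and drop rows with missing prices."""
    out = []
    for r in rows:
        if r["close"] is None:
            continue
        out.append((date.fromisoformat(r["date"]), r["ticker"], float(r["close"])))
    return out

def store(rows, conn):
    """Persist the cleaned data in a simple SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS prices (date TEXT, ticker TEXT, close REAL)")
    conn.executemany(
        "INSERT INTO prices VALUES (?, ?, ?)",
        [(d.isoformat(), t, c) for d, t, c in rows],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
clean = transform(gather())
store(clean, conn)
print(conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0])  # 2 rows survive cleaning
```

Keeping each stage in its own function is one simple way of working towards the reusability and maintainability goals discussed below; each piece can be tested and swapped out independently.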
Ideally, this endeavour should aim to produce a code module which is robust, efficient, reusable, scalable, and maintainable. It should also aim to produce a well-structured, easily accessible store of “high-quality”, ready-to-use financial data sets and series.
I thought I’d start by sketching out a rough, mind-map-style representation of the concepts and ideas I want to cover and the basic points I want to make; this is shown below.