ETL and Data Cleansing for Social Services Datasets
Social Finance wanted to build an analytics system to help understand the case histories of vulnerable young people. Local authorities supplied the data to central government in a complex XML format, and values were often missing or inconsistent. Because the data was highly sensitive, strict data security protocols were necessary.
The XML files were parsed and transformed into a set of relational tables. Heuristics were devised to correct missing and inconsistent values. Fields that carried a high risk of deanonymisation were removed.
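The three steps above (parse, correct, remove) can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the XML element and attribute names (`Child`, `Episode`, `DateOfBirth`, `Postcode`), the sample records, the single cleansing rule, and the choice of postcode as a high-risk field are all assumptions made for the example, since the real schema and heuristics are not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real XML schema is not described in the text.
SAMPLE_XML = """
<Children>
  <Child id="c1">
    <DateOfBirth>2005-03-14</DateOfBirth>
    <Postcode>AB1 2CD</Postcode>
    <Episode start="2019-01-10" end="2019-06-30" type="placement"/>
    <Episode start="2019-07-01" type="placement"/>
  </Child>
  <Child id="c2">
    <DateOfBirth></DateOfBirth>
    <Postcode>EF3 4GH</Postcode>
    <Episode start="2020-02-01" end="2020-01-01" type="assessment"/>
  </Child>
</Children>
"""

# Assumed for illustration: fields treated as carrying a high deanonymisation risk.
HIGH_RISK_FIELDS = {"postcode"}


def parse_tables(xml_text):
    """Flatten nested XML into two relational tables linked by child_id."""
    root = ET.fromstring(xml_text)
    children, episodes = [], []
    for child in root.iter("Child"):
        child_id = child.get("id")
        children.append({
            "child_id": child_id,
            # Empty or absent elements become None rather than "".
            "date_of_birth": (child.findtext("DateOfBirth") or "").strip() or None,
            "postcode": (child.findtext("Postcode") or "").strip() or None,
        })
        for ep in child.findall("Episode"):
            episodes.append({
                "child_id": child_id,
                "start": ep.get("start"),
                "end": ep.get("end"),  # None when the attribute is absent
                "type": ep.get("type"),
            })
    return children, episodes


def clean_episodes(episodes):
    """Example heuristic (an assumption): an end date earlier than the
    start date is treated as unreliable and set to missing."""
    cleaned = []
    for ep in episodes:
        row = dict(ep)
        # ISO 8601 date strings sort chronologically, so string comparison works.
        if row["start"] and row["end"] and row["end"] < row["start"]:
            row["end"] = None
        cleaned.append(row)
    return cleaned


def drop_high_risk(rows, fields=HIGH_RISK_FIELDS):
    """Remove columns considered too identifying to keep."""
    return [{k: v for k, v in row.items() if k not in fields} for row in rows]


children, episodes = parse_tables(SAMPLE_XML)
episodes = clean_episodes(episodes)
children = drop_high_risk(children)
```

Keeping each step as a separate function means every heuristic can be reviewed and tested on its own, which matters when corrections to sensitive records need to be auditable.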