Data Organization and Enhancement

Delphi Analytics Rwanda recently received data from an administrator of financial institutions in Rwanda (our ‘Client’). To add value to that data, Delphi needed to make progress in two distinct areas: Data Organization and Data Enhancement. Both are common to many data projects, so we decided to discuss them in this brief post.

Our Client held data on the remote offices it supervises. These offices are spread throughout Rwanda and are administered through several regional and organizational structures. Two factors shape how the data are organized: first, the organization type and affiliation of each remote office, and second, the geographic location of each office.

Data Organization

As a consequence of these physical and administrative factors, the data presented to Delphi consisted of twenty distinct data files. Each was prepared by the manager of one particular part of the organization and, as one might expect, the files were organized in slightly different ways. The files generally contained the same (or very similar) information, but the order of the ‘columns’ was different, and the quality of each column depended on that manager’s own business priorities. Some files omitted columns, and some contained information unique to that manager. As an example, some managers’ primary method of contact with their charges was e-mail, so their e-mail column was very consistently entered. Other managers relied mainly on the telephone, so their telephone contact information was consistently entered. Very often, though, the non-preferred contact information was missing or, at best, only sparsely completed.

In other words, some managers assigned certain types of information different priorities than the organization as a whole did. This ‘local’ prioritization of information, inconsistent with the ‘global’ business priorities, is a very common problem faced by management.

In order to proceed with its data organization task, Delphi created a common, unified structure for all of the data provided in the twenty individual data files. This required quite a bit of effort to coordinate and align the data structure to meet the global business needs at the central office.

As for methodology, Delphi created a unified MySQL database that held every piece of information contained in the twenty individual data files, so that no information that was important to a local manager was lost. This is the database design part of the analysis task.

Then, Delphi had to create an ‘Extract, Transform and Load’ (ETL) procedure that would extract the information in each of the twenty individual files, transform it into a common structure and load it into the common database.

This means that, say, the fifth column of data from one file, or the third column from another file, is extracted and converted into the common data format (e.g., converted from numeric to text data), and that the data are then loaded into the appropriate column of the final, common database.
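The column mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Delphi’s actual procedure: the file names, column names, per-file layouts and sample rows are all hypothetical, and an in-memory SQLite database stands in for the MySQL database so the sketch runs on its own.

```python
import csv
import io
import sqlite3

# Hypothetical common schema: every column found in any source file.
COMMON_COLUMNS = ["office_name", "district", "email", "phone"]

# Hypothetical per-file layouts: source column position -> common column name.
FILE_LAYOUTS = {
    "manager_a.csv": {0: "office_name", 1: "district", 2: "email"},
    "manager_b.csv": {0: "district", 1: "office_name", 2: "phone"},
}

# Made-up sample data; each file orders its columns differently.
SAMPLE_FILES = {
    "manager_a.csv": "name,district,email\nOffice A,Gasabo,a@example.org\n",
    "manager_b.csv": "district,name,phone\nHuye,Office B,0788000000\n",
}


def extract(text):
    """Read the raw rows from CSV text, skipping the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[1:]


def transform(row, layout):
    """Map one source row onto the common schema; absent columns become None."""
    record = dict.fromkeys(COMMON_COLUMNS)
    for position, column in layout.items():
        value = row[position].strip() if position < len(row) else ""
        # Values are kept as text, so a numeric-looking phone number stays a string.
        record[column] = value or None
    return record


def load(conn, records, source):
    """Insert transformed records, remembering which file each came from."""
    conn.executemany(
        "INSERT INTO offices (source_file, office_name, district, email, phone) "
        "VALUES (?, ?, ?, ?, ?)",
        [(source, *(r[c] for c in COMMON_COLUMNS)) for r in records],
    )


def run_etl(files):
    """Run extract, transform and load for every file into one common table."""
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
    conn.execute(
        "CREATE TABLE offices (source_file TEXT, office_name TEXT, "
        "district TEXT, email TEXT, phone TEXT)"
    )
    for name, text in files.items():
        layout = FILE_LAYOUTS[name]
        load(conn, [transform(row, layout) for row in extract(text)], name)
    conn.commit()
    return conn


if __name__ == "__main__":
    db = run_etl(SAMPLE_FILES)
    for row in db.execute("SELECT * FROM offices ORDER BY office_name"):
        print(row)
```

Keeping the source file name on every row is a design choice made for this sketch: it lets later checks trace any value back to the manager who supplied it.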

Once data are consolidated for the benefit of the entire ‘global’ business structure, as was done by Delphi, a consistent data organization allows the global managers to see where local managers’ priorities differ. With that oversight, priorities at the local level can be adjusted to ensure consistent data quality that meets the overall organization’s objectives. Without this global approach to data organization, it would be difficult to identify and correct the inconsistent data priorities.
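One simple way to surface these differing priorities, once the data share a common structure, is to measure how completely each source fills each column. The sketch below is an assumption about how such a check might look, not a description of Delphi’s reporting; the record layout, the `source_file` field and the sample values are hypothetical.

```python
from collections import defaultdict


def completeness_by_source(records, columns):
    """Return {source: {column: fraction of records with a non-empty value}}."""
    totals = defaultdict(int)
    filled = defaultdict(lambda: defaultdict(int))
    for record in records:
        source = record["source_file"]
        totals[source] += 1
        for column in columns:
            if record.get(column) not in (None, ""):
                filled[source][column] += 1
    return {
        source: {column: filled[source][column] / totals[source] for column in columns}
        for source in totals
    }


# Made-up records: manager A favors e-mail, manager B favors the telephone.
SAMPLE_RECORDS = [
    {"source_file": "manager_a.csv", "email": "a1@example.org", "phone": None},
    {"source_file": "manager_a.csv", "email": "a2@example.org", "phone": "0788000001"},
    {"source_file": "manager_b.csv", "email": None, "phone": "0788000002"},
    {"source_file": "manager_b.csv", "email": "", "phone": "0788000003"},
]

if __name__ == "__main__":
    report = completeness_by_source(SAMPLE_RECORDS, ["email", "phone"])
    for source, scores in report.items():
        print(source, scores)
```

A low fraction for one manager’s column, set against high fractions elsewhere, points to exactly the kind of local priority that global managers may want to adjust.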

Data Enhancements

Another area in which Delphi added value to this Client’s data had to do with enhancements to data that were presented.

In this case, Delphi expanded the information available to the organization by ‘connecting’ the data fields in the tables provided to information available from external sources. Geographic data provide a case in point.

One of the administrator’s data priorities is the geographic location of the offices it supervises; in fact, a major part of the managerial organization reflects the regional nature of these offices. So, to facilitate a geographic understanding of the underlying data, Delphi connected the geographic labels inherent in each office’s location (the District and Sector information in the office titles) to publicly available international databases of geographic information (GIS data, or Geographic Information Systems data).

Delphi has long had expertise in using GIS data in many of its analysis projects. GIS data can include the boundaries of a state’s administrative units (down to village boundaries in Rwanda). They can include population statistics such as demographics, age, income and education level. They can include physical geographic information such as amount of rainfall, elevation, and proximity to infrastructure such as roadways, railways and waterways. There is a wealth of information in publicly maintained and freely available GIS data.

So, to enhance the data provided by this administrator, Delphi connected the MySQL database to a Postgres database of GIS information specific to Rwanda (a setup often called a PostGIS implementation), using the ‘R’ language as the connecting ‘glue.’ Data were read from the MySQL tables and manipulated using SQL commands issued from ‘R’. ‘R’ routines then called the PostGIS implementation to retrieve the supplemental data, and the combined data were further processed in ‘R’ to produce a final, graphical result.
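The core of this enhancement step is a join between office records and geographic units on their District and Sector labels. Delphi did this in ‘R’ against MySQL and PostGIS; the sketch below swaps in plain Python with in-memory lists purely to show the matching logic. The field names, the `boundary_id` values and the sample rows are hypothetical, and the sketch assumes District and Sector have already been parsed out of the office titles.

```python
def normalize(label):
    """Normalize a place label so that 'GASABO ' and 'Gasabo' compare equal."""
    return " ".join(label.split()).casefold()


def enhance(offices, geo_units):
    """Attach a boundary identifier to each office by (district, sector).

    Returns (enhanced, unmatched) so that labels with no geographic match
    can be reviewed instead of being silently dropped.
    """
    index = {
        (normalize(unit["district"]), normalize(unit["sector"])): unit
        for unit in geo_units
    }
    enhanced, unmatched = [], []
    for office in offices:
        key = (normalize(office["district"]), normalize(office["sector"]))
        unit = index.get(key)
        if unit is None:
            unmatched.append(office)
        else:
            enhanced.append({**office, "boundary_id": unit["boundary_id"]})
    return enhanced, unmatched


# Made-up stand-ins for rows read from the office database and the GIS database.
SAMPLE_OFFICES = [
    {"office": "Office A", "district": "GASABO", "sector": " Remera"},
    {"office": "Office C", "district": "Huye", "sector": "Unknown"},
]
SAMPLE_GEO_UNITS = [
    {"district": "Gasabo", "sector": "Remera", "boundary_id": "demo-001"},
    {"district": "Huye", "sector": "Ngoma", "boundary_id": "demo-002"},
]

if __name__ == "__main__":
    matched, missing = enhance(SAMPLE_OFFICES, SAMPLE_GEO_UNITS)
    print(matched)
    print(missing)
```

In a real pipeline the boundary identifier would then be used to fetch the boundary geometry for plotting; that retrieval is outside the scope of this sketch.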

As an example of the output, the geographic units associated with the administrator’s remote offices were plotted within their geographic boundaries, as shown in the accompanying plot. The boundary data for each administrative unit were part of the GIS data retrieved by the ‘R’ process implemented by Delphi.

This is part of the ‘magic’ of advanced data analytics. Delphi has long excelled in Data Organization tasks such as database design and ETL processes, and Delphi is noted for its ability in Data Enhancement: its ability to bring in supplemental information that increases a client’s understanding of its own data.