Introduction
Today, virtually every organization has to cope with a volume of data that keeps growing rapidly and is generated almost everywhere. At the same time, humanity faces unprecedented challenges from pressing issues such as climate change, conflict, and forced displacement, which harm the lives and well-being of societies. Together, these trends create a need for reliable information systems that can manage big data and support timely decisions in tackling global problems.
At AI for Peace, we are particularly interested in data-driven technologies, as we strive to implement advanced analytics and machine learning algorithms that require large volumes of data from many different sources. This blog post makes the case for modern data warehousing solutions that automate the identification, access, storage, and analysis of relevant data within the mass of information generated today. To achieve this, we propose a data integration model adapted from the conceptual framework introduced by the Institute for Economics and Peace.
By definition, a data warehouse is a system for reporting and data analysis that uses a central repository of integrated data from one or more disparate sources. It stores current and historical data in a single place, where they are used to create analytical reports for workers throughout the enterprise (Wikipedia). The modern data warehouse adds agility, large-scale operation (including advanced analytics), and the integration of structured, semi-structured, and unstructured data (Microsoft).
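To make the definition concrete, here is a minimal Python sketch of the "central repository" idea, assuming two hypothetical sources: a structured table and a semi-structured JSON feed that uses different field names. Both are loaded into one SQLite table with a shared schema, keeping current and historical rows side by side. The source names, country codes, and values are illustrative placeholders, and SQLite merely stands in for a real warehouse platform.

```python
import json
import sqlite3

# Hypothetical source 1: structured rows, e.g. exported from a spreadsheet.
# Country codes "AAA"/"BBB" and all values are illustrative placeholders.
conflict_events = [
    {"country": "AAA", "date": "2020-12-31", "events": 9},   # historical
    {"country": "AAA", "date": "2021-01-31", "events": 12},  # current
    {"country": "BBB", "date": "2021-01-31", "events": 40},
]

# Hypothetical source 2: semi-structured JSON, e.g. returned by a web API,
# with different field names and a nested layout.
displacement_json = json.dumps([
    {"iso3": "AAA", "period": "2021-01-31", "figures": {"idps": 1500}},
    {"iso3": "BBB", "period": "2021-01-31", "figures": {"idps": 9800}},
])


def build_warehouse() -> sqlite3.Connection:
    """Load both sources into one central table with a shared schema."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE indicators ("
        "country TEXT, date TEXT, indicator TEXT, value REAL, source TEXT)"
    )
    # Structured source: fields map directly onto the shared schema.
    con.executemany(
        "INSERT INTO indicators VALUES (?, ?, 'conflict_events', ?, 'source_1')",
        [(r["country"], r["date"], r["events"]) for r in conflict_events],
    )
    # Semi-structured source: rename fields and flatten the nested value.
    con.executemany(
        "INSERT INTO indicators VALUES (?, ?, 'idps', ?, 'source_2')",
        [(r["iso3"], r["period"], r["figures"]["idps"])
         for r in json.loads(displacement_json)],
    )
    con.commit()
    return con


con = build_warehouse()
# A single query now spans both sources for the current period.
rows = con.execute(
    "SELECT country, indicator, value FROM indicators "
    "WHERE date = '2021-01-31' ORDER BY country, indicator"
).fetchall()
print(rows)
```

The point of the shared schema is that reports no longer need to know which source a figure came from; the `source` column is kept only for lineage.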
The rationale for a modern data warehouse
The relationship diagram in Figure 1 captures the range of determinants that affect the level of peace in society. As mentioned earlier, this data integration model is derived from the conceptual framework proposed by the Institute for Economics and Peace (IEP) to understand and describe the factors associated with peaceful societies. Various types of data analysis can therefore be built on the model presented in Figure 1. Here, we treat peace as the outcome variable, one we want to promote under all circumstances. At the heart of the framework, the level of peace is explained by eight pillars, or outcomes, that are interdependent and mutually reinforcing.
In addition to the peace outcomes, we introduce explanatory (input) variables, broadly grouped into political, environmental, social, economic, connectivity, and demographic factors. These factors may affect one or more peace outcomes, either independently or in combination. Moreover, the elements within each input group are interconnected, which implies interactions among them that ultimately affect the level of peace in society.
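The structure of the model can be sketched in Python as follows. This is a minimal illustration, not our actual implementation: we assume the eight outcomes follow IEP's published Pillars of Positive Peace, and the indicator names within each input group are hypothetical. The helper adds pairwise products of indicators within the same group, one simple way to represent the interactions described above as features for a model in which peace is the outcome.

```python
from itertools import combinations

# Outcomes: assumed to follow IEP's eight Pillars of Positive Peace.
PEACE_OUTCOMES = [
    "well_functioning_government",
    "sound_business_environment",
    "equitable_distribution_of_resources",
    "acceptance_of_the_rights_of_others",
    "good_relations_with_neighbours",
    "free_flow_of_information",
    "high_levels_of_human_capital",
    "low_levels_of_corruption",
]

# Input factor groups from the text; indicator names are hypothetical.
INPUT_FACTORS = {
    "political": ["governance_score", "political_violence"],
    "environmental": ["drought_severity", "flood_events"],
    "social": ["school_enrollment"],
    "economic": ["gdp_growth", "unemployment"],
    "connectivity": ["internet_penetration"],
    "demographic": ["youth_share", "urbanization"],
}


def with_interactions(row: dict) -> dict:
    """Return the row plus pairwise products of indicators in the same group.

    The products are simple interaction features: they let a model of a
    peace outcome capture elements that act in combination, not only alone.
    """
    features = dict(row)
    for indicators in INPUT_FACTORS.values():
        for a, b in combinations(indicators, 2):
            features[f"{a}_x_{b}"] = row[a] * row[b]
    return features


# Usage with made-up values for one hypothetical country-year observation.
observation = {name: 1.0 for group in INPUT_FACTORS.values() for name in group}
observation.update(gdp_growth=2.0, unemployment=3.0)
features = with_interactions(observation)
```

Restricting products to pairs within a group keeps the feature count small; cross-group interactions could be added the same way if an analysis calls for them.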
Figure 1: Mapping AI for Peace data integration requirements
A holistic analysis of peace requires large data sets covering two or more of the indicators in the data integration model above. Bringing together data that exist in different forms and at different locations, however, calls for a dedicated solution. Moreover, different types of analysis require different data transformations and processing times. For instance, descriptive and causal analyses are usually less sensitive to how often data are refreshed, whereas predictive analysis needs the most recent data available to produce useful forecasts in a timely manner. Finally, machine learning and deep learning models require complex data transformations, which further strengthens the case for modern data warehousing solutions.
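One way to encode these differing refresh needs is a per-analysis freshness policy. The sketch below assumes hypothetical limits (30 days for descriptive and causal work, one day for predictive work) and checks whether the loaded data are too old for a given type of analysis.

```python
from datetime import datetime, timedelta

# Hypothetical freshness limits: how old the loaded data may be
# before each type of analysis should trigger a refresh.
MAX_DATA_AGE = {
    "descriptive": timedelta(days=30),
    "causal": timedelta(days=30),
    "predictive": timedelta(days=1),
}


def needs_refresh(analysis: str, last_loaded: datetime, now: datetime) -> bool:
    """Return True if the data are too old for the given type of analysis."""
    return now - last_loaded > MAX_DATA_AGE[analysis]


last_loaded = datetime(2021, 3, 1)
now = datetime(2021, 3, 5)
for analysis in MAX_DATA_AGE:
    print(analysis, needs_refresh(analysis, last_loaded, now))
```

With data loaded four days earlier, only the predictive workload would request a refresh, which mirrors the distinction drawn above.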
Implementation Plan
Modern data warehouse development is best viewed as an evolving, iterative process (Predica): it has to keep adapting to an ever-changing business and technology environment. Agility also makes it possible to implement the platform at different scales, which themselves change over time. With this in mind, ScienceSoft recommends seven steps for implementing a modern data warehouse.
The process starts with goal elicitation (Step 1), where we identify business objectives, scope, and data sources. In our case, this blog post partly covers this step: our goal is building peace at the international and local levels, and the proposed data integration model outlines the types of data we need to meet our analytical objectives.
Once we have identified the goals and specifics of the platform, the next step (Step 2) is platform selection. At this stage, we also choose the optimal deployment option (on-premises, in-cloud, or hybrid), the data volume capacity, and the data security requirements.
In Step 3, we define a data warehouse development roadmap that includes the project scope, budget, and timelines. This is followed by system analysis and data warehouse architecture design (Step 4). At this stage, we analyze each data source in detail and design the data cleansing, security, and encryption (if any) policies.
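As an illustration of what a Step 4 cleansing policy might translate into, the sketch below applies three hypothetical rules to raw indicator records: standardize country codes, reject incomplete records, and drop exact duplicates. The field names and rules are assumptions; a real policy would be derived from the source analysis described above.

```python
def clean_records(records: list[dict]) -> list[dict]:
    """Apply three hypothetical cleansing rules to raw indicator records."""
    seen = set()
    cleaned = []
    for r in records:
        # Rule 1: standardize country codes (trim whitespace, upper-case).
        country = (r.get("country") or "").strip().upper()
        value = r.get("value")
        # Rule 2: reject records missing a country code or a value.
        if not country or value is None:
            continue
        # Rule 3: keep only the first copy of an exact duplicate.
        key = (country, r.get("date"), value)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"country": country, "date": r.get("date"), "value": value})
    return cleaned


# Usage with made-up raw records from a hypothetical source.
raw = [
    {"country": " aaa ", "date": "2021-01-31", "value": 5},
    {"country": "AAA", "date": "2021-01-31", "value": 5},   # duplicate after rule 1
    {"country": "", "date": "2021-01-31", "value": 3},      # missing country
    {"country": "BBB", "date": "2021-01-31", "value": None},  # missing value
]
cleaned = clean_records(raw)
```

Standardizing before deduplicating matters here: the first two records only become recognizable as duplicates once their country codes agree.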
System analysis and architecture design pave the way to Step 5, development and stabilization. In this phase, we configure and customize the data warehouse platform, implement the data security policies, and carry out data warehouse performance testing.
Development and stabilization are followed by the launch (Step 6). In this stage, we perform data migration, assess data quality, and train users. Finally, in Step 7, we carry out performance fine-tuning, troubleshooting, and support, which continue throughout the life of the platform.
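The data quality assessment in Step 6 could start with simple checks that compare migrated data against the source. The sketch below is a hypothetical example computing a row-count gap and a completeness rate over required fields; the metric names and fields are our assumptions, not part of ScienceSoft's recommendations.

```python
def quality_report(source_rows: list[dict], target_rows: list[dict],
                   required: tuple[str, ...]) -> dict:
    """Compute a few hypothetical post-migration data quality metrics."""
    migrated = len(target_rows)
    # A row counts as complete if every required field is present and not None.
    complete = sum(
        all(row.get(field) is not None for field in required)
        for row in target_rows
    )
    return {
        "rows_in_source": len(source_rows),
        "rows_migrated": migrated,
        # Positive gap: rows lost in migration; negative: unexpected extra rows.
        "row_gap": len(source_rows) - migrated,
        "completeness": complete / migrated if migrated else 0.0,
    }


# Usage with made-up rows: one row lost, one row with a missing value.
source = [{"country": "AAA", "value": 1}, {"country": "BBB", "value": 2},
          {"country": "CCC", "value": 3}, {"country": "DDD", "value": 4}]
target = [{"country": "AAA", "value": 1}, {"country": "BBB", "value": None},
          {"country": "CCC", "value": 3}]
report = quality_report(source, target, required=("country", "value"))
```

Checks like these are cheap to rerun, so the same report could also feed the ongoing monitoring in Step 7.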
Conclusion
In this blog post, we have discussed the relevance of modern data warehousing solutions to achieving organizational goals and explained how a data warehouse can be designed in alignment with those goals and objectives. Finally, the implementation plan discussed here supports the effective launch of a modern data warehouse, which can significantly cut the time it takes to handle, process, and analyze the big data that are key to decision making.
By Yared Hurisa, Chief Data Officer, AI for Peace