Analysis and Quality Assessment of Source Data Sets
In the analysis phase of the project, we received seven data sets from Moldova and five data sets from Ukraine. We performed a quick quality analysis on every data set. This analysis included the following checks:
- Completeness: Are all fields in the source data filled?
- Consistency: Are there many inconsistent values, such as overlapping geometries, or different spellings of the same names?
- Coverage: Can we likely get the minimum required information to fulfill INSPIRE requirements from the source data sets?
- Encoding: Is the encoding clear and correct?
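Two of these checks are easy to illustrate in code. The following is a minimal Python sketch, not the tooling used in the project: it tests whether a delivery decodes cleanly as UTF-8 and groups names that differ only in case, accents, hyphens, or spacing. The function names and the sample values are assumptions made for illustration.

```python
import unicodedata
from collections import defaultdict


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def normalize_name(name: str) -> str:
    """Strip accents, fold case, and collapse hyphens/whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.replace("-", " ").casefold().split())


def spelling_variants(names):
    """Group names that normalize to the same key but are spelled differently."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize_name(name)].add(name)
    # Keep only the keys that have more than one distinct spelling.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the delivered data sets.
    print(is_valid_utf8("Chișinău".encode("utf-8")))
    print(spelling_variants(["Chișinău", "Chisinau", "CHIȘINĂU", "Bălți"]))
```

A successful decode shows only that the bytes are valid UTF-8. Text that was already garbled before it was re-encoded still needs a visual check of the names.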
The following table lists the themes we analysed for each country and the themes we selected for the pilot project:
Source Theme | Moldova | Ukraine | Selected |
---|---|---|---|
Administrative Units | Yes | Yes | Yes |
Forests | Yes | Yes | |
Hydrography | Yes | Yes | Yes |
Transportation | Yes | Yes | Yes |
Vegetation | Yes | Yes | |
Soil | Yes | | |
Land Cover | Yes | | |
The following table summarizes the results per data set. We determined these values with the validation (schema validation plus TEAM Engine, where appropriate), attribute coverage, and analysis tools of hale connect.
Source Data Set | Completeness | Consistency | Coverage | Encoding/Others |
---|---|---|---|---|
Moldova - Communes | 100% | 100% | 100% | First delivery had broken encoding, second delivery had correct UTF-8 encoding |
Ukraine - Districts | 100% | 100% | 100% | No issues |
Ukraine - Settlements | 98% | 100% | 98% | One object was incomplete: its ENG and KOATUU values are missing |
Ukraine - Councils | 100% | 100% | 100% | No issues |
Ukraine - Roads | 25% | 100% | 100% | Only geometry is available, but classification would also be required. |
Ukraine - Railways | 75% | 100% | 100% | The names column is empty, but not mandatory. |
This is part of an example analysis result, showing the attribute coverage of a given data set. We can quickly see whether any attributes are incomplete, and we can see if there are inconsistencies.
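For readers without access to hale connect, a comparable per-attribute figure can be approximated with a few lines of Python. This is only a sketch under stated assumptions: the records are plain dictionaries, an attribute counts as filled when its value is neither missing nor blank, and the attribute names and sample records are hypothetical. It does not reproduce how hale connect computes its statistics.

```python
def is_filled(value) -> bool:
    """Treat None and blank strings as missing; everything else counts as filled."""
    if value is None:
        return False
    if isinstance(value, str) and not value.strip():
        return False
    return True


def attribute_coverage(records, attributes):
    """Return, per attribute, the share of records (0.0 to 1.0) with a filled value."""
    total = len(records)
    coverage = {}
    for attr in attributes:
        filled = sum(1 for rec in records if is_filled(rec.get(attr)))
        coverage[attr] = filled / total if total else 0.0
    return coverage


if __name__ == "__main__":
    # Hypothetical railway-like records: every record has a geometry, none has a name.
    records = [
        {"geometry": "LINESTRING (0 0, 1 1)", "name": "", "gauge": "1520"},
        {"geometry": "LINESTRING (1 1, 2 2)", "name": None, "gauge": "1520"},
        {"geometry": "LINESTRING (2 2, 3 3)", "gauge": ""},
        {"geometry": "LINESTRING (3 3, 4 4)", "name": "  ", "gauge": "1435"},
    ]
    for attr, share in attribute_coverage(records, ["geometry", "name", "gauge"]).items():
        print(f"{attr}: {share:.0%}")
```

An attribute that reports 0% here corresponds to an empty column, such as the names column noted for the railways data set.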