
KDE Core/OpenCodes

From KDE Community Wiki
Jlayt (talk | contribs)

Latest revision as of 16:01, 6 July 2014

The OpenCodes project is part of the KDE Open Data initiative, which seeks to develop a standard JSON file format for ISO Codes and to provide a set of data files derived from Wikidata.

Architecture

OpenCodes will be a single git repository containing a set of scripts to maintain the ISO Codes data as well as the data files themselves. The repo will not contain any code APIs to utilise the data. This keeps the project completely standalone so that it can be utilised by as many other projects as possible.

The data files will be in JSON format with a schema defined using the JSON Schema standard, which will allow for automated verification and consumption.
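
As an illustration only, a schema for a single country entry might look like the sketch below. The field names (alpha2, alpha3, numeric, name) are assumptions and not the OpenCodes format, which is not yet defined on this page. The small check_entry helper is also hypothetical and handles only the "required", string "type" and "pattern" keywords; a full JSON Schema validator would be expected in practice.

```python
import json
import re

# Hypothetical schema for one country entry; field names are illustrative only.
COUNTRY_SCHEMA = {
    "type": "object",
    "required": ["alpha2", "alpha3", "numeric"],
    "properties": {
        "alpha2": {"type": "string", "pattern": "^[A-Z]{2}$"},
        "alpha3": {"type": "string", "pattern": "^[A-Z]{3}$"},
        "numeric": {"type": "string", "pattern": "^[0-9]{3}$"},
        "name": {"type": "string"},
    },
}


def check_entry(entry, schema=COUNTRY_SCHEMA):
    """Return a list of problems found; an empty list means the entry passed.

    Only the 'required', 'type: string' and 'pattern' keywords are handled.
    """
    problems = []
    for key in schema.get("required", []):
        if key not in entry:
            problems.append("missing required field: " + key)
    for key, rules in schema.get("properties", {}).items():
        if key not in entry:
            continue
        value = entry[key]
        if rules.get("type") == "string" and not isinstance(value, str):
            problems.append("field is not a string: " + key)
            continue
        pattern = rules.get("pattern")
        if pattern and not re.search(pattern, value):
            problems.append("field does not match pattern: " + key)
    return problems


if __name__ == "__main__":
    entry = json.loads('{"alpha2": "NZ", "alpha3": "NZL", "numeric": "554"}')
    print(check_entry(entry))
```

Returning a list of problems rather than raising on the first one lets a maintenance script report every issue in a data file in a single run.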

A Python script will use the Wikidata Query API once available (or other tools such as Autolists, https://tools.wmflabs.org/wikidata-todo/autolist2.php, in the interim) to list all Items for the ISO code Property and then obtain all the required Properties for each Item instance. This data will then be merged with any extra fields OpenCodes requires and written to the base set of JSON files, which will be committed to the repo.
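
The merge-and-write step could be sketched roughly as follows. This is not the project's script: the function names, the one-file-per-code layout and the rule that extra fields override fetched ones are all assumptions made for the example, and the fetched data is hard-coded rather than retrieved from Wikidata.

```python
import json
import os
import tempfile


def merge_item(wikidata_fields, extra_fields):
    """Combine fields fetched from Wikidata with OpenCodes-specific extras.

    Assumption: on a key clash the extra field wins, so local overrides work.
    """
    merged = dict(wikidata_fields)
    merged.update(extra_fields)
    return merged


def write_base_file(directory, code, item):
    """Write one item to <directory>/<code>.json with a stable key order."""
    path = os.path.join(directory, code + ".json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(item, handle, ensure_ascii=False, indent=2, sort_keys=True)
        handle.write("\n")
    return path


if __name__ == "__main__":
    fetched = {"P297": "NZ", "P298": "NZL", "P299": "554"}  # sample values
    extras = {"note": "example only"}
    with tempfile.TemporaryDirectory() as base_dir:
        path = write_base_file(base_dir, "NZ", merge_item(fetched, extras))
        with open(path, encoding="utf-8") as handle:
            print(handle.read())
```

Sorting the keys and using a fixed indent keeps the committed files stable between runs, so a git diff shows only genuine data changes.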

A second Python script will generate the payload files from the base files in a choice of formats:

  • Data as separate files for each ISO code instance or a single file containing all instances
  • Translations as .po files or JSON translation files (node.js format, and any others required) or inline in data files
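
A minimal sketch of the payload choices listed above, assuming the base data has been loaded into a dict keyed by ISO code. The "name" and "translations" fields, the function names and the default file name are hypothetical, and the .po output omits the header entry that a complete .po file would carry.

```python
import json


def split_payload(base_items):
    """Return {filename: JSON text}, one file per ISO code instance."""
    return {
        code + ".json": json.dumps(item, sort_keys=True)
        for code, item in base_items.items()
    }


def single_payload(base_items, filename="countries.json"):
    """Return {filename: JSON text} with every instance in a single file."""
    return {filename: json.dumps(base_items, sort_keys=True)}


def po_escape(text):
    """Escape backslashes and double quotes for a .po string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def po_payload(base_items, language):
    """Build minimal .po text: msgid is the base name, msgstr the translation."""
    lines = []
    for code in sorted(base_items):
        item = base_items[code]
        translation = item.get("translations", {}).get(language)
        if translation is None:
            continue  # no translation for this language, so no entry
        lines.append('msgid "%s"' % po_escape(item["name"]))
        lines.append('msgstr "%s"' % po_escape(translation))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    base = {"NZ": {"name": "New Zealand", "translations": {"de": "Neuseeland"}}}
    print(sorted(split_payload(base)))
    print(po_payload(base, "de"))
```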


For Linux installs generated using 'make install' the base files will be installed to /usr/share/opencodes and .po translation files installed to /usr/locale/.

Translations will be sourced from both Wikidata and KDE. It is expected that Wikidata will support a greater number of languages than KDE, so it will be the preferred source, but KDE may have some languages that are unsupported in Wikidata, so we need to cater for this. It is hoped KDE translators will submit translations directly to Wikidata, but we cannot automate this as Wikidata uses CC-0 licensing.
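
The preference rule described above could look something like this. It is a sketch under the assumption that each source provides a simple mapping from language code to translated name; the function name and the choice to ignore empty strings are illustrative.

```python
def merge_translations(wikidata, kde):
    """Merge per-language names, preferring Wikidata and falling back to KDE.

    Both arguments map a language code to a translated name. Empty values
    are treated as missing so they never hide a usable translation.
    """
    # Start with KDE so languages that only KDE supports survive the merge.
    merged = {lang: text for lang, text in kde.items() if text}
    # Wikidata wins wherever both sources have a non-empty translation.
    merged.update({lang: text for lang, text in wikidata.items() if text})
    return merged


if __name__ == "__main__":
    from_wikidata = {"de": "Neuseeland", "it": ""}
    from_kde = {"de": "Neu Seeland", "it": "Nuova Zelanda", "mi": "Aotearoa"}
    print(merge_translations(from_wikidata, from_kde))
```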

Country Code

JSON File Format

Wikidata Feed

The following Items are defined in Wikidata but are not connected to the Properties and have poor translations and definitions, probably due to import from Freebase:

  • ISO 3166 (Q106487)
    • ISO 3166-1 (Q25275)
      • ISO 3166-1 alpha-2 (Q1140221)
      • ISO 3166-1 alpha-3 (Q1341492)
      • ISO 3166-1 numeric (Q2725758)
    • ISO 3166-2 (Q133153)
    • ISO 3166-3 (Q877561)


We should propose changes to Wikidata to link these to the actual properties used for the countries. Because they are not currently connected we cannot start the query from the Item.

The following Properties are defined for all admin Items:

  • continent (P30)
  • contains administrative territorial entity (P150)
  • flag image (P41)
  • flag (P163)
  • time zone (P421)
  • capital (P36)
  • OpenStreetMap Relation ID (P402)
  • FIPS 10-4 (countries and regions) (P901)


The following Properties are defined on the Item for each Country:

  • ISO 3166-1 alpha-2 (P297)
  • ISO 3166-1 alpha-3 (P298)
  • ISO 3166-1 numeric (P299)
  • country calling code (P474)
  • FIPS 10-4 (countries and regions) (P901)
  • IOC country code (P984)
  • top-level domain (P78)
  • licence plate code (P395)
  • official language (P37)
  • currency (P38)


The following Properties are defined on the Item for each Subdivision:

  • type of administrative territorial entity (P132)
  • ISO 3166-2 (P300)
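
One possible way to carry these Property lists into a script is a lookup table from Property ID to output field. The Property IDs below are copied from the lists above; the field names on the right and the extract_fields helper are hypothetical, and the input is assumed to be a flat dict of Property ID to value rather than Wikidata's full claim structure.

```python
# Property IDs come from the lists above; the field names are hypothetical.
COUNTRY_PROPERTIES = {
    "P297": "alpha2",
    "P298": "alpha3",
    "P299": "numeric",
    "P474": "calling_code",
    "P984": "ioc_code",
    "P78": "top_level_domain",
}
SUBDIVISION_PROPERTIES = {
    "P300": "iso_3166_2",
    "P132": "entity_type",
}


def extract_fields(claims, property_map):
    """Pick the mapped properties out of a {property ID: value} dict."""
    return {
        field: claims[prop]
        for prop, field in property_map.items()
        if prop in claims
    }


if __name__ == "__main__":
    sample = {"P297": "NZ", "P298": "NZL", "P30": "not mapped here"}
    print(extract_fields(sample, COUNTRY_PROPERTIES))
```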


The following Properties are also on OSM but not Wikidata:

  • Name
  • Long Name
  • Official Name


The following items are also in KDE but not Wikidata:


To obtain all country Items for the ISO alpha-2 code run the following query: http://wdq.wmflabs.org/api?q=claim[297]
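
A script could build that query URL and read the result along these lines. The sketch makes no network request. The response shape, a JSON object with an "items" list of numeric Item IDs, is an assumption that should be checked against the service, and the function names are illustrative.

```python
import json
from urllib.parse import quote

WDQ_ENDPOINT = "http://wdq.wmflabs.org/api"


def build_claim_query(property_number):
    """Return the WDQ URL listing every Item that has the given Property."""
    query = "claim[%d]" % property_number
    # Percent-encode the square brackets so the URL is safe to pass around.
    return WDQ_ENDPOINT + "?q=" + quote(query, safe="")


def parse_item_ids(response_text):
    """Turn a WDQ response into Q-identifiers.

    Assumption: the response is JSON with an "items" list of numeric IDs.
    """
    data = json.loads(response_text)
    return ["Q%d" % number for number in data.get("items", [])]


if __name__ == "__main__":
    print(build_claim_query(297))
    print(parse_item_ids('{"items": [1, 2, 3]}'))  # made-up sample response
```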