
Projects/Nepomuk/MetadataSharing

From KDE Community Wiki
Revision as of 15:47, 23 August 2012 by Vhanda (talk | contribs) (Moved from techbase)

Metadata Sharing

Use Cases

  • Perform Nepomuk queries on other systems
  • Allow importing metadata like Tags from other repositories
  • Send the metadata when transmitting files over email/XMPP

Proposed Architecture

Each repository is identified by a unique id, nso:repositoryIdentifier. It can be a hash of a randomly generated number + name of the owner + machine + hostname, or any other combination that has a very low chance of being the same on two machines.
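As a rough illustration of the hashing idea above, here is a minimal Python sketch. The function name `make_repository_identifier`, the choice of SHA-256, and the 16-byte nonce are assumptions made for this example; the page does not prescribe any of them.

```python
import hashlib
import secrets
from typing import Optional


def make_repository_identifier(owner: str, machine: str, hostname: str,
                               nonce: Optional[bytes] = None) -> str:
    """Hash a random nonce together with owner, machine and hostname.

    Hypothetical helper: the page only says "a hash of" these values.
    """
    if nonce is None:
        # The "randomly generated number" from the proposal.
        nonce = secrets.token_bytes(16)
    hasher = hashlib.sha256()
    for part in (nonce, owner.encode("utf-8"),
                 machine.encode("utf-8"), hostname.encode("utf-8")):
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        hasher.update(len(part).to_bytes(4, "big"))
        hasher.update(part)
    return hasher.hexdigest()
```

The identifier would be generated once and stored, since a fresh nonce gives a different value on every call.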

A Nepomuk service exists which loads the metadata transfer plugins. Planned plugins -

  • Local Sharing via Sparql endpoint + zeroconf
  • XMPP via Telepathy
  • Maybe allow querying sparql endpoints?

Querying

Raw SPARQL queries are not allowed, as restricting them for privacy reasons would be very difficult. The Nepomuk Query API must be used for all queries.

The idea is that one pimo:Agent owns one or more repositories (nso:Repository). One can query either a specific machine or the pimo:Agent, in which case all accessible machines will be queried.
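The two query targets described above could be resolved roughly as follows. This is only a sketch: the `Repository` and `Agent` classes, the `accessible` flag, and `resolve_targets` are hypothetical names for illustration and are not part of any Nepomuk API.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Repository:
    identifier: str          # value of nso:repositoryIdentifier
    accessible: bool = True  # assumption: whether the machine is reachable


@dataclass
class Agent:
    name: str
    repositories: List[Repository] = field(default_factory=list)


def resolve_targets(target: Union[Repository, Agent]) -> List[Repository]:
    """Return the repositories a query should be sent to."""
    if isinstance(target, Repository):
        # A specific machine was requested.
        return [target]
    # An agent was requested: fan out to every accessible machine.
    return [repo for repo in target.repositories if repo.accessible]
```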


Metadata Sharing

The metadata for any resource can be requested through the metadata sharing service. All of the metadata of that resource is returned. Maybe sub-resources as well?

trueg: IMHO sub-resources should be returned in any case. (I even think that we should default to that in DMS - maybe even remove the possibility to not do it. But that is another topic.) This is something that should be covered by the DMS method describeResources which should be extended to allow the specification of the receiving end to filter by permissions.

Storing the metadata

1. If the resource doesn't exist, i.e. identification is unsuccessful, then the entire resource is saved as is. The resource URI will obviously be different, so we need some way of recording that this resource is the same as that one on the other machine.

Possible Solution - Use repository identifier

<nepomuk:/res/coldplay>
    a nco:Contact ;
    nco:fullname "Coldplay" .

from the machine with identifier "Charlie-Brown" would be saved as

<nepomuk:/res/some-uuid>
    a nco:Contact ;
    nco:fullname "Coldplay" ;
    nxx:sameAs <nepomuk:/Charlie-Brown/res/coldplay> .

<nepomuk:/Charlie-Brown/res/coldplay>
    a nxx:RemoteResource ;
    nxx:belongsTo <nepomuk:/res/charlie-brown-repo> .

Maybe the nxx:belongsTo should be specified in the graph?

trueg: This seems a bit weird. Why not rely on the identification both ways? After all you cannot know what happens with the original resource after the merge. It could change completely in which case the nxx:sameAs would not be valid anymore.
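The URI mapping used in the repository-identifier proposal above can be sketched in Python as follows. The helper names `remote_resource_uri` and `import_remote_resource` are hypothetical, triples are modelled as plain tuples rather than a real RDF store, and the `nxx:` terms are the placeholder vocabulary from the example, not an existing ontology.

```python
import uuid

NEPOMUK_PREFIX = "nepomuk:/"


def remote_resource_uri(original_uri: str, repository_id: str) -> str:
    """Insert the repository identifier after the nepomuk:/ scheme."""
    if not original_uri.startswith(NEPOMUK_PREFIX):
        raise ValueError("not a nepomuk:/ URI: " + original_uri)
    return NEPOMUK_PREFIX + repository_id + "/" + original_uri[len(NEPOMUK_PREFIX):]


def import_remote_resource(original_uri, repository_id, repository_uri, properties):
    """Build the triples for a remote resource that could not be identified.

    `properties` is a list of (predicate, object) pairs copied as is.
    """
    local_uri = NEPOMUK_PREFIX + "res/" + str(uuid.uuid4())
    remote_uri = remote_resource_uri(original_uri, repository_id)
    triples = [(local_uri, predicate, obj) for predicate, obj in properties]
    triples.append((local_uri, "nxx:sameAs", remote_uri))
    triples.append((remote_uri, "rdf:type", "nxx:RemoteResource"))
    triples.append((remote_uri, "nxx:belongsTo", repository_uri))
    return local_uri, triples
```

Calling `remote_resource_uri("nepomuk:/res/coldplay", "Charlie-Brown")` yields `nepomuk:/Charlie-Brown/res/coldplay`, matching the example.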

2. If the resource exists, and it is being merged, then the graph must be annotated with the creator information. How do we do that?

One option is nco:creator. But I don't agree with the range of nco:creator: nco:Contact is not sufficient, it should be pimo:Agent. Plus, how do we specify the repository from which the data came? Both nso:belongsTo and nco:creator?

trueg: Forget nco:creator. It does not belong here. Use nao:creator, which has a range of nao:Party. So, as discussed before, we only need to make pimo:Agent a sub-class of nao:Party to make this work.

https://sourceforge.net/apps/trac/oscaf/ticket/100

One option is that we only store the creator as the nso:Repository. The pimo:Agent can be inferred.

<nepomuk:/ctx/uuid>
    a nrl:InstanceBase ;
    nco:creator <nepomuk:/res/some-repo> .

<nepomuk:/res/some-repo>
    a nso:Repository ;
    nso:belongsTo <nepomuk:/res/pimoPerson> .

<nepomuk:/res/pimoPerson>
    a pimo:Person ;
    pimo:groundingOccurrence <nepomuk:/res/contact> .

<nepomuk:/res/contact>
    a nco:Contact ;
    nco:fullname "Johnie Walker" .

trueg: The way I understood it, a graph would be derived from a repository and the repository would be created (nao:creator) by a pimo:Person.
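To illustrate how the pimo:Agent can be inferred when only the repository is stored as the creator, here is a minimal Python sketch that walks the chain graph → repository → owner. It uses a plain set of tuples instead of a real RDF store, and the function names are hypothetical. The property names follow the example on this page and are passed as parameters, since which properties to use is still under discussion.

```python
def objects(triples, subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def infer_agent(triples, graph_uri,
                creator_property="nco:creator",
                owner_property="nso:belongsTo"):
    """Follow graph -> repository -> owning agent; None if the chain breaks."""
    for repository in objects(triples, graph_uri, creator_property):
        for agent in objects(triples, repository, owner_property):
            return agent
    return None


# The graph and repository statements from the example on this page.
EXAMPLE_TRIPLES = {
    ("nepomuk:/ctx/uuid", "rdf:type", "nrl:InstanceBase"),
    ("nepomuk:/ctx/uuid", "nco:creator", "nepomuk:/res/some-repo"),
    ("nepomuk:/res/some-repo", "rdf:type", "nso:Repository"),
    ("nepomuk:/res/some-repo", "nso:belongsTo", "nepomuk:/res/pimoPerson"),
    ("nepomuk:/res/pimoPerson", "rdf:type", "pimo:Person"),
}
```

With these triples, `infer_agent(EXAMPLE_TRIPLES, "nepomuk:/ctx/uuid")` returns `nepomuk:/res/pimoPerson`.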