Technical analysis: Sync with deletes + transformation

https://binnenland.atlassian.net/browse/OP-2869 identifies a bug in the current sync mechanism.

In some cases, a delete in the source system leads to too many triples being deleted on the consumer side.

  • Deltas arrive in small chunks without any context.

    • To be able to process these deltas, the consumer retrieves additional context, such as the rdf:type of the subjects involved, before applying the incoming changes (see the sketch below).

    • After transforming the incoming delta together with its context, the result needs to be filtered again, so that too many triples are not deleted.

      • e.g. when the rdf:type of a subject was added as context, this statement should not be part of the DELETE query.
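
A minimal sketch of this context-retrieval step, assuming a plain `Triple` shape and an injected `sparqlSelect` helper; both names are hypothetical and not the actual consumer API:

```typescript
// Hypothetical Triple shape: subject/predicate/object already in N-Triples form.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Fetch the rdf:type of every subject touched by the delta; these triples are
// only context for the transformation and are not part of the delta itself.
// sparqlSelect is an assumed helper that runs a SELECT query against the store.
async function fetchTypeContext(
  delta: Triple[],
  sparqlSelect: (query: string) => Promise<{ s: string; type: string }[]>
): Promise<Triple[]> {
  const subjects = [...new Set(delta.map((t) => t.subject))];
  if (subjects.length === 0) return [];

  const query = `
    SELECT ?s ?type WHERE {
      VALUES ?s { ${subjects.join(' ')} }
      ?s a ?type .
    }`;
  const bindings = await sparqlSelect(query);
  return bindings.map((b) => ({
    subject: b.s,
    predicate: '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>',
    object: b.type,
  }));
}
```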

Mitigation:

Removing the context triples from the transformed graph mitigates the issue.
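
A minimal sketch of this mitigation: the context triples are subtracted from the transformed set before the DELETE query is built. The `Triple` shape and function names are illustrative assumptions:

```typescript
type Triple = { subject: string; predicate: string; object: string };

// Key a triple by its full subject/predicate/object so set membership works.
function tripleKey(t: Triple): string {
  return `${t.subject} ${t.predicate} ${t.object}`;
}

// Drop every transformed triple that is also a context triple, so the context
// (e.g. the rdf:type fetched earlier) never ends up in the DELETE query.
function removeContextTriples(transformed: Triple[], context: Triple[]): Triple[] {
  const contextKeys = new Set(context.map(tripleKey));
  return transformed.filter((t) => !contextKeys.has(tripleKey(t)));
}
```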

There are potential problems with this approach which should be taken into consideration when changing the conversion rules:

Deducing rdf:type triples in the rules should be avoided, e.g.:

  • different rules deriving an `rdf:type` based on a property:

    • can lead to deletion of the `:subject a ex:SomeClass` triple when a single triple with a matching property is deleted in the source system, even though other triples may still justify the type (illustrated below).
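
A hypothetical illustration of this problem, assuming two rules that both derive `ex:SomeClass` from different source properties (all names and URIs are made up):

```typescript
type Triple = { subject: string; predicate: string; object: string };

const RDF_TYPE = '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>';
const SOME_CLASS = '<http://example.org/SomeClass>';

// Two rules derive the same rdf:type from two different properties.
const typeDerivingProperties = [
  '<http://example.org/propA>',
  '<http://example.org/propB>',
];

// Naive transformation of a delete delta: every deleted triple whose predicate
// derives a type also produces a delete of the derived rdf:type triple.
function naiveTransformDeletes(deletes: Triple[]): Triple[] {
  const result = [...deletes];
  for (const t of deletes) {
    if (typeDerivingProperties.includes(t.predicate)) {
      result.push({ subject: t.subject, predicate: RDF_TYPE, object: SOME_CLASS });
    }
  }
  return result;
}

// The source deletes only propA, but the subject still has propB, so the
// derived `:subject a ex:SomeClass` triple should be kept. The naive
// transformation deletes it anyway -> too many deletes.
const tooManyDeletes = naiveTransformDeletes([
  {
    subject: '<http://example.org/subject>',
    predicate: '<http://example.org/propA>',
    object: '"some value"',
  },
]);
```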

Solution

Some possibilities:

  • Keep the current approach, with the mitigations above in mind

  • Add more complex filtering logic in consumer code

    • potentially very complex, and duplicates a lot of the reasoning logic

  • Do all mappings in JS code (see the sketch after this list)

    • Logic in one place

    • might be very slow, especially for an initial sync

      • Keep the reasoner for the initial sync?
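
A minimal sketch of what the JS/TS mapping option could look like: one mapping per source predicate, applied to both the inserts and the deletes of a delta, so the transformation logic lives in a single place. All names are illustrative assumptions:

```typescript
type Triple = { subject: string; predicate: string; object: string };

type Mapping = (t: Triple) => Triple[];

// Example mapping: rename a source predicate to the target predicate.
const mappings: Record<string, Mapping> = {
  '<http://source.example.org/name>': (t) => [
    { ...t, predicate: '<http://target.example.org/label>' },
  ],
};

// Apply the same mappings to inserts and deletes; triples without a mapping
// are dropped rather than passed through (a design choice that keeps the
// target graph limited to mapped data).
function mapDelta(triples: Triple[]): Triple[] {
  return triples.flatMap((t) => (mappings[t.predicate] ?? (() => []))(t));
}
```

Running such mappings over a full export may be slow, which is why keeping the reasoner only for the initial sync is listed as an option.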
