Incremental Xmlupload

When data is uploaded with the xmlupload command, resources can reference each other by an internal ID, e.g. in the <resptr> tag. Once the data is in DSP, the resources can no longer be referenced by their internal IDs. Instead, the IRI that DSP generated for each resource has to be used. After a successful xmlupload, the mapping of internal IDs to their respective IRIs is written to a file called id2iri_mapping_[timestamp].json.

The mapping is needed if additional data is to be uploaded at a later point in time. Depending on the type of references the additional data contains, there are 4 ways to upload it:

  1. no references to existing resources: normal xmlupload
  2. references to existing resources via IRIs: incremental xmlupload
  3. references to existing resources via internal IDs: first id2iri, then incremental xmlupload
  4. continue an interrupted xmlupload: first id2iri, then incremental xmlupload
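Which of the first three cases applies can be checked by looking at the <resptr> targets in the new file. The following Python sketch is only an illustration and not part of dsp-tools. It assumes that resources are <resource> elements with an id attribute, and that IRIs start with http://rdfh.ch/ as in the example IRI on this page. The function name and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: DSP resource IRIs start like the example IRI used on this page.
IRI_PREFIX = "http://rdfh.ch/"


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def classify_references(xml_text: str) -> dict[str, set[str]]:
    """Sort all <resptr> targets into IRIs, IDs defined in this file, and other IDs."""
    root = ET.fromstring(xml_text)
    # Internal IDs of the resources that this file itself creates
    defined_ids = {
        el.get("id")
        for el in root.iter()
        if local_name(el.tag) == "resource" and el.get("id")
    }
    result: dict[str, set[str]] = {
        "iris": set(),
        "ids_in_this_file": set(),
        "ids_defined_elsewhere": set(),
    }
    for el in root.iter():
        if local_name(el.tag) != "resptr" or not el.text:
            continue
        target = el.text.strip()
        if target.startswith(IRI_PREFIX):
            result["iris"].add(target)
        elif target in defined_ids:
            result["ids_in_this_file"].add(target)
        else:
            result["ids_defined_elsewhere"].add(target)
    return result


# Hypothetical sample: page_1 is new, book_1 was uploaded earlier.
sample = """<knora>
  <resource id="page_1">
    <resptr>book_1</resptr>
  </resource>
</knora>"""
print(classify_references(sample)["ids_defined_elsewhere"])  # {'book_1'}
```

If ids_defined_elsewhere is not empty, the file falls under case 3. If only iris is filled, it falls under case 2. If both are empty, case 1 applies.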

1. No References to Existing Resources

The first case is the simplest one: No mapping is required, and the additional data can be uploaded with:

dsp-tools xmlupload additional_data.xml

2. References to Existing Resources Via IRIs

The second case is relatively easy, too: The file additional_data.xml contains references like <resptr>http://rdfh.ch/4123/nyOODvYySV2nJ5RWRdmOdQ</resptr>. Such a file can be uploaded with:

dsp-tools xmlupload additional_data.xml

3. References to Existing Resources Via Internal IDs

The third case, however, is a bit more complicated: The file additional_data.xml contains references like <resptr>book_1</resptr>, or <text><a class="salsah-link" href="IRI:book_1:IRI">link to book_1</a></text>, where book_1 was the internal ID of a resource that had previously been uploaded to DSP.

Before such an XML file can be uploaded, the internal IDs must be replaced with their respective IRIs. That's where the JSON mapping file comes in: It contains a mapping from book_1 to http://rdfh.ch/4123/nyOODvYySV2nJ5RWRdmOdQ.
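To illustrate what this replacement amounts to, here is a minimal Python sketch. It is not the dsp-tools implementation. It assumes that the mapping file is a flat JSON object of the form {"internal ID": "IRI"}, and that a salsah-link of the form IRI:book_1:IRI is replaced by the bare IRI. The function name is hypothetical.

```python
import json
import re
import xml.etree.ElementTree as ET

# Matches the href notation for internal IDs shown above, e.g. "IRI:book_1:IRI"
SALSAH_HREF = re.compile(r"^IRI:(.+):IRI$")


def replace_ids_with_iris(xml_text: str, mapping: dict[str, str]) -> str:
    """Replace internal IDs in <resptr> tags and salsah-links by their IRIs."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        name = el.tag.rsplit("}", 1)[-1]  # ignore a possible XML namespace
        if name == "resptr" and el.text and el.text.strip() in mapping:
            el.text = mapping[el.text.strip()]
        elif name == "a" and el.get("class") == "salsah-link":
            match = SALSAH_HREF.match(el.get("href", ""))
            if match and match.group(1) in mapping:
                # Assumption: the link target becomes the bare IRI
                el.set("href", mapping[match.group(1)])
    return ET.tostring(root, encoding="unicode")


# Assumed content of id2iri_mapping_[timestamp].json (flat JSON object)
mapping = json.loads('{"book_1": "http://rdfh.ch/4123/nyOODvYySV2nJ5RWRdmOdQ"}')

additional_data = """<knora>
  <resource id="page_1">
    <resptr>book_1</resptr>
    <text><a class="salsah-link" href="IRI:book_1:IRI">link to book_1</a></text>
  </resource>
</knora>"""
print(replace_ids_with_iris(additional_data, mapping))
```

IDs that do not occur in the mapping are left untouched in this sketch, so references between resources of the new file keep working.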

As a first step, a new file must be generated with the id2iri command:

dsp-tools id2iri additional_data.xml id2iri_mapping_[timestamp].json

In a second step, the newly generated XML file can be uploaded to DSP:

dsp-tools xmlupload additional_data_replaced_[timestamp].xml

4. Continue an Interrupted Xmlupload

If an xmlupload didn't finish successfully, some resources have already been created, while others have not. If one of the remaining resources references an already created resource by its internal ID, this internal ID must be replaced by the IRI of that resource.

Additionally, the already created resources must be removed from the XML file. Otherwise, they would be created a second time.

In such a case, proceed as follows:

  1. Initial xmlupload: dsp-tools xmlupload data.xml
  2. A crash happens. Some resources have been uploaded, and an id2iri_mapping_[timestamp].json file has been written.
  3. Fix the reason for the crash.
  4. Replace the internal IDs and remove the created resources with: dsp-tools id2iri data.xml --remove-resources id2iri_mapping_[timestamp].json
  5. Upload the resulting XML file with: dsp-tools xmlupload data_replaced_[timestamp].xml
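The removal part of step 4 can be pictured with the following Python sketch. It is only a conceptual illustration and not how dsp-tools implements --remove-resources. It assumes that resources are direct children of the XML root element, that they carry an id attribute, and that the mapping is a flat dictionary from internal IDs to IRIs. The function name and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_created_resources(xml_text: str, mapping: dict[str, str]) -> str:
    """Drop every top-level element whose id already has an IRI in the mapping."""
    root = ET.fromstring(xml_text)
    for child in list(root):  # iterate over a copy, because root is modified
        if child.get("id") in mapping:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical state after a crash: book_1 was created, page_1 was not.
mapping = {"book_1": "http://rdfh.ch/4123/nyOODvYySV2nJ5RWRdmOdQ"}
data = """<knora>
  <resource id="book_1"/>
  <resource id="page_1">
    <resptr>book_1</resptr>
  </resource>
</knora>"""
print(remove_created_resources(data, mapping))
```

In the output of this sketch, page_1 still points to book_1 by its internal ID. This is why the internal IDs have to be replaced by IRIs in the same step.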

Last update: September 12, 2023