Metadata-Version: 2.1
Name: ckanext-dcor_depot
Version: 0.15.3
Summary: Manages data storage for DCOR
Author: Paul Müller
Maintainer-email: Paul Müller <dev@craban.de>
License: GNU Affero General Public License v3 or later (AGPLv3+)
Project-URL: source, https://github.com/DCOR-dev/ckanext-dcor_depot
Project-URL: tracker, https://github.com/DCOR-dev/ckanext-dcor_depot/issues
Project-URL: changelog, https://github.com/DCOR-dev/ckanext-dcor_depot/blob/main/CHANGELOG
Keywords: DC,DCOR,deformability,cytometry
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Visualization
Classifier: Intended Audience :: Science/Research
Requires-Python: <4,>=3.8
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: boto3
Requires-Dist: ckan<3,>=2.10.4
Requires-Dist: click
Requires-Dist: dclab>=0.60.9
Requires-Dist: dcor_shared>=0.10.0
Requires-Dist: h5py
Requires-Dist: html2text==2019.8.11
Requires-Dist: numpy
Requires-Dist: requests

ckanext-dcor_depot
==================

|PyPI Version| |Build Status| |Coverage Status|

This plugin manages how data are stored in DCOR. There are two types of
files in DCOR:

1. Resources uploaded by users, imported from figshare, or
   imported from a data archive
2. Ancillary files that are generated upon resource creation, such as
   condensed DC data or preview images (see
   `ckanext-dc_view <https://github.com/DCOR-dev/ckanext-dc_view>`_).

This plugin implements:

- Data storage management. All resources uploaded by a user are moved
  to ``/data/users-HOSTNAME/USERNAME-ORGNAME/PK/ID/PKGNAME_RESID_RESNAME``
  and symlinks are created in ``/data/ckan-HOSTNAME/resources/RES/OUR/CEID``
  via a background job.
  CKAN itself will not notice this. The idea is to have a filesystem overview
  of the datasets of each user.
- A background job that uploads resources to S3 in ``after_resource_create``
  if the resources were uploaded via the legacy upload route.
- A background job that backs up resources from S3 to local block storage
  if the resources were uploaded via the S3 upload route.
- Import datasets from figshare. Existing datasets from figshare are
  downloaded to the ``/data/depots/figshare`` directory and, upon resource
  creation, symlinked there from ``/data/ckan-HOSTNAME/resources/RES/OUR/CEID``
  (note that this is an exception to the data storage management described
  above). When running the following command, the "figshare-import" organization
  is created and the datasets listed in ``figshare_dois.txt`` are added to CKAN:

  ::

     ckan import-figshare


- CLI for symlinking datasets that have failed to symlink before:

  ::

     ckan run-jobs-dcor-depot


- CLI for appending a resource to a dataset:

  ::

     ckan append-resource /path/to/file dataset_id --delete-source

Please make sure that the necessary file permissions are given in ``/data``.
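
The ``RES/OUR/CEID`` layout mentioned above splits a resource ID into its
first three characters, the next three, and the remainder. The following is
a minimal sketch of how such a symlink could be created. The function names
``ckan_resource_path`` and ``symlink_resource`` are hypothetical and not part
of this plugin's API.

::

    import pathlib


    def ckan_resource_path(storage_path, resource_id):
        """Return the RES/OUR/CEID path of a resource below `storage_path`."""
        rid = str(resource_id)
        # RES = first three characters, OUR = next three, CEID = remainder
        return (pathlib.Path(storage_path) / "resources"
                / rid[:3] / rid[3:6] / rid[6:])


    def symlink_resource(depot_file, storage_path, resource_id):
        """Create a symlink at the CKAN resource path pointing to `depot_file`."""
        link = ckan_resource_path(storage_path, resource_id)
        link.parent.mkdir(parents=True, exist_ok=True)
        # Leave an existing symlink untouched
        if not link.is_symlink():
            link.symlink_to(depot_file)
        return link

For a storage path of ``/data/ckan-HOSTNAME``, the returned path has the form
``/data/ckan-HOSTNAME/resources/RES/OUR/CEID``.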

In 2023, it was decided that the huge block storage of DCOR
should be replaced with an S3-compatible object store, because block storage
does not scale well. As a result, some of the commands above are partially
deprecated and might be removed or modified to support object storage directly.

- CLI for migrating data from block storage to an S3-compatible object storage
  service. For this, the following configuration keys must be specified in
  the ``ckan.ini`` file::

    dcor_object_store.access_key_id = ACCESS_KEY_ID
    dcor_object_store.secret_access_key = SECRET_ACCESS_KEY
    dcor_object_store.endpoint_url = S3_ENDPOINT_URL
    dcor_object_store.ssl_verify = true
    # The bucket name is by default defined by the circle ID. Resources
    # are stored using the "RES/OUR/CEID" scheme in that bucket.
    dcor_object_store.bucket_name = circle-{organization_id}

  Usage::

    ckan dcor-migrate-resources-to-object-store
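
As an illustration of how the configuration keys above fit together, the
following sketch reads them from the text of a ``ckan.ini`` file and expands
the bucket name template. It assumes that the keys are located in the
``[app:main]`` section. All function names are hypothetical and this is not
how the plugin itself reads its configuration.

::

    import configparser


    def read_object_store_config(ini_text, section="app:main"):
        """Return (boto3 client keyword arguments, bucket name template)."""
        # Disable interpolation so that values containing "%" are kept as-is
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(ini_text)
        sec = parser[section]
        client_kwargs = {
            "endpoint_url": sec["dcor_object_store.endpoint_url"],
            "aws_access_key_id": sec["dcor_object_store.access_key_id"],
            "aws_secret_access_key":
                sec["dcor_object_store.secret_access_key"],
            "verify": sec.getboolean("dcor_object_store.ssl_verify",
                                     fallback=True),
        }
        return client_kwargs, sec["dcor_object_store.bucket_name"]


    def bucket_name_for_circle(template, organization_id):
        """Expand a template such as "circle-{organization_id}"."""
        return template.format(organization_id=organization_id)


    def make_s3_client(client_kwargs):
        """Create an S3 client for the configured endpoint."""
        import boto3  # imported lazily; only needed for actual S3 access
        return boto3.client("s3", **client_kwargs)

With the default template, a circle whose ID is ``abc`` would map to the
bucket ``circle-abc``.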


Installation
------------

::

    pip install ckanext-dcor_depot


Add this extension to the plugins and default_views in ``ckan.ini``:

::

    ckan.plugins = [...] dcor_depot
    ckan.storage_path=/data/ckan-HOSTNAME
    ckanext.dcor_depot.depots_path=/data/depots
    ckanext.dcor_depot.users_depot_name=users-HOSTNAME

This plugin stores resources to ``/data``:

::

    mkdir -p /data/depots/users-$(hostname)
    chown -R www-data /data/depots/users-$(hostname)
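
To confirm that the permissions are sufficient, a check along the following
lines could be run as the user that owns the CKAN process. This is a sketch
only and the function name ``depot_is_writable`` is hypothetical.

::

    import os
    import pathlib


    def depot_is_writable(path):
        """Return True if `path` is a directory the current user can write to."""
        path = pathlib.Path(path)
        # Write and execute (search) permissions are both required to
        # create files inside a directory.
        return path.is_dir() and os.access(path, os.W_OK | os.X_OK)

The function returns ``False`` if the directory does not exist or if the
current user is not allowed to create files in it.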


Testing
-------
If CKAN/DCOR is installed and set up for testing, this extension can
be tested with pytest:

::

    pytest ckanext

Testing can also be done via vagrant in a virtual machine using the
`dcor-test <https://app.vagrantup.com/paulmueller/boxes/dcor-test/>`_ image.
Make sure that ``vagrant`` and ``virtualbox`` are installed and run the
following commands in the root of this repository:

::

    # Set up the virtual machine using `Vagrantfile`
    vagrant up
    # Run the tests
    vagrant ssh -- sudo bash /testing/vagrant-run-tests.sh


.. |PyPI Version| image:: https://img.shields.io/pypi/v/ckanext.dcor_depot.svg
   :target: https://pypi.python.org/pypi/ckanext.dcor_depot
.. |Build Status| image:: https://img.shields.io/github/actions/workflow/status/DCOR-dev/ckanext-dcor_depot/check.yml
   :target: https://github.com/DCOR-dev/ckanext-dcor_depot/actions?query=workflow%3AChecks
.. |Coverage Status| image:: https://img.shields.io/codecov/c/github/DCOR-dev/ckanext-dcor_depot
   :target: https://codecov.io/gh/DCOR-dev/ckanext-dcor_depot
