Metadata-Version: 2.1
Name: oarepo-s3
Version: 1.2.3
Summary: S3 file storage support for Invenio
Home-page: https://github.com/oarepo/oarepo-s3
Author: Miroslav Bauer @ CESNET
Author-email: bauer@cesnet.cz
License: MIT
Keywords: oarepo s3
Platform: any
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Development Status :: 4 - Beta
Description-Content-Type: text/markdown
Requires-Dist: invenio-s3 (>=1.0.3)
Provides-Extra: all
Requires-Dist: Sphinx (<3.0.2,>=1.5.1) ; extra == 'all'
Requires-Dist: oarepo[tests] (>=3.3.0) ; extra == 'all'
Requires-Dist: moto[s3] (>=1.3.7) ; extra == 'all'
Requires-Dist: oarepo-records-draft (>=5.0.0a19) ; extra == 'all'
Provides-Extra: docs
Requires-Dist: Sphinx (<3.0.2,>=1.5.1) ; extra == 'docs'
Provides-Extra: tests
Requires-Dist: oarepo[tests] (>=3.3.0) ; extra == 'tests'
Requires-Dist: moto[s3] (>=1.3.7) ; extra == 'tests'
Requires-Dist: oarepo-records-draft (>=5.0.0a19) ; extra == 'tests'

# oarepo-s3

[![image][]][1]
[![image][2]][3]
[![image][4]][5]
[![image][6]][7]

This package, built on top of the [invenio-s3](https://github.com/inveniosoftware/invenio-s3)
library, integrates Invenio with any object storage backend compatible with the AWS S3 REST API.
Beyond what invenio-s3 provides, it minimizes the processing of file requests on the
Invenio server side and uses direct access to the S3 storage backend wherever possible:
neither multipart file uploads nor downloads pass through the Invenio server itself.

## Installation

To start using this library:

1) Install the following packages in your project's virtualenv:
    ```bash
    git clone https://github.com/CESNET/s3-client
    cd s3-client
    poetry install
    pip install oarepo-s3
    ```

2) Create an S3 account and bucket on your S3 storage provider of choice.
3) Put the S3 access configuration into your Invenio server config (e.g. `invenio.cfg`):
    ```python
    INVENIO_S3_TENANT=None
    INVENIO_S3_ENDPOINT_URL='https://s3.example.org'
    INVENIO_S3_ACCESS_KEY_ID='your_access_key'
    INVENIO_S3_SECRET_ACCESS_KEY='your_secret_key'
    ```
4) Create an Invenio files location targeting the S3 bucket:
    ```bash
    invenio files location --default 'default-s3' s3://oarepo-bucket
    ```
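
As a quick sanity check, the same `INVENIO_S3_*` settings can be fed to a plain
boto3 client. The helper below is a hypothetical sketch, not part of oarepo-s3;
it only maps the Invenio setting names onto `boto3.client('s3', ...)` keyword
arguments:

```python
def s3_client_kwargs(config):
    """Map INVENIO_S3_* settings onto boto3.client('s3', ...) kwargs."""
    return {
        'endpoint_url': config['INVENIO_S3_ENDPOINT_URL'],
        'aws_access_key_id': config['INVENIO_S3_ACCESS_KEY_ID'],
        'aws_secret_access_key': config['INVENIO_S3_SECRET_ACCESS_KEY'],
    }

# With boto3 installed, the connection can then be verified directly:
# import boto3
# client = boto3.client('s3', **s3_client_kwargs(app.config))
# client.head_bucket(Bucket='oarepo-bucket')  # raises if unreachable/forbidden
```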

## Usage

To use this library as an Invenio Files storage in your projects, put the following
into your Invenio server config:

```python
FILES_REST_STORAGE_FACTORY = 'oarepo_s3.storage.s3_storage_factory'
```

This storage overrides the `save()` method of the `InvenioS3` storage and adds
support for **direct S3 multipart uploads**. All other functionality
is handled by the underlying `InvenioS3` storage library.

### Direct multipart upload

To initiate a direct multipart upload to the S3 backend, provide an
instance of `MultipartUpload` instead of the usual `stream` when assigning
a file to a record, e.g.:

```python
from oarepo_s3.api import MultipartUpload
files = record.files  # Record instance FilesIterator
mu = MultipartUpload(key='filename',
                     base_uri=files.bucket.location.uri,
                     expires=3600,
                     size=1024*1024*1000,  # total file size
                     part_size=None,
                     # completion resources as registered in blueprints, see below
                     complete_url='/records/1/files/filename/multipart-complete',
                     abort_url='/records/1/files/filename/multipart-abort')

# Assigning a MultipartUpload to the FilesIterator triggers
# the multipart upload creation on the S3 storage backend.
files['filename'] = mu
```

This configures the passed-in `MultipartUpload` instance with
all the information an uploader client needs to process and
complete the upload. The multipart upload session configuration
is available in the `MultipartUpload.session` field.
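
Passing `part_size=None` above lets the library pick a part size itself. As an
illustration of the constraints any such choice must satisfy (S3 allows at most
10,000 parts per upload, with a 5 MiB minimum for every part but the last), here
is a hypothetical sketch; the actual strategy used by oarepo-s3 may differ:

```python
import math

S3_MIN_PART_SIZE = 5 * 1024 * 1024  # 5 MiB minimum for non-final parts
S3_MAX_PARTS = 10_000               # S3 hard limit on parts per upload

def choose_part_size(total_size, part_size=None):
    """Return a (part_size, part_count) pair that fits the S3 limits.

    Honours the caller's part_size when it is large enough, otherwise
    grows the part size until the upload fits in at most 10,000 parts.
    """
    part_size = max(part_size or S3_MIN_PART_SIZE, S3_MIN_PART_SIZE)
    while math.ceil(total_size / part_size) > S3_MAX_PARTS:
        part_size *= 2
    return part_size, math.ceil(total_size / part_size)
```

For the 1000 MiB file in the example above this yields 200 parts of 5 MiB each.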

To be able to complete or abort an ongoing multipart upload after an
uploader client has finished uploading all parts to the S3 backend,
register the resources provided by `oarepo_s3.views` in the app
blueprints:

```python
from oarepo_s3.views import (MultipartUploadAbortResource,
                             MultipartUploadCompleteResource)


def multipart_actions(code, files, rest_endpoint, extra, is_draft):
    # rest path -> view
    return {
        'files/<key>/complete-multipart':
            MultipartUploadCompleteResource.as_view(
                MultipartUploadCompleteResource.view_name.format(endpoint=code)
            ),
        'files/<key>/abort-multipart':
            MultipartUploadAbortResource.as_view(
                MultipartUploadAbortResource.view_name.format(endpoint=code)
            )
    }
```

## OARepo Records Draft integration

This library works best together with the [oarepo-records-draft](https://github.com/oarepo/oarepo-records-draft)
library. When integrated into draft endpoints, there is no need to manually
register the completion resources in blueprints, and multipart upload creation
is handled automatically as well.

To set up the drafts integration, run:
```bash
pip install oarepo-records-draft oarepo-s3
```

and configure the draft endpoints according to that library's README.
Doing so auto-registers the following file API actions on the draft
endpoints:

### Create multipart upload
```
POST /draft/records/<pid>/files/?multipart=True
{
  "key": "filename.txt",
  "multipart_content_type": "text/plain",
  "size": 1024
}
```

### Complete multipart upload
```
POST /draft/records/<pid>/files/<key>/complete-multipart
{
  "parts": [{"ETag": "<uploaded_part_etag>", "PartNum": <part_num>}, ...]
}
```

### Abort multipart upload
```
POST /draft/records/<pid>/files/<key>/abort-multipart
```
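
After uploading all parts, a client collects the ETags returned by the S3
backend and posts them to the complete-multipart action. A minimal sketch of
building that request body (the helper is hypothetical; field names follow the
example above, with 1-based part numbers):

```python
def completion_payload(etags):
    """Build the complete-multipart request body from per-part ETags.

    `etags` must be the ETags returned by the S3 backend for each part,
    in upload order; part numbers are assigned starting from 1.
    """
    return {
        'parts': [
            {'ETag': etag, 'PartNum': num}
            for num, etag in enumerate(etags, start=1)
        ]
    }
```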

## Tasks

This library provides a task that looks up expired ongoing
file uploads that can no longer be completed and removes them
from the associated record's bucket. To use this task in your
Celery cron schedule, configure it in your Invenio server config like this:

```python
from datetime import timedelta

CELERY_BEAT_SCHEDULE = {
    'cleanup_expired_multipart_uploads': {
        'task': 'oarepo_s3.tasks.cleanup_expired_multipart_uploads',
        'schedule': timedelta(minutes=60),
    },
    ...
}
```

  [image]: https://img.shields.io/github/license/oarepo/oarepo-s3.svg
  [1]: https://github.com/oarepo/oarepo-s3/blob/master/LICENSE
  [2]: https://img.shields.io/travis/oarepo/oarepo-s3.svg
  [3]: https://travis-ci.com/oarepo/oarepo-s3
  [4]: https://img.shields.io/coveralls/oarepo/oarepo-s3.svg
  [5]: https://coveralls.io/r/oarepo/oarepo-s3
  [6]: https://img.shields.io/pypi/v/oarepo-s3.svg
  [7]: https://pypi.org/pypi/oarepo-s3


..
    Copyright (C) 2020 CESNET
    oarepo-s3 is free software; you can redistribute it and/or modify it
    under the terms of the MIT License; see LICENSE file for more details.

Changes
=======

Version 1.0.3 (released 2020-04-25)

- Allow for dynamic part size for multipart uploads.
- Adds new configuration variables to define default part size and maximum
  number of parts.

Version 1.0.2 (released 2020-02-17)

- Fixes typos on configuration variables and cached properties.
- Adds AWS region name and signature version to configuration.

Version 1.0.1 (released 2019-01-23)

- New configuration variable for URL expiration.
- Enhances file serving.
- Unpins Boto3 library.
- Fixes test suite configuration.

Version 1.0.0 (released 2018-09-19)

- Initial public release.


