Metadata-Version: 2.3
Name: persidict
Version: 0.36.3
Summary: Simple persistent key-value store for Python. Values are stored as files on a disk or as S3 objects on AWS cloud.
Keywords: persistence,dicts,distributed,parallel
Author: Vlad (Volodymyr) Pavlov
Author-email: Vlad (Volodymyr) Pavlov <vlpavlov@ieee.org>
License: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: parameterizable
Requires-Dist: lz4
Requires-Dist: joblib
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: jsonpickle
Requires-Dist: deepdiff
Requires-Dist: boto3 ; extra == 'aws'
Requires-Dist: boto3 ; extra == 'dev'
Requires-Dist: moto ; extra == 'dev'
Requires-Dist: pytest ; extra == 'dev'
Requires-Python: >=3.10
Project-URL: Homepage, https://github.com/pythagoras-dev/persidict
Provides-Extra: aws
Provides-Extra: dev
Description-Content-Type: text/markdown

# persidict

Simple persistent dictionaries for distributed applications in Python.

## 1. What Is It?

`persidict` offers a simple persistent key-value store for Python.
It saves the contents of the dictionary in a folder on a local disk
or in an S3 bucket on AWS. Each value is stored as a separate file or S3 object.
Only strings or sequences of strings are allowed as keys.

Unlike other persistent dictionaries (e.g. Python's native `shelve`), 
`persidict` is designed for use in highly **distributed environments**, 
where multiple instances of a program run concurrently across many machines,
accessing the same dictionary via shared storage.
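
To make the one-file-per-value layout concrete, here is a minimal standard-library sketch. It is not `persidict`'s implementation: the class name `TinyFileStore`, the `.pkl` file naming, and the use of plain `pickle` are assumptions made for illustration only.

```python
import pickle
import tempfile
from pathlib import Path


class TinyFileStore:
    """Toy one-file-per-key store (hypothetical, not persidict's code)."""

    def __init__(self, base_dir):
        self.base_dir = Path(base_dir)
        self.base_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Assumes the key is already a filename-safe string.
        return self.base_dir / f"{key}.pkl"

    def __setitem__(self, key, value):
        # Each value lands in its own file, named after the key.
        self._path(key).write_bytes(pickle.dumps(value))

    def __getitem__(self, key):
        path = self._path(key)
        if not path.exists():
            raise KeyError(key)
        return pickle.loads(path.read_bytes())

    def __contains__(self, key):
        return self._path(key).exists()

    def __len__(self):
        return sum(1 for _ in self.base_dir.glob("*.pkl"))


demo_dir = tempfile.mkdtemp()
store = TinyFileStore(demo_dir)
store["theme"] = "dark"
store["recent_projects"] = ["project_a", "project_b"]

# A second object pointing at the same folder sees the same data.
reopened = TinyFileStore(demo_dir)
print(reopened["theme"], len(reopened))
```

Because every key maps to its own file, two processes writing different keys never touch the same file, which is one reason this layout suits shared storage.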

## 2. Features
* **Persistent Storage**: Save dictionaries to the local filesystem 
(`FileDirDict`) or AWS S3 (`S3Dict`).
* **Standard Dictionary API**: Use persidict objects like standard 
Python dictionaries with methods like `__getitem__`, `__setitem__`, 
`__delitem__`, `keys`, `values`, `items`, etc.
* **Distributed Computing Ready**: Designed for concurrent access 
in distributed environments.
* **Flexible Serialization**: Store values as pickles (`pkl`), 
JSON (`json`), or plain text.
* **Type Safety**: Optionally enforce that all values in a dictionary are 
instances of a specific class.
* **Advanced Functionality**: Includes features like write-once dictionaries, 
timestamping of entries, and tools for handling file-system-safe keys.

## 3. Usage

### 3.1 Storing Data on a Local Disk

The `FileDirDict` class saves your dictionary to a local folder. 
Each key-value pair is stored as a separate file.

```python
from persidict import FileDirDict

# Create a dictionary that will be stored in the "my_app_data" folder.
# The folder will be created automatically if it doesn't exist.
app_settings = FileDirDict(base_dir="my_app_data")

# Add and update items just like a regular dictionary.
app_settings["username"] = "alex"
app_settings["theme"] = "dark"
app_settings["notifications_enabled"] = True

# Values can be any pickleable Python object.
app_settings["recent_projects"] = ["project_a", "project_b"]

print(f"Current theme is: {app_settings['theme']}")
# >>> Current theme is: dark

# The data persists!
# If you run the script again or create a new dictionary object
# pointing to the same folder, the data will be there.
reloaded_settings = FileDirDict(base_dir="my_app_data")

print(f"Number of settings: {len(reloaded_settings)}")
# >>> Number of settings: 4

print("username" in reloaded_settings)
# >>> True
```

### 3.2 Storing Data in the Cloud (AWS S3)

For distributed applications, you can use **`S3Dict`** to store data in 
an AWS S3 bucket. The usage is identical, allowing you to switch 
between local and cloud storage with minimal code changes.

```python
from persidict import S3Dict

# Create a dictionary that will be stored in an S3 bucket.
# The bucket will be created if it doesn't exist.
cloud_config = S3Dict(bucket_name="my-app-config-bucket")

# Use it just like a FileDirDict.
cloud_config["api_key"] = "ABC-123-XYZ"
cloud_config["timeout_seconds"] = 30

print(f"API Key: {cloud_config['api_key']}")
# >>> API Key: ABC-123-XYZ
```

## 4. Glossary

### 4.1 Core Concepts

* **`PersiDict`**: The abstract base class that defines the common interface 
for all persistent dictionaries in the package. It's the foundation 
upon which everything else is built.
* **`PersiDictKey`**: A type hint that specifies what can be used
as a key in any `PersiDict`. It can be a `SafeStrTuple`, 
a single string, or a sequence of strings.
* **`SafeStrTuple`**: The core data structure for keys. It's an immutable, 
flat tuple of non-empty, URL/filename-safe strings, ensuring that 
keys are consistent and safe for various storage backends.
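
The key rules above can be sketched with the standard library alone. The helper `to_flat_safe_tuple` and the allowed character set are assumptions for this illustration; the library's actual validation rules may differ.

```python
import re
from collections.abc import Sequence

# Assumed safe alphabet for this sketch only.
_SAFE = re.compile(r"[A-Za-z0-9_.\-]+")


def to_flat_safe_tuple(key):
    """Flatten a string or a sequence of strings into a tuple of safe strings."""
    if isinstance(key, str):
        # fullmatch rejects empty strings and any character outside the alphabet.
        if not _SAFE.fullmatch(key):
            raise ValueError(f"unsafe or empty key component: {key!r}")
        return (key,)
    if isinstance(key, Sequence):
        flat = []
        for part in key:
            flat.extend(to_flat_safe_tuple(part))
        if not flat:
            raise ValueError("key must contain at least one string")
        return tuple(flat)
    raise TypeError(f"unsupported key type: {type(key).__name__}")


print(to_flat_safe_tuple("username"))
# >>> ('username',)
print(to_flat_safe_tuple(["users", "alex", "settings"]))
# >>> ('users', 'alex', 'settings')
```

Normalizing every key to one flat tuple means a storage backend only has to handle a single key shape, for example by mapping each component to a path segment.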

### 4.2 Main Implementations

* **`FileDirDict`**: A primary, concrete implementation of `PersiDict` 
that stores each key-value pair as a separate file in a local directory.
* **`S3Dict`**: The other primary implementation of `PersiDict`, 
which stores each key-value pair as an object in an AWS S3 bucket, 
suitable for distributed environments.

### 4.3 Key Parameters

* **`file_type`**: A key parameter for `FileDirDict` and `S3Dict` that 
determines the serialization format for values. 
Common options are `"pkl"` (pickle) and `"json"`. 
Any other value is treated as plain text for string storage.
* **`base_class_for_values`**: An optional parameter for any `PersiDict` 
that enforces type checking on all stored values, ensuring they are 
instances of a specific class.
* **`immutable_items`**: A boolean parameter that can make a `PersiDict` 
"write-once," preventing any modification or deletion of existing items.
* **`digest_len`**: An integer that specifies the length of a hash suffix 
added to key components to prevent collisions on case-insensitive file systems.
* **`base_dir`**: A string specifying the directory path where a `FileDirDict`
stores its files. For `S3Dict`, this directory is used to cache files locally.
* **`bucket_name`**: A string specifying the name of the S3 bucket where
an `S3Dict` stores its objects.
* **`region`**: An optional string specifying the AWS region for the S3 bucket.

### 4.4 Advanced Classes

* **`WriteOnceDict`**: A wrapper that enforces write-once behavior 
on any `PersiDict`, ignoring subsequent writes to the same key. 
It also allows for random consistency checks to ensure subsequent 
writes to the same key always match the original value.
* **`OverlappingMultiDict`**: An advanced container that holds 
multiple `PersiDict` instances sharing the same storage 
but with different `file_type`s.
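
The write-once behavior with occasional consistency checks can be pictured with a small wrapper around any mutable mapping. This is a sketch under assumptions: the class `WriteOnceSketch` and its `p_check` parameter are hypothetical names, not the signature of `WriteOnceDict`.

```python
import random


class WriteOnceSketch:
    """Toy write-once wrapper around any mutable mapping (hypothetical)."""

    def __init__(self, wrapped, p_check=0.0):
        self.wrapped = wrapped
        self.p_check = p_check  # probability of verifying a repeated write

    def __setitem__(self, key, value):
        if key not in self.wrapped:
            self.wrapped[key] = value
            return
        # Repeated write: never overwrite, but occasionally verify that
        # the new value matches what was stored first.
        if random.random() < self.p_check and self.wrapped[key] != value:
            raise ValueError(f"conflicting write for key {key!r}")

    def __getitem__(self, key):
        return self.wrapped[key]

    def __delitem__(self, key):
        raise TypeError("items in a write-once dictionary cannot be deleted")


cache = WriteOnceSketch({}, p_check=1.0)
cache["result"] = 42
cache["result"] = 42  # same value: checked, then ignored

silent = WriteOnceSketch({"result": 42})  # p_check=0.0: never verifies
silent["result"] = 99  # ignored without a check
print(silent["result"])
# >>> 42
```

Checking only a random fraction of repeated writes keeps the common path cheap while still catching non-deterministic producers over time.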

### 4.5 Special "Joker" Values

* **`Joker`**: The base class for special command-like values that 
can be assigned to a key to trigger an action instead of storing a value.
* **`KEEP_CURRENT`**: A "joker" value that, when assigned to a key, 
ensures the existing value is not changed.
* **`DELETE_CURRENT`**: A "joker" value that deletes the key-value pair 
from the dictionary when assigned to a key.
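
The joker mechanism amounts to sentinel objects that `__setitem__` intercepts. The sketch below uses a plain `dict` subclass to show the idea; the names `_Joker`, `KEEP`, `DELETE`, and `JokerAwareDict` are hypothetical stand-ins, and how the library treats a delete joker for a missing key is an assumption here.

```python
class _Joker:
    """Base class for sentinel values that trigger actions (sketch only)."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return self._name


KEEP = _Joker("KEEP")
DELETE = _Joker("DELETE")


class JokerAwareDict(dict):
    def __setitem__(self, key, value):
        if value is KEEP:
            return  # leave whatever is stored untouched
        if value is DELETE:
            # Assumption: deleting a missing key is silently ignored.
            self.pop(key, None)
            return
        super().__setitem__(key, value)


settings = JokerAwareDict()
settings["theme"] = "dark"
settings["theme"] = KEEP      # value stays "dark"
settings["timeout"] = 30
settings["timeout"] = DELETE  # pair is removed
print(settings)
# >>> {'theme': 'dark'}
```

Sentinels like these let a function decide per key whether to store, skip, or delete, while the caller keeps using ordinary assignment.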

## 5. Comparison With Python Built-in Dictionaries

### 5.1 Similarities 

`PersiDict` subclasses can be used like regular Python dictionaries, supporting: 

* Get, set, and delete operations with square brackets (`[]`).
* Iteration over keys, values, and items.
* Membership testing with `in`.
* Length checking with `len()`.
* Standard methods like `keys()`, `values()`, `items()`, `get()`, `clear()`,
`setdefault()`, and `update()`.

### 5.2 Differences 

* **Persistence**: Data is saved between program executions.
* **Keys**: Keys must be URL/filename-safe strings or sequences of such strings.
* **Values**: Values must be pickleable. 
You can also constrain values to a specific class.
* **Order**: Insertion order is not preserved.
* **Additional Methods**: `PersiDict` provides extra methods not in the standard
dict API, such as `timestamp()`, `random_key()`, `newest_keys()`, `subdicts()`,
`delete_if_exists()`, `get_params()`, and more.
* **Special Values**: Use `KEEP_CURRENT` to avoid updating a value 
and `DELETE_CURRENT` to delete a value during an assignment.

## 6. Installation

The source code is hosted on GitHub at:
[https://github.com/pythagoras-dev/persidict](https://github.com/pythagoras-dev/persidict) 

Binary installers for the latest released version are available from the Python Package Index (PyPI):
[https://pypi.org/project/persidict](https://pypi.org/project/persidict)

You can install `persidict` using `pip` or your favorite package manager:

```bash
pip install persidict
```

To include the AWS S3 extra dependencies:

```bash
pip install persidict[aws]
```

For development, including test dependencies:

```bash
pip install persidict[dev]
```

## 7. Dependencies

`persidict` has the following core dependencies:

* [parameterizable](https://pypi.org/project/parameterizable/)
* [jsonpickle](https://jsonpickle.github.io)
* [joblib](https://joblib.readthedocs.io)
* [lz4](https://python-lz4.readthedocs.io)
* [pandas](https://pandas.pydata.org)
* [numpy](https://numpy.org)
* [deepdiff](https://zepworks.com/deepdiff)

For AWS S3 support (`S3Dict`), you will also need:
* [boto3](https://boto3.readthedocs.io)

For development and testing, the following packages are used:
* [pytest](https://pytest.org)
* [moto](http://getmoto.org)

## 8. Contributing
Contributions are welcome! Please see the [contributing guide](https://github.com/pythagoras-dev/persidict?tab=contributing-ov-file) for details
on how to get started, run tests, and submit pull requests.

## 9. License
`persidict` is licensed under the MIT License. See the [LICENSE](https://github.com/pythagoras-dev/persidict?tab=MIT-1-ov-file) file for more details.

## 10. Key Contacts

* [Vlad (Volodymyr) Pavlov](https://www.linkedin.com/in/vlpavlov/)