Metadata-Version: 2.1
Name: ascmhl
Version: 0.5.1
Summary: ASC Media Hash List (ASC MHL)
Home-page: https://github.com/ascmitc/mhl
Author: Patrick Renner, Alexander Sahm
Author-email: opensource@pomfort.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: ~=3.7
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: Click (>=7.0)
Requires-Dist: lxml (>=4.6.2)
Requires-Dist: packaging (>=20.9)
Requires-Dist: pathspec (>=0.8.0)
Requires-Dist: requests (>=2.25.1)
Requires-Dist: xxhash (>=2.0.0)
Requires-Dist: importlib-metadata (>=4.0.1) ; python_version < "3.8"

![ascmhl-build-test](https://github.com/ascmitc/mhl/workflows/ascmhl-build-test/badge.svg) 
[![Documentation Status](https://readthedocs.org/projects/ascmhl/badge/?version=latest)](http://ascmhl.readthedocs.io/?badge=latest)
[![GitHub release](https://img.shields.io/github/tag/ascmitc/mhl.svg)](https://github.com/ascmitc/mhl/releases)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

# ASC Media Hash List (ASC MHL)
> The software in this repository aids the ongoing specification process of the ASC MHL format by the Advanced Data Management Subcommittee of the ASC Motion Imaging Technology Council at the [American Society of Cinematographers](https://theasc.com) (ASC). 
> 
> This software is work in progress and is not intended to be used in production (yet).
> 
> In case you are looking for the current, original specification of MHL, please take a look at [https://mediahashlist.org](https://mediahashlist.org).

Ensuring file integrity when backing up and verifying files during production and post production is of utmost importance. The ASC MHL is used to create a chain of custody by tracking each and every copy made between the media’s initial download on set, all the way through to final archival.

The ASC MHL uses common checksum methods for hashing files and folders, specifies what information is gathered, where the checksum is placed, and documents these hashes together with essential file metadata in an XML format that is human readable.

This repository holds all information about the document format, a reference implementation, and tools.

## ASC MHL Format Specification

The ASC MHL format specification consists of

* a definition of naming conventions for the ascmhl folder and the file names of its content
* an XML schema for the ASC MHL files
* a definition of the chain file

The schema definition can be found in the `./xsd` folder.

## `mhllib` Reference Implementation 

The reference library is intended for use in applications and tools that deal with ASC MHL files. It takes care of the complex use cases of nesting and assembling information.

The reference library covers

* reading ascmhl folders and their contents
* parsing and writing of ASC MHL XML files
* parsing and writing ASC MHL chain files
* dealing with nested mhl folders

ASC MHL supports the hash formats

* xxHash (64-bit, and latest XXH3 with 64-bit and 128-bit)
* MD5
* SHA1, SHA256
* C4

The source code for `mhllib` can be found in the `./mhl` folder.
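
As a rough illustration of how file hashes in some of the listed formats are produced, the following sketch hashes a file in chunks using Python's standard `hashlib` module. It is not taken from `mhllib`: the function name `hash_file`, the format labels, and the chunk size are assumptions made for this example, and xxHash and C4 are left out because they are not part of the standard library.

```python
import hashlib

# Hypothetical format labels mapped to standard-library hash constructors.
# xxHash and C4 are omitted because hashlib does not provide them.
HASHERS = {
    "md5": hashlib.md5,
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
}


def hash_file(path, hash_format="md5", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    hasher = HASHERS[hash_format]()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops as soon as read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Reading in chunks matters for media files, which are often far larger than available memory.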

## The `ascmhl` Tool

The `ascmhl` tool is a command line tool based on `mhllib` that performs typical activities for the use cases of ASC MHL.

The ASC MHL tool implementation can

* create and extend ASC MHL history for given files and entire file hierarchies in a file system,
* output information about recorded history (summary of history or detailed information about single files), and
* verify files and entire file hierarchies.

Typical scenarios, sample CLI output, and generated ASC MHL files can be found in the [README.md](https://github.com/ascmitc/mhl/blob/master/examples/scenarios/) file in the ``examples/scenarios`` folder of the git repository.

The documentation can also be found at [https://ascmhl.readthedocs.io/](https://ascmhl.readthedocs.io/)


## Getting started

The `mhllib` as well as the `ascmhl` tool require a few dependencies that need to be installed first. 

For installing system dependencies on macOS [Homebrew](https://brew.sh) is recommended.

### System requirements

Make sure you have Python 3 installed:

```shell
$ brew install python3
$ brew postinstall python3
```

### Installing / updating `ascmhl` as a user
Please run the following command to install (or upgrade to) the latest published version of `ascmhl`:
```shell
$ pip3 install --upgrade ascmhl
```

To verify that it has been correctly installed run:
```shell
$ ascmhl --help
```

### Installing `ascmhl` as a developer

Download the source code and install dependencies using a [Virtual Environment](https://docs.python.org/3/tutorial/venv.html):

```shell
$ git clone https://github.com/ascmitc/mhl.git
$ cd mhl
$ python3 -m venv env
$ source env/bin/activate
$ pip3 install --editable .
```

This will install a wrapper script for `ascmhl` to be available on your `$PATH`. Inside the virtualenv, this wrapper 
will be installed as `env/bin/ascmhl`. Regular users might have it in 
`/Library/Frameworks/Python.framework/Versions/3.9/bin/ascmhl` or `/usr/local/bin`. For Windows users, pip will 
create an `ascmhl.exe`.

More information on installing Python commandline tools using `entry_points` can be found here:
* https://setuptools.readthedocs.io/en/latest/userguide/entry_point.html
* https://packaging.python.org/specifications/entry-points/#use-for-scripts
* https://click.palletsprojects.com/en/master/setuptools/

Adding the `-e / --editable` flag installs a linked version to your `site-packages` directory to allow editing the source 
files in your working directory as usual.

## Common Scenarios for `ascmhl`

The `ascmhl` tool can be used to 

* create new MHL generations for given files and folders (command `create`), 
* verify the state of files and folders against the MHL history (command `verify`),
* print differences between the records in the MHL history and given files and folders (command `diff`), and
* print information about an MHL history (command `info`).

Additional utility commands:
* validate an ASC MHL file against the XSD (command `xsd-schema-check`)


### Working with file hierarchies (with completeness check)

The most common commands when using `ascmhl` in data management scenarios are the `create` and the `verify` commands in their default behavior (without subcommand options).

Sealing a folder / drive with the `create` command traverses through a folder hierarchy, hashes all found files and compares the hashes against the records in the `ascmhl` folder (if present). The command creates a new generation (or an initial one) for the content of an entire folder at the given folder level. It can be used to document all files in a folder or drive with all verified or newly created file hashes of the moment the `create` command ran.

Checking a folder / drive with the `verify` command traverses through the content of a folder, hashes all found files and compares the hashes against the records in the `ascmhl` folder. The `verify` command behaves like a `create` command (both without additional options), but doesn't write new generations. It can be used to verify the content of a received drive with existing ascmhl information.

The `diff` command also traverses through the content of a folder / drive. It behaves like the `verify` command, but it does not hash any files (i.e. it doesn't do file verification) and is thus much faster in execution. It can be used to print all files that exist in the file system but are not registered in the `ascmhl` folder yet, and all files that are registered in the `ascmhl` folder but are missing in the file system.


### Working with single files (without completeness check)

In some scenarios working with an entire folder structure is not adequate, and finer control of the processed files is needed. For those scenarios the `create` and `verify` commands are used with additional subcommand options.

The `create -sf` ("single files, no completeness check") command adds single files to an existing folder structure and creates new generations that contain only the records of these files.

The `verify -sf` ("single files") command hashes single files and verifies them against the hash information stored in the `ascmhl` folder. This makes it possible to "check" single files without the need for a (probably much longer running) check of the integrity of the entire folder structure.

The `info -sf` ("single file") command prints the known history of a single file with details about all generations.


## Commands of `ascmhl`

_Implementation status 2020-09-08:_

* __Implemented__: `create`, `verify` (partially), `diff`, `info` (partially), `xsd-schema-check`
* __Not implemented yet__: some subcommands for `verify`, `info`

_The commands are also marked below with their current implementation status._


### The `create` command

The `create` command hashes all files given with the different options and creates a new generation in the mhl-history with records for all hashed files. The command compares the hashes against the hashes stored in previous generations if available.

#### `create` default behavior (for file hierarchy, with completeness check)

The `create` command traverses through a folder hierarchy (such as a folder with media files, a camera card, or an entire drive). The command hashes all files (except those excluded by the ignore patterns given with the `-i` or `-ii` options) and compares the hashes against the records in the `ascmhl` folder. It records all hashed files in the new generation. Directory hashes are computed and also recorded in the new generation.

If the command finds files that are registered in the `ascmhl` folder but are missing in the file system, it prints an error and exits with a non-0 exit code.

Files that exist in the file system but are not registered in the `ascmhl` folder yet are registered as new entries in the newly created generation(s).

The `create` command takes the root path of the file hierarchy as the parameter:

```
$ ascmhl create [-i ignore pattern|-ii /path/to/ignore-file.txt] /path/to/folder/
```

It works on folders with or without an `ascmhl` folder within the given folder hierarchy, and creates a new `ascmhl` folder at the given folder level if none is present before.

`ascmhl` folders further down the file hierarchy are read, handled, and referenced in top-level `ascmhl` folders. Existing `ascmhl` folders further down the folder structure will also get a new generation added.

Implementation:

```
read (recursive) mhl history (mhllib)
traverse folder
 	hash each file
	if `ascmhl` folder exists, compare hash (mhllib)
	on error (including mismatching hashes):
		print error
	 	continue
 	add files to new generation if not present yet
compare found files in file system with records in ascmhl folder and \
   warn if files are missing that are recorded in the ascmhl folder
create new generation(s) (mhllib)
```
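
The outline above can be sketched in Python roughly as follows. This is a simplified illustration and not the `mhllib` implementation: the record store is assumed to be a plain dict of relative path to hash, MD5 stands in for whichever hash format is configured, nested histories are ignored, and skipping the `ascmhl` folder during traversal is an assumption of this sketch.

```python
import hashlib
import os


def hash_file(path):
    """Hash a file with MD5 (whole-file read for brevity; real media needs chunking)."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def create_generation(root, known_records):
    """Hash all files below root and compare them with known_records.

    known_records is a hypothetical stand-in for the history: a dict mapping
    relative paths to the hashes recorded in earlier generations.
    Returns (new_generation, errors).
    """
    new_generation = {}
    errors = []
    for folder, dirnames, filenames in os.walk(root):
        # Assumption: the history folder itself is not part of the hashed content.
        dirnames[:] = sorted(d for d in dirnames if d != "ascmhl")
        for name in sorted(filenames):
            full_path = os.path.join(folder, name)
            rel_path = os.path.relpath(full_path, root).replace(os.sep, "/")
            digest = hash_file(full_path)
            expected = known_records.get(rel_path)
            if expected is not None and expected != digest:
                # Report the mismatch but keep going, as in the outline.
                errors.append(f"hash mismatch: {rel_path}")
            new_generation[rel_path] = digest
    # Completeness check: recorded files that were not found on disk.
    for rel_path in sorted(set(known_records) - set(new_generation)):
        errors.append(f"missing file: {rel_path}")
    return new_generation, errors
```

Collecting errors instead of raising on the first one mirrors the "print error, continue" step, so a single damaged file does not hide problems elsewhere in the hierarchy.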

#### `create` with `-sf` option(s) (for single file(s), no completeness check)

The `create` command with `-sf` option is run with the root path of the file hierarchy as well as one or multiple paths to the individual files to be recorded as the parameters.

This command can be used for instance when adding single files to an already mhl-managed file hierarchy.

```
$ ascmhl create /path/to/root/folder -sf /path/to/single/file1 [-sf /path/to/single/file2 ..]
```

A new generation is created in all `ascmhl` folders below the given root path (e.g. in a nested mhl-history). If no mhl-history is present yet, an error is thrown.

No other files than the ones specified as `-sf` options are handled by this command.

Implementation:

```
read (recursive) mhl-history (mhllib) starting from root path
for each file from input
	check if file is not recorded in `ascmhl` folder yet
	hash file
	add record for file to new generation (mhllib)
		add a new generation if necessary in appropriate `ascmhl` folder (mhllib)
```

### The `verify` command

#### `verify` default behavior (for file hierarchy, with completeness check)

The `verify` command traverses through the content of a folder, hashes all found files  (filtered by the ignore patterns from the `ascmhl` folder) and compares the hashes against the records in the `ascmhl` folder.

The command detects, prints errors, and exits with a non-0 exit code for

* all files that are existent in the file system but are not registered in the `ascmhl` folder yet, and
* all files that are registered in the `ascmhl` folder but that are missing in the file system. 

It is run with the root path of the file hierarchy as the parameter.

```
$ ascmhl verify /path/to/folder/
```

If no `ascmhl` folder is found on the root level, an error is thrown.

`ascmhl` folders further down the file hierarchy are also read, and their recorded hashes are used for verification.

Implementation:

```
error if no mhl folder found on root level
read (recursive) mhl history (mhllib)
traverse folder
	hash each file (filtered by ignore patterns from mhl folder)
	compare hash (mhllib)
	on error (including mismatching hashes):
		print error
	 	continue
compare found files in file system with records in ascmhl folder and \
  warn if files are missing that are recorded in the ascmhl folder
end with exit !=0 if at least one of the files has failed, a file was \
  missing, or new files have been found
```
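
The final decision step of this outline can be sketched as a pure function. The sketch assumes that the hashes of the files found on disk and the hashes recorded in the history are both available as dicts of relative path to hash. The function names and the concrete exit code `1` are assumptions, since the text only requires a non-0 exit code.

```python
def compare_with_history(current_hashes, recorded_hashes):
    """Classify paths into mismatched, missing, and new (all sorted lists)."""
    common = current_hashes.keys() & recorded_hashes.keys()
    mismatched = sorted(
        path for path in common if current_hashes[path] != recorded_hashes[path]
    )
    missing = sorted(recorded_hashes.keys() - current_hashes.keys())
    new = sorted(current_hashes.keys() - recorded_hashes.keys())
    return mismatched, missing, new


def verify_exit_code(current_hashes, recorded_hashes):
    """Print every problem found and return 0 on success, 1 otherwise."""
    mismatched, missing, new = compare_with_history(current_hashes, recorded_hashes)
    for path in mismatched:
        print(f"ERROR: hash mismatch for {path}")
    for path in missing:
        print(f"ERROR: missing file {path}")
    for path in new:
        print(f"ERROR: new file not in history {path}")
    return 1 if (mismatched or missing or new) else 0
```

A caller could pass the result to `sys.exit()` so that scripts wrapping the check can react to the outcome.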


#### `verify` with `-sf` option (for single files, no completeness check) _[not implemented yet]_

The `verify` command can be used to verify a single file or multiple files. It is run with either

* the path to a single file, or
* a text file with paths to multiple files

as the parameter.

```
$ ascmhl verify -sf /path/to/single/file
$ ascmhl verify -sf -l list/of/files.txt
```

The command looks for an `ascmhl` folder in the folders above the given files. If no mhl-history is present yet, an error is thrown.

If used with the `-l` option, all files in the list must be contained in the same (recursive) mhl-history. 

Implementation:

```
if input is `-l`: create a list of files from input
find mhl-history information in the path above (mhllib)
	error if no `ascmhl` folder is found
read (recursive) mhl-history (mhllib)
for each file from input
	hash each file
	compare hashes (mhllib)
if file is not found in mhl-history, throw error
on error (including mismatching hashes):
	don't break
	print error
	end with exit !=0 if at least one of the files has failed
```
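
The step "find mhl-history information in the path above" can be illustrated with a short upward search. This is a sketch under assumptions rather than the `mhllib` lookup: the function name `find_history_folder` is hypothetical, and the history folder is assumed to be a directory literally named `ascmhl`.

```python
from pathlib import Path


def find_history_folder(file_path, folder_name="ascmhl"):
    """Return the nearest folder_name directory above file_path, or None."""
    start = Path(file_path).resolve().parent
    # Check the file's own folder first, then each parent up to the file system root.
    for folder in (start, *start.parents):
        candidate = folder / folder_name
        if candidate.is_dir():
            return candidate
    return None
```

A caller would treat a `None` result as the "no `ascmhl` folder is found" error case from the outline.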


#### `verify` with `-dh` subcommand option (for directory hash)

The `verify` command with the `-dh` subcommand option creates the directory hash by hashing the contained files of the given directory path (filtered by the ignore patterns from the `ascmhl` folder) and compares it with the to-be-expected directory hash calculated from the file hashes (same calculation as the `info` command with the `-dh` subcommand option).


```
$ ascmhl verify -dh /path/to/folder
```

Implementation:

```
find mhl-history information in the path above (mhllib)
	error if no `ascmhl` folder is found
read (recursive) mhl history (mhllib)
calculate to-be-expected directory hash from file hashes
traverse folder
 	hash each file
calculate actual directory hash
compare to-be-expected directory hash with actual directory hash
on error (including mismatching hash):
	print error
	end with exit !=0
```
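
The comparison of an expected and an actual directory hash can be illustrated with a deliberately simplified scheme. Note that this is not the directory hash algorithm defined by ASC MHL: the sketch simply assumes that a directory hash is derived from the sorted relative paths and file hashes, and it uses SHA1 and the function name `directory_hash` for illustration only.

```python
import hashlib


def directory_hash(file_hashes):
    """Derive one hash from a dict of relative path -> file hash.

    Simplified, hypothetical scheme: entries are processed in sorted path
    order so that the result does not depend on traversal order.
    """
    hasher = hashlib.sha1()
    for rel_path in sorted(file_hashes):
        hasher.update(rel_path.encode("utf-8"))
        hasher.update(b"\0")  # separator so adjacent fields cannot run together
        hasher.update(file_hashes[rel_path].encode("utf-8"))
        hasher.update(b"\0")
    return hasher.hexdigest()


def directory_matches(recorded_hashes, actual_hashes):
    """Compare the to-be-expected directory hash with the actual one."""
    return directory_hash(recorded_hashes) == directory_hash(actual_hashes)
```

With such a scheme, a single changed, added, or removed file changes the directory hash, which is what makes it usable as a quick check of the state of a whole folder.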

### The `diff` command

The `diff` command is very similar to the `verify` command in its default behavior, except that it neither creates nor verifies hashes. It can be used to quickly check if a folder structure has new files that have not been recorded yet, or if files are missing.

The command detects, prints errors, and exits with a non-0 exit code for

* all files that are existent in the file system but are not registered in the `ascmhl` folder yet, and
* all files that are registered in the `ascmhl` folder but that are missing in the file system. 

It is run with the root path of the file hierarchy as the parameter.

```
$ ascmhl diff /path/to/folder/ 
```

If no `ascmhl` folder is found on the root level, an error is thrown.

`ascmhl` folders are read recursively. 

Implementation:

```
error if no mhl folder found on root level
read (recursive) mhl history (mhllib)
traverse folder
	on missing file:
		print error
	 	continue
compare found files in file system with records in ascmhl folder \
  and warn if files are missing that are recorded in the ascmhl folder
end with exit !=0 if at least one of the files has failed, a file was \
  missing, or new files have been found
```
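
Because no hashing is involved, the comparison reduces to two set differences over relative paths. The following sketch illustrates this. It assumes that the recorded paths are available as a plain collection of relative paths and that the `ascmhl` folder itself is excluded from traversal; both are assumptions of this example, as are the function names.

```python
import os


def list_files(root):
    """Return the set of relative file paths below root, without hashing anything."""
    found = set()
    for folder, dirnames, filenames in os.walk(root):
        # Assumption: the history folder is not part of the compared content.
        dirnames[:] = [d for d in dirnames if d != "ascmhl"]
        for name in filenames:
            rel_path = os.path.relpath(os.path.join(folder, name), root)
            found.add(rel_path.replace(os.sep, "/"))
    return found


def diff(root, recorded_paths):
    """Return (new_files, missing_files) as sorted lists of relative paths."""
    found = list_files(root)
    recorded = set(recorded_paths)
    new_files = sorted(found - recorded)      # on disk, not in the history
    missing_files = sorted(recorded - found)  # in the history, not on disk
    return new_files, missing_files
```

Both lists being empty corresponds to the success case. Any entry in either list corresponds to the non-0 exit code described above.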


### The `info` command 

#### `info` default behavior _[not implemented yet]_

The `ascmhl` folder contains human-readable XML files, but the number of recorded files, generations, hash entries, verification info and so forth adds up to an amount of information that cannot be quickly understood. The `info` command helps to get a quick overview of the information stored in an `ascmhl` folder.

The `info` command prints
* a summary (with the `-s` subcommand option) of the information in an ascmhl folder, such as number of recorded files, and a list of the generations with their creator info, and/or
* a list (with the `-l` option) of all file (and folder) records stored in an ascmhl folder, together with relative file paths, file size, and known file hashes.

It is run with the path to a specific `ascmhl` folder.

```
$ ascmhl info [-s|-l] /path/to/ascmhl/ 
```

Implementation:

```
error if no mhl folder found on root level
read (recursive) mhl history (mhllib)
if summary option:
	print summary
if list option:
	for each file record
		print file info, hashes, etc.
```


#### `info` with the `-sf` subcommand option 

The `info` command with the `-sf` subcommand option outputs the full and detailed history of one file.

```
$ ascmhl info -sf /path/to/file [-sf /path/to/other/file] [-rp /root/path]
```

The command outputs each generation in which the file has been handled, including date, hash, and activity (and creator info and absolute path with the `-v` option). The history information is read from the "next" ASC MHL history found in the path, or at the given root path (`-rp` option).

Implementation:

```
find mhl-history information in the path above (mhllib)
	error if no `ascmhl` folder is found
print detailed info for file
```


#### `info` with the `-dh` subcommand option _[not implemented yet]_

The `info` command with the `-dh` subcommand option prints the directory hash of a folder, computed from the file hashes stored in an `ascmhl` folder.

The directory hash can be used to quickly verify if the state of a folder structure is still the same compared to the last generation created with a `create` command (manually compare with the hash in the `<root>` tag in the ASC MHL file).

It is run with the path to a specific `ascmhl` folder and the path to the desired folder for the computed directory hash.

```
$ ascmhl info -dh /path/to/ascmhl/ /path/to/sub/folder 
```

Implementation:

```
error if no mhl folder found on root level
read (recursive) mhl history (mhllib)
calculate directory hash from file hashes
print directory hash
```


### The `xsd-schema-check` command

The `xsd-schema-check` command validates a given ASC MHL file against the XSD. This command can be used to ensure the creation of syntactically valid ASC MHL files, for example during implementation of tools creating ASC MHL files.


```
$ ascmhl xsd-schema-check /path/to/ascmhl/XXXXX.mhl
```




