Metadata-Version: 2.1
Name: headfake
Version: 1.1.1
Summary: Health data faker
Home-page: https://github.com/rcfgroup/headfake
Author: Rob Free
Author-email: rcfree@pm.me
License: MIT
Keywords: fake,data,generation,faker,pydbgen
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: attrs
Requires-Dist: faker
Requires-Dist: pandas
Requires-Dist: pyyaml
Requires-Dist: scipy
Provides-Extra: all
Provides-Extra: dev
Requires-Dist: pytest ; extra == 'dev'
Requires-Dist: pytest-cov ; extra == 'dev'
Requires-Dist: pytest-mock ; extra == 'dev'
Requires-Dist: mkdocs ; extra == 'dev'
Requires-Dist: mkapi ; extra == 'dev'
Requires-Dist: twine ; extra == 'dev'
Requires-Dist: wheel ; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs ; extra == 'docs'
Requires-Dist: mkapi ; extra == 'docs'
Provides-Extra: tests
Requires-Dist: pytest ; extra == 'tests'
Requires-Dist: pytest-cov ; extra == 'tests'
Requires-Dist: pytest-mock ; extra == 'tests'

![](https://github.com/rcfgroup/clinicle/workflows/main/badge.svg) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4890383.svg)](https://doi.org/10.5281/zenodo.4890383)
# Headfake

## What is Headfake?
**Hea**lth **d**ata **fake**r is a Python package that lets you take a declarative approach to creating fake or test data sets, using either Python code or a YAML- or JSON-based template file.

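In a declarative approach, you describe *what* each field should contain, and a generic engine turns that description into rows. The following is a minimal standard-library sketch of the idea. It is not Headfake's API: `generate_rows`, `patient_spec` and the field names are hypothetical.

```python
import random
from typing import Any, Callable, Dict, List

# Hypothetical illustration only (not Headfake's API): each field is described
# by a function that produces one value, and one engine builds every row.
FieldSpec = Dict[str, Callable[[random.Random], Any]]


def generate_rows(spec: FieldSpec, num_rows: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Produce num_rows dictionaries with one value per declared field."""
    rng = random.Random(seed)  # seeded so the same spec gives the same rows
    return [{name: make(rng) for name, make in spec.items()} for _ in range(num_rows)]


patient_spec: FieldSpec = {
    "patient_id": lambda rng: f"P{rng.randint(0, 99999):05d}",
    "gender": lambda rng: rng.choice(["M", "F"]),
    "systolic_bp": lambda rng: round(rng.gauss(120, 15)),
}

rows = generate_rows(patient_spec, num_rows=3)
for row in rows:
    print(row)
```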
The package can be embedded directly into Python scripts, or it can be used through a command-line script.

It takes ideas from other declarative packages (e.g. pydbgen), but also adds support for a number of additional features,
including statistically distributed random values, dependent fields and custom fields, as well as ways to
transform generated fields before or after the generation process.

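Three of the features above (probability-weighted option values, statistically distributed values and dependent fields) can be sketched with the standard library alone. This is an illustration under stated assumptions, not how Headfake implements them: `make_patient`, `FIRST_NAMES`, the weights and the age distribution parameters are all hypothetical, and `random.gauss` stands in for whatever distribution a real template would use.

```python
import random
from datetime import date, timedelta

# Hypothetical sketch (not Headfake's API) of weighted options, a distributed
# value and a dependent field. All names and numbers are illustrative only.
FIRST_NAMES = {"F": ["Amelia", "Olivia", "Isla"], "M": ["Oliver", "George", "Harry"]}


def make_patient(rng: random.Random, today: date) -> dict:
    # Probability-based option value: illustrative 51% "F", 49% "M" weights
    gender = rng.choices(["F", "M"], weights=[0.51, 0.49])[0]
    # Dependent field: the pool of first names is selected by the gender value
    first_name = rng.choice(FIRST_NAMES[gender])
    # Distributed value: age from a normal distribution, clipped to 0..100 years
    age_years = min(max(rng.gauss(45, 18), 0.0), 100.0)
    dob = today - timedelta(days=round(age_years * 365.25))
    return {"gender": gender, "first_name": first_name, "date_of_birth": dob}


rng = random.Random(42)
patients = [make_patient(rng, today=date(2024, 1, 1)) for _ in range(5)]
```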
## How do I install and use Headfake?
To get started quickly, you can use pip to install it:
```bash
pip install headfake
```

Then test it out using one of the example YAML templates:
```bash
headfake examples/patients.yaml --no-rows=100
```

You should get 100 rows of generated data.

For further information on using Headfake, head over to the [Tutorials](https://rcfgroup.github.io/headfake/tutorials/tutorial1) or the [Usage](https://rcfgroup.github.io/headfake/usage) page.

## Why would I use Headfake?
Headfake makes it simple and straightforward to generate fake or test data. It has a number of features which make this easier:

* Support for shareable template-based config or direct Python implementation to set up and perform the data generation.

* Embeddable data generation into projects (either using a YAML or JSON config or using Python data structures/classes).

* Generation of names and contact details through use of the Python package Faker.

* Randomised names can be output based on a gender field.

* Statistical distributions are used to create more realistic dates of birth, and option values can be chosen using probabilities. Other approaches to simulating real data are also being investigated.

* Supported clinical data includes random NHS numbers and deceased flags/dates of death based on age-based odds of death.

* Dependent fields (e.g. one field's values are dependent on the values of one or more other fields)

* Operation fields (e.g. combine generated values using specific operations such as add or subtract)

* Field data can be looked up from another file using a key field, allowing re-use of patient details in a different field set.

* A selection of fields to handle generation of different types of data.

* Ability to create and use custom fields to generate your own data types and values

* Support for transformers which pre- or post-process data before or after it is generated

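Random NHS numbers are most useful as test data when they pass the standard modulus 11 check-digit validation: the first nine digits are weighted 10 down to 2, the weighted sum is taken modulo 11, and the check digit is 11 minus that remainder (11 becomes 0, and 10 means the nine digits cannot be used). The following is a minimal sketch of that rule; it is not Headfake's own implementation, and the function names are hypothetical.

```python
import random
from typing import Optional


def nhs_check_digit(first_nine: str) -> Optional[int]:
    """Return the modulus 11 check digit for nine digits, or None if none is valid."""
    # Weights 10, 9, ..., 2 applied to the first nine digits
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        return 0
    if check == 10:
        return None  # these nine digits can never form a valid number
    return check


def random_nhs_number(rng: random.Random) -> str:
    """Draw random nine-digit prefixes until one has a valid check digit."""
    while True:
        first_nine = "".join(str(rng.randint(0, 9)) for _ in range(9))
        check = nhs_check_digit(first_nine)
        if check is not None:
            return first_nine + str(check)


def is_valid_nhs_number(number: str) -> bool:
    """True if number is ten digits and its last digit matches the check digit."""
    if len(number) != 10 or not number.isdigit():
        return False
    return nhs_check_digit(number[:9]) == int(number[9])
```

For example, `is_valid_nhs_number("9434765919")` returns `True`, while changing the last digit makes it fail.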

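Looking up field data from another file by a key field works like a join: the earlier output is indexed by its key, and a second field set copies values across. The following is a minimal standard-library sketch under the assumption that the earlier output is a CSV file (an in-memory string stands in for it here). It is not Headfake's API; `load_lookup`, `PATIENTS_CSV` and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical sketch (not Headfake's API): re-use previously generated patient
# details in a different field set by looking them up via patient_id.
PATIENTS_CSV = """patient_id,first_name,last_name
P001,Amelia,Smith
P002,Oliver,Jones
"""


def load_lookup(csv_text: str, key: str) -> dict:
    """Index the rows of CSV text by the value in their key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


patients = load_lookup(PATIENTS_CSV, key="patient_id")

# A second field set (e.g. appointments) pulls names across using the shared key
appointments = [
    {"patient_id": "P002", "clinic": "Cardiology"},
    {"patient_id": "P001", "clinic": "Renal"},
]
for appt in appointments:
    appt["last_name"] = patients[appt["patient_id"]]["last_name"]
```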
## Is Headfake being actively maintained?
Yes. We use Headfake in our own projects and, as a result, keep it maintained and add new features when we need them.

## Is Headfake suitable for my project?
The library has been released under an MIT license, so it can be embedded into your own tools with minimal restrictions on use.

## If I use Headfake to generate data in my research project which source should I cite?
We are working on a journal paper; for now, please cite the [Zenodo record](https://doi.org/10.5281/zenodo.4890383).

## Where can I get more information?
The documentation for the package can be found on the [documentation site](https://rcfgroup.github.io/headfake).

