Metadata-Version: 2.1
Name: imagenet-voice
Version: 0.30.0
Summary: A collection of synthetic audio files containing spoken labels from ImageNet
Home-page: https://github.com/CharlesAverill/ImageNet-Voice/
Author: Charles Averill
Author-email: charlesaverill20@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Requires-Dist: progressbar2
Requires-Dist: wavio
Requires-Dist: numpy

# ImageNet Voice
This dataset is derived from ImageNet, and contains synthetic recordings of TTS voices saying each ImageNet label 200 times. This is more or less a toy dataset, but can be very useful for projects with little-to-no data.


### Content

This dataset is currently at 28% capacity, and more will be uploaded soon. Each folder, representing a category in ImageNet, contains 200 unique TTS files generated using [ttsddg](https://pypi.org/project/ttsdg/) using the 7 pre-installed voices in OSX. 


### Acknowledgements

[pyttsx3](https://pypi.org/project/pyttsx3/) was integral to creating ttsdg. As much trouble as I had with ffmpeg and other audio libraries, pyttsx3 made it much easier than any other option I could've taken to generate this dataset.

I appreciate any help that anyone would be willing to offer to get this dataset at 100% capacity any sooner. Please contact me!

### Inspiration

I created this dataset to provide training data for a GAN that would take in an audio file of a spoken word and would generate an image of whatever was spoken. I still plan to work on that project, but I'd like to see other people perform similar tasks.


