Metadata-Version: 2.4
Name: python-idzip
Version: 0.3.10
Summary: DictZip - Random Access gzip files
Author-email: Rik Faith <john.doe@example.com>
Maintainer-email: Dan Bauman <dan@bauman.space>
License: MIT
Project-URL: Homepage, https://github.com/bauman/python-idzip
Project-URL: Repository, https://github.com/bauman/python-idzip
Project-URL: Issues, https://github.com/bauman/python-idzip/issues
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

python-idzip
============

A seekable, gzip-compatible compression format.

Gzip allows extra fields to be stored in the gzip header. Idzip stores the offsets needed for efficient seeking there.
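
To illustrate where such data can live, here is a minimal sketch that builds a gzip member with an extra field and reads the subfields back, following the header layout in RFC 1952. The helper names `build_member` and `parse_extra_subfields` and the subfield ID `Zz` are made up for this example. The sketch does not reproduce the exact subfield layout that idzip writes.

```python
import gzip
import struct
import zlib

FEXTRA = 0x04  # FLG bit that signals an extra field (RFC 1952)


def build_member(payload, subfields):
    """Build one gzip member whose header carries the given extra subfields."""
    extra = b"".join(
        sid + struct.pack("<H", len(body)) + body
        for sid, body in subfields.items()
    )
    # Fixed 10-byte header: ID1 ID2 CM FLG MTIME XFL OS
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FEXTRA, 0, 0, 255)
    header += struct.pack("<H", len(extra)) + extra
    comp = zlib.compressobj(wbits=-zlib.MAX_WBITS)  # raw deflate stream
    body = comp.compress(payload) + comp.flush()
    # Trailer: CRC32 and size (mod 2**32) of the uncompressed data
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + body + trailer


def parse_extra_subfields(member):
    """Return {subfield_id: data} from the header of a gzip member."""
    id1, id2, method, flags = struct.unpack_from("<BBBB", member, 0)
    if (id1, id2, method) != (0x1F, 0x8B, 8):
        raise ValueError("not a gzip/deflate member")
    if not flags & FEXTRA:
        return {}
    (xlen,) = struct.unpack_from("<H", member, 10)
    extra = member[12:12 + xlen]
    fields, pos = {}, 0
    while pos + 4 <= len(extra):
        sid = extra[pos:pos + 2]
        (size,) = struct.unpack_from("<H", extra, pos + 2)
        fields[sid] = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
    return fields


# "Zz" is a hypothetical subfield ID used only for this demonstration.
member = build_member(b"hello world", {b"Zz": b"\x01\x00"})
# Ordinary gzip readers skip the extra field and still decompress the data.
assert gzip.decompress(member) == b"hello world"
print(parse_extra_subfields(member))
```

Because readers that do not know a subfield simply skip it, a file carrying seek information in the header stays readable by standard gzip tools.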


Install
===============
python setup.py install

or

[python-idzip RHEL6 signed RPM](http://pkgs.bauman.in/repoview/python-idzip.html)

Acknowledgement
===============

Based on https://code.google.com/p/idzip/

The file format was designed by Rik Faith for dictzip. Idzip just uses multiple gzip members to have no file size limit.
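
The multi-member idea can be seen with the standard library alone. In this small sketch, two independently compressed gzip members are concatenated and a gzip reader treats the result as a single stream. The byte strings are placeholders, and the sketch shows only the general gzip property, not idzip's own writer.

```python
import gzip

# Two independent gzip members, each with its own header and trailer
first = gzip.compress(b"first member, ")
second = gzip.compress(b"second member")

# Concatenated members decompress as one continuous stream
combined = first + second
restored = gzip.decompress(combined)
print(restored)
```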

Idzip means Improved Dictzip.



This version also adds a `Writer` class for streaming output (see Stream Writer below).

Sizing
==========
The test input was built by downloading

http://textfiles.com/stories/bureau.txt

and concatenating copies of it into `input.txt` until the file reached about 20GB.

`gzfile.txt.gz` was generated using standard gzip.

`dzfile.txt.dz` was generated using this library.

```
    total 50172612
    -rw-rw-r--. 1 dan dan 21313751280 May 10 15:58 input.txt
    -rw-rw-r--. 1 dan dan  8576570661 May 10 17:21 dzfile.txt.dz
    -rw-rw-r--. 1 dan dan  8076548622 May 10 16:28 gzfile.txt.gz
```


The idzip output is almost the same size as the standard gzip output (about 6% larger in this test).
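
As a quick check of the numbers in the listing, the following lines compute the compression ratios and the relative size difference. The variable names are chosen for this example only.

```python
# File sizes in bytes, copied from the directory listing
input_size = 21313751280
dz_size = 8576570661
gz_size = 8076548622

dz_ratio = dz_size / input_size   # idzip output relative to the input
gz_ratio = gz_size / input_size   # gzip output relative to the input
overhead = dz_size / gz_size - 1  # how much larger idzip is than gzip

print(f"dz: {dz_ratio:.1%}, gz: {gz_ratio:.1%}, overhead: {overhead:.1%}")
# prints "dz: 40.2%, gz: 37.9%, overhead: 6.2%"
```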


Seek Timing
==========
``` python
from time import time
import gzip
import idzip

# Target offset: 15 bytes before the end of the uncompressed input
seekpos = 21313751280 - 15

# Baseline: seek in the uncompressed file
start = time()
original = open("/home/dan/ziptest/input.txt", "rb")
original.seek(seekpos)
original.close()
print("Raw Seek to end", time() - start, "seconds")

# Standard gzip has to decompress everything before the target offset
start = time()
verify = gzip.open("/home/dan/ziptest/gzfile.txt.gz", "rb")
verify.seek(seekpos)
verify.close()
print("Standard GZIP Seek to end", time() - start, "seconds")

# idzip seeks using the offsets stored in the gzip header
start = time()
verify = idzip.open("/home/dan/ziptest/input.txt.dz")
verify.seek(seekpos)
verify.close()
print("idzip Seek to end", time() - start, "seconds")
```

```
    Raw Seek to end 0.000866889953613 seconds
    Standard GZIP Seek to end 255.133864164 seconds
    idzip Seek to end 0.0381989479065 seconds
```
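
The gap between the gzip and idzip timings comes from being able to jump straight to the right part of the compressed data. The sketch below illustrates that general idea with the standard `zlib` module: data is compressed in fixed-size chunks, the compressed size of each chunk is recorded, and a read decompresses only the chunks it needs. It is a simplification, not idzip's implementation. `CHUNK_LEN`, `compress_chunks`, and `read_at` are names invented for this example, and each chunk is compressed as a fully independent deflate stream.

```python
import zlib

CHUNK_LEN = 1024  # uncompressed bytes per chunk (illustrative value)


def compress_chunks(data, chunk_len=CHUNK_LEN):
    """Compress each chunk independently; return (blob, compressed_sizes)."""
    parts = []
    for start in range(0, len(data), chunk_len):
        comp = zlib.compressobj(wbits=-zlib.MAX_WBITS)  # raw deflate
        parts.append(comp.compress(data[start:start + chunk_len]) + comp.flush())
    return b"".join(parts), [len(p) for p in parts]


def read_at(blob, sizes, pos, length, chunk_len=CHUNK_LEN):
    """Return `length` bytes starting at uncompressed offset `pos`."""
    out = bytearray()
    index = pos // chunk_len     # chunk that contains `pos`
    skip = pos % chunk_len       # bytes to skip inside that chunk
    offset = sum(sizes[:index])  # where that chunk starts in the blob
    while len(out) < length and index < len(sizes):
        raw = blob[offset:offset + sizes[index]]
        chunk = zlib.decompress(raw, wbits=-zlib.MAX_WBITS)
        remaining = length - len(out)
        out += chunk[skip:skip + remaining]
        offset += sizes[index]
        index += 1
        skip = 0                 # later chunks are read from their start
    return bytes(out)


data = bytes(range(256)) * 40  # 10240 bytes of sample data
blob, sizes = compress_chunks(data)
# Only the chunks covering the requested range are decompressed.
assert read_at(blob, sizes, 5000, 100) == data[5000:5100]
```

The cost of a read depends on the chunk size rather than on the position in the file, which is why a seek near the end of a large file can stay fast.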




Stream Writer
===========

The `Writer` class allows data to be streamed into an idzip file as it is read.

``` python
from idzip import Writer

outfile = "/home/dan/ziptest/input1.txt.dz"
writer = Writer(outfile, sync_size=1048576*100)
infile = open("/home/dan/ziptest/input.txt", "rb")

# Copy the input in blocks until it is exhausted
while True:
    data = infile.read(1048576+1)
    if not data:
        break
    writer.write(data)

writer.close()
infile.close()
```

Alternatively, you can open an `IdzipFile` in write mode and accomplish the
same task:

```python
import idzip

infile = open("/home/dan/ziptest/input.txt", "rb")
writer = idzip.IdzipFile("/home/dan/ziptest/input1.txt.dz", "wb", sync_size=1048576*100)

# Both files are closed automatically when the block exits
with infile, writer:
    while True:
        data = infile.read(1048576 + 1)
        if not data:
            break
        writer.write(data)
```
