Archives from memory- libarchive

This blog post is about python wrapper around libarchive and how to use it to generate archive from memory.

Libarchive & python-libarchive-c

If you happen to learn more about how to create archives in various formats like tar, iso or zip I bet you heard about libarchive. It is widely used archive library written in C.

To use it within python you can choose from a few libraries but one that is currently maintained is called python-libarchive-c. When in my work I was to implement the feature of adding entries to archive from memory I decided to use existing module and give something back to a community in form of an open source contribution.

Add entry from memory

To make such a feature I have to reread carefully code examples in libarchive c itself. I also get familiar with few archive formats and their limitations. But enough talking lets jump to the code:

import requests
import libarchive

def create_archive_from_memory_file():
    response = requests.get('link', stream=True)

    with libarchive.file_writer('archive.zip', 'zip') as archive:
        archive.add_file_from_memory(
            entry_path='filename',
            entry_size=int(response.headers['Content-Length']),
            entry_data=response.iter_content(chunk_size=1024)
        )

if __name__ == '__main__':
    create_archive_from_memory_file()

My changes in code have not been released so make sure that you install python-libarchive-c from github like this (to run this script you also need requests library):

$ pip install git+https://github.com/Changaco/python-libarchive-c

In this snippet, I use request feature that doesn’t require loading the whole content of the response to memory but instead I add the argument: stream=True and then I use response.iter_content(chunk_size=1024). Rest of the code is calling add_file_from_memory with a path (entry_path) and size of the entry in an archive (entry_size).

Under the hood, python-libarchive-c is using c_types with ffi to call libarchive functions. At first, it setup path to entry then sets its size, filetype and permission which file will be saved in the archive. Then write the header and start iterating through the entry_data by chunks and write them. At the end, header is set and archive is ready for user.

To see it in action have snippet above as example.py and run this script:

$ python example.py
$ ls -la
-rw-r--r--. 1 kzuraw kzuraw 11M 09-24 13:04 archive.zip
-rw-rw-r--. 1 kzuraw kzuraw 511 09-24 12:59 example.py

That’s all for this week.

Special thanks to Kasia for being editor for this post. Thank you.