New Project: cl-tar
Written on 2021-09-23 15:10:00 UTC
I have just published the first release of a new project:
cl-tar. This was supposed
to be my summer side-project, but it ran long, as they often do :).
The goal of this project is to provide a Common Lisp interface to tar archives.
It has its foundations
in Nathan Froyd's
archive library, but
has been significantly extended and improved.
There are actually two subprojects under the
cl-tar umbrella. The first
provides the ASDF system and package
tar-file. This project provides
low-level access to physical entries in tar files. As a consequence, two tar
files that extract to the same set of files on your filesystem may have two
very different sets of entries from
tar-file's point of view, depending on the
tar format used (PAX vs ustar vs GNU vs v7).
The cl-tar-file project is technically a fork of
archive. However, all non-portable
bits have been removed (such as code to create symlinks), better support for
the various archive variants has been added, better blocking support added (tar
readers/writers are supposed to read/write in some multiple of 512 bytes), cpio
support removed, and a test suite added, along with other miscellaneous fixes.
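To make the blocking requirement concrete, here is a small language-neutral sketch in Python (not cl-tar-file code, and the function names are hypothetical). It rounds a byte count up to a whole number of records, where a record is `blocking_factor` blocks of 512 bytes. The default blocking factor of 1 is an assumption chosen for the example; GNU tar, for instance, defaults to 20.

```python
BLOCK_SIZE = 512  # tar's fixed block size in bytes


def padded_size(n_bytes: int, blocking_factor: int = 1) -> int:
    """Round n_bytes up to a whole number of records.

    A record is blocking_factor * 512 bytes (hypothetical helper).
    """
    record = BLOCK_SIZE * blocking_factor
    # Ceiling division without floating point, then scale back to bytes.
    return -(-n_bytes // record) * record


def pad(data: bytes, blocking_factor: int = 1) -> bytes:
    """Append NUL bytes so the result fills a whole number of records."""
    return data + b"\0" * (padded_size(len(data), blocking_factor) - len(data))
```

A writer that respects blocking would emit `pad(payload)` rather than `payload`, and a reader would consume input in the same fixed-size units.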
The second subproject is
cl-tar itself, which
provides three ASDF systems and packages:
The tar system provides a thin wrapper over the
tar-file system that
operates on logical entries in tar files. That is, a regular file is
represented as a single entry, no matter how many entries it is composed of in
the actual bits that get written to the tar file. This system is useful for
analyzing a tar file or creating one using data that does not come directly
from the file system.
The tar-simple-extract system provides a completely portable interface to
extract a tar archive to your file system. The downside of portability is that
there is information loss. For example, file owners, permissions, and
modification times cannot be set. Additionally, symbolic links cannot be
extracted as symbolic links (but they can be dereferenced).
The tar-extract system provides a more lossless extraction capability. The
downside of being lossless is that it is more demanding
(osicat must support your implementation
and OS) and it raises security concerns.
A common security concern is that a malicious tar file can extract a symlink that points to an arbitrary location in your filesystem and then trick you into overwriting files at the location by extracting later files through that symlink. This system tries its best to mitigate that (but makes no guarantees), so long as you use its default settings. If you find a bug that allows an archive to extract to an arbitrary location in your filesystem, I'd appreciate it if you report it!
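As an illustration of the kind of check involved, here is a minimal sketch in Python. It is not how tar-extract implements its mitigation, and `stays_inside` is a hypothetical name. It resolves symlinks that already exist on disk (for example, one created by an earlier entry of the same archive) and refuses any path that ends up outside the extraction root.

```python
import os


def stays_inside(root: str, member_path: str) -> bool:
    """Return True if member_path, extracted under root, stays within root.

    realpath resolves ".." components and any symlinks already on disk,
    so a path that passes through a hostile symlink is detected.
    """
    root_real = os.path.realpath(root)
    # If member_path is absolute, os.path.join discards root_real entirely,
    # so the comparison below rejects it.
    target_real = os.path.realpath(os.path.join(root_real, member_path))
    return os.path.commonpath([root_real, target_real]) == root_real
```

A check like this is only a starting point: the filesystem can change between the check and the actual write, which is one reason a real extractor can only try its best.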
Also note that
tar-extract currently requires a copy of
osicat that has the
commits associated with this PR.
Next Steps
First, close the loop on the osicat PR. It started off as a straightforward PR that just added new functions. However, when I tested on Windows, I realized I couldn't load osicat. So I added a commit that fixed that. There may be some feedback and changes requested on how I actually accomplished that.
Second, integrate tar-extract into CLPM. CLPM currently shells out to a
tar executable to extract archives. I'd like to use this pure CL solution
instead. Plus, using it with CLPM will act as a stress test by exposing it to
many tar files.
Third, add it to Quicklisp.
tar-extract won't compile without the osicat
changes, so those definitely need to be merged first. Additionally, I want to
have at least some experience with real world tar files before making this
project widely available.
Fourth, add support for creating archives from the filesystem.
Fifth, add the ability to compile to an executable so you could use this in place of GNU or BSD tar :).
If the fourth and fifth steps excite you, I'd love to have your help making them a reality! They're not on my critical path for anything at the moment, so it'll likely be a while before I can get to them.