New Project: cl-tar
Tagged as blog, common-lisp
Written on 2021-09-23 15:10:00 UTC
I have just published the first release of a new project: cl-tar. This was supposed to be my summer side project, but it ran long, as they often do :).
The goal of this project is to provide a Common Lisp interface to tar archives.
It has its foundations in Nathan Froyd's archive library, but has been significantly extended and improved.
cl-tar-file
There are actually two subprojects under the cl-tar umbrella. The first is cl-tar-file, which provides the ASDF system and package tar-file. This project provides low-level access to physical entries in tar files. As a consequence, two tar files that extract to the same set of files on your filesystem may have very different sets of entries from tar-file's point of view, depending on the tar format used (PAX vs. ustar vs. GNU vs. v7).
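The sketch below shows how the same regular file can need a different number of physical entries depending on the format. It is a simplified, hypothetical illustration and not part of tar-file's API. It assumes ASCII names (one character per byte) and encodes a few well-known rules:

- v7 has only a 100-byte name field.
- ustar can split a long path at a slash into a 155-byte prefix and a 100-byte name.
- GNU tar writes an extra long-name entry when the name does not fit.
- A PAX writer can fall back to an extended header entry.

Real writers differ in details such as exact length thresholds.

```lisp
;;; Hypothetical sketch: which physical entries might represent ONE regular file?
;;; Names are assumed to be ASCII, so one character = one byte.

(defun split-ustar-name (path)
  "Return (VALUES PREFIX NAME) if PATH fits the ustar 155-byte prefix and
100-byte name fields, or NIL if it does not."
  (if (<= (length path) 100)
      (values "" path)
      ;; A long path must be split at a slash; the slash itself is not stored.
      (loop for i = (position #\/ path :from-end t)
              then (position #\/ path :end i :from-end t)
            while i
            when (and (<= i 155)
                      (< 0 (- (length path) i 1) 101))
              return (values (subseq path 0 i) (subseq path (1+ i))))))

(defun physical-entry-types (path format)
  "Simplified: entry types a FORMAT writer might emit for a regular file at PATH."
  (let ((fits-name-field (<= (length path) 100)))
    (ecase format
      (:v7    (if fits-name-field
                  '(:file)
                  (error "v7 cannot store ~S" path)))
      (:ustar (if (split-ustar-name path)
                  '(:file)
                  (error "ustar cannot store ~S" path)))
      ;; GNU tar stores an overlong name in a separate long-name entry.
      (:gnu   (if fits-name-field
                  '(:file)
                  '(:gnu-long-name :file)))
      ;; A PAX writer can use the ustar fields when possible, else an extended header.
      (:pax   (if (split-ustar-name path)
                  '(:file)
                  '(:pax-extended-header :file))))))
```

Take a 129-character path made of a 120-character directory name plus `/file.txt`. It fits in a single ustar header thanks to the prefix field. A GNU-format writer would emit two entries for the same file.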
The cl-tar-file project is technically a fork of archive. However, all non-portable bits have been removed (such as code to create symlinks), better support for the various archive variants has been added, blocking support has been improved (tar readers/writers are supposed to read/write in some multiple of 512 bytes), cpio support has been removed, and a test suite has been added, along with other miscellaneous fixes and improvements.
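The blocking rules come down to a little arithmetic:

- Each member is a 512-byte header followed by its data, padded up to a multiple of 512 bytes.
- The archive ends with two zero-filled 512-byte blocks.
- Writers commonly pad the whole archive to a record of blocking-factor × 512 bytes. GNU tar's default blocking factor is 20, i.e. 10240-byte records.

The sketch below uses hypothetical function names and assumes one header per file (no long-name or PAX extended-header entries).

```lisp
;;; Hypothetical helpers illustrating tar's 512-byte blocking arithmetic.

(defconstant +block-size+ 512
  "Tar headers and data are laid out in 512-byte blocks.")

(defun round-up (n unit)
  "Smallest multiple of UNIT that is >= N."
  (* unit (ceiling n unit)))

(defun archive-size (file-sizes &key (blocking-factor 20))
  "Total bytes of an archive holding regular files of FILE-SIZES, assuming
exactly one header block per file and padding to full records."
  (let* ((members (loop for size in file-sizes
                        ;; header block + data rounded up to whole blocks
                        sum (+ +block-size+ (round-up size +block-size+))))
         ;; two zero-filled blocks mark the end of the archive
         (with-trailer (+ members (* 2 +block-size+))))
    ;; the archive is then padded out to a whole number of records
    (round-up with-trailer (* blocking-factor +block-size+))))
```

Take a single 1000-byte file:

- Its member needs 512 + 1024 bytes.
- The trailer adds 1024 bytes, giving 2560 bytes in all.
- With the default blocking factor of 20, that is padded out to one 10240-byte record.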
cl-tar
The second subproject is cl-tar itself, which provides three ASDF systems and packages: tar, tar-simple-extract, and tar-extract.
The tar system provides a thin wrapper over the tar-file system that operates on logical entries in tar files. That is, a regular file is represented as a single entry, no matter how many physical entries make it up in the bytes actually written to the tar file. This system is useful for analyzing a tar file or for creating one from data that does not come directly from the filesystem.
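Conceptually, building logical entries means folding metadata-only physical entries, such as a GNU long-name entry or a PAX extended header, into the entry that follows them. The sketch below shows that idea with a hypothetical `physical-entry` structure. It is not the tar system's actual API, and it handles only the `path` attribute.

```lisp
;;; Hypothetical sketch: fold metadata-only physical entries into logical ones.

(defstruct (physical-entry (:conc-name pe-))
  type     ; :file, :directory, :gnu-long-name, or :pax-extended-header
  name     ; the (possibly truncated) name stored in this header
  payload) ; long-name string, or alist of PAX records such as ("path" . "...")

(defun logical-entries (physical-entries)
  "Return a list of plists, one per logical entry, in archive order."
  (let ((pending-name nil)
        (result '()))
    (dolist (entry physical-entries (nreverse result))
      (case (pe-type entry)
        ;; Metadata entries describe the *next* entry; remember the name.
        (:gnu-long-name
         (setf pending-name (pe-payload entry)))
        (:pax-extended-header
         (let ((path (cdr (assoc "path" (pe-payload entry) :test #'string=))))
           (when path
             (setf pending-name path))))
        ;; Any other entry is a real member; apply the pending name, if any.
        (t
         (push (list :type (pe-type entry)
                     :name (or pending-name (pe-name entry)))
               result)
         (setf pending-name nil))))))
```

Suppose the input is five physical entries:

- a PAX header, then its file;
- a GNU `././@LongLink` entry, then its file;
- a directory.

These fold into three logical entries, each carrying its full name.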
The tar-simple-extract system provides a completely portable interface for extracting a tar archive to your filesystem. The downside of portability is information loss. For example, file owners, permissions, and modification times cannot be set. Additionally, symbolic links cannot be extracted as symbolic links (though they can be dereferenced).
The tar-extract system provides extraction that preserves much more of the archive's information. The downside is that it is more demanding (osicat must support your implementation and OS) and that it raises security concerns.
A common security concern is that a malicious tar file can extract a symlink that points to an arbitrary location in your filesystem and then trick you into overwriting files at that location by extracting later files through that symlink. This system tries its best to mitigate that (but makes no guarantees), so long as you use its default settings. If you find a bug that allows an archive to extract to an arbitrary location in your filesystem, I'd appreciate it if you reported it!
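The symlink attack above can be illustrated with a purely lexical check that an extractor could run on each entry name before writing it. This is a hypothetical sketch, not tar-extract's implementation. It assumes two things:

- entry names are `/`-separated;
- the extractor records the path components of every symlink it has created.

A lexical check alone is not enough in practice, because the destination directory may already contain symlinks. A real extractor also has to consult the filesystem.

```lisp
;;; Hypothetical sketch: reject entry names that could write outside the
;;; destination directory. Purely lexical; it does not inspect the disk.

(defun split-path (name)
  "Split NAME at each slash, dropping empty and \".\" components."
  (loop with start = 0
        for pos = (position #\/ name :start start)
        for part = (subseq name start pos)
        unless (member part '("" ".") :test #'string=)
          collect part
        while pos
        do (setf start (1+ pos))))

(defun unsafe-entry-name-p (name extracted-symlinks)
  "True if writing NAME might escape the destination directory.
EXTRACTED-SYMLINKS is a list of component lists, one per symlink created so far."
  (let ((parts (split-path name)))
    (or
     ;; Absolute names would ignore the destination directory entirely.
     (and (plusp (length name)) (char= (char name 0) #\/))
     ;; ".." components can climb out of the destination directory.
     (member ".." parts :test #'string=)
     ;; Writing *through* an earlier symlink (e.g. link/passwd when link -> /etc).
     (loop for i from 1 below (length parts)
           thereis (member (subseq parts 0 i) extracted-symlinks :test #'equal)))))
```

Consider an archive containing a symlink `link` that points to `/etc`, followed by a file named `link/passwd`. The second entry would be rejected, provided the extractor recorded `("link")` after creating the symlink.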
Also note that tar-extract currently requires a copy of osicat that has the commits associated with this PR applied.
next steps
First, close the loop on the osicat PR. It started off as a straightforward PR that just added new functions. However, when I tested on Windows, I realized I couldn't load osicat, so I added a commit that fixed that. There may be some feedback and requested changes on how I actually accomplished that.
Second, integrate tar-extract into CLPM. CLPM currently shells out to a tar executable to extract archives. I'd like to use this pure CL solution instead. Plus, using it with CLPM will act as a stress test by exposing it to many tar files.
Third, add it to Quicklisp. The tar-extract system won't compile without the osicat changes, so those definitely need to be merged first. Additionally, I want to have at least some experience with real-world tar files before making this project widely available.
Fourth, add support for creating archives from the filesystem.
Fifth, add the ability to compile to an executable so you could use this in place of GNU or BSD tar :).
If the fourth and fifth steps excite you, I'd love to have your help making them a reality! They're not on my critical path for anything at the moment, so it'll likely be a while before I can get to them.