Artifact [5f2735dc00]

Artifact 5f2735dc004f8e10d18f6fc30bb360462cda24fb:

Wiki page [ZIP virtual file system] by chw 2020-06-26 17:20:02.
<h2>ZIPFS</h2>

[AndroWish] comes with a special <a href="http://en.wikipedia.org/wiki/Zip_%28file_format%29">ZIP</a> virtual file system which uses mmap(2) to map a ZIP file read-only (in this case AndroWish's APK, i.e. its own installation package) into the process address space, speeding up startup and subsequent read accesses. Although this file system was designed primarily for AndroWish, it can be used on other platforms, too. For example, [undroidwish] uses it on Windows and Linux to mount an archive of Tcl and native extensions that is appended to the executable portion of its binary. It is implemented in the files <tt><a href="/index.html/artifact/02f3c2065ff5fa3e">zipfs.c</a></tt> and <tt><a href="/index.html/artifact/87abbb588a4070ce">zipfs.h</a></tt> in AndroWish's <tt>.../jni/tcl/generic</tt> folder and is enabled in the Tcl core when the C preprocessor macro <tt>ZIPFS_IN_TCL</tt> is defined.
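The read-only mapping step can be illustrated with plain POSIX calls. The following C sketch is not taken from <tt>zipfs.c</tt>; <tt>map_file_readonly()</tt> and <tt>unmap_file()</tt> are hypothetical helpers that map a whole file with <tt>PROT_READ</tt>, so that later reads are served directly from memory instead of through further <tt>read(2)</tt> calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only into the address space.
 * Returns the start of the mapping, or NULL on failure; *len receives
 * the mapped size. */
static const unsigned char *map_file_readonly(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return (const unsigned char *)p;
}

/* Release a mapping obtained from map_file_readonly(). */
static void unmap_file(const unsigned char *p, size_t len)
{
    munmap((void *)p, len);
}
```

On Linux, passing <tt>/proc/self/exe</tt> maps the running program's own binary, which is the kind of file undroidwish appends its archive to.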

<h3>Low-level C interface</h3>

<tt>Tclzipfs_Init(Tcl_Interp *interp)</tt>

    Performs one-time initialization of the file system and registers it process-wide. Additionally, it provides a package named <i>zipfs</i> and creates supplemental Tcl commands in the given interpreter.

<tt>Tclzipfs_Mount(Tcl_Interp *<i>interp</i>, const char *<i>zipname</i>, const char *<i>mntpt</i>, const char *<i>passwd</i>)</tt>

    Mounts the ZIP archive file <tt><i>zipname</i></tt> on the mount point <tt><i>mntpt</i></tt> using the optional ZIP password <tt><i>passwd</i></tt>. Errors during that process are reported in the interpreter <tt><i>interp</i></tt>. If <tt><i>zipname</i></tt> is a NULL pointer, information on all currently mounted ZIP file systems is written into <tt><i>interp</i></tt>'s result as a sequence of mount points and ZIP file names.

<tt>Tclzipfs_MountBuffer(Tcl_Interp *<i>interp</i>, const char *<i>mntpt</i>, unsigned char *<i>data</i>, int <i>length</i>, int <i>copy</i>)</tt>

    Mounts the ZIP archive contained in the memory buffer described by <tt><i>data</i></tt> and <tt><i>length</i></tt> on the mount point <tt><i>mntpt</i></tt>. Depending on <tt><i>copy</i></tt>, a private copy of this memory buffer is made and used for the mount operation. Errors during that process are reported in the interpreter <tt><i>interp</i></tt>. If the mount operation succeeds, a string of the form "<tt>memory_&lt;size&gt;_&lt;id&gt;</tt>" identifying the archive from the memory buffer is left in <tt><i>interp</i></tt>'s result. This string can later be passed as the <tt><i>zipname</i></tt> parameter of an unmount operation. If <tt><i>mntpt</i></tt> is a NULL pointer, information on all currently mounted ZIP file systems is written into <tt><i>interp</i></tt>'s result as a sequence of mount points and ZIP file names.

<tt>Tclzipfs_Unmount(Tcl_Interp *<i>interp</i>, const char *<i>zipname</i>)</tt>

    Undoes the effect of <tt>Tclzipfs_Mount()</tt>, i.e. unmounts the mounted ZIP archive file <tt><i>zipname</i></tt>. Errors are reported in the interpreter <tt><i>interp</i></tt>.

<h3>Tcl commands</h3>

The <tt>zipfs</tt> package provides Tcl with the ability to mount the contents of a ZIP file as a virtual file system.

<tt>zipfs::exists <i>filename</i></tt>

    Returns 1 if the given <tt><i>filename</i></tt> exists in the mounted zipfs and 0 if it does not.

<tt>zipfs::find <i>dir</i></tt>

    Recursively lists the files in and below the directory <tt><i>dir</i></tt>. The result list consists of relative path names starting from the given directory. This command is also used by the <tt>zipfs::mkzip</tt> and <tt>zipfs::mkimg</tt> commands.

<tt>zipfs::info <i>file</i></tt>

    Returns information about the given <tt><i>file</i></tt> in the mounted zipfs. The information consists of (1) the name of the ZIP archive file that contains the file, (2) the size of the file after decompression, (3) the compressed size of the file, and (4) the offset of the compressed data in the ZIP archive file.

    Note: when the mount point itself is queried, item (4) is the offset at which the ZIP data starts, which can be used to strip the ZIP archive off an executable.

    Note: a mounted ZIP archive file appears as a directory, but it can be opened and read like a regular file if the mount process detected a non-archive area in front of the ZIP archive, e.g. when the ZIP archive was appended to an executable file. In this case that leading area can be read with the Tcl <tt>open</tt> and <tt>read</tt> commands, whereas <tt>file copy</tt> treats the mounted archive as a directory.
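The start offset of the ZIP data, i.e. the size of the non-archive area in front of it, can be derived from the ZIP End of Central Directory (EOCD) record. The C sketch below is a hypothetical illustration, not the code in <tt>zipfs.c</tt>. It assumes the archive was appended unchanged, so that the central directory offset stored in the EOCD record is relative to the start of the ZIP data, and it only accepts an EOCD record whose comment ends exactly at the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EOCD_SIG 0x06054b50u /* "PK\5\6" read as a little-endian word */
#define EOCD_MIN 22u         /* size of the fixed part of the EOCD record */

static uint32_t rd16(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8;
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Scan backwards for an EOCD record whose comment ends at the buffer end.
 * Returns its offset, or -1 if none is found. */
static int64_t find_eocd(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len < EOCD_MIN)
        return -1;
    /* the comment is at most 0xFFFF bytes long */
    size_t lowest = len > EOCD_MIN + 0xFFFFu ? len - EOCD_MIN - 0xFFFFu : 0;
    for (size_t i = len - EOCD_MIN + 1; i-- > lowest;) {
        if (rd32(buf + i) == EOCD_SIG &&
            i + EOCD_MIN + rd16(buf + i + 20) == len)
            return (int64_t)i;
    }
    return -1;
}

/* Number of bytes in front of the ZIP data (e.g. an executable stub),
 * or -1 if no usable EOCD record is found. */
static int64_t zip_prefix_length(const unsigned char *buf, size_t len)
{
    int64_t eocd = find_eocd(buf, len);
    if (eocd < 0)
        return -1;
    uint32_t cd_size = rd32(buf + eocd + 12);   /* size of central directory */
    uint32_t cd_offset = rd32(buf + eocd + 16); /* its archive-relative offset */
    /* The central directory ends where the EOCD record starts. */
    if ((int64_t)cd_size + (int64_t)cd_offset > eocd)
        return -1;
    return eocd - (int64_t)cd_size - (int64_t)cd_offset;
}
```

If the archive's internal offsets were instead adjusted to the whole file (as Info-ZIP's <tt>zip -A</tt> does), this computation yields 0 even though a stub is present.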

<tt>zipfs::list ?<i>-glob</i>|<i>-regexp</i>? ?<i>pattern</i>?</tt>

    Lists the files of the mounted ZIP archives. If <tt><i>pattern</i></tt> is omitted, all files are listed. Otherwise <tt><i>pattern</i></tt> is interpreted as a glob or regexp pattern (selected by the <tt>-glob</tt> or <tt>-regexp</tt> option), and only files matching this pattern are listed.

<tt>zipfs::lmkimg <i>outfile inlist</i> ?<i>password</i>? ?<i>infile</i>?</tt>

    Like <tt>zipfs::mkimg</tt>, but instead of an input directory, <tt><i>inlist</i></tt> must be a list in which the odd elements (first, third, ...) are the names of the input files to be copied into the archive and the even elements (second, fourth, ...) are their respective names within the archive.

<tt>zipfs::lmkzip <i>outfile inlist</i> ?<i>password</i>?</tt>

    Like <tt>zipfs::mkzip</tt>, but instead of an input directory, <tt><i>inlist</i></tt> must be a list in which the odd elements (first, third, ...) are the names of the input files to be copied into the archive and the even elements (second, fourth, ...) are their respective names within the archive.

<tt>zipfs::mkimg <i>outfile indir</i> ?<i>strip</i>? ?<i>password</i>? ?<i>infile</i>?</tt>

    Creates an image (potentially a new executable file) similar to <tt>zipfs::mkzip</tt>. If the <tt><i>infile</i></tt> parameter is specified, this file is placed in front of the ZIP archive; otherwise the file returned by <tt>Tcl_NameOfExecutable(3)</tt> (i.e. the executable file of the running process) is used. If the <tt><i>password</i></tt> parameter is not empty, an obfuscated version of that password is placed between the image and ZIP chunks of the output file, and the contents of the ZIP chunk are protected with that password.

    Caution: highly experimental; not usable on Android and only partially tested on Linux and Windows.

<tt>zipfs::mkkey <i>password</i></tt>

    Returns an obfuscated string version of the clear-text <tt><i>password</i></tt>, in the same format as used by the <tt>zipfs::mkimg</tt> command.

<tt>zipfs::mkzip <i>outfile indir</i> ?<i>strip</i>? ?<i>password</i>?</tt>

    Creates a ZIP archive file named <tt><i>outfile</i></tt> from the contents of the input directory <tt><i>indir</i></tt> (regular files only), with the optional ZIP password <tt><i>password</i></tt>. While the files below <tt><i>indir</i></tt> are processed, the optional prefix given in <tt><i>strip</i></tt> is removed from the beginning of each file name.

    Caution: the <tt><i>indir</i></tt> parameter, minus the optional <tt><i>strip</i></tt> prefix, determines the root name under which the archive's content later appears.
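The interplay of <tt><i>indir</i></tt> and <tt><i>strip</i></tt> can be sketched with a small C helper. <tt>archive_name()</tt> is hypothetical, and dropping any leading slashes left over after stripping is an assumption rather than documented behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the name under which `path` would be stored when
 * the prefix `strip` (may be NULL) is removed from its beginning.
 * Leading slashes remaining after stripping are skipped (assumption). */
static const char *archive_name(const char *path, const char *strip)
{
    assert(path != NULL);
    if (strip != NULL) {
        size_t n = strlen(strip);
        if (strncmp(path, strip, n) == 0) /* only strip a real prefix */
            path += n;
    }
    while (*path == '/')
        path++;
    return path;
}
```

For instance, with <tt><i>indir</i></tt> set to <tt>build/app</tt> and <tt><i>strip</i></tt> set to <tt>build/</tt>, a file <tt>build/app/main.tcl</tt> would be stored as <tt>app/main.tcl</tt>, so the archive's content would be rooted at <tt>app</tt>.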

<tt>zipfs::mount ?<i>zipfile</i> ?<i>mountpoint</i>? ?<i>password</i>??</tt>

<tt>zipfs::mount -file <i>zipfile mountpoint</i> ?<i>password</i>?</tt>

<tt>zipfs::mount -- <i>zipfile mountpoint</i> ?<i>password</i>?</tt>

    This command mounts a ZIP archive file as a VFS. After this command executes, the files contained in <tt><i>zipfile</i></tt> appear to Tcl as regular files at the mount point.

    In the first command form, with no <tt><i>mountpoint</i></tt>, the command returns the mount point for <tt><i>zipfile</i></tt>. With no <tt><i>zipfile</i></tt>, it returns all zipfile/mount point pairs. If <tt><i>mountpoint</i></tt> is specified as an empty string, the mount point is the current directory. If <tt><i>password</i></tt> is specified, files from <tt><i>zipfile</i></tt> are decrypted with this password when read.

<tt>zipfs::mount -data <i>bytearray mountpoint</i></tt>

    The data in <tt><i>bytearray</i></tt> must represent a ZIP archive which gets mounted on <tt><i>mountpoint</i></tt>. If the mount operation succeeds, the result is a string of the form "<tt>memory_&lt;size&gt;_&lt;id&gt;</tt>" which can later be used as <tt><i>zipfile</i></tt> parameter in an unmount operation.

<tt>zipfs::mount -chan <i>channelId mountpoint</i></tt>

    A ZIP archive is read from channel <tt><i>channelId</i></tt> and mounted on <tt><i>mountpoint</i></tt>. If the mount operation succeeds, the result is a string of the form "<tt>memory_&lt;size&gt;_&lt;id&gt;</tt>" which can later be used as <tt><i>zipfile</i></tt> parameter in an unmount operation.

<tt>zipfs::unmount <i>zipfile</i></tt>

    Unmounts the mounted ZIP archive file <tt><i>zipfile</i></tt>.

<tt>zipfs::unwrap ?<i>filename</i>?</tt>

    If <tt><i>filename</i></tt> is the root of a mounted ZIP archive, its content is unpacked to a local directory named <tt>filename.vfs</tt>. This directory must not exist prior to the call. Otherwise, <tt><i>filename</i></tt> is temporarily mounted before the unpack operation takes place and unmounted afterwards. If <tt><i>filename</i></tt> is omitted, the result of <tt>info nameofexecutable</tt> is used instead, i.e. the main ZIP archive of the running process is unpacked.

The commands described above are available as subcommands in the <tt>zipfs</tt>
ensemble, i.e. <tt>zipfs list</tt> is equivalent to <tt>zipfs::list</tt>.

<h3>zipfs as Tcl (and Tk) bootstrap file system</h3>

On the Android platform, zipfs is used to boot Tcl/Tk from the APK by mounting the APK file early on the root of the file system as seen by Tcl. Since nearly all relevant files within the APK are below the <tt>assets</tt> folder, Tcl thus sees the directory <tt>/assets</tt> with its library directories, e.g. the <tt>/assets/tcl8.6</tt> directory with Tcl's library modules, encoding tables, etc. This reference to <tt>/assets/tcl8.6</tt> is hard-coded into the Tcl shared library; based on it, all other packaged library directories can be found during Tcl initialization.

Standalone apps use a similar approach: the file <tt>/assets/app/main.tcl</tt> is hard-coded as the file to be sourced (if present) right after Tcl's initialization. This allows Tcl-based apps to be packaged as an APK; see the description in [AndroWish SDK] for instructions.

On other platforms (currently tested: Linux and Windows), the initial mount of an embedded ZIP file system is done on the executable itself. For example, if <tt>/home/john/awish</tt> is the Tcl/Tk binary with an included ZIP file system, the Tcl library directory of the mounted file system is <tt>/home/john/awish/tcl8.6</tt>. Similarly, built-in application code is started from the file <tt>/home/john/awish/app/main.tcl</tt> if present. Additionally, the contents of the optional file <tt>/home/john/awish/app/cmdline</tt> are appended to the command line before Tk is initialized and control is transferred to the <tt>main.tcl</tt> script. This is useful to set up certain aspects of SDL, e.g. to start in full screen mode with or without a changed display resolution (see the description of SDL startup options in [Beyond AndroWish]). Another hook is <tt>/home/john/awish/app/icon.bmp</tt> which, if present, should be a 24-bit RGB Windows BMP file; it is used as the icon of the SDL root window.

On Windows platforms the drive letter of the base executable is prepended to the respective path names. For the example above this means: <tt>C:\home\john\awish.exe</tt> is the binary, <tt>C:/home/john/awish.exe/tcl8.6</tt> becomes the Tcl library directory, <tt>C:/home/john/awish.exe/app/main.tcl</tt> is the optional application script, and so on.
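The path mapping above can be expressed as a small C sketch. <tt>zipfs_path()</tt> is a hypothetical helper; it assumes the Tcl-visible name is simply the executable's native path with backslashes turned into forward slashes, followed by the path inside the archive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the Tcl-visible path of `rel` inside a ZIP
 * archive mounted on the executable `exe` (given as a native path).
 * Returns 0 on success, -1 if `out` is too small. */
static int zipfs_path(const char *exe, const char *rel, char *out, size_t outlen)
{
    assert(exe != NULL && rel != NULL && out != NULL);
    int n = snprintf(out, outlen, "%s/%s", exe, rel);
    if (n < 0 || (size_t)n >= outlen)
        return -1; /* output would have been truncated */
    /* Tcl uses forward slashes, also on Windows ("C:/..."). */
    for (char *p = out; *p != '\0'; p++)
        if (*p == '\\')
            *p = '/';
    return 0;
}
```

With <tt>C:\home\john\awish.exe</tt> as <tt>exe</tt>, the relative names <tt>tcl8.6</tt> and <tt>app/main.tcl</tt> yield the two Windows paths listed above.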

<h3>Some delicate implementation details</h3>

For loading binary Tcl extensions (shared libraries), special handling is attempted on certain platforms (Linux and FreeBSD):

  *  on Linux, the <b>memfd_create</b> system call is used (if available) to create a memory-backed file holding the payload in the <b>/dev/shm</b> namespace, which is then <b>dlopen</b>'ed to provide the shared library.

  *  on FreeBSD, <b>fdlopen</b> allows a file descriptor to be treated as a shared library. As in the Linux approach, the file descriptor refers to a memory-backed <b>/dev/shm</b> file which is primed with the contents of the shared library from the ZIP archive.
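A minimal Linux sketch of the <b>memfd_create</b> approach follows. The helpers <tt>memfd_from_buffer()</tt> and <tt>dlopen_from_buffer()</tt> are hypothetical, and handing the anonymous file to <b>dlopen</b> through its <tt>/proc/self/fd/N</tt> path is an assumption, not necessarily what <tt>zipfs.c</tt> does. The sketch needs glibc 2.27 or later (and <tt>-ldl</tt> with glibc versions before 2.34).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy buf[0..len) into a new anonymous, memory-backed file.
 * Returns the file descriptor, or -1 on failure. */
static int memfd_from_buffer(const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    int fd = memfd_create("zipfs-payload", MFD_CLOEXEC);
    if (fd < 0)
        return -1;
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return fd;
}

/* Load a shared library whose image is held in memory. The dynamic loader
 * opens the anonymous file through its /proc/self/fd path; on FreeBSD,
 * fdlopen(fd, flags) could be applied to the descriptor instead. */
static void *dlopen_from_buffer(const void *buf, size_t len, int flags)
{
    int fd = memfd_from_buffer(buf, len);
    if (fd < 0)
        return NULL;
    char path[64];
    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    void *handle = dlopen(path, flags);
    close(fd); /* a loaded library keeps its own mappings of the file */
    return handle; /* NULL on failure; dlerror() describes why */
}
```

Because the file only exists in memory, nothing has to be extracted to disk or cleaned up afterwards.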

To speed up <b>glob</b> operations, the ZIP virtual file system uses two hash-based data structures: <b>ZipEntry</b> for regular files and <b>ZipDirEntry</b> for directories, the latter additionally containing a hash table to accelerate lookups within that directory. For typical searches, this usually outperforms the native OS functions.
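The idea of a per-directory hash table can be sketched in C. The <tt>Node</tt> type and its functions below are hypothetical simplifications, not the actual <b>ZipEntry</b> and <b>ZipDirEntry</b> structures: each directory node owns a small chained hash table of its immediate children, so resolving a path costs one hash probe per path component, and a glob within one directory only has to visit that directory's children.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64

/* Hypothetical, simplified directory tree: every directory node owns a
 * chained hash table of its immediate children (files and directories). */
typedef struct Node {
    char *name;             /* last path component */
    int is_dir;
    struct Node *next;      /* next node in the same hash bucket */
    struct Node **children; /* NBUCKETS buckets, only for directories */
} Node;

static uint32_t bucket_of(const char *s)
{
    uint32_t h = 2166136261u; /* 32-bit FNV-1a */
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h % NBUCKETS;
}

static Node *node_new(const char *name, int is_dir)
{
    Node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return NULL;
    size_t len = strlen(name) + 1;
    n->name = malloc(len);
    n->children = is_dir ? calloc(NBUCKETS, sizeof *n->children) : NULL;
    if (n->name == NULL || (is_dir && n->children == NULL)) {
        free(n->name);
        free(n->children);
        free(n);
        return NULL;
    }
    memcpy(n->name, name, len);
    n->is_dir = is_dir;
    return n;
}

/* Find a direct child of `dir` by name: one bucket scan, no directory walk. */
static Node *dir_lookup(const Node *dir, const char *name)
{
    if (!dir->is_dir)
        return NULL;
    for (Node *e = dir->children[bucket_of(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Add (or return the existing) child `name` of directory `dir`. */
static Node *dir_add(Node *dir, const char *name, int is_dir)
{
    assert(dir->is_dir);
    Node *e = dir_lookup(dir, name);
    if (e != NULL)
        return e;
    if ((e = node_new(name, is_dir)) == NULL)
        return NULL;
    uint32_t b = bucket_of(name);
    e->next = dir->children[b];
    dir->children[b] = e;
    return e;
}

/* Resolve a '/'-separated path below `root`, one hash probe per component. */
static Node *path_lookup(Node *root, const char *path)
{
    char comp[256];
    Node *cur = root;
    while (cur != NULL && *path != '\0') {
        size_t n = strcspn(path, "/");
        if (n == 0) { /* skip a separator */
            path++;
            continue;
        }
        if (n >= sizeof comp)
            return NULL;
        memcpy(comp, path, n);
        comp[n] = '\0';
        cur = dir_lookup(cur, comp);
        path += n;
    }
    return cur;
}

static void node_free(Node *n)
{
    if (n->is_dir)
        for (int i = 0; i < NBUCKETS; i++)
            for (Node *e = n->children[i], *next; e != NULL; e = next) {
                next = e->next;
                node_free(e);
            }
    free(n->children);
    free(n->name);
    free(n);
}
```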


