#4552 closed defect (fixed)

untar: problems with existing directories

Reported by: Christian Mauderer
Owned by: Christian Mauderer
Priority: normal
Milestone: Indefinite
Component: lib
Version: 5
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:

Description

Our current implementation of untar in cpukit/libmisc/untar/untar.c has problems if a directory in the archive already exists. Note that this is not a problem if the archive contains only a file.

The problem exists on 5 and master.

Example: If I have a tar.gz file that contains the directories and the file of l1/l2/x.txt and call Untar_FromGzChunk_Print twice, the first attempt prints:

untar: dir: l1
untar: dir: l1/l2
untar: file: l1/l2/x.txt (s:12,m:0644)

After that, the directory l1 already exists. So if I try to extract the archive again, I get the following:

untar: dir: l1
untar: mkdir: l1: (17) File exists
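
The "File exists" message is what mkdir() reports through errno as EEXIST when the path is already present. The following minimal sketch shows that failure mode with plain POSIX calls. It assumes that the extractor creates directories with a bare mkdir() and treats every failure as fatal; that is an assumption for illustration, and second_mkdir_error is a hypothetical helper, not a function from untar.c.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not RTEMS code): create `path` twice, the way two
 * consecutive extractions of the same archive would.
 *
 * Returns -1 if the first mkdir() fails, 0 if the second mkdir() also
 * succeeds, or the errno value of the failing second mkdir().
 */
int second_mkdir_error(const char *path)
{
  int err = 0;

  assert(path != NULL);

  if (mkdir(path, 0755) != 0) {
    return -1; /* the first attempt is expected to succeed */
  }

  if (mkdir(path, 0755) != 0) {
    err = errno; /* EEXIST, because the directory now exists */
  }

  (void) rmdir(path); /* remove the directory created above */

  return err;
}
```

An extractor that stops on this error cannot unpack the same archive twice, which matches the second log above.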

My expectation would have been that the files are just integrated into an existing directory structure. If a file exists, it should be overwritten.

We have multiple references for the expected behavior: GNU tar, BSD tar, or POSIX pax. In my experience, tar is the better-known tool, so my suggestion is to use the default behavior of tar as the reference.

GNU or BSD tar

I tested the default behavior of GNU tar and BSD tar. It seems to be the same for both:

  • If a directory structure exists, the files from the archive will be integrated. Existing files are overwritten.
  • If a file exists and the archive contains a directory with the same name, the file is removed and a directory is created. In the above example: if l1/l2 is a file, it will be replaced by a new directory.
  • If a directory exists and the archive contains a file with the same name, the directory will be replaced if it is empty. If it contains files, the result is an error.
  • An archive can also contain only a file without its parent directories. If, in that case, one of the parent directories exists as a file, extracting the archive results in an error. In the example: if l1/l2 is a file and the archive contains only the file l1/l2/x.txt but not the directories, that is an error.

In case of an error, it is possible that the archive has been partially extracted.
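
The rules above can be sketched with plain POSIX calls. This is only an illustration under stated assumptions, not the code in untar.c and not the actual fix: ensure_directory and extract_file are hypothetical helper names, and the file content is passed as a buffer instead of being read from an archive.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make sure `path` is a directory.
 * - An existing directory is accepted (files get integrated into it).
 * - An existing non-directory is removed and replaced by a directory.
 * Returns 0 on success, -1 on error.
 */
int ensure_directory(const char *path, mode_t mode)
{
  struct stat sb;

  assert(path != NULL);

  if (mkdir(path, mode) == 0) {
    return 0;
  }
  if (errno != EEXIST || lstat(path, &sb) != 0) {
    return -1;
  }
  if (S_ISDIR(sb.st_mode)) {
    return 0; /* directory already exists: not an error */
  }
  if (unlink(path) != 0) {
    return -1;
  }
  return mkdir(path, mode);
}

/*
 * Hypothetical helper: write `len` bytes from `data` to `path`.
 * - An existing file is overwritten.
 * - An existing empty directory is replaced; rmdir() fails for a
 *   non-empty directory, so that case is an error.
 * - If a parent of `path` is a file, lstat() fails (ENOTDIR): error.
 * Returns 0 on success, -1 on error.
 */
int extract_file(const char *path, const void *data, size_t len, mode_t mode)
{
  struct stat sb;
  ssize_t written;
  int fd;

  assert(path != NULL);

  if (lstat(path, &sb) == 0) {
    if (S_ISDIR(sb.st_mode) && rmdir(path) != 0) {
      return -1;
    }
  } else if (errno != ENOENT) {
    return -1;
  }

  fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
  if (fd < 0) {
    return -1;
  }
  written = write(fd, data, len);
  if (close(fd) != 0) {
    return -1;
  }
  return (written == (ssize_t) len) ? 0 : -1;
}
```

With these helpers, calling ensure_directory("l1", 0755), ensure_directory("l1/l2", 0755) and extract_file("l1/l2/x.txt", ...) twice in a row succeeds both times. Because each entry is handled on its own, an error on a later entry leaves the earlier entries in place, which corresponds to the partial extraction mentioned above.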

Note: GNU tar has options to change the behavior (like --recursive-unlink). I'm sure there are similar options in BSD tar. From my point of view we should adapt to the default behavior, so I ignored these options.

The POSIX pax utility

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html

Default behavior is described as follows:

If an attempt is made to extract a directory when the directory already exists, this shall not be considered an error. If an attempt is made to extract a FIFO when the FIFO already exists, this shall not be considered an error.

From some quick tests, pax behaves similarly to tar. The only difference I noticed is that empty directories are not overwritten with files from the archive.

Change History (2)

comment:1 Changed on 11/29/21 at 08:42:53 by Christian Mauderer

PS: A possibly related (closed) ticket is #3823. The solution to that ticket changed the behavior.

comment:2 Changed on 12/09/21 at 07:21:20 by Christian Mauderer <christian.mauderer@…>

Resolution: fixed
Status: assigned → closed

In ff3f3490/rtems:

untar: Make behavior similar to GNU or BSD tar

RTEMS untar implementation had problems with overwriting or integrating
archives into existing directory structures. This patch adapts the
behavior to mimic that of a GNU tar or BSD tar and extends the tar01
test to check for the behavior. That is:

  • If a directory structure exists, the files from the archive will be integrated. Existing files are overwritten.
  • If a file exists and the archive contains a directory with the same name, the file is removed and a directory is created. In the above example: if l1/l2 is a file it will be overwritten with a new directory.
  • If a directory exists and the archive contains a file with the same name, the directory will be replaced if it is empty. If it contains files, the result is an error.
  • An archive also can contain only a file without the parent directories. If in that case one of the parent directories exists as a file extracting the archive results in an error. In the example: if l1/l2 is a file and the archive doesn't contain the directories but only the file l1/l2/x.txt that would be an error.
  • In case of an error, it is possible that the archive has been partially extracted.

Closes #4552
