View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000622 | file | General | public | 2025-02-07 18:46 | 2025-04-25 21:33 |
Reporter | Odd_Bloke | Assigned To | |||
Priority | normal | Severity | minor | Reproducibility | always |
Status | new | Resolution | open | ||
Product Version | 5.46 | ||||
Summary | 0000622: Incorrect mime type for zip files when using magic_buffer | ||||
Description | Using `python3-magic` as the interface, `magic.from_buffer(..., mime=True)` against a zip file with libmagic 5.45 installed returns the expected `application/zip` but with libmagic 5.46 installed returns `application/octet-stream` instead. Note that unlike https://bugs.astron.com/view.php?id=612, the `file` command does return the expected `application/zip` with both library versions. The issue I'm hitting does sound like the issue ifoancea mentioned in the first comment on that issue. | ||||
Steps To Reproduce | Using Docker to reproduce with 5.46 (with irrelevant output redacted for readability): $ docker run -t cgr.dev/chainguard/wolfi-base sh -c ' apk add file libmagic py3-magic python3 zip touch test_file zip test.zip test_file file --mime test.zip python3 -c "import magic; file = open(\"test.zip\",\"rb\"); print(magic.from_buffer(file.read(), mime=True))"' fetch https://packages.wolfi.dev/os/x86_64/APKINDEX.tar.gz (1/18) Installing libmagic (5.46-r1) (2/18) Installing file (5.46-r1) ... OK: 69 MiB in 33 packages adding: test_file (stored 0%) test.zip: application/zip; charset=binary application/octet-stream and to see the expected behaviour on the previous version (note the version specifiers on the file and libmagic packages): $ docker run -t cgr.dev/chainguard/wolfi-base sh -c ' apk add file\<5.46 libmagic\<5.46 py3-magic python3 zip touch test_file zip test.zip test_file file --mime test.zip python3 -c "import magic; file = open(\"test.zip\",\"rb\"); print(magic.from_buffer(file.read(), mime=True))"' fetch https://packages.wolfi.dev/os/x86_64/APKINDEX.tar.gz (1/18) Installing libmagic (5.45-r4) (2/18) Installing file (5.45-r4) ... OK: 67 MiB in 33 packages adding: test_file (stored 0%) test.zip: application/zip; charset=binary application/zip | ||||
Additional Information | (I have also tested a build of master from GitHub and observed the same unexpected behaviour.) | ||||
Tags | No tags attached. | ||||
|
I am seeing something similar in file 5.45 and above (including master). In this case it was observed with both python wrappers and Golang wrappers that use the magic_buffer API. It does not impact the magic_file API which is why the CLI tool (from file or stdin) works fine. Note: I am not deeply knowledgeable of this codebase so I am not sure if this is the right/best fix. However, this is the behavior I observed when comparing where the debug output diverges, and the fix. I noticed that the debug output indicated the offset of 'b' was equal to FILE_BADSIZE. After more digging I was able to isolate the issue/codepath where this occurs. From what I can tell it is related to the negative buffer offset handling that occurs during calls to softmagic->msetoffset(). In particular it is how msetoffset behaves when the offset is negative AND you are using magic_buffer. In particular there are two issues: 1. buffer_fill() returns early due to "if (!S_ISREG(b->st.st_mode))". Here, I believe we need to add a check for "b->fd == -1" and return 0 early. We don't need to copy or re-scan the buffer since the whole file is already in memory as part of the magic_buffer call. 2. We need to adjust some of the checks in msetoffset() when the offset is negative. Since we're not using b->elen, b->ebuf, we need to work around this (ideally without duplicating the buffer). We can populate "bb" from "b->flen" and "b->fbuf" (assuming allocation/frees are safe as well). Then we need to also update a few checks that use "b->elen" and the calculation of "ms->eoffset". A rough diff looks like: ------------- diff --git a/src/buffer.c b/src/buffer.c index 598db147..dd5bc480 100644 --- a/src/buffer.c +++ b/src/buffer.c @@ -68,8 +89,16 @@ buffer_fill(const struct buffer *bb) if (b->elen != 0) return b->elen == FILE_BADSIZE ? -1 : 0; - if (!S_ISREG(b->st.st_mode)) - goto out; + if (b->fd == -1) { // Working purely in memory, nothing to re-fill + return 0; + } if (!S_ISREG(b->st.st_mode)) goto out; diff --git a/src/softmagic.c b/src/softmagic.c index 59300a69..100a70f9 100644 --- a/src/softmagic.c +++ b/src/softmagic.c @@ -1534,10 +1534,19 @@ msetoffset(struct magic_set *ms, struct magic *m, struct buffer *bb, "u at level %u", o, cont_level); return -1; } - if (CAST(size_t, m->offset) > b->elen) - return -1; - buffer_init(bb, -1, NULL, b->ebuf, b->elen); - ms->eoffset = ms->offset = CAST(int32_t, b->elen - m->offset); + if ( (b->fd == -1 && CAST(size_t, m->offset) > b->flen) || (b->fd != -1 && CAST(size_t, m->offset) > b->elen)) { + printf("dropping len=%zu offset=%d\n", b->flen, m->offset); + return -1; + } + if(b->fd==-1) { + printf("reverse buffer init!\n"); + buffer_init(bb, -1, NULL, b->fbuf, b->flen); + ms->eoffset = ms->offset = CAST(int32_t, b->flen - m->offset); + } else { + buffer_init(bb, -1, NULL, b->ebuf, b->elen); + ms->eoffset = ms->offset = CAST(int32_t, b->elen - m->offset); + } + } else { offset = m->offset; ------------- |
|
I tested a bit more and came across a bug in my above workaround. Without it it can error out with "ERROR: negative offset X at continuationlevel Y". This is all pretty rough so the diff is simple. Its one additional fix to softmagic at line 1559. Trying to collect some more samples as some zipfiles trigger the reported behavior, and some do not. I'm also beginning to suspect this odd file vs buffer behavior is only triggered with certain malformed zipfiles floating around. Either way its an odd difference that hopefully we can resolve. ------ diff --git a/src/softmagic.c b/src/softmagic.c index 1b95d522..6bb97e82 100644 --- a/src/softmagic.c +++ b/src/softmagic.c @@ -1517,11 +1517,10 @@ msetoffset(struct magic_set *ms, struct magic *m, struct buffer *bb, int32_t offset; if (m->flag & OFFNEGATIVE) { offset = -m->offset; -printf("file negative msetoffset\n"); if (cont_level > 0) { if (m->flag & (OFFADD|INDIROFFADD)) goto normal; -#if 1 +#if 0 file_error(ms, 0, "negative offset %d at continuation" "level %u", m->offset, cont_level); return -1; @@ -1542,10 +1541,12 @@ printf("file negative msetoffset\n"); if(b->fd==-1) { printf("reverse buffer init!\n"); buffer_init(bb, -1, NULL, b->fbuf, b->flen); + ms->eoffset = ms->offset = CAST(int32_t, b->flen - m->offset); } else { buffer_init(bb, -1, NULL, b->ebuf, b->elen); + ms->eoffset = ms->offset = CAST(int32_t, b->elen - m->offset); } - ms->eoffset = ms->offset = CAST(int32_t, b->elen - m->offset); + } else { offset = m->offset; if ((m->flag & OFFPOSITIVE) || cont_level == 0) { @@ -1555,7 +1556,7 @@ normal: ms->offset = offset; ms->eoffset = 0; } else { - ms->offset = ms->eoffset + offset; + ms->offset = (b->fd == -1) ? offset : ms->eoffset + offset; } } ------ |
Date Modified | Username | Field | Change |
---|---|---|---|
2025-02-07 18:46 | Odd_Bloke | New Issue | |
2025-04-24 23:54 | gwittelspl | Note Added: 0004212 | |
2025-04-25 21:33 | gwittelspl | Note Added: 0004213 |