View Issue Details

ID: 0000364
Project: file
Category: General
View Status: public
Last Update: 2022-07-07 17:20
Reporter: mam-ableton
Assigned To: christos
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Fixed in Version: HEAD
Summary: 0000364: Truncated "from" string when processing QEMU ELF coredumps
Description: File shows incorrect output when processing ELF coredumps created by QEMU.

Here is example output:

root@489a926e3c6b:/ci/test-arm# file qemu_segfault-arm-more-than-sixteen-chars_20220701-162324_884.core
qemu_segfault-arm-more-than-sixteen-chars_20220701-162324_884.core: ELF 32-bit LSB core file, ARM, version 1 (SYSV), SVR4-style, from 'segfault-arm-mor./segfault-arm-more-than-sixteen-chars', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: './segfault-arm-more-than-sixteen-chars', platform: 'v8l'

Note this specific part: "from 'segfault-arm-mor./segfault-arm-more-than-sixteen-chars"

It should actually say: "from './segfault-arm-more-than-sixteen-chars"

This is two separate strings that have been incorrectly concatenated: "segfault-arm-mor" and "./segfault-arm-more-than-sixteen-chars"

The first string comes from the `pr_fname` member of `struct elf_prpsinfo`. It is a 16-byte char buffer that holds the first 16 bytes of the filename. It is not required to be NUL-terminated, although Linux does terminate it. QEMU does not in its coredumps: if the filename is 16 characters or longer, no NUL separates this buffer from `pr_psargs` (described next).

The second string comes from the immediately following struct member, `pr_psargs`, an 80-byte char buffer holding the argv strings.

See struct definition here: https://github.com/torvalds/linux/blob/c1084b6c5620a743f86947caca66d90f24060f56/include/linux/elfcore.h#L73-L74
QEMU has a matching one: https://github.com/qemu/qemu/blob/19361471b59441cd6f2aa22d4fbee7a6e9e76586/linux-user/elfload.c#L3558-L3559
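
To make the layout concrete, here is a minimal sketch of the two adjacent buffers. It is not the kernel's or QEMU's definition: `struct prpsinfo_tail`, `fill_like_qemu` and `naive_string_len` are hypothetical names, and the struct keeps only the two members discussed here. It assumes the writer copies at most 16 bytes of the filename without adding a terminator, which is what the QEMU dump in this report looks like.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the tail of struct elf_prpsinfo. The real
 * struct has further members before these two (assumption: only the
 * adjacency of the two arrays matters for this report). */
struct prpsinfo_tail {
    char pr_fname[16];   /* filename of the executable, may lack a NUL */
    char pr_psargs[80];  /* initial part of the argument list */
};

/* Compile-time check that nothing separates the two arrays. */
static_assert(offsetof(struct prpsinfo_tail, pr_psargs) ==
              offsetof(struct prpsinfo_tail, pr_fname) + 16,
              "pr_psargs must directly follow pr_fname");

/* Hypothetical helper: fill the struct so that a name of 16 bytes or
 * more leaves pr_fname without a terminating NUL. */
static void fill_like_qemu(struct prpsinfo_tail *p,
                           const char *name, const char *args)
{
    size_t n = strlen(name);
    size_t a = strlen(args);

    memset(p, 0, sizeof(*p));
    if (n > sizeof(p->pr_fname))
        n = sizeof(p->pr_fname);          /* truncate, no NUL added */
    if (a > sizeof(p->pr_psargs) - 1)
        a = sizeof(p->pr_psargs) - 1;     /* keep pr_psargs terminated */
    memcpy(p->pr_fname, name, n);
    memcpy(p->pr_psargs, args, a);
}

/* Length of the string a reader sees when it starts at pr_fname and
 * only stops at a NUL (bounded by the struct size). */
static size_t naive_string_len(const struct prpsinfo_tail *p)
{
    const char *bytes = (const char *)p;
    size_t i = 0;

    while (i < sizeof(*p) && bytes[i] != '\0')
        i++;
    return i;
}
```

With a name of 16 bytes or more, `naive_string_len` returns 16 plus the length of the argument string, which is the concatenation seen in the output above.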

Here's the bug: File first examines `pr_psargs` via its offsets lists: https://github.com/file/file/blob/f042050f59bfc037677871c4d1037c33273f5213/src/readelf.c#L266-L267

If it finds a valid string there, it then checks whether the `pr_fname` buffer (immediately before it in memory) contains only printable characters. If it does, File concludes that both buffers are the first and second parts of the same string and prints them concatenated. See https://github.com/file/file/blob/f042050f59bfc037677871c4d1037c33273f5213/src/readelf.c#L894-L905

Strictly speaking, this is not sound, because the `pr_fname` buffer is not guaranteed to be NUL-terminated (i.e. to contain a non-printable character that would fail the check).

In practice, this bug does not manifest for most coredumps because they are generated by the Linux kernel, which happens to NUL-terminate `pr_fname`. This causes the above printable-character check to fail, and only the `pr_psargs` buffer is output.
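
The check described above can be paraphrased roughly as follows. This is a sketch of the behaviour as described in this report, not a copy of the code in readelf.c; `all_printable`, `from_string_start` and `FNAME_LEN` are hypothetical names, and the sketch only considers the single candidate offset 16 bytes before `pr_psargs`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define FNAME_LEN 16  /* size of pr_fname in the struct linked above */

/* Return nonzero if all n bytes starting at p are printable. */
static int all_printable(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!isprint(p[i]))
            return 0;
    return 1;
}

/*
 * Given the offset of pr_psargs inside a note buffer, decide where the
 * printed "from" string starts. If the 16 bytes before pr_psargs are all
 * printable, they are treated as the beginning of the same string.
 */
static size_t from_string_start(const unsigned char *note, size_t psargs_off)
{
    assert(psargs_off >= FNAME_LEN);
    if (all_printable(note + psargs_off - FNAME_LEN, FNAME_LEN))
        return psargs_off - FNAME_LEN;  /* unterminated pr_fname: merged */
    return psargs_off;                  /* a NUL in pr_fname breaks the run */
}
```

For a Linux-style buffer, where `pr_fname` ends in a NUL, the function returns the `pr_psargs` offset unchanged. For a buffer with a full 16-byte `pr_fname`, it returns the earlier offset, so the two fields are printed as one string.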



Environment:

```
# file --version
file-5.38
magic file from /etc/magic:/usr/share/misc/magic

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal

# uname -a
Linux 489a926e3c6b 5.10.104-linuxkit #1 SMP Thu Mar 17 17:08:06 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
```
Steps To Reproduce: For convenience, I've provided a QEMU coredump and a native Linux coredump. They exceed the upload limit, so I've made them available here: https://www.dropbox.com/sh/2si8tcbt2w5rumh/AAB447Q2Vey6kgSozQV5uKa6a?dl=0

# file *
native-linux-core.core: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from './main-more-than-sixteen-chars', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: './main-more-than-sixteen-chars', platform: 'x86_64'
qemu_main-test-this-is-more-than-sixteen-chars_20220705-161944_20938.core: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from 'main-test-this-i', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, platform: 'i686'

To try from scratch:

- Build an x86_64 binary that segfaults and name it with a filename >= 16 bytes in length

/* Write through a null pointer to force a segfault. */
int main(void) {
        *(volatile int *)0 = 0;
        return 0;
}

gcc -o main-more-than-sixteen-chars main.c

- Install qemu-user (e.g. apt install qemu-user)
- Enable coredumps (ulimit -c unlimited)
- Run the binary under qemu-x86_64 (e.g. qemu-x86_64 main-more-than-sixteen-chars)
- Run file on the coredump (e.g. file core)
Tags: No tags attached.

Activities

mam-ableton

2022-07-05 14:26

reporter   ~0003783

> - Run file on the coredump (e.g. file core)

Typo here; the coredump will not be named "core". Its name will begin with "qemu_..."

christos

2022-07-07 15:32

manager   ~0003784

Hmm the native hexdump looks like:
000006e0 be 51 00 00 01 00 00 00 6d 61 69 6e 2d 6d 6f 72 |.Q......main-mor|
000006f0 65 2d 74 68 61 6e 2d 00 2e 2f 6d 61 69 6e 2d 6d |e-than-../main-m|
00000700 6f 72 65 2d 74 68 61 6e 2d 73 69 78 74 65 65 6e |ore-than-sixteen|
00000710 2d 63 68 61 72 73 20 00 00 00 00 00 00 00 00 00 |-chars .........|

We first look for the full name at 0x6f8, find it, and print it.

Where the qemu one looks like:
00000590 ca 51 00 00 01 00 00 00 6d 61 69 6e 2d 74 65 73 |.Q......main-tes|
000005a0 74 2d 74 68 69 73 2d 69 d7 58 80 01 40 20 20 20 |t-this-i.X..@   |
000005b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000005f0 00 00 00 00 00 00 00 00 05 00 00 00 10 01 00 00 |................|
00000600 06 00 00 00 43 4f 52 45 00 75 6e 61 03 00 00 00 |....CORE.una....|


As you can see, QEMU does not put the full command line where we expect it (at 0x5a8, where the first byte is 0xd7). The contents there are non-printable, so file tries 0x598 instead and prints the short name (which, as you mention, is not NUL-terminated).

mam-ableton

2022-07-07 16:01

reporter   ~0003785

Thanks for the quick reply. My mistake: that QEMU coredump came from an old QEMU version that produced malformed coredumps. (See https://github.com/qemu/qemu/commit/5f779a3a26a9dcc8072d909b7759bb9fade097a9)

I have supplied a coredump from a newer QEMU version, "qemu_main-test-this-is-more-than-sixteen-chars_20220702-121202_14541.core", in the same link: https://www.dropbox.com/sh/2si8tcbt2w5rumh/AAB447Q2Vey6kgSozQV5uKa6a?dl=0

That produces output like this:

qemu_main-test-this-is-more-than-sixteen-chars_20220702-121202_14541.core: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from 'main-test-this-i./main-test-this-is-more-than-sixteen-chars', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: './main-test-this-is-more-than-sixteen-chars', platform: 'x86_64'

In this case the short name and the arguments are directly contiguous. File first looks at the args, then peeks at the short name immediately before them, sees that it consists entirely of printable characters, and assumes both are part of the same string, which is not the case.

000005f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000600: 0000 0000 0500 0000 8800 0000 0300 0000 ................
00000610: 434f 5245 0075 6e61 0000 0000 0000 0000 CORE.una........
00000620: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000630: cd38 0000 ad38 0000 cd38 0000 ad38 0000 .8...8...8...8..
00000640: 6d61 696e 2d74 6573 742d 7468 6973 2d69 main-test-this-i
00000650: 2e2f 6d61 696e 2d74 6573 742d 7468 6973 ./main-test-this
00000660: 2d69 732d 6d6f 7265 2d74 6861 6e2d 7369 -is-more-than-si
00000670: 7874 6565 6e2d 6368 6172 7320 0000 0000 xteen-chars ....
00000680: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000690: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000006a0: 0500 0000 2001 0000 0600 0000 434f 5245 .... .......CORE
000006b0: 0075 6e61 0300 0000 0000 0000 4000 0000 .una........@...
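
One way to keep the two fields apart is to read each with an explicit length bound instead of relying on a terminator. The sketch below is only an illustration using the bytes at 0x640 in the dump above; it is not the change that was committed to file. `copy_field` and `qemu_bytes` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a fixed-size, possibly unterminated field into dst as a C string.
 * Stops at the first NUL or after field_len bytes, whichever comes first,
 * so it never reads into the next struct member. Returns the number of
 * bytes copied, not counting the terminator.
 */
static size_t copy_field(char *dst, size_t dst_size,
                         const unsigned char *field, size_t field_len)
{
    assert(dst != NULL && dst_size > 0);

    const unsigned char *nul = memchr(field, '\0', field_len);
    size_t n = nul ? (size_t)(nul - field) : field_len;

    if (n >= dst_size)
        n = dst_size - 1;   /* leave room for the terminator */
    memcpy(dst, field, n);
    dst[n] = '\0';
    return n;
}

/* Bytes 0x640..0x67f of the newer QEMU dump: pr_fname (16 bytes, no NUL)
 * followed by the start of pr_psargs; the remainder is zero-filled. */
static const unsigned char qemu_bytes[64] =
    "main-test-this-i"
    "./main-test-this-is-more-than-sixteen-chars ";
```

Calling `copy_field` with a length of 16 on the first field yields "main-test-this-i" on its own, and calling it on the bytes that follow yields the argument string without the short name in front of it.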

christos

2022-07-07 17:20

manager   ~0003786

Thanks, fixed.

Issue History

Date Modified Username Field Change
2022-07-05 14:24 mam-ableton New Issue
2022-07-05 14:26 mam-ableton Note Added: 0003783
2022-07-07 15:32 christos Note Added: 0003784
2022-07-07 15:36 christos Assigned To => christos
2022-07-07 15:36 christos Status new => assigned
2022-07-07 16:01 mam-ableton Note Added: 0003785
2022-07-07 17:20 christos Status assigned => resolved
2022-07-07 17:20 christos Resolution open => fixed
2022-07-07 17:20 christos Fixed in Version => HEAD
2022-07-07 17:20 christos Note Added: 0003786