View Issue Details

ID: 0000151
Project: file
Category: General
View Status: public
Last Update: 2020-08-23 19:32
Reporter: enkeli
Assigned To: christos
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: no change required
Product Version: 5.38
Fixed in Version: 5.40
Summary: 0000151: Malformed PDF with PK prefix is recognized as ZIP archive
Description: Hi, I came across several files with the following data:

00000000: 504b 0506 0000 0000 0000 0000 0000 0000 PK..............
00000010: 0000 0000 0000 0a25 5044 462d 312e 350a .......%PDF-1.5.
00000020: 25c3 a4c3 bcc3 b6c3 9f0a 3220 3020 6f62 %.........2 0 ob
00000030: 6a0a 3c3c 2f4c 656e 6774 6820 3320 3020 j.<</Length 3 0
00000040: 522f 4669 6c74 6572 2f46 6c61 7465 4465 R/Filter/FlateDe
00000050: 636f 6465 3e3e 0a73 7472 6561 6d0a 789c code>>.stream.x.
00000060: 454e bb0a c330 0cdc fd15 9a0b 4975 4e6c EN...0......IuNl
....

Apart from the leading PK signature (an End of Central Directory record filled with null bytes, which is strange at the beginning of a file), the rest looks like a valid PDF file. When I remove everything before "%PDF", I can open the PDF. Maybe the file is simply malformed. Do you think it could be detected as something other than ZIP?
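The workaround described above (dropping every byte before "%PDF") can be sketched as follows. This is a minimal illustration, not part of `file`; the function name and the 1024-byte search limit are assumptions made for the example. In the dump above, the 22-byte EOCD record plus one newline means the header starts at offset 0x17 (23).

```python
# Hypothetical helper: recover the PDF by dropping every byte before "%PDF-".
# Assumption: the header appears within the first `search_limit` bytes
# (1024 is a common reader tolerance, not something stated in this issue).
PDF_MAGIC = b"%PDF-"


def strip_to_pdf_header(data: bytes, search_limit: int = 1024) -> bytes:
    """Return `data` starting at the first "%PDF-" marker, or raise ValueError."""
    offset = data.find(PDF_MAGIC, 0, search_limit)
    if offset < 0:
        raise ValueError("no %PDF- header within the first %d bytes" % search_limit)
    return data[offset:]
```

For example, `strip_to_pdf_header(open("not-a-zip.dat", "rb").read())` would return bytes beginning with `%PDF-1.5`, which can then be written out as a regular PDF.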
Tags: No tags attached.

Activities

enkeli

2020-03-05 07:50

reporter  

not-a-zip.pdf (11,041 bytes)

enkeli

2020-03-05 07:54

reporter   ~0003388

Sorry, I attached the wrong file. This is the one I mentioned in the issue.
not-a-zip.dat (11,064 bytes)

christos

2020-03-19 19:27

manager   ~0003392

Well, the file starts with PK\05\06, which is the marker for an empty ZIP archive, so that rule matches first... We could make this smarter, but even the PDF definition says the header has to start at the first byte.
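To illustrate why the ZIP rule wins, here is a toy first-match classifier. It is NOT libmagic's rule set or ordering; the signature table and labels are hypothetical and only show how offset-0 magic checks behave on this file before and after stripping the prefix.

```python
# Hypothetical signature table, checked in order; first match wins.
# Each entry is (offset, magic bytes, label). Labels are illustrative only.
SIGNATURES = [
    (0, b"PK\x03\x04", "Zip archive data"),
    (0, b"PK\x05\x06", "Zip archive data (empty)"),
    (0, b"%PDF-", "PDF document"),
]


def first_match(data: bytes) -> str:
    """Return the label of the first signature found at its fixed offset."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return "data"  # fallback when nothing matches
```

Because the PDF rule only looks at offset 0, the reported file matches the empty-ZIP entry; once the 23-byte prefix is removed, it matches the PDF entry instead.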

enkeli

2020-03-20 12:47

reporter   ~0003393

Ah, I see. Let's say it is a ZIP: its EOCD has the `.ZIP file comment length` field set to 0000, and yet there is a "comment" after it that is far longer than zero bytes (the 2-byte field can express at most 65,535). Therefore I'd say it is an invalid empty ZIP archive. I do not know how the `file` tool handles such files.
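The consistency check suggested above can be sketched by unpacking the 22-byte fixed part of the EOCD record and comparing its comment length with the bytes that actually follow. This is an illustrative sketch, not how `file` behaves; the function name is hypothetical, and it assumes the EOCD sits at offset 0 (true for this empty-archive case, whereas real ZIP readers search backwards from the end of the file).

```python
import struct

EOCD_SIG = b"PK\x05\x06"
# Little-endian fixed part of the EOCD: signature, disk number, disk with
# central directory, entries on this disk, total entries, central directory
# size, central directory offset, comment length.
EOCD_FMT = "<4sHHHHIIH"
EOCD_SIZE = struct.calcsize(EOCD_FMT)  # 22 bytes


def is_consistent_empty_zip(data: bytes) -> bool:
    """True only if `data` is one EOCD record at offset 0 plus its declared comment."""
    if len(data) < EOCD_SIZE:
        return False
    (sig, _disk, _cd_disk, disk_entries, total_entries,
     cd_size, _cd_offset, comment_len) = struct.unpack_from(EOCD_FMT, data, 0)
    if sig != EOCD_SIG:
        return False
    # An empty archive has no central directory entries at all.
    if disk_entries or total_entries or cd_size:
        return False
    # The declared comment length must account for every trailing byte.
    return EOCD_SIZE + comment_len == len(data)
```

For the reported file the comment length is 0 but thousands of PDF bytes follow the record, so the check returns `False`, matching the "invalid empty ZIP" reading.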

christos

2020-08-23 19:32

manager   ~0003458

There is not much we can do with corrupt files.

Issue History

Date Modified Username Field Change
2020-03-05 07:50 enkeli New Issue
2020-03-05 07:50 enkeli File Added: not-a-zip.pdf
2020-03-05 07:54 enkeli File Added: not-a-zip.dat
2020-03-05 07:54 enkeli Note Added: 0003388
2020-03-19 18:54 christos Assigned To => christos
2020-03-19 18:54 christos Status new => assigned
2020-03-19 19:27 christos Status assigned => feedback
2020-03-19 19:27 christos Note Added: 0003392
2020-03-20 12:47 enkeli Note Added: 0003393
2020-03-20 12:47 enkeli Status feedback => assigned
2020-08-23 19:32 christos Status assigned => closed
2020-08-23 19:32 christos Resolution open => no change required
2020-08-23 19:32 christos Fixed in Version => 5.40
2020-08-23 19:32 christos Note Added: 0003458