View Issue Details

ID: 0000446
Project: file
Category: General
View Status: public
Last Update: 2023-05-21 15:49
Reporter: mike
Assigned To: christos
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 5.41
Fixed in Version: 5.45
Summary: 0000446: PDF appendix allows header in first 1024 bytes; Magic looks in 256
Description:

TL;DR: I think the search limit on this line needs to be changed from 256 to 1024:

https://github.com/file/file/blob/FILE5_39/magic/Magdir/pdf#L39

----

Long version....

The Seventh Circuit Court of Appeals in the United States has started publishing documents that do not comply with the PDF specification, but which do comply with the PDF Compatibility and Implementation Notes in Appendix H of the specification. See the attached file for an example, or try this link:

http://media.ca7.uscourts.gov/cgi-bin/OpinionsWeb/processWebInputExternal.pl?Submit=Display&Path=Y2023/D04-27/C:22-2500:J:Brennan:aut:T:fnOp:N:3036932:S:0

The PDF specification says on page 92 that:

> The first line of a PDF file is a header identifying the version of the PDF [...] For a file conforming to PDF 1.7, the header should be:
>
> `%PDF-1.7`

(See: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.7old.pdf#page=92)

Easy enough.

But Appendix H of the same spec says:

> Acrobat viewers require only that the header appear somewhere within the first 1024 bytes of the file.

(See: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.7old.pdf#page=1102)

A few years ago, this issue came up:

https://bugs.astron.com/view.php?id=104

If I'm understanding correctly, a fix was put in place to look in the first 256 bytes of the file, here:

https://github.com/file/file/blob/FILE5_39/magic/Magdir/pdf#L39

I think we just need to adjust this to look in the first 1024 bytes instead, and it should fix this and other issues.
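
To illustrate the difference between the two limits, here is a minimal Python sketch. It is not how `file` implements its magic rules; the function name `find_pdf_header` and the sample bytes are made up for illustration. It assumes the whole `%PDF-` marker must fit inside the search window, which may differ in detail from how the magic search counts its range.

```python
PDF_MAGIC = b"%PDF-"


def find_pdf_header(data: bytes, window: int) -> int:
    """Return the offset of b"%PDF-" within the first `window` bytes, or -1.

    Assumption for this sketch: the entire marker must lie inside the window.
    """
    return data.find(PDF_MAGIC, 0, window)


# Hypothetical sample: 300 bytes of leading junk, then the PDF header.
sample = b"x" * 300 + b"%PDF-1.7\n"

print(find_pdf_header(sample, 256))   # -1: header lies past a 256-byte window
print(find_pdf_header(sample, 1024))  # 300: found within the Appendix H limit
```

Any file whose header starts between byte 256 and byte 1024 falls into this gap: acceptable under Appendix H, but missed by a 256-byte search.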
Steps To Reproduce:
1. Download the file
2. Run `file the-file.pdf`
3. Note that it's detected as `data`.
4. Do `head the-file.pdf`
5. Note that `%PDF-` is there, but that there's text before it.
6. Study the spec and appendix H
7. Note that this file opens properly in Adobe Reader, and other PDF readers.
8. Note that `file` doesn't follow the Appendix H implementation notes (i.e., the de facto specification)
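
If the attached document is not at hand, a synthetic file with text before the header can stand in for steps 4 and 5. The sketch below is only an assumption-laden stand-in: `write_sample`, the preamble text, and the 300-byte default are hypothetical, and the output is not a well-formed PDF. It only places a `%PDF-` marker after some leading text. What `file` reports for it depends on the installed version and is not asserted here.

```python
from pathlib import Path


def write_sample(path: str, preamble_len: int = 300) -> int:
    """Write `preamble_len` bytes of text followed by a PDF header marker.

    Returns the byte offset at which b"%PDF-" starts (equal to preamble_len).
    """
    preamble = b"Preamble text before the PDF header.\n"
    # Repeat the preamble and trim it to exactly `preamble_len` bytes.
    filler = (preamble * (preamble_len // len(preamble) + 1))[:preamble_len]
    # Not a valid PDF body: just the header line and an end-of-file marker.
    Path(path).write_bytes(filler + b"%PDF-1.7\n%%EOF\n")
    return preamble_len


if __name__ == "__main__":
    offset = write_sample("sample.pdf")
    print(f"wrote sample.pdf with %PDF- at offset {offset}")
```

Running `file sample.pdf` afterwards with different `preamble_len` values (for example 100, 300, and 2000) gives a quick way to see where a given build stops looking.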
Tags: No tags attached.

Activities

mike

2023-05-02 20:01

reporter  

document.pdf (337,981 bytes)

christos

2023-05-21 15:49

manager   ~0003928

Fixed, thanks!

Issue History

Date Modified Username Field Change
2023-05-02 20:01 mike New Issue
2023-05-02 20:01 mike File Added: document.pdf
2023-05-21 15:49 christos Assigned To => christos
2023-05-21 15:49 christos Status new => assigned
2023-05-21 15:49 christos Status assigned => resolved
2023-05-21 15:49 christos Resolution open => fixed
2023-05-21 15:49 christos Fixed in Version => 5.45
2023-05-21 15:49 christos Note Added: 0003928