View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000446 | file | General | public | 2023-05-02 20:01 | 2023-05-21 15:49 |
Reporter | mike | Assigned To | christos | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Product Version | 5.41 | ||||
Fixed in Version | 5.45 | ||||
Summary | 0000446: PDF appendix allows header in first 1024 bytes; Magic looks in 256 | ||||
Description | TL;DR: I think this line needs to be changed to 1024, not 256: https://github.com/file/file/blob/FILE5_39/magic/Magdir/pdf#L39 ---- Long version: The U.S. Court of Appeals for the Seventh Circuit has started publishing documents that do not comply with the PDF specification, but which do comply with the Compatibility and Implementation Notes in Appendix H of the specification. See the attached file for an example, or this link should work: http://media.ca7.uscourts.gov/cgi-bin/OpinionsWeb/processWebInputExternal.pl?Submit=Display&Path=Y2023/D04-27/C:22-2500:J:Brennan:aut:T:fnOp:N:3036932:S:0 The PDF specification says on page 92: "The first line of a PDF file is a header identifying the version of the PDF [...] For a file conforming to PDF 1.7, the header should be `%PDF-1.7`" (see: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.7old.pdf#page=92). Easy enough. But Appendix H of the same spec says: "Acrobat viewers require only that the header appear somewhere within the first 1024 bytes of the file." (see: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.7old.pdf#page=1102). A few years ago, this issue came up: https://bugs.astron.com/view.php?id=104 If I understand correctly, a fix was put in place to look for the header in the first 256 bytes of the file, here: https://github.com/file/file/blob/FILE5_39/magic/Magdir/pdf#L39 I think we just need to widen this to the first 1024 bytes, and that should fix this and other issues. | ||||
Steps To Reproduce | 1. Download the file. 2. Run `file the-file.pdf`. 3. Note that it's detected as `data`. 4. Run `head the-file.pdf`. 5. Note that `%PDF-` is there, but that there's text before it. 6. Study the spec and Appendix H. 7. Note that this file opens properly in Adobe Reader and other PDF readers. 8. Note that `file` doesn't follow the implementation (i.e., the de facto specification). | ||||
Tags | No tags attached. | ||||
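The difference between a 256-byte and a 1024-byte search window can be sketched in Python. This is not `file`'s implementation or its magic syntax; it is a minimal sketch, assuming the only test is whether the bytes `%PDF-` begin (and fit entirely) within the first N bytes of the file. The helper names `find_pdf_header` and `pdf_version` are hypothetical, and the 300-byte preamble is synthetic data standing in for the attached document, not its actual contents.

```python
import sys
from typing import Optional

PDF_MAGIC = b"%PDF-"


def find_pdf_header(data: bytes, window: int) -> Optional[int]:
    """Return the offset of b"%PDF-" if it lies wholly within data[:window], else None."""
    # bytes.find with an end bound only matches if the entire marker fits in the window.
    # (Assumption: how file's own bounded search treats a boundary-straddling match may differ.)
    offset = data.find(PDF_MAGIC, 0, window)
    return None if offset == -1 else offset


def pdf_version(data: bytes, offset: int) -> str:
    """Read the version characters after the marker, e.g. "1.7" from b"%PDF-1.7"."""
    start = offset + len(PDF_MAGIC)
    end = start
    while end < len(data) and data[end:end + 1] in b"0123456789.":
        end += 1
    return data[start:end].decode("ascii")


def main() -> None:
    if len(sys.argv) > 1:
        # Read only as many bytes as the widest window tested below.
        with open(sys.argv[1], "rb") as f:
            data = f.read(1024)
    else:
        # Hypothetical stand-in: 300 bytes of preamble before the PDF header.
        data = b"X" * 300 + b"%PDF-1.7\n"
    for window in (256, 1024):
        offset = find_pdf_header(data, window)
        if offset is None:
            print(f"window={window}: no PDF header found")
        else:
            print(f"window={window}: PDF {pdf_version(data, offset)} header at offset {offset}")


if __name__ == "__main__":
    main()
```

Run without arguments, the 256-byte window misses the header while the 1024-byte window finds it at offset 300, which mirrors the behavior reported above. Passing a path (for example `python sketch.py document.pdf`) applies the same check to a real file's first 1024 bytes.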
Date Modified | Username | Field | Change |
---|---|---|---|
2023-05-02 20:01 | mike | New Issue | |
2023-05-02 20:01 | mike | File Added: document.pdf | |
2023-05-21 15:49 | christos | Assigned To | => christos |
2023-05-21 15:49 | christos | Status | new => assigned |
2023-05-21 15:49 | christos | Status | assigned => resolved |
2023-05-21 15:49 | christos | Resolution | open => fixed |
2023-05-21 15:49 | christos | Fixed in Version | => 5.45 |
2023-05-21 15:49 | christos | Note Added: 0003928 | |