View Issue Details

ID: 0000104
Project: file
Category: [All Projects] General
View Status: public
Last Update: 2019-09-11 17:07
Reporter: Ilrandar
Assigned To: christos
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: assigned
Resolution: open
Platform:
OS: ArchLinux
OS Version:
Product Version: 5.37
Target Version:
Fixed in Version:
Summary: 0000104: pdf file incorrectly reported as `data`
Description: Some pdf files downloaded from the internet are incorrectly reported as `data` by file. Their reported mime-type is `application/octet-stream` instead of `application/pdf`. I have attached one such pdf to this report.
Tags: No tags attached.

Activities

Ilrandar

2019-09-10 21:04

reporter  

certificat_scolarité_l2_eco.pdf (1,184,843 bytes)

christos

2019-09-11 14:42

manager   ~0003288

These are the first few lines of the file:

HTTP/1.1 200 OK
Date: Tue, 10 Sep 2019 08:38:20 GMT
Server: Apache/2.4.38 (Debian)
Content-Disposition: attachment; filename="21808995-2019-certificat-scolarite.pdf"
Cache-Control: no-cache, private
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 58
Content-Length: 1184531
Content-Type: application/pdf

Here's where the pdf file starts:

%PDF-1.3

Either the tool you used to download it added junk in front, or the original file already had it. Of course some browsers ignore the junk and process it as a pdf file (because users want things to just work), but this is just crappy behavior. Most applications will not open it properly, and it is also a security issue, since you can masquerade files this way. It is also fragile. How many lines does it try to parse? 10? 1K of data? Who knows. It depends on the implementation. Of course file can also be modified to mimic this behavior, at the cost of efficiency and of encouraging people to produce junk...
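To illustrate why a "skip the junk" approach is fragile, here is a minimal Python sketch of a lenient header search. It is not how file or any particular browser works; the function name `find_pdf_header`, the `window` parameter, and the sample bytes are made up for illustration. The point is only that the answer changes with the size of the search window.

```python
PDF_MAGIC = b"%PDF-"


def find_pdf_header(data: bytes, window: int = 1024) -> int:
    """Return the offset of the first %PDF- marker that lies entirely
    within the first `window` bytes of `data`, or -1 if there is none.

    A strict checker would only accept offset 0; the window size here
    is an arbitrary assumption, which is exactly the problem.
    """
    return data.find(PDF_MAGIC, 0, window)


# Hypothetical sample: an HTTP response header glued in front of a PDF.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/pdf\r\n"
    b"\r\n"
    b"%PDF-1.3\n"
)

offset = find_pdf_header(sample)
strict_ok = offset == 0  # False here: the marker is not at the start
```

With a smaller window, for example `find_pdf_header(sample, window=16)`, the same bytes are no longer recognized, so two lenient readers can disagree about the same file.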

Ilrandar

2019-09-11 17:07

reporter   ~0003295

Oh, I didn’t know I could open pdf files with a text editor.
I don’t think you should ignore junk in front of a file. I just needed some way to get this file (and a few others) recognized as pdf files, but if I can just open them and get rid of the leading incorrect lines, I will do that.
Thank you for your answer.
As far as I’m concerned, you can consider this issue closed.
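For anyone with several such files, the manual cleanup described above could be scripted. The following is a rough Python sketch, not an official tool: the function names and the example file names are hypothetical, and it assumes the first `%PDF-` marker in the file is the real start of the document. Working on bytes avoids the risk of a text editor re-encoding the binary part of the pdf.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"


def strip_leading_junk(data: bytes) -> bytes:
    """Return `data` starting at the first %PDF- marker.

    Raises ValueError if no marker is found, so a file that is not a
    pdf at all is never silently rewritten.
    """
    offset = data.find(PDF_MAGIC)
    if offset < 0:
        raise ValueError("no %PDF- marker found")
    return data[offset:]


def clean_file(src: str, dst: str) -> int:
    """Write a cleaned copy of `src` to `dst`; return the bytes removed."""
    original = Path(src).read_bytes()
    cleaned = strip_leading_junk(original)
    Path(dst).write_bytes(cleaned)
    return len(original) - len(cleaned)
```

Usage would look like `clean_file("downloaded.pdf", "fixed.pdf")` (both names are placeholders). For the file attached to this report, the attachment is 1,184,843 bytes while the Content-Length header says 1184531, a difference of 312 bytes, which would be consistent with the size of the prepended header block.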

Issue History

Date Modified Username Field Change
2019-09-10 21:04 Ilrandar New Issue
2019-09-10 21:04 Ilrandar File Added: certificat_scolarité_l2_eco.pdf
2019-09-11 14:39 christos Assigned To => christos
2019-09-11 14:39 christos Status new => assigned
2019-09-11 14:42 christos Status assigned => feedback
2019-09-11 14:42 christos Note Added: 0003288
2019-09-11 17:07 Ilrandar Note Added: 0003295
2019-09-11 17:07 Ilrandar Status feedback => assigned