View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000526 | file | General | public | 2024-05-15 02:58 | 2024-06-16 15:00 |
Reporter | maiphuc | Assigned To | christos | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | assigned | Resolution | open | ||
Platform | linux | OS | ubuntu | OS Version | 20.04 |
Summary | 0000526: html file gets misdetected as csv | ||||
Description | When I request https://www.doordash.com/ and save the response body to a file, the file should be detected as HTML, but the file command reports it as CSV. | ||||
Steps To Reproduce | file test.txt | ||||
Tags | No tags attached. | ||||
|
That's unfortunate, because that HTML file happens to have 3 comma-separated fields on each of the first 19 lines... file(1) only requires that the first 10 lines have the same number of fields... So in this special case it misdetects:

    [11:13am] 337>cc -DCSV_LINES=40 -DDEBUG -DTEST is_csv.c
    [11:13am] 338>./a.out o
    0 3 0
    1 3 3
    2 3 3
    3 3 3
    4 3 3
    5 3 3
    6 3 3
    7 3 3
    8 3 3
    9 3 3
    10 3 3
    11 3 3
    12 3 3
    13 3 3
    14 3 3
    15 3 3
    16 3 3
    17 3 3
    18 3 3
    19 3 3
    20 12 3
    is csv 0 |
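For readers following along, here is a minimal sketch of the kind of check described in the note above: count the commas on each line and give up as soon as a line disagrees with the first one, but stop looking after a fixed number of lines. It is a simplified illustration and not the code from is_csv.c. The function name `looks_like_csv` and the `max_lines` parameter (standing in for `CSV_LINES`) are assumptions made for this example, and quoted fields are ignored for brevity.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified comma-count check (not the real is_csv.c).
 * max_lines stands in for CSV_LINES; 0 means "no line limit".
 * Returns 1 if the buffer looks like CSV, 0 otherwise.
 */
int
looks_like_csv(const char *buf, size_t len, size_t max_lines)
{
	size_t commas = 0;	/* commas seen on the current line */
	size_t expected = 0;	/* comma count fixed by the first line */
	size_t lines = 0;	/* complete lines examined so far */

	for (size_t i = 0; i < len; i++) {
		if (buf[i] == ',') {
			commas++;
		} else if (buf[i] == '\n') {
			lines++;
			/* Line budget spent: decide on what was seen so far. */
			if (max_lines != 0 && lines == max_lines)
				return expected != 0 && expected == commas;
			if (expected == 0) {
				if (commas == 0)
					return 0;	/* first line has no commas */
				expected = commas;
			} else if (commas != expected) {
				return 0;	/* count changed: not CSV */
			}
			commas = 0;
		}
	}
	return expected != 0 && lines >= 2;
}
```

With a limit of 10, an input whose first 20 lines each contain 3 commas is accepted before the 12-comma line is ever reached. With a limit of 40, the scan gets as far as that line and rejects the input, which is consistent with the `is csv 0` result in the debug run above.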
|
Thank you for your attention to this issue. I have a question: if a file can be identified as both an HTML file and a CSV file, why do we choose CSV? |
|
The answer is a little complicated... Using -k (keep going) should print both. file(1) has both built-in recognition for formats that magic files can't easily handle (tar, csv, json, der, ctf, etc.) and the regular magic definitions (softmagic). The softmagic entries are sorted by "strength", a heuristic for how specific a magic entry is, but the built-ins are not sorted and are applied in sequence before softmagic. So when a built-in check such as csv matches, it is reported first, ahead of any softmagic match such as HTML. Typically this is not a problem because the built-in ones usually don't have spurious matches. |
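The ordering described in this note can be pictured with a toy model: built-in checks run first in a fixed order, then pattern entries run in descending order of strength, and detection stops at the first hit unless a keep-going flag is set. Everything in the sketch below is hypothetical: the struct names, the `detect` function, the matchers, and the strength values are invented for illustration and do not come from the file(1) sources.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical types; none of these names come from file(1). */
struct builtin {
	const char *name;
	int (*match)(const char *buf);
};

struct magic_entry {
	const char *name;
	const char *needle;	/* substring that must occur in the buffer */
	int strength;		/* higher means more specific */
};

/* Deliberately loose toy matchers, to show a spurious built-in hit. */
static int match_json(const char *buf) { return buf[0] == '{'; }
static int match_csv(const char *buf) { return strchr(buf, ',') != NULL; }

static int
by_strength_desc(const void *a, const void *b)
{
	const struct magic_entry *x = a, *y = b;
	return (y->strength > x->strength) - (y->strength < x->strength);
}

/*
 * Store up to max matching names in out and return how many were stored.
 * Built-ins are tried in table order, then magic entries by strength.
 * Without keep_going, the first match ends the search.
 */
size_t
detect(const char *buf, int keep_going, const char **out, size_t max)
{
	static const struct builtin builtins[] = {
		{ "json", match_json },
		{ "csv", match_csv },
	};
	struct magic_entry magic[] = {
		{ "text with markup", "<", 10 },
		{ "HTML document", "<!DOCTYPE html", 80 },
	};
	size_t nb = sizeof(builtins) / sizeof(builtins[0]);
	size_t nm = sizeof(magic) / sizeof(magic[0]);
	size_t n = 0;

	for (size_t i = 0; i < nb && n < max; i++) {
		if (builtins[i].match(buf)) {
			out[n++] = builtins[i].name;
			if (!keep_going)
				return n;
		}
	}
	qsort(magic, nm, sizeof(magic[0]), by_strength_desc);
	for (size_t i = 0; i < nm && n < max; i++) {
		if (strstr(buf, magic[i].needle) != NULL) {
			out[n++] = magic[i].name;
			if (!keep_going)
				return n;
		}
	}
	return n;
}
```

In this model, an HTML buffer that happens to contain a comma is reported as csv when `keep_going` is 0, because the built-in check wins before any strength comparison takes place. With `keep_going` set, the HTML entry is also listed, and the stronger magic entry comes before the weaker one.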
Date Modified | Username | Field | Change |
---|---|---|---|
2024-05-15 02:58 | maiphuc | New Issue | |
2024-05-15 02:58 | maiphuc | File Added: test.txt | |
2024-05-18 15:13 | christos | Assigned To | => christos |
2024-05-18 15:13 | christos | Status | new => assigned |
2024-05-18 15:15 | christos | Status | assigned => feedback |
2024-05-18 15:15 | christos | Note Added: 0004049 | |
2024-05-21 09:18 | maiphuc | Note Added: 0004051 | |
2024-05-21 09:18 | maiphuc | Status | feedback => assigned |
2024-06-16 15:00 | christos | Note Added: 0004054 |