View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000332 | file | General | public | 2022-03-22 09:49 | 2022-04-04 17:48 |
Reporter | vinc17 | Assigned To | christos | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Product Version | 5.41 | ||||
Fixed in Version | 5.42 | ||||
Summary | 0000332: misdetects "[number] text" files as JSON data | ||||
Description | Some non-JSON text files have a form like "[number] text". For instance, Xorg.0.log log files start with something like [ 48.187] X.Org X Server 1.21.1.3 where "[number]" is a timestamp. Because "[number]" looks like a JSON object, "file" detects such files as JSON data, even though the object is followed by garbage when interpreted as JSON. | ||||
Steps To Reproduce | $ echo "[1] foo" | file - /dev/stdin: JSON data | ||||
Additional Information | I don't know whether this is related, but I can see in the src/is_json.c source: /* * if JSON_COUNT != 0: * count all the objects, require that we have the whole data file * otherwise: * stop if we find an object or an array */ [...] #if JSON_COUNT /* bail quickly if not counting */ if (lvl > 1 && (st[JSON_OBJECT] || st[JSON_ARRAYN])) return 1; #endif The "#if JSON_COUNT" and "bail quickly if not counting" comment seem contradictory (if JSON_COUNT != 0, then it is counting), so I'm wondering what is expected. | ||||
Tags | No tags attached. | ||||
|
See also commit 479e0995523c42b83a055781d27a0c651dc286e2, whose intent was to fix PR/69 (the same bug I had reported in the past). |
|
PR/165 reported that some json files are recognized as ASCII text, so the conditions for json file recognition were relaxed, resulting in some files being mistakenly recognized as json again。 I think that should be the reason。 |
|
Note that in PR/165, all the examples consisted in one JSON object, with no "garbage" following it. If rules are relaxed to allow very simple objects like some of the PR/165 examples, then garbage detection becomes important to avoid many false positives. Anyway, I suppose that the fix of PR/69 was wrong: the solution was not to discard simple JSON objects; instead, it should have detected garbage (i.e. any non-whitespace character) after a JSON object has been parsed. Examples with json_pp: $ echo '[]' | json_pp [] $ echo '[] ' | json_pp [] $ echo '[] 1' | json_pp garbage after JSON object, at character offset 4 (before "\n") at /usr/bin/json_pp line 59. |
|
I'm going to provide a very simple patch, with testcases. |
|
In json_parse for the end of the recursion (lvl == 0), return 0 (failure) if the end of the file has not been reached (whitespace has been skipped just before). Two testcases are provided: 1. A simple JSON array followed by whitespace (a newline character), which should be recognized as JSON data. 2. Ditto followed by a non-whitespace character (a digit); this is not a valid JSON file, thus should be recognized as ASCII text. PR332.patch (1,061 bytes)
diff --git a/src/is_json.c b/src/is_json.c index 93bb0456..604cf9e6 100644 --- a/src/is_json.c +++ b/src/is_json.c @@ -394,7 +394,7 @@ out: DPRINTF("End general: ", uc, *ucp); *ucp = uc; if (lvl == 0) - return rv && (st[JSON_ARRAYN] || st[JSON_OBJECT]); + return rv && uc == ue && (st[JSON_ARRAYN] || st[JSON_OBJECT]); return rv; } diff --git a/tests/json4.result b/tests/json4.result new file mode 100644 index 00000000..550d7ac2 --- /dev/null +++ b/tests/json4.result @@ -0,0 +1 @@ +JSON text data \ No newline at end of file diff --git a/tests/json4.testfile b/tests/json4.testfile new file mode 100644 index 00000000..7660873d --- /dev/null +++ b/tests/json4.testfile @@ -0,0 +1 @@ +[1] diff --git a/tests/json5.result b/tests/json5.result new file mode 100644 index 00000000..90965495 --- /dev/null +++ b/tests/json5.result @@ -0,0 +1 @@ +ASCII text \ No newline at end of file diff --git a/tests/json5.testfile b/tests/json5.testfile new file mode 100644 index 00000000..01bd52f2 --- /dev/null +++ b/tests/json5.testfile @@ -0,0 +1 @@ +[1] 2 |
|
FYI, I've also reported the bug in the Debian BTS and put a simplified patch there (no testcases, 2 lines of context removed) so that it can also be applied on the current Debian package: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1008247 |
|
Committed, thanks! |
Date Modified | Username | Field | Change |
---|---|---|---|
2022-03-22 09:49 | vinc17 | New Issue | |
2022-03-22 09:59 | vinc17 | Note Added: 0003721 | |
2022-03-25 03:51 | wgh | Note Added: 0003722 | |
2022-03-25 08:58 | vinc17 | Note Added: 0003723 | |
2022-03-25 09:16 | vinc17 | Note Added: 0003724 | |
2022-03-25 09:30 | vinc17 | File Added: PR332.patch | |
2022-03-25 09:30 | vinc17 | Note Added: 0003725 | |
2022-03-25 11:01 | vinc17 | Note Added: 0003726 | |
2022-04-04 17:48 | christos | Assigned To | => christos |
2022-04-04 17:48 | christos | Status | new => assigned |
2022-04-04 17:48 | christos | Status | assigned => resolved |
2022-04-04 17:48 | christos | Resolution | open => fixed |
2022-04-04 17:48 | christos | Fixed in Version | => 5.42 |
2022-04-04 17:48 | christos | Note Added: 0003731 |