View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0000332||file||General||public||2022-03-22 09:49||2022-04-04 17:48|
|Fixed in Version||HEAD|
|Summary||0000332: misdetects "[number] text" files as JSON data|
|Description||Some non-JSON text files have a form like "[number] text". For instance, Xorg.0.log log files start with something like|
X.Org X Server 184.108.40.206
where "[number]" is a timestamp. Because "[number]" looks like a JSON object, "file" detects such files as JSON data, even though the object is followed by garbage when interpreted as JSON.
|Steps To Reproduce||$ echo " foo" | file -|
/dev/stdin: JSON data
|Additional Information||I don't know whether this is related, but I can see in the src/is_json.c source:|
* if JSON_COUNT != 0:
* count all the objects, require that we have the whole data file
* stop if we find an object or an array
/* bail quickly if not counting */
if (lvl > 1 && (st[JSON_OBJECT] || st[JSON_ARRAYN]))
The "#if JSON_COUNT" and "bail quickly if not counting" comment seem contradictory (if JSON_COUNT != 0, then it is counting), so I'm wondering what is expected.
|Tags||No tags attached.|
||See also commit 479e0995523c42b83a055781d27a0c651dc286e2, whose intent was to fix PR/69 (the same bug I had reported in the past).|
||PR/165 reported that some json files are recognized as ASCII text, so the conditions for json file recognition were relaxed, resulting in some files being mistakenly recognized as json again。 I think that should be the reason。|
Note that in PR/165, all the examples consisted in one JSON object, with no "garbage" following it. If rules are relaxed to allow very simple objects like some of the PR/165 examples, then garbage detection becomes important to avoid many false positives. Anyway, I suppose that the fix of PR/69 was wrong: the solution was not to discard simple JSON objects; instead, it should have detected garbage (i.e. any non-whitespace character) after a JSON object has been parsed. Examples with json_pp:
$ echo '' | json_pp
$ echo ' ' | json_pp
$ echo ' 1' | json_pp
garbage after JSON object, at character offset 4 (before "\n") at /usr/bin/json_pp line 59.
||I'm going to provide a very simple patch, with testcases.|
In json_parse for the end of the recursion (lvl == 0), return 0 (failure) if the end of the file has not been reached (whitespace has been skipped just before).
Two testcases are provided:
1. A simple JSON array followed by whitespace (a newline character), which should be recognized as JSON data.
2. Ditto followed by a non-whitespace character (a digit); this is not a valid JSON file, thus should be recognized as ASCII text.
PR332.patch (1,061 bytes)
diff --git a/src/is_json.c b/src/is_json.c index 93bb0456..604cf9e6 100644 --- a/src/is_json.c +++ b/src/is_json.c @@ -394,7 +394,7 @@ out: DPRINTF("End general: ", uc, *ucp); *ucp = uc; if (lvl == 0) - return rv && (st[JSON_ARRAYN] || st[JSON_OBJECT]); + return rv && uc == ue && (st[JSON_ARRAYN] || st[JSON_OBJECT]); return rv; } diff --git a/tests/json4.result b/tests/json4.result new file mode 100644 index 00000000..550d7ac2 --- /dev/null +++ b/tests/json4.result @@ -0,0 +1 @@ +JSON text data \ No newline at end of file diff --git a/tests/json4.testfile b/tests/json4.testfile new file mode 100644 index 00000000..7660873d --- /dev/null +++ b/tests/json4.testfile @@ -0,0 +1 @@ + diff --git a/tests/json5.result b/tests/json5.result new file mode 100644 index 00000000..90965495 --- /dev/null +++ b/tests/json5.result @@ -0,0 +1 @@ +ASCII text \ No newline at end of file diff --git a/tests/json5.testfile b/tests/json5.testfile new file mode 100644 index 00000000..01bd52f2 --- /dev/null +++ b/tests/json5.testfile @@ -0,0 +1 @@ + 2
PR332.patch (1,061 bytes)
||FYI, I've also reported the bug in the Debian BTS and put a simplified patch there (no testcases, 2 lines of context removed) so that it can also be applied on the current Debian package: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1008247|
|2022-03-22 09:49||vinc17||New Issue|
|2022-03-22 09:59||vinc17||Note Added: 0003721|
|2022-03-25 03:51||wgh||Note Added: 0003722|
|2022-03-25 08:58||vinc17||Note Added: 0003723|
|2022-03-25 09:16||vinc17||Note Added: 0003724|
|2022-03-25 09:30||vinc17||File Added: PR332.patch|
|2022-03-25 09:30||vinc17||Note Added: 0003725|
|2022-03-25 11:01||vinc17||Note Added: 0003726|
|2022-04-04 17:48||christos||Assigned To||=> christos|
|2022-04-04 17:48||christos||Status||new => assigned|
|2022-04-04 17:48||christos||Status||assigned => resolved|
|2022-04-04 17:48||christos||Resolution||open => fixed|
|2022-04-04 17:48||christos||Fixed in Version||=> HEAD|
|2022-04-04 17:48||christos||Note Added: 0003731|