View Issue Details

IDProjectCategoryView StatusLast Update
0000332fileGeneralpublic2022-04-04 17:48
Reportervinc17 Assigned Tochristos  
PrioritynormalSeverityminorReproducibilityalways
Status resolvedResolutionfixed 
Product Version5.41 
Fixed in Version5.42 
Summary0000332: misdetects "[number] text" files as JSON data
DescriptionSome non-JSON text files have a form like "[number] text". For instance, Xorg.0.log log files start with something like

[ 48.187]
X.Org X Server 1.21.1.3

where "[number]" is a timestamp. Because "[number]" looks like a JSON object, "file" detects such files as JSON data, even though the object is followed by garbage when interpreted as JSON.
Steps To Reproduce$ echo "[1] foo" | file -
/dev/stdin: JSON data
Additional InformationI don't know whether this is related, but I can see in the src/is_json.c source:

/*
 * if JSON_COUNT != 0:
 * count all the objects, require that we have the whole data file
 * otherwise:
 * stop if we find an object or an array
 */
[...]
#if JSON_COUNT
        /* bail quickly if not counting */
        if (lvl > 1 && (st[JSON_OBJECT] || st[JSON_ARRAYN]))
                return 1;
#endif

The "#if JSON_COUNT" and "bail quickly if not counting" comment seem contradictory (if JSON_COUNT != 0, then it is counting), so I'm wondering what is expected.
TagsNo tags attached.

Activities

vinc17

2022-03-22 09:59

reporter   ~0003721

See also commit 479e0995523c42b83a055781d27a0c651dc286e2, whose intent was to fix PR/69 (the same bug I had reported in the past).

wgh

2022-03-25 03:51

reporter   ~0003722

PR/165 reported that some json files are recognized as ASCII text, so the conditions for json file recognition were relaxed, resulting in some files being mistakenly recognized as json again。 I think that should be the reason。

vinc17

2022-03-25 08:58

reporter   ~0003723

Note that in PR/165, all the examples consisted in one JSON object, with no "garbage" following it. If rules are relaxed to allow very simple objects like some of the PR/165 examples, then garbage detection becomes important to avoid many false positives. Anyway, I suppose that the fix of PR/69 was wrong: the solution was not to discard simple JSON objects; instead, it should have detected garbage (i.e. any non-whitespace character) after a JSON object has been parsed. Examples with json_pp:

$ echo '[]' | json_pp
[]
$ echo '[] ' | json_pp
[]
$ echo '[] 1' | json_pp
garbage after JSON object, at character offset 4 (before "\n") at /usr/bin/json_pp line 59.

vinc17

2022-03-25 09:16

reporter   ~0003724

I'm going to provide a very simple patch, with testcases.

vinc17

2022-03-25 09:30

reporter   ~0003725

In json_parse for the end of the recursion (lvl == 0), return 0 (failure) if the end of the file has not been reached (whitespace has been skipped just before).

Two testcases are provided:
1. A simple JSON array followed by whitespace (a newline character), which should be recognized as JSON data.
2. Ditto followed by a non-whitespace character (a digit); this is not a valid JSON file, thus should be recognized as ASCII text.
PR332.patch (1,061 bytes)   
diff --git a/src/is_json.c b/src/is_json.c
index 93bb0456..604cf9e6 100644
--- a/src/is_json.c
+++ b/src/is_json.c
@@ -394,7 +394,7 @@ out:
 	DPRINTF("End general: ", uc, *ucp);
 	*ucp = uc;
 	if (lvl == 0)
-		return rv && (st[JSON_ARRAYN] || st[JSON_OBJECT]);
+		return rv && uc == ue && (st[JSON_ARRAYN] || st[JSON_OBJECT]);
 	return rv;
 }
 
diff --git a/tests/json4.result b/tests/json4.result
new file mode 100644
index 00000000..550d7ac2
--- /dev/null
+++ b/tests/json4.result
@@ -0,0 +1 @@
+JSON text data
\ No newline at end of file
diff --git a/tests/json4.testfile b/tests/json4.testfile
new file mode 100644
index 00000000..7660873d
--- /dev/null
+++ b/tests/json4.testfile
@@ -0,0 +1 @@
+[1]
diff --git a/tests/json5.result b/tests/json5.result
new file mode 100644
index 00000000..90965495
--- /dev/null
+++ b/tests/json5.result
@@ -0,0 +1 @@
+ASCII text
\ No newline at end of file
diff --git a/tests/json5.testfile b/tests/json5.testfile
new file mode 100644
index 00000000..01bd52f2
--- /dev/null
+++ b/tests/json5.testfile
@@ -0,0 +1 @@
+[1] 2
PR332.patch (1,061 bytes)   

vinc17

2022-03-25 11:01

reporter   ~0003726

FYI, I've also reported the bug in the Debian BTS and put a simplified patch there (no testcases, 2 lines of context removed) so that it can also be applied on the current Debian package: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1008247

christos

2022-04-04 17:48

manager   ~0003731

Committed, thanks!

Issue History

Date Modified Username Field Change
2022-03-22 09:49 vinc17 New Issue
2022-03-22 09:59 vinc17 Note Added: 0003721
2022-03-25 03:51 wgh Note Added: 0003722
2022-03-25 08:58 vinc17 Note Added: 0003723
2022-03-25 09:16 vinc17 Note Added: 0003724
2022-03-25 09:30 vinc17 File Added: PR332.patch
2022-03-25 09:30 vinc17 Note Added: 0003725
2022-03-25 11:01 vinc17 Note Added: 0003726
2022-04-04 17:48 christos Assigned To => christos
2022-04-04 17:48 christos Status new => assigned
2022-04-04 17:48 christos Status assigned => resolved
2022-04-04 17:48 christos Resolution open => fixed
2022-04-04 17:48 christos Fixed in Version => 5.42
2022-04-04 17:48 christos Note Added: 0003731