View Issue Details

ID: 0000261
Project: file
Category: [All Projects] General
View Status: public
Last Update: 2021-04-27 19:39
Reporter: bitstreamout
Assigned To: christos
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: x86_64
OS: openSUSE
OS Version: Tumbleweed
Product Version: 5.40
Target Version:
Fixed in Version: 5.41
Summary: 0000261: New version breaks subversion tests
Description: The subversion build currently breaks at several points ... many of the breaks are caused by a behaviour change in how file detects ASCII text without newlines.
Steps To Reproduce:
With version 4.30

echo -n xx | file -
/dev/stdin: ASCII text, with no line terminators

with version 5.40

echo -n xx | file -
/dev/stdin: data
Tags: No tags attached.

Activities

bitstreamout

2021-04-21 05:58

reporter   ~0003593

Just to correct a typo ... 'With version 4.30' should be 'With version 5.39'

```
/suse/werner> file --version
file-5.39
magic file from /etc/magic:/usr/share/misc/magic
/suse/werner> echo -n xx | file -
/dev/stdin: ASCII text, with no line terminators
/suse/werner>
```

bitstreamout

2021-04-21 08:38

reporter   ~0003594

Could it be that the condition

if (u < 3)

within the LOOKS_ macro should be

if (u < 2)

at least for ASCII and latin1?
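
To make the question concrete, here is a minimal sketch of what lowering the threshold would change. It assumes, based on the patch context and the later notes in this thread, that `u` is the number of distinct byte values in the buffer; `passes_threshold` is a hypothetical helper written for this illustration and is not part of file's source.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the check: count the distinct byte values in s
 * and accept the buffer only if that count reaches the threshold.
 * Assumption: u in the LOOKS_ macro counts distinct bytes.
 */
static int
passes_threshold(const char *s, size_t threshold)
{
	unsigned char seen[256];
	size_t i, u = 0, nbytes = strlen(s);

	memset(seen, 0, sizeof(seen));
	for (i = 0; i < nbytes; i++)
		seen[(unsigned char)s[i]] = 1;	/* mark each byte value once */
	for (i = 0; i < sizeof(seen); i++)
		if (seen[i])
			u++;			/* u = number of distinct bytes */
	return u >= threshold;			/* rejected when u < threshold */
}
```

Under this model, `passes_threshold("hi", 2)` succeeds where a threshold of 3 fails, but `passes_threshold("xx", 2)` still fails because "xx" contains only one distinct byte. A lower threshold alone would therefore probably not restore the 5.39 result for `echo -n xx`.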

bitstreamout

2021-04-22 12:56

reporter   ~0003595

Duplicate of https://bugs.astron.com/view.php?id=256

bitstreamout

2021-04-22 15:25

reporter   ~0003596

Hmmm ... there are still problems with files that lack a final newline

```
abuild@noether:~/rpmbuild/BUILD/subversion-1.14.1> wc -c /dev/shm/svn-test-work/working_copies/merge_tests-2/A/B/F/foo
3 /dev/shm/svn-test-work/working_copies/merge_tests-2/A/B/F/foo
abuild@noether:~/rpmbuild/BUILD/subversion-1.14.1> file /dev/shm/svn-test-work/working_copies/merge_tests-2/A/B/F/foo
/dev/shm/svn-test-work/working_copies/merge_tests-2/A/B/F/foo: data
abuild@noether:~/rpmbuild/BUILD/subversion-1.14.1> cat /dev/shm/svn-test-work/working_copies/merge_tests-2/A/B/F/foo && echo
foo
```

bitstreamout

2021-04-22 15:27

reporter   ~0003597

Test with file 5.39
```
file /abuild/oscbuild/openSUSE_Tumbleweed/dev/shm/svn-test-work/working_copies/merge_tests-2/A/B/F/foo
/abuild/oscbuild/openSUSE_Tumbleweed/dev/shm/svn-test-work/working_copies/merge_tests-2/A/B/F/foo: ASCII text, with no line terminators
```

bitstreamout

2021-04-22 15:47

reporter   ~0003598

The fails.log of subversion with file 5.40

fails.log (18,188 bytes)

bitstreamout

2021-04-23 06:58

reporter   ~0003599

Something goes wrong even with those commits for bug PR/256
```
abuild@noether:~/rpmbuild/BUILD/file-5.40> echo -e "fo" | $PWD/src/.libs/file -m $PWD/magic/magic -
/dev/stdin: ASCII text
abuild@noether:~/rpmbuild/BUILD/file-5.40> echo -e "xx" | $PWD/src/.libs/file -m $PWD/magic/magic -
/dev/stdin: data
abuild@noether:~/rpmbuild/BUILD/file-5.40> echo -e "hi" | $PWD/src/.libs/file -m $PWD/magic/magic -
/dev/stdin: ASCII text
abuild@noether:~/rpmbuild/BUILD/file-5.40> echo -en "hi" | $PWD/src/.libs/file -m $PWD/magic/magic -
/dev/stdin: ASCII text, with no line terminators
abuild@noether:~/rpmbuild/BUILD/file-5.40> echo -en "foo" | $PWD/src/.libs/file -m $PWD/magic/magic -
/dev/stdin: data
abuild@noether:~/rpmbuild/BUILD/file-5.40> echo -e "foo" | $PWD/src/.libs/file -m $PWD/magic/magic -
/dev/stdin: ASCII text
abuild@noether:~/rpmbuild/BUILD/file-5.40> echo -e "xxx" | $PWD/src/.libs/file -m $PWD/magic/magic -
/dev/stdin: data
```
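
The pattern in the session above is consistent with a check that counts distinct byte values and compares the count against `MIN(nbytes, 3)`, which is the condition visible in the context lines of the patch attached to this issue. The following is a simplified sketch of that check, not file's actual source; `distinct_bytes` and `passes_distinct_check` are hypothetical names, and the sketch ignores the per-character text test that the real macro also applies.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of distinct byte values in buf. */
static size_t
distinct_bytes(const unsigned char *buf, size_t nbytes)
{
	unsigned char seen[256];
	size_t i, u = 0;

	memset(seen, 0, sizeof(seen));
	for (i = 0; i < nbytes; i++)
		seen[buf[i]] = 1;
	for (i = 0; i < sizeof(seen); i++)
		if (seen[i])
			u++;
	return u;
}

/*
 * Simplified model of the threshold check: returns 1 if the buffer would
 * be accepted as text, 0 if it would be rejected (reported as "data").
 */
static int
passes_distinct_check(const char *s)
{
	size_t nbytes = strlen(s);
	size_t limit = nbytes < 3 ? nbytes : 3;	/* MIN(nbytes, 3) */

	return distinct_bytes((const unsigned char *)s, nbytes) >= limit;
}
```

With this model, "fo\n", "hi\n", "hi" and "foo\n" pass, while "xx\n", "foo" and "xxx\n" are rejected, matching the seven results shown above. The inputs that fail are the ones with fewer distinct bytes than the limit, regardless of whether a trailing newline is present.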

bitstreamout

2021-04-23 07:36

reporter   ~0003600

I suggest the attached patch to count every ASCII character even if it appears several times

file-5.50-ascii.patch (699 bytes)
From a806b7c99870f76c5fcf3d34f9d91f37685e1a1c Mon Sep 17 00:00:00 2001
From: Werner Fink <werner@suse.de>
Date: Fri, 23 Apr 2021 09:32:09 +0200
Subject: [PATCH] Count every ASCII character

Signed-off-by: Werner Fink <werner@suse.de>
---
 src/encoding.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git src/encoding.c src/encoding.c
index 31d4d125..686be210 100644
--- a/src/encoding.c
+++ b/src/encoding.c
@@ -282,8 +282,7 @@ looks_ ## NAME(const unsigned char *buf, size_t nbytes, file_unichar_t *ubuf, \
 	} \
 	u = 0; \
 	for (i = 0; i < __arraycount(dist); i++) { \
-		if (dist[i]) \
-			u++; \
+		u += dist[i]; \
 	} \
 	if (u < MIN(nbytes, 3)) \
 		return 0; \
-- 
2.28.0
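
The effect of the patch can be sketched in isolation. The model below assumes that `dist[]` holds a per-byte occurrence count and uses `size_t` counters, which is an assumption of this sketch and may differ from the type used in `src/encoding.c`; `passes_total_check` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the check after the proposed patch: u is the sum
 * of all per-byte counts instead of the number of distinct byte values.
 */
static int
passes_total_check(const char *s)
{
	size_t dist[256] = { 0 };	/* assumption: counters do not overflow */
	size_t nbytes = strlen(s);
	size_t limit = nbytes < 3 ? nbytes : 3;	/* MIN(nbytes, 3) */
	size_t i, u = 0;

	for (i = 0; i < nbytes; i++)
		dist[(unsigned char)s[i]]++;	/* count every occurrence */
	for (i = 0; i < 256; i++)
		u += dist[i];			/* patched line: u += dist[i] */
	return u >= limit;			/* rejected when u < limit */
}
```

In this model the sum equals the number of bytes counted, so `u` can never fall below `MIN(nbytes, 3)`, and inputs such as "xx", "foo" and "xxx\n" are all accepted. In other words, the patched comparison no longer rejects short or repetitive text under these assumptions.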


bitstreamout

2021-04-23 11:07

reporter   ~0003601

I see that commit 3096f87f823e1e936139e48d6a3bae9a95557861 introduced the `if (dist[i]) u++` check, which misdetects small ASCII files both with and without newlines

christos

2021-04-27 19:39

manager   ~0003603

The whole character count/distribution approach leads to more confusion as it tries to solve some corner cases. It is not worth using heuristics to resolve the corner cases. I've reverted the fix to PR/180 and that should bring back the original behavior.

Issue History

Date Modified Username Field Change
2021-04-20 11:24 bitstreamout New Issue
2021-04-21 05:58 bitstreamout Note Added: 0003593
2021-04-21 08:38 bitstreamout Note Added: 0003594
2021-04-22 12:56 bitstreamout Note Added: 0003595
2021-04-22 15:25 bitstreamout Note Added: 0003596
2021-04-22 15:27 bitstreamout Note Added: 0003597
2021-04-22 15:47 bitstreamout File Added: fails.log
2021-04-22 15:47 bitstreamout Note Added: 0003598
2021-04-23 06:58 bitstreamout Note Added: 0003599
2021-04-23 07:36 bitstreamout File Added: file-5.50-ascii.patch
2021-04-23 07:36 bitstreamout Note Added: 0003600
2021-04-23 11:07 bitstreamout Note Added: 0003601
2021-04-27 19:37 christos Assigned To => christos
2021-04-27 19:37 christos Status new => assigned
2021-04-27 19:39 christos Status assigned => resolved
2021-04-27 19:39 christos Resolution open => fixed
2021-04-27 19:39 christos Fixed in Version => 5.41
2021-04-27 19:39 christos Note Added: 0003603