Discussion:
ASCII files
Henk Robbers
2016-03-23 13:35:44 UTC
LS

I want to determine whether a file should not be
considered ASCII text based on the percentage of non-printable
characters in the file.

Which percentage is generally accepted?
--
Groeten; Regards.
Henk Robbers. http://members.chello.nl/h.robbers
Interactive disassembler: Digger; http://digger.atari.org
A Home Cooked C compiler: AHCC; http://ahcc.atari.org
Miro Kropáček
2016-03-27 15:15:35 UTC
Post by Henk Robbers
I want to determine whether a file should not be
considered ASCII text based on the percentage of non-printable
characters in the file.
Which percentage is generally accepted?
The 'file' tool source code may provide some good clues.
Arachide
2016-03-27 15:23:26 UTC
Post by Henk Robbers
LS
I want to determine whether a file should not be
considered ASCII text based on the percentage of non-printable
characters in the file.
Which percentage is generally accepted?
It depends on the language, I think.
In standard English, you don't use characters above 127. But in French
(and other languages), there are a lot of accented letters that appear above 127.

So, if you consider that the file is in English, it should contain
only:

- CR/LF (13 and 10)
- TAB (9)
- Space (32)
- characters from 33 to 126 (127 is DEL, a control character).

If the file is in another language, it will also contain some characters above 127.
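
To connect this list to the original question about a percentage, here is a minimal C sketch that measures what fraction of a buffer falls outside the "English text" set. The function names are made up for illustration, and treating 127 (DEL) as non-printable is an assumption on my part. No particular cut-off is implied; the caller picks the threshold.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if c is TAB (9), LF (10), CR (13),
   or in the range space..'~' (32..126); returns 0 otherwise.
   Assumption: DEL (127) and all bytes >= 128 count as non-text. */
int is_plain_text_byte(unsigned char c)
{
    return c == 9 || c == 10 || c == 13 || (c >= 32 && c <= 126);
}

/* Hypothetical helper: fraction (0.0 to 1.0) of the bytes in buf
   that are outside the set above. An empty buffer yields 0.0. */
double non_text_fraction(const unsigned char *buf, size_t len)
{
    size_t i, bad = 0;

    if (len == 0)
        return 0.0;
    for (i = 0; i < len; i++)
        if (!is_plain_text_byte(buf[i]))
            bad++;
    return (double)bad / (double)len;
}
```

A caller would read the first block of the file into a buffer and compare the result against whatever limit it finds acceptable. For files in other languages, the test on bytes of 128 and above would have to be relaxed.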

In my opinion, the marker for a text file would be the total lack of
bytes under 32 except 9/10/13.
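
That stricter rule can be sketched in C roughly as follows. The function name is hypothetical, and the sketch assumes a single-byte encoding; a UTF-16 file, for instance, contains zero bytes and would be rejected.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf contains no byte below 32
   other than TAB (9), LF (10) and CR (13); returns 0 otherwise.
   Bytes of 128 and above are accepted, so accented text passes. */
int looks_like_text(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c < 32 && c != 9 && c != 10 && c != 13)
            return 0;   /* found a control byte: not text */
    }
    return 1;
}
```

Unlike a percentage, this gives a yes/no answer and stops at the first offending byte, so it is cheap even on large files.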

Guillaume.
