utf 8 - How to use shell to count Chinese characters in a file encoded in UTF-8
When I run cat doc.txt, the following characters are shown:
你好 hello! 这是中文。this chinese doc.
I can use the command
wc -w doc.txt
but it shows:
8 doc.txt
This command treats 你好 and 这是中文 each as a single word, while in fact 你好 is 2 Chinese words and 这是中文 is four.
What I want is for these Chinese words to be counted correctly (there are 12 words in the example). How can I work this out?
You can use the -m (or --chars) option:
$ echo -n "你好" | wc -m
output:
2
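Note that wc -m counts every character, including ASCII letters, spaces and punctuation. If only the Chinese characters should be counted, one possible approach is sketched below. It is not part of the original answer: the function name count_han is hypothetical, and the sketch assumes GNU grep built with -P (PCRE) support and a C.UTF-8 locale available on the system.

```shell
# count_han [FILE...]: print the number of Han (Chinese) characters
# in the given files, or in standard input when no file is given.
# Assumptions: GNU grep with -P support, and a C.UTF-8 locale is installed.
count_han() {
    # -o prints every match on its own line; \p{Han} matches a single
    # Han-script character, so counting lines counts characters.
    LC_ALL=C.UTF-8 grep -oP '\p{Han}' "$@" | wc -l | tr -d ' '
}

# Usage: count the Han characters in a short string
# (should print 2 under the assumptions above).
printf '你好 hello!' | count_han
```

Fullwidth punctuation such as 。 is not in the Han script, so it is not counted. To count a file, run count_han doc.txt.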