Monday, June 12, 2017

Japanese Character & String conversions in terminal

Unicode blocks for Japanese characters

  • 3000-303F is CJK Symbols and Punctuation.
  • 3040-309F is Hiragana.
  • 30A0-30FF is Katakana.
  • 4E00-9FFF is CJK Unified Ideographs.
  • FF00-FFEF is Halfwidth and Fullwidth Forms.
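
Each katakana code point is its hiragana counterpart plus 0x60, so converting between the two is a matter of mapping one block onto the other with tr. This needs a tr that handles multibyte characters (for example BSD tr as shipped with macOS; GNU tr works on bytes and will not give this result) and a shell that expands \u escapes inside $'...' (bash 4.2 or later, zsh):
$ echo ひらがな | tr $'\u3040-\u309f' $'\u30a0-\u30ff'
ヒラガナ
$ echo カタカナ | tr $'\u30a0-\u30ff' $'\u3040-\u309f'
かたかな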
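
To convert between half width and full width, map the printable ASCII range (space through tilde, written ' -~') onto U+3000 followed by U+FF01-U+FF5E, or the reverse:
$ echo example | tr ' -~' $'\u3000\uff01-\uff5e'
ｅｘａｍｐｌｅ
$ echo ｅｘａｍｐｌｅ | tr $'\u3000\uff01-\uff5e' ' -~'
example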