Oracle UTF8 encoding check

You can easily inspect the raw bytes used to encode a string.
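The query that originally accompanied this step is not reproduced here. As a minimal sketch of the same idea outside the database, the Python snippet below prints the UTF-8 bytes of a string as hex. The helper name `hex_bytes` is made up for illustration.

```python
def hex_bytes(text: str, encoding: str = "utf-8") -> str:
    """Return the encoded bytes of `text` as space-separated uppercase hex."""
    return text.encode(encoding).hex(" ").upper()


# "\u00e9" is the letter e with an acute accent; UTF-8 stores it in two bytes.
print(hex_bytes("\u00e9"))  # C3 A9
print(hex_bytes("abc"))     # 61 62 63
```

Plain ASCII characters take one byte each, so anything longer than one byte per character tells you where the non-ASCII content sits.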

This can be useful for comparing two strings: just because two strings are rendered the same way does not mean they are made of the same Unicode characters. The underlying code points, and therefore the bytes, can be different.
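A classic case is an accented letter, which can be stored either as one precomposed code point or as a base letter plus a combining accent. The sketch below (Python, with a hypothetical helper `same_after_normalization`) shows two such strings that look identical but differ byte for byte, and one possible way to compare them by normalizing first.

```python
import unicodedata

composed = "\u00e9"     # single precomposed code point
decomposed = "e\u0301"  # letter e followed by a combining acute accent


def same_after_normalization(a: str, b: str) -> bool:
    """Compare two strings after converting both to NFC form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed.encode("utf-8").hex(" ").upper())    # C3 A9
print(decomposed.encode("utf-8").hex(" ").upper())  # 65 CC 81
print(composed == decomposed)                       # False
print(same_after_normalization(composed, decomposed))  # True
```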

Now that we have the raw representation of the string, we can decode it back into text as UTF8. If the round trip succeeds and returns the original string, that confirms the string was correctly encoded and that the encoding was UTF8 to start with.
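To illustrate the round trip, here is a small Python sketch that takes a hex dump and tries to decode it strictly as UTF-8. The function name `decode_utf8_hex` is an assumption for this example, and returning `None` on failure is just one possible convention.

```python
from typing import Optional


def decode_utf8_hex(hex_string: str) -> Optional[str]:
    """Decode a hex dump as UTF-8; return None if the bytes are not valid UTF-8."""
    raw = bytes.fromhex(hex_string)  # whitespace between byte pairs is ignored
    try:
        return raw.decode("utf-8")   # strict mode: invalid sequences raise
    except UnicodeDecodeError:
        return None


print(decode_utf8_hex("C3 A9") == "\u00e9")  # True: valid two-byte sequence
print(decode_utf8_hex("E9"))                 # None: a lone Latin-1 byte is not UTF-8
```

A strict decode is what makes this a real check: a lenient decoder would silently substitute a placeholder character instead of failing.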

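A more automated way to check that strings are correctly encoded (thanks StackOverflow) relies on the fact that a failed conversion to UTF8 leaves the following bytes (hex) in the data: “EFBFBD”. These bytes are the UTF8 encoding of U+FFFD, the Unicode replacement character, which converters substitute for input they cannot interpret.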
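The StackOverflow query itself is not reproduced here. The same check can be sketched in Python as a search for the byte sequence EF BF BD; `has_replacement_char` is a hypothetical name. Searching the bytes rather than a hex string avoids false matches that straddle two neighbouring bytes.

```python
REPLACEMENT_BYTES = b"\xef\xbf\xbd"  # UTF-8 encoding of U+FFFD


def has_replacement_char(raw: bytes) -> bool:
    """Return True if the data contains the UTF-8 replacement character."""
    return REPLACEMENT_BYTES in raw


# Simulate a bad conversion: Latin-1 bytes wrongly read as UTF-8, with
# undecodable bytes replaced, then written back out as UTF-8.
latin1_bytes = "caf\u00e9".encode("latin-1")
damaged = latin1_bytes.decode("utf-8", errors="replace").encode("utf-8")

print(damaged.hex().upper())                               # 636166EFBFBD
print(has_replacement_char(damaged))                       # True
print(has_replacement_char("caf\u00e9".encode("utf-8")))   # False
```

Note that this only detects damage that has already happened. A string that legitimately contains U+FFFD would also be flagged.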

 
