Question 1

What are zero-width characters?

Accepted Answer

Zero-width characters are invisible Unicode characters that occupy no visual space in rendered text but are still present in the underlying string data. The most common are the zero-width space (U+200B), which is often inserted by web editors and word processors; the zero-width joiner (U+200D), used in emoji sequences and in complex scripts such as Devanagari to control ligature formation; and the zero-width non-joiner (U+200C), which prevents ligatures where they would normally appear. Other notable examples include the byte order mark (U+FEFF, formally named ZERO WIDTH NO-BREAK SPACE), the soft hyphen (U+00AD), which is rendered only where a line actually breaks, and directional formatting characters such as the left-to-right mark (U+200E) and right-to-left mark (U+200F). These characters frequently enter codebases through copy-paste from websites, rich text editors, PDF documents, and messaging applications. They cause subtle bugs because they are invisible in most editors yet affect string length, equality comparisons, regular expression matching, and data parsing. This tool flags every zero-width character and shows its exact Unicode code point.
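The effect on string length and equality can be seen in a few lines of Python. This is only an illustrative sketch, not the tool's own code: the `ZERO_WIDTH` table and the `find_zero_width` helper are hypothetical names, and the table covers only the characters named in this answer.

```python
# Hypothetical lookup table, limited to the characters named in this answer.
ZERO_WIDTH = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\ufeff": "BYTE ORDER MARK",
    "\u00ad": "SOFT HYPHEN",
    "\u200e": "LEFT-TO-RIGHT MARK",
    "\u200f": "RIGHT-TO-LEFT MARK",
}


def find_zero_width(text):
    """Return (index, code point, name) for each listed character in text."""
    return [
        (i, f"U+{ord(ch):04X}", ZERO_WIDTH[ch])
        for i, ch in enumerate(text)
        if ch in ZERO_WIDTH
    ]


clean = "token"
dirty = "tok\u200ben"  # renders like "token" in most editors

print(len(clean), len(dirty))   # 5 6
print(clean == dirty)           # False
print(find_zero_width(dirty))   # [(3, 'U+200B', 'ZERO WIDTH SPACE')]
```

The two strings look the same on screen, but the hidden U+200B adds one to the length and makes the equality check fail.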

Question 2

How do I detect invisible Unicode characters?

Accepted Answer

Paste or type your text into the input panel on the left side of this tool. The inspector immediately analyses every character in your input and categorises each one as normal, invisible, control, or homoglyph. The character table in the right panel lists every character with its Unicode code point, name, and category. Problematic characters are summarised as colour-coded badges in the banner at the top: red for invisible and control characters, orange for homoglyphs. Click any badge to jump directly to the next occurrence of that character type in the editor panel. The editor itself uses inline decorations to mark invisible characters, so you can see exactly where they sit within your text. Once you have identified the problematic characters, use the Clean and Copy button to strip all invisible and control characters and copy the cleaned text to your clipboard. This workflow is particularly useful when debugging string comparison failures or JSON parsing errors caused by hidden characters.
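The same categorise-then-strip workflow can be approximated outside the tool. The Python sketch below is an assumption about one reasonable approach, not a description of how the inspector is implemented: it treats Unicode general category `Cf` (format) as "invisible" and `Cc` as "control", and it keeps newline, carriage return, and tab. The names `KEEP`, `classify`, and `clean` are hypothetical.

```python
import json
import unicodedata

# Assumption: ordinary whitespace controls are kept rather than stripped.
KEEP = {"\n", "\r", "\t"}


def classify(ch: str) -> str:
    """Classify one character as 'normal', 'invisible', or 'control'."""
    if ch in KEEP:
        return "normal"
    category = unicodedata.category(ch)
    if category == "Cf":  # format characters such as U+200B or U+FEFF
        return "invisible"
    if category == "Cc":  # C0/C1 control characters such as U+0000
        return "control"
    return "normal"


def clean(text: str) -> str:
    """Drop every character classified as invisible or control."""
    return "".join(ch for ch in text if classify(ch) == "normal")


raw = '{"id":\u200b 1}'  # a zero-width space hides after the colon

try:
    json.loads(raw)
    parsed_raw = True
except json.JSONDecodeError:
    parsed_raw = False  # U+200B is not valid JSON whitespace

parsed = json.loads(clean(raw))
print(parsed_raw, parsed)  # False {'id': 1}
```

Note that stripping every format character is a blunt instrument: removing U+200D, for instance, splits emoji sequences that rely on the zero-width joiner, so a production cleaner may want an allow-list.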

Question 3

What are homoglyphs?

Accepted Answer

Homoglyphs are characters from different Unicode scripts that appear visually identical or nearly indistinguishable from common ASCII characters but have entirely different code points. For example, Cyrillic "а" (U+0430) is indistinguishable from Latin "a" (U+0061) in most fonts, and Greek omicron "ο" (U+03BF) looks almost identical to Latin "o" (U+006F). This visual similarity is exploited in internationalized domain name (IDN) homograph attacks, where an attacker registers a domain spelled with a Cyrillic "а" that can display as "paypal.com" even though its real ASCII (Punycode) form is "xn--pypal-4ve.com". In source code, homoglyphs create variable or function names that appear identical in the editor but are treated as completely separate identifiers by the compiler or interpreter. This can introduce security vulnerabilities or hard-to-trace bugs. This tool detects homoglyphs by comparing each character against a database of known lookalikes and flags them with an orange indicator, letting you quickly identify whether your text contains characters from unexpected scripts.
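A lookup-table check of this kind can be sketched in Python as follows. The `HOMOGLYPHS` table here is a tiny hypothetical sample chosen for illustration; it is not the tool's database, and a real detector would need far broader coverage.

```python
# Hypothetical sample table: lookalike character -> ASCII character it imitates.
HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def find_homoglyphs(text):
    """Return (index, code point, ASCII lookalike) for each known homoglyph."""
    return [
        (i, f"U+{ord(ch):04X}", HOMOGLYPHS[ch])
        for i, ch in enumerate(text)
        if ch in HOMOGLYPHS
    ]


genuine = "paypal.com"
spoofed = "p\u0430ypal.com"  # second letter is Cyrillic, not Latin

print(genuine == spoofed)        # False
print(find_homoglyphs(spoofed))  # [(1, 'U+0430', 'a')]

# The Punycode form of the spoofed label exposes the difference.
label = spoofed.split(".")[0]
print(label.encode("punycode"))  # b'pypal-4ve'
```

Prefixing the encoded label with "xn--" gives the ASCII domain form mentioned above, which is why viewing a suspicious domain in Punycode is a quick way to spot a mixed-script spelling.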

Question 4

Is my text data safe?

Accepted Answer

Yes. Your text never leaves your browser when you use this tool. All character analysis and inspection happens client-side in JavaScript, and no text you paste or type into the inspector is transmitted to any external server, API endpoint, or third-party service. The Unicode analysis logic runs as a pure function that processes your input string locally, checking each character against Unicode property tables and homoglyph databases that are bundled with the application code. No network requests are made during analysis, which you can verify by opening your browser developer tools and watching the Network tab while you use the tool. Because of this client-side architecture, the tool works fully offline after the initial page load, making it suitable for inspecting sensitive text such as API keys, passwords, proprietary source code, or confidential documents. The Clean and Copy feature likewise operates entirely in the browser, using the Clipboard API to write the sanitized text directly to your system clipboard without any server round-trip.

Unicode Text Inspector
