K-Lab

Unicode Text Inspector

All processing happens in your browser. No text is sent to any server.

About this tool

Common invisible Unicode characters

The most problematic invisible characters you will encounter are Zero-Width Space (U+200B), which is silently inserted by many web content editors and rich text processors; Zero-Width Joiner (U+200D), used in emoji sequences and complex scripts to control glyph joining; and Zero-Width Non-Joiner (U+200C), which prevents cursive joining and ligature formation between adjacent letters in scripts such as Persian and Arabic. Soft Hyphen (U+00AD) marks optional line-break points and is invisible unless a line actually breaks there. The Byte Order Mark (U+FEFF) is written at the start of files by some Windows applications, such as older versions of Notepad, and can break shell scripts or JSON parsers that do not expect the leading bytes (EF BB BF in UTF-8). Right-to-Left Mark (U+200F) and Left-to-Right Mark (U+200E) control bidirectional text rendering and are commonly injected when copying text from Hebrew or Arabic sources. These characters typically enter your workflow through copy-paste from websites, PDF documents, word processors, or messaging applications that embed formatting metadata into plain text.
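
As a rough illustration of how the characters listed above can be located in a string, here is a minimal TypeScript sketch. The table `INVISIBLE_NAMES` and the function `findInvisible` are hypothetical names chosen for this example, not this tool's actual code, and the table covers only the seven characters named in this section.

```typescript
// Hypothetical lookup table: only the characters discussed above,
// keyed by code point, with their official Unicode names.
const INVISIBLE_NAMES: Record<number, string> = {
  0x00ad: "SOFT HYPHEN",
  0x200b: "ZERO WIDTH SPACE",
  0x200c: "ZERO WIDTH NON-JOINER",
  0x200d: "ZERO WIDTH JOINER",
  0x200e: "LEFT-TO-RIGHT MARK",
  0x200f: "RIGHT-TO-LEFT MARK",
  0xfeff: "ZERO WIDTH NO-BREAK SPACE", // doubles as the byte order mark
};

interface InvisibleHit {
  index: number;     // UTF-16 offset into the input string
  codePoint: string; // formatted as U+XXXX
  name: string;
}

function findInvisible(text: string): InvisibleHit[] {
  const hits: InvisibleHit[] = [];
  // All listed characters are in the Basic Multilingual Plane,
  // so scanning UTF-16 code units is sufficient here.
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const name = INVISIBLE_NAMES[code];
    if (name !== undefined) {
      const hex = ("0000" + code.toString(16).toUpperCase()).slice(-4);
      hits.push({ index: i, codePoint: "U+" + hex, name });
    }
  }
  return hits;
}
```

For example, `findInvisible("a\u200Bb")` reports a single ZERO WIDTH SPACE at offset 1. A fuller detector would consult the complete Unicode property data rather than a hand-written list.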

Why invisible characters cause bugs

Invisible characters break software in ways that are exceptionally difficult to diagnose because the source code or data looks correct to the human eye. String equality comparisons fail silently when one string contains a zero-width space that the other does not, leading to hash mismatches, failed dictionary lookups, and broken conditional logic. JSON and YAML parsers reject input that contains unexpected control characters or byte order marks, producing cryptic syntax errors that point to seemingly valid lines. Regular expressions fail to match because a zero-width character sits between otherwise matching characters, splitting the expected pattern. In source code, invisible characters can create variable names that appear identical in the editor but are treated as distinct identifiers by the compiler, causing "undefined variable" errors on lines where the variable is plainly visible. These issues are most common in code copied from web pages, PDF documentation, Slack or Teams messages, and Stack Overflow answers.
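
The failure modes described above are easy to reproduce. The following TypeScript sketch is illustrative only: `stripHidden`, `parsesAsJson`, and the strip list `HIDDEN` are assumptions made for this example and do not describe how this tool is implemented.

```typescript
// Assumed strip list: soft hyphen, U+200B through U+200F, and the BOM.
const HIDDEN = /[\u00AD\u200B-\u200F\uFEFF]/g;

function stripHidden(text: string): string {
  return text.replace(HIDDEN, "");
}

// Returns true when JSON.parse accepts the text, false when it throws.
function parsesAsJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

const typed: string = "user_id";
const pasted: string = "user\u200B_id"; // renders the same as "user_id"

console.log(typed === pasted);              // false: the strings differ
console.log(pasted.length);                 // 8, not 7
console.log(/user_id/.test(pasted));        // false: the pattern is split
console.log(typed === stripHidden(pasted)); // true once the ZWSP is removed

// A leading byte order mark is not JSON whitespace, so parsing fails.
console.log(parsesAsJson("\uFEFF{\"ok\":true}"));              // false
console.log(parsesAsJson(stripHidden("\uFEFF{\"ok\":true}"))); // true
```

Blanket stripping is a debugging aid rather than a general fix: removing U+200C or U+200D changes the meaning of Persian text and breaks emoji sequences, so it suits identifiers and configuration data better than natural-language content.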

Homoglyph attacks

Homoglyph attacks exploit the visual similarity between characters from different Unicode scripts to deceive users and systems. In phishing campaigns, attackers register internationalized domain names where one or more Latin characters are replaced with Cyrillic, Greek, or other lookalikes. For example, replacing the first Latin "a" (U+0061) in paypal.com with Cyrillic "а" (U+0430) produces a URL that can display identically in the browser address bar but resolves to a completely different server. Modern browsers mitigate this by showing the Punycode form of suspicious mixed-script names, but such domains can still bypass detection in email clients and chat applications. In source code, homoglyph attacks are even more insidious: a malicious contributor can introduce a variable or function name using lookalike characters from another script, creating a backdoor that passes code review because the identifier looks correct. Use this inspector to scan suspicious text, and use the Code Diff tool to compare two text versions and reveal hidden character differences.
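
One simple heuristic for spotting the mixed-script names described above is to check whether a single label draws letters from more than one script. The sketch below is a simplification under stated assumptions: `scriptOf`, `scriptsInLabel`, and `isMixedScript` are hypothetical names, and scripts are approximated by coarse Unicode block ranges (Basic Latin letters, the Greek and Coptic block, the Cyrillic block) rather than the real Script property.

```typescript
type Script = "Latin" | "Greek" | "Cyrillic" | "Other";

// Coarse approximation by block range; an assumption for illustration only.
function scriptOf(codePoint: number): Script {
  const isUpper = codePoint >= 0x41 && codePoint <= 0x5a;
  const isLower = codePoint >= 0x61 && codePoint <= 0x7a;
  if (isUpper || isLower) return "Latin";
  if (codePoint >= 0x0370 && codePoint <= 0x03ff) return "Greek";
  if (codePoint >= 0x0400 && codePoint <= 0x04ff) return "Cyrillic";
  return "Other"; // digits, hyphens, and everything not covered above
}

// Collects the distinct scripts used by the letters of one domain label.
function scriptsInLabel(label: string): Script[] {
  const seen: Script[] = [];
  for (const ch of Array.from(label)) {
    const cp = ch.codePointAt(0);
    if (cp === undefined) continue;
    const script = scriptOf(cp);
    if (script !== "Other" && seen.indexOf(script) === -1) {
      seen.push(script);
    }
  }
  return seen;
}

function isMixedScript(label: string): boolean {
  return scriptsInLabel(label).length > 1;
}

console.log(isMixedScript("paypal"));      // false: Latin only
console.log(isMixedScript("p\u0430ypal")); // true: Latin plus Cyrillic
```

This check has a known blind spot: a label written entirely in one non-Latin script, with every letter chosen to resemble a Latin one, is not mixed and passes. Catching that case requires a lookalike table rather than a script count.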

Frequently Asked Questions

What are zero-width characters?

Zero-width characters are invisible Unicode characters that occupy no visual space in rendered text but are present in the underlying string data. The most common ones are zero-width space (U+200B), which is often inserted by web editors and word processors; zero-width joiner (U+200D), used in emoji sequences and complex scripts like Devanagari to control ligature formation; and zero-width non-joiner (U+200C), which prevents ligatures where they would normally appear. Other notable examples include the byte order mark (U+FEFF), soft hyphen (U+00AD), and various directional formatting characters like left-to-right mark (U+200E) and right-to-left mark (U+200F). These characters frequently enter codebases through copy-paste from websites, rich text editors, PDF documents, and messaging applications. They cause subtle bugs because they are invisible in most editors yet affect string length, equality comparisons, regular expression matching, and data parsing. This tool flags every zero-width character and shows its exact Unicode code point.

How do I detect invisible Unicode characters?

Paste or type your text into the input panel on the left side of this tool. The inspector immediately analyses every character in your input and categorises each one as normal, invisible, control, or homoglyph. The character table on the right panel lists every character with its Unicode code point, name, and category. Problematic characters are highlighted with colour-coded badges in the summary banner at the top: red for invisible and control characters, orange for homoglyphs. You can click any badge to jump directly to the next occurrence of that character type in the editor panel. The editor itself uses inline decorations to mark invisible characters so you can see exactly where they sit within your text. Once you have identified the problematic characters, use the Clean and Copy button to strip all invisible and control characters and copy the cleaned text to your clipboard. This workflow is particularly useful when debugging string comparison failures or JSON parsing errors caused by hidden characters.
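
To make the character table more concrete, the sketch below shows one way a row's code point, UTF-8, and UTF-16 columns could be derived for each character of the input. The names `inspect`, `TableRow`, `utf8Bytes`, and `hex` are hypothetical and this is not the tool's actual implementation; the name and category columns are omitted because they require Unicode property data.

```typescript
interface TableRow {
  index: number;     // position counted in code points, not UTF-16 units
  char: string;
  codePoint: string; // e.g. "U+200B"
  utf8: string;      // space-separated hex bytes
  utf16: string;     // space-separated hex code units
}

// Upper-case hex, left-padded with zeros to at least `width` digits.
function hex(value: number, width: number): string {
  let s = value.toString(16).toUpperCase();
  while (s.length < width) s = "0" + s;
  return s;
}

// Encodes one code point as UTF-8 bytes. Lone surrogates are not
// rejected here; a stricter encoder would treat them as errors.
function utf8Bytes(cp: number): number[] {
  if (cp < 0x80) return [cp];
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  if (cp < 0x10000) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

function inspect(text: string): TableRow[] {
  // Array.from splits by code point, so surrogate pairs stay together.
  return Array.from(text).map((ch, index) => {
    const cp = ch.codePointAt(0) ?? 0;
    const units: string[] = [];
    for (let i = 0; i < ch.length; i++) {
      units.push(hex(ch.charCodeAt(i), 4));
    }
    return {
      index,
      char: ch,
      codePoint: "U+" + hex(cp, 4),
      utf8: utf8Bytes(cp).map((b) => hex(b, 2)).join(" "),
      utf16: units.join(" "),
    };
  });
}
```

Calling `inspect("a\u200B")` yields two rows; the second has code point `U+200B`, UTF-8 bytes `E2 80 8B`, and the single UTF-16 unit `200B`, which is how a zero-width space becomes visible in a table even though it renders as nothing.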

What are homoglyphs?

Homoglyphs are characters from different Unicode scripts that appear visually identical or nearly indistinguishable from common ASCII characters but have entirely different code points. For example, Cyrillic "a" (U+0430) is a pixel-perfect match for Latin "a" (U+0061) in most fonts, and Greek omicron (U+03BF) looks identical to Latin "o" (U+006F). This visual similarity is exploited in internationalized domain name (IDN) homograph attacks, where attackers register domains like "xn--pypal-4ve.com" that display as "paypal.com" with a Cyrillic "a". In source code, homoglyphs create variables or function names that appear identical in the editor but are treated as completely separate identifiers by the compiler or interpreter. This can introduce security vulnerabilities or hard-to-trace bugs. This tool detects homoglyphs by comparing each character against a database of known lookalikes and flags them with an orange indicator, letting you quickly identify whether your text contains characters from unexpected scripts.
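
The lookup approach described above can be sketched in a few lines of TypeScript. Everything here is an assumption for illustration: the table `LOOKALIKES` is a deliberately tiny, hand-picked sample and not the tool's database, and `findHomoglyphs` and `foldLookalikes` are hypothetical names.

```typescript
// Tiny sample table mapping a lookalike to the ASCII letter it resembles.
const LOOKALIKES = new Map<string, string>([
  ["\u0430", "a"], // CYRILLIC SMALL LETTER A
  ["\u0435", "e"], // CYRILLIC SMALL LETTER IE
  ["\u043E", "o"], // CYRILLIC SMALL LETTER O
  ["\u0440", "p"], // CYRILLIC SMALL LETTER ER
  ["\u0441", "c"], // CYRILLIC SMALL LETTER ES
  ["\u03BF", "o"], // GREEK SMALL LETTER OMICRON
]);

interface HomoglyphHit {
  index: number;     // position counted in code points
  codePoint: string; // e.g. "U+0430"
  looksLike: string; // the ASCII character it imitates
}

function findHomoglyphs(text: string): HomoglyphHit[] {
  const hits: HomoglyphHit[] = [];
  Array.from(text).forEach((ch, index) => {
    const looksLike = LOOKALIKES.get(ch);
    if (looksLike !== undefined) {
      const cp = ch.codePointAt(0) ?? 0;
      const hexDigits = ("0000" + cp.toString(16).toUpperCase()).slice(-4);
      hits.push({ index, codePoint: "U+" + hexDigits, looksLike });
    }
  });
  return hits;
}

// Replaces every known lookalike with the ASCII letter it resembles,
// so two visually identical strings can be compared for equality.
function foldLookalikes(text: string): string {
  return Array.from(text)
    .map((ch) => LOOKALIKES.get(ch) ?? ch)
    .join("");
}

console.log(foldLookalikes("p\u0430ypal") === "paypal"); // true
```

Comparing the folded form of a suspicious string against a trusted one reveals spoofs that look correct on screen. A production check would rely on a maintained confusables data set instead of a hand-written map.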

Is my text data safe?

Yes, your text data is completely safe when using this tool. All character analysis and inspection happens entirely within your browser using client-side JavaScript. No text you paste or type into the inspector is ever transmitted to any external server, API endpoint, or third-party service. The Unicode analysis logic runs as a pure function that processes your input string locally, examining each character against built-in Unicode property tables and homoglyph databases that are bundled with the application code. There are no network requests made during analysis, which you can verify by opening your browser developer tools and monitoring the Network tab while using the tool. This client-side architecture means the tool works fully offline after the initial page load, making it suitable for inspecting sensitive text such as API keys, passwords, proprietary source code, or confidential documents. The clean and copy feature similarly operates entirely in-browser, using the Clipboard API to write the sanitized text directly to your system clipboard without any server round-trip.