To start with, I played around manually, using a simple conversion app I wrote in Java with the IDN class. After a few minutes it was fairly clear that I was never going to find anything this way. Generating Unicode URLs from two valid English words (e.g. xn--apple-banana) was a faster approach. It still involved far too much effort, but introducing a few filters helped to cull out URLs that are obviously not words. Some of the more interesting ones were:
- http://www.ωaψward.gr --> http://xn--award-beef.gr
- кill.gr --> xn--ill-bed.gr
- ႦႲႶႻႸႽ.org --> xn--endymion.org (endymion is SO a word)
- 汤.cn --> xn--ftw (汤 means soup in Chinese, and soup is clearly ftw)
- f̹ace̸̸bo̸ok.com --> xn--facebook-deface.com (facebook defaced, get it?)
- ̱yahoo.cn --> xn--yahoo-end.cn (Yahoo ended by a homograph attack!)
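The word-pair trick can be sketched with java.net.IDN, which implements IDNA2003 (RFC 3490). This is a minimal sketch, not the original app: the word list is a hypothetical stand-in for a real dictionary, and the only filter shown is the most basic one, dropping labels that do not decode at all. It relies on documented behaviour of IDN.toUnicode: it never throws, and on any error it returns the input unmodified.

```java
import java.net.IDN;
import java.util.List;

public class PunycodeWordPairs {
    // Hypothetical stand-in for a real English dictionary.
    static final List<String> WORDS = List.of("apple", "banana", "award", "beef", "face", "book", "ill", "bed");

    // Builds the ACE label "xn--first-second" and decodes it with java.net.IDN.
    // IDN.toUnicode never throws: if Punycode decoding or its re-encoding
    // check fails, it returns the input unchanged, so "unchanged" means
    // "not a valid IDN label" and we report null.
    static String decodePair(String first, String second) {
        String ace = "xn--" + first + "-" + second;
        String unicode = IDN.toUnicode(ace);
        return unicode.equals(ace) ? null : unicode;
    }

    public static void main(String[] args) {
        for (String first : WORDS) {
            for (String second : WORDS) {
                if (first.equals(second)) {
                    continue; // skip "word-word" pairs
                }
                String unicode = decodePair(first, second);
                if (unicode != null) {
                    System.out.println(unicode + " --> xn--" + first + "-" + second);
                }
            }
        }
    }
}
```

Comparing the result with the input, rather than catching an exception, is the natural filter here because that is how toUnicode signals failure. Anything that survives still has to be eyeballed, which is where the extra filters come in.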
I tried registering http://www.ωaψward.gr with Greek registrars, since it doesn't mix Unicode characters from different scripts, but I was thwarted, mainly by my complete lack of Greek and by Verified by Visa completely failing (wtf is that shit anyway... srsly). If anyone more clueful than me can shed some light on the rules for Unicode domains, that would be cool. (Yes, Chris Weber, I am talking to you.)
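One rough way to test the "doesn't mix scripts" property is Java's Character.UnicodeScript. This sketch assumes the rule means "all non-ASCII characters in the label come from a single script"; real registry rules differ per TLD and are not modelled here. COMMON and INHERITED characters (such as the combining marks in the facebook example) are shared across scripts, so they are skipped.

```java
import java.util.EnumSet;
import java.util.Set;

public class ScriptMixCheck {
    // Assumption: a label "mixes scripts" if its non-ASCII code points
    // belong to more than one Unicode script. COMMON and INHERITED
    // (e.g. combining marks) are shared across scripts, so they are skipped.
    static Set<Character.UnicodeScript> nonAsciiScripts(String label) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        label.codePoints()
             .filter(cp -> cp > 0x7F)
             .mapToObj(Character.UnicodeScript::of)
             .filter(s -> s != Character.UnicodeScript.COMMON
                       && s != Character.UnicodeScript.INHERITED)
             .forEach(scripts::add);
        return scripts;
    }

    static boolean mixesScripts(String label) {
        return nonAsciiScripts(label).size() > 1;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source ASCII-only:
        // omega-a-psi-ward, Cyrillic ka + "ill", and a Cyrillic/Greek mix.
        String[] labels = {"\u03c9a\u03c8ward", "\u043aill", "\u043a\u03c9ard"};
        for (String label : labels) {
            System.out.println(label + " " + nonAsciiScripts(label) + " mixes=" + mixesScripts(label));
        }
    }
}
```

Note that ωaψward still contains Latin letters (the "award" part), so a stricter rule that counts ASCII letters too would flag it as mixed.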