html2xhtml turns out to be unusable
I installed html2xhtml, but it doesn't seem to handle Japanese orz
$ curl http://www.yahoo.co.jp > /tmp/test.html
$ ./html2xhtml /tmp/test.html
input buffer overflow, can't enlarge buffer because scanner uses REJECT
I suspected the Japanese text was the cause, and when I checked the notes, sure enough...
HTML ENCODINGS:
This program only works for input encodings that code every ASCI character (0-127) only with one byte. This includes, for example, ISO-8859-1, ISO-8859-15, UTF-8. However, the program does not work for encoding UTF-16. For converting UTF-16 files, please convert them first to UTF-8. UTF-8 is not guaranteed to work always properly, and more testing is required.
So UTF-8 isn't guaranteed to always work... and switching the file to other Japanese encodings didn't help either. Useless orz
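For reference, the pre-conversion step the note asks for (get the input into an encoding where every ASCII character is a single byte, e.g. UTF-16 to UTF-8) could be sketched roughly like this. This is only a sketch: the function names is_ascii_compatible, to_utf8 and reencode_file are made up for illustration and are not part of html2xhtml, and the check only tests the one condition quoted above, nothing more. Re-encoding did not get rid of the error in my case, so treat it as a way to rule the encoding in or out.

```python
from pathlib import Path


def is_ascii_compatible(encoding: str) -> bool:
    """Rough check: does every ASCII character (0-127) encode to the
    same single byte in this encoding? (Hypothetical helper.)"""
    ascii_bytes = bytes(range(128))
    try:
        return ascii_bytes.decode("ascii").encode(encoding) == ascii_bytes
    except (LookupError, UnicodeError):
        # Unknown codec name, or the codec cannot represent some ASCII character.
        return False


def to_utf8(data: bytes, src_encoding: str) -> bytes:
    """Decode bytes from src_encoding and re-encode them as UTF-8."""
    return data.decode(src_encoding).encode("utf-8")


def reencode_file(src: str, dst: str, src_encoding: str) -> None:
    """Write a UTF-8 copy of src to dst. The caller has to know src_encoding;
    nothing here tries to detect it."""
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), src_encoding))
```

Something like reencode_file("/tmp/test.html", "/tmp/test.utf8.html", "utf-16") would then produce a file to feed to ./html2xhtml (the output path and the source encoding here are just examples).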