Book search

So they're happily bashing windows live search over on The Register ...

However, on the book search side, Microsoft's live book search (no longer on - it's all on now) is miles beyond anything google book search has to offer.

It's like, google books will forever be in beta. Because, honestly, who'll rescan all those cut-off, missing, hands-whisking-things-away-too-fast, way-too-wide-margined, and other similarly unreadable (and thus unusable) pages in all the hundreds of books they have online? Sure, there's the "report errors to us" thingy, but given the quality of the books they do have, reporting all the errors in even one book would take hours. They should throw out the people who've scanned for them and get in experts to redo the lot. Will they? Hell no. They've OCR'd these works, sorta kinda (if you squint), they can serve ads with any search results, and they're quite happy leaving things in the abysmal state they're currently in.

As an additional minus, the pages in their .pdf files are stark black'n'white, instead of the sensible grayscale that anybody at all who has a clue about scanning text is using.

Bleh for google books, sez I.

Windows books, on the other hand, actually has every single page of every book they have online fully scanned and readable, and in grayscale, at that. Unfortunately, they're fixed up so you can't just extract the .jpegs or .whatnots and OCR them yourself, but hey, you can't get everything, can you? (It's a weird new JPEG2000/Part6 format. Here's hoping xpdf will catch those, soonish.)

And as a further bonus, MS' scans are available on the Internet Archive texts site, in multiple formats. (The .txt files are all helter-skelter, though.)

Whatever, here's a big well done! to MS live search.

Still, on the anti-microsoft front: get Firefox, kids. MS' Internet Explorer is so enormously unsafe that it should be banned.

Oooh. This just in ... lovely! However, diving into that, it's Apache errors all the way. Way to go ...