Building a Zite Replacement (Part 8)

Happy Halloween, all!

I'm sitting here handing out candy and glow necklaces to all comers, so it's a good time to write a new post.

It's been a while since much happened here, as I've been really busy with the beta release of Google Cloud Datalab, which is my day job. But now that it's out, and it's the weekend with lousy weather here in the Pacific Northwest, it's been a good day to get back to things.

Today I did something I've been meaning to do for a while: change the code to populate a MongoDB database rather than writing files to the file system. Interestingly, it seems to be quite a bit slower than the file system, but hopefully it will scale better and make up for that when I have ad-hoc queries to run.
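
A minimal sketch of what saving an article to MongoDB instead of a file might look like, assuming pymongo and assuming each article is a dict with a `url` key. The `save_article` helper and the field names are hypothetical, not the code from this project. Keying the document on its URL means a re-fetch overwrites the stored copy rather than adding a duplicate, much as rewriting a file of the same name would.

```python
from datetime import datetime, timezone


def save_article(collection, article):
    """Upsert one article, keyed on its URL (hypothetical schema).

    `collection` is assumed to be a pymongo Collection, or any object with a
    compatible replace_one(filter, replacement, upsert=...) method.
    """
    doc = dict(article)                # copy so the caller's dict is not mutated
    doc["_id"] = article["url"]        # URL doubles as the primary key
    doc["fetched_at"] = datetime.now(timezone.utc)
    # upsert=True inserts the document if no article with this _id exists yet
    return collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
```

With pymongo installed, `collection` would come from something like `MongoClient()["zite"]["articles"]` (the database and collection names are placeholders), and an ad-hoc query is then a call such as `articles.find({"title": {"$regex": "Python"}})`. If per-document round trips turn out to be part of the slowdown, pymongo's `bulk_write` with `ReplaceOne` operations can batch many upserts into a single call.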

Graham Wheeler on

Node, npm and Express

Things have been slow on the blogging front, but there has been progress on the Zite replacement. I'll write more about that soon, but part of what I have been doing is looking into which server-side technology to use.

As far as the database goes, the choice seems like a no-brainer. I'm dealing with JSON documents, which I could either spend some effort normalizing to fit a SQL database or simply keep as-is in a database that stores that form natively. The obvious choice for the latter is MongoDB, which stores documents as BSON, a binary form of JSON.
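
To make the trade-off concrete, here is a sketch of the "normalize it into SQL" route, using Python's built-in sqlite3 as a stand-in for a SQL database. The article document and its fields (`url`, `title`, `categories`) are hypothetical. Even this small document needs two tables, because a list of categories cannot live in a single column, and reading it back takes a second query; a document store such as MongoDB would instead accept the parsed dict as-is in one insert.

```python
import json
import sqlite3

# A hypothetical article document, shaped the way a feed parser might emit it.
ARTICLE_JSON = """{
  "url": "http://example.com/post/1",
  "title": "Example post",
  "categories": ["python", "nlp"]
}"""


def store_normalized(conn, article):
    """SQL route: split the nested document across two tables."""
    conn.execute("CREATE TABLE IF NOT EXISTS articles (url TEXT PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS article_categories ("
                 "url TEXT REFERENCES articles(url), category TEXT)")
    conn.execute("INSERT INTO articles VALUES (?, ?)", (article["url"], article["title"]))
    # One row per list element: the list itself has no single-column home.
    conn.executemany("INSERT INTO article_categories VALUES (?, ?)",
                     [(article["url"], c) for c in article["categories"]])


def load_normalized(conn, url):
    """Reassembling the original document takes a second query (or a join)."""
    title = conn.execute("SELECT title FROM articles WHERE url = ?", (url,)).fetchone()[0]
    cats = [row[0] for row in conn.execute(
        "SELECT category FROM article_categories WHERE url = ? ORDER BY rowid", (url,))]
    return {"url": url, "title": title, "categories": cats}


if __name__ == "__main__":
    article = json.loads(ARTICLE_JSON)
    conn = sqlite3.connect(":memory:")
    store_normalized(conn, article)
    print(load_normalized(conn, article["url"]) == article)  # True
```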

Graham Wheeler on

Building a Zite Replacement (Part 7)

It's been a while since the last post but I haven't been idle. Here are some of the things I've been up to:

  • tweaking the code to parse content better;
  • moving from an IPython notebook to a library that I can use for batch operations as well as interactive exploration;
  • modifying the code to do parallel fetches, or more precisely, to operate asynchronously; because of the Python GIL I still have just one thread for now, but I can kick off up to 40 HTTP requests at a time, which speeds things up a lot, as I have about 4000 sites I'm working with now;
  • exploring the TextBlob library, a library that sits above the Python NLTK and can parse sentences and words (more on that below);
  • building a GUI application with Tkinter that lets me quickly view feeds, terms, categories and articles, delete feeds, tweak category exemplars and see the results, and so on. This has been invaluable in building up and fine-tuning my category exemplars, although it is still a work in progress. It's been somewhat painful, as I haven't used Tk in about two decades, but I've mostly got it to do what I want.
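
The asynchronous fetching described in the bullet about parallel fetches can be sketched with Python's asyncio: everything runs on a single thread, and a semaphore caps the number of requests in flight at 40. This is an illustration under assumptions, not the author's code, since the post does not say which async mechanism it uses. `fetch_all` and `fetch_one` are hypothetical names; in practice `fetch_one` would wrap an async HTTP client such as aiohttp.

```python
import asyncio

MAX_CONCURRENT = 40  # the post mentions up to 40 HTTP requests at a time


async def fetch_all(urls, fetch_one, limit=MAX_CONCURRENT):
    """Run fetch_one(url) for every URL on one thread, with at most `limit`
    calls outstanding at once. Returns {url: result}; a failed fetch maps to
    the exception it raised instead of aborting the whole batch."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:  # waits here while `limit` fetches are running
            return await fetch_one(url)

    results = await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)
    return dict(zip(urls, results))


async def fake_fetch(url):
    """Stand-in for a real HTTP request: sleeps to simulate network latency."""
    await asyncio.sleep(0.01)
    return len(url)


if __name__ == "__main__":
    feeds = [f"http://example.com/feed/{i}" for i in range(100)]
    pages = asyncio.run(fetch_all(feeds, fake_fetch))
    print(len(pages))  # 100
```

Because the work is I/O-bound, the single thread spends most of its time waiting on the network, so overlapping requests this way gives much of the speedup of parallelism without threads.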

Graham Wheeler on