Graham Wheeler's Random Forest

Stuff about stuff

Building a Zite Replacement (Part 7)

It’s been a while since the last post but I haven’t been idle. Here are some of the things I’ve been up to:

- tweaking the code to parse content better
- moving from IPython notebook to a library that I can use to do batch operations as well as interactive exploration
- modifying the code to do parallel fetches - or more precisely, to operate asynchronously; because of the Python GIL I still have just one thread for now.

Building a Zite Replacement (Part 6)

Following on from last episode, I took some of the clusters that had clear cohesion and made some initial category exemplars. Here are the first few:

    {"title": "Art", "terms": ["canvas", "painting", "pastels", "sculpture", "gallery", "photography", "landscape", "portrait", "still-life", "exhibition", "sketch"]}
    {"title": "Literature", "terms": ["novel", "writer", "plot", "character", "author"]}
    {"title": "Religion", "terms": ["Jesus", "Christianity", "Allah", "Islam", "Judaism", "Sufi", "Hindu", "karma", "spirituality", "faith", "belief", "priest", "pastor", "prayer"]}
    {"title": "Cooking", "terms": ["ingredients", "bake", "roast", "fry", "stir", "cook", "cooking", "recipe", "flour", "sugar", "butter", "cups", "cup", "teaspoon", "tablespoons", "vanilla"]}

Note that these are deliberately in the same format as the articles in articles.

Building a Zite Replacement (Part 5)

My initial experience with clustering was somewhat disappointing. It’s clear I need to do some tuning of the approach. The first thing I did was to rerun the article download process, but instead of just keeping the top ten terms and dropping their TF-IDF values, I kept them all. I think there are better ways to select the terms to use for Jaccard similarity. For starters, using a fixed number of terms could lead to keeping a wildly different range of TF-IDF values for different articles.
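To make the concern about a fixed number of terms concrete, here is a minimal sketch of one possible alternative: keep every term whose TF-IDF score is within some fraction of the article's top score, then compare the resulting sets with Jaccard similarity. The `select_terms` helper, the `min_ratio` parameter, and the scores are all assumptions made up for illustration; this is not necessarily the approach the series ends up using.

```python
def jaccard(a, b):
    """Jaccard similarity of two term collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_terms(tfidf, min_ratio=0.5):
    """Keep terms scoring at least min_ratio of the article's top TF-IDF score.

    This relative cutoff is a hypothetical alternative to 'top ten terms'.
    """
    if not tfidf:
        return set()
    cutoff = max(tfidf.values()) * min_ratio
    return {term for term, score in tfidf.items() if score >= cutoff}


# Made-up TF-IDF scores, purely for illustration.
article_a = {"vaccines": 0.9, "children": 0.6, "debate": 0.5, "schedule": 0.2}
article_b = {"vaccines": 0.8, "autism": 0.7, "children": 0.45, "studies": 0.1}

terms_a = select_terms(article_a)
terms_b = select_terms(article_b)
similarity = jaccard(terms_a, terms_b)
print(sorted(terms_a), sorted(terms_b), similarity)
```

With a relative cutoff, an article with a few dominant terms keeps only those, while an article with many similarly weighted terms keeps more of them.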

Building a Zite Replacement (Part 4)

Following my last post, I started gathering URLs of feeds to use for sample data. First I scraped the links that I had saved in Pocket (a scarily large number). It didn’t seem like Pocket had an easy way to export this, so I loaded up Pocket in Chrome, scrolled and scrolled and scrolled until I could scroll no more, then saved the resulting web page once it was done loading.

Building a Zite Replacement (Part 3)

Since yesterday’s post on term extraction, I’ve made a few tweaks. In particular I only adjust capitalization on the first words of sentences, I’m keeping numbers and hyphenation, and if there are consecutive capitalized words I turn them into single terms. For example, the terms for the Donald Trump on vaccines article have changed from:

    vaccines Donald Trump children doses effective vaccinations diseases Carson debate

to:

    vaccines children Donald Trump doses effective vaccinations diseases smaller vaccination debate babies autism cause schedule studies

I’m not sure why ‘Carson’ was dropped; it’s possible that the text of the article changed between the two runs.
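The step of turning consecutive capitalized words into single terms could look roughly like the sketch below. It assumes the text has already been split into word tokens and that sentence-initial capitalization has been dealt with beforehand; the `merge_capitalized` name is hypothetical and this is not the actual code from the post.

```python
def merge_capitalized(words):
    """Join runs of consecutive capitalized words into single terms."""
    terms, run = [], []
    for word in words:
        if word[:1].isupper():
            # Extend the current run of capitalized words.
            run.append(word)
        else:
            # A lowercase word ends any run in progress.
            if run:
                terms.append(" ".join(run))
                run = []
            terms.append(word)
    if run:
        # Flush a run that reaches the end of the input.
        terms.append(" ".join(run))
    return terms


words = "vaccines and Donald Trump during the debate".split()
terms = merge_capitalized(words)
print(terms)
```

Here "Donald" and "Trump" end up as the single term "Donald Trump", while the lowercase words pass through unchanged.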

Building a Zite Replacement (Part 2)

In the previous post I gave an overview of what needs to be built for our Zite replacement. In this post we will look at how to load an RSS feed and generate key terms for each article. In order to fetch the feed we will make use of the feedparser package, so make sure to install that first with pip, conda, or whatever you use. Another thing we’re going to want is to strip HTML tags from the articles.
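As a rough illustration of the tag-stripping step, here is a minimal sketch that uses only the standard library's `html.parser`. The `TagStripper` class and `strip_tags` function are names chosen for this example and are not taken from the post; the input would presumably be the HTML body of a feed entry.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect only the text content of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; entities are already decoded.
        self.parts.append(data)


def strip_tags(html):
    """Return the text of an HTML fragment with tags removed and whitespace collapsed."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()
    return " ".join("".join(stripper.parts).split())


text = strip_tags("<p>Hello <b>world</b> &amp; friends</p>")
print(text)
```

Note that this simple version keeps the contents of `<script>` and `<style>` elements as text, so a real article parser would likely need to skip those.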

Building a Zite Replacement (Part 1)

The two most used apps on my phone are Zite and Pocket. Unfortunately, last year Zite was bought by Flipboard and has slowly been getting worse. Recently the top sticky article on Zite has been a post on migrating your preferences to Flipboard, which suggests Zite is not long for this world. This would be okay if Flipboard were a suitable replacement, but it isn’t. It’s very flashy (which I don’t like), and just doesn’t seem to get things right when it comes to serendipitous discovery of interesting content.

A Clean Sweep

A long time ago when dinosaurs roamed the earth I was following an academic career. That got subverted but I enjoyed it while it lasted. Apart from graduate courses in compilers, I got to teach everything from computer architecture to assembly language programming to an introductory computing course for social science students. When teaching assembly language (in which I was lucky enough to be able to use the M68000, a dream of a processor), one of the samples I used to illustrate a number of topics like multi-dimensional arrays, recursion and function pointers, was a Mine Sweeper game.

Maslow's Hierarchy and your Team

I mentioned in a previous post that I would talk about Maslow’s Hierarchy of Needs in relation to team health. Maslow described a set of layers of needs with respect to human motivation:

- physiological - the basic needs for survival (food, water, shelter, air, etc)
- safety - health, well-being, security
- love/belonging - friends and family
- esteem - respect of others and (more importantly) self-respect
- self-actualization - becoming the most you can be
- self-transcendence - altruism and giving of yourself to others

Maslow considered the first four to be necessary for mental health and a prerequisite for self-actualization and transcendence.

1-on-1s can take a hike!

Are you a people manager? Do you enjoy doing 1-1s with your reports? No? Then you’re doing them wrong! 1-1s are a valuable and important part of both managing your team and becoming a better manager. You should see them as being as much for your benefit as for your reports’ - arguably more so! Not only do you get the opportunity to keep your reports on track and on a growth path, but it’s also a great opportunity to get feedback yourself.