Building a Zite Replacement (Part 6)

Following on from last episode, I took some of the clusters that had clear cohesion and made some initial category exemplars. Here are the first few:

{"title": "Art", "terms": ["canvas", "painting", "pastels", "sculpture", "gallery", "photography",
        "landscape", "portrait", "still-life", "exhibition", "sketch"]}
{"title": "Literature", "terms": ["novel", "writer", "plot", "character", "author"]}
{"title": "Religion", "terms": ["Jesus", "Christianity", "Allah", "Islam", "Judaism", "Sufi",
        "Hindu", "karma", "spirituality", "faith", "belief", "priest", "pastor", "prayer"]}
{"title": "Cooking", "terms": ["ingredients", "bake", "roast", "fry", "stir", "cook", "cooking",
        "recipe", "flour", "sugar", "butter", "cups", "cup", "teaspoon", "tablespoons", "vanilla"]}

Note that these are deliberately in the same format as the articles in articles.txt, although without as many fields.
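As a rough illustration of working with exemplars in this format, here is a minimal sketch that reads a run of JSON objects and picks the category sharing the most terms with an article. The helper names `iter_json_objects` and `best_category` are made up for this example, and counting shared terms is only one plausible way to use the exemplars, not necessarily what the full post does. Using `raw_decode` means it does not matter whether an object is wrapped over two lines, as the longer ones above are.

```python
import json


def iter_json_objects(text):
    """Yield each JSON object from text that holds several objects back to back."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Two of the exemplars from above, used as sample input.
EXEMPLARS = '''
{"title": "Literature", "terms": ["novel", "writer", "plot", "character", "author"]}
{"title": "Cooking", "terms": ["ingredients", "bake", "roast", "fry", "stir", "cook", "cooking",
        "recipe", "flour", "sugar", "butter", "cups", "cup", "teaspoon", "tablespoons", "vanilla"]}
'''

categories = {obj["title"]: set(obj["terms"]) for obj in iter_json_objects(EXEMPLARS)}


def best_category(article_terms, categories):
    """Return the title of the category sharing the most terms, or None if none overlap."""
    terms = set(article_terms)
    title, overlap = max(
        ((name, len(terms & cat_terms)) for name, cat_terms in categories.items()),
        key=lambda pair: pair[1],
    )
    return title if overlap > 0 else None


print(best_category(["flour", "sugar", "oven"], categories))
```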

Read more…

Graham Wheeler on

Building a Zite Replacement (Part 5)

My initial experience with clustering was somewhat disappointing. It's clear I need to do some tuning of the approach. The first thing I did was to rerun the article download process, but instead of just keeping the top ten terms and dropping their TF-IDF values, I kept all the terms along with their values. I think there are better ways to select the terms to use for Jaccard similarity.

For starters, using a fixed number of terms could lead to keeping a wildly different range of TF-IDF values for different articles. It makes more sense to have some threshold value and keep all terms that exceed the threshold. That may mean more than ten terms for some articles and fewer for others.
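To make the idea concrete, here is a minimal sketch of threshold-based term selection followed by Jaccard similarity. The function names, the dict-of-scores input, and the threshold of 0.1 are all assumptions for illustration; the right threshold would need tuning against real data.

```python
def select_terms(tfidf, threshold=0.1):
    """Keep every term whose TF-IDF score exceeds the threshold.

    tfidf is assumed to be a dict mapping term -> score.
    """
    return {term for term, score in tfidf.items() if score > threshold}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up scores for two hypothetical articles.
article1 = {"python": 0.42, "numpy": 0.31, "array": 0.15, "the": 0.01}
article2 = {"python": 0.38, "pandas": 0.29, "array": 0.12, "and": 0.02}

terms1 = select_terms(article1)
terms2 = select_terms(article2)
print(jaccard(terms1, terms2))
```

Here the first article keeps three terms and so does the second, but nothing forces the counts to match; an article with many high-scoring terms simply contributes a larger set.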

Read more…


Building a Zite Replacement (Part 4)

Following my last post, I started gathering URLs of feeds to use for sample data. First I scraped the links that I had saved in Pocket (a scarily large number). It didn't seem like Pocket had an easy way to export this, so I loaded up Pocket in Chrome, scrolled and scrolled and scrolled until I could scroll no more, then saved the resulting web page once it was done loading. It was pretty easy to then scrape that to get the links. After sorting and uniq-ing those, and running them through feedfinder, I had somewhere north of 1000 feeds. However, these were very skewed to my interests and I wanted diversity so I pressed on and scraped a bunch of blog rolls and other link collections covering many other areas. In the end I got about 2,500 feed URLs to start with, in a file named 'feeds.txt'.
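The scraping step can be sketched with nothing but the standard library. This is not the code I used, just a minimal illustration: `LinkCollector` and `extract_links` are names invented here, and the filter on `http://` and `https://` prefixes is an assumption about which links are worth keeping. Collecting into a set and sorting at the end does the same job as `sort | uniq`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that points at an http(s) URL."""

    def __init__(self):
        super().__init__()
        self.links = set()  # a set removes duplicates as we go

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.add(value)


def extract_links(html):
    """Return the sorted, de-duplicated absolute links found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.links)


sample = """
<a href="https://example.com/b">B</a>
<a href="https://example.com/a">A</a>
<a href="https://example.com/b">B again</a>
<a href="#top">skip me</a>
"""
print(extract_links(sample))
```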

Read more…


Building a Zite Replacement (Part 3)

Since yesterday's post on term extraction, I've made a few tweaks. In particular I only adjust capitalization on the first words of sentences, I'm keeping numbers and hyphenation, and if there are consecutive capitalized words I turn them into single terms.
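A rough sketch of those three tweaks, applied to a single sentence, might look like the following. It is a simplification rather than the actual extraction code: the `TOKEN` pattern and `extract_terms` name are invented here, "adjusting capitalization" is assumed to mean lowercasing, and the first word is always lowercased, so a name that starts a sentence would be split.

```python
import re

# Letters and digits, optionally joined by internal hyphens (e.g. "well-known", "9x9").
TOKEN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def extract_terms(sentence):
    """Split one sentence into terms, merging runs of capitalized words."""
    terms = []
    run = []  # current run of consecutive capitalized words
    for i, token in enumerate(TOKEN.findall(sentence)):
        if i > 0 and token[0].isupper():
            run.append(token)
            continue
        if run:
            terms.append(" ".join(run))
            run = []
        # Only the first word of the sentence has its capitalization adjusted.
        terms.append(token.lower() if i == 0 else token)
    if run:
        terms.append(" ".join(run))
    return terms


print(extract_terms("Yesterday Peter Norvig wrote about a well-known 9x9 puzzle."))
```

Because punctuation is discarded during tokenizing, capitalized words separated only by a comma would also be merged; a fuller version would need to break runs at punctuation.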

For example, the terms for the Donald Trump on vaccines article have changed from:

Read more…


Building a Zite Replacement (Part 2)

In the previous post I gave an overview of what needs to be built for our Zite replacement. In this post we will look at how to load an RSS feed and generate key terms for each article. In order to fetch the feed we will make use of the feedparser package, so make sure to install that first with pip, conda, or whatever you use.

Another thing we're going to want is to strip HTML tags from the articles. I did a Google for "HTML element stripper Python" and found this StackOverflow post with the code below that works great:
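The StackOverflow code is not reproduced in this excerpt. As a stand-in, here is a minimal sketch of the same idea using the standard library's `html.parser`; the `TagStripper` class and `strip_tags` function are names chosen for this example and may differ from what that post uses.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect only the text content of an HTML fragment, discarding tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html):
    """Return the text of an HTML string with all tags removed."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()  # flush any text still buffered by the parser
    return "".join(stripper.chunks)


print(strip_tags("<p>Hello <b>world</b> &amp; more</p>"))
```

Note that this keeps the contents of `<script>` and `<style>` elements as text, so feeds that embed those would need extra handling.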

Read more…


Building a Zite Replacement (Part 1)

The two most used apps on my phone are Zite and Pocket. Unfortunately, last year Zite was bought by Flipboard and has slowly been getting worse. Recently the top sticky article on Zite has been a post on migrating your preferences to Flipboard, which suggests Zite is not long for this world.

This would be okay if Flipboard was a suitable replacement, but it isn't. It's very flashy (which I don't like), and just doesn't seem to get things right when it comes to serendipitous discovery of interesting content. My feeling is that it is probably a great app for people who are interested in news and pop culture, but my interests run more specialized; I want to read about certain programming languages and fields of math, computer science and statistics.

Read more…


A Clean Sweep

A long time ago when dinosaurs roamed the earth I was following an academic career. That got subverted but I enjoyed it while it lasted. Apart from graduate courses in compilers, I got to teach everything from computer architecture to assembly language programming to an introductory computing course for social science students.

When teaching assembly language (in which I was lucky enough to be able to use the M68000, a dream of a processor), one of the samples I used to illustrate a number of topics like multi-dimensional arrays, recursion and function pointers was a Mine Sweeper game. I recently dug it out and turned it into Python just for the heck of it.

So here it is. There are no fancy graphics; the field is just printed out as a 2D array of ASCII characters. To enter a move you give coordinates, using a number for the row and a letter for the column. E.g. 1C is the first row, third column (rows are numbered from 1, not 0). You can plant a flag (or clear it) by preceding your move with a '-'.
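To illustrate the move format, here is a small sketch of a parser for it. This is not lifted from the game itself: `parse_move` is a name invented for this example, and accepting lowercase column letters and returning None for bad input are assumptions.

```python
import re

# Optional '-' for a flag, then a row number, then a single column letter.
MOVE = re.compile(r"^(-?)(\d+)([A-Za-z])$")


def parse_move(text, rows, cols):
    """Parse a move such as '1C' or '-3A'.

    Returns (row, col, is_flag) with zero-based indices, or None if the
    text is malformed or falls outside a rows x cols field.
    """
    match = MOVE.match(text.strip())
    if not match:
        return None
    flag, row_text, col_text = match.groups()
    row = int(row_text) - 1                    # rows are numbered from 1
    col = ord(col_text.upper()) - ord("A")     # columns are lettered from A
    if not (0 <= row < rows and 0 <= col < cols):
        return None
    return row, col, flag == "-"


print(parse_move("1C", 9, 9))
print(parse_move("-3A", 9, 9))
```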

Read more…


Maslow's Hierarchy and your Team

I mentioned in a previous post that I would talk about Maslow's Hierarchy of Needs in relation to team health.

Maslow described a set of layers of needs with respect to human motivation:

  • physiological - the basic needs for survival (food, water, shelter, air, etc)
  • safety - health, well-being, security
  • love/belonging - friends and family
  • esteem - respect of others and (more importantly) self-respect
  • self-actualization - becoming the most you can be
  • self-transcendence - altruism and giving of yourself to others

Maslow considered the first four to be necessary for mental health and a pre-requisite for self-actualization and transcendence. This hierarchy of needs is a useful model; it's not science but can help us understand shortcomings in ourselves and others that may need addressing to reach our full potential.

So how does this relate to teams and management? We can come up with a similar set of needs for healthy teams and team members:

Read more…


1-on-1s can take a hike!

Are you a people manager? Do you enjoy doing 1-1s with your reports? No? Then you're doing them wrong!

1-1s are a valuable and important part of both managing your team and becoming a better manager. You should see them as being as much for your benefit as for your reports - arguably more so! Not only are you getting the opportunity to keep your reports on track and on a growth path, but it's a great opportunity to get feedback yourself. Unless you're the kind of manager who thinks managing is all about giving orders and having them carried out, in which case this post isn't for you; go enlist or something.

There are a number of reasons why managers (and employees) don't enjoy 1-1s:

Read more…


Simply Solving Sudoku

It's been a long time since I last blogged. I've been meaning to for oh so long but you know what they say about the road to hell. For a while I maintained my math blog but even that has been fallow for some time.

Fortunately, I stumbled upon Peter Norvig's article about solving Sudoku, and that has provided the impetus. His approach is probably the most sensible I have seen for a while; there seem to be some really bad solvers out there. I was unimpressed with the one by Skiena in the Algorithm Design Manual, although the truly ridiculous one has to be the so-overblown-I-took-a-whole-book approach in Programming Sudoku; the mind simply boggles at how complex some people make trivial things.

Because solving Sudoku is indeed trivial. There is nothing to it. All you need is a very simple backtracking search. I wrote the solver below originally more than 10 years ago in JavaScript and it was used as a generator running on extremely low-powered Microsoft SPOT smart watches. For fun I dug it out, turned it into Python, and tested it on "the world's hardest Sudoku puzzle". How fast can you blink?
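To show just how little a backtracking search needs, here is a bare-bones sketch. It is written for this summary and is not the solver from the full post: the flat 81-element list, the `solve` name, and the choice to always fill the first empty cell are all assumptions made to keep it short. The sample puzzle is an easy one, not the "world's hardest".

```python
def solve(grid):
    """Solve a Sudoku in place by backtracking.

    grid is a flat list of 81 ints in row-major order, with 0 for an empty cell.
    Returns True if a solution was found (grid is then filled in), else False.
    """
    try:
        pos = grid.index(0)  # first empty cell
    except ValueError:
        return True          # no empty cells left: solved
    row, col = divmod(pos, 9)
    box = (row // 3) * 27 + (col // 3) * 3  # index of the box's top-left cell

    # Gather every digit already used in this cell's row, column and box.
    used = set()
    for i in range(9):
        used.add(grid[row * 9 + i])
        used.add(grid[i * 9 + col])
        used.add(grid[box + (i // 3) * 9 + i % 3])

    for digit in range(1, 10):
        if digit not in used:
            grid[pos] = digit
            if solve(grid):
                return True
    grid[pos] = 0  # nothing fits here: undo and backtrack
    return False


PUZZLE = ("003020600900305001001806400008102900700000008"
          "006708200002609500800203009005010300")
grid = [int(ch) for ch in PUZZLE]
if solve(grid):
    for r in range(9):
        print("".join(str(d) for d in grid[r * 9:r * 9 + 9]))
```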

Read more…
