Data Science

Project idea: mining crisis googlegroups

The Problem

Let’s start with “what is the problem I’m trying to solve here”. I’m responsible for designing Hunchworks. I’m also responsible for making it what the users need, and for understanding and fitting into the systems that they already use to do hunch-style collaborative inference. Now we know a bit about business analysis, so we’ve spent a while looking for these existing systems, and found them mainly in two places: closed discussion groups (e.g. Googlegroups), and Skype group chats. I’ve traced (and anonymised) a couple of threads by hand – the main point is the type of information that people post rather than who and what it is – but whilst I was doing it I realised there were mining tools that could help with this, and with my usual problem of not being able to find information on X in a Googlegroup after it’s posted (no, the search doesn’t…

Data Science

Project idea: visualising crisismapping categories

I’m a crisismapper. I’ve seen or worked on most crisis Ushahidi maps since January 2010, and I’ve watched the categories used on them split and evolve over time (from Haiti’s emergencies and public health to Snowmageddon’s problems / solutions and beyond). Cat from Humanity Road has kept a screenshot of the categories on each of the main deployments since then – when I saw it today, it reminded me immediately of work that Global Pulse did on visualising category evolution across news articles, with articles as nodes connected by subject, and coloured by main category, as discovered by gisting the articles. Except this time, we don’t have to guess the categories (although doing that later by mining text in the reports in each category could be fun). Each Ushahidi map comes with a set of categories – each report (piece of geolocated information) is tagged by the categories it belongs…