Software

This is not my journey

I spent some of my Christmas break thinking about work styles: what worked last year, what didn’t, and what I could do to improve my own.  I’ve got it down to just two things: “this is not my journey” and “do what the boss asks for”. People often talk of their jobs (and themselves) as something that they do now, as in at one particular point in time. That’s a little like saying “I’m in seat 29C” instead of “I’m flying from New York to Japan and when I get there I’m going to try out the heated toilet seats” when someone asks you where you are.  We are all on journeys – sometimes literally, but always on journeys through time, careers, relationships.  And if you want to think about your career, a journey is a useful idea. So last year I got really frustrated because I ended up doing…

Data Science

Web Scraping, part 1: files and APIs

Web scraping is extracting information from webpages, usually (but not always) as tables of data that you can save to csv files, json/xml files or databases. Design it first, then scrape it When you start on any piece of code, try asking yourself some design questions first; definitely do this if you’re thinking about something as potentially complex as web scraping code.   So you’ve seen a dataset hiding in a website – it might be a table of data that you need, or lists of data, or data spread across multiple pages on the site. Here are some questions for you: 1. Do you need to write scraper code at all? Is the dataset very small – if you’re talking about 10 values that don’t get updated, writing a scraper will take longer than just typing all the numbers into a spreadsheet. Has the site owner made this data…