Data Science

Basic Materials: Werdz an Regulah Expreshuns

[Cross-posted from ICanHazDataScience] Okay, that last post was a bit long for Emily… she fell asleep on my desk long before I’d finished typing. So today we’re back to short and practical. Data is not just numbers. Numbers are only one of the basic types of data that appear again and again in data science. Two of the other types are words (as in written text, like this blogpost) and networks (as in objects connected with links, like a diagram of your Twitter friends and your friends’ friends, etc). Today we’re looking at words. In the last post, I was looking at a set of online job descriptions. We’ll leave the basics of webpage scraping until later (but if you’re curious, ScraperWiki’s notes are good) and assume that what we have is a set of text files that we’ve used the “Processing all the teh Files in Directory” post with the commands fin…
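As a rough taste of what "words and regular expressions" can look like in practice, here is a minimal Python sketch that counts the words in a piece of text. It is an illustration only, not the code from the post: the function name `count_words`, the sample sentence and the pattern (runs of lowercase letters and straight apostrophes) are all assumptions, and real job descriptions would probably need a more careful pattern.

```python
import re
from collections import Counter


def count_words(text):
    """Count words in text (hypothetical helper, for illustration only)."""
    # Lowercase first so "Data" and "data" are counted together, then
    # treat each run of letters or straight apostrophes as one word.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Made-up sample text; in practice this would be the contents of one file.
sample = "Data is not just numbers. Data is words too."
counts = count_words(sample)
print(counts.most_common(3))
```

Using `Counter` keeps the sketch short; the same counts could be built with a plain dictionary.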

Data Science

Wut 2 Do Wif Data?

Data science is not about data. Data science is about insight: the knowledge and suggestions that you can glean by inspecting and using data. And that insight usually starts with a set of questions. Here are some examples, hopefully making you think a bit more about your own questions (which in Emily’s case are about the correlation between cuteness, cuddles and the amount of Meow Mix in her dish). You don’t always know what the good questions are, but you usually know (or pick) the framework that you’re asking them in. This is how I usually approach this:

1. Look at context, then ask question (or get question from user)
2. Get data
3. Phrase question in way that data can answer
4. Write down issues with data
5. Clean data
6. Investigate question
7. Check conclusions and possible issues with conclusions
8. Describe possible further investigations / data gathering

Which might mean improving on the data that…