Software

Fake News Isn’t About Truth, It’s About Gaming Belief Systems

[cross-post from Medium] Thinking about #fakenews. Starting with “what is it”. * We’re not dealing with truth here: we’re dealing with gaming belief systems. That’s what fake news does (well, one of the things; another thing it does is make money from people reading it), and just correcting fake news is aiming at the wrong thing. Because… * Information leaves traces in our heads, even when we know what’s going on. If I jokingly tell you that I’ve crashed your car, then go ‘ha ha’, you know that I didn’t crash your car, but I’ve left a trace in your head that I’m an unsafe driver. The bigger the surprise of the thing you initially believe, the bigger the trace it leaves (this is why I never make jokes like that). * That’s important because #fakenews isn’t about the thing that’s being said. It’s about the things that are being…

Software

The Internet is made of beliefs

[cross-post from Medium] “Most people don’t have the time or headspace to handle IW: we’re going to need to tool up. Is not much, but I’m talking next month on belief, and how some of the pre-big-data AI tools and verification methods we used in mapping could be useful in this new (for many) IW world… am hoping it sparks a few people to build stuff.” - me, whilst thoroughly lost somewhere in Harlem. Dammit. I’ve started talking about belief and information warfare, and my thoughts looked half-baked and now I’m going to have to follow through. I said we’d need to tool up to deal with the non-truths being presented, but that’s only a small part of the thought. So here are some other thoughts. 1) The internet is also made of beliefs. The internet is made of many things: pages and comment boxes and ports and protocols and tubes…

Augmented Intelligence

The Ethics of Algorithms

I opened a discussion on the ethics of algorithms recently, with a small thing about what algorithms are, what can be unethical about them and how we might start mitigating that. I kinda sorta promised a blogpost off that, so here it is. Algorithm? Wassat? Let’s start by demystifying this ‘algorithm’ thing. An algorithm (from Al-Khwārizmī, a 9th-century Persian mathematician) is a sequence of steps to solve a problem. Like the algorithm to drink coffee is to get a mug, add coffee to the mug, put the mug to your mouth, and repeat. An algorithm doesn’t have to be run on a computer: it might be the processes that you use to run a business, or the set of steps used to catch a train. But the algorithms that the discussion organizers were concerned about aren’t the ones used to not spill coffee all over my face….

Software

Infosec, meet data science

I know you’ve been friends for a while, but I hear you’re starting to get closer, and maybe there are some things you need to know about each other. And since part of my job is using my data skills to help secure information assets, it’s time that I put some thoughts down on paper… er… pixels. Infosec and data science have a lot in common: they’re both about really really understanding systems, and they’re both about really understanding people and their behaviors, and acting on that information to protect or exploit those systems.  It’s no secret that military infosec and counterint people have been working with machine learning and other AI algorithms for years (I think I have a couple of old papers on that myself), or that data scientists and engineers are including practical security and risk in their data governance measures, but I’m starting to see more profound crossovers between the two….

Software

Notes from John Sarapata’s talk on online responses to organised adversaries

John Sarapata (@JohnSarapata) = head of engineering at Jigsaw  (= new name for Google Ideas).  Jigsaw = “the group at Google that tries to help users facing organized violence and oppression”.  A common thread in their work is that they’re dealing with the outputs from organized adversaries, e.g. governments, online mobs, extremist groups like ISIS. One example project is redirectmethod.org, which looks for people who are searching for extremist connections (e.g. ISIS) and shows them content from a different point of view, e.g. a user searching for travel to Aleppo might be shown realistic video of conditions there. [IMHO this is a useful application of social engineering in a clear-cut situation; threats and responses in other situations may be more subtle than this (e.g. what does ‘realistic’ mean in a political context?).] The Jigsaw team is looking at threats and counters at 3 levels of the tech stack: device/user: activities are consume and create content; threats include attacks by governments, phishing, surveillance,…

Software

Why am I writing about belief?

[Cross-post from LinkedIn] I’ve been meaning to write a set of sessions on computational belief for a while now, based on the work I’ve done over the years on belief, reasoning, artificial intelligence and community beliefs. With all that’s happening in our world now, both online and in the “real world”, I believe that the time has come to do this. We could start with truth. We often talk about ‘true’ and ‘false’ as though they’re immovable things: that every statement should be able to be assigned one of these values. But it’s a little more complicated than that. What we see as ‘true’ is often the result of a judgement we made, given our perception and experience of the world, that a belief is close enough to certain to be ‘true’. But what if there are no objective truths? In robotics, we talk about “ground truth” and the “god’s…

ICanHazDatascience

Data Science Tools, or what’s in my e-backpack?

One of the infuriating (and at the same time, strangely cool) things about development data science is that you quite often find yourself in the middle of nowhere with a job to do, and no access to the Internet (although this doesn’t happen as often as many Westerners think: there is real internet in most of the world’s cities, honest!). Which means you get to do your job with exactly what you remembered to pack on your laptop: tools, code, help files, datasets and academic papers. This is where we talk about the tools. The DS4B toolset. This is the tools list for the DS4B course (if you’re following the course, don’t panic: install notes are here). Offline toolset: Already on the machine: Terminal window; Calculator; Anaconda: Jupyter notebooks; Python (version 3); R (need to add this to Anaconda); Rstudio; OpenRefine; D3 libraries; Tabula; Excel or LibreOffice (opensource equivalent); QGIS (Mac users: note the separate instructions on this!); GDAL…

ICanHazDatascience

Data Science Ethics [DS4B Session 1e]

This is what I usually refer to as the “Fear of God” section of the course… Ethics. Most university research projects involving people (aka “human subjects”) have to write an ethics statement, and adhere to both that statement and an overarching ethics framework, e.g. “The University has an ethical commitment to minimize the risks to research subjects and to ensure that individuals who participate in research projects conducted under its auspices… do so voluntarily and with an informed understanding of what their involvement will mean”. Development data scientists are not generally subject to ethics reviews, but that doesn’t mean we shouldn’t also ask ourselves the hard questions about what we’re doing with our work, and the people that it might affect. At a minimum, if you make data public, you have a responsibility, to the best of your knowledge, skills, and advice, to do no harm to the people connected to that data. Data…

ICanHazDatascience

Writing a problem statement [DS4B Session 1d]

Data work can sometimes seem meaningless.  You go through all the training on cool machine learning techniques, find some cool datasets to play with, run a couple of algorithms on them, and then.  Nothing. That sinking feeling of “well, that was useless”. I’m not discouraging play. Play is how we learn how to do things, where we can find ideas and connect more deeply to our data.  It can be a really useful part of the “explore the data” part of data science, and there are many useful playful design activities that can help with “ask an interesting question”.   But data preparation and analysis takes time and is full of rabbitholes:  interesting but time-consuming things that aren’t linked to a positive action or change in the world. One thing that helps a lot is to have a rough plan: something that can guide you as you work through a data…

ICanHazDatascience

Data Science is a Process [DS4B Session 1c]

People often ask me how they can become a data scientist. To which my answers are usually ‘why’, ‘what do you want to do with it’ and ‘let’s talk about what it really is’.  So let’s talk about what it really is.  There are many definitions of data science, e.g.: “A data scientist… excels at analyzing data, particularly large amounts of data, to help a business gain a competitive edge.” “The analysis of data using the scientific method” “A data scientist is an individual, organization or application that performs statistical analysis, data mining and retrieval processes on a large amount of data to identify trends, figures and other relevant information.” We can spend hours debating which definition is ‘right’, or we could spend those hours looking at what data scientists do in practice, getting some tools and techniques under our belts and finding a definition that works for each one of us personally….