The Ethics of Algorithms

I opened a discussion on the ethics of algorithms recently, with a small thing about what algorithms are, what can be unethical about them and how we might start mitigating that. I kinda sorta promised a blogpost off that, so here it is.

Algorithm? Wassat?


Al-Khwarizmi (wikipedia image)

Let’s start by demystifying this ‘algorithm’ thing. An algorithm (from Al-Khwārizmī, a 9th-century Persian mathematician, above) is a sequence of steps to solve a problem. Like the algorithm to drink coffee is to get a mug, add coffee to the mug, put the mug to your mouth, and repeat. An algorithm doesn’t have to be run on a computer: it might be the processes that you use to run a business, or the set of steps used to catch a train.

But the algorithms that the discussion organizers were concerned about aren’t the ones used to not spill coffee all over my face. They were worried about the algorithms used in computer-assisted decision making in things like criminal sentencing, humanitarian aid, search results (e.g. which information is shown to which people) and the decisions made by autonomous vehicles; the algorithms used by or instead of human decision-makers to affect the lives of other human beings. Many of these algorithms get grouped into the bucket labelled “machine learning”. There are many variants of machine learning algorithm, but generally what they do is find and generalize patterns in data: either in a supervised way (“if you see these inputs, expect these outputs; now tell me what you expect for an input you’ve never seen before, but which is similar to something you have”), a reinforcement-learning way (“for these inputs, your response is/isn’t good”) or an unsupervised way (“here’s some data; tell me about the structures you see in it”). Which is great if you’re classifying cats vs dogs, or flower types from petal and leaf measurements, but potentially disastrous if you’re deciding who to sentence and for how long.
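To make the supervised case concrete, here’s a minimal sketch (assuming Python with scikit-learn and its bundled iris dataset, nothing from the original discussion) of “learn from labelled examples, then answer for an input you’ve never seen”:

```python
# Minimal supervised-learning sketch: learn flower types from petal/sepal
# measurements, then predict for a measurement the model has never seen.
# Assumes scikit-learn is installed; any simple classifier would do here.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KNeighborsClassifier(n_neighbors=5)  # "expect outputs like those of similar inputs"
model.fit(X_train, y_train)                  # learn from labelled input/output pairs

print("accuracy on unseen flowers:", model.score(X_test, y_test))
print("prediction for a new measurement:", model.predict([[5.1, 3.5, 1.4, 0.2]]))
```

The same learning step applied to sentencing data would look just as innocuous, which is rather the point.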


Simple neural network (wikipedia image)

Let’s anthropomorphise that. A child absorbs what’s around them; algorithms do the same. We train both child and machine to reason or react by connecting what they see in the world (inputs) with what they believe the state of the world to be (outputs), and with the actions they take using those beliefs. And just like children, the types of input/output pairs (or reinforcements) we feed to a machine-learning based system affect the connections and decisions that it makes. Also like children, different algorithms have different abilities to explain why they made specific connections or responded in specific ways, ranging from clear explanations of reasoning (e.g. decision trees, which make a set of decisions based on each input) to something that can be expressed mathematically but not cogently (e.g. neural networks and other ‘deep’ learning algorithms, which adjust ‘weights’ between inputs, outputs and ‘hidden’ representations, mimicking the ways that neurons connect to each other in human brains).
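That explainability gap is easy to see in code. A rough sketch (again assuming scikit-learn; the models and data are illustrative only): a small decision tree can print its reasoning as readable rules, while a neural network’s ‘reasoning’ is just arrays of learnt weights.

```python
# Decision tree vs neural network: same data, very different explanations.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree))                 # human-readable if/else rules

net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000).fit(X, y)
print([w.shape for w in net.coefs_])     # mathematically precise, but not a cogent explanation
```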

Algorithms can be Assholes

Algorithms are behind a lot of our world now: Google (which search results you’re shown), Facebook (which posts appear in your feed), medical systems detecting whether you might have cancer. And sometimes those algorithms can be assholes.


Headlines (screenshots)

Here are two examples. The first is a Chinese program that takes facial images of ‘criminals’ and maps them to a set of ‘criminal’ facial features which, the designers claim, determine with nearly 90% accuracy whether someone is a criminal, from just their photo. Their discussion of “the normality of faces of non-criminals” aside, this has echoes of phrenology, and should raise all sorts of alarms about imitating human bias. The second is a chatbot that was trained on Twitter data; the headline here should not be too surprising to anyone who’s recently read any unfiltered social media.

We make lots of design decisions when we create an algorithm, and one of them is which dataset to train it on. That data is often generated by humans and by human decisions (e.g. “do we jail this person”), many of which are imperfect and biased (e.g. believing that people whose eyes are close together are untrustworthy). That’s a problem if we use the results blindly, and we should always be asking about the biases we might consciously or unconsciously be building into our data. But that’s not the only thing we can do: instead of just dismissing algorithm results as biased, we can also use them constructively, as a mirror held up to ourselves and our societies, showing us things that we otherwise conveniently ignore and should perhaps be addressing in ourselves.

In short, it’s easy to build biased algorithms with biased data, so we should strive to train algorithms on ‘fair’ data. When we can’t, we need other strategies for our models of the world: we can either talk about the terror of biased algorithms being used to judge us, or we can think about what they’re showing us about ourselves and our society’s decision-making, and where we might improve both.

What goes wrong?

If we want to fix our ‘asshole’ algorithms and algorithm-generated models, we need to think about the things that go wrong.  There are many of these:

  • On the input side, we have things like biased inputs, or biased connections between cause and effect, creating biased classifications (see the note on input data bias above); bad design decisions about unclean data (e.g. keeping in those 200-year-old people); and missing whole demographics because we didn’t think hard about who the input data covered (e.g. women are often missing in developing-world datasets, and mobile phone totals are often interpreted as one phone per person). A sketch of the kind of cheap checks that surface some of these problems follows this list.
  • On the algorithm design side, we can have bad models: lazy assumptions about what input variables actually mean (think for a moment about the last survey you filled out, the interpretations you made of its questions, and how you as a researcher might read those answers differently), lazy interpretations of connections between variables and proxies (e.g. clicks == interest), algorithms that don’t explain or model the data they’re given well, algorithms fed junk inputs (there’s always junk in data), and models that are trained once on one dataset but used in an ever-changing world.
  • On the output side, there’s also overtrust and overinterpretation of outputs. And overlaid on that are the willful abuses, like gaming an algorithm with ‘wrong’ or biased data (e.g. propaganda, but also why I always use “shark” as my first search of the day), and inappropriate reuse of data without the ethics, caveats and metadata that came with the original (e.g. using school registration data to target ‘foreigners’).
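As mentioned in the first bullet above, many of the input-side problems can at least be surfaced with very simple checks before any training happens. A hedged sketch, with an entirely hypothetical file and column names, assuming pandas:

```python
# Cheap input-side sanity checks: impossible values, demographic coverage,
# and a proxy assumption worth questioning. Column names are hypothetical.
import pandas as pd

df = pd.read_csv("survey_data.csv")  # hypothetical input file

# Impossible values: nobody in this dataset should be 200 years old.
implausible = df[(df["age"] < 0) | (df["age"] > 110)]
print(f"{len(implausible)} rows with implausible ages")

# Coverage: who is missing? A heavily skewed split is a warning, not a fact
# about the world to silently generalise from.
print(df["gender"].value_counts(normalize=True, dropna=False))

# Proxies: "one phone per person" is an assumption, not a measurement.
print("households with more phones than members:",
      (df["num_phones"] > df["household_size"]).mean())
```

None of this replaces thinking about where the data came from, but it catches the obvious problems early.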

But that’s not quite all. As with the outputs of humans, the outputs of algorithms can be very context-dependent, and we often make different design choices depending on that context. Last week, for instance, I found myself dealing with a spammer trying to abuse our site at the same time as helping our business team stop their emails going into customers’ spam filters. The same algorithms, different viewpoints, different needs, different experiences: algorithm designers have a lot to weigh up every time.

Things to fight

Accidentally creating a deviant algorithm is one thing; deliberately using algorithms (including well-meant algorithms) for harm is another, and of particular interest in the current US context. There are good detailed texts about this, including Cathy O’Neil’s work, and Latzer, who categorised abuses as:

  • Manipulation
  • Bias
  • Censorship
  • Privacy violations
  • Social discrimination
  • Property right violations
  • Market power abuses
  • Cognitive effects (e.g. loss of human skills)
  • Heteronomy (individuals no longer have agency over the algorithms influencing them)

I’ll just note that these things need to be resisted, especially by those of us in a position to influence their propagation and use.

How did we get here?

Part of the issue above is in how we humans interface with and trust algorithm results (and there are many of these, e.g. search, news feed generators, recommendations, recidivism predictions etc), so let’s step back and look at how we got to this point.

And we’ve got here over a very long time: at least a century or two, back to when humans started using machines they couldn’t easily explain, because those machines could do tasks that had become too big for humans. We automate because humans can’t handle the data loads coming in (e.g. in legal discovery, where a team often has a few days to sift through millions of emails and other organizational data); we also automate because we hope that machines will be smarter than us at spotting subtle patterns. We can’t not automate discovery, but we also have to be aware of the ethical risks in doing it. And humans working with algorithms (or any other automation) tend to go through cycles: we’re cynical and undertrust a system until it’s “proved”, then tend to overtrust its results (both are parts of automation trust). In human terms, we’re balancing these things:

More human:

  • Overload
  • Incomplete coverage
  • Missed patterns and overlooked details
  • Stress

More automation:

  • Overtrust
  • Situation awareness loss (losing awareness because algorithms are doing processing for us, creating e.g. echo chambers)
  • Passive decision making
  • Discrimination, power dynamics etc

And there’s an easy reframing here: instead of replacing human decisions with automated ones, let’s concentrate more on sharing, and frame this as humans plus algorithms (not humans or algorithms), sharing responsibility, control and communication, between the extremes of under- and over-trust (NB that’s not a new idea: it’s a common one in robotics).

Things I try to do

Having talked about what algorithms are, what we do as algorithm designers, and the things that can and do go wrong with that, I’ll end with some of the things I try to do myself. They basically come down to two things: consider ecosystems, and measure properly.

Considering ecosystems means looking beyond the data, at the human and technical context an algorithm is being designed in. It means verifying sources, challenging both the datasets we obtain and the algorithm designers we work with, keeping a healthy sense of potential risks and their management (e.g. practising both data and algorithm governance), and reducing bias risk by having as diverse a design team (in thought and experience) as we can, with access to people who know the domain we’re working in.

Measuring properly means using metrics that aren’t just about how accurately the models we create fit the datasets we have (too often the only goal of a novice algorithm designer, expressed as precision: how many of the things you labelled as X are actually X, and recall: how many of the things that are really X did you label as X), but also metrics like “can we explain this?” and “is this fair?”. It’s not easy, but the alternative is a long way from pretty.
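As a small illustration of measuring beyond fit (assuming scikit-learn; the labels and the protected-group column are made up), precision and recall are one line each, and even a crude fairness check is barely more:

```python
# Precision, recall, and a crude per-group fairness check.
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])         # what really happened
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])         # what the model said
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])  # e.g. a protected attribute

# Of the things we labelled X, how many really are X? Of the real Xs, how many did we find?
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))

# "Is this fair?" at its bluntest: does the model hand out positive labels
# (e.g. "high risk") at very different rates for different groups?
for g in np.unique(group):
    print(g, "positive rate:", y_pred[group == g].mean())
```

Equal positive rates aren’t the whole of fairness, but wildly unequal ones are worth explaining.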

I once headed an organisation whose only rule was “Don’t be a Jerk”.  We need to think about how to apply “Don’t be a jerk” to algorithms too.

Strata talk on hunchworks technology

I try not to put too much dayjob stuff here, but sometimes I need to leave less-tidy breadcrumbs for myself.  Here’s the 10-minute (ish) talk I gave at Strata New York this year.

Intro

I’m Sara Farmer, and I’m responsible for technology at Global Pulse. This brings its own special issues.  One thing we’re passionate about is making systems available that help the world. And one thing we’ve learnt is that we can’t do that without cooperation both within and outside the UN.  We’re here to make sure analysts and field staff get the tools that they need, and that’s a) a lot of system that’s needed, and b) something that organisations across the whole humanitarian, development and open data space need too.

<Slide 1: picture of codejammers>

We’re talking to those other organisations about what’s needed, and how we can best pool our resources to build the things that we all need.  And the easiest way for us to do that is to release all our work as open-source software, and act as facilitators and connectors for communities rather than as big-system customers.

<Slide 2: open-source issues>

Open source isn’t a new idea in the UN – UNDP and UNICEF, amongst others, have trailblazed for us, and we’re grateful to have learnt from their experience in codejams, hackathons and running git repositories and communities. And it’s a community of people like Adaptive Path and open-source coders that makes Hunchworks happen; I’d like to publicly thank them, and our small tech team, for their dedication. We have more work to do, not least in building a proper open innovations culture across the UN (we’ve codenamed this “Blue Hacks”) and in selecting licensing models that allow us to share data and code whilst meeting the UN’s core mandate, but in this pilot project it’s working well.

<Slide 3: bubble diagram with other UN systems we’re connecting to>

We’re already building out the human parts of the system (trust, groups etc), but Hunchworks doesn’t work in isolation: it needs to be integrated into a wider ecosystem of existing and new processes, users and technologies. The humanitarian technology world is changing rapidly, and we’ve spent a lot of time thinking about what it’s likely to look like both now and in the near future.

So. For Hunchworks to succeed, it must connect to four other system types.

We need users, and we need to build up our user population quickly. We’re talking about lab managers, development specialists, mappers, policy coordinators, local media and analysts. Those users will also need specialist help from other users in areas like maps, crowdsourcing, data and tools curation.

  • UN Teamworks – this is a Drupal-based content management system that a UNDP New York team has created to connect groups of UN people with each other, and with experts beyond the UN.
  • Professional networking systems like LinkedIn that many crisis professionals are using to create networks.
  • Note that some of our users will be bots – more on this in a minute.

Users will need access to data, or (in the case of data philanthropy), to search results.

  • UN SDI – a project creating standards and gazetteers for geospatial data across the UN.
  • CKAN  – data repository nodes, both for us and from open data initiatives.
  • Geonode, JEarth et al – because a lot of our data is geospatial.

They’ll need tools to help them make sense of that data. And bots to do some of that automatically.

  • We need toolboxes – ways to search through tools in the same way that we do already with data.  We’re talking to people like Civic Commons about the best ways to build these.
  • We’re building apps and plugins where we have to, but we’re talking about organisations putting in nodes around the world, so we’re hunting down open source and openly available tools wherever we can. We’re waiting for our first research projects to finish before we finalise our initial list, but we’re going to at least need data preparation, pattern recognition, text analysis, signal processing, graphs, stats, modelling and visualisation tools.
  • Because we want to send hunchworks instances to the back of beyond, we’re also including tools that could be useful in a disaster – like Ushahidi, Sahana, OpenStreetMap and Google tools.
  • And there are commercial tools and systems that we’re going to need to interface with too. We’re talking about systems like Hunch and a bunch of other suppliers that we’ll be talking to once we get the panic of our first code sprints out of the way.

And they need a ‘next’, a way to spur action, to go with the knowledge that Hunchworks creates.

  • We’re adding tools for this too. And also connecting to UN project mapping systems:
  • UN CRMAT – risk mapping and project coordination during regional development
  • UN CIMS – project coordination during humanitarian crises, an extension of the 3W (who, what, where) idea.

Which is a big vision to have and a lot to do after our first releases next spring. And yet another reason why we’re going to need to do all the partnering and facilitation that we can.

<Slide 4: algorithms list>

So. You’ve seen how we’ve designed Hunchworks to help its users work together on hunches. But Hunchworks is more than just a social system, and there are a lot of algorithms needed to make that difference.  We have to design and implement the algorithms that make Hunchworks smart enough to show its users the information that is relevant to them when they need it (also known as looking for all the boxes marked “and then a miracle happens”).

And the first algorithm needs are these:

  • Similarity and complementarity metrics. We need to work on both of these. There’s a lot of work out there on how things are similar, but much less on how people and their skills can complement each other. We’ve been looking at things like robot team theories, autonomy and human-generated team templates as baselines for this.
  • Relevance. And for that, read “need some interesting search algorithms”. We’re looking into search, but we’re also looking at user profiling and focus of attention theories, including how to direct users’ peripheral attention onto things that are related to a hunch that they’re viewing.
  • Credibility. We’d like to combine all the information we have about hunches and evidence (including user support) into estimates of belief for each hunch, that we can use as ratings for hunches, people and evidence sources. There’s work in uncertain reasoning, knowledge fusion and gamification that could be helpful here, and there are some excellent examples already out there on the internet. As part of this, we’re also looking at how Hunchworks can be mapped onto a reasoning system, with hunches as propositions in that system. Under “everything old is new again”, we’re interested in how that correlates to 1980s reasoning systems too.
  • Hunch splitting, merging and clustering. We need to know when hunches are similar enough to suggest merging or clustering them. We’d also like to highlight when a hunch’s description and attached evidence deviate far enough from its original description to consider splitting it into a group of related hunches. Luckily, one of our research projects has addressed exactly this problem – an example of how our internal algorithm needs are often the same as the users’ tool needs – and we’re looking into how to adapt it. (A rough similarity-based merge-suggestion sketch follows this list.)
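To make the first of these concrete, here’s a purely illustrative sketch of how a similarity metric might start out: compare hunch descriptions with TF-IDF and cosine similarity, and flag pairs above a threshold as merge or cluster candidates. The hunch texts and the threshold are invented for the example; the real metric would need evidence, users and tuning.

```python
# Illustrative hunch-similarity sketch: TF-IDF + cosine similarity over
# hunch descriptions, flagging likely merge/cluster candidates.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

hunches = [
    "Food prices rising unusually fast in region X",
    "Sudden spike in staple food prices reported in region X",
    "Increase in anti-vaccination rumours on local radio",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(hunches)
sims = cosine_similarity(tfidf)

MERGE_THRESHOLD = 0.3  # arbitrary; would need tuning against real hunches
for i in range(len(hunches)):
    for j in range(i + 1, len(hunches)):
        if sims[i, j] > MERGE_THRESHOLD:
            print(f"suggest merging hunches {i} and {j} (similarity {sims[i, j]:.2f})")
```

TF-IDF is obviously a starting point rather than an answer; evidence, user support and domain vocabulary would all change the picture.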

Mixing human insight with big data results. One of the things that makes Hunchworks more than just a social system is the way that we want to handle big data feeds. We don’t think it’s enough to give analysts RSS feeds or access to tools, and we’re often working in environments where time is our most valuable commodity.   The big question is how we can best combine human knowledge and insight with automated searches and analysis.

Let’s go back to Global Pulse’s mission.  We need to detect crises as they unfold in real time, and alert people who can investigate them further and take action to mitigate their effects.  It’s smart for us to use big data tools to detect ‘data exhaust’ from crises.  It’s smart for us to add human expertise to hunches that something might be happening.  But it’s smarter still for us to combine these two information and knowledge sources into something much more powerful.

We’ve argued a lot about how to do this, but the arguments all seem to boil down to one question: “do we treat big data tools as evidence or users in Hunchworks”?  If we treat big data tools as evidence, we have a relatively easy life – we can rely on users to use the tools to generate data that they attach to hunches, or can set up tools to add evidence to hunches based on hunch keywords etc.  But the more we talked about what we wanted to do with the tools, from being able to create hunches automatically from search results to rating each tool’s effectiveness on a given type of hunch, the more they started sounding like users.

So we’ve decided to use bots. Agents. Intelligent agents. Whatever you personally call a piece of code that’s wrapped in something that observes its environment and acts on those observations, we’re treating them as a special type of Hunchworks user. And by doing that, we’ve given bots the right to post hunches when they spot interesting patterns, the ability to be rated on their results, and the ability to be useful members of human teams.
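Purely as an illustration of that framing (none of these names are real Hunchworks interfaces), a bot-as-user can be as simple as something that watches a feed and posts a hunch when a pattern crosses a threshold:

```python
# Illustrative bot-as-user sketch. The Hunch shape, feed format and threshold
# are hypothetical placeholders, not Hunchworks code.
from dataclasses import dataclass

@dataclass
class Hunch:
    title: str
    evidence: list
    posted_by: str  # bots get rated on their results, like any other user

class PriceSpikeBot:
    """Observes its environment (a price feed) and acts on what it sees."""

    def __init__(self, name, threshold=0.25):
        self.name = name
        self.threshold = threshold  # fractional price jump worth flagging

    def observe_and_act(self, prices_by_region):
        hunches = []
        for region, (last_month, this_month) in prices_by_region.items():
            change = (this_month - last_month) / last_month
            if change > self.threshold:
                hunches.append(Hunch(
                    title=f"Possible food price spike in {region}",
                    evidence=[{"metric": "price_change", "value": round(change, 2)}],
                    posted_by=self.name,
                ))
        return hunches

bot = PriceSpikeBot("price-watcher-bot")
print(bot.observe_and_act({"region X": (100.0, 140.0), "region Y": (100.0, 102.0)}))
```

Treating the bot as a user means the same rating, trust and team mechanics apply to it as to a human analyst.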

<Slide 5: System issues>

And now I’ll start talking about the things that are difficult for us. You’ve already seen that trust is incredibly important in the Hunchworks design. Whilst we have to build the system to enhance trust between users and users, users and bots, users and the system etc, we also have to build for what happens when that trust is broken. Yes, we need security.

We need security to ensure that hidden hunches are properly distanced from the rest of the system.  I’ve worked responses where people died because they were blogging information, and we need to minimise that risk where we can.

We also need ways to spot sock puppet attacks and trace their effects through the system when they happen. This is on our roadmap for next year.

And then we have localisation. The UN has six official languages (Arabic, Chinese, English, French, Russian and Spanish), but we’re going to be putting labs into countries all over the world, each with their own languages, data sources and cultural styles. We can’t afford to lose people because we didn’t listen to what they needed, and this is a lot of what the techs embedded in Pulse Labs will be doing. We’ll need help with that too.

<Slide 6: federation diagram>

Also on the ‘later’ roadmap is federation. We’re starting with a single instance of Hunchworks so we can get the user management, integration and algorithm connections right. But we’re part of an organisation that works across the world, including places where bandwidth is limited and sometimes non-existent, and mobile phones are more ubiquitous than computers. We’re also working out ways, as part of Data Philanthropy, to connect to appliances embedded within organisations that can’t, or shouldn’t, share raw data with us. All of which means a federated system of platforms, with all the synchronisation, timing and interface issues that entails.

We aren’t addressing federation yet, but we are learning a lot in the meantime about its issues and potential solutions from systems like Git and from our interactions with the crisismapper communities and other organisations with similar operating structures and connectivity problems. Crisismapping, for example, is teaching us a lot about how people handle information across cultures and damaged connections whilst under time and resource stress.

Okay, I’ve geeked out enough on you. Back to Chris for the last part of this talk.

Slideset

Focussed on the technical challenges of building Hunchworks… slide titles are:

  • Open source and the UN – pic of codejam – licensing models, working models for UN technology creation (h/t unicef and undp, big h/t adaptive path and the codejammers)
  • Integration – bubble diagram with the other UN systems that we’re connecting to (CRMA and CIMS for actions, TeamWorks for the user base, non-UN systems for tools – CKAN for data, CivicCommons for toolbag etc)
  • Algorithms – list of algorithms we’re designing, discussion of uncertainty handling, risk management, handling mixed human-bot teams, similarity and complementarity metrics (including using hand-built and learnt team templates) etc
  • Security – how to handle hidden hunches, incursions and infiltrations. Diagram of spreading infiltration tracked across users, hunches and evidences.
  • Localisation – wordcloud of world languages; discuss languages and the use of pulse labs
  • Federation – clean version of my federation diagram – describe first builds as non-federated, but final builds targeted at a federated system-of-systems, with some nodes having little or no bandwidth, and the synchronisation and interface needs that creates

Autonomy

I live in two worlds – the real and the virtual. To some extent we all do now – moving our personae from physical to virtual connections with barely a thought about what this means. But thinking a little more, it struck me that this might just be a good example of autonomy in action.

Autonomy also lives in two worlds. For many people, it means the ability to act independently of controls, to think and act for oneself – for others, especially those used to autonomous systems, it means robots and the subtle grades of control, responsibility and trust that their human operators (or, increasingly, team-mates) share with them.

I could get boring about PACT levels and other grading scales that range from the human having full control of a system’s actions to the system having full control. I could talk about Asimov’s laws, and the moral and legal minefield that starts to come into play when a robot – designed by humans, but with its reasoning maybe optimized by machines – climbs up the PACT scale and starts to report back rather than wait for a human decision. I could even talk about variable autonomy levels and what will happen when the human-machine responsibility balance shifts automatically to take account of their relative task loadings.

But I won’t. Because I think, to some extent, we’re already there. To me, autonomy isn’t really about putting smarter machines into the field, it’s about understanding that, once you accept systems as part of the team, you can get past the oh-Gad-it’s-Terminator moment and start thinking like a mixed human-machine team player instead of someone working with a load of humans who happen to have some whizzy toys at their disposal. And you know what falls out if you do this? Simplicity. Elegance. Because instead of going “ooh, wow, technology to replace the humans”, you play to all the team members’ strengths. Build systems and teams that use machines’ ability to sift and present data and do the mindless fusion tasks, but also use the humans’ strengths in pattern-finding, group reasoning and making sense of complex, uncertain situations.

And y’know what? This is happening every time my virtual self crawls around Twitter or Facebook. I’m using tools, I’m allowing my focus to be guided by filters and tags and ever-more-complex reasoning without entirely stopping to think about what this means (well, okay, I looked up the trending algorithms but hey), but I’m also adding my own human reasoning and focus on top, and using those machine steers to find and work with dozens of intersecting communities to report on and make sense of the world using yet more semi-autonomous tools. Which is a good thing. But in this landscape, with early AI techniques turning up in tools like RIF and the semantic web, and algorithm-based controls in our virtual lives being increasingly taken on trust, it’s maybe time to reflect on the subtleties and meaning of autonomy and how it really already applies to us all.

I meant to write a post about remembering to include the human in systems, and not being blinded by technology. I’m not sure I did that, but I enjoyed the journey to a different point all the same.

Postscript. I wondered this morning if a truer test of autonomy would be arguments – not gentle reasoning, but the type of discussion usually brought on when two people are both sure of completely opposite conclusions. Like, say, when map-reading together. And then an ‘aha’ moment. PACT and its kin aren’t really about the philosophical and moral nuances of sharing responsibility with machines, oh no. It’s simpler: they’re about machine courtesy – embedded politesse if you will. Or at least I would have reached that conclusion, but I think it’s your turn, gentle reader. No, no, after you…