Some thoughts from working in data-driven startups.
Data scientists, like consultants, aren’t needed all the time. You’re very useful as the data starts to flow and the business questions start to gel, but before that there are other skills more needed (like coding); and afterwards there’s a lull before your specialist skills are a value-add again.
There is no data fiefdom. Everyone in the organization handles data: your job is to help them do that more efficiently and effectively, supplementing with your specialist skills when needed, and getting out of the way once people have the skills, experience and knowledge to fly on their own. That knowledge should include knowing when to call in expert help, but often the call will come in the form of happening to walk past their desk at the right time.
Knowing when to get involved is a delicate dance. Right at the start, a startup is (or should be) still working out what exactly it’s going to do, feeling its way around the market for the place where its skills fit; even under Lean, this can resemble a snurfball of oscillating ideas, out of which a direction will eventually emerge: it’s too easy to constrain that oscillation with the assumptions and concrete representations that you need to make to do data science. You can (and should) do data science in an early startup, but it’s more a delicate series of nudges and estimates than hardcore data work. As the startup starts to gel around its core ideas and creates code is a good time to join as a data scientist because you can start setting good behavior patterns for both data management and the business questions that rely on it, but beware that it isn’t an easy path: you’ll have a lot of rough data to clean and work with, and spend a lot of your life justifying your existence on the team (it helps a lot if you have a second role that you can fall back on at this stage). Later on, there will be plenty of data and people will know that you’re needed, but you’ll have lost those early chances to influence how and what data is stored, represented, moved, valued and appraised, and how that work links back (as it always should) to the startup’s core business.
There are typically 5 main areas that the first data nerd in a company will affect and be affected by (h/t Carl Anderson from Warby Parker ): data engineering, supporting analysts, supporting metrics, data science and data strategy. These are all big, 50-person-company areas that will grow out of initial seeds: data engineering, analyst support, metric support, data science and data governance.
Data engineering is the business of making data available and accessible. That starts with the dev team doing their thing with the datastores, pipelines, APIs etc needed to make the core system run. It’ll probably be some time before you can start the conversation about a second storage system for analysis use (because nobody wants the analysts slowing down their production system) so chances are you’ll start by coding in whatever language they’re using, grabbing snapshots of data to work on ’til that happens.
To start with, there will also be a bunch of data outside the core system, that parts of the company will run on ’til the system matures and includes them; much of it will be in spreadsheets, lists and secondary systems (like CRMs). You’ll need patience, charm, and a lot of applied cleaning skills to extract these from busy people and make the data in them more useful to them. Depending on the industry you’re in, you may already have analysts (or people, often system designers, doing analysis) doing this work. Your job isn’t to replace them at this; it’s to find ways to make their jobs less burdensome, usually through a combination of applied skills (e.g. writing code snippets to find and clean dirty datapoints), training (e.g. basic coding skills, data science project design etc), algorithm advice and help with more data access and scaling.
Every company has performance metrics. Part of your job is to help them be more than window-dressing, by helping them link back to company goals, actions and each other (understanding the data parts of lean enterprise/ lean startup helps a lot here, even if you don’t call it that); it’s also to help find ways to measure non-obvious metrics (mixtures, proxies etc; just because you can measure it easily doesn’t make it a good metric; just because it’s a good metric doesn’t make it easy to measure).
Data science is what your boss probably thought you were there to do, and one of the few things that you’ll ‘own’ at first. If you’re a data scientist, you’ll do this as naturally as breathing: as you talk to people in the company, you’ll start getting a sense of the questions that are important to them; as you find and clean data, you’ll have questions and curiosities of your own to satisfy, often finding interesting things whilst you do. Running an experiment log will help here: for each experiment, what was the business need, the dataset, what did you do, and most importantly, what did you learn from it. That not only frames and leaves a trail for someone understanding your early decisions; it will also leave a blueprint for other people you’re training in the company on how data scientists think and work (because really you want everyone in a data-driven company to have a good feel for their data, and there will be a *lot* of data). Some experiments will be for quick insights (e.g. into how and why a dataset is dirty); others will be longer and create prototypes that will eventually become either internal tools or part of the company’s main systems; being able to reproduce and explain your code will help here (e.g. Jupyter notebooks FTW).
With data comes data problems. One of these is security; others are privacy, regulatory compliance, data access and user responsibilities. These all fall generally under ‘data governance’. You’re going to be part of this conversation, and will definitely have opinions on things like data anonymisation, but early on, much of it will be covered by the dev team / infosec person, and you’ll have to wait for the right moment to start conversations about things like potential privacy violations from combining datasets, data ecosystem control and analysis service guarantees. You’ll probably do less of this part at first than you thought you would.
Depending on where in the lifecycle you’ve joined, you’re either trying to make yourself redundant, or working out how you need to build a team to handle all the data tasks as the company grows. Making yourself redundant means giving the company the best data training, tools, and structural and scientific head start you can; leaving enough people, process and tech behind to know that you’ve made a positive difference there. Building the team means starting by covering all the roles (it helps if you’re either a ‘unicorn’ — a data scientist who call fill all the roles above — or pretty close to being one; a ‘pretty white pony’ perhaps) and gradually moving them either fully or partially over to other people you’ve either brought in or trained (or both). Neither of these things is easy; both are tremendously rewarding and will grow your skills a lot.
Last part (of 4) from a talk I gave about hacking belief systems.
Who can help?
My favourite quote is “ignore everything they say, watch everything they do”. It was originally advice to girls about men, but it works equally well with both humanity and machines.
If you want to understand what is happening behind a belief-based system, you’re going to need a contact point with reality. Be aware of what people are saying, but also watch their actions; follow the money, and follow the data, because everything leaves a trace somewhere if you know how to look for it (looking may be best done as a group, not just to split the load, but also because diversity of viewpoint and opinion will be helpful).
Verification and validation (V&V) become important. Verification means going there. For most of us, verification is something we might do up front, but rarely do as a continuing practice. Which, apart from making people easy to phish, also makes us vulnerable to deliberate misinformation. Want to believe stuff? You need to do the leg-work of cross-checking that the source is real, finding alternate sources, or getting someone to physically go look at something and send photos (groups like findyr still do this). Validation is asking whether this is a true representation of what’s happening. Both are important; both can be hard.
One way to start V&V is to find people who are already doing this, and learn from them. Crisismappers used a bunch of V&V techniques, including checking how someone’s social media profile had grown, looking at activity patterns, tracking and contacting sources, and not releasing a datapoint until at least 3 messages had come in about it. We also used data proxies and crowdsourcing from things like satellite images (spending one painful Christmas counting and marking all the buildings in Somalia’s Afgooye region), automating that crowdsourcing where we could.
Another group that’s used to V&V is the intelligence community, e.g. intelligence analysts (IAs). As Heuer puts it in the Psychology of Intelligence Analysis (see the CIA’s online library), “Conflicting information of uncertain reliability is endemic to intelligence analysis, as is the need to make rapid judgments on current events even before all the evidence is in”. Both of these groups (IAs and crisismappers) have been making rapid judgements about what to believe, how to handle deliberate misinformation, what to share and how to present uncertainty in it for years now. And they both have toolsets.
Is there a toolset?
The granddaddy of belief tools is statistics. Statistics helps you deal with and code for uncertainty; it also makes me uncomfortable, and makes a lot of other people uncomfortable. My personal theory on that is it’s because statistics tries to describe uncertainty, and there’s always that niggling feeling that there’s something else we haven’t quite considered when we use do this.
Statistics is nice once you’ve pinned down your problem and got it into a nicely quantified form. But most data on belief isn’t like that. And that’s where some of the more interesting qualitative tools come in, including structured analytics (many of which are described in Heuer’s book “structured analytic techniques”). I’ll talk more about these in a later post, but for now I’ll share my favourite technique: Analysis of Competing Hypotheses (ACH).
ACH is about comparing multiple hypotheses; these might be problem hypotheses or solution hypotheses at any level of the data science ladder (e.g. from explanation to prediction and learning). The genius of ACH isn’t that it encourages people to come up with multiple competing explanations for something; it’s in the way that a ‘winning’ explanation is selected, i.e. by collecting evidence *against* each hypothesis and choosing the one with the least disproof, rather than trying to find evidence to support the “strongest” one. The basic steps (Heuer) are:
Hypothesis. Create a set of potential hypotheses. This is similar to the hypothesis generation used in hypothesis-driven development (HDD), but generally has graver potential consequences than how a system’s users are likely to respond to a new website, and is generally encouraged to be wilder and more creative.
Evidence. List evidence and arguments for each hypothesis.
Diagnostics. List evidence against each hypothesis; use that evidence to compare the relative likelihood of different hypotheses.
Refinement. Review findings so far, find gaps in knowledge, collect more evidence (especially evidence that can remove hypotheses).
Inconsistency. Estimate the relative likelihood of each hypothesis; remove the weakest hypotheses.
Sensitivity. Run sensitivity analysis on evidence and arguments.
Conclusions and evidence. Present most likely explanation, and reasons why other explanations were rejected.
This is a beautiful thing, and one that I suspect several of my uncertainty in AI colleagues were busy working on back in the 80s. It could be extended (and has been); for example, DHCP creativity theory might come in useful in hypothesis generation.
What can help us change beliefs?
Which us brings us back to one of the original questions. What could help with changing beliefs? My short answer is “other people”, and that the beliefs of both humans and human networks are adjustable. The long answer is that we’re all individuals (it helps if you say this in a Monty Python voice), but if we view humans as systems, then we have a number of pressure points, places that we can be disrupted, some of which are nicely illustrated in Boyd’s original OODA loop diagram (but do this gently: go too quickly with belief change, and you get shut down by cognitive dissonance). We can also be disrupted as a network, borrowing ideas from biostatistics, and creating idea ‘infections’ across groups. Some of the meta-level things that we could do are:
Teaching people about disinformation and how to question (this works best on the young; we old ‘uns can get a bit set in our ways)
Machines have a belief system either because we hard-wire that system to give a known output for a known input (all those if-then statements), or because they learn it from data. We touched on machine learning in part 1 of this series (remember the racist chatbot?), and on how it’s missing the things like culture, error-correction and questioning that we install in small humans. But before we get to how we might build those in, let’s look at the basic mechanics of machine learning.
At the heart of machine learning are algorithms: series of steps that, repeated, create the ability to cluster similar things together, classify objects, and build inference chains. Algorithms are hackable too, and three big things tend to go wrong with them: inputs, models and willful abuse. Bias and missing data are issues on the input side: for instance, if an algorithm is fed racially-biased cause-to-effect inputs, it’s likely to produce a similarly-biased system, and although we can interpolate between data points, if there’s a demographic missing from our training data, we’re likely to misrepresent them.
And although we may appear to have algorithms that automatically turn training data into models, there are almost always human design decisions being used to tune them, from the interpretations that we make of inputs (e.g. mouseclicks are equivalent to interest in a website, or that we make assumptions like 1 phone per person in a country means that everyone has a phone, when it could mean, as is common in West Africa, that some people have multiple phones and others have none) to the underlying mathematical models that they use.
Back in the day, we used to try to build models of the world by hand, with assertions like “all birds can fly”, “birds have wings”, “fred is a bird so fred can fly”, and many arguments over what to do about penguins. That took time and created some very small and fragile systems, so now we automate a lot of that, by giving algorithms examples of input and output pairs, and leaving them to learn enough of a model that we get sensible results when we feed in an input it hasn’t seen before. But sometimes we build systems whose models don’t fit the underlying mechanics of the world that they’re trying to capture, either because we’re using the wrong type of model (e.g. trying to fit straight lines to a curve), the input data doesn’t cover the things that we need it to learn, the world has changed since we built the model, or any of a dozen other things have gone wrong.
Machines can also be lied to. When we build models from data, we usually assume that the data, although inexact, isn’t deliberately wrong. That’s not true any more: systems can be ‘gamed’ with bad data (either misinformation, e.g. accidentally, or disinformation, e.g. deliberately), with machine learning from data on the internet being especially susceptible to this.
And issues in machine belief aren’t abstract: as more systems are used to make decisions about people, Lazar’s catalogue of algorithm risks become important to consider in building any algorithm (these risks include manipulation, bias, censorship, privacy violations, social discrimination, property rights violations, market power abuses, cognitive effects and heteronomy, e.g. individuals no longer having agency over the algorithms influencing their lives).
Humans and machines can go wrong in similar ways (missing inputs, incorrect assumptions, models that don’t explain the input data etc), but the way that humans are wired adds some more interesting errors.
When we’re shown a new fact (e.g. I crashed your car), our brains initially load in that fact as true, before we consciously decide whether it’s true or not. That is itself a belief, but it’s a belief that’s consistent with human behaviour. And it has some big implications: if we believe, even momentarily, that something false is true, then by the time we reject it, it’s a) left a trace of that belief in our brain, and b) our brain has built a set of linked beliefs (e.g. I’m an unsafe driver) that it doesn’t remove with it. Algorithms can have similar memory trace problems; for instance, removing an input-output pair from a support vector machine is unlikely to remove its effect from the model, and unpicking bad inputs from deep learning systems is hard.
Humans have imperfect recall, especially when something fits their unconscious bias (try asking a US conservative about inauguration numbers). And humans also suffer from confirmation bias: they’re more likely to believe something if it fits their existing belief framework. Humans also have mental immune systems, where new fact that are far enough away from their existing belief systems are immediately rejected. I saw this a lot when I worked as an innovations manager (we called this the corporate immune system): if we didn’t introduce ideas very carefully and/or gradually, the cognitive dissonance in decision-makers’ heads was too great, and we watched them shut those ideas out.
We’re complicated. We also suffer from the familiarity backfire effect, as demonstrated by advertising: the more someone tries to persuade us that a myth is wrong, the more familiar we become with that myth, and more likely we are to believe it (familiarity and belief are linked). At the same time, we can also suffer from the overkill backfire effect, as demonstrated by every teenager on the planet ever, where the more evidence we are given against a myth, and the more complex that evidence is compared to a simple myth, the more resistant we are to believing it (don’t despair: the answer is to avoid focussing on the myth whilst focussing on more simple alternative explanations).
And we’re social: even when presented with evidence that could make us change our minds, our beliefs are often normative (the beliefs that we think other people expect us to believe, often held so we can continue to be part of a group), and resistance to new beliefs is part of holding onto our in-group identity. Counters to this are likely to include looking at group dynamics and trust, and working out how to incrementally change beliefs not just at the individual but also at the group level.
Human and machine beliefs have some common issues and characteristics. First, they have to deal with the frame problem: you can’t measure the whole world, so there are always parts of your system potentially affected by things you haven’t measured. This can be a problem; we spent a lot of the early 80s reasoning about children’s building blocks (Winston’s blocks world) and worrying about things like the “naughty baby problem”, where sometime during our careful reasoning about the world, an outside force (the “naughty baby”) could come into the frame and move all the block orders around.
We also have input issues, e.g. unknown data and uncertainty about the world. Some of that uncertainty is in the data itself; some of it is in the sources (e.g. other people) and channels (e.g. how facts are communicated); a useful trick from intelligence system design is to assess and score those 3 sources of uncertainty separately. Some uncertainty can be reduced by averaging: for instance, if you ask several farmers for the weight of a bull, each individual guess is likely to be wrong, but an average of the guesses is usually close to the real weight (this is how several cool tricks in statistics work).
And then there’s human bias. A long time ago, if you wanted to study AI, you inevitably ended up reading up on cognitive psychology. Big data didn’t exist (it did, but not applied to everything: it was mainly in things like real-time sonar processing systems), the Cyc system was a Sisyphean attempt to gather enough ‘facts’ to make a large expert system, and most of our efforts revolved around how to make sense of the world given a limited number of ‘facts’ about it (and also how to make sense of the world, given a limited number of uncertain data points and connections between them). And then the internet, and wikipedia, and internet 2.0 where we all became content providers, and suddenly we had so much data that we could just point machine learning algorithms at it, and start finding useful patterns.
We relied on two things: having a lot of data, and not needing to be specific: all we cared about was that, on average, the patterns worked. So we could sell more goods because the patterns that we used to target the types of goods to the types of people most likely to buy them, with the types of prompts that they were most likely to respond to. But we forgot the third thing: that almost all data collection has human bias in it — either in the collection, the models used or the assumptions made when using them. And that is starting to be a real problem for both humans and machines trying to make sense of the world from internet inputs.
And sometimes, beliefs are wrong
Science is cool — not just because blowing things up is acceptable, but also because good science knows that it’s based on models, and actively encourages the possibility that those models are wrong. This leads to something called the Kuhn Cycle: most of the time we’re building out a model of the world that we’ve collectively agreed is the best one available (e.g. Newtonian mechanics), but then at some point the evidence that we’re collecting begins to break that collective agreement, and a new stronger model (e.g. relativity theory) takes over, with a strong change in collective belief. It usually takes an intelligent group with much humility (and not a little arguing) to do this, so it might not apply to all groups.
And sometimes people lie to you. The t-shirt above was printed to raise funds for the Red Cross after the Chile 2010 earthquake. It was part of a set of tweets sent pretending to be from a trapped mother and child, and was believed by the local fire brigade, who instead found an unaffected elderly couple at that address. Most crisismapping deployments have very strong verification teams to catch this sort of thing, and have been dealing with deliberate misinformation like this during crises since at least 2010. There’s also been a lot of work on veracity in data science (which added Veracity as a fourth “V” to “Volume, Velocity, Variety”).
And sometimes the person lying is you: if you’re doing data science, be careful not to deceive people by mistake, e.g. by not starting your axes at zero (see above), you can give readers a false impression of how significant differences between numbers are.
This is part of a talk on hacking belief systems using old AI techniques. The first part covered human beliefs, machine beliefs, and deep learning and the internet as belief-based systems.
Lean and Agile are based on belief
Most system development I see now uses agile principles: short, iterative cycles of development, and an emphasis on learning, preferably alongside the system’s users, and adapting their systems to increase some measure of customer or user value. Lean is a similar idea at the company level, based around learning quickly about what works through prototyping and measurement (aka the “build-measure-learn” cycle). Lean Startup is the small-scale version of this; Lean Enterprise is the large-organisation one.
I’m going to focus on Lean Enterprise because it’s more complex. In an established organization, Lean’s usually a combination of learning and adapting existing business, and exploring new business ideas (with innovations people bridging across these). Core ideas in this include purpose, customer value and incremental learning.
Purpose is the ‘why’ of an organization, and acts as the common vision that everything in the organisation works towards (Daniel Pink’s videodescribes this well).
Customer value is what the customer needs instead of the things we think they’d like to have. Concentrating on customer value reduces waste in the organisation.
Iterative learning is about learning in small increments so we can adapt quickly when markets or needs change, and don’t bet too many resources on an idea that might not work out.
The build-measure learn cycle is at the heart of this. We start with ideas, build the smallest thing that we can learn something useful about them from (this might be code, but it might also be wireframes, screenshots, a pitch, a talk), test it, measure the results of our tests, learn from them, adapt our idea and iterate again.
And at the heart of this cycle are value hypotheses: testable guesses about the problems that we think we’re trying to solve (“our mailing lists are too general”), and the solutions that we think could solve them (“more people will vote for us if we target our advertisements to them”). We make each guess, often framed as a “null hypothesis” (a statement that what we’re seeing in our results is probably due to random chance rather than the effect that we were hoping to see), then try to prove or disprove it.
There’s a lot of design thinking behind hypothesis generation, and there’s a whole different post to write on how hypotheses breed smaller hypotheses and form hierarchies and structure, but the point here is that a hypothesis is an expression of a belief, and that belief and its management is at the core of Lean.
Lean value trees link beliefs together
A lean value tree (LVT) links an organisation’s purpose to its beliefs, metrics (the ways it measures itself), hypotheses and experiments. The layers in the tree build on each other, and are:
The organisation’s purpose (aka mission): who is it, and what its vision is for itself. This is a stable belief, owned by the organisation’s executive.
Goals that support the purpose, framed as desired, stable and measurable outcomes.
Strategic bets on those goals: different ways that we could satisfy a goal, with metrics associated with them. Stable in the medium term, owned below executive level.
Promises of value: value-driven initiatives against those bets. Stable in the medium-term, owned by the product team.
Hypotheses: measurable, testable guesses that support or refute the promises of value. Experiments on these are very short term (often quick and dirty) ways to learn about the problem space and potential solutions, and are owned by the development team.
Beliefs (hypotheses) are built into the tree, but belief is also built into how we manage the tree. One layer is called “strategic bets” because organisations rarely have the resources (teams, time, money) to do everything on their LVTs, and need to prioritise where they put those resources, by allocating them to the more ‘important’ things on the tree, where “important” is belief-based: a combination of beliefs in how big the market connected to a bet is, and how much value (financial or otherwise) is in that market.
Hypothesis driven development is an extension of the arc from test driven development to behavior driven development: its purpose is to learn, and to experiment quickly with ideas. Before a team tests a HDD hypothesis, it fills out a card (like the one above) describing the work done, the changes that we expect from that work, and a success condition (a “signal”) that the team believes will happen if the experiment succeeds. Writing the success condition (e.g. “1% increase in donations”) before testing is important because it a) forces the team to concentrate on what they would accept as an outcome, and b) stops them fitting the test to the results they obtained (which could perhaps be called “optimism driven development”).
LVTs are useful models for talking not only about what we want within our own organisations, but also for testing our assumptions and beliefs about the goals and actions of other organisations, including organisations competing with us in the same markets.
We’re really optimising on other peoples’ beliefs
Lean practitioners talk a lot about customer value, and about including users in its feedback loops (either directly, or by tracking the changes in user behavior after changing the system). This optimizes systems not on an organisation’s beliefs, but on its beliefs about other peoples’ beliefs, filtered through those other peoples’ behavior. This is why things like good survey design and execution are difficult, with many types of observer effect (e.g. people doing and saying what they think you want to hear, responding differently to different ways of asking etc) that need to be allowed for.
But this isn’t just about understanding customer beliefs; it’s also about using that knowledge to predict and adjust customer behavior. Data scientists have a diagram for that, and talk about their work as a continuum from traditional statistics summarising data about events that have already happened (descriptive and diagnostic analytics), to predicting what might happen and how we could influence that (predictive and prescriptive analytics). But there is another box at the top of this descriptive-diagnostic-predictive-prescriptive diagram, describing a feedback loop where we learn and adapt our environment given that learning. This box is where the belief hackers, the roboticists and the lean enterprise people live.
This is part 1 of a talk I gave recently on hacking belief systems using old AI techniques.
I have a thing about belief systems. Not religious belief, but what it means to believe something, when accuracy does and doesn’t matter, and why “strongly held belief” is a better concept to aim at than “true”. It’s something I’ve been thinking about for a while, and I believe (!) it’s a conversation we need to have both as technologists who base many of our work decisions on beliefs, and often work on a belief-based system (the internet), and in the current climate of “fake news”, uncertainty and deliberate disinformation.
I care about belief because I care a lot about autonomy: the ways that humans and machines can work together, sharing control, responsibility and as teams, and we don’t talk enough yet about the human-machine sharing of belief systems and cultures that go with that. I also work in two professions that are fundamentally based on belief: data science, and lean/agile system development, and care about two topics that badly need a better theory of belief: that deep learning systems are doubling-down on human inequality and bias, and that America’s collective belief systems are currently being ‘hacked’. If that isn’t enough, there’s also a smaller interest, that Aspergers and neurotypical people hold the world in their minds differently, and whether that could be useful.
There are many definitions of belief, and many interpretations of those definitions. For this talk, I looked primarily at three old definitions: doxa, faith-based division of ‘facts’ into ‘true’ or ‘false’ (doxa is the root word of ‘orthodoxy’); pistis, which is more nuanced, and allows for evidence and different degrees of confidence (pistis is the root word of epistemology), and normative belief: what you think other people expect you to believe. Each of these exist in both humans and systems, and each can be manipulated (albeit often in different ways).
You don’t escape the machine
We need to find a way to frame human and machine belief in the same way, so we can talk about their joint effects and manipulations, and how to apply theories about one to the other. We can frame our discussion of human beliefs in the same way that we talk about robots; humans as self-contained units that rely entirely on their sensors (things that gather information about the world, e.g. eyes) and effectors (things that can change the state of the world, e.g. hands), and build belief-based models of their world based on those interactions.
There are many other philosophies about what it means to be human, but the one I’m holding here is that we don’t escape the machine: we will probably never totally know our objective truths, we’re probably not in the matrix, but we humans are all systems whose beliefs in the world are completely shaped by our physical senses, and those senses are imperfect. We’ll rarely have complete information either (e.g. there are always outside influences that we can’t see), so what we really have are very strong to much weaker beliefs.
There are some beliefs that we accept as truths (e.g. I have a bruise on my leg because I walked into a table today), but mostly we’re basing what we believe on a combination of evidence and personal viewpoint, e.g. “it’s not okay to let people die because they don’t have healthcare”, where that personal viewpoint is formed from earlier evidence and learning.
First technique: uncertain reasoning
So now we have humans and bots framed the same way, let’s look at some of the older ‘bot reasoning systems that might help. First up are belief networks (also known as Bayesian networks, probabilistic networks etc).
Let’s look at the structure of these networks first. This is the classic tiniest network: a way of talking about the interactions between wet grass, rainfall and a sprinkler system (which may or may not be on). The arrows (and not every probabilistic network had arrows) show causality: if it rains, the grass gets wet; if the sprinkler is on, the grass gets wet, if it rains, the sprinkler is less likely to be on. And with this tiny network, we can start to talk about interactions between beliefs that don’t just follow the arrows, e.g. if the grass is wet, why?
This is how we used to do things before the explosion of data that came with the Internet, and everyone and everything generating data across it. Artificial Intelligence was about reasoning systems, where we tried to replicate either experts (e.g. “expert systems”) or the beliefs than an average rational person would hold (e.g. “normative belief”). Without enough data to throw at a deep learning system and allow the system to work out connections, we focussed on building those connections ourselves. And frankly, most expert systems were pretty dumb (“if x then y” repeated, usually with a growing morass of edge cases); the interesting work happened where people were thinking about how to handle uncertainty in machine reasoning.
Here’s that network again, but in its full form, with the interactions between nodes. Each of those links applies Bayes’ theory to describe how our belief in the state of the thing at the end of a set of links, e.g. wet grass, depends on the states of the things at the starts of the links, e.g. sprinkler and rain (aside: I first learnt about this network on the week that I also learnt that running across an American lawn at 7am in summertime could get you soaked). These things chain together, so for instance if I have a weather forecast, I can give a belief about the (future) grass being wet via my belief in (future) rain.
Back in the day there were many uncertain reasoning techniques that we could slot into networks like this: Bayesian probability theory, multi-state logics (e.g. (true, false, unknown) as three possible values for a ‘fact’), modal logics (reasoning about ‘necessary’ and ‘possible’), interval probabilities, fuzzy logic, possibility theory, interval probabilities, but eventually Bayesian probability theory became dominant in the way we thought about probability (and as far as I know, is the only uncertainty theory taught on most data science courses beyond classical frequentist probability). These techniques have influenced other potentially useful tools like structured analytics, and it might be useful to reexamine some of them later.
Meanwhile, even a small network like this raises lots of questions: what about other reasons for wet grass, what if we’re wrong, what if it’s a really important decision (like surgery) etc etc etc. There are framings for these questions too (the frame problem, risk-averse reasoning etc), that could be useful for one of my starting problems: deep learning systems doubling-down on human inequality and bias.
Learning = forming beliefs
We’ve talked a bit about human belief, and about machine representations of uncertainty and belief. The next part of the human-bot connection is to explore how computers have beliefs too, how we’re not thinking about those belief sets in the same way that we think about human beliefs, and about how we really should start to do that.
We train our algorithms like we train our children: in most machine learning, we give them examples of right and wrong; in some machine learning, we correct them when their conclusions aren’t sound. Except we don’t do all the things that we do with children when we train a machine: with children, we install culture, correct ‘errors’ and deviations from that, and parents from the pistis traditions do many other things to help children question their environment and conclusions.
And this lack of the other things is starting to show, and really starting to matter, whether it be researchers touting their great classification scores without wondering if there’s a racial bias in sentencing in China, chatbots mirroring our worst online selves, or race bias being built into government systems that make decisions about people’s lives from education to health to prison sentences all over the USA. We’re starting to talk about the problems here, but we also need to talk about the roots and mitigations of those problems, and some of those might come from asking what the algorithmic equivalent of a good childhood learning is. I don’t have a good answer for that yet, but gut feel is that autonomy theory could be part of that response.
Beliefs can be shared
In machine learning we’re often talking about a single set of beliefs from a single viewpoint as a good representation of the world. When we look at humans, we talk a lot more about other peoples’ beliefs, and the interactions between their belief systems and viewpoints.
When we talk about belief sharing, we’re talking about communication: the ways in which we pass or share beliefs. And communication is totally and routinely hackable, either in-message through the messages used, messages adapted, subtexts created, or by adapting communications channels and using old-school network effects like simple and complex contagion.
As technologists, we spend a lot of our time talking about other people’s beliefs. But we also need to talk about what it means to understand what someone else’s beliefs are. When we talk about other people’s beliefs, we’re really talking about our beliefs about their beliefs which may be offered through a communication channel (e.g. surveys) which is subject to their beliefs about our beliefs and how we’re likely to react to them. This is important, complicated, and something that I had many fascinating meetings about when I worked on intelligence analysis… which also has a set of techniques that could help.
Humans and machines can share beliefs too. There’s a whole field of study and practice (autonomy theory) on how machines (originally UAVs) and humans share control, authority, viewpoint and communication. PACT is one of the early models — a relative youngster at about 15–20 years old; with the rise of things like self-driving cars, new models mirroring this work are appearing, but although there are discussions about sensing and action, there hasn’t been much yet on shared belief and culture.
Yes, culture, because when you mix humans and bots in UAV teams, the same cultural issues that happen when you mix, say, Italian and American pilots in a team, start to happen between the bots and the humans: the bots’ social rules differ, their responses differ and building shared mental models can become strained. When we look at the internet as a mixed human-machine belief system, this earlier work might become important.
Beliefs can be hacked
You can hack computer beliefs; you can also hack human beliefs. Here’s a quote from a recent article about Trump’s election campaign. This looks suspiciously like a computer virus infecting and being carried by humans in their belief systems, with network contagion and adaptation (spreading and learning). We have a crossover with the bots too, with sensors (Facebook likes) and effectors (adverts): this takes the work beyond the simple network effects of memes and phemes (memes carrying false information).
Cambridge Analytica’s system was large but apparently relatively simple, from the outside looking like an adapted marketing system; mixing that with belief and autonomy theories could produce something much more effective, albeit very ethically dubious.
Even when you know better, it’s hard to manage the biases created by someone hacking your belief system. This is a classic optical illusion (Muller-Lyer); I know that the horizontal lines are the same length, I’ve even measured them, but even knowing that, it’s still hard to get my brain to believe it.
Optical illusions are a fun brain hack, and one that we can literally see, but there are subtle and non-visual brain hacks that are harder to detect and counter. We’ll look at some of those later, but one insidious one is the use of association and memory traces: for example, if I borrow your car, tell you as a joke that I’ve crashed it, and immediately tell you it’s a joke, the way that your brain holds new information (by first importing it as true and then denying it — see Daniel Gilbert’s work for more on this) means that you not only keep a little trace of me having crashed your car in your memory, you also keep associated traces of me being an unsafe driver and are less likely to lend me your car again. This is a) why I don’t make jokes like that, and b) the point of pretty much any piece of advertising content ever (“you’re so beautiful if you have this…”).
The Internet is made of belief
The internet is made of many things: pages and and comment boxes and ports and protocols and tubes (for a given value of ‘tubes’). But it’s also made of belief: it’s a virtual space that’s only tangentially anchored in reality, and to navigate that virtual space, we all build mental models of who is out there, where they’re coming from, who or what to trust, and how to verify that they are who they say they are, and what they’re saying is true (or untrue but entertaining, or fantasy, or… you get the picture).
Location independence makes verification hard. The internet is (mostly) location-independent. That affects perception: if I say my favorite color is green, then the option to physically follow me and view the colours I like is only available to a few people; others must either follow my digital traces and my friends’ traces (which can be faked), believe me or decide to hold an open mind until more evidence appears.
When the carrier that we now use for many of our interactions with the world is so easily hacked (for both human and computer beliefs), we need to start thinking of counters to those hacks. These notes are a start at trying to think about what those counters might sensibly be.
Cambridge Analytica basically used customer segmentation and targeting: standard advertising stuff (and some cynicism about that: iirc one of the other campaigns ditched them) that will probably become standard for campaigns if it hasn’t already (full disclosure: am helping out on a campaign). Cool (if unethical) use of surveys as probes though. Get the feeling they didn’t do as much as they could have done, but that was enough. Not sure how I feel about gaming elections right now: part of me says bad, another part that it’s politics as normal, just scaled and personalised.
Meanwhile, on the Democratic side, big data seems to be a problem. We need to fix this. So repeat after me “big data is not data science”. Get the data, study the data, but understand that exploring data is just part of the arc between questions and storytelling, that humans are complex and working with them is about both listening and persuading.
It’s hard to argue with people if you don’t know where they’re coming from. One way is to ask: engage with people who are vehemently disagreeing with you, find out more about them as people, about their environments and motives. Which definitely should be done, but it also helps to do some background reading…
The Guardian’s started in on this: a round-up of 5 non-liberal articles every week, complete with backgrounder on each author and why the article is important. It doesn’t hurt that some of these authors are friends of friends and therefore maybe approachable with some questions. It’s also worth checking out things like BlueFeed RedFeed.
I’ve taken some flack lately for trying to understand Trump supporters. I’m slowly coming round to amending that to trying to understand Trump voters — especially the ones who voted with their noses held.
Us vs Them
I still can’t be pulled into “them vs us”. God it would be easier sometimes to build barricades and see ‘them’ as evil people supporting an evil leader, but truth is we’re all human, and unless we get out of this together, it’s going to be way beyond hard to get out at all.
I spent the lead-up to the election in a house in Trump country (neighbours with guns and banners about them having guns and I don’t mean in a jolly countryside lets bag some pheasants kinda way; Trump signs and flags everywhere including my local pub; that damned hat on people I’d spend time at the bar with) and whilst I wouldn’t necessarily ask for empathy (those flags were on some damn nice houses), I would ask for understanding the narratives and doing some deep soul-searching to see if any of them might be true, because that’s where we start the honest conversations about where we are today.
And yes, I’m going to fight by every nonviolent means possible too — I have much less to lose than others around me (older, no children, nobody who depends on me); I’m hoping that if enough of us fight that way, we’ll never have to fight in the alternative.
Could Ring Theory help?
I’ve been thinking a lot today about people, their interactions, margins, fairness and how to be a better ally, friend and compatriot (thank you Barnaby for making me think hard about this). We humans are complex beasts: en masse it’s hard to apply things like Ring Theory (the “comfort in, dump out” theory for comforting individuals and the layers of people around them that we often talked about in the Crisismappers’ Cancer Survival Chat — yes, there was such a thing, yes there were many people you wouldn’t expect in it, no I didn’t — I was there as an admin/ friend…) because everyone has different pain from different things, especially when that pain is being deliberately seeded in many areas at once. I’m not sure what the answer is, but mutual compassion, respect and walking in the other guys’ shoes have definitely got to be in there somewhere.
Bottom line: understand where other people are coming from. See if you can get them to understand where you are coming from too. Am not talking about the trolls (ignore them), but the people who are genuinely trying to argue with you…
Thinking about #fakenews. Starting with “what is it”.
* We’re not dealing with truth here: we’re dealing with gaming belief systems. That’s what fake news does (well, one of the things; another thing it does is make money from people reading it), and just correcting fake news is aiming at the wrong thing. Because…
* Information leaves traces in our heads, even when we know what’s going on. If I jokingly tell you that I’ve crashed your car, then go ‘ha ha’, you know that I didn’t crash your car, but I’ve left a trace in your head that I’m an unsafe driver. The bigger the surprise of the thing you initially believe, the bigger the trace it leaves (this is why I never make jokes like that).
* That’s important because #fakenews isn’t about the thing that’s being said. It’s about the things that are being implied. Always look for the thing being implied. That’s what you have to counter.
* Some of those things are, e.g. “Liberals are unpatriotic”. “Terrorists are a real and present threat *to you*”. Work out counters for these, and mechanisms for those counters. F’example: wearing US flags at protests and being loudly patriotic whilst standing up for basic rights is a good idea.
* Yes, straighten the record, but you’re not aiming at the person (or site) spouting fake news. What you *are* trying to change is their readers’ belief in whether something is true.
* America is a big country. Not everyone can go and see what’s true or not. Which means they have to trust someone else to go look for them. The Internet is even bigger. Some of the things on it (e.g. beliefs about other people’s beliefs) don’t have physical touchpoints and are impossible to confirm or deny as ‘truth’.
* Which means you’re trying to change the beliefs of large groups of people, who have a whole bunch of trust issues (both overtrust for in-group, and serious distrust of out-group people) and no direct proof.
* You know who else hacks trust and beliefs in large groups? Salesmen and advertisers. Learn from them (oh, and propagandists, but you might want to be careful what you learn there).
* People often hold conflicting beliefs in their heads (unless they’re Aspie: Aspies have a hard time doing this). Niggling doubts are levers, even when people are still being defensive and doubling-down on their stated beliefs. Look for the traces of these.
* But go gentle. Create too much cognitive dissonance, and people will shut down. Learn from the salesmen on this.
* People are more likely to trust people they know. Get to know the people whose beliefs you want to change (even if it means hanging out in conservative chat channels). Also know that your attention is a resource: learn to distinguish between people who are engaged and might listen (hint: they’re often the ones shouting at you), people who won’t, and sock puppets.
* More advertising tricks: look for influencers (not just on Twitter ‘cos it’s easy goddammit; check in the real world too). There’s only one you: use that you wisely.
* A field guide to earthlings (the Aspie reference)
* Social psychology: a very short introduction
“Most people don’t have the time or headspace to handle IW: we’re going to need to tool up. Is not much, but I’m talking next month on belief, and how some of the pre-big-data AI tools and verification methods we used in mapping could be useful in this new (for many) IW world… am hoping it sparks a few people to build stuff.” — me, whilst thoroughly lost somewhere in Harlem.
Dammit. I’ve started talking about belief and information warfare, and my thoughts looked half-baked and now I’m going to have to follow through. I said we’d need to tool up to deal with the non-truths being presented, but that’s only a small part of the thought. So here are some other thoughts.
1) The internet is also made of beliefs. The internet is made of many things: pages and and comment boxes and ports and protocols and tubes (for a given value of ‘tubes’). But it’s also made of belief: it’s a virtual space that’s only tangentially anchored in reality, and to navigate that virtual space, we all build mental models of who is out there, where they’re coming from, who or what to trust, and how to verify that they are who they say they are, and what they’re saying is true (or untrue but entertaining, or fantasy, or… you get the picture).
2) This isn’t new, but it is bigger and faster. The US is a big country; news here has always been either hyperlocal or spread through travelers and media (newspapers, radio, telegrams, messages on ponies). These were made of belief too. Lying isn’t new; double-talk isn’t new; what’s new here is the scale, speed and number of people that it can reach.
3) Don’t let the other guys frame your reality. We’re entering a time where misinformation and double-talk are likely to dominate our feeds, and even people we trust are panic-sharing false information. It’s not enough to pick a media outlet or news site or friend to trust, because they’ve been fooled recently too; we’re going to have to work out together how best to keep a handle on the truth. As a first step, we should separate out our belief in a source from our belief in a piece of information from them, and factor in our knowledge about their potential motivations in that.
4) Verification means going there. For most of us, verification is something we might do up front, but rarely do as a continuing practice. Which, apart from making people easy to phish, also makes us vulnerable to deliberate misinformation. We want to believe stuff? We need to do the leg-work of cross-checking that the source is real (crisismappers had a bunch of techniques for this, including checking how someone’s social media profile had grown, looking at activity patterns), finding alternate sources, getting someone to physically go look at something and send photos (groups like findyr still do this). We want to do this without so much work every time? We need to share that load; help each other out with #icheckedthis tags, pause and think before we hit the “share” button.
5) Actions really do speak louder than words. There will most likely be a blizzard of information heading our way; we will need to learn how to find the things that are important in it. One of the best pieces of information I’ve ever received (originally, it was about men) applies here: “ignore everything they say, and watch everything they do”. Be aware of what people are saying, but also watch their actions. Follow the money, and follow the data; everything leaves a trace somewhere if you know how to look for it (again, something that perhaps is best done as a group).
6) Truth is a fragile concept; aim for strong, well-grounded beliefs instead. Philosophy warning: we will probably never totally know our objective truths. We’re probably not in the matrix, but we humans are all systems whose beliefs in the world are completely shaped by our physical senses, and those senses are imperfect. We’ll rarely have complete information either (e.g. there are always outside influences that we can’t see), so what we really have are very strong to much weaker beliefs. There are some beliefs that we accept as truths (e.g. I have a bruise on my leg because I walked into a table today), but mostly we’re basing what we believe on a combination of evidence and personal viewpoint (e.g. “it’s not okay to let people die because they don’t have healthcare”). Try to make both of those as strong as you can.
I haven’t talked at all about tools yet. That’s for another day. One of the things I’ve been building into my data science practice is the idea of thinking through problems as a human first, before automating them, so perhaps I’ll roll these thoughts around a bit first. I’ve been thinking about things like perception, e.g. a camera’s perception of a car color changes when it moves from daylight to sodium lights, and adaptation (e.g. using other knowledge like position, shape and plates) and actions (clicking the key) and when beliefs do and don’t matter (e.g. they’re usually part of an action cycle, but some action cycles are continuous and adaptive, not one-shot things), how much of data work is based on chasing beliefs and what we can learn from people with different ways of processing information (hello, Aspies!), but human first here.
I opened a discussion on the ethics of algorithms recently, with a small thing about what algorithms are, what can be unethical about them and how we might start mitigating that. I kinda sorta promised a blogpost off that, so here it is.
Al-Khwarizmi (wikipedia image)
Let’s start by demystifying this ‘algorithm’ thing. An algorithm (from Al-Khwārizmī, a 9th-century Persian mathematician, above) is a sequence of steps to solve a problem. Like the algorithm to drink coffee is to get a mug, add coffee to the mug, put the mug to your mouth, and repeat. An algorithm doesn’t have to be run a computer: it might be the processes that you use to run a business, or the set of steps used to catch a train.
But the algorithms that the discussion organizers were concerned about aren’t the ones used to not spill coffee all over my face. They were worried about the algorithms used in computer-assisted decision making in things like criminal sentencing, humanitarian aid, search results (e.g. which information is shown to which people) and the decisions made by autonomous vehicles; the algorithms used by or instead of human decision-makers to affect the lives of other human beings. And many of these algorithms get grouped into the bucket labelled “machine learning”. There are many variants of machine learning algorithm, but generally what they do is find and generalize patterns in data: either in a supervised way (“if you see these inputs, expect these outputs; now tell me what you expect for an input you’ve never seen before, but is similar to something you have”), reinforcement-learning way (“for these inputs, your response is/isn’t good) or unsupervised way (“here’s some data; tell me about the structures you see in it”). Which is great if you’re classifying cats vs dogs or flower types from petal and leaf measurements, but potentially disastrous if you’re deciding who to sentence and for how long.
Simple neural network (wikipedia image)
Let’s anthropomorphise that. A child absorbs what’s around them: algorithms do the same. One use is to train the child/machines to reason or react, by connecting what they see in the world (inputs) with what they believe the state of the world to be (outputs) and the actions they take using those beliefs. And just like children, the types of input/output pairs (or reinforcements) we feed to a machine-learning based system affects the connections and decisions that it makes. Also like children, different algorithms have different abilities to explain why they made specific connections or responded in specific ways, ranging from clear explanations of reasoning (e.g. decision trees, which make a set of decisions based on each input) to something that can be mathematically but not cogently expressed (e.g. neural networks and other ‘deep’ learning algorithms, which adjust ‘weights’ between inputs, outputs and ‘hidden’ representations, mimicking the ways that neurons connect to each other in human brains).
Algorithms can be Assholes
Algorithms are behind a lot of our world now. e.g. Google (which results should you be shown), Facebook (which feeds you should see), medical systems detecting if you might have cancer or not. And sometimes those algorithms can be assholes.
Here are two examples: a Chinese program that takes facial images of ‘criminals’ and maps those images to a set of ‘criminal’ facial features that the designers claim have nearly 90% accuracy in determining if someone is criminal, from just their photo. Their discussion of “the normality of faces of non-criminals” aside, this has echoes of phrenology, and should raise all sorts of alarms about imitating human bias. The second example is a chatbot that was trained on Twitter data; the headline here should not be too surprising to anyone who’s recently read any unfiltered social media.
We make lots of design decisions when we create an algorithm. One decision is which dataset to use. We train algorithms on data. That data is often generated by humans, and by human decisions (e.g. “do we jail this person”), many of which are imperfect and biased (e.g. thinking that people whose eyes are close together are untrustworthy). This can be a problem if we use those results blindly, and we should always be asking about the biases that we might consciously or unconsciously be including in our data. But that’s not the only thing we can do: instead of just dismissing algorithm results as biased, we can also use them constructively, to hold a mirror up to ourselves and our societies, to show us things that we otherwise conveniently ignore, and perhaps should be thinking about addressing in ourselves.
In short, it’s easy to build biased algorithms with biased data, so we should strive to teach algorithms using ‘fair’ data, but when we can’t, we need to use other strategies for our models of the world, and can either talk about the terror of biased algorithms being used to judge us, or we can think about what they’re showing us about ourselves and our society’s decision-making, and where we might improve both.
What goes wrong?
If we want to fix our ‘asshole’ algorithms and algorithm-generated models, we need to think about the things that go wrong. There are many of these:
On the input side, we have things like biased inputs or biased connections between cause and effect creating biased classifications (see the note on input data bias above), bad design decisions about unclean data (e.g. keeping in those 200-year-old people), and missing whole demographics because we didn’t think hard about who the input data covered (e.g. women are often missing in developing world datasets, mobile phone totals are often interpreted as 1 phone per person etc).
On the algorithm design side, we can have bad models: lazy assumptions about what input variables actually mean (just think for a moment of the last survey you filled out, the interpretations you made of the questions, and how as a researcher you might think differently about those values), lazy interpretations of connections between variables and proxies (e.g. clicks == interest), algorithms that don’t explain or model the data they’re given well, algorithms fed junk inputs (there’s always junk in data), and models that are trained once on one dataset but used in an ever-changing world.
On the output side, there’s also overtrust and overinterpretation of outputs. And overlaid on that are the willful abuses, like gaming an algorithm with ‘wrong’ or biased data (e.g. propaganda, but also why I always use “shark” as my first search of the day), and inappropriate reuse of data without the ethics, caveats and metadata that came with the original (e.g. using school registration data to target ‘foreigners’).
But that’s not quite all. As with the outputs of humans, the outputs of algorithms can be very context-dependent, and we often make different design choices, depending on that context, for instance last week, when I found myself dealing with a spammer trying to use our site at the same time as helping our business team stop their emails going into customers’ spam filters. The same algorithms, different viewpoints, different needs, different experiences: algorithm designers have a lot to weigh up every time.
Things to fight
Accidentally creating a deviant algorithm is one thing; deliberately using algorithms (including well-meant algorithms) for harm is another, and of interest in the current US context. There are good detailed texts about this, including Cathy O’Neill’s work, and Latzer, who categorised abuses as:
Property right violations
Market power abuses
Cognitive effects (e.g. loss of human skills)
Heteronomy (individuals no longer have agency over the algorithms influencing them)
I’ll just note that these things need to be resisted, especially by those of us in a position to influence their propagation and use.
How did we get here?
Part of the issue above is in how we humans interface with and trust algorithm results (and there are many of these, e.g. search, news feed generators, recommendations, recidivism predictions etc), so let’s step back and look at how we got to this point.
And we’ve got here over a very long time: at least a century or two, to back when humans started using machines that they couldn’t easily explain because they could do tasks that had become too big for the humans. We automate because humans can’t handle the data loads coming in (e.g. in legal discovery, where a team often has a few days to sift through millions of emails and other organizational data); we also automate because we hope that machines will be smarter than us at spotting subtle patterns. We can’t not automate discovery, but we also have to be aware of the ethical risks in doing it. But humans working with algorithms (or any other automation) tend to go through cycles: we’re cynical and undertrust a system tip it’s “proved”, then tend to overtrust its results (these are both part of automation trust). In human terms, we’re balancing these things:
Missed patterns and overlooked details
Situation awareness loss (losing awareness because algorithms are doing processing for us, creating e.g. echo chambers)
Passive decision making
Discrimination, power dynamics etc
And there’s an easy reframing here: instead of replacing human decisions with automated ones, let’s concentrate more on sharing, and frame this as humans plus algorithms (not humans or algorithms), sharing responsibility, control and communication, between the extremes of under- and over-trust (NB that’s not a new idea: it’s a common one in robotics).
Things I try to do
Having talked about what algorithms are, what we do as algorithm designers, and the things that can and do go wrong with that, I’ll end with some of the things I try to do myself. Which basically comes down to consider ecosystems, and measure properly.
Considering ecosystems means looking beyond the data, and at the human and technical context an algorithm is being designed in. It’s making sure we verify sources, challenge both the datasets we obtain and the algorithm designers we work with, and have a healthy sense of potential risks and their management (e.g. practice both data and algorithm governance), and reduce bias risk by having as diverse (in thought and experience) a design team as we can, and access to people who know the domain we’re working in.
Measuring properly means using metrics that aren’t just about how accurately the models we create fit the datasets we have (this is too often the only goal of a novice algorithm designer, expressed as precision = how many of the things you labelled as X are actually X; and recall = how many of the things that are really X did you label as X?), but also metrics like “can we explain this” and “is this fair”. It’s not easy but the alternative is a long way from pretty.
I once headed an organisation whose only rule was “Don’t be a Jerk”. We need to think about how to apply “Don’t be a jerk” to algorithms too.