… if you’re looking for my most recent work, check https://medium.com/@sarajayneterp
[Cross-post from Medium https://medium.com/misinfosec/disinformation-datasets-8c678b8203ba]
“Genius is Knowing Where To Look” (Einstein)
I’m often asked for disinformation datasets — other data scientists wanting training data, mathematician friends working on things like how communities separate and rejoin, infosec friends curious about how cognitive security hacks work. I usually point them at the datasets section on my Awesome Misinformation repo, which currently contains these lists:
- Nationstate-level social media messages: Twitter Election Integrity (state-backed infoops archives: regularly updated), Graphika Information Operations Archive (2018 Twitter and Reddit IRA datasets), 538 list of IRA tweets (2012–2018), Ushadrons (to 2018), Hamilton 2.0 (continuously updated), authoritarian interference tracker (regularly updated)
- Online advertisements: House intelligence facebook ads sample and House Intelligence Committee Facebook ads — static collections
- Bot/Cyborg/Troll accounts: botsentinel, probabot (rates accounts from bot to not) — continually updated
- Websites and articles: fake news challenge (articles: static collection), false, misleading, clickbaity and satirical ‘news’ sources (the original ‘Melissa’ list, 2016: static list), GDI dataset (domains, articles — regularly updated)
- Pre-2016 misinformation datasets: Pheme 8.2 annotated news corpus, Jonathan Albright’s datasets of bot posts (to 2016) (static collections)
That’s just the data that can be downloaded. There’s a lot of implicit disinformation data out there. For example groups like EUvsDisinfo, NATO Stratcom, OII Comprop all have structured data on their websites.
You’re not going to get it all
That’s a place to start, but there’s a lot more to know about disinformation data sources. One thing to know, as Lee Foster pointed out at Cyberwarcon last month, is that these datasets are rarely a complete picture of the disinformation around an event. Lee’s work is interesting. He did what many of us do: as soon as an event kicked off, his team started collecting social media data around it. What they did next was compare that dataset against the data officially released by the social media companies (Twitter, Facebook). There were gaps, big gaps, in the officially released data, which is understandable in a world where attribution is hard and campaigns work hard to include non-trollbots (aka ordinary people) in the spread of disinformation.
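That comparison is, at its simplest, a set difference. Here’s a minimal Python sketch of the idea, not Lee’s team’s actual method: it assumes both datasets share some stable message identifier, and the function name and IDs below are made up for illustration.

```python
def coverage_gap(collected_ids, released_ids):
    """Return the IDs we collected that are absent from the official release,
    plus the share of our own collection that they represent."""
    collected = set(collected_ids)
    released = set(released_ids)
    missing = collected - released
    share = len(missing) / len(collected) if collected else 0.0
    return missing, share


# Hypothetical message IDs, for illustration only.
collected = ["m1", "m2", "m3", "m4", "m5"]
released = ["m2", "m4", "m9"]

missing, share = coverage_gap(collected, released)
print(sorted(missing), share)
```

An ID missing from the official release isn’t proof of anything on its own; it’s a pointer to where a human should look next.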
Some people can get better data than others. For instance, Twitter’s IRA dataset has obfuscated user ids, but academics can ask for unobfuscated data. It’s worth asking, but also worth asking yourself about the limitations placed on you by things like non-disclosure agreements.
I’ve seen this before
So what happens is that people who are serious about this subject collect their own data, and lots of them collect data at the same time. That data sits on their personal drives (or somewhere online) whilst other researchers are scrabbling round for datasets on events that have passed. I’ve seen this before. Everything old is new again, and in this case it’s just what we saw with crisismapping data. There were people all over the world (locals, humanitarian workers, data enthusiasts etc) who had data that was useful in each disaster that hit, which meant that a large part of my work as a crisis data nerd was quietly, gently extracting that data from people and getting it to a place online where it could be used, in a form in which it could be found. We volunteers built an online repository, the Humanitarian Data Project, which informed the build of the UN’s repository, the Humanitarian Data Exchange. I also worked with the Humanitarian Data Language team on ways to meta-tag datasets so the data needed was easier to find. There’s a lot of learning in there to be transferred.
And labelled data is precious, so very precious
Disinformation is being created at scale, and at a scale beyond the ability of human tagging teams. That means we’re going to need automation (or rather augmentation — automating some of the tasks so the humans can still do the ‘hard’ parts of finding, labelling and managing disinformation campaigns, their narratives and artefacts). And to do that, we generally need datasets that are labelled in some way, so the machines can ‘learn’ from the humans (there are other ways to learn, like reinforcement learning, but I’ll talk about them another time). Unsurprisingly, there is very little in the way of labelled data in this world. The Pheme project labelled data; I helped with the Jigsaw project on a labelled dataset that was due for open release; I’ve also helped create labelling schemes for data at GDI, and am watching conversations about starting labelling projects at places like the Credibility Coalition.
That’s it — that’s a start on datasets for disinformation research. This is a living post, so if there are more places to look please tell me and I’ll update this and other notes.
When I talk about security going back to thinking about the combination of physical, cyber and cognitive, people sometimes ask me why now? Why, apart from the obvious weekly flurries of misinformation incidents, are we talking about cognitive security now?
Big, Fast, Weird
I usually answer with the three Vs of big data: volume, velocity, variety (the fourth V, veracity, is kinda the point of disinformation, so we’re leaving it out of this discussion).
- The internet has a lot of text data floating around it, but its variety isn’t just in all the different platforms and data formats needed to scrape or inject into it — it’s also in the types of information being carried. We’re way past the Internet 1.0 days of someone posting the sports scores online and a bunch of hackers lurking on bulletin boards: now everyone and their grandmother is here, and the (sniffable, actionable and adjustable) data flows include emotions, relationships, group sentiment (anyone thinking about market sentiment should be at least a little worried by now) and group cohesion markers.
- There’s a lot of it — volumes are high enough that brands and data scientists can spend their days doing social media analysis, looking at cliques, message spread, adaption and reach.
- And it’s coming in fast: so fast that an incident manager can do AB-testing on humans in real time, adapting messages and other parts of each incident to fit the environment and head towards incident goals faster, more efficiently etc. Ideally that adaptation is much faster than any response, which fits the classic definition of “getting inside the other guy’s OODA loop”.
NB The internet isn’t the only system carrying these things: we still have traditional media like radio, television and newspapers, but they’re each increasingly part of these larger connected systems.
So what next?
Another question I get a lot is “so what happens next”. Usually I answer that one by pointing people at two books: The Cuckoo’s Egg and Walking Wounded — both excellent books about the evolution of the cybersecurity industry (and not just because great friends feature in them), and say we’re at the start of The Cuckoo’s Egg, where Stoll starts noticing there’s a problem in the systems and tracking the hackers through them.
I think we’re getting a bit further through that book now. I live in America. Someone sees a threat here, someone else makes a market out of it. Cuddle-an-alligator — tick. Scorpion lollipops in the supermarket — yep. Disinformation as a service / disinformation response as a service — also in the works, as predicted for a few years now. Disinformation response is a market, but it’s one with several layers to it, just as the existing cybersecurity market has specialists and sizes and layers.
Markets: sometimes botany, sometimes agile
Frank is a very wise, very experienced friend (see books above), who calls our work on AMITT “botany” — building catalogs of techniques and counters slower than the badguys can maraud across our networks, when we really should be out there chasing them. He’s right. Kinda.
I read Adam Shostack’s slides on threat modelling in 2019 today. He talks about the difference between “waterfall” (STRIDE, kill chain etc) and “agile” threat modelling. I’ve worked on both: on big critical systems that used waterfall/“V” methods because you don’t really get to continuously rebuild an aircraft or ship design, and on agile systems that we trialled with and adapted to end-user needs. (I’ve also worked on lean production, where classically speaking, agile is where you know the problemspace and are iterating over solutions, and lean is iterations on both the problem and solution spaces. This will become important later). This is one of the splits: we’ll still need the slower, deliberative work that gives labels and lists defences and counters for common threats (the “phishing” etc equivalents of cognitive security), but we also need that rapid response to things previously unseen that keeps white-hat hackers glued to their screens for hours, and there’s a growing market too in tools to support them. (as an aside, I’m part of a new company, and this agile/ waterfall split finally gives me a word to describe “that stuff over there that we do on the fly”).
Also because I’m old I can remember when universities had no clue where to put their computer science group: it was sometimes in the physics department, sometimes engineering, or maths, or somewhere weirder still; later on, nobody quite knew where to put data scientists as they cut across disciplines and used techniques from wherever made sense, from art to hardcore stats. This market will shake out that way too. Some of the tools, uses and companies will end up as part of day-to-day infosec. Others will be market-specific (media and adtech are already heading that way); others again will meet specific needs on the “influence chain”, like educational tools and narrative trackers. Perhaps a good next post would be an emerging-market analysis?
At Truth&Trust Online, someone asked me about the overlaps between misinformation research and IoT security. There’s more than you’d think, and not just in the overlaps between people like Chris Blask who are working on both problem sets.
I stopped for a second, then went “Oh. I recognize this problem.” It’s exactly what we did with data and information fusion (and knowledge fusion too, but you know a lot of that now as just normal data science and AI). Basically it’s about what happens when you’re building situation pictures (mental models of what is happening in the world) based on data that’s come from people (the misinformation, or information fusion part) and things (the IoT, or data fusion part). And what we basically did last time was run both disciplines separately – the text analysis and reasoning in a different silo to the sensor-based analysis and reasoning – til it made sense to start combining them (which is basically what became information fusion). That’s how we got the *last* pyramid diagram – the DIKW model of data under information under knowledge under wisdom (sorry: it’s been changed to insight now), and similar ideas of transformations and transitions in information (in the Shannon information theory sense of the word) between layers.
We’ll probably do a similar thing now. Both disciplines feed into situation pictures; both can be used to support (or refute) each other. Both contain all the classics like protecting information CIA: confidentiality, integrity, availability. I tasked two people at the conference to start delving into this area further (and connect to Chris) – will see where this goes.
[Cross-post from Medium https://medium.com/misinfosec/short-thought-the-unit-should-be-person-not-account-81c48002aaa]
I’ve been thinking, inspired partly by Amy Zhang’s paper on mailing lists vs social media use https://twitter.com/amyxzh/status/1173812276211662848?s=20
We have a bunch of issues with online identity. Like, I have at least 20 different ways to contact some of my friends, spend half my life trying to separate out people being themselves from massive coordinated cross-platform campaigns, and have dozens of issues with privacy and openness (like do we throw a message into the infinite beerhall that’s twitter or deliberately email to just a few chosen peeps). How much of this has happened because our base unit of contact has changed from an individual human to an online account?
I’m wondering if there’s a way to switch that back again. Zeynep Tufekci said that people stayed on Facebook despite its shortcomings because that’s where the school emergency alerts, group organisation etc were. What if we could make those things platform-independent again? I mean we have APIs, yes? They’re generally broadcast, or broadcast-and-feedback, yes?
I guess this is two ideas. One is to challenge the idea that everything has to be instant-to-instant. Yeah, sure, we want to chat with our friends. But do we really need instant chat on everything? If we drop that, can we build healthier models?
The second idea is to challenge the account-as-user idea. Remember addressbooks? Like those real physical paper books that you listed your friends, family etc names, addresses, phone numbers, emails etc in? What if we had a system that went back to that, and when you sent a message to someone it went to their system of choice in your style of choice (dm, group, public etc). I get that you’re all unique etc, and I’m still cool with some of you having multiple personalities, but this 20 ways to contact a person — that’s got old, and fast.
The third (because who doesn’t like a fourth book in a trilogy) is to give people introvert time. Instead of having control over our electronic lives by putting down the electronics, have a master switch for “only my mother can contact me right now”.
[Cross-post from Medium https://medium.com/misinfosec/writing-about-countermeasures-1671d231e8a2]
The AMITT framework so far is a beautiful thing: we’ve used it to decompose different misinformation incidents into stages and techniques, so we can start looking for weak points in the ways that incidents are run, and in the ways that their component parts are created, used and put together. But right now, it’s still part of the “admiring the problem” collection of misinformation tools. To be truly useful, AMITT needs to contain not just the breakdown of what the blue team thinks the red team is doing, but also what the blue team might be able to do about it. Colloquially speaking, we’re talking about countermeasures here.
Go get some counters
Now there are several ways to go about finding countermeasures to any action:
- Go look at ones that already exist. We’ve logged a few already in the AMITT repo, against specific techniques — for example, we listed a set of counters from the Macron election team as part of incident I00022.
- Pick a specific tactic, technique or procedure and brainstorm how to counter it — the MisinfosecWG did this as part of their Atlanta retreat, describing potential new counters for two of the techniques on the AMITT framework.
- Wargame red v blue in a ‘safe’ environment, and capture the counters that people start using. The Rootzbook exercise that Win and Aaron ran at Defcon AI Village was a good start on this, and holds promise as a training and learning environment.
- Run a machine learning algorithm to generate random countermeasures until one starts looking more sensible/effective than the others. Well, perhaps not, but there’s likely to be some measure of automation in counters eventually…
Learn from the experts
So right. We get some counters. But hasn’t this all been done before? Like if we’re following the infosec playbook to do all this faster this time around (we really don’t have 20 years to get decent defences in place; we barely have 2…) then shouldn’t we look at things like courses of action matrices? Yes. Yes we should…
Courses of Action Matrix 
So this thing goes with the Cyber Killchain, the thing that we matched AMITT to. Down the left side we have the stages; 7 in this case; 12 in AMITT. Along the top we have six things we can do to disrupt each stage. And in each grid square, we have a suggestion of an action (I suspect there is more than one of these for each square) that we could take to cause that type of disruption at that stage. That’s cool. We can do this.
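To make the grid concrete, here’s a minimal Python sketch of a courses of action matrix as a nested dictionary. The stage names are the seven Cyber Killchain stages and the six actions are detect, deny, disrupt, degrade, deceive and destroy. The function names and the example entry are my own placeholders, not anything from the AMITT repo, and each cell holds a list because a square can have more than one suggestion.

```python
KILLCHAIN_STAGES = [
    "reconnaissance", "weaponization", "delivery", "exploitation",
    "installation", "command and control", "actions on objectives",
]
ACTIONS = ["detect", "deny", "disrupt", "degrade", "deceive", "destroy"]


def empty_matrix(stages, actions):
    """One list of suggested counters per (stage, action) grid square."""
    return {stage: {action: [] for action in actions} for stage in stages}


def add_counter(matrix, stage, action, suggestion):
    """Add a suggestion to a grid square, rejecting unknown stages or actions."""
    if stage not in matrix or action not in matrix[stage]:
        raise KeyError(f"unknown grid square: {stage!r} / {action!r}")
    matrix[stage][action].append(suggestion)


def empty_squares(matrix):
    """List the (stage, action) squares that have no suggestions yet."""
    return [(stage, action)
            for stage, row in matrix.items()
            for action, suggestions in row.items()
            if not suggestions]


matrix = empty_matrix(KILLCHAIN_STAGES, ACTIONS)
# Placeholder entry, for illustration only.
add_counter(matrix, "delivery", "detect", "placeholder: flag suspicious attachments")
print(len(empty_squares(matrix)), "squares still need a counter")
```

Swapping in the 12 AMITT stages only means passing a different stage list to empty_matrix; listing the empty squares is a quick way to see where brainstorming is still needed.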
The other place we can look is at our other parent models, like the ATT&CK framework, the psyops model, marketing models etc, and see how they modelled and described counters too — for example, the mitigations for ATT&CK T1193 Spearphishing.
Make it easy to share
Checking parent models is also useful because this gives us formats for our counter objects: basically, these are of type “mitigation”, and contain a title, id, brief description and list of techniques that they address. Looking at the STIX format for course-of-action gives us a similarly simple format for each counter against tactics: a name, description and list of things it mitigates against.
We want to be more descriptive whilst we find and refine our list of counters, so we can trace our decisions and where they came from. A more thorough list of features for a counter list would probably include:
- brief description
- list of tactics it can be used on
- list of techniques it can be used on
- expected action (detect, deny etc)
- who could take this action (this isn’t in the infosec lists, but we have many actors on the defence side with different types of power, so this might need to be a thing)
- anticipated effects (both positive and negative — also not in the infosec lists)
- anticipated effort (not sure how to quantify this — people? money? hours? but part of the overarching issue is that attacks are much cheaper than defences, so defence cost needs to be taken into account)
And be generated from a cross-table of counters within incidents, which looks similar to the above, but also contains the who/where/when etc:
- brief description
- list of tactics it was used on
- list of techniques it was used on
- action (detect, deny etc)
- who took this action
- effects seen (positive and negative)
- resources used
- incident id (if known)
- date (if known)
- counters-to-the-counter seen
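As a rough sketch of how those two lists could fit together, here are both record types as Python dataclasses, with a function that rolls incident-level rows up into the general counter list. The class names, field names and grouping-by-description rule are my assumptions, and every value in the example rows is a placeholder rather than a record from the AMITT repo (only the T0025 and I00022 ids come from the text above).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class IncidentCounter:
    """One row of the cross-table: a counter as used in a specific incident."""
    description: str
    tactics: List[str]
    techniques: List[str]
    action: str                       # detect, deny etc
    actor: str                        # who took this action
    effects_seen: str = ""            # positive and negative
    resources_used: str = ""
    incident_id: Optional[str] = None
    date: Optional[str] = None
    counters_to_counter: List[str] = field(default_factory=list)


@dataclass
class Counter:
    """One entry in the general counter list, built up from incident rows."""
    description: str
    tactics: Set[str] = field(default_factory=set)
    techniques: Set[str] = field(default_factory=set)
    actions: Set[str] = field(default_factory=set)
    actors: Set[str] = field(default_factory=set)
    incident_ids: Set[str] = field(default_factory=set)
    anticipated_effects: str = ""     # left for a human to fill in
    anticipated_effort: str = ""      # people? money? hours?


def build_counter_list(rows: List[IncidentCounter]) -> Dict[str, Counter]:
    """Group incident rows by description (an assumed, simplistic key)."""
    counters: Dict[str, Counter] = {}
    for row in rows:
        counter = counters.setdefault(row.description, Counter(row.description))
        counter.tactics.update(row.tactics)
        counter.techniques.update(row.techniques)
        counter.actions.add(row.action)
        counter.actors.add(row.actor)
        if row.incident_id:
            counter.incident_ids.add(row.incident_id)
    return counters


# Placeholder rows, for illustration only.
rows = [
    IncidentCounter("placeholder counter", ["TA-placeholder"], ["T0025"],
                    "deny", "placeholder actor", incident_id="I00022"),
    IncidentCounter("placeholder counter", ["TA-placeholder"], ["T-placeholder"],
                    "detect", "another placeholder actor"),
]
counter_list = build_counter_list(rows)
print(counter_list["placeholder counter"])
```

Keeping the incident rows separate from the rolled-up list is what lets us trace each entry back to where it was seen.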
“Boundaries for Fools, Guidelines for the Wise…”
At this stage, older infosec people are probably shaking their heads and muttering something about stamp collecting and bingo cards. We get that. We know that defending against a truly agile adversary isn’t a game of lookup, and as fast as we design and build counters, our counterparts will build counters to the counters, new techniques, new adaptations of existing techniques etc.
But that’s only part of the game. Most of the time people get lazy, or get into a rut — they reuse techniques and tools, or it’s too expensive to keep moving. It makes sense to build descriptions like this that we can adapt over time. It also helps us spot when we’re outside the frame.
Right. Time to get back to those counters.
[Cross-post from Medium https://medium.com/misinfosec/responses-to-misinformation-885b9d82947e]
There is no one, magic, response to misinformation. Misinformation mitigation, like disease control, is a whole-system response.
MisinfosecWG has been working on infosec responses to misinformation. Part of this work has been creating the AMITT framework, to provide a way for people from different fields to talk about misinformation incidents without confusion. We’re now starting to map out misinformation responses, e.g.
- At the technique level — T0025 leak altered documents was countered in France during the Macron election.
- At the tactic level — we can create a courses of action matrix that lists ways to detect, deny, disrupt, degrade, deceive or destroy activities in each tactic stage.
- At the procedure level — we can look at sequences of responses that may be more effective than individual responses in isolation.
Today I’m sat in Las Vegas, watching the Rootzbook misinformation challenge take shape. I’m impressed at what the team has done in a short period of time (and has planned for later). It also has a place on the framework — specifically at the far-right of it, in TA09 Exposure. Other education responses we’ve seen so far include:
- Immunisation through gameplay “pre-bunking”, e.g. the game https://getbadnews.com/#intro
- Education on specific techniques, e.g. the pineapple pizza education on division tactics
- The Finnish education model
- Other counters being explored by groups like the CredCo media literacy working group
Education is an important counter, but won’t be enough on its own. Other counters that are likely to be trialled with it include:
- Tracking data provenance to protect against context attacks (digitally sign media and metadata so that the media carries the original URL at which it was published, signed with the private key of the original author/publisher)
- Forcing products altered by AI/ML to notify their users (e.g. there was an effort to force Google’s very believable AI voice assistant to announce it was an AI before it could talk to customers)
- Requiring legitimate news media to label editorials as such
- Participating in the Cognitive Security Information Sharing and Analysis Organization (ISAO)
- Forcing paid political ads on the Internet to follow the same rules as paid political advertisements on television
- Baltic community models, e.g. Baltic “Elves” teamed with local media etc
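The provenance counter in that list can be sketched in a few lines of Python. To stay within the standard library, this sketch uses an HMAC with a shared secret as a stand-in for the public-key signature described above; a real scheme would sign with the publisher’s private key so that anyone could verify with the public key. The function names, URLs and key are all made up for illustration.

```python
import hashlib
import hmac
import json


def sign_media(media: bytes, source_url: str, key: bytes) -> dict:
    """Bind a media file to the URL where it was first published."""
    record = {
        "media_sha256": hashlib.sha256(media).hexdigest(),
        "source_url": source_url,
    }
    # Canonical JSON so the same record always produces the same bytes.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    # HMAC is a stand-in here; the text describes a public-key signature.
    record["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record


def verify_media(media: bytes, claimed_url: str, record: dict, key: bytes) -> bool:
    """Check that this media, at this claimed URL, matches the signed record."""
    expected = sign_media(media, claimed_url, key)
    return hmac.compare_digest(expected["signature"], record["signature"])


# Placeholder values, for illustration only.
key = b"publisher-secret-placeholder"
photo = b"original image bytes"
record = sign_media(photo, "https://example.org/original-story", key)

print(verify_media(photo, "https://example.org/original-story", record, key))
print(verify_media(photo, "https://example.org/unrelated-story", record, key))
```

The second check is the context attack: the image is untouched, but it’s being presented as if it came from somewhere else, so verification fails.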
Jonathan Stray’s paper “Institutional Counter-disinformation Strategies in a Networked Democracy” is a good primer on counters available on a national level.
I’m one of the DEFCON AI Village core team, and there’s quite a bit of disinformation activity in the Village this year, including:
- Saturday 10:00 AM: Rand Waltzman “MD: Multimedia Disinformation – Is there a Doctor in the House?!”
- Rootz misinformation CTF for kids
- And lots of stuff on deep fakes and other things that affect disinformation
Why talk about disinformation* at a hacking event? I mean, shouldn’t it be in the fluffy social science candle-waving events instead? What’s it doing in the AI Village? Isn’t it all a bit, kinda, off-topic?
Nope. It’s in exactly the right place. Misinformation, or more correctly its uglier cousin, disinformation, is a hack. Disinformation takes an existing system (communication between very large numbers of people) apart and adapts it to fit new intentions – whether that’s temporarily destroying the system’s ability to function (the “Division” attacks that we see on people’s trust in each other and in the democratic systems that they live within), changing system outputs (influence operations to dissuade opposition voters or change marginal election results) or making the system easy to access and weaken from outside (antivax and other convenient conspiracy theories). And a lot of this is done at scale, at speed, and across many different platforms and media – which if you remember your data science history is the three Vs: volume, variety and velocity (there was a fourth v: veracity, but erm misinformation guys!)
And the AI part? Disinformation is also called Computational Propaganda for a reason. So far, we’ve been relatively lucky: the algorithms used by disinformation’s current ruling masters, Russia, Iran et al, have been fairly dumb (but still useful). We had bots (scripts pretending to be social media users, usually used to amplify a message, theme or hashtag til algorithms fed it to real users) so simple you could probably spot them from space – like, seriously, sending the same message 100s of times a day at a rate even Win (who’s running the R00tz bots exercise at AI Village) can’t type at, backed up by trolls – humans (the most famous of which were in the Russian Internet Research Agency) spreading more targeted messages and chaos, with online advertising (and its ever so handy demographic targeting) for more personalised message delivery. That luck isn’t going to last. Isn’t lasting. Bots are changing. The way they’re used is changing. The way we find disinformation is changing (once, sigh, it was easy enough to look for #qanon on twitter, to find a whole treasure trove of crazy).
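For a sense of just how simple those early bots were to spot, here’s a minimal Python sketch that flags accounts posting the same message many times in one day. The input format (account, day, text), the function name and the threshold of 100 are all my assumptions for illustration, and this only catches the dumb bots described above, not the ones that are coming.

```python
from collections import Counter


def flag_repeaters(posts, threshold=100):
    """Return accounts that posted the same text at least `threshold` times in a day.

    `posts` is an iterable of (account, day, text) tuples. Text is lowercased
    and whitespace-collapsed so trivial variations still count as repeats.
    """
    counts = Counter(
        (account, day, " ".join(text.lower().split()))
        for account, day, text in posts
    )
    return sorted({account
                   for (account, _day, _text), n in counts.items()
                   if n >= threshold})


# Made-up posts, for illustration only.
posts = [("bot_a", "2019-08-10", "Vote  NOW!")] * 150
posts += [("human_b", "2019-08-10", "hello")] * 3
posts += [("human_b", "2019-08-11", "hello")] * 3

print(flag_repeaters(posts))
```

A fixed threshold like this is exactly the sort of rule that stops working once bots start varying their text and timing.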
The disinformation itself is starting to change: goodbye straight-up “fake news” and hunting for high-frequency messages, hello more nuanced incidents that mean anomaly detection and pattern-finding across large volumes of disparate data and its connections. And as a person who’s been part of both MLsec (the intersection of machine learning/AI and information security), and Misinfosec (the intersection of misinformation and information security), I *know* that looks just like a ‘standard’ (because hell, there are no standards but we’ll pretend for a second there are) MLsec problem. And that’s why there’s disinformation in the AI village.
If you get more curious about this, there’s a whole separate community, Misinfosec http://misinfosec.org, working on the application of information security principles to misinformation. Come check us out too.
* “Is there a widely accepted definition of mis vs disinformation?” Well not really, not yet (there’s lots of discussion about it in places like the Credibility Coalition’s Terminology group, reading papers like Fallis’ “What is Disinformation?”). Clare Wardle’s definitions of dis, mis, and mal information are used a lot. But most active groups pick a definition and get on with the work – for instance this is MisinfosecWG’s working definition: “We use misinformation attack (and misinformation campaign) to refer to the deliberate promotion of false, misleading or mis-attributed information. Whilst these attacks occur in many venues (print, radio, etc), we focus on the creation, propagation and consumption of misinformation online. We are especially interested in misinformation designed to change beliefs in a large number of people.” My personal one is that we’re heading towards disinformation as the mass manipulation of beliefs that isn’t necessarily with fake content (text, images, videos etc) but usually includes fake context (misattribution of source, location, date, context etc) and use of real content to manipulate emotion in specific directions. Honestly, it’s like trying to define pornography – trying to find the right definitions is important, but can get in the way of the work of keeping it out of the mainstream, and if it’s obvious, it’s obvious. We’ll get there, but in the meantime, there’s work to do.
[Cross-posted from Medium https://misinfocon.com/misinformation-has-stages-7e00bd917108]
Now we just need to work out what the stages should be…
The Credibility Coalition’s Misinfosec Working Group (“MisinfosecWG”) maps information security (infosec) principles onto misinformation. Our current work is to develop a tactics, techniques and procedures (TTP) based framework that gives misinformation researchers and responders a common language to discuss and disrupt misinformation incidents.
We researched several existing models from different fields, looking for a model that was both well-supported and familiar to people, and well suited for the variety of global misinformation incidents that we were tracking. We fixed on stage-based models, that divide an incident into a sequence of stages, e.g. “recon” or “exfiltration”, and started work mapping known misinformation incidents to the ATT&CK framework, which is used by the infosec community to share information about infosec incidents. Here’s the ATT&CK framework, aligned with its parent model, the Cyber Killchain:
Cyber Killchain stages (top), ATT&CK framework stages (bottom)
The ATT&CK framework adds more detail to the last three stages of the Cyber Killchain. These stages are known as “right-of-boom,” as opposed to the four “left-of-boom” Cyber Killchain stages, which happen before bad actors gain control of a network and start damaging it.
Concentrating on the ATT&CK model made sense when we started doing this work. It was detailed, well-supported, and had useful concepts, like being able to group related techniques together under each stage. The table below is the Version 1.0 strawman framework that we created; an initial hypothesis about the stages with example techniques that a misinformation campaign might use.
Table 1: Early strawman version of the ATT&CK framework for misinformation 
This framework isn’t perfect. It was never designed to be perfect. We recognized that we are dealing with many different types of incidents, each with potentially very different stages, routes through them, feedback loops and dependencies (see the Mudge quote below), so we created this strawman to start a conversation about what more is needed. Behind that, we started working in two complementary directions: bottom-up from the incident data, and top-down from other frameworks that are used to plan similar activities to misinformation campaigns, like psyops and advertising.
ATT&CK may be missing a dimension…
The ATT&CK framework has missing dimensions, which is why we introduced the misinformation pyramid. A misinformation campaign is a longer-scale activity (usually months, sometimes years), composed of multiple connected incidents — one example is the IRA campaign that focussed on the 2016 US elections. The attackers designing and running a campaign see the entire campaign terrain: they know the who, what, when, why, how, the incidents in that campaign, the narratives (stories and memes) they’re deploying, and the artifacts (users, hashtags, messages, images, etc.) that support those narrative frames.
Defenders generally see just the artifacts, and are left guessing about the rest of the pyramid. Misinformation artifacts are right-of-boom: the themes seemingly coming out of nowhere, the ‘users’ attached to conversations, etc. This is what misinformation researchers and counters have typically concentrated on. This is what the ATT&CK framework is good at, and why we have invested effort in it by cataloguing and breaking campaigns and incidents down into techniques, actors and action flows.
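One way to picture the pyramid is as nested records, with the attacker holding the whole tree and the defender holding only the leaves. This is a minimal Python sketch under that assumption: the class names, the strict campaign > incident > narrative > artifact nesting, and every value in the example are illustrative, not part of the framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Narrative:
    summary: str
    artifacts: List[str] = field(default_factory=list)  # users, hashtags, messages...


@dataclass
class Incident:
    name: str
    narratives: List[Narrative] = field(default_factory=list)


@dataclass
class Campaign:
    name: str
    incidents: List[Incident] = field(default_factory=list)


def defender_view(campaign: Campaign) -> List[str]:
    """Flatten the pyramid to its artifacts, dropping all the structure above."""
    return [artifact
            for incident in campaign.incidents
            for narrative in incident.narratives
            for artifact in narrative.artifacts]


# Placeholder campaign, for illustration only.
campaign = Campaign("placeholder campaign", [
    Incident("incident one", [
        Narrative("narrative A", ["#hashtag_one", "account_123"]),
    ]),
    Incident("incident two", [
        Narrative("narrative B", ["#hashtag_two"]),
    ]),
])

print(defender_view(campaign))
```

The defender’s job is, in effect, to rebuild the tree from that flat list.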
But this only covers part of each misinformation attack. There are stages “left-of-boom” too. Although difficult to identify, there are key artifacts in this campaign phase too. This is the other part of our work. We’re working from the attacker point of view, listing and comparing stages we’d expect them to be working through, based on what we know about marketing/advertising, psyops and other analyses. We’ve compared a key set of stage-based models from these disciplines to the Cyber Killchain, as seen in the table below.
Table 2: Comparison between cyber killchain, marketing, psyops and other models
This is a big beast, so let’s look at its components.
First, the marketing funnels. These are about the journey of the end consumer of a marketing campaign — the person who watches an inline video, sees a marketing image online, and so on, and is ideally persuaded to change their view, or buy something related to a brand. This is a key consideration when listing stages: whose point of view is this? Do we understand an incident from the point of view of the people targeted by it (which is what marketing funnels do), the point of view of the people delivering it (most cyber frameworks), or the people defending against it? We suggest that the correct point of view for misinformation is that of the creator/attacker, because attackers go through a set of stages, all of which are essentially invisible to a defender, yet each of these stages can potentially be disrupted.
Marketing funnels, meanwhile, are “right-of-boom.” They begin at the point in time where the audience is exposed to an idea or narrative and becomes aware of it. This is described as the “customer journey,” which is a changing mental state, from seeing something to taking an interest in it, to building a relationship with a brand/idea/ideology, and subsequently advocating it to others.
This same dynamic plays out in online misinformation and radicalisation (e.g. Qanon effects), with different hierarchies of effects that might still contain the attraction, trust and advocacy phases. Should we reflect these in our misinformation stage list? We can borrow from the marketing funnel and map these stages across to the Cyber Killchain (above). By adding in stages for marketing planning and production (market research, campaign design, content production, etc.) and seeing how they resemble an attacker’s game plan, we can begin planning how to disrupt and deny these left-of-boom activities.
When considering the advocacy phase, in relation to other misinformation models, we see this fitting the ‘amplification’ and ‘useful idiot’ stages (as noted above in Table 2). This is new thinking, and modeling how an ‘infected’ node in the system isn’t just repeating a message, but might be or become a command node too, is something to consider.
Developing the misinformation framework also requires acknowledging the role of psyops, whose point of view is clear: it’s all about the campaign producer, who controls every stage. Psyops models give a step-by-step list of things to do, from the start through to a completed operation, including hierarchy-aware things like getting sign-offs and permissions.
Left-of-boom, psyops maps closely to the marketing funnel, with the addition of a “planning” stage, while right-of-boom it glosses over all the end-consumer-specific considerations, in a process flow defined by “production, distribution, dissemination.” It does, however, add a potentially useful evaluation stage. One of the strengths of working at scale online is the ability to hypothesis test (e.g. A/B test) and adapt quickly at all stages of a campaign. Additionally, when running a set of incidents, after-action reviews can be invaluable for learning and for adjusting higher-level tactics, such as the list of stages, the target platforms, or the most effective narrative styles and assets.
Psyops stages (https://2009-2017.state.gov/documents/organization/148419.pdf)
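The hypothesis testing mentioned above can be made concrete with a small sketch. This is a minimal example of one common way to analyse an A/B test, a two-proportion z-test, and is not taken from any of the models discussed here. The function name, the choice of "engagements out of impressions" as the metric, and the numbers in the usage note are all assumptions for illustration.

```python
import math


def two_proportion_z_test(successes_a, trials_a, successes_b, trials_b):
    """Two-sided z-test for a difference between two proportions.

    Hypothetical helper: 'successes' might be engagements and 'trials'
    impressions for message variants A and B.
    Returns (z, p_value).
    """
    if trials_a <= 0 or trials_b <= 0:
        raise ValueError("each variant needs at least one trial")

    rate_a = successes_a / trials_a
    rate_b = successes_b / trials_b

    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    if std_err == 0:
        # Both variants have a rate of exactly 0 or 1: no evidence of a difference.
        return 0.0, 1.0

    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

As a usage example with made-up numbers, `two_proportion_z_test(100, 1000, 150, 1000)` compares a 10% rate against a 15% rate; a small p-value suggests the difference is unlikely to be chance alone. The same test applies whether the variants belong to a campaign or to a counter-messaging effort.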
As we develop misinformation-specific stage-based models and see more of them (maybe it’s something to do with all the talks our misinfosec family have given?), things like Tactics, Techniques and Procedures (“TTPs”) and Information Sharing and Analysis Centers (“ISACs”) are appearing in misinformation presentations and articles. Two noteworthy models are the Department of Justice (DOJ) model and one recently outlined by Bruce Schneier. First the DOJ model, which is a thing of beauty:
page 26 of https://www.justice.gov/ag/page/file/1076696/download
This clearly presents what each stage looks like from both the attacker (‘adversary’) and defender points of view (the end consumer isn’t of much interest here). It’s a solid description of early IRA incidents, yet is arguably too passive for some of the later ones. This is where we start inserting our incident descriptions and mapping them to stages. This is where we start asking about how our adversaries are exercising things like command & control. When we say “passive”, we mean this model works for “create and amplify a narrative”, but we’re also trying to fit something like “create a set of fake groups and make them fight each other”, which takes on a more active and more command & control-like presence. This is a great example of how we can create models that work well for some, but not all, of the misinformation incidents that we’ve seen, or expect to see.
We have some answers. More importantly, we have a starting point. We are now taking these stage-based models and extracting the best assets, methods, and practices (what looks most useful to us today), such as testing various points of view, creating feedback loops, monitoring activity, documenting advocacy, and so on. Our overarching goal is to create a comprehensive misinformation framework that covers as much of the incident space as possible, without becoming a big mess of edge cases. We use our incident analyses to cross-check and refine this. And we accept that we might — might — just have more than one model that’s appropriate for this set of problems.
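One way to picture the cross-check described above is as a simple coverage calculation: list the activities seen in each incident, then see which of them a candidate stage model has no place for. The sketch below is only an illustration under that assumption. The function name, the stage labels and the two incidents are hypothetical placeholders loosely echoing the examples earlier in this post, not real incident analyses.

```python
def model_coverage(model_stages, incidents):
    """Return (fraction of incident activities the model covers,
    dict of incident name -> activities the model has no stage for)."""
    stages = set(model_stages)
    unmatched = {}
    total = 0
    covered = 0
    for name, activities in incidents.items():
        missing = [a for a in activities if a not in stages]
        total += len(activities)
        covered += len(activities) - len(missing)
        if missing:
            unmatched[name] = missing
    fraction = covered / total if total else 1.0
    return fraction, unmatched


# Hypothetical stage labels and incidents, for illustration only.
PASSIVE_MODEL = ["create narrative", "amplify narrative"]
INCIDENTS = {
    "incident A (hypothetical)": ["create narrative", "amplify narrative"],
    "incident B (hypothetical)": [
        "create fake groups",
        "make groups fight each other",
        "amplify narrative",
    ],
}

fraction, unmatched = model_coverage(PASSIVE_MODEL, INCIDENTS)
print(f"covered {fraction:.0%} of activities")
for incident, activities in unmatched.items():
    print(incident, "->", activities)
```

Activities that keep showing up as unmatched are either edge cases, or a hint that the model needs another stage (or that a second model is warranted).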
“We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.” ― Richard P. Feynman
Addendum: yet more models…
Ben Decker’s models look at the groups involved in different stages of misinformation, and the activities of each of those groups. This focuses on misinformation campaigns as a series of handoffs between groups: from the originators of content, to command and control signals via Gab/Telegram, etc., for signal receivers to post that content to social media platforms, then amplify its messages with social media messages that eventually get picked up by professional media. This has too many groups to fit neatly onto a marketing model, and appears to be on a different axis to psyops and DOJ models, but still seems important.
As a further axis: the stage models we’ve discussed above are all tactical, describing the steps that an attacker would typically go through in a misinformation incident. There are also strategies to consider, including Ben Nimmo’s “four Ds” (Distort, Distract, Dismay, Dismiss: commonly used IRA strategies), echoed in Clint Watts’ online manipulation generations. In infosec modelling, this would get us into a Courses of Action Matrix. We need to get on with creating the list of stages: we’ll leave that part until next time.
Clint Watts matrix, and 5Ds, with common tactics (from Boucher) mapped to them.
- Walker et al, Misinfosec: applying information security paradigms to misinformation campaigns, WWW’19 workshop
[Cross-posted from Medium https://medium.com/misinfosec/online-infowar-doesnt-have-to-use-misinformation-3d5cf5759ef3]
I’ve been reading through some of the DFRlab stories on 2018’s election misinformation campaigns (e.g. this one), and I get a familiar sinking feeling. I see the Russian posts about violence against black people, or immigrants doing jobs in conditions that non-immigrants won’t consider, and I find myself thinking “you know what, this is true, and yep, I agree with a lot of it”.
Just because I study misinformation doesn’t mean I’m immune to it, any more than knowing about the psychology of advertising makes me any less likely to buy a specific pair of shoes that I got repeatedly exposed to over the Christmas season (yes, yes, I know, but they’re gorgeous…). But this is interesting because it’s not misinformation in the content — it’s misrepresentation of the source, in this case a group called “Black Matters US”, a Russian Internet Research Agency effort that, amongst other things, organised anti-police-treatment rallies in the USA.
And that’s the thing: one of the most common tactics being used in online information warfare isn’t to create beautiful big lies like pizzagate. It’s to take existing fractures in society, and widen them further. To keep pushing at our emotions til we’re angry or frightened and confused. And countering that tactic is going to need a combination of good provenance tracking, calm voices, and the long hard political work of healing the damage that made it possible.