Notes from John Sarapata’s talk on online responses to organised adversaries

Download PDF
John Sarapata (@JohnSarapata) = head of engineering at Jigsaw  (= new name for Google Ideas).  Jigsaw = “the group at Google that tries to help users facing organized violence and oppression”.  A common thread in their work is that they’re dealing with the outputs from organized adversaries, e.g. governments, online mobs, extremist groups like ISIS.
One example project is redirectmethod.org, which looks for people who are searching for extremist connections (e.g. ISIS) and shows them content from a different point of view, e.g. a user searching for travel to Aleppo might be shown realistic video of conditions there. [IMHO this is a useful application of social engineering in a clear-cut situation; threats and responses in other situations may be more subtle than this (e.g. what does ‘realistic’ mean in a political context?).]
The Jigsaw team is looking at threats and counters at 3 levels of the tech stack:
  • device/user: activities are consume and create content; threats include attacks by governments, phishing, surveillance, brigading, intimidation
  • wire: activities are find then transfer; threats include DNS hijacking, TOR bridge probes
  • server: activities are hosting; threats include DDOS
[They also appear to be looking at threats and counters on a meta level (e.g. the social hack above).]
Examples of emergent countermeasures outside the team include people bypassing censorship in Turkey by using Google’s public DNS 8.8.8.8, and people in China after the 2008 Szechwan earthquake posting images of school collapses and investigating links between these (ultimately leading to finding links between collapses, school contractors using substandard concrete and officials being bribed to ignore this) despite government denial of issues.  These are about both reading and generating content, both of which need to be protected.
There are still unsolved problems, for example communications inside a government firewall.  Firewalls (e.g. China’s Great Firewall) generally have slow external pipes with internal alternatives (e.g. Sino Weibo), so people tend to consume information from inside. Communication of external information inside a firewall isn’t solved yet, e.g mesh networks aren’t great; the use of thumb drives to share information in Cuba was one way around this, but there’s still more to do.  [This comment interested me because that’s exactly the situation we’ve been dealing with in crises over the past few years: using sneakernet/ mopeds,  point-to-point, meshes etc., and there may be things to learn in both directions.]
Example Jigsaw projects and apps include:
  • Unfiltered.news, still in beta: creates a knowledge graph when Google scans news stories (this is language independent). One of the cooler uses of this is being able to find things that are reported on in every country except yours (e.g. Russia, China not showing articles on the Panama Papers).
  • Anti-phishing: team used stuff from Google’s security team for this, e.g. using Password Alert (alerts when user e.g. puts their company password into a non-company site) on Google accounts.
  • Government Attack Warning. Google can see attacks on gmail, google drive etc accounts: when a user logs in, Google displays a message to them about a detected attack, including what they could do.
  • Conversation AI. Internet discussions aren’t always civil, e.g. 20-25 governments including China and Russia have troll armies now, amplified by bots (brigading); conversation AI is machine classification/detection of abuse/harassment in text; the Jigsaw team is working on machine learning approaches together with the youtube comment cleanup team.  The team’s considered the tension that exists between free speech and reducing threats: their response is that detection apps must lay out values, and Jigsaw values include that conversation algorithms are community specific, e.g. each community decides its limits on swearing etc.; a good example of this is Riot Games. [This mirrors a lot of the community-specific work by community of community groups like the Community Leadership Forum].  Three examples of communities using Conversation AI: a Youtube feature that flags potential abuse to channel owners (launching Nov 2016). Wikipedia: flagging personal attacks (e.g. you are full of shit) in talk pages (Wikipedia has a problem with falling numbers of editors, partly because of this). New York Times: scaling existing human moderation of website comments (NYT currently turns off comments on 90% of pages because they don’t have enough human moderators). “NYT has lots of data, good results”.  Team got interesting data on how abuse spreads after releasing a photo of women talking to their team about #gamergate, then watching attackers discuss online (4chan etc) who of those women to attack and how, and the subsequent attacks.
  • Firehook: censorship circumvention. Jigsaw has the Uproxy plugin  for peer-to-peer information sharing across censorship boundaries (article), but needs to do more, eg look at the whole ecosystem.  Most people use proxy servers (e.g. VPNs), but a government could disallow VPNs: we need many different proxies and ways to hide them.  Currently using WebRTC for peer to peer proxies (e.g. Germany to Turkey using e.g NAT hole punching), collateral freedom and domain fronting, e.g. GreatFire routing New York Times articles through Amazon and GitHub.  Domain fronting (David Fyfield article) uses the fact that e.g. CloudFlare hosts many sites: the user connects to an allowed host, https encrypts it, then uses the encrypted header to go to a blocked site on the same host.  There are still counters to this; China first switched off GitHub access (then had to restore it), and used the Great Cannon to counter GreatFire, e.g. every 100th load of Baidu Analytics injects malware into external machines and creates a DDOS botnet. NB the firewall here was on path, not in path: a machine off to one side listens for banned words and breaks connections, but Great Cannon is inside the connection; and with current access across the great firewall, people see different pages based on who’s browsing.
  • DDOS: http://www.digitalattackmap.com/ (with Arbor Networks), shows DDOS in real time. Digital attacks are now mirroring physical ones, and during recent attacks, e.g. Hong Kong’s Umbrella Revolution, Jigsaw protected sites on both sides (using Project Shield, below) because Google thinks some things, like DDOS, are unfair.  Interesting point: trolling as the human equivalent of DDOS [how far can this comparison go in designing potential counters?].
  • Project Shield: reused Google’s PSS (Page Speed Service) to protect news sites, human rights organization etc from DDOS attacks. Sites are on Google cloud: can scale up number of VMs used and nginx allows clever uses with reverse proxies, cookie challenges etc.  Example site: Krebs on Security was being DDOSed (nb a DDOS attack on a site costs about $50 online), moved from host Akamai to Google with Project Shield. Team is tracking Twitter user bragging about this and other attacks (TL;DR: IoT attack, e.g. baby monitors; big botnet, Mirai botnet source code now released, brought down Twitter, Snapchat).  Krebs currently getting about 5 attacks a day, e.g. brute-force, Slowloris, Hulk (bandwidth, syn flood, post flood, cache busting, WordPress pingback etc), and Jigsaw gets the world’s best DDOSers hacking and testing their services.
Audience questions:
  • Qs: protecting democracy in US, e.g. botnets, online harassment etc.? A: don’t serve specific countries but worldwide.
  • Q: google autofill encouraging hatespeech? A: google reflects the world as it is; google search reflects what people do, holds a mirror back up to you. Researching machine learning bias on google suggest results and bias in training data.  Don’t want to censor, but don’t want to propagate bad things.
  • Q: not censor but inform, can you e.g. tell a user “your baby monitor is hacked”? A: privacy issue, eg g connecting kit and emails to ip addresses.
  • Q: people visiting google site to attack… spamming google auto complete, algorithms? A: if google detects people gaming them, will come down hard.
  • Q: standard for “abusive”? how to compare with human?. A: is training data, Wikipedia is saying what’s abusive. Is all people in the end.
  • Q: how deal with e.g. misinformation? A: unsolved problem, politically sensitive, e.g. who gets to decide what’s fake and true? censorship and harassment work will take time.
  • Q: why Twitter not on list of content? A: Twitter might not want this, team is resource constrained, e.g. NYT models are useless on youtube because NYT folks use proper grammar and spelling.
  • Q: how to decide what to intervene in?A:  e.g. Google takes sides against ISIS, who are off the charts on eg genocide.
  • Q: AWS Shield; does Google want to commercialise their stuff? A: no. NB CloudFlare Galileo is also response to google work.
  • Q: biggest emerging tech threat online? Brigading, e.g. groups of people and bots. This breaches physical and online, includes eg physical threats and violence, and is hard to detect and attribute.
Other references: