Software

Infosec, meet data science

I know you’ve been friends for a while, but I hear you’re starting to get closer, and maybe there are some things you need to know about each other. And since part of my job is using my data skills to help secure information assets, it’s time that I put some thoughts down on paper… er… pixels. Infosec and data science have a lot in common: they’re both about deeply understanding systems, and about understanding people and their behaviors, then acting on that information to protect or exploit those systems.  It’s no secret that military infosec and counterintelligence people have been working with machine learning and other AI algorithms for years (I think I have a couple of old papers on that myself), or that data scientists and engineers are including practical security and risk in their data governance measures, but I’m starting to see more profound crossovers between the two…

Software

Notes from John Sarapata’s talk on online responses to organised adversaries

John Sarapata (@JohnSarapata) = head of engineering at Jigsaw (= new name for Google Ideas).  Jigsaw = “the group at Google that tries to help users facing organized violence and oppression”.  A common thread in their work is that they’re dealing with the outputs from organized adversaries, e.g. governments, online mobs, extremist groups like ISIS. One example project is redirectmethod.org, which looks for people who are searching for extremist connections (e.g. ISIS) and shows them content from a different point of view, e.g. a user searching for travel to Aleppo might be shown realistic video of conditions there. [IMHO this is a useful application of social engineering in a clear-cut situation; threats and responses in other situations may be more subtle than this (e.g. what does ‘realistic’ mean in a political context?).] The Jigsaw team is looking at threats and counters at 3 levels of the tech stack: device/user: activities are consuming and creating content; threats include attacks by governments, phishing, surveillance,…

Software

Why am I writing about belief?

[Cross-post from LinkedIn] I’ve been meaning to write a set of sessions on computational belief for a while now, based on the work I’ve done over the years on belief, reasoning, artificial intelligence and community beliefs. With all that’s happening in our world now, both online and in the “real world”, I believe that the time has come to do this. We could start with truth. We often talk about ‘true’ and ‘false’ as though they’re immovable things: that every statement should be able to be assigned one of these values. But it’s a little more complicated than that. What we see as ‘true’ is often the result of a judgement we made, given our perception and experience of the world, that a belief is close enough to certain to be ‘true’. But what if there are no objective truths? In robotics, we talk about “ground truth” and the “god’s…

Software

WriteSpeakCode/ PyLadies joint meetup 2015-10-22: Tales of Open Source: rough notes

PyLadies: international mentorship program for female Python coders, meetup.com, NYC PyLadies. Lisa moderating. Panelists: Maia McCormick, Anna Herlihy, Julian Berman, Ben Darnell, David Turner. Intros: Maia: worked on Outreachy (formerly OPW) – gives stipends to women and minorities to work on OS code; currently at Spring. Anna: works at MongoDB, does a lot of Mongo OS work. Julian: works at Magnetic (ad company); worked on Twisted, started OS project (schema for validating JSON projects). Ben: Tornado maintainer, working on OS distributed database in Go. David: ex FSF, OpenPlans, now at Twitter, “making git faster”. Q: how to find OS projects, how to get started? D: started contributing to XChat… someone said “wish chat had the following feature”… silence… recently, whatever the company is working on. Advice: find the right project, see if they’re interested, then write the feature. B: started on Python interpreter, was using game library, needed bindings for…

Data Science

Looking at data with Python: Matplotlib and Pandas

I like Python. R and Excel have their uses too for data analysis, but I just keep coming back to Python. One of the first things I want to do once I’ve finally wrangled a dataset out of various APIs, websites and pieces of paper is to have a good look at what’s in it.  Two Python libraries are useful here: Pandas and Matplotlib. Pandas is Wes McKinney’s library for R-style dataframe (data in rows and columns) manipulation, summary and analysis. Matplotlib is John D Hunter’s library for Matlab-style plots of data. Before you start, you’ll need to type “pip install pandas” and “pip install matplotlib” in the terminal window.   It’s also conventional to load the libraries into your code with these two lines: import pandas as pd and import matplotlib.pyplot as plt. Some things in Pandas (like reading in datafiles) are wonderfully easy; others take a little longer to learn…
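A minimal sketch of that “first look” at a dataset, assuming a hypothetical CSV file called data.csv with a numeric column named value (the filename and column name are illustrative, not from the original post):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Read a CSV file into a dataframe (rows and columns)
    df = pd.read_csv("data.csv")   # hypothetical filename

    # Quick look at what's in it
    print(df.head())       # first few rows
    print(df.describe())   # summary statistics for the numeric columns

    # Plot a histogram of one column (assumes a numeric column called "value")
    df["value"].hist(bins=20)
    plt.xlabel("value")
    plt.ylabel("count")
    plt.show()

head(), describe() and a quick histogram will catch most of the obvious surprises (missing values, weird ranges, mis-typed columns) before any deeper analysis.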

Software

Singularity

I’ve been thinking today about the singularity: the point at which machines become smarter than humans, and about an internet of things so smart that we don’t know how to manage it with our existing software paradigms.  And I wondered: a good manager will already be managing entities that are much smarter than they are (because you don’t want your best thinkers doing the paperwork, management is another discipline/skill etc etc); is it perhaps time to think about how to use those management skills on clusters of machines?

Software

Notes from meetup: data-driven design 2.0 (Data-driven architecture), 2015-08-24

Meetup: data-driven design 2.0 (Data-driven architecture), 2015-08-24 Basics: AIANY chapter (http://main.aiany.org/)  #datadrivendesign http://www.meetup.com/Transforming-Architectural-Practice-Meetup/events/224093716/ First panel was http://main.aiany.org/eOCULUS/newsletter/data-in-the-built-environment-new-sources-new-strategies/ Melissa Marsh on intros and bios…  “Transforming architectural practice series” = thinking differently about the process of architecture: tools, practice, how they run their business (leads to thinking differently about product).  Panelists showing how taking on a data-led practice changes how architects do their work… incorporating different methodologies, s/m/l/xl data.  Today = moving from data sources and collection to examples within projects, how to set up projects and client relationships differently.  Came out of feedback from June event.  Continuing looking at future of design relationships.  Panelists:  Jeff Ferzoco (linepointpath),  Zak Kostura (ARUP, high performance structures – currently form-found roof system for MX city)… thinking about project setup and info sharing and how it’s changing client relationships.  Darrick Borowski – on tools and techniques… data-driven design = ask better questions at the beginning… back and…

Software

This is not my journey

I spent some of my Christmas break thinking about work styles: what worked last year, what didn’t, and what I could do to improve my own.  I’ve got it down to just two things: “this is not my journey” and “do what the boss asks for”. People often talk of their jobs (and themselves) as something that they do now, as in at one particular point in time. That’s a little like saying “I’m in seat 29C” instead of “I’m flying from New York to Japan and when I get there I’m going to try out the heated toilet seats” when someone asks you where you are.  We are all on journeys – sometimes literally, but always on journeys through time, careers, relationships.  And if you want to think about your career, a journey is a useful idea. So last year I got really frustrated because I ended up doing…

Data Science

Web Scraping, part 1: files and APIs

Web scraping is extracting information from webpages, usually (but not always) as tables of data that you can save to CSV files, JSON/XML files or databases. Design it first, then scrape it. When you start on any piece of code, try asking yourself some design questions first; definitely do this if you’re thinking about something as potentially complex as web scraping code.   So you’ve seen a dataset hiding in a website – it might be a table of data that you need, or lists of data, or data spread across multiple pages on the site. Here are some questions for you: 1. Do you need to write scraper code at all? Is the dataset very small? If you’re talking about 10 values that don’t get updated, writing a scraper will take longer than just typing all the numbers into a spreadsheet. Has the site owner made this data…
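A minimal sketch of the “do you even need a scraper?” check, assuming the data is a plain HTML table at a hypothetical URL (the URL and filenames here are illustrative, and pandas.read_html needs lxml or html5lib installed):

    import pandas as pd

    # If the data is already a simple HTML table, pandas can often grab it
    # directly, which may mean you don't need to write custom scraper code at all.
    url = "https://example.com/page-with-table"   # hypothetical URL
    tables = pd.read_html(url)    # returns a list of dataframes, one per <table>

    df = tables[0]                # pick the table you want
    df.to_csv("scraped_table.csv", index=False)   # save it for later analysis
    print(df.head())

If the data is spread across many pages, hidden behind JavaScript, or updated regularly, that’s when the heavier design questions (and heavier tools) come in.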

Software

Ruby day 9: Local power!

This. Just this: local mappers made more changes to the map of the Philippines during Typhoon Ruby than anyone else in the world (by a very, very big margin). Anyone who doesn’t believe in the strength of local people to build their own resilience should look very, very hard at these numbers. Ruby’s all over now for the mappers – DHN is deactivated, everyone’s gone back to work.  There’s still a lot of work to do on the cleanup: MarkC mentioned 35,000+ houses destroyed and 200,000 people without shelter, and there will still be OSM mapping to do for that.  This weekend Celina’s running a “train the trainers” OSM event in Manila: if you’re one of the people who created the figures above, please please go and help spread your skills further!