What are the questions again?

Oh the questions, the questions. The more meetings I go to, the more I see the deep fundamental importance of asking and attempting to answer the right questions. And there are many wrong questions out there, especially (but definitely not limited to just) in AI. Having said there are wrong questions, I probably need to clarify that. There are very few invalid questions: most questions provoke a response, or thought, and therefore have a motive or reason. Now some questions are just plain immoral (such as “what’s the best way to kill a million people”), others are just plain insensible (“Hey you! The troll with the big bike! How come you’re so ugly?”) and some are just there for the fun of it. So also having said that there are wrong questions, I need to qualify again that it’s a subjective thing. There are good questions to ask here and now, and there are bad questions to ask here and now. I’m just hoping that I have enough good taste to tell the difference.

So: questions to think about sometime on this blog, with the proviso that some of these questions are just for fun, with no promise of anything even approaching a bounded sensible answer:

* How can we organise machine thought?
* How is machine thought similar to and different from human thought?
* How can we make machines more creative?
* How do humans make sense of information?
* What do we do when we’re not sure?
* What makes us human?
* How are some people geniuses, and can we replicate that with machines?

Sometime, I may try to arrange these into the possible and not here/not now categories. But that may have to wait until after I’ve attempted to start answering them.

The city needs to go green?

No, not green as in environmental (although that would be a good idea too); green as in look to developments in defence for where it might go next.

Parts of the city seem to be buying Bayesian statisticians as if they were going out of fashion (they’re not: from what I’m hearing on the tech grapevines, belief is apparently the new black for this season). Which makes sense in a world where predictions are shifting from ‘did we see something like this before and what happened’ to ‘we haven’t got a clue what’s happening next and the best we can do is guess from fragments’. I spent a day moving on a bit from that, playing with situation awareness techniques, and wondered if that was where the city might go next. Only time will tell…

Structure Mapping Theory

More old notes to be edited later… so many of these to be done…

In analogy, the relations between objects (e.g. friend(Bill,Ted)) are matched rather than the individual attributes (e.g. tall(Bill)) of those objects. This is distinct from literal similarity where both attributes and relations are matched, mere-appearance matches where primarily attributes are matched, and anomaly where very few attributes or relations are matched. Metaphor (e.g. ‘Bill is a rock’) is seen as a reduced form of analogy with usually just one (contextual?) attribute being matched. Abstraction is seen as a similar process to analogy but with mappings between objects that have few or no attributes, and in abstraction, all the relations are matched rather than just some. This view of analogy appears to be confirmed by empirical psychological studies (Falkenhainer89).
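
The four comparison types above can be sketched as a tiny classifier. This is only an illustration: it assumes the objects have already been put into correspondence, so that only predicate names need comparing, and the function name `comparison_type`, the Jaccard overlap measure and the 0.5 threshold are all hypothetical choices, not part of SMT.

```python
def comparison_type(base_attrs, target_attrs, base_rels, target_rels, threshold=0.5):
    """Label a comparison by how much attribute and relation overlap it shows.

    Each argument is a set of predicate names, e.g. {"tall"} or {"friend"}.
    The threshold is an arbitrary illustrative cut-off.
    """
    def overlap(a, b):
        # Jaccard overlap: shared names divided by all names seen.
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    attrs_match = overlap(base_attrs, target_attrs) >= threshold
    rels_match = overlap(base_rels, target_rels) >= threshold

    if rels_match and attrs_match:
        return "literal similarity"
    if rels_match:
        return "analogy"
    if attrs_match:
        return "mere appearance"
    return "anomaly"


# Solar system versus atom: no shared attributes, but the same relations.
print(comparison_type({"hot", "yellow"}, {"tiny"},
                      {"attracts", "revolves_around"},
                      {"attracts", "revolves_around"}))
```

Metaphor and abstraction are left out of the sketch, since they depend on which attributes are contextually relevant rather than on simple overlap.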

Analogy can be split into three distinct subprocesses (Falkenhainer87.IJCAI): (i) access, (ii) mapping and inference and (iii) evaluation and use. Before processing begins, it is assumed that there is a current situation of interest (the target). Access finds a body of knowledge (the base) that is analogous or similar to the current target. Mapping finds similarities or correspondences (mappings) between the base and target, and may also transfer information (inference) from the base to the target. Evaluation of the analogy gives a measure or estimate of the quality of match in the context of other knowledge about the target’s domain, against three different types (structural, validity and relevance) of quality criteria. Structural criteria include the degree of structural similarity, the number of similarities and differences and the amount and type of knowledge added to the target description (the candidate inferences) in using this analogy. Validity checks that any new knowledge makes sense within the current knowledge domain. Relevance assesses whether the analogy and candidate inferences are useful to the current task or aim of the system.
Analogies thus formed are used for analogical reasoning, similarity-based generalisation, or analogical learning.

Structure mapping theory (SMT, Gentner83) asserts that an analogy is the application of a relational structure that normally applies in one knowledge domain (the base domain) to another, different, knowledge domain (the target domain); unlike less-structural psychological theories, it also sees analogy and similarity as connected processes. It uses graph-matching to describe the constraints that people use in creating analogies and interpreting similarity. SMT does not capture the full richness of analogy, but is primarily interested in, and applied to, the mapping and a subset of the evaluation subprocesses described above (e.g. it does not include the access subprocess, and uses structural criteria only).

The core of structure mapping algorithms is finding the biggest common isomorphic pair of subgraphs in two semantic structures [Veale98]. Much of the work in this field has concentrated on subgraph heuristics to produce near-optimal matches in polynomial time. All of it is based on or related to SMT. In SMT, graph matches are assumed to be exact correspondences only (structurally consistent); this constraint may need updating to better reflect the inexact matches and representations used in human analogical reasoning. Note that there are two matching processes in play here: the mapping of objects (graph nodes), and the mapping of relations (link labels), that these two processes are often run at the same time, and that the relational matches may determine the matches between objects. SMT is also tidy: attributes are discarded unless they are actively involved in a match, and higher-order relations and interconnections are more likely to be preserved (systematicity, Gentner83).
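
To make the "relations decide the object matches" point concrete, here is a brute-force sketch. It assumes relations are stored as `(name, arg1, arg2)` tuples and that the target has at least as many objects as the base; the function name `best_mapping` is hypothetical. It tries every one-to-one object mapping, so it runs in factorial time, which is exactly why the heuristics mentioned above matter.

```python
from itertools import permutations


def best_mapping(base_rels, target_rels):
    """Find the one-to-one object mapping that preserves the most relations.

    Relations are (name, arg1, arg2) tuples. Returns (mapping, score), or
    ({}, -1) if the target has fewer objects than the base.
    """
    base_objs = sorted({o for _, a, b in base_rels for o in (a, b)})
    target_objs = sorted({o for _, a, b in target_rels for o in (a, b)})
    target_set = set(target_rels)

    best, best_score = {}, -1
    for perm in permutations(target_objs, len(base_objs)):
        mapping = dict(zip(base_objs, perm))
        # A base relation is preserved if its image exists in the target.
        score = sum((name, mapping[a], mapping[b]) in target_set
                    for name, a, b in base_rels)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score


solar = [("attracts", "sun", "planet"),
         ("revolves_around", "planet", "sun"),
         ("more_massive", "sun", "planet")]
atom = [("attracts", "nucleus", "electron"),
        ("revolves_around", "electron", "nucleus"),
        ("more_massive", "nucleus", "electron")]

mapping, score = best_mapping(solar, atom)
print(mapping, score)
```

No attributes are consulted at all: the sun ends up paired with the nucleus purely because that pairing keeps all three relations intact.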

SMT underlies much work on computational models of analogy, metaphor, case-based reasoning and example-based machine translation [Veale98]. SMT is used for analogical learning (Falkenhainer87.IJCAI, JonesLangley95) and information retrieval. Analogical learning uses analogy to generate new knowledge about a domain; examples include the extension of the ARCS/ACME structure-matching programs with algorithms from the Eureka learning system. Similarity-based retrieval models include MAC/FAC (many are called but few are chosen, Gentner95); this is a two-stage algorithm, where the MAC stage is a rough filter, and the FAC stage is performed by the Structure Mapping Engine (SME).
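
The two-stage shape of MAC/FAC can be sketched as follows. Everything here is a stand-in: the cheap filter is assumed to be a dot product of predicate-name counts, the expensive stage is a brute-force relation-preserving match rather than SME itself, and the names `mac_score`, `fac_score`, `mac_fac` and the `keep` parameter are hypothetical.

```python
from collections import Counter
from itertools import permutations


def mac_score(probe, case):
    """Cheap filter: dot product of predicate-name counts (ignores structure)."""
    p = Counter(name for name, *_ in probe)
    c = Counter(name for name, *_ in case)
    return sum(p[name] * c[name] for name in p)


def fac_score(probe, case):
    """Expensive stand-in for a structure matcher: the largest number of
    probe relations preserved under any one-to-one object mapping."""
    probe_objs = sorted({o for _, *args in probe for o in args})
    case_objs = sorted({o for _, *args in case for o in args})
    case_set = set(case)
    best = 0
    for perm in permutations(case_objs, len(probe_objs)):
        m = dict(zip(probe_objs, perm))
        kept = sum((name, *(m[a] for a in args)) in case_set
                   for name, *args in probe)
        best = max(best, kept)
    return best


def mac_fac(probe, memory, keep=2):
    """Many are called (MAC shortlist), few are chosen (FAC picks the best)."""
    shortlist = sorted(memory, key=lambda k: mac_score(probe, memory[k]),
                       reverse=True)[:keep]
    return max(shortlist, key=lambda k: fac_score(probe, memory[k]))


memory = {
    "atom": [("attracts", "nucleus", "electron"),
             ("revolves_around", "electron", "nucleus")],
    "tug_of_war": [("attracts", "team_a", "rope"),
                   ("attracts", "team_b", "rope")],
    "kettle": [("heats", "element", "water")],
}
probe = [("attracts", "sun", "planet"), ("revolves_around", "planet", "sun")]
print(mac_fac(probe, memory))
```

In this toy memory the cheap stage cannot separate the atom from the tug of war (both share two predicate mentions with the probe); only the structural stage notices that the atom keeps both relations consistent.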

Implementations of SMT include the Structure Mapping Engine (SME, Falkenhainer89), SAPPER, ACME and ARCS. SME represents each object as a description group (dgroup), a list of items (entities and expressions) associated with that object, where functions, attributes and relations within these items are represented as Prolog-style clauses. Each dgroup is mapped onto a graph with a node for every item in it, and an arc from every item i to any item j which is used as an argument to i.
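
A minimal sketch of that dgroup-to-graph step, assuming expressions are written as nested tuples of the form `(functor, arg, ...)` and entities as plain strings. The function name `dgroup_to_graph` and this representation are illustrative assumptions, not SME's actual data structures.

```python
def dgroup_to_graph(expressions):
    """Turn a dgroup into (nodes, arcs).

    Every item (expression or entity) becomes a node, and there is an arc
    from each expression to every item used as one of its arguments.
    """
    nodes, arcs = set(), set()

    def visit(item):
        nodes.add(item)
        if isinstance(item, tuple):      # an expression: (functor, arg, ...)
            for arg in item[1:]:
                arcs.add((item, arg))
                visit(arg)

    for expression in expressions:
        visit(expression)
    return nodes, arcs


# greater(pressure(beaker), pressure(vial))
dgroup = [("greater", ("pressure", "beaker"), ("pressure", "vial"))]
nodes, arcs = dgroup_to_graph(dgroup)
print(len(nodes), "nodes,", len(arcs), "arcs")
```
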
The SME algorithm is divided into four stages: Local match construction, Gmap construction, Candidate inference construction and Match evaluation. A local match is a pair of base and target items which could potentially match; in local match construction, a match hypothesis is created for each of these pairs. SME does not create a match hypothesis for every possible pair of base and target items, but uses a set of rules (for instance, only matching items that appear in the same expression) to filter them. Which rules are used here determines whether the output of SME will be analogy, literal similarity or mere appearance. Because they are based on dgroups, match hypotheses also form a directed acyclic graph, with exact matches fixed only between predicates. Gmap construction is the heart of SME processing. Here, the local matches are combined into gmaps, where a global mapping (gmap) is a structurally consistent mapping between the base and target object, consisting of correspondences between the items and entities in their description groups, candidate inferences and a structural evaluation score. Candidate inferences are the structurally grounded inferences in the target domain suggested by each gmap, where the substitutions used in structurally grounded inferences are consistent with the existing target information, and there is an intersection between their ancestor nodes and the network representing the target domain. In match evaluation, a score is given for each local match, using a set of rules (for instance, expressions with the same functors are assigned a score of 0.5). These scores are combined using Dempster-Shafer theory (in a Belief Maintenance System) to give a score for each match hypothesis. The scores for all the match hypotheses that contribute to an individual gmap are then summed to give a structural evaluation score for that gmap.
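
The match evaluation arithmetic can be sketched in a few lines. This assumes the simplest Dempster-Shafer case, where every rule contributes evidence in favour of the same match hypothesis and nothing against it, so Dempster's rule reduces to 1 minus the product of the remaining doubts. The 0.4 weight and the function names are hypothetical; only the 0.5 "same functor" score comes from the description above.

```python
from functools import reduce


def combine_support(weights):
    """Combine independent supporting evidence for one match hypothesis.

    For evidence that all points the same way, Dempster's rule gives
    1 - (1 - w1)(1 - w2)...; an empty list gives 0.0.
    """
    remaining_doubt = reduce(lambda doubt, w: doubt * (1.0 - w), weights, 1.0)
    return 1.0 - remaining_doubt


def gmap_score(match_hypotheses):
    """Structural evaluation score: sum of the combined scores of every
    match hypothesis that contributes to the gmap."""
    return sum(combine_support(evidence) for evidence in match_hypotheses.values())


gmap = {
    "greater <-> greater": [0.5, 0.4],   # same functor (0.5) plus a hypothetical second rule
    "pressure <-> temperature": [0.5],
}
print(round(gmap_score(gmap), 3))
```

Two pieces of evidence worth 0.5 and 0.4 combine to about 0.7 rather than 0.9: each new piece only eats into the doubt that is left, so scores never exceed 1.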

Veale’s SAPPER system [Veale98] represents each domain as a graph in which nodes represent concepts (e.g. SME’s items) and edges between nodes represent relations between those concepts. The algorithm uses much simpler rules than SME. It first looks for common nodes between the two graphs being compared, then uses two simple rules to update the match: the triangulation rule and the squaring rule. This creates a partial match pmap for the domain, and is a spreading-activation algorithm that Veale claims is more efficient than the corresponding part of SME [Veale97]. The pmaps formed are graded by richness, then the algorithm attempts to combine all the pmaps, starting with the richest as a base.
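
The notes above do not spell the two rules out, so the following sketch rests on one reading of them and should be treated as an assumption: triangulation bridges two concepts that point at a shared concept through the same relation, and squaring bridges two concepts that point, through the same relation, at two concepts that are already bridged. Links are assumed to be `(relation, source, target)` tuples, and the function names are hypothetical.

```python
def triangulate(links):
    """Bridge two concepts that share a target via the same relation (assumed rule)."""
    bridges = set()
    for rel1, a, shared1 in links:
        for rel2, b, shared2 in links:
            if a < b and rel1 == rel2 and shared1 == shared2:
                bridges.add((a, b))
    return bridges


def square(links, bridges):
    """Extend bridges until nothing changes: if j and k are bridged, and
    m -rel-> j and n -rel-> k, then bridge m and n (assumed rule)."""
    bridges = set(bridges)
    changed = True
    while changed:
        changed = False
        for j, k in list(bridges):
            for rel1, m, t1 in links:
                for rel2, n, t2 in links:
                    if rel1 == rel2 and m != n and {t1, t2} == {j, k}:
                        pair = tuple(sorted((m, n)))
                        if pair not in bridges:
                            bridges.add(pair)
                            changed = True
    return bridges


links = {
    ("attr", "surgeon", "precise"),
    ("attr", "butcher", "precise"),
    ("used_by", "scalpel", "surgeon"),
    ("used_by", "cleaver", "butcher"),
}
seed = triangulate(links)
print(sorted(square(links, seed)))
```

The shared attribute seeds a surgeon/butcher bridge, and squaring then spreads outwards to pair the scalpel with the cleaver, which is the spreading-activation flavour described above.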

Contenders to SMT for modelling analogy include high-level perception (Hofstadter’s CopyCat) and theories of semantic/pragmatic constraints. High-Level Perception emphasises analogy as a mixture of representation and mapping, and treats data gaps in a similar manner to recent work on protein structure alignment. Holyoak and Thagard’s work on the Analogical Constraint Mapping Engine (ACME) and ARCS systems assumes that analogy is goal-driven; Holyoak [Holyoak89] argues that the SMT view of analogy only addresses the structural constraints on a mapping, and that semantic and pragmatic constraints should be included in the mapping too.

Where an analogy is drawn between two separate areas of knowledge, then the use of information fusion by them is simple: the joining of the information held in the two pieces of knowledge into a coherent whole. More complex forms of creativity combine together several pieces of knowledge; this is where information fusion techniques will be most useful and provide most insight into the process.

Insight

Is insight creativity applied to thought processes? I mean, if art is creativity using materials, and maths creativity using symbols, can we distill the notion of insight right down to creativity using partial world views? It could fit: the idea of generating then analyzing and reducing sets of ideas down to ones with higher value (for a given notion of value), of combining partial views, refining them and creating leaps by adapting or removing base assumptions. So are the people who spot the moving patterns and understand trends fast enough to profit from them creative (as they’re adapting their models quickly) rather than purely logical? Possibly. Could we use structure-mapping to model those processes? Some, but that would definitely need deep parsers, large knowledgebases and one heck of a legal framework…

When we created the 4-tier fusion pyramid (data-information-knowledge-insight/wisdom; you’ve been using it for years chaps, and now you can finally credit Mark Bedworth. Who isn’t me, btw…), I placed insight on the top tier. It made sense as a simple analogy; if all the raw materials in intelligence processing could be heaped together, then the distillation of a larger amount of knowledge into a smaller amount of insight could be viewed in a similar way to the distillation of a larger amount of data into a smaller amount of information. It also tacitly acknowledged that the representation of insight might be quite different to that of knowledge, that somehow (as with data to information), the linkages within and context of that knowledge could be used and shown more concisely. And now I need to think very very carefully about what exactly I meant by that. But first, I need to sleep. Goodnight.

Thinking Simply

I’m not a very intelligent person, but I am lucky enough to know several of these creatures. I got to wondering this weekend about what, apart from the ability to assimilate and use vast amounts of information faster and more efficiently than us mere mortals, really sets them apart.

And the thing that strikes me most is their ability to think very very simply. Now I’m not talking about thinking simplistically, i.e. with the sort of logics found mainly in the Daily Mail or the town pub after 10 pints. I’m talking about the ability to take a topic, a known topic, to reach into its core and retrieve an idea that seems so simple, so obvious, that you wish you’d thought of it yourself. And that little inner voice tells you that you could have done, if only you’d noticed, but you know in your heart that only someone very smart has the gift of thinking that simply.

And then I started wondering how this simplicity might be connected to my long-ago quest to understand creativity; if what the smart people were doing was having the courage, confidence and tools to rearrange existing concepts that build up around a topic over time. Which took me back to an old book (The Structure of Scientific Revolutions), a coarse precis of which is that science moves at two speeds: slow methodical progress interspersed with great leaps of the imagination.

Meanwhile, I’ve been reading some of my old notes and, once again, can’t understand them. Context is all sometimes…

Toolmaking

One of the things that occasionally fascinates me is how we define ourselves as human by defining other animals as somehow, well, inferior. And one of the areas where we’ve traditionally done that until recently is defining ourselves as the only toolmakers. Now anyone who’s ever looked at an empty snail shell with a hole in the side without going “wow, I never knew they had escape hatches” automatically knows that isn’t true. But how untrue? Once more to the literature boys…

Several books and articles (e.g. this 1940s article by Kenneth Oakley) on man the toolmaker take toolmaking as a given; these works move straight to which tools for what purpose, and skip the reasons and methods by which we might have become toolmakers at all. There’s a species of early man, Homo habilis (“able man” or sometimes “man the toolmaker”), named after its toolmaking skills, mainly for an ability, 2.6 million years ago, to create and use a cutting edge on a small stone.

I’ll probably come back to people later, as toolmaking appears to be bound up with creativity, definition and even religion. For now, I’ll concentrate on the animals, as a useful control group. So, the ones that I know of are:

And lots more behaviours being observed in labs. There are even behavioural ecology groups out there studying this. In the end, this is really a non-post because the arguments are all there, the evidence is known, including that animals will use whatever materials are to hand (or claw or beak) for the tasks that they have, and that well-fed, well-cared-for captive animals have more time and inclination (even with adjustments for other factors) to be creative; the questions really are only of degree. And that’s before we talk about elephants and primates painting.

And then there is the use of animals as tools by humans. Which is a whole different subject, and not one for today. Suffice to say that the Mk 7 Dolphin does exist, and even has patents attached.

Defining Creativity

Sometimes I’m going to post some old notes here. This is one of them.

There are many different attempts to define creativity, and work on automating creativity is often inhibited by the authors’ own definitions. The process of creativity has been divided into several stages. Hadamard’s description of Poincaré’s four stages is used by Boden amongst others: these are

  • preparation: define the problem, and attempt to solve it by rational means.
  • incubation: generate novel patterns or concepts.
  • inspiration: recognise a creative solution.
  • verification: rationally compare the solution to the problem.

The preparation and verification stages may not exist because there may not always be a given aim to creative work. Incubation and inspiration are however central to creativity: it always contains a two-part process of generation of concepts then evaluation of how creative those concepts are.

Defining the problem

The act of finding a problem is usually part of the creative process. Some creativity systems are very focussed on this: for example, in flexible means-end analysis (Jones+Langley) the problem is defined as a current world state and a set of goal conditions. This part of creativity is very closely related to conventional learning theory: Thornton (C.Thornton, 1998) has argued that the bias inherent in any recursive learning algorithm can be viewed as a form of creativity.
Although I have stated that preparation and verification are not necessarily essential to creativity, they are very important: perhaps the difference between creativity and randomness, between human creativity and madness, is in its connection to a purpose or communication (for example, even in describing art, we speak of its expression).

Generating novel concepts

Three main types of creativity are Boden’s (M.Boden, 1990) improbabilist and impossibilist creativity, and a chaotic form of creativity seen in many of the neural-network based approaches.

  • improbabilist creativity is the construction of new concepts from existing ones, often combining previously-unconnected information to solve a previously-unseen problem. The lightbulb puzzle (including the information that lightbulbs are hot just after they are on) is an example. Improbabilist creativity was explored by Koestler (his ‘bisociation of matrices’ (A.Koestler, 1964)) and discussed by Perkins (DN.Perkins, 1981).
  • impossibilist creativity transforms the space in which a concept can exist. This includes widening the frame of information around a concept being examined, and the removal of assumptions or constraints from the environment in which a concept exists. Jackson Pollock putting his paintings on the floor (removing the assumption that paintings need to be vertical) is an example of this. There are many constraints at play in creation. For instance, in the creation of poetry, the pattern of stresses in a line is as important as the meanings and rhymes and hidden meanings within a stanza. We work within unspoken rules: creativity can work within these rules (using them as guidelines) or on those rules themselves (to create new forms or categories of art or science).
  • chaotic creativity is where a small mutation of an existing concept is allowed. Beethoven’s minute reworking of his musical themes until he hit one which was acceptable to him, and Thaler’s creativity machine are examples of this.

This seems a reasonable division to work with, although it would be interesting to see whether, when these three forms of creativity are finally modelled, other forms of creative act and process become apparent.

Measuring creativity

Creativity is often confused with the creation of new things. Creativity is not novelty: while generation of concepts is important, it is not effective without their evaluation. Evaluation consists of deciding which solutions are creative, either by clustering them or by using a measure of surprise. To be creative, we need some sense of the difference between what is truly creative and what is just new. We need to have a sense of how to cluster the mutations generated and how to define the boundaries between those clusters: we need a sense of taste or discrimination. Much of this, we can take from work on concept clustering and information fusion, and work on the difference between creative solutions and novel near-miss solutions to a problem, and the change in process that leads to them. As an example, take the humble paperclip. I can bend a paperclip into dozens of minutely different and new shapes, but only a few could be seen (without an explaining context, which is in itself a creative act) as creative mutations of its original shape.

Comparing the solution to the problem

If the creative process is used to solve a specific problem, then the problem and its potential solutions need to be matched. Again, this process is closely related to the process of assessing the output from conventional learning algorithms.

What and why creativity?

Creativity is one of those strange human attributes. Like beauty, everyone knows what it is, but can’t quite pin it down (although several psychologists and computer scientists are trying to). Ask someone on the number 8 bus (the number 9 is so passe dahling), and they’re likely to talk about art and genius (Einstein crops up a lot) and those leaps of imagination that take you from a difficult problem to an elegant solution in seconds.

So, assume for a moment that defining creativity is closely bound up with how we view art. Because that takes us to the heart of the problem, for both creativity scholars and curators. Which is how do we know when what we’ve got is good art? Creativity is about creation (of things, concepts, etc), but anyone can do that, and most people create things most of the time, even through mundane things like stepping into puddles and changing the splashmarks around them. What really distinguishes the creative from the creation is an agreement, a sense of the novel, the innovative, even the beautiful. (My personal measure for art btw is “does it tell me something about the world, tell me something about myself or make me smile”).

Creativity is also bound up with how we define ourselves as humans. Douglas Adams linked religion to man’s view of himself as a toolmaker, but I think this definition goes deeper than that. Humans survive in extreme spaces and despite rapid changes in their environments because they can create what they need for that survival (clothing, shelter, food production etc) from the materials around them. I’m saving a discussion of toolmaking (in humans and other animals) for another day, but the point here is that creativity is a fundamental part of human success (and possibly also failure) in the world. So it may be useful to understand how it works, and how we might be able to harness that, in a non-terminators-taking-over-the-world kind of way.

So, how do we define, model and even mimic creativity? I’ll leave that for later posts…

What is Information Fusion, exactly?

I’ve spent half a lifetime thinking about how people create and use mental models of the world, and I still don’t have a good answer for the title question. I’ve had a look around the web and its meaning seems to change with the background of its user. So I’m going to try again.

Information fusion (IF) is what happens when you apply data fusion principles to non-numeric data. Data fusion (DF) takes isolated pieces of sensor output (e.g. a series of sonar or radar plots) and turns them into a situation picture: a human-understandable representation of the objects in the world that created those sensor outputs. Multisensor DF does this with outputs from more than one sensor and/or sensor type.

Most data fusion happens in a nicely constrained and understandable environment where things (e.g. ships, planes) lurk or move around producing recognisable signatures (e.g. shape outlines or frequency lines). And that’s where the problems with IF start. IF deals with non-numeric data, e.g. verbal reports, documents and database entries. The first issue is knowing where the boundaries of the problem are. If we carry on the IF as an extension of DF paradigm, instead of dealing with a set of identifiable things in a known environment producing a known set of possible numerical signatures, IF deals (usually) with information written by humans describing their perception of the world. And that, in DF terms, is a nightmare: the data is of variable and unknown quality: it can be deliberately or innocently false or unreliable, can at the same time cover both much more than the area of interest and much less of the area than is needed to form a situation picture, and is usually presented (except sometimes in the case of data incest; more on that later) in a variety of formats, using a wide variety of terms for similar concepts and classes of concepts.

So what can we start to conclude from this? Well, first, that any IF system that wants to succeed will either have to have an area of interest (e.g. a topic or set of topics, or even a physical area if it’s linked into a DF system), or accept its fate as an unconstrained data mining/ data visualisation or data summary system. IF developers will either have to have very strong words with their system users about the use of precise language (easier with the military; not so easy with commercial or occasional users) and hanging back on the conjectures, or develop a system that is either a smart form of search engine, or very capable at parsing and understanding natural language structures (note the use of structures there: an IF system may not necessarily need to fully understand NL, but may survive on enough understanding to know when to pass something difficult up to a human) and knowing which pieces of information are relevant. IF developers will also have to build systems capable of parsing and merging the different levels of language (e.g. fruit for apple), overlapping meanings (a person is not always a game player) and different language contexts, and deal with all the anaphora, uncertainty and downright error that humans manage to create whenever they’re allowed to interact freely.

But I still haven’t answered the question. And I, as much as anyone else, am guilty of using my own working definition. To me, IF is the combination of all available sources of relevant information (yes, numeric as well) to create a (hopefully stable) concise and usable mental picture of an area of the world that I’m interested in. That, to me, sounds awfully much like thinking, which is the only excuse I have for spending years playing with artificial intelligence.

Hello World

So… a blog. A place to explore whatever interests me about the world. And in no particular order, this includes: risk, knitting, thought, images, power structures, combinations, the meaning and origins of places, languages, lies and growing vegetables. There are other things of interest, but they’ll turn up when they want to.