What is Information Fusion, exactly?

I’ve spent half a lifetime thinking about how people create and use mental models of the world, and I still don’t have a good answer for the title question. I’ve had a look around the web, and the meaning of the term seems to change with the background of whoever is using it. So I’m going to try again.

Information fusion (IF) is what happens when you apply data fusion principles to non-numeric data. Data fusion (DF) takes isolated pieces of sensor output (e.g. a series of sonar or radar plots) and turns them into a situation picture: a human-understandable representation of the objects in the world that created those sensor outputs. Multisensor DF does this with outputs from more than one sensor and/or sensor type.
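To make the multisensor case concrete, here is a minimal sketch of one very simple fusion step: combining several noisy estimates of the same quantity by weighting each with the inverse of its variance. This is a textbook technique picked for illustration, not a description of any particular DF system; the `Estimate` class, the `fuse` function and the numbers are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """One sensor's estimate of the same quantity (hypothetical structure)."""
    position: float  # e.g. range to a target, in metres
    variance: float  # the sensor's noise variance; smaller means more trusted


def fuse(estimates):
    """Combine independent estimates using inverse-variance weighting."""
    if not estimates:
        raise ValueError("need at least one estimate")
    # A less noisy sensor gets a larger weight.
    weights = [1.0 / e.variance for e in estimates]
    total = sum(weights)
    position = sum(w * e.position for w, e in zip(weights, estimates)) / total
    # The fused variance is never larger than the best single sensor's variance.
    return Estimate(position=position, variance=1.0 / total)


# Illustrative values only: a sonar and a radar reporting on the same contact.
sonar = Estimate(position=100.0, variance=4.0)
radar = Estimate(position=104.0, variance=4.0)
fused = fuse([sonar, radar])
print(fused)
```

With two equally noisy sensors the fused position sits halfway between them and the variance halves, which is the basic payoff of fusing more than one source. It assumes the sensors' errors are independent, which real systems cannot always take for granted.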

Most data fusion happens in a nicely constrained and understandable environment where things (e.g. ships, planes) lurk or move around producing recognisable signatures (e.g. shape outlines or frequency lines). And that’s where the problems with IF start. IF deals with non-numeric data, e.g. verbal reports, documents and database entries. The first issue is knowing where the boundaries of the problem are. If we carry on with the “IF as an extension of DF” paradigm, then instead of dealing with a set of identifiable things in a known environment producing a known set of possible numerical signatures, IF deals (usually) with information written by humans describing their perception of the world. And that, in DF terms, is a nightmare. The data is of variable and unknown quality, and it can be deliberately or innocently false or unreliable. It can at the same time cover much more than the area of interest and much less of that area than is needed to form a situation picture. And it is usually presented (except sometimes in the case of data incest; more on that later) in a variety of formats, using a wide variety of terms for similar concepts and classes of concepts.

So what can we start to conclude from this? Well, first, that any IF system that wants to succeed will either have to have an area of interest (e.g. a topic or set of topics, or even a physical area if it’s linked into a DF system), or accept its fate as an unconstrained data mining, data visualisation or data summary system. Second, IF developers will either have to have very strong words with their system users about using precise language and hanging back on the conjectures (easier with the military; not so easy with commercial or occasional users), or develop a system that is either a smart form of search engine, or very capable at parsing and understanding natural language structures and at knowing which pieces of information are relevant. (Note the use of “structures” there: an IF system may not necessarily need to fully understand NL, but may survive on enough understanding to know when to pass something difficult up to a human.) IF developers will also have to build systems capable of parsing and merging the different levels of language (e.g. fruit for apple), overlapping meanings (a person is not always a game player) and different language contexts, and of dealing with all the anaphora, uncertainty and downright error that humans manage to create whenever they’re allowed to interact freely.
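As a rough illustration of merging different levels of language, the sketch below walks a tiny hand-built table of "is a kind of" links to find the most specific concept two terms share, and gives up (so the item can be passed to a human) when they share none. The table, the function names and the example terms are all made up for illustration; a real system would need a far larger vocabulary and would still have to handle context and overlapping meanings.

```python
# Hypothetical "is a kind of" links; each term points to its more general term.
HYPERNYMS = {
    "apple": "fruit",
    "pear": "fruit",
    "fruit": "food",
    "frigate": "ship",
    "ship": "vessel",
}


def ancestors(term):
    """Return the term followed by its increasingly general parents."""
    chain = [term.lower()]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain


def common_concept(a, b):
    """Return the most specific concept covering both terms, or None."""
    b_chain = set(ancestors(b))
    for candidate in ancestors(a):
        if candidate in b_chain:
            return candidate
    return None  # no shared concept: a case to hand to a human


print(common_concept("apple", "fruit"))
print(common_concept("apple", "pear"))
print(common_concept("apple", "frigate"))
```

Two reports that mention "apple" and "fruit" can then be merged at the "fruit" level, while "apple" and "frigate" come back with no shared concept and get escalated rather than guessed at.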

But I still haven’t answered the question. And I, as much as anyone else, am guilty of using my own working definition. To me, IF is the combination of all available sources of relevant information (yes, numeric as well) to create a (hopefully stable) concise and usable mental picture of an area of the world that I’m interested in. That, to me, sounds an awful lot like thinking, which is the only excuse I have for spending years playing with artificial intelligence.