First Outline of the Mai project

The idea of an automatic generative radio grew out of my discontent with format radio and with the highly standardized routines of both radio programs and certain music styles. It had been lingering in my head for quite a while when I participated in the x-med-k workshops.
The combination of synthetic speech and algorithms merged very naturally with my old concept, and I soon conceived the artbot, provisionally named Mai.

The goal of Mai is to generate audiovisual art, though in the first stage of development this will be limited to sound. Mai will take the audiovisual and textual content of the okno website and archives as working material. The bot will be accessible as a stream on the site; later, a physical terminal with simulated sensory devices will also be built.
User input from the site and from the environment where the terminal resides will be treated as impulses to which the bot can react.

The workings of Mai, the mind as one might say, are based on models of human cognition as described in the work of Douglas Hofstadter. Heavily simplified, they can be described as follows. The bot has a semantic network of aesthetic concepts which are linked to each other according to certain relationships. One could compare this semantic net to a Platonic world of ideals, to long-term memory, or to structuralist semantics.
It is important to realize that this network is very dynamic and in constant evolution, in reaction to the problems or stimuli it is dealing with. Concepts in this network constantly grow more or less important, and move further from or closer to other concepts. The urgency or salience of a concept increases the chance of it being evoked; furthermore, the conceptual "deepness" of a given concept also increases this chance.
The aspect of chance is very important. Nothing is certain or "hardwired" to work in a certain way; structures are emergent, as in cellular biology.
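The chance-based evocation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the planned max-msp-jitter implementation: the names Concept, evoke and activate are hypothetical, and the choice to weight each concept by salience multiplied by depth, and to spread activation in inverse proportion to link distance, is one possible reading of the text.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    depth: float                 # conceptual "deepness" (assumed scale 0..1)
    salience: float = 0.1        # current urgency, changes over time
    links: dict = field(default_factory=dict)  # neighbour name -> distance


def evocation_weights(concepts):
    # Assumption: chance of evocation grows with both salience and depth.
    return [c.salience * c.depth for c in concepts]


def evoke(concepts, rng):
    """Pick one concept at random, weighted by salience and depth."""
    return rng.choices(concepts, weights=evocation_weights(concepts), k=1)[0]


def activate(concepts, name, amount=1.0):
    """Raise one concept's salience and pass a share to its linked neighbours."""
    by_name = {c.name: c for c in concepts}
    source = by_name[name]
    source.salience += amount
    for neighbour, distance in source.links.items():
        # Assumption: closer concepts (smaller distance) receive a larger share.
        by_name[neighbour].salience += amount / (1.0 + distance)


def decay(concepts, rate=0.9):
    """Let all saliences fade a little, so nothing stays urgent forever."""
    for c in concepts:
        c.salience *= rate
```

Nothing here forces a particular concept to be chosen: a salient, deep concept is merely more likely to be evoked, which keeps the behaviour emergent rather than hardwired.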
The place where concepts are evoked is called the workspace and is comparable to working or short-term memory. Here the ideal concepts are realized in a concrete, workable form. Technically this is done using the scripting architecture in max-msp-jitter, interfaced by JavaScript. Feedback to the semantic net is provided by scouting agents that recognize structures and signal to the net that certain concepts might be appropriate, thus increasing their urgency or salience.
The specifics of this model cause the "thinking" of Mai to evolve from asynchronous parallel to serial, not unlike that of humans. One could compare this to quickly skimming a few ideas, finding a good one, thinking about it, dropping it for a better one, then elaborating on that particular idea and realizing it. The better the idea, the smaller the chance of abandoning it to examine a new one.
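The drift from skimming to commitment can be illustrated with a small Python sketch. It rests on assumptions that are not in the outline: each idea is given a single quality score between 0 and 1, the chance of abandoning an idea is simply one minus that score, and the function names are hypothetical.

```python
import random


def abandon_probability(quality):
    """Chance of dropping the current idea; it falls as quality rises."""
    q = min(max(quality, 0.0), 1.0)  # clamp to the assumed 0..1 range
    return 1.0 - q


def think(ideas, steps, rng):
    """Simulate the workspace.

    ideas: dict mapping an idea name to its quality (0..1).
    Returns the list of ideas held after each step.
    """
    names = sorted(ideas)
    current = rng.choice(names)
    history = [current]
    for _ in range(steps):
        # A weak idea is likely to be put aside to look at another one.
        if rng.random() < abandon_probability(ideas[current]):
            candidate = rng.choice(names)
            # The current idea is only dropped for a better one.
            if ideas[candidate] > ideas[current]:
                current = candidate
        history.append(current)
    return history
```

Early in a run the held idea tends to change often; once a strong idea is held, the probability of leaving it becomes small and the process behaves almost serially.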

To complement this cognitive structure, a model of perception is built and integrated. It uses a combination of computer vision and neural networks, with sound converted to visual data, as this enhances the possibilities for pattern recognition.
For example, Mai might see that a certain sound is rhythmic and of low, inharmonic spectral content. This would be signalled to the net, making it more likely that a higher-pitched, bright drone is layered over it, as this would benefit the balance relationship. If no suitable sound were available on the site, another could be processed so that it fits.
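The example above amounts to a scouting agent that turns a perceived description into a salience boost. A minimal Python sketch follows; the feature keys, the concept name "high_bright_drone" and the size of the boost are all hypothetical, and the perception step that would produce the features (computer vision and neural networks) is assumed to exist elsewhere.

```python
def balance_scout(features, salience, boost=0.5):
    """Return a copy of the salience table, updated by one perception.

    features: dict describing a sound, e.g.
              {"rhythmic": True, "register": "low", "harmonic": False}
    salience: dict mapping concept names to their current salience.
    """
    updated = dict(salience)
    is_low_inharmonic_rhythm = (
        features.get("rhythmic", False)
        and features.get("register") == "low"
        and not features.get("harmonic", True)
    )
    if is_low_inharmonic_rhythm:
        # Balance relationship: favour a contrasting, higher and brighter layer.
        key = "high_bright_drone"
        updated[key] = updated.get(key, 0.0) + boost
    return updated
```

The scout only makes the complementary concept more likely; whether a bright drone is actually layered over the sound still depends on chance in the network.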

It is essential that the basic modules (which are hardwired) are both flexible and reliable, functionally and aesthetically. To achieve this, the semantic network and the working modules must be meticulously crafted through introspection. I am confident that this can be achieved, as I have noticed that many very competent audio artists rely on a very limited set of methods, which does not preclude interesting and pleasing results.

One of the pending possibilities is a natural language interface over the internet, with a chatbot architecture provided by open source AIML programs. Complemented by the Mbrola speech synthesis engine, with the speech output manipulated and integrated into the generated audio stream, this would make for a nice interaction.

It is important that the structure of Mai remains open, so that it can be extended at any time. One interesting example would be to let me connect her over the internet to a performance of mine, so that she could play along with me.
There are some issues of realtime processing to be addressed here, but none are fundamental.
Another would be for Mai to manage an audio installation in another part of town, so that her thinking power can be used to make installations more complex. The audio output of the installation or performance could be streamed back to Mai and recorded, adding to her sounds for later use on the okno site.




