Glossary › Music and Artificial Intelligence

Algorithm

An algorithm is a mathematical or programming implementation of a goal to be achieved. Simple algorithms are those of statistics, where an average or deviation is calculated. These are often the basis of political and economic decisions. AI is also an algorithm, which calculates many problems far more effectively and includes identification or clustering as a goal. Other physical modeling algorithms include solving very complex mathematical equations or iteration methods, i.e. calculations that build on each other.

Additive Synthesis

Additive synthesis builds a sound by adding single oscillations. The individual oscillations are sine tones with specific frequencies. A sound with a pitch, but also noise or percussion instrument sounds, consists of many such sine tones. They can change in strength and frequency over time. An example of additive synthesis is the organ: when several pipes sound simultaneously, their individual sounds are added. In synthesizers, additive synthesis is an important method of sound generation.
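
The summing of sine tones described here can be sketched in a few lines. This is a minimal illustration, not a synthesizer design: the function name `additive_sample`, the three partials and the sample rate are assumptions chosen for the example.

```python
import math

def additive_sample(t, partials):
    """Sum sine partials at time t (in seconds).

    partials: list of (frequency_hz, amplitude) pairs.
    """
    return sum(a * math.sin(2 * math.pi * f * t) for f, a in partials)

# Hypothetical tone: a 220 Hz fundamental plus two weaker overtones.
partials = [(220.0, 1.0), (440.0, 0.5), (660.0, 0.25)]
sample_rate = 8000  # samples per second (assumed value)

# One second of sound, one value per sample.
signal = [additive_sample(n / sample_rate, partials) for n in range(sample_rate)]
```

Letting the amplitudes or frequencies in `partials` vary with `t` would model the change in strength and frequency over time.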

Amplitudes

An amplitude denotes the strength or magnitude of an oscillation. Strong amplitudes are therefore loud, weak ones low in volume. When amplitudes fluctuate in time, i.e. become larger or smaller, we hear a beating sound. If the speed of this beat becomes faster, the sound suddenly becomes unpleasantly harsh or rough. If the speed of the beat increases further, we no longer hear the original sound but a new sound, which is determined by the speed of the amplitude fluctuation.

Artificial Neural Network

An artificial neural network is an attempt to recreate the neural network of the brain in the computer. In most cases, the number of neurons and their connections is significantly reduced. The function of the neurons, which is highly complex in the brain, is also drastically simplified. As a basic function, the networks usually reproduce the property of real neurons to send out a nerve impulse when there is sufficient input of nerve impulses from other neurons. It is also taken into account that with strong input over a longer period of time the activity of the neuron increases, i.e. it becomes even more receptive.
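
The basic function, firing when the input from other neurons is sufficient, can be sketched as a simple threshold unit. The name `neuron_fires`, the weights and the threshold are assumptions for illustration; the increase in receptivity under sustained input is not modeled here.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True if the weighted input reaches the firing threshold.

    inputs:  activity of the sending neurons (1 = impulse, 0 = silent)
    weights: connection strength for each sending neuron
    """
    total = sum(x * w for x, w in zip(inputs, weights))
    return total >= threshold

# Hypothetical neuron with three equally strong input connections.
weights = [0.5, 0.5, 0.5]
fires_with_two = neuron_fires([1, 1, 0], weights, threshold=1.0)  # enough input
fires_with_one = neuron_fires([1, 0, 0], weights, threshold=1.0)  # too little input
```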

Auditory Pathway

The auditory pathway, one for each ear, is the neural pathway from the ear to the neocortex, the cortex under the skull. After the ear converts the sound into electrical impulses, these neuronal spikes are passed through different and very complex neuronal nuclei where they are modified. Each nerve fiber is associated with a frequency, which is linked to a location on the basilar membrane of the ear, which in turn encodes that frequency. In the auditory pathway, pitch, timbre, spatiality and many other parameters are already extracted from the sound.

Basilar Membrane

The basilar membrane is in the inner ear and vibrates with the frequencies of sound transmitted through the eardrum. It is about 3.5 cm long, 1 mm wide and integrated into the cochlea with its three windings. Hair cells are located on it, which convert the mechanical vibration of the basilar membrane into electrical impulses. It also contains neurons that are controlled by the brain to optimize frequency accuracy or volume perception. A reduction in the reverberation of sounds also takes place here in order to increase speech intelligibility.

Cochlea

The cochlea is the organ of hearing. It consists of a spiral with about three turns, in which there are three channels filled with lymphatic fluid. At the lower end is a membrane that is indirectly connected to the eardrum. It is through this membrane that sound enters the cochlea. Between this and the other two canals is the basilar membrane, on which are the hair cells that convert sound into neural impulses. On the basilar membrane, the individual frequencies are arranged next to each other, so that it performs a frequency analysis of the sound.

Complexity of a Sound

Musical sounds have different perceived complexities or chaoticities. This is due to the ability of our hearing to fuse sounds consisting of harmonic overtone spectra into a pitch impression; we hear a single tone. In the case of non-harmonic, i.e. inharmonic spectra, this fusion does not succeed and we perceive a complex sound. This can be measured by a fractal dimension, a measure of the degree of complexity or chaoticity. Very complex sounds merge into noise.

Compression

Compression is the attempt to make a large area manageable. All of our senses compress physical input into a manageable perceptual range. For example, our ear compresses the large frequency range from 20 Hz to 20 000 Hz into about eight octaves. In this process, frequency changes in the low frequency range are perceived very precisely and frequency changes in the high frequency range are perceived only very roughly. Our sense of loudness also compresses, as does our sense of sight, touch or smell. In the recording studio, this compression is reproduced by appropriate devices, compressors.

Connectionist Models

Connectionist models are a form of AI in which neurons are interconnected. A neuron is usually connected to a large number of other neurons, as is the case in the brain. There, each neuron is connected to about 10,000 other neurons. AI reduces this number considerably, so it is a gross simplification of the brain. The connections between neurons have connection strengths, so-called weights. When training the neural network, these connection strengths are constantly adapted to the input until the network is trained.

Deep Learning

Deep learning is a special type of AI that has an input layer, an output layer, and one or more hidden layers. Neural networks can also have only input and output layers. The hidden layers make the network ‘deep’, that is, deepen it. Depending on the size and complexity of the tasks, such a network can have only one hidden layer, or about 27 layers like GoogLeNet. The network learns an input, such as sounds, by ‘playing’ them to the network very frequently, changing the interactions between neurons according to the input.

Dutar

Dutar, du: two, tar: string, is a two-stringed instrument played by the Uyghurs, but also found among many other Turkic peoples. This instrument is quieter than the sethar and is therefore often used as an accompaniment to singing.

Dynamic

In music, dynamics refers to how much the volume of a piece changes. In classical music, there are often large loudness variations; the music is very dynamic. In pop music, hip hop, techno and similar styles, attempts are usually made to reduce existing dynamics, i.e. loudness variations: they are compressed in the recording studio by means of a compressor. This is usually done in order to keep the loudness level of the piece constantly high, since car drivers listening to music usually dwell on such pieces. This is also called the loudness war.
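
How a compressor reduces loudness variations can be sketched with a static compression curve. This is a simplified illustration only: the function name `compress_db`, the threshold of -20 dB and the ratio of 4:1 are assumed values, and real compressors also have attack and release times that are left out here.

```python
def compress_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Map an input level (dB) to an output level (dB).

    Levels below the threshold pass unchanged; above it, every
    `ratio` dB of input produce only 1 dB of output.
    """
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

# A piece whose level moves between -40 dB and 0 dB (40 dB of dynamics) ...
quiet_out = compress_db(-40.0)
loud_out = compress_db(0.0)
# ... leaves the compressor with a smaller range.
dynamics_after = loud_out - quiet_out
```

After compression the whole piece can be raised in level, which is what keeps the loudness constantly high.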

Frequency

In music, frequency refers to how often a sine wave repeats within one second. A sinusoidal oscillation is when the sound rises once to its maximum, then falls back to zero, then falls back to its negative maximum, and then rises again to zero. If this repeats itself about 100 times per second we hear a tone, whose pitch has the frequency 100 Hz (Hertz). Sounds usually consist of several such sine tones, each with its own frequency.
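
The course of one sinusoidal oscillation can be checked numerically. The following is a small sketch; the function name `sine` and the chosen time points are assumptions for illustration.

```python
import math

def sine(t, frequency_hz, amplitude=1.0):
    """Value of a sine tone at time t (in seconds)."""
    return amplitude * math.sin(2 * math.pi * frequency_hz * t)

frequency_hz = 100.0
period_s = 1.0 / frequency_hz  # one repetition lasts 0.01 s

# Within one period the wave reaches its maximum after a quarter of the
# period and its negative maximum after three quarters.
at_quarter = sine(0.25 * period_s, frequency_hz)
at_three_quarters = sine(0.75 * period_s, frequency_hz)

# One period later the wave has the same value again.
repeats = abs(sine(0.003, frequency_hz) - sine(0.003 + period_s, frequency_hz)) < 1e-9
```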

Gamelan

Gamelan is the Indonesian word for ‘orchestra’. The Javanese or Balinese gamelan orchestra consists of a variety of bronze instruments, metallophones, some with plates with which different pitches can be played, gongs but also flutes and a zither. Since the overtone spectra of these plates and gongs are no longer in the ratio 1:2:3:…, i.e. they are no longer harmonic but rather inharmonic, and since the tonal systems also do not follow any simple rule, the result is a very complex sound, which supports the often repetitive music of the gamelan.

Gongs and Cymbals

Gongs and cymbals are used in China and in large parts of Southeast Asia, including Myanmar. Some gongs can play so-called pitch glides. When the instrument is struck, the pitch slowly drops or rises, giving a dramatic effect.

Harpsichord

The harpsichord is a keyboard instrument that is the predecessor of today’s upright or grand piano, except that on the harpsichord the strings are not struck with a hammer, as on the piano, but are plucked with a quill. This quill is usually carved from the horn part of birds’ feathers. The plucking mechanism results in a very bright sound. The idea of the harpsichord was to mechanize plucked stringed instruments, such as the guitar, so that it was no longer a player who plucked the strings by hand, but a mechanism that does this.

Hulusi / Hulusheng

Hulusi (left) and Hulusheng (right). This type of instrument is not found in the West. The bamboo tubes, which are stuck in a gourd or wood, have a free reed and are blown. In the West, there are also free reeds, for example in the harmonica, but they are not attached to pipes. The instrument is played in Yunnan and adjacent regions among the Kachin and the Shan.

Intervals

In music, intervals are the distances between notes, for example in a melody. Intervals have a certain number of semitones. For example, the fifth has seven semitones, the octave has twelve, the major third has four, and the prime, i.e. the unison, has zero semitones. The function of the intervals has grown historically, for example the division into harmonic (octave, fifth, sixth and third) and inharmonic (fourth, second, seventh, tritone) intervals. In contemporary music, on the other hand, these classifications disappear in favor of an equal perspective.
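
The semitone counts can be written down as a small table. As an addition not made in the text, the sketch also converts semitones into a frequency ratio under the assumption of equal temperament, where each semitone multiplies the frequency by the twelfth root of two. The names `SEMITONES` and `equal_tempered_ratio` are hypothetical.

```python
# Interval sizes in semitones, as listed above (prime = unison).
SEMITONES = {
    "prime": 0,
    "major third": 4,
    "fifth": 7,
    "octave": 12,
}

def equal_tempered_ratio(semitones):
    """Frequency ratio of an interval in equal temperament (assumed tuning)."""
    return 2 ** (semitones / 12)

octave_ratio = equal_tempered_ratio(SEMITONES["octave"])  # doubles the frequency
fifth_ratio = equal_tempered_ratio(SEMITONES["fifth"])    # close to, but not exactly, 3:2
```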

Kaehn

Kaehn from Laos. This instrument, like the hulusi and hulusheng, functions as a wind instrument with a free reed attached to a tube. It is mostly used as a solo instrument.

Key

Keys like major or minor, but also church keys, are based on the idea that tones and chords have functions. Thus, there is a root and a fundamental chord on which a melody can end. The fifth and its chord, the dominant, is just that: dominant. It ‘demands’ the root chord, and thus has a tension and is unresolved in itself. Hugo Riemann tried to explain this inner dynamic of the keys with the philosophy of Friedrich Hegel, so that a musical thought can be expressed. Keys are also assigned to regions of origin, emotions or rites.

Kohonen-Maps

Kohonen maps are a type of AI that learns an input through self-organization and then identifies clusters of similar elements. The learned map can then be used to assign and identify new elements, such as assigning a piece of music to a genre, determining a production method or a musical instrument. In contrast to e.g. Deep Learning, Kohonen maps are also able to determine why a cluster or an identification succeeded and thus provide statements about the trained data.
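
The self-organization can be illustrated with a deliberately tiny one-dimensional map trained on plain numbers. This is a sketch under strong assumptions, not the method used for music data: the functions `best_matching_unit` and `train_som`, the map size, the learning rate, the neighborhood width and the toy data are all chosen for illustration.

```python
import math

def best_matching_unit(weights, x):
    """Index of the map node whose weight is closest to the input x."""
    return min(range(len(weights)), key=lambda i: abs(weights[i] - x))

def train_som(data, n_nodes=4, epochs=50, learning_rate=0.2, sigma=1.0):
    """Train a 1-D map on scalar data (all hyperparameters are assumed values)."""
    lo, hi = min(data), max(data)
    # Deterministic start: nodes spread evenly over the data range.
    weights = [lo + (hi - lo) * i / (n_nodes - 1) for i in range(n_nodes)]
    for _ in range(epochs):
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i in range(n_nodes):
                # Neighbors of the winning node are pulled along, but less strongly.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                weights[i] += learning_rate * h * (x - weights[i])
    return weights

# Toy input with two clusters of similar elements.
data = [0.0, 0.5, 1.0, 9.0, 9.5, 10.0]
weights = train_som(data)

# New elements are assigned to the map node they are most similar to.
low_node = best_matching_unit(weights, 0.2)
high_node = best_matching_unit(weights, 9.8)
```

Because every node keeps a weight that lives in the same space as the input, the weight of the winning node can be inspected directly, which is one way such a map can indicate why an element was assigned to a cluster.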

Labium (Pipe / Flute)

In wind instruments, the labium is the sharp edge onto which an air stream blows. The labium makes the system bi-stable, i.e. the air stream flows alternately to one side and then to the other of the labium. This results in a changing sound pressure and thus a sound. In this way, the energy of the air stream emitted by a flautist or given to the pipes by the wind chamber in an organ is converted into sound. In the case of the organ, the pitch is produced by the labium itself and amplified by the pipe; in the case of the flute, the pipe determines the pitch.

Melody

Melodies are known as sequences of tones. The brain automatically groups these into individual sections, such as verse, chorus, hook lines or short phrases. Grouping into so-called Gestalts, meaningful units, is a fundamental property of the brain and helps to order and sort the flood of sensory impressions. An AI can do this as well, as in face recognition, where the same face is always recognized from a multitude of image data and possible perspectives, or as with melodies that are transposed, played at different tempos or in variations.

Musical Parameter

Musical parameters are understood to be, for example, melodies, rhythms, timbres or musical form. Each of these parameters, of course, has a very wide range of variation and articulation. Nevertheless, it is these basic parameters that the human brain extracts as individual elements from music. Pitch and timbre have their own space, for example high and low tones. Temporal sequences are separated by different time windows: the minimum temporal resolution of 50 milliseconds, the short-term memory of 3 – 5 seconds and the long-term memory.

Neocortex

The neocortex is the cerebral cortex located under the skull. The different parts, such as the frontal lobe under the forehead, the temporal lobe at the temples or the parietal lobe under the vertex, are assigned to individual functions; the auditory cortex, for example, is located in the temporal lobe. A distinction is also made between the right and left hemispheres of the brain. On the other hand, all regions are closely interconnected and the neuronal activity is synchronized or desynchronized, which represents, among other things, musical tension. Subcortically, i.e. below the cortex, there is then e.g. the auditory pathway.

Neuron

Neurons are nerve cells that control brain activity. They have an electrical potential that is constantly changing. Neurons receive electrical signals from other neurons. If these accumulate, the neuron becomes active and ‘fires’ a so-called spike, i.e. an electrical potential to an average of 10,000 other neurons. But there are also so-called inhibitory neurons, which hinder the signal flow. These make up about 10-20% of the brain and without them the brain would not function. Neurons are virtually all the same, so the total activity of the brain consists of the interactions of neurons.

Partial tone

A musical tone can be thought of as being composed of many sine tones, each sine tone having its own frequency. In musical tones of e.g. guitars, pianos or violins, there is a lowest frequency at which we hear the pitch. The higher frequencies are in integer frequency ratios to the fundamental. For example, if the fundamental is at 110 Hz, that is a musical A, the higher frequencies are at 220 Hz, 330 Hz, and so on. The fundamental is called the first partial, the first overtone is called the second partial, and so on.
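
The integer frequency ratios can be computed directly. A minimal sketch; the function name `partial_frequencies` is an assumption for illustration.

```python
def partial_frequencies(fundamental_hz, count):
    """Frequencies of the first `count` partials of a harmonic tone.

    Partial 1 is the fundamental, partial 2 the first overtone, and so on.
    """
    return [n * fundamental_hz for n in range(1, count + 1)]

# The musical A at 110 Hz used as the example above.
partials_of_a = partial_frequencies(110.0, 4)
```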

Physical Model

Physical modeling refers to the mathematical modeling of a physical system. This system can be a musical instrument, the weather, a virus concentration and many more. Physical modeling is an alternative to AI. The AI learns a data set, such as pieces of music, and then ‘knows’ everything that those musical styles contain. A physical model already contains all possible behaviors and properties of a system; they appear when the system is solved mathematically. However, solving a physical model mathematically is often much more time-consuming than computing an AI.

Roughness

Sonic roughness occurs when the frequencies of two sine tones are close together. Then they are placed so close together on the basilar membrane in the ear that they interfere with each other. This is why seconds or sevenths are rough. But other intervals that are further apart also sound rough if they are not tuned pure. Small detunings lead to tonal beatings, larger ones to roughness. The pure intervals like octave or fifth have very little roughness, so that major or minor tunings can also be explained as the intervals that are least rough.

Register

In a tonal system, register refers to the octave position. Low notes are in a low register, high notes are in a high register. In the organ, however, a register, or stop, is a collection of pipes with a certain timbre, such as the Principal, Scharff, Spitzflöte or Vox Humana. Each stop goes over the entire range for which the stop has pipes. These stops can also be played together in so-called mixtures. Then several stops sound simultaneously with each keystroke.

Rhythm

Rhythm refers to the temporal succession of musical events, tones, percussion beats, or the like. There are divisive and additive rhythms. Divisive rhythms divide a unit of time into smaller and smaller subunits, i.e. the whole note into two halves, each half into two quarters, these into eighths, and so on. This also includes grooves or beats. Additive rhythms, as often found in folk music, such as in the Balkans, the Middle East, or sub-Saharan Africa, add long and short beats in succession. Sometimes these rhythms are taken from the speech rhythm of the sung poem.

Scales

Musical scales are the layering of notes into a tonal system. These include major, minor, the church keys, but also hundreds of other scales. On a coarse level, we can often specify the number of semitones between two scale notes, e.g. for C-D-E-F-G-A-B-C: 2-2-1-2-2-2-1, i.e. from C to D two semitones, from E to F one, and so on. On a fine, microtonal level we then distinguish tunings, i.e. fine detunings of individual scale tones. Many scales all over the world are difficult to understand in semitones, so that only the microtonal fine tuning is given there.
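
The coarse description of a scale by its semitone steps can be derived from the note names. A small sketch for natural notes only; the names `PITCH_CLASS` and `scale_steps` are hypothetical, and the C major scale from C to C yields seven steps.

```python
# Position of each natural note within the octave, counted in semitones from C.
PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def scale_steps(notes):
    """Semitone steps between consecutive notes of an ascending scale."""
    return [
        (PITCH_CLASS[upper] - PITCH_CLASS[lower]) % 12
        for lower, upper in zip(notes, notes[1:])
    ]

c_major = scale_steps(["C", "D", "E", "F", "G", "A", "B", "C"])
a_minor = scale_steps(["A", "B", "C", "D", "E", "F", "G", "A"])
```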

Sethar

Sethar of the Uyghurs. The instrument has three steel strings, which also gives it the name, se: three, tar: string. The soundboard is made of mulberry wood. The sethar exists in many countries among Turkic peoples along the Silk Road and is played solo or in a group.

Sharpness

Tone colors can sound more or less sharp. This is due to the energy in the frequency range between about 1 000 – 3 000 Hertz (Hz). If there is a lot of energy, a tone or sound sounds sharp; if there is little energy, it sounds rather dull. Musically, sharpness is a means of expression but also a compositional means of creating musical form. Emotionally, sharpness is also an important parameter.

Soundboard

The soundboard is the wooden plate on which a string is mounted, for example on the piano, guitar or violin. This is necessary because a string alone is much too quiet and must be amplified. The soundboard, however, sounds quite different from the string. The interaction of string and soundboard is a self-organizing system, whereby the string forces the soundboard to vibrate with the frequencies of the string. However, the soundboard still ‘fights back’ when the string begins to vibrate, and so a completely unique, instrument-typical and percussive sound component is created at the start of each note.

Sound Pressure Level

Psychoacoustics describes the relationship between a physical quantity and perception. Sound Pressure Level (SPL) is the physical quantity of sound pressure as it reaches the ear or can be recorded with a microphone. In perception, the SPL becomes loudness. This loudness is not 1:1 the SPL, but is strongly compressed and frequency-dependent, from which e.g. the sound measure dB(A) or dB(C) can be evaluated in listening tests.
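
The physical side of this relationship can be sketched with the standard definition of SPL in decibels, which is not spelled out in the text: the sound pressure is related to a reference pressure of 20 micropascals on a logarithmic scale. The names `P_REF` and `spl_db` are assumptions for illustration, and the frequency weighting of dB(A) or dB(C) is not modeled.

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (20 micropascals)

def spl_db(pressure_pa):
    """Sound pressure level in dB for an effective sound pressure in pascals."""
    return 20 * math.log10(pressure_pa / P_REF)

at_reference = spl_db(20e-6)  # the reference pressure itself
at_one_pascal = spl_db(1.0)   # a sound pressure of 1 Pa
```

The logarithm already compresses the physical range: a tenfold increase in sound pressure adds only 20 dB.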

Spectral Flux

Spectral flux is the change of frequencies of a sound in time. Sound can change its amplitude or frequency, resulting in most effects known to music, such as beats, vibrato, or reverb and echo. Effects devices such as chorus, flanger, phaser or reverb all produce spectral flux. Also, each new tone changes the spectral flux significantly, through the complexity of the transient or new pitches. In computational processing of music, therefore, spectral flux is an important parameter.

Spectrum / Spectral Centroid

Perhaps the most important aspect of timbre is its brightness. Basses are darker, trebles brighter. Also, a tone performed by different musical instruments sounds differently bright. Listening tests repeatedly show that this brightness is an important anchor for the identification of musical instruments or, for composers, an essential parameter for expression and form design. Perceived brightness corresponds to the arithmetic mean of a frequency spectrum, i.e. the weighting of frequencies and their amplitudes, which is called spectral centroid.
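
The weighting of frequencies by their amplitudes can be written as an amplitude-weighted mean. A minimal sketch; the function name `spectral_centroid` and the two example spectra are assumptions for illustration.

```python
def spectral_centroid(frequencies_hz, amplitudes):
    """Amplitude-weighted mean frequency of a spectrum, in Hz."""
    total = sum(amplitudes)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * a for f, a in zip(frequencies_hz, amplitudes)) / total

# Two hypothetical tones with the same partials but different weighting.
dark = spectral_centroid([100.0, 200.0, 300.0], [3.0, 1.0, 0.0])
bright = spectral_centroid([100.0, 200.0, 300.0], [1.0, 1.0, 2.0])
```

The tone with more energy in its upper partials has the higher centroid and would be expected to sound brighter.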

Spike

A spike is a nerve impulse that is exchanged between neurons. This impulse of 80-100 millivolts is an ionic current, thus considerably slower than the current in a power cable. After a neuron has sent out a spike it must first recover. Neurons in the body thus perform 3-8 spikes per second. The fastest neurons in humans are in the ear and manage up to 300-400 spikes per second. All neuronal activity consists only of the exchange of spikes, which often synchronize and run through the brain as so-called spike bursts.

Subcortical

Subcortical refers to all neuronal structures that lie below the neocortex, the cerebral cortex under the skull. This includes the auditory pathway, i.e. the neuronal nuclei from the ear to the auditory cortex in the neocortex. But also all other neurons from the sensory organs to the brain, as well as nerves going from the brain via the spinal cord back into the body are subcortical. In the past it was thought that only the cortex has consciousness, but in the auditory pathway we already find most of the musical parameters that we consciously hear.

Timbre

Timbre is the part of a sound that is not pitch. This definition comes from the uniqueness with which pitches are heard and the wide range of properties that make up timbre. The most important properties of timbre are brightness, roughness, sharpness, fluctuation, loudness, spatiality or harmonicity, next to many others. We are able to perceive the finest nuances, to assign sounds to instruments and their families, to hear associations with materials and construction methods, to perceive articulations, and so on.

Tonal System

See Key

Tonal Systems and Intervals

See Key

Tunings (Meantone Temperament, Equal Temperament)

It is not possible to tune a musical instrument so that all intervals are pure in all keys, i.e. correspond to simple numerical ratios such as 2:1 for the octave or 3:2 for the fifth. As long as one stays in one key, e.g. C major, a pure tuning is possible. But already in the Renaissance keys like F, C, G and D were common. For this, all tones were detuned a little bit to the so-called mean-tone tuning. Werckmeister, Kirnberger, Vallotti, Young and many others subsequently proposed further detunings, i.e. tunings or compromises, in order to get through all 12 keys.
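
Why pure intervals cannot hold in all keys can be shown with a short calculation: twelve pure fifths should arrive at the same note as seven octaves, but they overshoot slightly. The sketch below measures this in cents (hundredths of an equal-tempered semitone); the names `cents`, `pure_fifth`, `equal_fifth` and `comma` are chosen for illustration.

```python
import math

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents = one octave)."""
    return 1200 * math.log2(ratio)

pure_fifth = 3 / 2           # simple numerical ratio 3:2
equal_fifth = 2 ** (7 / 12)  # seven equal-tempered semitones

# Twelve pure fifths compared with seven pure octaves (2:1 each).
comma = pure_fifth ** 12 / 2 ** 7

pure_fifth_cents = cents(pure_fifth)    # slightly more than 700 cents
equal_fifth_cents = cents(equal_fifth)  # 700 cents
comma_cents = cents(comma)              # the mismatch that has to be distributed
```

Equal temperament spreads this mismatch evenly over all twelve fifths, while the historical tunings named above distribute it unevenly.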

Travelling Wave

Waves propagate in space in a variety of ways. One particular type of wave is the traveling wave, as it occurs on the basilar membrane in the inner ear. This wave rises over the membrane, reaches a peak and then drops steeply behind it. Each frequency has its own place where it reaches this peak. Thus, the ear achieves frequency analysis by assigning each frequency its own location on the basilar membrane. This traveling wave is achieved by the basilar membrane changing its stiffness and internal damping over its length.

Transient

See Transient Phase

Transient Phase

Musical instrument sounds have a transient phase, followed by a so-called quasi-stationary sound part after the transient. This stationary part is mostly harmonic, we hear a pitch. The transient, however, is often very noisy and chaotic. Nevertheless, it is important, for example, for recognizing the instrument. If the transient is cut off in the recording studio and only the stationary part is played, it is often difficult for the listener to recognize the musical instrument. The transient is also very important for musical articulation.

Sumpyi

Bamboo flutes of the Kachin (sumpyi), similarly found throughout Southeast Asia. The players build the instruments themselves within a few minutes. Since bamboo is a grass, the flutes break easily and are thus quickly replaced.

Valence / Arousal Model

Human emotionality is tremendously complex. Nevertheless, basic emotions show up, which can be felt, but can also be perceived by people in music, a work of art or a musical expression. These include valence, i.e. whether an emotion is positive or negative, and so-called arousal, i.e. whether the emotion is loud and effervescent or quiet and introverted. These two dimensions of emotion can be assigned to musical parameters, e.g. loud and variable music, and thus calculated from a piece of music.