X3V1.8M/SD-7

Journal of Development
Standard Music Description Language
Part Two: Technical Description and Formal Definition

Editor: Alan D. Talbot, New England Digital Corporation

June 4, 1988

June 16, 1988

CONTENTS

0 Introduction
  0.1 Purpose of the Document
  0.2 Development Methodology
  0.3 Editorial Conventions
1 Scope and Field of Application
  1.1 Scope
  1.2 Field of Application
3 References
4 Definitions
5 Notation
6 Structure and Content
7 Work
  7.1 Work Segment
  7.2 Work Segment Reference
8 Core
  8.1 Thread
    8.1.1 Core Event Sequence
    8.1.2 Core Event Group
    8.1.3 Core Events
      8.1.3.1 Notes and Rests
      8.1.3.2 Graced Events
      8.1.3.3 Special Events
    8.1.4 Core Event References
  8.2 Time
    8.2.1 Beat
    8.2.2 Duration
  8.3 Stress
    8.3.1 Beat Number
    8.3.2 Stresses
    8.3.3 Meter
  8.4 Tempo Sequence
    8.4.1 Tempo
9 Gestural Domain
  9.1 Track
    9.1.1 Gestural Event Sequence
    9.1.2 Gestural Event Group
    9.1.3 Gestural Event
    9.1.4 Gestural Event Reference
  9.2 Click Track
10 Visual Domain
  10.1 Part
    10.1.1 Visual Event Sequence
    10.1.2 Visual Event Group
    10.1.3 Visual Event
    10.1.4 Visual Event Reference
  10.2 Space
11 Analytical Domain
  11.1 Voice
    11.1.1 Analytical Event Sequence
    11.1.2 Analytical Event Group
    11.1.3 Analytical Event
    11.1.4 Analytical Event Reference
12 Bibliographic
  12.1 Theme
13 Conformance
Annex A
Annex B
  B.1 General Application
Annex C
  C.1 Definitions
  C.2 Structure
  C.3 Segregation
  C.4 Language
Annex D
  D.1 Structure
  D.2 Punctuation
Annex E

0 Introduction

This is the second part of a two part document describing the work of ANSI X3V1.8M, the Music Information Processing Standards Committee (MIPS). The parts are known collectively as the Journal of Development.

"Part One: Objectives and Methodology" describes the objectives of the project and the development methodology employed.

"Part Two: Technical Description and Formal Definition" describes the language design itself, and provides a formal definition.

This document and the other Standing Documents comprise the output of the committee, while the other documents and material presented at the meetings provide the input.

NOTES

1. The Journal of Development is maintained in two parts only to facilitate maintenance by separate individuals; the two parts should always be read as a single document. There is much in Part Two, for example, that may seem confusing or contentious if it is not read in the context established by Part One.

2. This introduction appears in both parts of the Journal of Development.

3. General information about the MIPS committee, including a guide to participation, can be found in committee document X3V1.8M/SD-0.

0.1 Purpose of the Document

The Journal of Development describes the status of the Standard Music Description Language (SMDL) being developed by the Music Information Processing Standards (MIPS) Committee. It is intended as the technical and methodological design specification which will ultimately evolve into a Standard.

0.2 Development Methodology

Both parts are revised by their respective editors after each meeting of the committee. As a result, the documents never represent text that has been agreed on in detail by the committee, but only the editors' best efforts to express the committee's ideas. Moreover, the ideas in the journal are subject to further study and revision and do not represent a final design.

Eventually, the design work will reach a point where all aspects of the language have been addressed, although not necessarily finalized. At that point, the Journal of Development will cease to be the vehicle that expresses the current language design.
Instead, the committee will produce one or more successive "working drafts", consisting of text that has been agreed to. During the Journal of Development and working draft stages, public comment is sought and considered, but the process is informal.

When, eventually, the committee is satisfied with a working draft, it will recommend that X3V1.8 process the document as a "draft proposed American National Standard". There will then commence a formal public review and ballot, during which all comments received will be responded to in writing.

0.3 Editorial Conventions

Formal standards can be complex documents in which every word has both legal and technical significance. They may also need to be translated into other languages. For these reasons, editorial conventions have been established to assure precision, accuracy, and clarity (albeit often at the expense of readability by the general public). The key principles are:

1. Precise and consistent definitions of terms.

2. Distinguishing real requirements from mere commentary, explanations, and examples -- and from definitions.

3. Avoidance of redundancy. (Repetition of a requirement is normally expressed as a note, to avoid the question of which text governs if the "repetition" is imperfect.)

Part Two of the Journal of Development observes some of the editorial conventions of a formal standard, but not yet with the strictness and consistency that will be required in the final document. (See Annex C of Part 2 for details.)

1 Scope and Field of Application

This section defines the range of applicability of the Standard. It specifies the limits of what the Standard can be expected to represent, and what is outside the design criteria.

NOTE: In order to proceed in a timely fashion, we found it necessary to choose a subset of all possible music for the scope of the Standard. As the design matures, we expect to expand the scope to meet any further needs of its users.
1.1 Scope

This Standard defines a language for the representation of music information, either alone, or in conjunction with text, graphics, or other information needed for publishing or business purposes. The language is known as the "Standard Music Description Language", or "SMDL".

NOTE: The Standard Music Description Language is an SGML application conforming to International Standard ISO 8879 -- Standard Generalized Markup Language.

The SMDL is capable of representing many (but not all) genres of music, and most (but not necessarily all) instances of works in those genres. The primary focus is on music that can reasonably be expressed in Standard Western Musical Notation.

NOTE: The scope as defined should encompass the vast majority of music. It does not exclude the use of special symbols that can be placed in the score, nor of modern notational practices. The only criterion is that the music be capable of representation as notes on a staff, regardless of whether it was actually written down that way, or even written down at all.

The SMDL is designed for flexibility and extensibility. There are no technical prohibitions against the use of some components without the whole, or against the use of user-defined components in conjunction with standardized ones. The Standard includes a conformance clause that identifies minimum and higher levels of support in terms of standardized language components and options for user extensions.

1.2 Field of Application

Pieces that are composed on computer devices, pieces that exist as printed scores, pieces that are performances recorded in a manner that permits machine transcription, and pieces that are already represented in some language, are all within the field of application of this Standard.

Pieces that have other sources, such as digital audio recordings, can be associated and synchronized with pieces described in SMDL.
They can exist as elements in the same document as SMDL works, but will have their own representation (document type definition and data content notations).

3 References

ISO 8879, Information processing -- Text and office systems -- Standard Generalized Markup Language (SGML).

X3V1.8M/SD-6 Journal of Development -- Standard Music Description Language -- Part One: Objectives and Methodology

4 Definitions

The following items are used in a number of places in the text but are not explicitly defined. They are essential to the understanding of the Standard, and many have been assigned meanings which differ from common usage.

4.1 analytical domain: The portion of a work which contains music theoretical analyses.

4.2 analysis: A music theoretical analysis of the piece, such as a Schenkerian analysis. An examination of the piece as opposed to a rendition of the piece.

4.3 beat: A relative time unit that is used for measuring durations in the core.

NOTE: It should be the "felt" beat (tactus) if known. Otherwise, it can be chosen for convenience; for example, the smallest or most common note value in the piece (i.e., 1/4, 1/8, 1/16, etc.).

4.4 bibliographic data: Identification information used to catalog and archive pieces of music (or any other works).

4.5 core: The portion of a work that represents the basis of all of the performances, scores, and analyses. The logical musical material as opposed to the performance or score specific material.

NOTE: The core can be thought of mechanistically as the information which is most convenient to share in common among the other domains, and among multiple instances in the same domain. Philosophically, it can be thought of as the information that is necessary (and in the case of conventional Western music, sufficient) to distinguish the piece from all others.

4.6 gestural domain: The portion of a work that represents live performance data such as precise timing and dynamic fluctuation.
4.7 logical: The basic musical content of a piece of music, such as the time values, pitch values, and basic groupings such as chords and tuplets.

4.8 logical domain: The core.

4.9 markup minimization: The elimination of redundant verbiage in the actual representation of a work.

NOTE: SGML has been designed to allow this to happen naturally, so it is not necessary to consider it in the initial design of the Standard.

4.10 MIPS: Music Information Processing Standards Committee; ANSI X3V1.8M.

4.11 performance: A particular realization of a piece, either by mechanical means or by a musician.

4.12 piece: A musical composition.

4.13 real time unit: The basic unit of measurement of time in a work; the smallest representable division of time.

4.14 SGML: Standard Generalized Markup Language; a text markup language and structured design tool. SGML is an International Standard and is fully defined and described in ISO 8879-1986.

4.15 SMDL: Standard Music Description Language; this Standard.

4.16 score: A printed piece of music; an edition.

4.17 tuplet: A group of notes which occur in a different time frame than the surrounding notes; a time anomaly.

NOTE: A triplet, a quintuplet, and a duplet in compound meter are all tuplets.

4.18 visual domain: The portion of a work which represents the score; the music engraving information.

4.19 work: The SMDL representation of a piece.

NOTE: A work has four domains: core, gestural, visual, and analytical. In addition, bibliographic data can be associated with the work as a whole or with instances of any of the domains.

5 Notation

This Standard is expected to be an SGML application, and the development is proceeding using SGML as a design tool. For this reason, the formal syntactic and structural definitions in this document are in SGML. A brief discussion of SGML syntax and semantics can be found in Annex D. A complete and definitive treatise on SGML is found in ISO 8879-1986.
It is intended that the text describing each element and attribute will be a complete definition and explanation, but the formal language of the SGML coding provides the rigorous definitions underlying the text descriptions, and will show the mechanism behind each technique that is presented. For this reason, excerpts of the SGML encoding have been interspersed with the text at strategic points. It is recommended that the reader refer to the SGML in the text and in Annex A while reading the technical definitions.

NOTE: In the case of conflict between the SGML excerpts in the text and the formal specification in Annex A, the SGML in Annex A will govern.

6 Structure and Content

The Standard will be based on a hierarchical structure which describes a piece in terms of four basic sections: the underlying musical form, a set of performances (presumably to be reproduced by a machine), a set of scores in the form of Standard Western Music Notation, and a set of theoretical analyses. We feel this structure best reflects the conceptual divisions inherent in music in light of the uses to which the Standard will be put. These divisions may not represent the philosophically most elegant approach to the expression of musical ideas, but we feel they will be maximally useful. This separation of the whole into performance and score, and the extraction of the logical musical concepts, seems an unavoidable outcome of the way music has come to be performed and notated, and has long been present in Western music.

This hierarchical structure will be codified in terms of elements. Elements are basic structural building blocks which provide a framework and a means to relate and collect information. Each element has a related information set consisting of attributes. These will contain much of the actual data, as the element itself is basically a place holder.
For instance, an event is an element, and may represent a note, in which case it will have attributes describing pitch, duration, and possibly dynamic level. Attributes can be defined by the user as well as the designer. This allows almost unlimited flexibility in representing unusual material that may not have been foreseen during the design.

The representational scheme is based on the separation of the basic musical content (pitch, rhythm, harmony, etc.) from the purely performance oriented information (intonation, rhythmic interpretation) and the purely score oriented information (page layout, horizontal spacing, clef). This means simply that some process or machine must be able to separate the work into one or more of these categories for this Standard to represent it. (These divisions are discussed in detail in the following clauses.) This is not to say that the piece must originate in a separated form, only that it can be separated for the purpose of encoding in the Standard. While it is possible to imagine pieces which are not separable in this way, almost all works in all genres are in fact easily separable.

7 Work

NOTE: This and the following clauses are devoted to a detailed definition of each element of the structure, and the information it contains. (A description of the applications of these elements is found in Annex B.) Some of the attributes have been defined and are described below, but some have not yet been addressed. The assumption is that every element will have an attribute list, containing at least an identification mark for reference by other elements. Additional items will be added to the attribute list as they are defined, but in the interests of top down design, we are concentrating on the overall structure first, leaving the myriad and obfuscating details for later.

The work is the top level of the hierarchy.
The work encompasses the entire document, and is defined as the logical musical information, and all of the performances, scores, and analyses that stem from that musical information. If a "piece" actually has several versions which differ in basic ways, those versions must each be a separate work. All of the remaining elements are contained within the work.

The source is an attribute of a work which indicates what form the piece originated from. It distinguishes between a piece which was captured from a MIDI stream, a piece which was entered from a printed score, and a piece which was composed and entered as logical information.

The composer analysis attribute, if present, indicates an analysis which was created by the composer.

NOTE: The intent is to label that information which is definitive as opposed to that which represents an opinion.

The rtu base is a time reference which specifies the order of magnitude of the timing in the work. It specifies the number of real time units (rtu) per second.

NOTE: The intent is to allow a wide range of pieces to be realized with an implementation of limited precision. If 32 bits are used to hold time values, for instance, setting rtubase to 1 will give about 100 years of time measurable to 1 second accuracy. Setting it to 1,000,000 will give about 1 hour at 1 microsecond accuracy.

7.1 Work Segment

The work segment is a structural device for dividing the work along major boundaries. Workseg is defined self-referentially so that repetitions and other constructs can be easily represented.

Movements of a symphony would be placed in separate segments, as would acts in an opera or any other divisions that affect all aspects of the piece (i.e. all parts, all instruments, etc.)

The segment will also be used for making global changes such as key changes, time signature changes and instrumentation changes.
If the piece changes key or time signature, that often affects every part and instrument, and indicates a major turning point in the music. In such cases, the material before the change should be in one segment, and the material after in another.

One very important use of the work segment will be in cases where the instrumentation changes. If the piece starts out with full orchestra, and later proceeds with only strings, then two segments should be used to separate the sections. This will greatly assist in maintaining a useful relationship between the threads in the core, the parts in the score, and the tracks in the performance.

Another use is to indicate the composer's intent. If the composer or the editor wants a major division in the work, the work segment can be used to indicate the division even though none of the above situations apply.

The class is a label indicating the use of the workseg. It is coded as a text string, not as a machine interpretable value.

The delay indicates the expected pause (if any) between the workseg and any following workseg. It is coded as a text string, not as a machine interpretable value.

7.2 Work Segment Reference

The work segment reference is a structural tool to allow a work segment to reference other work segments. This provides flexibility in creating repeats and loops, and allows analyses to refer to work segments.

8 Core

The core is the basis for a work, and a work has one and only one core. The core contains such information as pitch, note value, harmonic groupings, phrasings, tuplets, etc. A piece for which a core is not producible cannot be represented, and a piece with more than one core must be represented as more than one work. We will see, however, that several interpretations of the same basic piece can reside in the same work if they derive from the same core.

Let us take the example of a simple piano piece.
We have a performance captured by a MIDI sequencer, and the score from which the performance was played. The core will contain an element for each note and rest in the score, thus representing the logical basis of the work. A given measure in the core may contain no notes, and the corresponding spot in the score may say "ad lib". At that point in the performance, there are several improvised notes. It is possible that another performance with a different improvised section, and another score which specifically details a cadenza, might be included in this work and be based on the same core.

The normalized attribute states whether the core has been normalized. It may often be desirable that the core have a canonical (normalized) form. That is, that there be one particular form which will always be used for a given piece. (Note that the definition of the core does not provide orthogonality, so there are many ways that a given piece could be represented.) For such situations, an algorithm can be applied which translates any arbitrary core into a given canonical form. The user may create such an algorithm to fit the needs of the application, or the Standard Canonical Form can be generated using the Standard Algorithm. We plan to provide this Standard Algorithm both as a way of providing consistency between applications and as a model for other algorithms.

The normalization algorithm attribute states which algorithm has been used to normalize the core. If "standard" is used, it is expected that implementations will access the Standard Algorithm. If another algorithm is used it can be identified here, and may be implementation specific.

8.1 Thread

The thread is a sequence of musical events which lasts for the duration of the piece. It is analogous to a track in a sequencer or on a multi-track tape deck.
The purpose of the thread is to allow the core to be sectioned into concurrent streams of notes and other events, mostly for the sake of convenience. There is no assumption made about how the piece will be divided into threads, but logic suggests that parts in a score, tracks in a sequence, or voices would be the best choices of thread allocation.

The tempo sequence attribute indicates which tempo sequence is to be used for this thread.

The nominal instrument attribute records for posterity the instrument that the composer had in mind (in case anybody cares). The point is that the gestural section, which contains the timbral information, may be an interpretation by someone other than the composer. This will be encoded as a text string, not as coded timbral data such as is found in the gestural section.

8.1.1 Core Event Sequence

A core event sequence is a collection of core events, other core event sequences, and core event groups. A core event sequence groups sequential events, as in movements, measures or tuplets. These groups may be nested to any depth and combined in any way. Each thread is made up of a structure of core event sequences which is as complex as is necessary to represent the music completely.

The time factor attribute is a fraction which describes the relationship of the beat inside a given sequence and the beat surrounding (or underneath) the sequence. Time anomalies (such as triplets) will be represented by setting the time factor to the correct fraction. For example, if the beat of a piece falls on the quarter note (so quarter notes have a time value of 1) and an eighth note triplet is encountered, the triplet could be expressed as a sequence of three notes of value 1 with a time factor of 1/3, or as a sequence of three notes of value 1/2 with a time factor of 2/3. It may turn out to be desirable to specify that every event sequence must contain an integral (non-fractional) number of beats.
This would not be limiting since a common denominator can be found for any situation.

The stress id attribute is a reference to a stress pattern to use for this sequence.

The stress beat attribute is the offset (in beats) into the stress pattern at which the sequence starts. A common use for this would be for an anacrusis (pick-up measure).

The ornamentation style attribute is a text string which allows the composer or editor to record remarks on the ornamentation style of the sequence.

NOTE: This attribute should perhaps modify stress instead.

8.1.2 Core Event Group

The core event group is a collection of events or sequences which are initiated simultaneously. A chord is a group which contains events (notes). A section of a thread may well be a group containing a sequence for each of several parallel voices. This is an alternative to placing each voice in a separate thread.

8.1.3 Core Events

The core event is the basic unit of the structure. Notes and rests are examples of core events, but other occurrences may also be represented as events. In general an event is some occurrence or item which has a single definable starting point in time, and a definable duration.

8.1.3.1 Notes and Rests

The note and the rest are the most common musical events. They are very similar in that they are simple events (as opposed to compound events like the graced event).

The note has a pitch attribute which specifies its scale tone and octave.

8.1.3.2 Graced Events

The graced event is a compound event in that it consists of a main event ornamented by a "grace" modifier. The modifier is an event sequence which can either precede or follow the main event, and which will not consume time as will a normal sequence.

The preceding grace modifier is an event sequence which starts at a given time and proceeds until finished, at which time the grace subject is started.

The grace subject is the main event.
It starts after the preceding modifier and continues until the end of its duration, or until the start of a posterior modifier.

The posterior grace modifier starts at a given time after the main event has started, and proceeds until finished.

The grace synchronization attribute specifies the starting time offsets of the preceding and posterior modifiers. It is measured from the start time of the subject, and the end time of the subject, respectively.

8.1.3.3 Special Events

The special event contains user defined information about timed events other than conventional musical occurrences. Its content (other than starting time and duration) will be application specific.

8.1.4 Core Event References

Core events are accessed through core event references. These are pointers which allow the core to be referred to in arbitrarily complex ways by the performance, score, and analysis sections of the piece. This process will be explored in more depth in Theory of Use. This structure yields a very flexible system for organizing and referring to events.

8.2 Time

It is in the core that the time of the piece is represented. By time we mean the rhythmic relationship of each event to all other events. This is not to be confused with tempo, which refers to the rate of progress of the piece. The time model has several components which combine to form a system which we hope will account for any situation within the scope of the Standard.

8.2.1 Beat

All time must be measured in relation to some base which is not open to interpretation. That base will be called the beat. The beat is defined to be that time interval which, at any given point in the piece, is small enough to divide without remainder into all existing subdivisions of the sequence, excluding time anomalies. This beat will only be assigned an absolute value in the gestural section; in the core it is simply a common reference.
If the beat changes in meaning as the piece progresses, then the core will be sectioned into more than one sequence. Each sequence will specify the relation of its beat to an overall reference beat.

Since the beat is a relative measurement, the performance can be linked to any time base that is appropriate. The beat can be assigned a fixed duration, an algorithmically generated variable duration, or be related to a live recorded click track. Similarly the score can use any appropriate time signature for a given passage. The same piece could, for example, be scored in 4/4 as triplets or in 12/8 as straight eighths. Indeed, a score representation in each meter could refer to the same core.

8.2.2 Duration

Each core event will have a music duration (note value) attribute which is stated as a fraction of a beat. The time consumed by a core event sequence will be the sum of the durations of its events in beats. Accumulated time is therefore represented as the sum of durational time, necessitating the definition of events which sound (notes), and events which do not (rests).

The model will support single events or tied events. Tied events are strings of events which are taken together to represent one event with a duration that is the sum of each of the individual durations. When a note starts sounding in one event sequence and continues into the next, the note is split into two tied events of the appropriate duration.

The tie attribute indicates that the event is tied, and to which subsequent event it is tied.

8.3 Stress

The stress element indicates how a passage is to be stressed dynamically. It consists of a set of template sequences that indicate which beats are to receive what stress. Stress can be dynamic, agogic (tempo related), or can be related to other user specified parameters.

The beat count attribute indicates the number of beats in the template cycle.
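NOTE: As an illustration only, a stress template can be pictured as a repeating cycle of beat count beats in which selected beats carry a stress. The Python sketch below assumes the four-beat pattern described under Meter (8.3.3), with maximum stress on the first beat and moderate stress on the third, and applies the stress beat offset of 8.1.1 to a sequence that begins with a one-beat anacrusis. The names StressTemplate, stress_at and stress_pattern, the zero-based offsets, and the wrap-around of the template are assumptions made for this sketch; they are not taken from the formal definition in Annex A.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StressTemplate:
    """Hypothetical model of a stress template (not the Annex A definition)."""
    beat_count: int  # number of beats in the template cycle
    # Zero-based offset into the cycle -> stress label (assumed encoding).
    stresses: Dict[int, str] = field(default_factory=dict)

    def stress_at(self, beat: int, stress_beat: int = 0) -> Optional[str]:
        """Return the stress on a beat of a sequence, or None if unstressed.

        beat is counted from 0 at the start of the sequence; stress_beat is
        the offset into the template at which the sequence starts. The
        template is assumed to repeat for as long as the sequence lasts.
        """
        offset = (stress_beat + beat) % self.beat_count
        return self.stresses.get(offset)


def stress_pattern(template: StressTemplate, length: int,
                   stress_beat: int = 0) -> List[Optional[str]]:
    """List the stress (or None) for each of the first `length` beats."""
    return [template.stress_at(b, stress_beat) for b in range(length)]


# Four-beat cycle: maximum stress on the first beat, moderate on the third.
four_four = StressTemplate(beat_count=4, stresses={0: "maximum", 2: "moderate"})

# A sequence with a one-beat anacrusis enters the cycle at offset 3, so the
# first full measure begins on the second beat of the sequence.
pickup = stress_pattern(four_four, length=5, stress_beat=3)
print(pickup)
```

In this sketch the anacrusis is handled entirely by the offset: the template itself is unchanged, and only the point at which the sequence enters the cycle differs.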
8.3.1 Beat Number

The beat number element marks a particular beat in a stress cycle as receiving stress.

8.3.2 Stresses

The stresses element contains information on what kind of stress is to be applied to the beat with which it is associated.

8.3.3 Meter

The concept of meter is expressible in the core by creating a stress template which models a measure. In 4/4 time, a template may have 4 beats, and may mark the first as having maximum stress, and the third as having moderate stress. In the case of a complex metric situation, such as a measure of five which is felt as two and three, a nested structure of stress templates can be used to accurately indicate the feel. If ambiguity is desired however, the measure can be represented as simply five beats.

NOTE: The inclusion of the meter in the core reflects the philosophy that measures are a basic logical concept in music, rather than strictly a score related issue. This is certainly not true of all music, but the facility must be there for those pieces for which it is important.

8.4 Tempo Sequence

The tempo sequence element is a list of time stamped tempo modifications which govern the tempo of the piece. Each thread refers to a particular tempo sequence; there can be several if the piece involves conflicting tempi.

8.4.1 Tempo

The tempo element is the basic building block of the tempo sequence. It specifies a tempo change from the current tempo to a target tempo. (The current tempo is the tempo in effect infinitesimally before the start time of the tempo element. The target tempo is the tempo in effect infinitesimally before the start time of the next tempo element.) The attributes have been defined to give a large degree of flexibility in specifying changes over time, and maintaining ambiguity and imprecision when desired.

The music duration attribute specifies the life of this tempo setting (the time until the next change) in beats.
The set value attribute specifies a precise target tempo, either absolutely in rtu's per beat or as a percentage of the current tempo.

The set text attribute specifies an imprecise target tempo. The value is represented as a text string, and presumably will be a common musical expression such as "presto" or "moderato".

The rate value attribute specifies a precise formula for reaching the target tempo from the current tempo. It can be specified as "immediate" (instant change at the start time of the tempo element), "linear" over the duration of the tempo element, or represented by a mathematical formula in the form of a text string.

The rate text attribute specifies an imprecise formula for reaching the target tempo from the current tempo. The value is represented as a text string, and presumably will be a common musical expression such as "accelerando" or "ritardando".

The hold duration attribute specifies a precise pause in the counting of music time. Its value is absolute in beats. It can be used for a fermata, a full stop, or any other pause or interruption of the normal time flow of the piece, such as an unaccompanied solo cadenza. The hold starts at the starting time of the tempo element, and the tempo duration begins at the completion of the hold.

The hold type attribute specifies an imprecise pause in the counting of music time. It can be specified as "full stop", "long", "medium", "short", "none", in non-increasing order of length. The actual time value of these specifiers is implementation dependent. The hold starts at the starting time of the tempo element, and the tempo duration begins at the completion of the hold.

The strictness attribute specifies the precision with which the tempo should be followed during a realization of the piece. It is specified as "strict tempo", or represented by a text string containing a common musical expression such as "rubato".
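NOTE: The precise tempo attributes lend themselves to a small worked example. The Python sketch below estimates the real time consumed by a single tempo element from its set value, rate value, music duration and hold duration, and converts the result to seconds using the rtu base of the work. It rests on several assumptions that this document does not settle: that "linear" means the number of rtu per beat changes linearly with beat position over the music duration, that a hold is counted at the current tempo, and that the set value is given absolutely rather than as a percentage. The function names and the rtu base of 1000 are hypothetical.

```python
from fractions import Fraction


def tempo_element_rtu(current, target, music_duration,
                      rate="immediate", hold_duration=0):
    """Real time units (rtu) consumed by one tempo element.

    current, target : tempo before and after the change, in rtu per beat
    music_duration  : life of the tempo setting, in beats
    rate            : "immediate" or "linear" (formula strings not handled)
    hold_duration   : pause in beats, assumed counted at the current tempo
    """
    current, target = Fraction(current), Fraction(target)
    beats = Fraction(music_duration)
    hold = Fraction(hold_duration) * current
    if rate == "immediate":
        # The target tempo applies to every beat of the element.
        return hold + beats * target
    if rate == "linear":
        # rtu per beat ramps linearly from current to target, so the elapsed
        # time is the area of a trapezoid: beats times the mean of the two.
        return hold + beats * (current + target) / 2
    raise ValueError("unsupported rate value: " + rate)


def seconds(rtu, rtubase):
    """Convert rtu to seconds; rtubase is the number of rtu per second."""
    return Fraction(rtu) / Fraction(rtubase)


RTUBASE = 1000  # assumed for the example: 1000 rtu per second

# At this rtu base, 500 rtu per beat is 120 beats per minute and
# 1000 rtu per beat is 60 beats per minute.
steady = tempo_element_rtu(500, 500, music_duration=8)
ritard = tempo_element_rtu(500, 1000, music_duration=4, rate="linear")
fermata = tempo_element_rtu(500, 500, music_duration=4, hold_duration=2)

print(seconds(steady, RTUBASE), seconds(ritard, RTUBASE),
      seconds(fermata, RTUBASE))
```

Exact fractions are used so that beat values such as 1/3 do not accumulate rounding error. An implementation that stores time in a fixed number of bits would round to whole rtu at some point, which is where the choice of rtu base matters.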
9 Gestural Domain

The gestural section of the piece contains the performances. While each work has only one core, it may have several gestural sections, each a different performance (and hence different interpretation) of the piece, and each linked to a particular score.

The gestural section refers to the core for the majority of its musical material, but may have events of its own. Usually these events will be ad lib notes and performance control information such as volume or timbre selection.

The gestural section is intended to represent data for an automated performance of the piece. That data could be generated by a live performance or by non-real-time composition, then returned to a synthesizer for realization.

The performance is the top level gestural element. Each performance typically realizes the entire piece.

The score attribute identifies a score in the visual domain which represents the edition which produced this performance, if such a score exists.

The closure attribute indicates whether every event in the core was realized in this performance.

9.1 Track

The track is analogous to the thread in the core. It will be used to drive one channel of sound output, or one instrument. It is the precise counterpart of a track on a multi-track recorder. Unlike the thread, the division of music into tracks may need to follow certain constraints imposed by the device that will perform the piece. For example, a track may have to be limited to events which are to sound in the same timbre.

A track is made up of gestural event sequences, which are made up of gestural events, gestural event references, and core event references. It is through these core event references that the core becomes the basis of the gestural section.
While it would be possible through the use of gestural events to represent a performance that was unrelated to the core, the intention is that the track will contain mostly performance control information, and refer to the core for most or all of the notes, rests, and other basic conceptual material.

9.1.1 Gestural Event Sequence

9.1.2 Gestural Event Group

9.1.3 Gestural Event

9.1.4 Gestural Event Reference

9.2 Click Track

The click track is a gestural event sequence with an event to mark each beat in the piece. This element will provide a means for relating beats in the core to real time. Click tracks can have arbitrarily spaced events, so any kind of expressive performance can be represented. The click track will usually be generated by a transcription program in the process of creating a work from a live performance. Note that a click track does not need to be present, since a rhythmically exact performance can be generated from the core alone.

10 Visual Domain

The visual section of the piece contains the scores. While each work has only one core, it may have several scores, each a different edition (and hence a different interpretation of the piece), and each linked to a particular performance.

The visual section refers to the core for the majority of its musical material, but may have events of its own. Usually these events will be symbols that appear on the score aside from notes, rests, and accidentals. Such items as phrase markings, beams, accents, dynamic markings, and lyrics would be found here.

The visual section is intended to represent the printed score in Standard Western Music Notation. The score could be generated by a music printing system and returned to such a system for printing or display.

The score element is the top level of the visual domain. Each score is a presumably complete edition of the piece.
The performance attribute specifies a performance in the gestural domain which was generated from this particular score (edition), if such a performance exists.

The closure attribute indicates whether every event in the core was notated in this score.

10.1 Part

The part is analogous to the thread in the core. It will be used to print one part of the score for one instrument. It is the precise counterpart of a staff in a score. The division of music into parts will be based on the desired appearance of the score.

A part is made up of visual event sequences, which are made up of visual events, visual event references, and core event references. It is through these core event references that the core becomes the basis of the visual section. While it would be possible through the use of visual events to represent a score that was unrelated to the core, the intention is that the part will contain mostly visual symbols, and refer to the core for most or all of the notes, rests, and other basic conceptual material.

10.1.1 Visual Event Sequence

10.1.2 Visual Event Group

10.1.3 Visual Event

10.1.4 Visual Event Reference

10.2 Space

The unit of space will be defined relative to the size of the staff and note heads. The actual size of the printed staff is not defined except perhaps as a global attribute of the visual section. A unit of one staff space for the vertical and one note head width for the horizontal will provide the basis for all spatial measurements.

Spatial relationship will be representable in several ways: as an absolute position on a line (staff), as a relative position from another object, and as a relative position from a logical (time) position on a staff. Furthermore, for each of these possibilities there will be an explicit position (specified in spatial units) and an implicit position.
The implicit position will take the form of a non-numerical relationship to some other object, such as "above the staff" or "between this note head and the one to the left".

11 Analytical Domain

The analytical section of the piece contains any analyses that may have been produced. A work may have several analytical sections, each a different analysis (and hence a different interpretation of the piece).

The analytical section refers to the core for the majority of its musical material, but may also refer to performances and scores.

The analytical section is intended to represent a structuring of the piece based on any style of analysis. The analysis could be generated by a specialized music printing/editing system and returned to such a system for printing or display, or might take the ultimate form of a written document. It might even be generated automatically by a computer system.

The analysis element is the top level of the analysis structure. It represents a presumably complete analysis of the piece, by a particular analyst. If several analyses by different analysts exist, each will be a separate analysis. The analysis can refer freely to any other elements of a work, thus allowing complex relationships to be represented.

11.1 Voice

The voice is analogous to the thread in the core. It will be used to represent one voice or melodic line of the piece. It is the counterpart of a passage of notes that have the same stem direction. The division of music into voices will be based on the voicing of the piece intended by the composer or analyst.

A voice is made up of analytical event sequences, which are made up of analytical events, analytical event references, and core event references. It also can contain gestural event references and visual event references. It is through these references that the analytical section can arbitrarily structure any aspect of the piece in order to illustrate a music theoretical idea.
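The way a voice combines its own events with references into other domains can be pictured as a simple lookup. The Python sketch below is only an illustration under stated assumptions: the class names, the string identifiers, and the idea of one flat sequence per voice are all hypothetical and are not taken from the formal definition. It shows a sequence holding both analytical events and references, with each reference resolved against a table for its domain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class CoreEvent:
    """Hypothetical core event, identified by an assumed string id."""
    id: str
    description: str


@dataclass
class AnalyticalEvent:
    """Hypothetical event that belongs to the analysis itself."""
    label: str


@dataclass
class EventRef:
    """Hypothetical reference to an event in some domain."""
    domain: str  # e.g. "core", "gestural", "visual", "analytical"
    target: str  # id of the referenced event


@dataclass
class Voice:
    """Simplification: a voice holding a single event sequence."""
    sequence: List[Union[AnalyticalEvent, EventRef]] = field(default_factory=list)


def resolve(voice: Voice, tables: Dict[str, Dict[str, object]]) -> List[object]:
    """Replace each reference with the event it points to, keeping order."""
    resolved = []
    for item in voice.sequence:
        if isinstance(item, EventRef):
            # Look the event up in the table for its domain.
            resolved.append(tables[item.domain][item.target])
        else:
            resolved.append(item)
    return resolved
```

In this picture the referenced events are never copied into the analysis; the voice only points at them, which is how the core can remain the single shared basis for every domain.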
11.1.1 Analytical Event Sequence

11.1.2 Analytical Event Group

11.1.3 Analytical Event

11.1.4 Analytical Event Reference

12 Bibliographic

The bibliographic entry is found at the top level (as an element of work) and can also be used at lower levels. It contains much of the bibliographic and discographic data necessary for the cataloging of a piece.

The bibliographic entry will contain the information necessary to make the Standard useful. Such items as title, author, issuer (publisher), date, and copyright will all be explicitly defined. In addition, a miscellaneous area will be available which can contain any information that is not defined elsewhere. If desired, a bibliographic entry may be made for each performance in the gestural section, or for each edition in the visual section.

The attributes are explained in the SGML code comments.

NOTE: We have not attempted to form an exhaustive structure for the representation of complete library cataloging information. Such a structure would extend the scope of the Standard beyond where we feel it should go at present. Since we are utilizing the machinery of SGML to implement this Standard, another committee could easily create such a complete bibliographic element, and it could be readily included in music documents. We in fact strongly urge the Library community to initiate such a project.

%d.bib;

12.1 Theme

The theme will contain references to the core which pinpoint key passages (or famous passages) for the purpose of identification of the work. It will allow a cataloging application, for instance, to quickly locate and then display or perform a well known section. This will make it easy for the user to verify that the correct piece has been retrieved.

13 Conformance

The Standard will define several levels of conformance to allow applications to implement subsets of the language. There will be a canonical form and a "standard" level described.
NOTE: The issue of conformance will be developed further at a later date.

Annex A

Formal Definition

(This annex is normative and will become an integral part of the Standard.)

This annex contains the formal definition of a work, expressed as an SGML document type definition.

NOTES

Because the SMDL is still under development, the SGML document type definition (DTD) is presently incomplete in a number of respects. These are listed below, and will be updated with the SGML Formal Definition.

1. Little detail is provided on the actual encoding of an instance of a work. As we are first attempting to identify the potential events and to define their properties (attributes), the DTD acts as though all events will be encoded with start-tags and end-tags, with all properties specified using the SGML attribute notation. The result is a lopsided definition in which there is only structure and no actual data.

This convention is satisfactory (even advantageous) while we are designing the structure and semantics of the SMDL. It allows relationships to be seen easily and requirements to be evaluated, without the intrusion of coding considerations. Once the design is complete and we understand all of the information that must be represented, we will create a concise coding scheme to replace the lower levels of the structure. (In SGML, such a scheme is known as a "data content notation".)

2. Most attributes have not yet been defined. As a result, many of the ATTLIST declarations appear identical to one another. In such cases, we expect that the lists will be differentiated by attributes that will be defined later.

3. The lowest-level gestural, visual, and analytical event elements (ge, ve, and ae) are temporary placeholders for lists of distinct element types (for example, bar lines, clefs, etc.). Eventually, the entity references to lists of the distinct types will be completed to replace these element names.
For now, only the lowest-level core events have been defined. Moreover, as we are first attempting to define those attributes which all events have in common, a single ATTLIST is used in each domain. Eventually, each event may have its own ATTLIST declaration.

4. There are many elements that have common content models and, at least for the moment, common attribute lists. As a matter of development methodology, we felt it better to assume that elements that represent different semantic constructs (e.g., tracks and parts) are likely to have different attributes when the design is complete, even though they may have identical structures. If the presumption proves incorrect in any instance, we will of course remove the redundancy when finalizing the design, but premature optimization might cause us to overlook vital differences.

5. For attributes that have been defined, particularly those whose domain is a list of specific values, we have typically provided only a nominal list of values. We expect that once the overall structure is firm, experts will be able to contribute more complete lists. Such attribute domains can also be made user-extensible if that is desirable.

%d.chord;

%d.bib;

Annex B

Theory of Use

(This annex is informative and will not form an integral part of the Standard.)

As a language, the Standard can be put to a wide variety of uses ranging from the highly appropriate to the completely pathological. It was, however, designed with a particular set of applications in mind, and will be most effective if used for these. Knowing the design assumptions will also facilitate application of the Standard to unforeseen or unusual situations.
It is hoped that this annex will answer many of the questions that will arise concerning applicability.

NOTE: This section will be developed further at a later date.

B.1 General Application

In general, the Standard is intended as a storage and interchange format for musical ideas. It is designed to be somewhat human-readable so that a piece could theoretically be created by using a word processor and entering the encoded material directly. However, it is expected that it will be used mainly for automated processing in such areas as music printing, library cataloging and storage, multimedia presentations, teaching, and research. For other situations, such as live performance or sound recording, other formats are likely to be more applicable.

A piece to be represented can originate from almost any source. An automated composition program might generate a core and an associated gestural section. An interactive music printing system might generate a core and a visual section. A sequencer might capture a live performance and transcribe it into a core and performance section, and then turn the piece over to a music printing system for the creation of the visual and analytical sections. There is much flexibility in the way the Standard can be used and the situations to which it can be applied. The only common element is the core; the others need not even be present.

The gestural section is designed to be used for the representation of computer instrument sequences. This does not mean that it is a sequencer format for internal use by sequencers. In fact it would be poorly suited for that application. It is for archiving and transporting music that has been, or will be, processed in some way by a computer system. A performance may be captured on a synthesizer, it may be interpreted from a MIDI stream, or it may be translated from another language, such as a MIDI sequence file format or MUSIC V.
A sequencer might read a piece in the Standard, translate it into an internal data format, and then realize it in real time.

The visual section will be used for representing scores of all kinds. The score may have an accompanying performance or it may not. The score may be entered or captured using a music printing system, or it may be translated from DARMS or MUSTRAN. It might be retrieved as a display on a screen, a printed page, or translated into another language. Most importantly, it will allow systems of all kinds to interchange scores easily and accurately.

The analytical section will be used to represent theoretical ideas in a structural format. Any sort of layering and grouping will be possible, so various styles of analysis will be supported. A given piece may have several analyses (e.g., one Schenkerian, one classical), which could even refer to each other. An analysis of a piece with a circular score could refer to the score and the performance in an attempt to relate the music to the shape of the score and to the vertiginous effect on the performer.

Annex C

Explanation of Editorial Conventions

(This annex is informative and will not form an integral part of the Standard.)

This document observes some of the editorial conventions of a formal standard, but not yet with the strictness and consistency that will be required in the final document. Those conventions that are observed in this revision are listed.

C.1 Definitions

Definitions are contained in a separate clause. Ours is presently incomplete and will probably remain that way for a while. Also, some of the definitions in it are not as precise as they should be. When the clause is complete, the definitions will refer to one another in a top-down hierarchical order, without tautologies, and will define each term fully.
C.2 Structure

Part Two is structured like a standard in that it observes the following conventions:

Clause 0 is an informative introduction (that is, it does not contain requirements).

Clause 1 states what the standard includes, and its expected uses.

Clause 3 contains references to related standards.

Clause 4 contains the definitions.

Clause 5 describes the notational conventions used in the remaining clauses.

The clauses from 6 until the end contain the actual requirements.

There are also annexes (appendixes) containing information that was segregated from the body of the standard for convenience.

C.3 Segregation

Requirements are distinguished from definitions, examples, and explanatory notes and comments.

Anything identified as a "NOTE" is there to aid in understanding the standard, but does not change the requirements. At present, we also use notes to discuss matters relating to the development of the standard.

Annexes are designated either "normative" or "informative". The former contain requirements and have the same force and effect as if they were in the body of the standard. The latter are extended notes or tutorial information.

C.4 Language

Some words have formal implications that may differ from casual usage. Those that are used in this document are as follows:

C.4.1 deprecated: Technically allowed, but a sensible thing to do only in rare situations. The opposite of "should".

C.4.2 must: Required by the language; unavoidable.

C.4.3 shall: Required by definition. (But not necessarily unavoidable syntactically or semantically in the language.)

C.4.4 should: Recommended, but not mandatory. The opposite of "deprecated". (Within a requirement, it is used in place of "shall" where there is some rare situation in which it wouldn't work or where it was too burdensome to check for compliance.)

Annex D

Guide to SGML Notation

(This annex is informative and will not form an integral part of the Standard.)
For those unfamiliar with SGML, the following brief explanation will assist in understanding the code that appears in this document. For a more in-depth explanation, the ISO standard (ISO 8879-1986) is the definitive tutorial and reference on the subject.

NOTE: This description is currently "brief" to the point of opacity. We plan to expand this section at a later date.

D.1 Structure

SGML consists of three basic structural components. It is the usual intent that these structures will contain data, but in our application there is only structure for the moment. Elements are structural building blocks which can be defined to contain data or other elements. An attribute list is associated with an element and contains values which describe the element. Entities are a structural tool which allows portions of code to be referenced by a label from one or more places in the code.

D.2 Punctuation

There are several punctuation marks that are important. Declarations (definitions) are surrounded by <! ... > and comments to the reader are surrounded by -- ... --. For the purposes of this document, the marks - - and -O can be ignored. In each declaration, the following marks may occur:

,  this followed by the next
&  this and the next
|  this or the next
?  optional
+  one or more
*  zero or more

Annex E

Status Report

(This annex is informative and will not form an integral part of the Standard.)

In the first meetings, the committee concentrated on the overall structure of the SMDL. Many issues were touched upon to ensure that the basic design would be flexible and powerful enough to handle the wide range of material demanded by the requirements specification. More recently, the concentration has focused on the core and related issues, as this seemed the logical starting place. Subsequent work will have to build from a basically finished core section.
As of the most recent meeting (February 1 - 4, 1988) we have developed the core substantially, although some work remains. We expect to finish this section at the next meeting, and proceed to the gestural and visual sections. It is assumed that further revisions of the core will be necessary after development of the other sections.

Because the work has focused on a particular area, the preceding document is uneven. Some areas have been discussed down to minute detail, and some are as yet merely suggestions of the direction in which to proceed. In particular, the core section is considerably fleshed out, but the others are unfinished. As the meetings continue, we expect this document (parts 1 and 2) to grow into a Draft Standard which will be complete in all areas.