
The Annotated Mozart Sonatas: Score, Harmony, and Cadence#

Scores, chord labels and cadence labels for Mozart’s 18 piano sonatas, following the Neue Mozart Ausgabe.

This dataset is accompanied by the data report Hentschel, J., Neuwirth, M. and Rohrmeier, M., 2021. The Annotated Mozart Sonatas: Score, Harmony, and Cadence. Transactions of the International Society for Music Information Retrieval, 4(1), pp.67–80. DOI: http://doi.org/10.5334/tismir.63

Changelog#

Version 2.0#

Changes to harmonize with other DCML corpora#

  • Renamed folder scores to MS3 (7549f6a)

  • Extracted facets and metadata with ms3 1.0.1 (9eb9fe3)

    • TSV files now come with the column quarterbeats, which gives each event’s position as its distance from the beginning of the piece, measured in quarter notes.

    • The extracted harmony labels in the folder harmonies are expanded into feature columns by default.

    • Extracted notes now come with the columns name and octave.

    • Column volta (containing first and second endings) removed from pieces that don’t have any.

    • metadata.tsv has been enriched with further columns, in particular information about each movement’s dimensions, including dimensions upon unfolding repeats (for instance, last_mn has the number of measures, last_mn_unfolded the number of measures when playing all repeats)

    • The folder reviewed contains two files per movement:

      • A copy of the score where all out-of-label notes have been colored in red; additionally, modified labels (w.r.t. v1.0) are shown in these files in a diff-like manner (removed in red, added in green).

      • A copy of the harmonies TSV with six added columns that reflect the coloring of out-of-label notes (“coloring reports”)

    • As long as ms3 review has any complaints, it stores them in the file warnings.log. Currently, the file lists those labels where over 60% of the notes in the segment have been colored in red and which probably need revisiting (pull requests welcome).

  • Score updated using ms3 update (13dfb6d)

    • Files updated to MuseScore 3.6.2

    • All labels moved from the chord layer of staff 1 to the Roman Numeral Analysis layer of staff 2. This changes how they are displayed and eliminates the requirement to prepend a full stop to labels starting with a note name.

  • Cadence labels now integrated with harmony labels as per DCML harmony annotation standard 2.3.0 (1c290e8)

  • TSV files are automatically kept up to date using the dcml_corpus_workflow (c203595)
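
The 60% criterion behind warnings.log can be pictured with a short sketch. It assumes that, for each label, we know how many notes in its segment were colored and how many were left untouched. The field names n_colored and n_untouched and the sample values are hypothetical, and whether ms3 computes the share over note counts or over durations is not specified here.

```python
from dataclasses import dataclass

THRESHOLD = 0.6  # "over 60%", as stated in the changelog entry


@dataclass
class LabelReport:
    label: str
    n_colored: int    # hypothetical field: notes colored as out-of-label
    n_untouched: int  # hypothetical field: notes explained by the label

    @property
    def colored_ratio(self) -> float:
        """Share of colored notes among all notes in the segment."""
        total = self.n_colored + self.n_untouched
        return self.n_colored / total if total else 0.0


def needs_revisiting(reports, threshold=THRESHOLD):
    """Return the reports whose share of colored notes exceeds the threshold."""
    return [r for r in reports if r.colored_ratio > threshold]


# Made-up sample values, for illustration only.
reports = [
    LabelReport("I", n_colored=1, n_untouched=9),
    LabelReport("V7", n_colored=7, n_untouched=3),
    LabelReport("ii6", n_colored=2, n_untouched=3),
]
flagged = needs_revisiting(reports)
print([r.label for r in flagged])
```

Only the second made-up label exceeds the threshold, so it is the only one that would be reported under this assumption.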

Changes to the content#

  • Made phrase annotations consistent by adding missing curly brackets. (9f10fc0)

  • Introduced first and second endings at the beginning of K311-2 in order to introduce an EC label on the repetition of bar 1.

  • Fixed repeat structure in da capo movements K282-2 and K331-2 for correct unfolding (b7271da..0e9f060)

  • Updated labels of K283-3 (f1fe032)

  • Corrected scores in a few places (b6aa4f1, 438acb0)

Removed mozart_loader.py#

The functionality of the loader has been superseded by the ms3 parsing library. Once it is installed (pip install ms3), several commands are at your disposal, one of which is ms3 transform. For example, head to the folder with the dataset and type ms3 transform -N to create the concatenated note list. ms3 transform -h will show all options.

Getting the Data#

First, create a local copy of this repository, either by using the command

git clone https://github.com/DCMLab/mozart_piano_sonatas.git

or by unpacking this ZIP file.

Data Formats#

Every sonata movement is represented by files that share the same base name and live in different folders. For example, the first movement of the first sonata K. 279 has the following files:

  • MS3/K279-1.mscx: Uncompressed MuseScore file including the music and harmony labels.

  • notes/K279-1.tsv: A table of all note heads contained in the score and their relevant features (not every note head represents an onset, since some are tied together).

  • measures/K279-1.tsv: A table with relevant information about the measures in the score.

  • harmonies/K279-1.tsv: A list of the included harmony labels (including cadences and phrases) with their positions in the score.

Opening Scores#

After navigating to your local copy, you can open the scores in the folder MS3 with the free and open source score editor MuseScore.

Opening TSV files in a spreadsheet#

Tab-separated value (TSV) files are like comma-separated value (CSV) files and can be opened with most modern text editors. However, to display the columns correctly, you might want to use a spreadsheet or an add-on for your favourite text editor. A spreadsheet such as Excel may misinterpret fractions as dates. This can be circumvented by importing the file via Data --> From Text/CSV or by using the free alternative LibreOffice Calc. Other than that, TSV data can be loaded with every modern programming language.

Loading TSV files in Python#

Since the TSV files contain null values, lists, fractions, and numbers that are to be treated as strings, you may want to use this code to load any TSV files related to this repository (provided you’re doing it in Python). After a quick pip install -U ms3 (requires Python 3.10) you’ll be able to load any TSV like this:

import ms3

labels = ms3.load_tsv('harmonies/K283-1.tsv')
notes = ms3.load_tsv('notes/K283-1.tsv')
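
To see why plain parsing is not enough, the following standard-library sketch converts a few typical cell values by hand. It is not a description of what ms3.load_tsv does internally. The sample rows are made up, and the column names other than quarterbeats (mn, mn_onset, label) are assumptions chosen for illustration.

```python
import csv
import io
from fractions import Fraction

# Made-up sample in the spirit of a harmonies TSV; not copied from the dataset.
SAMPLE = (
    "mn\tmn_onset\tquarterbeats\tlabel\n"
    "1\t0\t0\tI\n"
    "1\t1/2\t2\tV7\n"
    "2\t0\t4\t\n"
)


def parse_row(row):
    """Convert one row of strings into typed values."""
    return {
        "mn": int(row["mn"]),
        "mn_onset": Fraction(row["mn_onset"]),          # keeps 1/2 exact
        "quarterbeats": Fraction(row["quarterbeats"]),  # distance from the start
        "label": row["label"] or None,                  # empty cell becomes None
    }


reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
rows = [parse_row(r) for r in reader]
for r in rows:
    print(r)
```

Keeping fractions as Fraction objects avoids both rounding errors and the date misinterpretation that spreadsheets are prone to.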

How to read metadata.tsv#

This section explains the meaning of the columns contained in metadata.tsv.

File information#

| column | content |
|---|---|
| fname | name without extension (for referencing related files) |
| rel_path | relative file path of the score, including extension |
| subdirectory | folder where the score is located |
| last_mn | last measure number |
| last_mn_unfolded | number of measures when playing all repeats |
| length_qb | length of the piece, measured in quarter notes |
| length_qb_unfolded | length of the piece when playing all repeats |
| volta_mcs | measure counts of first and second endings |
| all_notes_qb | summed-up duration of all notes, measured in quarter notes |
| n_onsets | number of note onsets |
| n_onset_positions | number of unique note onsets (“slices”) |
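
As a small illustration of how the dimension columns relate to each other, the sketch below reads a metadata-like table and compares folded and unfolded lengths. The column names are those described in this section; the table content is a made-up stand-in whose values are not taken from the dataset.

```python
import csv
import io

# Stand-in for metadata.tsv with invented values.
SAMPLE = (
    "fname\tlast_mn\tlast_mn_unfolded\tlength_qb\tlength_qb_unfolded\n"
    "example-1\t100\t200\t400.0\t800.0\n"
    "example-2\t74\t74\t222.0\t222.0\n"
)


def repeat_factor(row):
    """Ratio of the unfolded to the folded length, both in quarter notes."""
    return float(row["length_qb_unfolded"]) / float(row["length_qb"])


summary = {}
for row in csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"):
    factor = repeat_factor(row)
    has_repeats = int(row["last_mn_unfolded"]) > int(row["last_mn"])
    summary[row["fname"]] = (factor, has_repeats)
    print(row["fname"], f"{factor:.2f}", "repeats" if has_repeats else "no repeats")
```

A factor of 1 means that playing all repeats does not lengthen the movement, that is, the invented row describes a piece without repeats.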

Composition information#

| column | content |
|---|---|
| composer | composer name |
| workTitle | full sonata title |
| composed_start | earliest composition date |
| composed_end | latest composition date |
| workNumber | Köchel number |
| movementNumber | 1, 2, or 3 |
| movementTitle | title of the movement |

Score information#

| column | content |
|---|---|
| label_count | number of chord labels |
| KeySig | key signature(s) (negative = flats, positive = sharps) |
| TimeSig | time signature(s) |
| musescore | MuseScore version |
| source | URL to the first typesetter’s file |
| typesetter | first typesetter |
| annotator | creator of the chord labels |
| reviewers | reviewers of the chord labels |
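
The sign convention of KeySig can be made concrete with a tiny helper. The function name is hypothetical, and it handles a single integer only; how several key signatures are encoded within one cell is not covered here.

```python
def describe_keysig(keysig: int) -> str:
    """Spell out a key signature given as a signed count of accidentals."""
    if keysig == 0:
        return "no accidentals"
    kind = "sharp" if keysig > 0 else "flat"  # positive = sharps, negative = flats
    count = abs(keysig)
    return f"{count} {kind}{'s' if count != 1 else ''}"


for value in (-3, 0, 1):
    print(value, "->", describe_keysig(value))
```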

Identifiers#

These columns provide a mapping between multiple identifiers for the sonatas (not for individual movements).

| column | content |
|---|---|
| wikidata | URL of the WikiData item |
| viaf | URL of the Virtual International Authority File (VIAF) entry |
| musicbrainz | MusicBrainz identifier |
| imslp | URL to the wiki page within the International Music Score Library Project (IMSLP) |

Generating all TSV files from the scores#

When you have made changes to the scores and want to update the TSV files accordingly, you can use the following command (provided you have pip-installed ms3):

ms3 extract -M -N -X -D # for measures, notes, expanded annotations, and metadata

If, in addition, you want to generate the reviewed scores with out-of-label notes colored in red, you can do

ms3 review -M -N -X -D # for extracting measures, notes, expanded annotations, and metadata

By adding the flag -c to the review command, it will additionally compare the (potentially modified) annotations in the score with the ones currently present in the harmonies TSV files and reflect the comparison in the reviewed scores.
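
When the extraction is triggered from a Python script rather than from a shell, the two commands can be assembled as argument lists. This is only a convenience sketch: the helper names are hypothetical, no flags beyond those quoted in this section are used, and it assumes that the ms3 executable is on the PATH and is run from the dataset folder.

```python
import subprocess

# Flags quoted in this section: measures, notes, expanded annotations, metadata.
FACET_FLAGS = ["-M", "-N", "-X", "-D"]


def build_ms3_command(subcommand="extract", compare=False):
    """Assemble the argument list for `ms3 extract` or `ms3 review`."""
    if subcommand not in ("extract", "review"):
        raise ValueError(f"unsupported subcommand: {subcommand}")
    cmd = ["ms3", subcommand, *FACET_FLAGS]
    if compare:
        # The -c flag is described here for the review command only.
        if subcommand != "review":
            raise ValueError("compare=True is only meant for 'review'")
        cmd.append("-c")
    return cmd


def run_ms3(cmd, cwd="."):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, cwd=cwd, check=True)


# Building the command does not execute anything yet.
print(" ".join(build_ms3_command("review", compare=True)))
```

Calling run_ms3(build_ms3_command("extract")) from within the dataset folder would then correspond to the first command shown in this section.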

Questions, Suggestions, Corrections, Bug Reports#

Please create an issue and feel free to fork and submit pull requests.

License#

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)

Overview#

| file_name | measures | labels | annotators | reviewers |
|---|---|---|---|---|
| K279-1 | 100 | 251 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K279-2 | 74 | 156 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K279-3 | 158 | 321 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K280-1 | 144 | 225 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K280-2 | 60 | 124 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K280-3 | 190 | 199 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K281-1 | 109 | 208 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K281-2 | 106 | 153 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K281-3 | 162 | 384 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K282-1 | 36 | 104 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K282-2 | 72 | 129 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K282-3 | 102 | 176 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K283-1 | 120 | 326 | Tal Soker | Johannes Hentschel, Markus Neuwirth |
| K283-2 | 39 | 169 | Tal Soker | Johannes Hentschel, Markus Neuwirth |
| K283-3 | 277 | 337 | Tal Soker | Johannes Hentschel, Markus Neuwirth |
| K284-1 | 127 | 330 | Tal Soker | Johannes Hentschel, Markus Neuwirth |
| K284-2 | 92 | 228 | Tal Soker | Johannes Hentschel, Markus Neuwirth |
| K284-3 | 260 | 755 | Adrian Nagel | Johannes Hentschel, Markus Neuwirth |
| K309-1 | 155 | 307 | Tal Soker | Johannes Hentschel, Markus Neuwirth |
| K309-2 | 79 | 259 | Tal Soker | Johannes Hentschel, Markus Neuwirth |
| K309-3 | 252 | 406 | Tal Soker | Johannes Hentschel, Markus Neuwirth |
| K310-1 | 133 | 292 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K310-2 | 86 | 252 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K310-3 | 252 | 428 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K311-1 | 112 | 319 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K311-2 | 93 | 241 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K311-3 | 269 | 491 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K330-1 | 150 | 293 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K330-2 | 64 | 187 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K330-3 | 171 | 365 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K331-1 | 134 | 399 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K331-2 | 100 | 160 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K331-3 | 127 | 128 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K332-1 | 229 | 316 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K332-2 | 40 | 168 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K332-3 | 245 | 449 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K333-1 | 165 | 431 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K333-2 | 82 | 217 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K333-3 | 224 | 460 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K457-1 | 185 | 308 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K457-2 | 57 | 214 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K457-3 | 319 | 328 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K533-1 | 239 | 584 | Adrian Nagel | Johannes Hentschel, Markus Neuwirth |
| K533-2 | 122 | 261 | Adrian Nagel | Johannes Hentschel, Markus Neuwirth |
| K533-3 | 187 | 423 | Adrian Nagel | Johannes Hentschel, Markus Neuwirth |
| K545-1 | 73 | 119 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K545-2 | 74 | 146 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K545-3 | 73 | 143 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K570-1 | 209 | 245 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K570-2 | 55 | 250 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K570-3 | 89 | 281 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K576-1 | 160 | 295 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K576-2 | 67 | 151 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |
| K576-3 | 189 | 381 | Uli Kneisel | Johannes Hentschel, Markus Neuwirth |

Overview table updated using ms3 1.1.0.
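
The overview lends itself to simple summary statistics. As one possible example, the sketch below computes the number of labels per measure for four rows copied from the table; the choice of rows and the helper name are arbitrary and serve illustration only.

```python
# (file_name, measures, labels) copied from four rows of the overview table.
ROWS = [
    ("K279-1", 100, 251),
    ("K280-3", 190, 199),
    ("K284-3", 260, 755),
    ("K545-1", 73, 119),
]


def labels_per_measure(measures: int, labels: int) -> float:
    """Average number of harmony labels per measure."""
    return labels / measures


density = {name: labels_per_measure(m, n) for name, m, n in ROWS}

# Print the selected movements from densest to sparsest annotation.
for name, value in sorted(density.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.2f} labels per measure")
```

Among these four movements, K284-3 has the highest label density and K280-3 the lowest.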
