MARD contains texts and accompanying metadata originally obtained from a much larger dataset of Amazon customer reviews, which have been enriched with music metadata from MusicBrainz, and audio descriptors from AcousticBrainz. MARD amounts to a total of 65,566 albums and 263,525 customer reviews. A breakdown of the number of albums per genre is provided here: 

Genre                 Amazon   MusicBrainz   AcousticBrainz
Alternative Rock       2,674         1,696              564
Reggae                   509           260               79
Classical             10,000         2,197              587
R&B                    2,114         2,950              982
Country                2,771         1,032              424
Jazz                   6,890         2,990              863
Metal                  1,785         1,294              500
Pop                   10,000         4,422            1,701
New Age                2,656           638              155
Dance & Electronic     5,106           899              367
Rap & Hip-Hop          1,679           768              207
Latin Music            7,924         3,237              425
Rock                   7,315         4,100            1,482
Gospel                   900           274               33
Blues                  1,158           448              135
Folk                   2,085           848              179
Total                 65,566        28,053            8,683
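
As a quick sanity check, the per-genre counts can be summed and compared with the 65,566 album total stated above. The sketch below copies the counts from the table. Treating the ratio of column totals as "coverage" is an assumption about how the columns relate, not something the dataset documentation states.

```python
# Album counts per genre as (Amazon, MusicBrainz, AcousticBrainz), copied from the table.
counts = {
    "Alternative Rock": (2674, 1696, 564),
    "Reggae": (509, 260, 79),
    "Classical": (10000, 2197, 587),
    "R&B": (2114, 2950, 982),
    "Country": (2771, 1032, 424),
    "Jazz": (6890, 2990, 863),
    "Metal": (1785, 1294, 500),
    "Pop": (10000, 4422, 1701),
    "New Age": (2656, 638, 155),
    "Dance & Electronic": (5106, 899, 367),
    "Rap & Hip-Hop": (1679, 768, 207),
    "Latin Music": (7924, 3237, 425),
    "Rock": (7315, 4100, 1482),
    "Gospel": (900, 274, 33),
    "Blues": (1158, 448, 135),
    "Folk": (2085, 848, 179),
}

# Sum each column across all genres.
amazon_total, mb_total, ab_total = (sum(col) for col in zip(*counts.values()))

# Ratio of each column total to the Amazon total (interpreted here as coverage; an assumption).
mb_coverage = mb_total / amazon_total
ab_coverage = ab_total / amazon_total

print(f"Amazon: {amazon_total:,}  MusicBrainz: {mb_total:,}  AcousticBrainz: {ab_total:,}")
print(f"MusicBrainz: {mb_coverage:.1%} of Amazon total, AcousticBrainz: {ab_coverage:.1%}")
```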

 

The dataset contains 2 files and 1 folder: 

mard_metadata.json: This file contains one entry per album. Not every field is available for every album. The possible fields per entry are:

Source: Amazon 
amazon-id: The Amazon product id. You can view the album page on Amazon by appending this id to the following url "www.amazon.com/dp/"
artist: The artist name as it appears on Amazon
title: The album title as it appears on Amazon
related:
   also bought: Other products bought by people who bought this album
   buy_after_viewing: Other products bought by people after viewing this album
price: The album price
label: The record label of the album
categories: The genre categories in Amazon
sales_rank: Ranking in the Amazon music sales rank
imUrl: URL of the album cover image
artist_url: The URL of the artist page on Amazon. You must prepend "www.amazon.com" to access this page
root-genre: The root genre category of the album, extracted from the categories field. 
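
The two URL rules above (appending amazon-id to "www.amazon.com/dp/" and prepending "www.amazon.com" to artist_url) can be sketched as small helpers. This is only an illustration: the "https://" scheme, the helper names, and the assumption that artist_url begins with "/" are not stated in the dataset description.

```python
# Scheme is an assumption; the field description gives only "www.amazon.com".
AMAZON_HOST = "https://www.amazon.com"

def album_url(entry):
    """Return the album page URL, or None when the entry has no amazon-id."""
    amazon_id = entry.get("amazon-id")
    return f"{AMAZON_HOST}/dp/{amazon_id}" if amazon_id else None

def artist_page_url(entry):
    """Return the artist page URL, or None when artist_url is missing.

    Assumes artist_url is a path starting with "/" (not confirmed by the source).
    """
    path = entry.get("artist_url")
    return AMAZON_HOST + path if path else None

# Made-up entry for illustration only; these are not real MARD values.
example = {"amazon-id": "B000000000", "artist_url": "/Example-Artist/e/B000000000"}
print(album_url(example))
print(artist_page_url(example))
```

Both helpers return None for missing fields because, as noted above, not every field is present for every album.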

Source: MusicBrainz 
artist-mbid: The MusicBrainz ID of the artist 
first-release-year: The year of first publication of the album 
release-group-mbid: The MusicBrainz ID of the release group mapped to this album title 
release-group: The MusicBrainz ID of the first release in the release-group of this album, used to extract the tracks info 
songs: List of tracks in the album 
    ftitle: Title of the track 
    fmbid: MusicBrainz recording ID of the track. Used to map with AcousticBrainz. 
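
Since the recording IDs are what link an album to AcousticBrainz, it can be handy to pull them out of an entry. The helper below is a minimal sketch that assumes songs is a list of dictionaries carrying the ftitle and fmbid keys exactly as listed above. The function name and the sample entry are hypothetical.

```python
def recording_ids(entry):
    """Return (track title, recording MBID) pairs for one metadata entry.

    Tracks without an fmbid are skipped; entries without songs yield an empty list.
    """
    pairs = []
    for song in entry.get("songs", []):
        mbid = song.get("fmbid")
        if mbid:
            pairs.append((song.get("ftitle"), mbid))
    return pairs

# Made-up entry for illustration only; the MBIDs are placeholders.
example = {
    "songs": [
        {"ftitle": "Track One", "fmbid": "00000000-0000-0000-0000-000000000001"},
        {"ftitle": "Track Two"},  # no recording ID, so it is skipped
    ]
}
print(recording_ids(example))
```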

acousticbrainz_descriptors/: This folder contains one file per album with the audio descriptors of its first track. Each filename has the format amazonid_mbid.json. Using the MBIDs of the other songs on an album, their acoustic descriptors can be retrieved from the AcousticBrainz web service. The number of songs with information in AcousticBrainz is constantly growing, so the numbers in the table above reflect availability at the time the dataset was created. 
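
A descriptor filename can be split back into its two ids, and the recording MBID can then be turned into a web service URL. The sketch below assumes the Amazon id contains no underscore, so the first "_" separates the two parts. The URL pattern follows the public AcousticBrainz API as commonly documented ("/api/v1/<mbid>/low-level" and "/high-level") and should be checked against its current documentation. The function names are hypothetical, and no network request is made.

```python
from pathlib import Path

def split_descriptor_filename(filename):
    """Split 'amazonid_mbid.json' into (amazon_id, mbid).

    Assumes the Amazon id has no underscore, so the first '_' is the separator.
    """
    stem = Path(filename).stem  # drop the '.json' extension
    amazon_id, _, mbid = stem.partition("_")
    return amazon_id, mbid

def acousticbrainz_url(mbid, level="low-level"):
    """Build (but do not fetch) an AcousticBrainz API URL for a recording MBID."""
    if level not in ("low-level", "high-level"):
        raise ValueError("level must be 'low-level' or 'high-level'")
    return f"https://acousticbrainz.org/api/v1/{mbid}/{level}"

# Made-up filename for illustration only.
amazon_id, mbid = split_descriptor_filename(
    "B000000000_00000000-0000-0000-0000-000000000001.json"
)
print(amazon_id, acousticbrainz_url(mbid))
```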

mard_reviews.json: This file contains the review texts along with metadata for each review. Each entry has the following fields: reviewerID, amazon-id, reviewerName, helpful, reviewText, overall, summary, unixReviewTime, reviewTime. 
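
Because every review carries the amazon-id of its album, reviews can be grouped per album. The following sketch averages the overall field per album. It assumes overall is a numeric rating, which the field list does not state explicitly. The function name and sample reviews are hypothetical.

```python
from collections import defaultdict

def mean_rating_per_album(reviews):
    """Return {amazon-id: mean of 'overall'} for an iterable of review dictionaries.

    Reviews missing either field are ignored.
    """
    totals = defaultdict(lambda: [0.0, 0])  # amazon-id -> [sum of ratings, count]
    for review in reviews:
        album_id = review.get("amazon-id")
        rating = review.get("overall")
        if album_id is None or rating is None:
            continue
        totals[album_id][0] += float(rating)
        totals[album_id][1] += 1
    return {album_id: total / n for album_id, (total, n) in totals.items()}

# Made-up reviews for illustration only.
sample_reviews = [
    {"amazon-id": "A1", "overall": 5.0},
    {"amazon-id": "A1", "overall": 3.0},
    {"amazon-id": "B2", "overall": 4.0},
    {"amazon-id": "B2"},  # no rating, ignored
]
print(mean_rating_per_album(sample_reviews))
```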

Each line in these two files is a JSON dictionary. To load a file, read it line by line and evaluate each line as an expression. Python code example: 

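import ast

fname = "mard_metadata.json"
with open(fname, "r") as f:
    for line in f:
        # literal_eval parses the dictionary literal without executing arbitrary code
        data = ast.literal_eval(line)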
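
For repeated use, the line-by-line loading can be wrapped in a generator and the albums indexed by amazon-id, which makes it easy to join metadata with reviews. This is a sketch rather than an official loader: it substitutes ast.literal_eval for plain eval so that no arbitrary code is executed, and it assumes UTF-8 encoding. The function names are hypothetical.

```python
import ast

def iter_entries(path):
    """Yield one dictionary per non-empty line of a MARD file."""
    # UTF-8 is an assumption; the dataset description does not name an encoding.
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                # Safe alternative to eval: accepts only Python literals.
                yield ast.literal_eval(line)

def index_by_album(entries):
    """Map amazon-id to its entry, skipping entries without an amazon-id."""
    return {entry["amazon-id"]: entry for entry in entries if "amazon-id" in entry}
```

A possible usage is `albums = index_by_album(iter_entries("mard_metadata.json"))`, after which `albums[review["amazon-id"]]` looks up the album a review belongs to.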