The data in this library comes from the Million Song Dataset, which used analysis from a company called the Echo Nest to derive data points about one million popular contemporary songs. The Million Song Dataset is a collaboration between the Echo Nest and LabROSA, a laboratory working towards intelligent machine listening. The project was also funded in part by the U.S. National Science Foundation (NSF) to provide a large dataset for evaluating research algorithms at commercial scale and to promote further research in the field of Music Information Retrieval. The data contains standard information about each song, such as artist name, title, and year released. It also contains more advanced information, for example the length of the song, how many musical bars it spans, and how long its fade-in lasts.

Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere.
The Million Song Dataset. In Proceedings of the 12th International Society
for Music Information Retrieval Conference (ISMIR 2011), 2011.

Explore Structure

The dataset is a list of dictionaries.

Index   Type   Example Value
0       dict   { }
...     ...    ...

Each dictionary in the list has three keys.

Key         Type   Example Value
"release"   dict   { }
"artist"    dict   { }
"song"      dict   { }

The "artist" dictionary:

Key             Type    Example Value
"terms_freq"    float   1.0
"terms"         str     "hip hop"
"name"          str     "Casual"
"familiarity"   float   0.581793766
"longitude"     float   -63.9333578685
"id"            str     "ARD7TVE1187B99BFB1"
"location"      str     "California - LA"
"latitude"      float   37.1573567501
"similar"       str     "ARV4KO21187FB38008"
"hotttnesss"    float   0.401997543

The "song" dictionary:

Key                           Type    Example Value
"key"                         float   1.0
"mode_confidence"             float   0.636
"artist_mbtags_count"         float   0.0
"key_confidence"              float   0.736
"tatums_start"                float   0.28519
"year"                        int     0
"duration"                    float   218.93179
"hotttnesss"                  float   0.60211999
"beats_start"                 float   0.58521
"time_signature_confidence"   float   0.778
"title"                       str     "I Didn't Mean To"
"bars_confidence"             float   0.643
"id"                          str     "SOMZWCG12A8C13C480"
"bars_start"                  float   0.58521
"artist_mbtags"               str     ""
"start_of_fade_out"           float   218.932
"tempo"                       float   92.198
"end_of_fade_in"              float   0.247
"beats_confidence"            float   0.834
"tatums_confidence"           float   0.779
"mode"                        int     0
"time_signature"              float   4.0
"loudness"                    float   -11.197

The "release" dictionary:

Key      Type   Example Value
"id"     int    300848
"name"   str    "Fear Itself"

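As a rough sketch of how the nested structure above can be navigated, the snippet below builds one record by hand from the example values in the tables and reads a few fields from it. The record is hard-coded for illustration only; in practice it would come from one of the library's functions. The helper name describe is hypothetical, and the sketch assumes that "duration" is measured in seconds.

```python
# One record, hand-copied from the example values in the tables above.
# Only a few of the documented keys are included to keep the sketch short.
record = {
    "release": {"id": 300848, "name": "Fear Itself"},
    "artist": {"name": "Casual", "terms": "hip hop", "familiarity": 0.581793766},
    "song": {"title": "I Didn't Mean To", "duration": 218.93179, "tempo": 92.198},
}


def describe(rec):
    """Build a one-line summary of a record (hypothetical helper).

    Assumes rec["song"]["duration"] is a length in seconds.
    """
    minutes, seconds = divmod(int(rec["song"]["duration"]), 60)
    return "{} by {} ({}:{:02d}), from {}".format(
        rec["song"]["title"],
        rec["artist"]["name"],
        minutes,
        seconds,
        rec["release"]["name"],
    )


summary = describe(record)
print(summary)
```

Each top-level key leads to its own dictionary, so every field is reached with two lookups: first "release", "artist", or "song", then the field name.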

Download all of the following files.


This library has three functions you can use.
import music
a_music = music.get_song_by_name("I Didn't Mean To")
list_of_music = music.get_songs_by_artist("Aerosmith")
list_of_music = music.get_songs()
Additionally, some of the functions accept an extra argument, test, that makes them return a sample of the full dataset. Working with this sample may be much faster. When you are sure your code is correct, you can remove the argument to use the full dataset.
import music
# test=True asks for the sampled data, so these calls may be much faster
list_of_music = music.get_songs_by_artist("Aerosmith", test=True)
list_of_music = music.get_songs(test=True)
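
The workflow described above (develop against the sample, then switch to the full data) might look like the following sketch. So that it runs without the library installed, a small hand-written list stands in for the return value of music.get_songs(test=True); it assumes the function returns a list of dictionaries in the structure documented on this page. The helper name average_tempo is hypothetical, and the second record is a made-up placeholder, not real data.

```python
# Stand-in for the list returned by music.get_songs(test=True).
# The first record uses example values from this page; the second is
# a made-up placeholder so the average has more than one input.
sample_songs = [
    {"artist": {"name": "Casual"},
     "song": {"title": "I Didn't Mean To", "tempo": 92.198}},
    {"artist": {"name": "Placeholder Artist"},
     "song": {"title": "Placeholder Song", "tempo": 120.0}},
]


def average_tempo(songs):
    """Return the mean tempo of the given records, or None if there are none."""
    tempos = [record["song"]["tempo"] for record in songs]
    if not tempos:
        return None
    return sum(tempos) / len(tempos)


sample_average = average_tempo(sample_songs)
print(sample_average)
```

Once a function like this behaves as expected on the sample, the same function can be handed the list from a call without the test argument.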



Given the title of a song, get_song_by_name returns information about that song.

 music.get_songs_by_artist(artist, test=False)

Given the name of an artist, returns all the songs by that artist in the database.


The get_songs function returns a list of all the songs in the database.
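
To make the relationship between these functions concrete, the sketch below shows what an artist lookup amounts to when applied to a list of records. It is not the library's implementation; songs_by_artist is a hypothetical helper, the matching rule (exact name equality) is an assumption, and the list is a hand-written stand-in for the output of music.get_songs with one made-up placeholder record.

```python
def songs_by_artist(songs, artist_name):
    """Return the records whose artist name matches exactly (hypothetical helper)."""
    return [record for record in songs if record["artist"]["name"] == artist_name]


# Hand-written stand-in for the list of all songs. The first record uses
# example values from this page; the second is a made-up placeholder.
all_songs = [
    {"artist": {"name": "Casual"}, "song": {"title": "I Didn't Mean To"}},
    {"artist": {"name": "Placeholder Artist"}, "song": {"title": "Placeholder Song"}},
]

matches = songs_by_artist(all_songs, "Casual")
titles = [record["song"]["title"] for record in matches]
print(titles)
```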