This dataset comes from a newspaper article that analyzed Amazon e-book sales by genre and publisher. Unfortunately, it does not include the title or author of each book. The collection covers 54,000 titles spanning several genres and types of publishing companies, practically every book on every Amazon bestseller list. Along with publisher information, it includes each book's overall sales ranking in the Amazon Kindle store, and the books are sorted by this ranking. Keep in mind that this data is NOT time-oriented: it is a collection of many different books, not a record of one book over time.


Explore Structure

Index Type Example Value
0 dict { }
... ... ...
Keys of the "publisher" dictionary:
Key Type Example Value Comment
"type" str "big five" The type, or size, of the publisher (e.g., "indie", "small/medium", or "big five").
"name" str "Katherine Tegen Books" The name of the publisher.
Counts for "type":
Value Count
"small/medium" 9741
"big five" 7309
"indie" 5946
"single author" 3608
"amazon" 423
Counts for "name":
Value Count
"(Small or Medium Publisher)" 9741
"(Indie Publisher)" 5946
"(Single-Author Publisher)" 3608
"HarperCollins e-books" 389
"HarperCollins" 253
"Simon & Schuster" 209
"Random House Books for Young Readers" 202
"Vintage" 190
"Berkley" 181
"Ballantine Books" 164
"Little, Brown and Company" 161
"St. Martin's Press" 159
"Bantam" 150
"Penguin Books" 149
"Scribner" 125
"Grosset & Dunlap" 111
"Thomas & Mercer" 110
"Grand Central Publishing" 107
"Puffin" 103
"Atria Books" 100
"Thomas Nelson" 93
"Farrar, Straus and Giroux" 90
"Random House" 87
"Aladdin" 86
"William Morrow" 84
"Tor Books" 84
"Portfolio" 83
"Montlake Romance" 78
"Anchor" 77
"47North" 77
"Pocket Books" 75
"Touchstone" 73
"Broadway Books" 70
"Zondervan" 70
"LucasBooks" 69
"Free Press" 68
"Little, Brown Books for Young Readers" 66
"Signet" 65
"Two Lions" 64
"Gallery Books" 64
"Hyperion" 63
"Pocket Books/Star Trek" 61
"Harmony" 61
"Harper" 61
"Plume" 57
"Crown Business" 55
"Delacorte Press" 54
"Putnam Adult" 54
"Orbit" 51
"Knopf" 51
"Ace" 50
"HarperOne" 49
"Dell" 49
"Riverhead" 48
"Del Rey" 48
"Spectra" 47
"Minotaur Books" 47
"William Morrow Paperbacks" 46
"Henry Holt and Co." 46
"Simon & Schuster Books for Young Readers" 46
"Ten Speed Press" 44
"St. Martin's Griffin" 43
"Jove" 42
"Thomas Dunne Books" 39
"Viking Adult" 38
"Crown" 37
"Clarkson Potter" 37
"Penguin" 37
"Atheneum Books for Young Readers" 36
"NAL" 36
"Knopf Books for Young Readers" 35
"Avon" 34
"Dutton Adult" 32
"AmazonCrossing" 32
"Random House Trade Paperbacks" 31
"HarperTeen" 30
"Ecco" 29
"It Books" 28
"Gotham Books" 28
"The Penguin Press" 28
"Crown Archetype" 28
"Zonderkidz" 27
"Palgrave Macmillan" 27
"DK Publishing" 27
"Doubleday" 26
"Amazon Publishing" 26
"Holt Paperbacks" 25
"HarperBusiness" 25
"Howard Books" 24
"Entangled: Brazen" 23
"Forever" 23
"Spiegel & Grau" 22
"Tarcher" 22
"Simon Spotlight" 22
"Harper Element" 22
"Margaret K. McElderry Books" 21
"Forge Books" 21
"Harper Perennial" 21
"St. Martin's Paperbacks" 21
"Three Rivers Press" 21
... ...
Keys of the "statistics" dictionary:
Key Type Example Value Comment
"sales rank" int 1 The ranking of this book compared to other books. A better rank (a smaller number, with 1 being the best) indicates a more popular book.
"sale price" float 4.88 The cost of this book on Amazon, in dollars.
"total reviews" int 9604 The number of people who have reviewed this book.
"average rating" float 4.57 The average rating of this book, on a five-point scale, as determined by customers.
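As a minimal sketch of how the "statistics" values might be used, the snippet below orders books by sales rank and filters them by rating. It assumes that rank 1 is the most popular book, as the example value suggests. The sample records, the helper names `by_popularity` and `well_rated`, and the 4.0 threshold are made up for illustration and are not part of the library.

```python
# Hypothetical sample records; only the "statistics" dictionary is filled in.
books = [
    {"statistics": {"sales rank": 3, "sale price": 2.99,
                    "total reviews": 120, "average rating": 4.1}},
    {"statistics": {"sales rank": 1, "sale price": 4.88,
                    "total reviews": 9604, "average rating": 4.57}},
    {"statistics": {"sales rank": 2, "sale price": 0.99,
                    "total reviews": 15, "average rating": 3.2}},
]


def by_popularity(books):
    """Return the books ordered from most to least popular (rank 1 first)."""
    return sorted(books, key=lambda book: book["statistics"]["sales rank"])


def well_rated(books, minimum_rating=4.0):
    """Return the books whose average rating is at least minimum_rating."""
    return [book for book in books
            if book["statistics"]["average rating"] >= minimum_rating]


ranked = by_popularity(books)
top_ranks = [book["statistics"]["sales rank"] for book in ranked]
print(top_ranks)
print(len(well_rated(books)))
```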
Keys of each book dictionary:
Key Type Example Value Comment
"genre" str "genre fiction" The genre of the book, either "fiction", "nonfiction", "genre fiction", "children", "comics", or "foreign language". Some books originally had more than one genre, but this was simplified down to the most prominent genre.
"publisher" dict { }
"sold by" str "HarperCollins Publishers" The actual company that sold this book, as opposed to the company that published it.
"statistics" dict { }
"daily average" dict { }
Counts for "genre":
Value Count
"nonfiction" 14161
"genre fiction" 8903
"children" 2541
"fiction" 733
"comics" 568
"foreign language" 121
Counts for "sold by":
Value Count
"Amazon Digital Services, Inc." 19548
"Random House LLC" 1964
"Penguin Group (USA) LLC" 1411
"HarperCollins Publishers" 1244
"Simon and Schuster Digital Sales Inc" 1108
"Macmillan" 777
"Hachette Book Group" 608
"DC Comics" 125
"HarperCollins Publishing" 99
"HarperCollins Christian Publishing" 96
"Idea & Design Works" 35
"Cengage Learning" 9
"Random House Mondadori" 2
"RCS MediaGroup S.p.A." 1
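Value counts like the ones above could be reproduced with a few lines of Python. This is only a sketch: `count_values` is a hypothetical helper, and `sample_books` is a tiny hand-made list standing in for the real data, so the counts it prints are not the ones in the tables.

```python
from collections import Counter


def count_values(books, key):
    """Count how often each value of a top-level key occurs across the books."""
    return Counter(book[key] for book in books)


# Hypothetical stand-in for the list of book dictionaries.
sample_books = [
    {"genre": "nonfiction", "sold by": "Amazon Digital Services, Inc."},
    {"genre": "genre fiction", "sold by": "HarperCollins Publishers"},
    {"genre": "nonfiction", "sold by": "Amazon Digital Services, Inc."},
]

genre_counts = count_values(sample_books, "genre")
for value, count in genre_counts.most_common():
    print(value, count)
```

With the library available, the same helper should work on the list returned by `publishers.get_books()`.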
Keys of the "daily average" dictionary:
Key Type Example Value Comment
"publisher revenue" float 20496.0 The amount of money that the publisher made per day on this book, in dollars.
"amazon revenue" float 6832.0 The amount of money that Amazon made per day on this book, in dollars.
"author revenue" float 6832.0 The amount of money that the author made per day on this book, in dollars.
"gross sales" float 34160.0 The total amount of money that was made per day on this book, in dollars.
"units sold" int 7000 The number of books sold per day.
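The example values in this table appear to be related to each other. The sketch below checks two relationships using only the Example Value columns. It assumes those values all describe the same book. The relationships hold for these numbers but are not confirmed by the source for every record.

```python
import math

# Example values copied from the "daily average" table.
daily = {
    "publisher revenue": 20496.0,
    "amazon revenue": 6832.0,
    "author revenue": 6832.0,
    "gross sales": 34160.0,
    "units sold": 7000,
}
# Example "sale price" from the "statistics" table (assumed to be the same book).
sale_price = 4.88

# The three revenue shares add up to the gross sales.
revenue_parts = (daily["publisher revenue"]
                 + daily["amazon revenue"]
                 + daily["author revenue"])
parts_match = math.isclose(revenue_parts, daily["gross sales"])

# Sale price times units sold also gives the gross sales.
price_times_units = sale_price * daily["units sold"]
price_match = math.isclose(price_times_units, daily["gross sales"])

# Fraction of the gross sales that goes to the publisher in this example.
publisher_share = daily["publisher revenue"] / daily["gross sales"]

print(parts_match, price_match, round(publisher_share, 2))
```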


Download all of the following files.


This library has one function you can use.
import publishers
list_of_book = publishers.get_books()
Additionally, the function can return a sample of the full dataset when given an extra argument. Using the sample may be much faster. When you are sure your code is correct, you can remove the argument to use the full dataset.
import publishers
# test=True returns a smaller sample; loading the full dataset may be slow.
list_of_book = publishers.get_books(test=True)



get_books() returns a list of the books in the database, one dictionary per book.
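Once the list is loaded, the nested dictionaries can be combined in a single pass. The sketch below totals daily gross sales per publisher type. The function name `daily_gross_by_publisher_type` and the sample records are hypothetical; only the key names come from the structure described above.

```python
from collections import defaultdict


def daily_gross_by_publisher_type(books):
    """Sum the "gross sales" daily average for each publisher type."""
    totals = defaultdict(float)
    for book in books:
        publisher_type = book["publisher"]["type"]
        totals[publisher_type] += book["daily average"]["gross sales"]
    return dict(totals)


# Hypothetical records that contain only the keys this function reads.
sample_books = [
    {"publisher": {"type": "big five"}, "daily average": {"gross sales": 34160.0}},
    {"publisher": {"type": "indie"}, "daily average": {"gross sales": 50.5}},
    {"publisher": {"type": "big five"}, "daily average": {"gross sales": 100.0}},
]

totals = daily_gross_by_publisher_type(sample_books)
print(totals)
```

With the library installed, the sample list could be swapped for `publishers.get_books(test=True)` while developing, and for `publishers.get_books()` once the code is known to work.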