Overview

This dataset comes from a newspaper article analyzing Amazon e-book sales by genre and publisher. Unfortunately, it does not include each book's title or author. The collection covers 54,000 titles across several genres and types of publishing companies, practically every book on every Amazon bestseller list. Along with publisher information, it includes each book's overall sales ranking in the Amazon Kindle store.

http://authorearnings.com/report/september-2015-author-earnings-report/

Explore Structure




Index  Type  Example Value
0      dict  { }
...    ...   ...

Keys of each "publisher" dict:
Key     Type  Example Value            Comment
"type"  str   "big five"               The type, or size, of the publisher (e.g., "indie", "small", "big five").
"name"  str   "Katherine Tegen Books"  The name of the publisher.

Keys of each book dict:
Key           Type  Example Value               Comment
"genre"       str   "genre fiction"             The genre of the book: "fiction", "non-fiction", "genre fiction", "childrens", "comics", or "foreign language". Some books originally had more than one genre, but this was simplified to the most prominent genre.
"publisher"   dict  { }
"sold by"     str   "HarperCollins Publishers"  The actual company that sold this book, as opposed to the company that published it.
"statistics"  dict  { }
"daily"       dict  { }

Keys of each "daily" dict:
Key                  Type   Example Value  Comment
"publisher revenue"  float  20496.0        The amount of money that the publisher made per day on this book, in dollars.
"amazon revenue"     float  6832.0         The amount of money that Amazon made per day on this book, in dollars.
"author revenue"     float  6832.0         The amount of money that the author made per day on this book, in dollars.
"units sold"         int    7000           The number of books sold per day.
"gross sales"        float  34160.0        The total amount of money that was made per day on this book, in dollars.

Keys of each "statistics" dict:
Key               Type   Example Value  Comment
"sales rank"      int    1              The ranking of this book compared to other books. A rank closer to 1 indicates a more popular book.
"sale price"      float  4.88           The cost of this book on Amazon, in dollars.
"total reviews"   int    9604           The number of people who have reviewed this book.
"average rating"  float  4.57           The average rating of this book, on a five-point scale, as determined by customers.
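
To make the nesting concrete, here is a minimal sketch of a single record built only from the example values listed above. The grouping of keys under "publisher", "statistics", and "daily" is an assumption inferred from the comments (for instance, the "per day" fields are placed under "daily"), and the record is assembled by hand rather than loaded from the library. For these example values, the three revenue shares add up to the gross sales, and so does the sale price times the units sold; that this holds for every record is not stated by the documentation.

```python
import math

# Hypothetical record assembled from the example values in the tables above.
# The placement of keys inside "publisher", "statistics", and "daily" is an
# assumption inferred from the field comments.
book = {
    "genre": "genre fiction",
    "publisher": {"type": "big five", "name": "Katherine Tegen Books"},
    "sold by": "HarperCollins Publishers",
    "statistics": {
        "sales rank": 1,
        "sale price": 4.88,
        "total reviews": 9604,
        "average rating": 4.57,
    },
    "daily": {
        "publisher revenue": 20496.0,
        "amazon revenue": 6832.0,
        "author revenue": 6832.0,
        "units sold": 7000,
        "gross sales": 34160.0,
    },
}

# Nested values are reached by chaining keys.
publisher_name = book["publisher"]["name"]
daily = book["daily"]

# Sum of the three revenue shares for this record.
revenue_parts = (
    daily["publisher revenue"] + daily["amazon revenue"] + daily["author revenue"]
)
# Sale price multiplied by the daily unit count for this record.
price_times_units = book["statistics"]["sale price"] * daily["units sold"]

# Compare with a small tolerance because the values are floats.
parts_match = math.isclose(revenue_parts, daily["gross sales"], abs_tol=0.01)
price_match = math.isclose(price_times_units, daily["gross sales"], abs_tol=0.01)

print(publisher_name, parts_match, price_match)
```

Note that keys such as "sold by" and "units sold" contain spaces, so they must be accessed with bracket notation and the exact string.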

Downloads

Download all of the following files.

Usage

This library has 1 function you can use.
import publishers
list_of_book = publishers.get_books()
Additionally, the function can return a smaller sample of the dataset when you pass an extra argument, test=True. Working with the sample may be much faster. When you are sure your code is correct, you can remove the argument to use the full dataset.
import publishers
# test=True returns the smaller sample; the full dataset may be slow to load.
list_of_book = publishers.get_books(test=True)

Documentation

 publishers.get_books(test=False)

Returns a list of the books in the database.
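
As a rough illustration of working with the returned list, the sketch below totals daily gross sales per publisher type and picks the best-ranked books. The helper names daily_gross_by_publisher_type and best_ranked are hypothetical and not part of the library, the nesting of keys is assumed from the field descriptions, and sample_books holds made-up values (apart from the first record's documented example figures) so the snippet runs without the dataset. It also assumes that sales rank 1 is the best seller.

```python
from collections import defaultdict


def daily_gross_by_publisher_type(books):
    """Sum the daily gross sales (dollars) for each publisher type."""
    totals = defaultdict(float)
    for book in books:
        totals[book["publisher"]["type"]] += book["daily"]["gross sales"]
    return dict(totals)


def best_ranked(books, n=1):
    """Return the n books with the lowest sales rank number (assumed best)."""
    return sorted(books, key=lambda b: b["statistics"]["sales rank"])[:n]


# Made-up records containing only the keys the helpers need.
sample_books = [
    {"publisher": {"type": "big five"}, "statistics": {"sales rank": 1},
     "daily": {"gross sales": 34160.0}},
    {"publisher": {"type": "indie"}, "statistics": {"sales rank": 3},
     "daily": {"gross sales": 500.0}},
    {"publisher": {"type": "big five"}, "statistics": {"sales rank": 2},
     "daily": {"gross sales": 1000.0}},
]

totals = daily_gross_by_publisher_type(sample_books)
top_two = best_ranked(sample_books, n=2)

print(totals)
print([b["statistics"]["sales rank"] for b in top_two])
```

With the library installed, the same helpers could be applied to the result of publishers.get_books(test=True) while developing, and then to publishers.get_books() for the full dataset.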