Overview

Project Gutenberg (PG) is a volunteer effort to digitize and archive cultural works, to 'encourage the creation and distribution of eBooks'. It was founded in 1971 by Michael S. Hart and is the oldest digital library. This dataset is a collection of the 1000 most popular books on Project Gutenberg, as determined by download counts. Each book record includes its authorship, publication date, Library of Congress classification, and a few other fields. It also includes some simple computed statistics based on common metrics, such as sentiment analysis, Flesch-Kincaid reading level, and average sentence length.

https://www.gutenberg.org/ebooks/search/?sort_order=downloads

Explore Structure




Index Type Example Value
0 str "text/plain"
... ... ...
Index Type Example Value
0 dict { }
... ... ...
Index Type Example Value
0 str "PR"
... ... ...
Index Type Example Value
0 str "Sisters -- Fiction"
... ... ...
Index Type Example Value
0 str "en"
... ... ...
Key Type Example Value Comment
"url" str "https://www.gutenberg.org/ebooks/1342"
"downloads" int 36576 The number of times this book has been downloaded from Project Gutenberg, as of the last update (circa Spring 2016).
"id" int 1342 Every book on Project Gutenberg has a unique ID number. You can use this number to look up the book on Project Gutenberg (e.g., book 110 is http://www.gutenberg.org/ebooks/110).
"rank" int 1 The rank of this book in comparison to other books on Project Gutenberg, measured by number of downloads. A lower rank indicates that the book is more popular.
"formats" dict { }
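As a minimal sketch of how the "id" and "url" fields relate, the helper below rebuilds a book's page address from its ID, following the pattern of the example values above. The name book_url is a hypothetical helper chosen for illustration; it is not part of the classics library.

```python
def book_url(book_id):
    """Build the Project Gutenberg page URL for a book ID (illustrative helper)."""
    # URL pattern copied from the example value of the "url" field
    return "https://www.gutenberg.org/ebooks/{}".format(book_id)


# Example ID taken from the table above
print(book_url(1342))
```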
Value Count
"https://www.gutenberg.org/ebooks/2344" 1
"https://www.gutenberg.org/ebooks/2397" 1
"https://www.gutenberg.org/ebooks/12956" 1
"https://www.gutenberg.org/ebooks/14021" 1
"https://www.gutenberg.org/ebooks/689" 1
"https://www.gutenberg.org/ebooks/1937" 1
"https://www.gutenberg.org/ebooks/45631" 1
"https://www.gutenberg.org/ebooks/14407" 1
"https://www.gutenberg.org/ebooks/51290" 1
"https://www.gutenberg.org/ebooks/6686" 1
"https://www.gutenberg.org/ebooks/6688" 1
"https://www.gutenberg.org/ebooks/34099" 1
"https://www.gutenberg.org/ebooks/6867" 1
"https://www.gutenberg.org/ebooks/805" 1
"https://www.gutenberg.org/ebooks/5827" 1
"https://www.gutenberg.org/ebooks/34856" 1
"https://www.gutenberg.org/ebooks/6626" 1
"https://www.gutenberg.org/ebooks/2097" 1
"https://www.gutenberg.org/ebooks/51294" 1
"https://www.gutenberg.org/ebooks/51306" 1
"https://www.gutenberg.org/ebooks/51428" 1
"https://www.gutenberg.org/ebooks/51304" 1
"https://www.gutenberg.org/ebooks/51305" 1
"https://www.gutenberg.org/ebooks/51478" 1
"https://www.gutenberg.org/ebooks/2199" 1
"https://www.gutenberg.org/ebooks/22400" 1
"https://www.gutenberg.org/ebooks/271" 1
"https://www.gutenberg.org/ebooks/11339" 1
"https://www.gutenberg.org/ebooks/51475" 1
"https://www.gutenberg.org/ebooks/38594" 1
"https://www.gutenberg.org/ebooks/51477" 1
"https://www.gutenberg.org/ebooks/51308" 1
"https://www.gutenberg.org/ebooks/51473" 1
"https://www.gutenberg.org/ebooks/3021" 1
"https://www.gutenberg.org/ebooks/22381" 1
"https://www.gutenberg.org/ebooks/3742" 1
"https://www.gutenberg.org/ebooks/3743" 1
"https://www.gutenberg.org/ebooks/19033" 1
"https://www.gutenberg.org/ebooks/470" 1
"https://www.gutenberg.org/ebooks/51398" 1
"https://www.gutenberg.org/ebooks/2048" 1
"https://www.gutenberg.org/ebooks/876" 1
"https://www.gutenberg.org/ebooks/51397" 1
"https://www.gutenberg.org/ebooks/8419" 1
"https://www.gutenberg.org/ebooks/51391" 1
"https://www.gutenberg.org/ebooks/51393" 1
"https://www.gutenberg.org/ebooks/51392" 1
"https://www.gutenberg.org/ebooks/15489" 1
"https://www.gutenberg.org/ebooks/11505" 1
"https://www.gutenberg.org/ebooks/51296" 1
"https://www.gutenberg.org/ebooks/2488" 1
"https://www.gutenberg.org/ebooks/1946" 1
"https://www.gutenberg.org/ebooks/575" 1
"https://www.gutenberg.org/ebooks/1028" 1
"https://www.gutenberg.org/ebooks/12655" 1
"https://www.gutenberg.org/ebooks/27424" 1
"https://www.gutenberg.org/ebooks/5219" 1
"https://www.gutenberg.org/ebooks/18338" 1
"https://www.gutenberg.org/ebooks/37134" 1
"https://www.gutenberg.org/ebooks/16269" 1
"https://www.gutenberg.org/ebooks/1026" 1
"https://www.gutenberg.org/ebooks/51426" 1
"https://www.gutenberg.org/ebooks/7256" 1
"https://www.gutenberg.org/ebooks/51297" 1
"https://www.gutenberg.org/ebooks/51317" 1
"https://www.gutenberg.org/ebooks/15263" 1
"https://www.gutenberg.org/ebooks/3328" 1
"https://www.gutenberg.org/ebooks/28696" 1
"https://www.gutenberg.org/ebooks/29854" 1
"https://www.gutenberg.org/ebooks/2600" 1
"https://www.gutenberg.org/ebooks/2833" 1
"https://www.gutenberg.org/ebooks/3735" 1
"https://www.gutenberg.org/ebooks/28522" 1
"https://www.gutenberg.org/ebooks/28520" 1
"https://www.gutenberg.org/ebooks/16436" 1
"https://www.gutenberg.org/ebooks/14244" 1
"https://www.gutenberg.org/ebooks/25344" 1
"https://www.gutenberg.org/ebooks/731" 1
"https://www.gutenberg.org/ebooks/21765" 1
"https://www.gutenberg.org/ebooks/10625" 1
"https://www.gutenberg.org/ebooks/10623" 1
"https://www.gutenberg.org/ebooks/29765" 1
"https://www.gutenberg.org/ebooks/51331" 1
"https://www.gutenberg.org/ebooks/14328" 1
"https://www.gutenberg.org/ebooks/15237" 1
"https://www.gutenberg.org/ebooks/14323" 1
"https://www.gutenberg.org/ebooks/16966" 1
"https://www.gutenberg.org/ebooks/34632" 1
"https://www.gutenberg.org/ebooks/17489" 1
"https://www.gutenberg.org/ebooks/1200" 1
"https://www.gutenberg.org/ebooks/5322" 1
"https://www.gutenberg.org/ebooks/730" 1
"https://www.gutenberg.org/ebooks/20583" 1
"https://www.gutenberg.org/ebooks/2992" 1
"https://www.gutenberg.org/ebooks/16816" 1
"https://www.gutenberg.org/ebooks/3250" 1
"https://www.gutenberg.org/ebooks/808" 1
"https://www.gutenberg.org/ebooks/16726" 1
"https://www.gutenberg.org/ebooks/8581" 1
"https://www.gutenberg.org/ebooks/3528" 1
... ...
Key Type Example Value Comment
"polysyllables" int 4603 The number of words that have 3 or more syllables.
"characters" int 586794 Characters are letters and symbols in a text, not the number of people.
"average sentence length" float 18.0
"words" int 121533
"sentences" int 6511
"syllables" float 170648.1
"average sentence per word" float 0.05
"average letter per word" float 4.83
Key Type Example Value Comment
"metrics" dict { }
"bibliography" dict { }
"metadata" dict { }
Key Type Example Value Comment
"death" int 1817 The recorded year of the author's death. If their death year is unknown, it is replaced with "0".
"name" str "Austen, Jane"
"birth" int 1775 The recorded birth year of the author. If their birth year is unknown, it is replaced with "0".
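Because unknown birth and death years are stored as 0, arithmetic on these fields needs a guard. The sketch below shows one way to handle that; the helper name author_lifespan is hypothetical, and the sample dictionary simply copies the example values from the table above.

```python
def author_lifespan(author):
    """Return death year minus birth year, or None if either year is unknown."""
    birth = author["birth"]
    death = author["death"]
    # The dataset codes unknown years as 0, so treat 0 as missing
    if birth == 0 or death == 0:
        return None
    return death - birth


# Example values from the table above
austen = {"name": "Austen, Jane", "birth": 1775, "death": 1817}
print(author_lifespan(austen))
```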
Value Count
"Unknown" 34
"Shakespeare, William" 19
"Twain, Mark" 19
"Dickens, Charles" 18
"Doyle, Arthur Conan" 14
"Wilde, Oscar" 12
"Anonymous" 11
"Austen, Jane" 11
"Plato" 11
"Chesterton, G. K. (Gilbert Keith)" 9
"Dick, Philip K." 9
"Poe, Edgar Allan" 8
"Dostoyevsky, Fyodor" 8
"Dante Alighieri" 8
"Wells, H. G. (Herbert George)" 8
"Carroll, Lewis" 7
"Kipling, Rudyard" 7
"Nietzsche, Friedrich Wilhelm" 7
"Leiber, Fritz" 6
"Ibsen, Henrik" 6
"Hardy, Thomas" 6
"Homer" 6
"James, Henry" 6
"Conrad, Joseph" 6
"Hawthorne, Nathaniel" 5
"Montgomery, L. M. (Lucy Maud)" 5
"Baum, L. Frank (Lyman Frank)" 5
"Shaw, Bernard" 5
"London, Jack" 5
"Goethe, Johann Wolfgang von" 5
"Burroughs, Edgar Rice" 5
"Verne, Jules" 5
"Tolstoy, Leo, graf" 4
"Cervantes Saavedra, Miguel de" 4
"Eliot, George" 4
"Fitzgerald, F. Scott (Francis Scott)" 4
"Darwin, Charles" 4
"Harmon, Jim" 4
"Various" 4
"Chekhov, Anton Pavlovich" 4
"Neville, Kris" 4
"Defoe, Daniel" 4
"Rousseau, Jean-Jacques" 4
"Paine, Thomas" 4
"Marlowe, Christopher" 4
"Mill, John Stuart" 4
"Stevenson, Robert Louis" 4
"Melville, Herman" 4
"Hugo, Victor" 4
"Dumas, Alexandre" 4
"Rizal, Jose" 4
"Blackwood, Algernon" 3
"Cicero, Marcus Tullius" 3
"Wharton, Edith" 3
"Eliot, T. S. (Thomas Stearns)" 3
"Moliere" 3
"Baudelaire, Charles" 3
"Aristotle" 3
"Gaskell, Elizabeth Cleghorn" 3
"Wodehouse, P. G. (Pelham Grenville)" 3
"Barrie, J. M. (James Matthew)" 3
"Tagore, Rabindranath" 3
"James, William" 3
"Freud, Sigmund" 3
"Swift, Jonathan" 3
"Shelley, Mary Wollstonecraft" 3
"Carlyle, Thomas" 3
"Andersen, H. C. (Hans Christian)" 3
"Joyce, James" 3
"Gogol, Nikolai Vasilevich" 3
"Bronte, Charlotte" 3
"Burnett, Frances Hodgson" 3
"Russell, Bertrand" 3
"Hesse, Hermann" 3
"Milton, John" 3
"Kafka, Franz" 3
"Aesop" 3
"Hume, David" 3
"Kant, Immanuel" 3
"Franklin, Benjamin" 3
"Potter, Beatrix" 2
"Virgil" 2
"Pangborn, Edgar" 2
"Mann, Thomas" 2
"Maugham, W. Somerset (William Somerset)" 2
"Malory, Thomas, Sir" 2
"Chaucer, Geoffrey" 2
"Tocqueville, Alexis de" 2
"Grimm, Wilhelm" 2
"Shaara, Michael" 2
"Irving, Washington" 2
"Vonnegut, Kurt" 2
"Leonardo, da Vinci" 2
"Forster, E. M. (Edward Morgan)" 2
"Flaubert, Gustave" 2
"Petronius Arbiter" 2
"Gibbon, Edward" 2
"Alcott, Louisa May" 2
"Gilman, Charlotte Perkins" 2
"Lang, Andrew" 2
... ...
Key Type Example Value Comment
"difficulty" dict { }
"statistics" dict { }
"sentiments" dict { }
Key Type Example Value Comment
"total" int 8 Project Gutenberg makes books available in a wide variety of file formats, including raw text files, HTML web pages, audio books, etc. This field indicates the number of ways that this book is available.
"types" list [ ]
Key Type Example Value Comment
"polarity" float 0.136713377605 Sentiment analysis attempts to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. Polarity in particular refers to how positive or negative the author is towards the content.
"subjectivity" float 0.52223914947 Sentiment analysis attempts to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. Subjectivity (as opposed to Objectivity) in particular refers to whether the text is opinionated or attempts to stay factual.
Key Type Example Value Comment
"publication" dict { }
"title" str "Pride and Prejudice"
"author" dict { }
"languages" list [ ]
"subjects" list [ ]
"congress classifications" list [ ]
"type" str "Text"
Value Count
"Poems" 3
"Common Sense" 2
"The Republic" 2
"Far from the Madding Crowd" 2
"The Iliad" 2
"The Hound of the Baskervilles" 2
"Paradise Lost" 2
"Hamlet, Prince of Denmark" 2
"Madame Bovary" 2
"Frankenstein; Or, The Modern Prometheus" 2
"The Wind in the Willows" 2
"The Picture of Dorian Gray" 2
"Alice's Adventures in Wonderland" 2
"Sense and Sensibility" 2
"The Secret Garden" 2
"Orthodoxy" 2
"The Art of War" 2
"Heart of Darkness" 2
"The Wonderful Wizard of Oz" 2
"The Tragical History of Doctor Faustus: From the Q..." 2
"The Scarlet Letter" 2
"The Arabian Nights Entertainments" 2
"Pride and Prejudice" 2
"The Strange Case of Dr. Jekyll and Mr. Hyde" 2
"The Jungle Book" 2
"The Return of Sherlock Holmes" 2
"The Entire Project Gutenberg Works of Mark Twain" 1
'"Everyman," with other interludes, including eight...' 1
"Wuthering Heights" 1
"Crime and Punishment" 1
"Lady Audley's Secret" 1
"Notes from the Underground" 1
"Phaedra" 1
"Dumbwaiter" 1
"The Tragedy of Julius Caesar" 1
"Mission Furniture: How to Make It, Part 3" 1
"Citizen Jell" 1
"The Awakening, and Selected Short Stories" 1
"On the Origin of Species By Means of Natural Selec..." 1
"Chain Reaction" 1
"Three Ghost Stories" 1
"The Tenant of Wildfell Hall" 1
"Emile" 1
"Leaves of Grass" 1
"End as a Hero" 1
"The Mill on the Floss" 1
"Ivanhoe: A Romance" 1
"The Spicy Sound of Success" 1
"Letters to His Son, Complete: On the Fine Art of B..." 1
"Palmistry for All" 1
"The Rubaiyat of Omar Khayyam" 1
"Essays by Ralph Waldo Emerson" 1
"Moral Equivalent" 1
"Film Truth; September, 1920" 1
"A General History of the Pyrates:: from their firs..." 1
"French Mediaeval Romances from the Lays of Marie d..." 1
"An Essay on Man; Moral Essays and Satires" 1
"Encyclopedia of Needlework" 1
"The Real Mother Goose" 1
"On the Duty of Civil Disobedience" 1
"The Moonstone" 1
"Kidnapped" 1
"The Iliads of Homer: Translated according to the Greek" 1
"Capture and Escape: A Narrative of Army and Prison..." 1
"Hawaiian Folk Tales: A Collection of Native Legends" 1
"Autobiography of Andrew Carnegie" 1
"The Iron Heel" 1
"A Study in Scarlet" 1
"Pen Pal" 1
"Beeton's Book of Needlework" 1
"Dr. Kometevsky's Day" 1
"Wives and Daughters" 1
"My Life and Work" 1
"The Beautiful and Damned" 1
"The Complete Plays of Gilbert and Sullivan" 1
"Lysistrata" 1
"The Memoirs of Jacques Casanova de Seingalt, 1725-..." 1
"Public School Life: Boys Masters Parents" 1
"Tales of the Jazz Age" 1
"The Theory of the Leisure Class" 1
"The American Occupation of the Philippines 1898-1912" 1
"The Federalist Papers" 1
'"De Bello Gallico" and Other Commentaries' 1
"The Complete Poems of Paul Laurence Dunbar" 1
"Myths & Legends of the Celtic Race" 1
"Of Human Bondage" 1
"The French Revolution: A History" 1
"History of Tom Jones, a Foundling" 1
"The Odyssey: Rendered into English prose for the u..." 1
"The Jew of Malta" 1
"Through the Looking-Glass" 1
"Hunt the Hunter" 1
"The Odyssey" 1
'"1812" Napoleon I in Russia' 1
"Eve's Diary, Complete" 1
"The Love of Monsieur" 1
"Bahnwrter Thiel" 1
"On the Nature of Things" 1
"Anne's House of Dreams" 1
"The Portrait of a Lady Volume 1" 1
... ...
Value Count
"Text" 1004
"Dataset" 1
"StillImage" 1
Key Type Example Value Comment
"flesch reading ease" float 70.13 The 'Flesch Reading Ease' uses the sentence length (number of words per sentence) and the number of syllables per word in an equation to calculate the reading ease. Texts with a very high Flesch Reading Ease score (about 100) are very easy to read: they have short sentences and no words of more than two syllables.
"automated readability index" float 10.7 The Automated Readability Index is a number indicating the understandability of the text. This number is an approximate US Grade Level needed to comprehend the text, calculated using the characters per word and words per sentences.
"coleman liau index" float 10.73 The Coleman Liau Index is a number indicating the understandability of the text. This number is an approximate US Grade Level needed to comprehend the text, calculated using characters instead of syllables, similar to the Automated Readability Index.
"gunning fog" float 9.2 The Gunning Fog Index measures the readability of English writing. The index estimates the years of formal education needed to understand the text on a first reading. The formula is calculated using the ratio of words to sentences and the percentage of words that are complex (i.e. have three or more syllables).
"linsear write formula" float 13.5 Linsear Write is a readability metric for English text, purportedly developed for the United States Air Force to help them calculate the readability of their technical manuals. It was designed to calculate the United States grade level of a text sample based on sentence length and the number of words used that have three or more syllables.
"dale chall readability score" float 5.7 The Dale Chall Readability Score provides a numeric gauge of the comprehension difficulty that readers come upon when reading a text. It uses a list of 3000 words that groups of fourth-grade American students could reliably understand, considering any word not on that list to be difficult. This number is an approximate US Grade Level needed to comprehend the text.
"flesch kincaid grade" float 7.9 The "Flesch-Kincaid Grade Level Formula" presents a score as a U.S. grade level, making it easier to understand. It uses a similar formula to the Flesch Reading Ease measure.
"smog index" float 3.1 The SMOG grade is a measure of readability that estimates the years of education needed to understand a piece of writing. SMOG is the acronym derived from "Simple Measure of Gobbledygook". Its formula is based on the number of polysyllables (words with three or more syllables) and the number of sentences.
"difficult words" int 9032 The number of words in the text that are considered "difficult"; that is, they are not on a list of 3000 words that are considered understandable by fourth-grade American students.
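The two Flesch measures described above combine words per sentence with syllables per word. The sketch below applies the commonly published Flesch formulas to the example counts from the "statistics" table. This is an assumption about how such scores can be derived, not a statement of how the dataset's values were computed, and the function names are illustrative.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Commonly published Flesch Reading Ease formula (higher means easier)."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(words, sentences, syllables):
    """Commonly published Flesch-Kincaid formula (approximate US grade level)."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Example counts copied from the "statistics" table
stats = {"words": 121533, "sentences": 6511, "syllables": 170648.1}
ease = flesch_reading_ease(stats["words"], stats["sentences"], stats["syllables"])
grade = flesch_kincaid_grade(stats["words"], stats["sentences"], stats["syllables"])
print(round(ease, 2), round(grade, 2))
```

The results land near the stored example scores without matching them exactly, which suggests the dataset's tooling counts or rounds its inputs somewhat differently.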
Key Type Example Value Comment
"month name" str "June"
"full" str "June, 1998"
"year" int 1998 The year when the book was published according to Project Gutenberg. Keep in mind that this may not be the original publication date of the work, just that particular edition of the work. Notice that missing values have been coded as "0".
"day" int 1 The day of the month when the book was published. Notice that missing values have been coded as "0".
"month" int 6 The month of the year when the book was published; 1 corresponds to January, 2 to February, etc. Notice that missing values have been coded as "0".
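Since missing years, months, and days are all coded as 0, a record cannot always be turned into a calendar date. The sketch below shows one possible way to convert a "publication" dictionary while treating 0 as missing; the helper name publication_date is hypothetical, and the sample values are the ones shown in the table above.

```python
import datetime


def publication_date(publication):
    """Return a datetime.date, or None when any part is missing or invalid."""
    year = publication["year"]
    month = publication["month"]
    day = publication["day"]
    # The dataset codes missing values as 0
    if year == 0 or month == 0 or day == 0:
        return None
    try:
        return datetime.date(year, month, day)
    except ValueError:
        # Values that do not form a valid calendar date are treated as missing
        return None


# Example values from the table above
example = {"year": 1998, "month": 6, "day": 1, "month name": "June", "full": "June, 1998"}
print(publication_date(example))
```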
Value Count
"March" 172
"February" 146
"January" 109
"May" 95
"April" 77
"July" 69
"August" 68
"June" 64
"October" 58
"December" 50
"September" 50
"November" 48
Value Count
"March 8, 2016" 9
"March, 1998" 9
"February 21, 2016" 8
"February, 2003" 7
"February 29, 2016" 7
"March 5, 2016" 7
"March 14, 2016" 6
"March 17, 2016" 6
"February, 1997" 6
"March 1, 2016" 6
"May, 2003" 6
"March 16, 2016" 6
"May, 2004" 6
"February 18, 2016" 6
"March 13, 2016" 6
"July, 2005" 6
"February 28, 2016" 6
"March 6, 2016" 5
"February 27, 2016" 5
"February 26, 2016" 5
"July, 2004" 5
"February 24, 2016" 5
"March 9, 2016" 5
"May 19, 2008" 5
"May, 2005" 5
"February 17, 2016" 5
"January 16, 2006" 5
"August 20, 2006" 5
"July 1, 2008" 5
"June 9, 2008" 5
"April, 2002" 4
"January 1, 1623" 4
"March 3, 2016" 4
"March, 1997" 4
"January, 2006" 4
"January, 2005" 4
"August, 2003" 4
"November, 1997" 4
"July, 2003" 4
"February, 2005" 4
"October, 2004" 4
"February 11, 2006" 4
"February, 2001" 4
"April 27, 2006" 4
"March 11, 2016" 4
"February 22, 2016" 4
"March 7, 2006" 4
"May, 1996" 4
"July, 1996" 4
"November, 1999" 4
"April, 1999" 4
"March 2, 2016" 4
"June 17, 2008" 3
"September, 2001" 3
"June, 2002" 3
"June, 2000" 3
"April, 2004" 3
"April, 2005" 3
"March 12, 2016" 3
"February 25, 2006" 3
"January, 1994" 3
"March, 1999" 3
"June, 1996" 3
"August 18, 2006" 3
"June 23, 2008" 3
"February, 1999" 3
"February, 1995" 3
"November, 1996" 3
"March 11, 2006" 3
"October, 2000" 3
"October, 2001" 3
"December, 1999" 3
"February 8, 2005" 3
"February, 2004" 3
"January 10, 2006" 3
"March 15, 2016" 3
"April 6, 2009" 3
"March 4, 2016" 3
"May, 1999" 3
"July, 1998" 3
"April, 1996" 3
"January 9, 2006" 3
"March 2, 2011" 3
"January, 1997" 3
"August, 1999" 3
"February 23, 2016" 3
"January 12, 2006" 3
"September, 2005" 2
"September, 2004" 2
"June, 2005" 2
"June, 2001" 2
"April, 2001" 2
"June 18, 2004" 2
"January, 2002" 2
"October, 1997" 2
"October, 1992" 2
"January, 1995" 2
"January, 1996" 2
"January, 1999" 2
"January, 1998" 2
... ...

Downloads

Download all of the following files.

Usage

This library has one function you can use.
import classics
# Load every book in the dataset
list_of_books = classics.get_books()
The function can also return a smaller sample of the dataset when given an extra argument. Working with the sample may be much faster. When you are sure your code is correct, you can remove the argument to use the full dataset.
import classics
# Load only a sample of the books; this should be faster than loading everything
list_of_books = classics.get_books(test=True)

Documentation

 classics.get_books(test=False)

Returns the books in the dataset. If test is True, a smaller sample is returned instead of the full dataset.
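A minimal sketch of working with the returned list, assuming each book is a dictionary nested as in the Explore Structure tables ("metadata", "bibliography", and "metrics" at the top level). Because the classics module may not be installed, the sketch uses a hand-built list in place of classics.get_books(). The first record copies example values from this page; the second is a hypothetical placeholder, and the helper names are illustrative as well.

```python
# Stand-in for classics.get_books(); only a few fields are included
books = [
    {
        "metadata": {"id": 1342, "rank": 1, "downloads": 36576},
        "bibliography": {
            "title": "Pride and Prejudice",
            "author": {"name": "Austen, Jane", "birth": 1775, "death": 1817},
            "languages": ["en"],
        },
    },
    {
        # Hypothetical placeholder record, not real data
        "metadata": {"id": 0, "rank": 2, "downloads": 100},
        "bibliography": {
            "title": "Placeholder Title",
            "author": {"name": "Unknown", "birth": 0, "death": 0},
            "languages": ["en"],
        },
    },
]


def titles_by_author(book_list, author_name):
    """Return the titles of all books whose author name matches exactly."""
    return [
        book["bibliography"]["title"]
        for book in book_list
        if book["bibliography"]["author"]["name"] == author_name
    ]


def most_downloaded(book_list, count):
    """Return the titles of the `count` books with the most downloads."""
    ranked = sorted(
        book_list, key=lambda book: book["metadata"]["downloads"], reverse=True
    )
    return [book["bibliography"]["title"] for book in ranked[:count]]


print(titles_by_author(books, "Austen, Jane"))
print(most_downloaded(books, 1))
```

With the real library installed, replacing the hand-built list with the result of classics.get_books(test=True) should let the same helpers run on the sampled data, provided the nesting assumption holds.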