Parenting Books on Amazon.sg

What kinds of parenting books are sold on Amazon.sg?

I was curious about the kinds of parenting books sold on Amazon.sg. Searching for “parenting” on Amazon.sg returns a list of books like this:

What search results look like on Amazon.sg

Web-scraping search results from the Amazon.sg website

Using the search term “parenting” on Amazon.sg, I got 75 pages of search results covering about 1,400 books. I then scraped information on each book’s title. My scraping code can be accessed here.

Let’s take a look at the variables in the dataframe.

glimpse(data_all)

## Rows: 1,198
## Columns: 3
## $ id       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
## $ title    <chr> "Family Fun Night Conversation Starters Placemats: 375 Questi…
## $ pub_type <chr> "Novelty Book", "Paperback", "Hardcover", "Paperback", "Hardc…

From the first few rows of the data, we can see the titles and publication types. The book “How to Raise Kids Who Aren’t Assholes” seems interesting.

library(knitr)
data_all %>%  head(5) %>% kable(digits = 2)

id	title	pub_type
1	Family Fun Night Conversation Starters Placemats: 375 Questions That Celebrate Family and Create Lasting Memories	Novelty Book
2	Cribsheet: A Data-Driven Guide to Better, More Relaxed Parenting, from Birth to Preschool: 2	Paperback
3	Parenting: 14 Gospel Principles That Can Radically Change Your Family	Hardcover
4	The Danish Way of Parenting: What the Happiest People in the World Know About Raising Confident, Capable Kids	Paperback
5	How to Raise Kids Who Aren’t Assholes: Science-Based Strategies for Better Parenting–from Tots to Teens	Hardcover

Preprocessing the data

Let’s turn the dataframe into a corpus, then use the quanteda package to tokenize the titles and perform some pre-processing steps. I removed numbers and punctuation and filtered out stopwords (common words that are not useful for analysis, e.g., is, and, we). The code also adds two-word phrases (bigrams) alongside the single words. Finally, I converted the tokens object to a document-feature matrix (dfm).

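library(quanteda)

# Build a corpus with one document per book title
corp <- corpus(data_all,
               docid_field = "id",
               text_field = "title")

# Tokenize, lowercase, drop stopwords, then form unigrams and bigrams
tok <- 
    corp %>% 
    tokens(remove_numbers = TRUE,
           remove_punct = TRUE,
           remove_separators = TRUE,
           remove_symbols = TRUE,
           remove_url = TRUE,
           include_docvars = TRUE) %>% 
    tokens_tolower() %>% 
    tokens_select(pattern = stopwords("en"),
                  selection = "remove") %>% 
    tokens_ngrams(n = 1:2) 

# Convert the tokens to a document-feature matrix
dfm <-
    tok %>% dfm()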

Popular words used in the titles

Let’s see which words appear in the most titles. “Guide” tops the list, so many of the books seem to be guides for parents.

library(quanteda.textstats)
topfeatures(dfm, n = 20, scheme = "docfreq")

##     guide      kids parenting  children     child      baby      book   parents 
##       180       149       135       130       129        80        80        77 
##   raising   edition      life      help       new      love    family     birth 
##        72        68        67        65        51        50        49        44 
## pregnancy     happy   child's   healthy 
##        41        40        38        37
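The "docfreq" scheme counts how many titles contain a word, not how many times the word is used overall. Here is a minimal sketch of the difference on two made-up titles (the titles and the `toy_` object names are illustrative and not part of the scraped data):

```r
library(quanteda)

# Two made-up titles; "family" appears twice in the first one
toy_titles <- c(t1 = "Family fun for the whole family",
                t2 = "A guide to raising kids")

# Same basic steps as above: tokenize, lowercase, drop stopwords
toy_tok <- tokens(toy_titles, remove_punct = TRUE)
toy_tok <- tokens_tolower(toy_tok)
toy_tok <- tokens_remove(toy_tok, stopwords("en"))
toy_dfm <- dfm(toy_tok)

# Total number of occurrences across all titles
count_family <- topfeatures(toy_dfm, n = nfeat(toy_dfm),
                            scheme = "count")[["family"]]

# Number of titles that contain the word at least once
docfreq_family <- topfeatures(toy_dfm, n = nfeat(toy_dfm),
                              scheme = "docfreq")[["family"]]

c(count = count_family, docfreq = docfreq_family)
```

A title that repeats a word adds to the word's total count but only once to its document frequency.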

Let’s plot how often each word occurs in the titles. First, let’s convert the dfm to a tidy format: a table with one row per token per document.

In a tidy format, each word in each document gets its own row.

document	term	count
1	family	2
1	fun	1
1	night	1
1	conversation	1
1	starters	1
1	placemats	1
1	questions	1
1	celebrate	1
1	create	1
1	lasting	1
1	memories	1
1	family_fun	1
1	fun_night	1
1	night_conversation	1
1	conversation_starters	1
1	starters_placemats	1
1	placemats_questions	1
1	questions_celebrate	1
1	celebrate_family	1
1	family_create	1
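The post does not show how `data_tidy` was built. One possible way, assuming the tidytext package is installed, is its `tidy()` method for quanteda dfm objects, which returns the same `document`, `term`, and `count` columns. A minimal sketch on a made-up title:

```r
library(quanteda)
library(tidytext)

# One made-up title for illustration
toy_titles <- c(doc1 = "Family fun night for the family")

toy_tok <- tokens(toy_titles, remove_punct = TRUE)
toy_tok <- tokens_tolower(toy_tok)
toy_tok <- tokens_remove(toy_tok, stopwords("en"))
toy_dfm <- dfm(toy_tok)

# One row per (document, term) pair, with the number of occurrences
toy_tidy <- tidy(toy_dfm)
toy_tidy
```

Applied to the full dfm, the equivalent call would be something like `data_tidy <- tidy(dfm)`.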

Plotting frequency of word occurrences in book titles

word_freq <-
    data_tidy %>% 
    group_by(term) %>% 
    summarize(word_n = sum(count)) %>% 
    ungroup()

# nus_blue, tar_blue, and font are assumed to be defined earlier in the script;
# geom_textsegment() comes from the geomtextpath package
word_freq %>% 
    filter(!term %in% c("parent", "parents", "parenting")) %>%  # drop the search term itself
    slice_max(word_n, n = 30) %>% 
    ggplot(aes(x = word_n, y = reorder(term, word_n))) +
    geom_textsegment(aes(yend = term, xend = 0, label = term),
                     alpha = .7,
                     size = 5,
                     linewidth = 1.5,
                     linecolor = "black",
                     textcolor = nus_blue, fontface = 7, family = font
    ) +
    geom_point(size = 3, color = tar_blue, alpha = 1) +
    labs(title = "Parenting books on Amazon.sg",
         subtitle = "Most used words in book titles",
         x = "# times word appeared",
         y = "Word",
         caption = "GERARDCHUNG.COM"
    ) + 
    theme(#legend.position = "none",
        plot.title = element_text(size=22, face="bold"),
        plot.subtitle = element_text(size=18),
        axis.title.x = element_text(size = 12),
        axis.title.y = element_text(size = 12),
        axis.text.y = element_blank(),
       # axis.text.y = element_text(size=12),
        axis.text.x = element_text(size=15),
       axis.ticks.y = element_blank()
    ) + 
    scale_x_continuous(limits = c(0, 190), breaks = seq(0, 200, by = 20), expand = c(0, 0))

The most frequent words point to several kinds of books. There are guides to parenting, and books about children of different ages, from infants and toddlers to older children and teens. Some books are probably aimed at new parents and others at expectant parents. There are books on improving newborns’ sleep, as well as activity books. Some titles are revised versions or new editions of earlier books.

More books for mothers?

Would more book titles mention “mothers” than “fathers”?

str_view_all(dadmum1$title, regex(pattern = "dad|mum|mother|father|dads|mums|mothers|fathers", ignore_case = TRUE), match = TRUE)

There are twice as many books for mothers as for fathers. Still, most books are written for parents in general and do not refer to either mothers or fathers.
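The comparison can be turned into counts with `str_detect()` from stringr. The sketch below uses made-up titles and simplified patterns, so it only illustrates the approach and does not reproduce the figures above:

```r
library(stringr)

# Made-up titles for illustration only
toy_titles <- c(
  "The New Dad's Survival Guide",
  "What Every Mother Should Know",
  "Mum's Pregnancy Journal",
  "Raising Happy Kids"
)

# "mum" and "mother" also match the plurals "mums" and "mothers";
# the same holds for "dad" and "father"
mum_titles <- str_detect(toy_titles, regex("mum|mother", ignore_case = TRUE))
dad_titles <- str_detect(toy_titles, regex("dad|father", ignore_case = TRUE))

counts <- c(
  mums    = sum(mum_titles),
  dads    = sum(dad_titles),
  neither = sum(!mum_titles & !dad_titles)
)
counts
```

These patterns match substrings, so a word like “maximum” would also count as a “mum” title. Adding word boundaries (`\\b`) would be a stricter alternative.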

Would word usage in titles be different across books for mums and for dads?

What Topics Were These Parenting Books On Amazon.sg About?

To identify topics, I ran structural topic modeling (STM) on the titles. STM builds on the same family of probabilistic topic models as Latent Dirichlet Allocation. Its basic premise is to model each document as a distribution over topics (topic prevalence) and each topic as a distribution over words. In other words, it estimates which latent topics could have generated the words used in the titles. STM provides a quick way to “qualitatively” analyze a large set of textual data. If you want to read more on analyzing open-ended survey responses using STM, read my paper here.
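To make the “documents as distributions of topics, topics as distributions of words” idea concrete, here is a toy numerical sketch in base R. All numbers and names are hypothetical, and the sketch only illustrates the mixture idea; it is not how STM estimates these quantities.

```r
# Hypothetical example: 2 topics over a 4-word vocabulary
vocab <- c("sleep", "baby", "teen", "guide")

# beta: each row is a topic, i.e. a probability distribution over words
beta <- rbind(
  topic_sleep = c(0.5, 0.4, 0.0, 0.1),
  topic_teens = c(0.0, 0.1, 0.6, 0.3)
)
colnames(beta) <- vocab

# theta: each row is a title, i.e. a probability distribution over topics
theta <- rbind(
  title_a = c(0.9, 0.1),
  title_b = c(0.2, 0.8)
)

# Expected word distribution of each title:
# a topic-weighted mix of the rows of beta
word_probs <- theta %*% beta
round(word_probs, 2)
```

A title that is mostly about the sleep topic (`title_a`) ends up with most of its probability on “sleep” and “baby”. Topic modeling works in the opposite direction: it observes only the words and infers plausible values for `theta` and `beta`.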

Identify the possible number of topics using exclusivity and semantic coherence

The first challenge is to choose the number of latent topics (k) that generated the distribution of words and topics. Based on a plot of semantic coherence against exclusivity, k = 26 seems to be the best choice: it offers the best combination of semantic coherence and exclusivity.

Twenty-six Topics and their highly associated words

With a model of 26 topics best fitting the data, let’s take a look at the words most highly associated with each topic (terms are shown in stemmed form, e.g., “babi” for “baby”). We see books on guiding parents to love their kids and teens (Topic 1), on motherhood and pregnancy (Topics 3 and 24), on helping parents develop rich relationships with their kids (Topic 25), on babies’ language development and sleep (Topics 9 and 6), on journaling through motherhood and pregnancy (Topic 11), and on what to expect in the first year of parenting (Topic 5). There are also activity books for kids (Topic 14), dad-joke books for parents (Topic 18), books related to children with autism (Topic 10), and books on home-schooling (Topic 13).

Topic	Expected topic proportion	Top 6 terms
Topic 2	0.109	famili, babi, mom, practic, blue, calendar
Topic 1	0.082	life, love, teen, kid, boy, guid
Topic 25	0.073	children, rais, guid, kid, rich, relationship
Topic 3	0.068	guid, pregnanc, edit, birth, first, revis
Topic 7	0.063	child, kid, rais, learn, children, empow
Topic 20	0.060	help, kid, child, children, guid, mind
Topic 19	0.058	talk, child, children, edit, littl, way
Topic 9	0.052	babi, languag, help, child, guid, children
Topic 24	0.041	mother, guid, mom, child, day, motherhood
Topic 16	0.032	kid, children, child, build, power, thrive
Topic 11	0.030	journal, pregnanc, babi, kid, keepsak, memori
Topic 23	0.028	babi, know, year, need, everi, first
Topic 18	0.027	book, girl, dad, joke, kid, bodi
Topic 4	0.026	read, novel, six, help, french, child
Topic 13	0.026	child, home, guid, read, age, teach
Topic 5	0.026	year, expect, first_year, famili, first, effect
Topic 12	0.025	child, rais, children, shape, kid, complet
Topic 22	0.025	littl, rais, book, guid, littl_book, children
Topic 14	0.025	activ, workbook, child, fun, kid, exercis
Topic 17	0.024	kid, guid, way, overcom, go, kid_guid
Topic 15	0.023	rais, success, kid, child, children, world
Topic 21	0.019	children, adult, rais, happi, child, kind
Topic 10	0.018	autism, mindset, think, help, kid, skill
Topic 6	0.017	sleep, babi, babi_sleep, night, solut, help
Topic 8	0.017	magic, old, babi, wean, year_old, year
Topic 26	0.005	children, guid, feel, way, emot, kid

Conclusion

It was interesting to see what topics of parenting books are offered on Amazon.sg. They range from practical books, such as journals and joke books, to guide books and books on parenting skills and knowledge. There are still more books for mums than for dads. It would be interesting to look at how these topics change with publication date. Would the number of books for dads change over time? Would certain topics (e.g., books on brain development) change over time?

About me

Go to my personal website, gerardchung.com, to check out my other ongoing work.