I was curious to know the kinds of parenting books sold on Amazon.sg.
Searching for “parenting” on Amazon.sg returns a list of books such as this.
Using the search word “parenting” on Amazon.sg, I got 75 pages of search results covering about 1,400 books. I then scraped information on the book titles. My code for scraping can be accessed here.
Let’s take a look at the variables in the dataframe.
glimpse(data_all)
## Rows: 1,198
## Columns: 3
## $ id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
## $ title <chr> "Family Fun Night Conversation Starters Placemats: 375 Questi…
## $ pub_type <chr> "Novelty Book", "Paperback", "Hardcover", "Paperback", "Hardc…
From the first few rows of the data, we can see the titles and the
type of publication. The book “How to Raise Kids Who Aren’t
Assholes” seems interesting.
library(knitr)
data_all %>% head(5) %>% kable(digits = 2)
id | title | pub_type |
---|---|---|
1 | Family Fun Night Conversation Starters Placemats: 375 Questions That Celebrate Family and Create Lasting Memories | Novelty Book |
2 | Cribsheet: A Data-Driven Guide to Better, More Relaxed Parenting, from Birth to Preschool: 2 | Paperback |
3 | Parenting: 14 Gospel Principles That Can Radically Change Your Family | Hardcover |
4 | The Danish Way of Parenting: What the Happiest People in the World Know About Raising Confident, Capable Kids | Paperback |
5 | How to Raise Kids Who Aren’t Assholes: Science-Based Strategies for Better Parenting–from Tots to Teens | Hardcover |
Let’s turn the dataframe into a corpus, then use the quanteda package to tokenize the titles and perform some pre-processing. I removed numbers and punctuation, and filtered out stopwords (words such as is, and, and we, which are not useful for analysis). I then built unigrams and bigrams and converted the tokens object to a document-feature matrix (dfm).
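library(quanteda)

# Build a corpus from the titles, using id as the document name
corp <- corpus(data_all,
               docid_field = "id",
               text_field = "title")

# Tokenize, lowercase, drop English stopwords, then form unigrams and bigrams
tok <-
  corp %>%
  tokens(remove_numbers = TRUE,
         remove_punct = TRUE,
         remove_separators = TRUE,
         remove_symbols = TRUE,
         remove_url = TRUE,
         include_docvars = TRUE) %>%
  tokens_tolower() %>%
  tokens_select(pattern = c(stopwords("en")),
                selection = "remove") %>%
  tokens_ngrams(n = 1:2)

# Convert the tokens object to a document-feature matrix
dfm <-
  tok %>% dfm()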
Let’s see which words appear in the most documents. Most books seem to be guides for parents.
library(quanteda.textstats)
topfeatures(dfm, n = 20, scheme = "docfreq")
## guide kids parenting children child baby book parents
## 180 149 135 130 129 80 80 77
## raising edition life help new love family birth
## 72 68 67 65 51 50 49 44
## pregnancy happy child's healthy
## 41 40 38 37
Let’s plot how often each word occurs in the titles. First, convert the dfm to a tidy format: a table with one row per token per document.
document | term | count |
---|---|---|
1 | family | 2 |
1 | fun | 1 |
1 | night | 1 |
1 | conversation | 1 |
1 | starters | 1 |
1 | placemats | 1 |
1 | questions | 1 |
1 | celebrate | 1 |
1 | create | 1 |
1 | lasting | 1 |
1 | memories | 1 |
1 | family_fun | 1 |
1 | fun_night | 1 |
1 | night_conversation | 1 |
1 | conversation_starters | 1 |
1 | starters_placemats | 1 |
1 | placemats_questions | 1 |
1 | questions_celebrate | 1 |
1 | celebrate_family | 1 |
1 | family_create | 1 |
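The conversion step itself is not shown in the post. One way to get a table with these three columns is `tidy()` from the tidytext package; whether that is what was used here is an assumption. The sketch below runs on a one-title toy dfm (`toy_dfm` and `toy_tidy` are illustrative names).

```r
library(quanteda)
library(tidytext)

# A one-document toy dfm; the document is named "1" to mirror the table above
toy_dfm <- dfm(tokens(c("1" = "Family fun night celebrate family")))

# tidy() returns one row per document-term pair
toy_tidy <- tidy(toy_dfm)
toy_tidy

# With the objects from this post, the equivalent call would be:
# data_tidy <- tidy(dfm)
```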
library(dplyr)
library(ggplot2)
library(geomtextpath)  # provides geom_textsegment()

# Total occurrences of each term across all titles
word_freq <-
  data_tidy %>%
  group_by(term) %>%
  summarize(word_n = sum(count)) %>%
  ungroup()

# Plot the 30 most frequent terms, excluding the search word itself.
# nus_blue, tar_blue, and font are assumed to be defined earlier in the script.
word_freq %>%
  filter(term != "parent" & term != "parents" & term != "parenting") %>%
  slice_max(word_n, n = 30) %>%
  ggplot(aes(x = word_n, y = reorder(term, word_n))) +
  geom_textsegment(aes(yend = term, xend = 0, label = term),
                   alpha = .7,
                   size = 5,
                   linewidth = 1.5,
                   linecolor = "black",
                   textcolor = nus_blue, fontface = 7, family = font
  ) +
  geom_point(size = 3, color = tar_blue, alpha = 1) +
  labs(title = "Parenting books on Amazon.sg",
       subtitle = "Most used words in book titles",
       x = "# times word appeared",
       y = "Word",
       caption = "GERARDCHUNG.COM"
  ) +
  theme(#legend.position = "none",
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 18),
        axis.title.x = element_text(size = 12),
        axis.title.y = element_text(size = 12),
        axis.text.y = element_blank(),
        # axis.text.y = element_text(size = 12),
        axis.text.x = element_text(size = 15),
        axis.ticks.y = element_blank()
  ) +
  scale_x_continuous(limits = c(0, 190), breaks = seq(0, 200, by = 20), expand = c(0, 0))
The most frequent words point to several kinds of books: guides to
parenting; books about children at different stages, from infants and
toddlers to teens; books probably aimed at new parents; books for
expectant parents; books on improving newborns’ sleep; revised books
and new editions; and activity books.
Would there be more book titles mentioning “mothers” than “fathers”?
str_view_all(dadmum1$title, regex(pattern = "dad|mum|mother|father|dads|mums|mothers|fathers", ignore_case = TRUE), match = TRUE)
There are twice as many books for mothers as for fathers. Still, most
books were written for parents in general and make no reference to
either mothers or fathers.
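The counting behind this comparison is not shown in the post. A minimal sketch of one way to do it follows. The helper `count_parent_mentions()` is hypothetical, it reuses the same search terms as the pattern above (so “mom” is not covered), and the example titles are made up for illustration rather than taken from the scraped data.

```r
# Hypothetical helper: count titles that mention mothers, fathers, or neither
count_parent_mentions <- function(titles) {
  mentions_mum <- grepl("mum|mother", titles, ignore.case = TRUE)
  mentions_dad <- grepl("dad|father", titles, ignore.case = TRUE)
  c(mothers = sum(mentions_mum),
    fathers = sum(mentions_dad),
    neither = sum(!mentions_mum & !mentions_dad))
}

# Made-up titles, for illustration only (not from the scraped data)
example_titles <- c("A Guide for New Mums",
                    "Motherhood Journal",
                    "Jokes for Dads",
                    "Raising Happy Kids")

mention_counts <- count_parent_mentions(example_titles)
mention_counts
```

A title that mentions both mothers and fathers is counted in both groups, so the three numbers need not sum to the number of titles.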
Would word usage in titles differ between
books for mums and books for dads?
To identify topics, I ran structural topic modeling (STM) on the titles. STM is based on the Latent Dirichlet Allocation algorithm, and its basic premise is to model documents as distributions over topics (topic prevalence) and topics as distributions over words. In other words, it is a quantitative way to look at which latent topics generated the words used in the titles. STM provides a quick method to “qualitatively” analyze a large set of textual data. If you want to read more on analyzing open-ended survey responses using STM, read my paper here.
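The model-fitting code is not shown in the post. As a rough sketch, a quanteda dfm can be converted to the input format of the stm package and passed to `stm()`. The helper name `fit_title_stm()`, the seed, and the choice of spectral initialization are assumptions, not the settings used for this analysis.

```r
library(quanteda)
library(stm)

# Hypothetical helper: fit a structural topic model with k topics to a dfm
fit_title_stm <- function(title_dfm, k, seed = 2023) {
  # convert() returns a list with documents, vocab, and meta
  stm_input <- convert(title_dfm, to = "stm")
  stm(documents = stm_input$documents,
      vocab     = stm_input$vocab,
      K         = k,
      init.type = "Spectral",  # assumed; deterministic initialization
      seed      = seed,
      verbose   = FALSE)
}

# With the objects from this post, the call might look like:
# model <- fit_title_stm(dfm, k = 26)
# model$theta              # document-by-topic proportions (topic prevalence)
# model$beta$logbeta[[1]]  # log word probabilities for each topic
```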
The first challenge is to choose the number of latent topics (k) that could have generated the observed words. Based on a plot of semantic coherence against exclusivity, the best choice seems to be k = 26 topics, which has the best combination of semantic coherence and exclusivity.
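One way to produce the numbers behind such a plot is `searchK()` from the stm package, which fits a model for each candidate k and reports diagnostics including semantic coherence and exclusivity. The sketch below is an assumption about the workflow, and `compare_k()` is a hypothetical helper.

```r
library(quanteda)
library(stm)

# Hypothetical helper: semantic coherence and exclusivity for candidate values of k
compare_k <- function(title_dfm, candidate_k) {
  stm_input <- convert(title_dfm, to = "stm")
  search <- searchK(stm_input$documents, stm_input$vocab,
                    K = candidate_k,
                    init.type = "Spectral",
                    verbose = FALSE)
  # unlist() guards against result columns that are stored as lists
  data.frame(k                  = unlist(search$results$K),
             semantic_coherence = unlist(search$results$semcoh),
             exclusivity        = unlist(search$results$exclus))
}

# With the objects from this post, the call might look like:
# k_results <- compare_k(dfm, candidate_k = seq(10, 40, by = 2))
# plot(k_results$semantic_coherence, k_results$exclusivity, type = "n",
#      xlab = "Semantic coherence", ylab = "Exclusivity")
# text(k_results$semantic_coherence, k_results$exclusivity, labels = k_results$k)
```

The candidate range in the commented usage is illustrative. Values of k toward the upper right of the plot score well on both measures.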
With a model of 26 topics best fitting the data, let’s take a look at the words most highly associated with each topic. We see books on guiding parents to love their kids and teens (Topic 1), on motherhood and pregnancy (Topics 3 and 24), on helping parents develop rich relationships with their kids (Topic 25), on babies’ language development and sleep (Topics 9 and 6), on journaling through motherhood and pregnancy (Topic 11), and on what to expect in the first year of parenting (Topic 5). There are also activity books for kids (Topic 14), dad-joke books for parents (Topic 18), books related to children with autism (Topic 10), and books on home-schooling (Topic 13).
Topic | Expected topic proportion | Top 6 terms |
---|---|---|
Topic 2 | 0.109 | famili, babi, mom, practic, blue, calendar |
Topic 1 | 0.082 | life, love, teen, kid, boy, guid |
Topic 25 | 0.073 | children, rais, guid, kid, rich, relationship |
Topic 3 | 0.068 | guid, pregnanc, edit, birth, first, revis |
Topic 7 | 0.063 | child, kid, rais, learn, children, empow |
Topic 20 | 0.060 | help, kid, child, children, guid, mind |
Topic 19 | 0.058 | talk, child, children, edit, littl, way |
Topic 9 | 0.052 | babi, languag, help, child, guid, children |
Topic 24 | 0.041 | mother, guid, mom, child, day, motherhood |
Topic 16 | 0.032 | kid, children, child, build, power, thrive |
Topic 11 | 0.030 | journal, pregnanc, babi, kid, keepsak, memori |
Topic 23 | 0.028 | babi, know, year, need, everi, first |
Topic 18 | 0.027 | book, girl, dad, joke, kid, bodi |
Topic 4 | 0.026 | read, novel, six, help, french, child |
Topic 13 | 0.026 | child, home, guid, read, age, teach |
Topic 5 | 0.026 | year, expect, first_year, famili, first, effect |
Topic 12 | 0.025 | child, rais, children, shape, kid, complet |
Topic 22 | 0.025 | littl, rais, book, guid, littl_book, children |
Topic 14 | 0.025 | activ, workbook, child, fun, kid, exercis |
Topic 17 | 0.024 | kid, guid, way, overcom, go, kid_guid |
Topic 15 | 0.023 | rais, success, kid, child, children, world |
Topic 21 | 0.019 | children, adult, rais, happi, child, kind |
Topic 10 | 0.018 | autism, mindset, think, help, kid, skill |
Topic 6 | 0.017 | sleep, babi, babi_sleep, night, solut, help |
Topic 8 | 0.017 | magic, old, babi, wean, year_old, year |
Topic 26 | 0.005 | children, guid, feel, way, emot, kid |
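A table like this can be assembled from a fitted stm model. In the sketch below, `summarize_topics()` is a hypothetical helper; it takes the expected topic proportion to be the column mean of the document-topic matrix and uses the highest-probability words as the top terms. Which word ranking was used for the table above is not stated, so that choice is an assumption.

```r
library(stm)

# Hypothetical helper: one row per topic, sorted by expected topic proportion
summarize_topics <- function(model, n_terms = 6) {
  # labelTopics()$prob is a K x n_terms matrix of highest-probability words
  top_terms <- labelTopics(model, n = n_terms)$prob
  out <- data.frame(
    topic      = paste("Topic", seq_len(nrow(top_terms))),
    proportion = round(colMeans(model$theta), 3),
    terms      = apply(top_terms, 1, paste, collapse = ", ")
  )
  out[order(out$proportion, decreasing = TRUE), ]
}

# With a fitted model (here called model, an assumed name):
# summarize_topics(model, n_terms = 6)
```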
It was interesting to see what kinds of parenting books are offered
on Amazon.sg. They range from practical books such as journals
to joke books and guide books, as well as books on parenting skills and
knowledge. There are still more books for mums than for dads. It would
be interesting to look at how these topics change with
publication date. Would the number of books for dads change
over time? Would certain topics (e.g., books on brain development)
change over time?
Go to my personal website gerardchung.com to check out my other
ongoing work.