
WordCount Exercise

# We will perform a word count using Spark on the text of A Christmas Carol

# If you are working with the notebook on your local computer instead of EC2, you can download the text from https://www.gutenberg.org/files/46/46-h/46-h.htm and save it as christmas.txt, the file name used below

# Look at the first 10 lines

!head -10 christmas.txt

# Create a textfile RDD

christmas = sc.textFile("christmas.txt")

# Simple actions on the RDD

print(christmas.count())

print(christmas.first())

print(christmas.take(10))

Question: Using Spark, how do you get the 20 most common lowercased words, excluding stopwords?
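One possible approach, offered as a minimal sketch rather than the definitive answer: split each line into lowercased words, drop stopwords, pair every word with a count of 1, sum the counts per word, and take the 20 largest. The names `STOPWORDS`, `tokenize`, and `top_words` are hypothetical helpers introduced here, and the short stopword list is only an illustrative assumption; a fuller list would give cleaner results.

```python
import re

# Illustrative stopword list (an assumption, not from the exercise);
# replace it with a fuller list for real use.
STOPWORDS = {
    "the", "and", "a", "an", "of", "to", "in", "it", "is", "was",
    "he", "his", "that", "i", "you", "with", "as", "for", "had",
    "but", "not", "at", "on", "be",
}

def tokenize(line):
    """Lowercase a line and return its words (letters, with inner straight apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", line.lower())

def top_words(lines_rdd, n=20):
    """Return the n most frequent non-stopword words as (word, count) pairs."""
    return (
        lines_rdd
        .flatMap(tokenize)                          # one record per word
        .filter(lambda w: w not in STOPWORDS)       # drop stopwords
        .map(lambda w: (w, 1))                      # pair each word with a count of 1
        .reduceByKey(lambda a, b: a + b)            # sum the counts per word
        .takeOrdered(n, key=lambda pair: -pair[1])  # n largest counts first
    )
```

In the notebook, calling `top_words(christmas)` on the RDD created above should return a list of `(word, count)` pairs. `takeOrdered` with a negated count avoids sorting the whole RDD when only the top few entries are needed.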
