
The data science Python libraries I used for the quiz game.

Aug 15


In the planning stages, I had to decide how to structure the dataframe I would create with the Pandas library.


A couple of helper dataframes were used to fill in the data in the main dataframe, which I called DF. Two others, called Years and Decades, held the year and decade information, which I filled in using a for loop.
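
As a rough illustration of that setup, here is a minimal sketch. The dataframe names DF, Years and Decades come from the project, but the column names, the sample rows and the decade label format are assumptions made up for this example, not the actual project code.

```python
import pandas as pd

# Hypothetical sample rows; the real DF held the lyrics gathered for the project.
DF = pd.DataFrame(
    {
        "Song": ["Song A", "Song B", "Song C"],
        "Year": [1989, 2008, 2020],
        "Lyrics": ["first lyrics", "second lyrics", "third lyrics"],
    }
)

# Fill the year and decade information with a for loop.
years = []
decades = []
for year in DF["Year"]:
    years.append(int(year))
    decades.append(f"{(int(year) // 10) * 10}s")  # e.g. 1989 becomes "1980s"

# Helper dataframes (assumed layout: one column each).
Years = pd.DataFrame({"Year": years})
Decades = pd.DataFrame({"Decade": decades})

# Copy the decade labels into the main dataframe (rows line up by index).
DF["Decade"] = Decades["Decade"]
print(DF)
```

Keeping the decade as its own column makes it easy to group the lyrics by decade later on.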


This made it possible to create the wordcloud, which helped me both confirm that the concept was working and find any words that weren't supposed to be included, like "Solo", "Instrumental" and such.
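
To give an idea of how stray words like these can be filtered out before counting, here is a small sketch using only the standard library. The function name, the sample text and the short list of unwanted words are assumptions for illustration. The real list was longer and the real lyrics came from the dataframe.

```python
from collections import Counter
import re

# Markers mentioned above; this short set is only an example.
UNWANTED = {"solo", "instrumental"}

def word_frequencies(lyrics: str) -> Counter:
    """Count lowercase words, skipping the unwanted markers."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    return Counter(w for w in words if w not in UNWANTED)

# Made-up sample text, not real lyrics.
sample = "Solo love love story Instrumental story love"
freqs = word_frequencies(sample)
print(freqs.most_common(2))
```

A frequency table like this is the kind of input a word cloud is drawn from, so any leftover marker words show up immediately as oversized words in the image.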


That would produce this:

Note that all of the lyrics are in one field. Later, I will extend the dataframe with one column that holds the individual words of the lyrics and another column that holds just the context.

I defined the context as the 6 words before and after each separated word. Once that context exists, the sentiment analysis can be performed.
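
The context idea can be sketched in plain Python like this. The function name and the sample sentence are made up for the example, and I am assuming here that the context string includes the word itself along with up to 6 words on each side.

```python
def context_windows(lyrics: str, size: int = 6) -> list[tuple[str, str]]:
    """Pair every word with the words around it (up to `size` on each side)."""
    words = lyrics.split()
    rows = []
    for i, word in enumerate(words):
        start = max(0, i - size)  # do not run past the start of the lyrics
        context = " ".join(words[start : i + size + 1])
        rows.append((word, context))
    return rows

# Made-up sample text so the window is easy to follow.
sample = (
    "one two three four five six seven eight "
    "nine ten eleven twelve thirteen fourteen"
)
rows = context_windows(sample)
print(rows[7])  # the word "eight" with its surrounding context
```

Words near the start or end of the lyrics simply get a shorter context, since there are fewer than 6 neighbours on one side.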


After creating the dataframe depicted in the above screenshot, I used the NumPy library to create statistics for the number of words in the lyrics per decade. In this project, it might not be necessary, but if this were a project for a business analysing customer feedback, or any other written data, parsing each whole sentence into individual words to be ranked would be very important.
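
Here is a minimal sketch of what per-decade word statistics with NumPy can look like. The sample rows and the chosen statistics (total, mean, max) are assumptions for illustration. The post does not list exactly which statistics were calculated.

```python
import numpy as np

# Hypothetical (decade, lyrics) pairs standing in for the real dataframe rows.
songs = [
    ("2000s", "one two three four"),
    ("2000s", "one two"),
    ("2020s", "one two three four five six"),
]

# Collect the word count of every song under its decade.
counts_by_decade = {}
for decade, lyrics in songs:
    counts_by_decade.setdefault(decade, []).append(len(lyrics.split()))

# Summarise each decade with NumPy.
stats = {}
for decade, counts in counts_by_decade.items():
    arr = np.array(counts)
    stats[decade] = {
        "total": int(arr.sum()),
        "mean": float(arr.mean()),
        "max": int(arr.max()),
    }
print(stats)
```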



The above chart displays the most frequent words used in Taylor Swift's lyrics in the 2020s. A fan might find this interesting, but if these were the most frequent words used in feedback this week, last month, last year, or year over year, this could be very valuable information for a company wanting to measure customer feedback.


That is all for this entry. I still have to talk about accessing the genius.com API, performing the sentiment analysis, and comparing that analysis between Taylor Swift and Metallica lyrics to make a viable and challenging quiz game.


