The Rice Datathon 2019 attracted experienced and novice data scientists for a fast-paced, 18-hour data mining competition on Jan. 25-26.
More than 150 Rice University students and alumni participated in the event organized by the Rice DataSci Club, with support from the Rice Center for Transforming Data to Knowledge (D2K Lab). Shell, a D2K Lab Affiliate, was the sponsor. The Liu Idea Lab was a perfect venue for the Datathon with plenty of space for students to work independently or in teams.
The organizers were Abhijeet Mulgund, senior in computer science (CS); Emily Wang, sophomore in statistics (STAT); and Lynn Zhu, senior in STAT.
“I thought it was a lot of fun and well organized,” said Santiago Tellez, a STAT senior. “I would love to see it grow and have more people involved. I liked the teaching sessions to help us in our analysis.”
The first-place winners were Benito Geordie, freshman, engineering; Gerald Wang, sophomore, engineering; David Torres Ramos, sophomore, engineering; and Yong Shin, sophomore, CS, for “Twitter Sentiment Analysis.” Their goal was to build a sentiment-analysis tool to predict the polarity of a tweet based on a collection of tweets and their respective polarities. The group analyzed 1.6 million tweets to develop an algorithm that could predict the positivity of a future tweet.
The winners developed a sort of pseudo-machine-learning algorithm and fed it large amounts of data.
“Everything was made from scratch and we didn't use any libraries,” Torres Ramos said. The team figured out that by determining the most common positive and negative words they could cluster tweet polarities to determine the community’s disposition on current topics. The tool could be used with other machine-learning algorithms to aggregate different news stories and trends, as well as to help people develop tweets that can elicit certain responses.
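The word-frequency idea described above can be illustrated with a minimal sketch. This is not the team's code, which is not reproduced in this article; the function names, the 1/0 polarity labels and the scoring rule are all assumptions chosen for illustration. It counts how often each word appears in positive and negative tweets, then scores a new tweet by how strongly its words lean one way or the other, using plain Python with no libraries.

```python
# Illustrative sketch only: names, labels and scoring rule are assumptions,
# not the winning team's implementation.

PUNCTUATION = ".,!?;:\"'()"

def tokenize(text):
    # Lowercase, split on whitespace and strip surrounding punctuation.
    words = [w.strip(PUNCTUATION) for w in text.lower().split()]
    return [w for w in words if w]

def train(labeled_tweets):
    # labeled_tweets: list of (text, polarity) pairs, where polarity is
    # 1 for positive and 0 for negative (an assumed labeling scheme).
    pos_counts, neg_counts = {}, {}
    for text, polarity in labeled_tweets:
        counts = pos_counts if polarity == 1 else neg_counts
        for word in tokenize(text):
            counts[word] = counts.get(word, 0) + 1
    return pos_counts, neg_counts

def score(text, pos_counts, neg_counts):
    # Each known word contributes a value between -1 and 1 depending on
    # how often it appeared in positive versus negative tweets.
    total = 0.0
    for word in tokenize(text):
        p = pos_counts.get(word, 0)
        n = neg_counts.get(word, 0)
        if p + n > 0:
            total += (p - n) / (p + n)
    return total

def predict(text, pos_counts, neg_counts):
    # Ties and tweets made up of unseen words default to negative (0).
    return 1 if score(text, pos_counts, neg_counts) > 0 else 0

training_data = [
    ("I love this sunny day", 1),
    ("What a great happy morning", 1),
    ("I hate this awful traffic", 0),
    ("Such a sad terrible day", 0),
]
pos_counts, neg_counts = train(training_data)
print(predict("Love this great morning!", pos_counts, neg_counts))
print(predict("Awful, sad traffic", pos_counts, neg_counts))
```

Words that appear equally often in both classes, such as “this” in the toy data, contribute nothing to the score, so the prediction is driven by words that lean clearly positive or negative.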
“I would like to thank the D2K Lab and the DataSci Club for hosting the first Datathon. It was well coordinated and I was always able to get clarification on technical concepts,” Torres Ramos said.
The second-place winner was Santiago Tellez, senior in STAT, for “Craft Beer: Beer Name Sentiment Analysis.” He analyzed data on 2,400 craft beers and 500 breweries across the country.
“I noticed that craft breweries classify their beers into hundreds of vague types of beer,” Tellez said. “I used beer forums online to classify the beers into eight categories: ale, IPA, lager, stout, wheat, fruit, kolsch and other.”
The goal was to use data on alcohol percentage, bitterness, beer name and brewery location to predict each beer’s type. He used sentiment analysis to label each beer name with certain emotions -- anger, joy, surprise, trust and so forth. He then performed correspondence analysis on the sentiment matrix to derive features for the prediction model.
Third place went to Damon Shan, a Rice alumnus; Alvin Sheng, senior in STAT; and Ouyang Zhu, a first-year graduate student in STAT, for “RiceDatathon NBA.” Their project tested various models to predict NBA players’ salaries based on performance, age, height and other factors, using Kaggle data sets and data taken from the web.