A new, innovative course introducing Rice students to the world of data science
Introduction to Data Science (DSCI 101) plays an important role in Rice's undergraduate data science curriculum. It is part of the new streamlined prerequisites for the Data Science Minor, which make data science accessible to students from different backgrounds. The course's real-world data science examples and projects also help attract students with broad interests and show how data science can help address challenges in nearly every industry and discipline.
"We are excited about growing this important introductory course, attracting more diverse students to data science, and promoting more hands-on data science opportunities in the Rice Data Science Minor," said Genevera Allen, founder and former director of the D2K Lab.
In Spring 2022, more than 50% of the students in Introduction to Data Science came from the social sciences, humanities, and the arts; 40% of the students were women and 25% were from underrepresented minorities.
This new and innovative introductory data science course is aimed at freshmen and students with little to no background in programming and statistics. "The mission is to make data science education accessible to all Rice students, regardless of their background and their chosen area of expertise," said Su Chen, Assistant Teaching Professor in the D2K Lab, who started the class in Fall 2020 after joining Rice. "I believe all students should have the opportunity to learn how to reason sensibly based on data and make more informed decisions with the knowledge of data science."
In the first week of class, students run a Jupyter notebook to explore a simple case study. Making small changes to the code and personalizing the demo gives them a great sense of accomplishment.
Over the semester, students learn the fundamentals of data science and Python programming while working in teams to solve real data science challenges, design a data science pipeline, and derive and communicate valuable insights from data.
"This course aims to show students where the door is when they enter the world of data science," said Dr. Chen. "I try to flatten the learning curve in the beginning and really make sure the students get the fundamentals."
One big challenge for DSCI 101 is finding public data sets that are interesting and appropriate for intro-level students. To expose students to real-world data, Dr. Chen plans to develop a repository of DSCI 101 data sets for students to choose from during the course.
"I want to avoid most of the textbook toy data sets that are outdated and overused. In addition, it would be great to have data sets from many disciplines given the diverse backgrounds of students!"
Donate your data for educational use in DSCI 101: Introduction to Data Science
The D2K Lab is looking for real, relevant, and recent data sets to be used in this course for developing code demos, case studies, and for students to use as their team projects.
We are looking for data sets with the following attributes:
- Tabular data organized as data frames, data tables, or spreadsheets, preferably in CSV or Excel format.
- Samples of text corpora for NLP beginners.
- Little background knowledge required: with appropriate documentation and a codebook, students with a high school education should be able to understand the variables in the data set.
- Not too large in size (1 GB maximum).
- For tabular data, enough numerical variables (15 to 20 numerical columns, plus other categorical variables).
- Decent data quality: not too many missing values, minimal data errors and discrepancies, and no privacy or confidentiality concerns.
In addition, if you know of existing publicly available data sources that are appropriate for intro-level students, please let us know, along with any suggestions for potential data questions.
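If you would like a rough self-check before sharing a CSV file, the following Python sketch compares it against a few of the measurable criteria listed above: file size, number of numerical columns, and share of missing values. It is an illustration only, not an official D2K Lab tool. The function name `check_dataset`, the set of strings treated as missing, the 5% missing-value cutoff (the list says only "not too many"), and treating 15 numerical columns as a floor rather than a hard range are all our assumptions. It uses only the Python standard library and reads the whole file into memory, which is fine for modest files.

```python
import csv
import os

MAX_BYTES = 1024 ** 3        # "limit to 1 GB"
MIN_NUMERIC = 15             # assumption: lower end of the 15-20 range treated as a floor
MAX_MISSING_FRAC = 0.05      # assumption: "not too many missing values" read as <= 5% of cells
MISSING = {"", "NA", "N/A", "NaN", "nan", "null"}  # assumption: common missing-value markers


def is_number(value):
    """Return True if a cell parses as a float (e.g. "3", "-0.5", "1e3")."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_dataset(path):
    """Compare a CSV file against a few of the DSCI 101 data set criteria."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)   # first row holds the column names
        rows = list(reader)     # reads everything into memory; fine for modest files

    n_cells = len(rows) * len(header)
    missing = 0
    numeric_cols = 0
    for j in range(len(header)):
        # Short rows are padded with "" so their absent cells count as missing
        column = [row[j] if j < len(row) else "" for row in rows]
        present = [v for v in column if v.strip() not in MISSING]
        missing += len(column) - len(present)
        # A column counts as numerical if every non-missing cell parses as a number
        if present and all(is_number(v) for v in present):
            numeric_cols += 1

    missing_frac = missing / n_cells if n_cells else 0.0
    return {
        "size_ok": os.path.getsize(path) <= MAX_BYTES,
        "numeric_columns": numeric_cols,
        "numeric_ok": numeric_cols >= MIN_NUMERIC,
        "missing_fraction": missing_frac,
        "missing_ok": missing_frac <= MAX_MISSING_FRAC,
    }
```

For example, `check_dataset("donation.csv")` (a hypothetical file name) returns pass/fail flags together with the numerical-column count and the missing-value fraction. It cannot judge the other criteria, such as whether the variables are understandable to intro-level students or free of privacy concerns; those still need a human look.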