Frequently Asked Questions

FAQs about Data Science Capstone:

What is the D2K capstone program?
  • In the D2K capstone program, D2K Affiliate members, community partners, or researchers partner with Rice University to sponsor teams of students that work on a semester-long data science project defined by the sponsor.
  • Teams of 4-6 students work with the sponsor’s data on a sponsor-defined data science problem.
  • Teams consist of advanced undergraduates, professional masters students, and beginning PhD students, all with previous advanced coursework in data science.
  • Teams are co-mentored by a sponsor mentor, D2K Fellows and Rice data science faculty.
  • Teams produce a data science research or analysis report and deliver all software and scripts used in the report in a reproducible research format.
  • With the Sponsored Research Agreement, any resulting Intellectual Property (IP) is owned by the sponsor.
  • Teams will exclusively use open-source software such as R and Python.
Who can sponsor a capstone project?
  • Companies that are D2K Affiliate members.
  • Community partners including government agencies and non-profit organizations (D2K Affiliate membership is complimentary).
  • Researchers from Rice or the Texas Medical Center.
What types of projects are appropriate?
  • Projects should be real-world and impactful: meaningful to the sponsor and motivating for the students.
  • Projects should be challenging, but with well-defined objectives. We recommend that the objectives be tiered in nature with some that are easily achievable within a semester and with others that are a stretch to motivate and challenge the students.
  • While project objectives can focus on any aspect of the data analysis pipeline, the most effective projects will encompass aspects of each of the four stages of data science:
    • Data Wrangling is the process of cleaning and unifying messy and complex data sets for easy access and data analysis.
    • Exploratory Data Analysis (or Data Mining) visually explores data and finds interesting features or patterns in large data sets.
    • Data Science Modeling uses statistical or machine learning models to address specific objectives.
    • Communication & Validation of Results is critical for confirming data-driven discoveries and assessing the accuracy of data science models.
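
As a rough illustration of how the four stages above fit together, here is a minimal Python sketch that uses only the standard library. Everything in it is an assumption made for illustration: the toy records, the function names (wrangle, explore, fit_line, validate), and the choice of a straight-line model are hypothetical and do not come from any D2K project. A real capstone project would use the sponsor's data and whatever models suit the sponsor's objectives.

```python
from statistics import mean

# Made-up toy records (x, y) stored as strings, with two deliberately messy rows.
RAW = [("1", "2.1"), ("2", "3.9"), ("3", ""), ("4", "8.2"),
       ("5", "9.8"), ("n/a", "1.0"), ("6", "12.1"), ("7", "13.9")]


def wrangle(rows):
    """Stage 1, Data Wrangling: keep only rows that parse as two numbers."""
    clean = []
    for x, y in rows:
        try:
            clean.append((float(x), float(y)))
        except ValueError:
            continue  # skip rows with missing or non-numeric values
    return clean


def explore(rows):
    """Stage 2, Exploratory Data Analysis: simple summary statistics."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return {"n": len(rows), "mean_x": mean(xs), "mean_y": mean(ys)}


def fit_line(rows):
    """Stage 3, Modeling: ordinary least squares fit of y = a + b * x."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return intercept, slope


def validate(model, rows):
    """Stage 4, Validation: mean absolute error on rows not used for fitting."""
    intercept, slope = model
    return mean(abs(y - (intercept + slope * x)) for x, y in rows)


clean = wrangle(RAW)
train, test = clean[:4], clean[4:]  # hold out the last rows for validation
summary = explore(clean)
model = fit_line(train)
mae = validate(model, test)

print(summary)
print(f"intercept={model[0]:.2f}, slope={model[1]:.2f}, held-out MAE={mae:.2f}")
```

The point of the sketch is the structure rather than the model: each stage is a separate, rerunnable step, which is also what makes it straightforward to deliver the work in a reproducible research format.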
How should the projects be organized?
  • The best projects are organized in such a way that student teams can get started on day one.
  • While student teams will need to learn discipline-specific terms and discipline-specific data complexities, the more these can be minimized or abstracted away, the more time student teams will be able to spend on the data science aspects.
  • In cases where data wrangling is a significant portion of the project, it is helpful to provide students with a subset of “cleaned” data that they can use as a model to begin thinking about how to address the project’s objectives.
  • For complicated projects, it is helpful to provide the students with a very general roadmap of a proposed data science pipeline to help them get started right away.
  • It is often helpful to provide the student team with background reading on the project as well.
What data is appropriate?
  • Sponsors should provide data that they own.
  • All D2K students must complete a basic training module on data security. While we make every effort to ensure that all data is handled in a secure manner, we caution that these are educational projects.
How do sponsors share data with Rice and the team?
  • There are two mechanisms by which D2K student teams can work with a sponsor’s data:
    • Sponsors maintain complete control of the data and allow the D2K student team (and instructional staff) access to the data and computing resources through virtual private machines. Under this model, sponsors are responsible for data security and responsible for any costs associated with providing secure access to the data and the associated computational resources.
    • Sponsors can transfer non-confidential data to Rice and allow students to work with the data on Rice systems and student-owned machines.
What is the Sponsored Research Agreement?

The Sponsored Research Agreement (SRA) is the mechanism under which affiliates can sponsor a D2K Learning Lab student team and own any resulting Intellectual Property.

Is this Sponsored Research Agreement negotiable?

No. At the fee associated with the SRA, we are not able to negotiate its terms.

What is the fee associated with sponsoring a team?
  • For companies: TBD
  • For community partners: TBD
  • For researchers: TBD

Email your questions to d2k@rice.edu.

What does the sponsorship fee cover?

The sponsorship fee covers all instructional costs associated with the D2K Learning Lab (e.g., instructors and teaching assistants), the legal and contractual fees associated with the Sponsored Research Agreement, and the necessary computing costs for the team to complete the project.

What types of students can I expect on my team?

Student teams are composed of students enrolled in the Data Science Projects course at Rice. These are typically advanced undergraduates, professional masters students, and beginning PhD students who have a technical background in data science. Currently, the D2K Learning Lab serves as the capstone for the Statistics major, the Data Science Minor, and the Data Science Professional Masters, and as an elective capstone for the Computer Science major.

How much time will the team devote to my project?

Each student is expected to spend roughly 10 hours a week on their project, so a team of 4-6 students contributes roughly 40-60 hours a week in total.

What is the process of matching students to projects?
  • Before the semester begins, students are required to review project descriptions for all available projects and rank their project preferences. Students are then placed on teams and matched to projects based on their preferences, the specific skill sets needed to complete the project, and the goal of forming diverse and balanced teams.
  • Note that the number of projects that we are able to accommodate in a given semester will depend on the course enrollment and student preferences.
Are there examples of successful capstone projects?

Yes! Click here to view examples of past capstone projects.

What are sponsor mentors and what are they expected to do?

Sponsor mentors are an important part of the data science capstone program. Mentors should be able to provide clear direction and understanding of the project description and goals. It is very important to meet once a week with your student group in person or by video conference.

Can I propose a year-long project?
  • Yes! To follow the academic calendar, we structure year-long projects as two semester-long projects. Sponsors should clearly delineate the project objectives for each semester.
  • For year-long projects, we do our best to keep the team together but cannot guarantee the same team composition in both semesters.
  • If companies want to sponsor a year-long project, please discuss this with our executive director by email at d2k@rice.edu.

Still need help? Contact us at d2k@rice.edu.