Course Description

Bioinformatics encompasses the analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. It represents a major practical application for modern techniques in data mining and simulation. Specific topics to be covered include sequence alignment, large-scale processing, next-generation sequencing data, comparative genomics, phylogenetics, biological database design, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, normalization of microarray data, mining of functional genomics data sets, and machine learning approaches for data integration.

Overall Flow of the Class:
(Module = Group of Lectures)
  • Introduction
  • Module on "the Data" (Genomic, Proteomic & Structural Data), introducing the main data sources (their properties, where to access them, etc.)
  • Module on Databases & Data Science Issues (Knowledge Representation, including the Semantic Web; Privacy; Provenance & Standards)
  • Module on Mining (Alignment & Variant Calling, Supervised & Unsupervised Approaches, Networks)
  • Module on Cell Modeling
  • Module on Molecular Modeling
Lectures:
  • MW 1:00-2:15 PM, Bass 305
Discussion Section:
  • Bass 405 (subject to change)

Different listings for this class (4 variants)

  • CB&B752/CPSC752 - Grad. w/ programming
This graduate-level version of the course consists of lectures, in-class tests, programming assignments, and a final programming project.
  • MB&B452/MCDB452 - Undergrad. 
This undergraduate version of the course consists of lectures, in-class tests, written problem sets, and a final project (a semi-computational section plus a literature survey).
  • MB&B752/MCDB752 - Grad. w/o programming 
This graduate-level version of the course consists of lectures, in-class tests, written problem sets, and a final project (a semi-computational section plus a literature survey). Unlike CB&B752, no programming is required.
  • MB&B 753a3/MB&B 754a4 - Modules
For graduate students, the course can be broken up into two "modules" (each counting 0.5 credit towards the MB&B course requirement):
753 - Biomedical Data Science: Mining (1st half of term)
754 - Biomedical Data Science: Modeling (2nd half of term)
Each module consists of lectures, in-class tests, written problem sets, and a final graduate-level written project that is half the length of the full course's final project.
  • Auditing
Auditing is allowed, but we strongly prefer that you register for the class.


The course is geared toward CBB graduate students as well as advanced MB&B undergraduates and graduate students wishing to learn about the types of large-scale quantitative analysis that whole-genome sequencing will make possible. It would also be suitable for students from other fields, such as computer science or physics, who want to learn about an important new biological application for computation.

Students should have:
  1. A basic knowledge of biochemistry and molecular biology. 
  2. A knowledge of basic quantitative concepts (single-variable calculus, basic probability and statistics) and basic programming skills.
These prerequisites can be fulfilled by MBB 200 and Mathematics 115, or by permission of the instructor.

Class Requirements
Discussion Section / Readings

Papers will be assigned throughout the course. These papers will be presented and discussed in weekly 60-minute sections with the TFs. A brief summary (a half-page per article) should be submitted at the beginning of the discussion session.

In-class tests: Midterm & Quiz

  • There will be a midterm covering the 1st half of the course.
  • There will be a quiz covering the 2nd half of the course, comprising SIMPLE questions that you should be able to answer from the lectures and the main readings.
For reference, please see the previous quizzes and answer keys from Fall 2012.

Programming Assignments (Req'd for CBB and CS students)

  • There will be four homework assignments, including assignment 0. To promote reproducible research, we will use a version control system, specifically GitHub, to facilitate homework submission.
  • For Homework 1, you will be given an opportunity to get familiar with GitHub and programming with version control. You can choose to either submit your homework through GitHub OR through email. However, for the later assignments, you will only be able to submit homework through GitHub.
  • For the programming assignments, you can use either R or Python. If you would like to use another programming language, please contact the TAs to request permission.
  • For detailed instructions and information, please refer to the Start up for Homework 1 & Homework Submission Instructions.

Non-programming Assignments (For MB&B and MCDB students)

  • There will be four equivalent homework assignments (including assignment 0) for MB&B and MCDB students without a programming background. The programming parts will be replaced with assignments involving the use of web-based tools or essay questions.

The course syllabus as a single PDF can be found HERE
(Simple HTML of the schedule is also available.)

Pages from previous years

2016 is the 19th time Bioinformatics has been taught at Yale. Pages for the 18 previous iterations of the class are available. Look at how things evolve!

  • Homework 2 DUE DATE: May 2nd (Tuesday) 2017, 11:59 pm. Homework 3 covers lectures from Prof. O'Hern. Choose to do either MCDB&MBB or CBB&CS homework, depending on your ...
    Posted Apr 30, 2017, 8:21 AM by Mengting Gu
  • Homework 0: Welcome to the 20th iteration of "Biomedical Data Science: Mining and Modeling" course. Homework 0 is only meant for you to actively check in to the class and get involved ...
    Posted Jan 12, 2017, 8:51 AM by Paul Muir