Cambridge Bioinformatics Hackathon

Our first Cambridge Bioinformatics Hackathon (#CamHack17) will take place on 25-27 September 2017 at the Craik-Marshall Building, Downing Site, Cambridge CB2 3AR. The aim is to bring together programmers from the Cambridge area to work on bioinformatics or computational biology projects.

Many thanks to Genestack for sponsoring the Cambridge Bioinformatics Hackathon.

Listed below are some of the projects that will be undertaken at the hackathon.

Name | Affiliation | Project Summary
Dr Steven Wingett | The Babraham Institute | Adding functionality to FastQC so that it detects and reports the extent of putative close-proximity duplicate clusters on the flowcell in FASTQ files.
Dr Felix Krueger | The Babraham Institute | Attempting to write an overlap assembler for a rather tricky immunoglobulin locus in the 129 mouse strain using MinION Nanopore reads.
Dr Christel Krueger | The Babraham Institute | Predicting from sequencing data which protocol was used to generate a bisulfite library.
Mr Olly Burren | Department of Medicine, University of Cambridge | A web-based tool for integrating and visualising targeted chromatin interaction datasets with genetic/genomic data. Specifically, I want to add functionality for integrating and querying variant positional data from the BRIDGE dataset of 10k whole genome sequences, both through the frontend and the Elasticsearch backend, using R.
Dr Anna Gogleva | Sainsbury Laboratory, Cambridge University | Interactive clustering of transcriptome data
Mr Hugo Tavares | Sainsbury Laboratory, Cambridge University | Interactive clustering of transcriptome data
Dr Sandra Cortijo | Sainsbury Laboratory, Cambridge University | Interactive clustering of transcriptome data
Mr Kevin Dialdestoro | Genestack | I have two projects and I cannot decide which one to drop, so I will see if I can do a bit of both! The first is a continuation of what we started this summer: predicting the cell types of new single-cell RNA-Seq samples based on a past collection of annotated gold-standard single-cell RNA-Seq experiments. We now want to put together an R package for this: you will be able to supply a collection of annotated samples and use it to predict new samples using a basic PCA-based model. The package will be open-sourced, and we hope that future community contributions will incorporate more advanced methods such as deep learning. The second project is about automatically finding an optimal P-value threshold. In bioinformatics we do a lot of multiple hypothesis testing, e.g. in GWAS and differential expression analysis. This introduces many false positives, and currently we perform a multiple-testing correction and choose an arbitrary threshold to limit them. However, this manual thresholding is sub-optimal: your significant results may include unnecessarily many false positives or exclude too many true positives. I will present an idea for an automatic, data-driven alternative and try to implement it and demonstrate its effectiveness.
Crina Samarghitean | Judge Business School | R, database construction, machine learning
Dr Russell Hamilton | CTR, University of Cambridge | RNA-Seq QC Tools and/or new features for ParticleStats
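
For readers unfamiliar with the multiple-testing correction mentioned in the second Genestack project, the following is a minimal sketch of one widely used correction, the Benjamini-Hochberg procedure, which controls the false discovery rate at a chosen level. It only illustrates the conventional step in which a threshold (here `alpha`) is picked by hand; it is not the automatic, data-driven method proposed for the hackathon, which is not described on this page. The function name, the default `alpha` and the example P-values are all hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Illustrative sketch only: `alpha` is the manually chosen
    false discovery rate level (an assumption, not from the source).
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the P-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k (1-based) such that
    # the k-th smallest P-value is <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up P-values, for demonstration only.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example, alpha=0.05))
```

Changing `alpha` changes which results are called significant, which is the arbitrary choice the project summary sets out to replace.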

Please click here to download more information on the hackathon projects.

The hackathon is now fully booked, but please click here to be placed on the waiting list in case any of the people currently enrolled decide not to participate.

Click here to download the hackathon flyer for printing.

For further details, please contact