Title:
Detection of epistasis in genome-wide association studies with machine learning methods for therapeutic target identification.
Duration:
36 months
Starting:
03/10/2016
Application deadline:
13/06/2016
Location:
Paris / Chilly-Mazarin
Laboratory:
Centre for Computational Biology, Mines ParisTech - Institut Curie - INSERM / In silico biology, Translational Science Unit, Sanofi
Contacts:
* Chloe-Agathe.Azencott@mines-paristech.fr
* Jean-Philippe.Vert@mines-paristech.fr
* Clement.Chatelain@sanofi.com
The Centre for Computational Biology of Mines ParisTech and the Translational Science Unit of Sanofi are looking for a PhD candidate interested in a joint PhD project between a pharmaceutical company and an academic lab, working at the cutting edge of machine learning and genomics in a stimulating interdisciplinary research environment.
The Centre for Computational Biology is a joint laboratory between Mines ParisTech and ARMINES. It is also one of the teams of the U900 joint laboratory between Mines ParisTech, Institut Curie and INSERM dedicated to epidemiology, bioinformatics and systems biology of cancer.
The Translational Sciences Unit has a strategic position in Sanofi to implement translational research approaches throughout the drug value chain to understand the molecular mechanisms responsible for pathologies and to identify targets and biomarkers.
Context:
Genome-wide association studies (GWAS) have generated huge data sets in the past 10 years in order to find associations between genetic polymorphisms and phenotypes. Until recently, most GWAS have focused on individual single nucleotide polymorphism (SNP) markers, effectively looking in isolation at the contribution to disease of the variants they tag. Thousands of SNPs have been associated with complex diseases by univariate analysis, but in most cases those variants independently explain only a small fraction of the estimated disease heritability (“missing heritability”). There is now strong evidence that the genetic variants underlying most complex diseases are non-Mendelian. These variants are typically not rare in the population. Taken independently, they show very little effect and low penetrance, but they may interact with each other in complex non-linear ways. This joint behavior of genetic variants is often referred to as epistasis or multilocus interaction. It has been speculated that epistasis contributes ubiquitously to complex diseases, partly because of the sophisticated regulatory mechanisms encoded in the human genome, and thus explains part of the missing heritability.
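As an illustration only, the idea that variants can show no individual effect and still interact can be sketched with a toy model. The XOR-style penetrance rule, the two binary loci and the function names below are assumptions made for this example; they are not taken from the project description and real epistatic effects are far noisier.

```python
from itertools import product


def phenotype(snp_a: int, snp_b: int) -> int:
    """Toy, purely epistatic model (an assumption for illustration):
    affected (1) if and only if exactly one of the two loci carries the variant."""
    return snp_a ^ snp_b


def risk_given(locus: int, value: int) -> float:
    """Disease frequency over the four equally likely genotypes,
    keeping only those where the chosen locus has the given value."""
    cases = [
        phenotype(a, b)
        for a, b in product((0, 1), repeat=2)
        if (a, b)[locus] == value
    ]
    return sum(cases) / len(cases)


# Marginal view: each locus taken alone carries no information about disease.
marginal = {(locus, v): risk_given(locus, v) for locus in (0, 1) for v in (0, 1)}

# Joint view: the pair of loci determines the phenotype completely.
joint = {(a, b): phenotype(a, b) for a, b in product((0, 1), repeat=2)}

print("marginal risks:", marginal)
print("joint phenotype:", joint)
```

In this toy setting a univariate test sees the same risk whether or not a locus carries the variant, while the two loci considered together are fully informative.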
A vast number of methods for the detection of epistasis have been developed in recent years. However, these methods still face three major challenges: 1) statistical methods traditionally used in univariate SNP-phenotype association are not adequate to find epistasis; 2) the number of combinations to be tested is tremendous, even for pairwise analyses (billions to trillions), which creates statistical power issues and reaches the limits of current computational capabilities; 3) the interpretation of the analytical results at a biological level is not straightforward.
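The scale of the second challenge can be checked with a short calculation. The panel sizes below and the use of a Bonferroni correction at a 0.05 family-wise error rate are assumptions chosen for illustration; the project description does not specify either.

```python
from math import comb


def n_pairwise_tests(n_snps: int) -> int:
    """Number of unordered SNP pairs, i.e. n choose 2."""
    return comb(n_snps, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction
    (one possible correction, assumed here for illustration)."""
    return alpha / n_tests


# Hypothetical genotyping panel sizes.
for n_snps in (100_000, 500_000, 1_000_000):
    n_tests = n_pairwise_tests(n_snps)
    threshold = bonferroni_threshold(n_tests)
    print(f"{n_snps:>9,} SNPs -> {n_tests:,} pairs, threshold {threshold:.1e}")
```

With 500,000 SNPs there are already about 1.25 x 10^11 pairs, so a per-pair threshold of roughly 4 x 10^-13 would be needed under this correction, which shows why exhaustive pairwise testing strains both statistical power and computing resources.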
Objective:
The objective of this PhD thesis will be (i) to develop new approaches for identifying complex multi-locus interactions in GWAS using machine learning methods, such as random forests or deep neural networks, focusing in particular on model stability and the interpretability of results, (ii) to implement these methods on GPU or in parallel when possible, in order to process large datasets efficiently and to integrate the solutions into the current GWAS analysis pipeline at Sanofi, and (iii) to apply these methods to selected datasets for complex diseases with a high estimated heritability.
Qualifications:
We are looking for a student with strong theoretical and practical knowledge of machine learning, statistics and optimization methods, highly motivated by an interdisciplinary research project that will take place in both an academic and an industrial environment. The candidate should have good programming skills (Python, R, C) and a strong interest in biology, in particular genetics and systems biology. Experience in CUDA programming or software development is a plus. The position is open to candidates of any nationality with a master's or equivalent degree (computer science, applied mathematics, statistics, data science, bioinformatics or a related area). Good communication skills in French and English are required.
The PhD student will be co-supervised by Chloe-Agathe Azencott, Jean-Philippe Vert (Paris / Mines ParisTech) and Clément Chatelain (Chilly-Mazarin / Sanofi) and will be expected to spend time at both the Mines and Sanofi laboratories. During the PhD, the student will be employed by Sanofi, with a competitive salary and benefits.
Application/How to Apply
Interested candidates should contact:
* Chloe-Agathe.Azencott@mines-paristech.fr
* Jean-Philippe.Vert@mines-paristech.fr
* Clement.Chatelain@sanofi.com
Applications must contain a curriculum vitae, a publication list (if applicable), a letter of motivation and a list of two contact references.
Interview
- Selected candidates will be interviewed during June 2016.
- Please refer to this document (pdf).
http://cbio.ensmp.fr/index.php?wikipage=CbgPositions