We distinguish between 'main' development projects and 'dedicated' on-demand support projects. Read more here about the e-BioGrid approach. Additional proposals for dedicated projects can be submitted, depending on available resources at the time.
The objective of WeNMR, an EU-funded project, is to optimize and extend the use of the NMR and SAXS research infrastructures by implementing an e-infrastructure. This provides the user community with a platform that integrates and streamlines the computational approaches necessary for NMR and SAXS data analysis and structural modelling. Access to the e-NMR infrastructure is provided through a portal integrating commonly used software and GRID technology.
applicant:
Alexandre Bonvin, Bijvoet Centre for Biomolecular Research
results:
A portal with NMR and SAXS data analysis and modelling tools is available on the we-NMR website. The project team is currently working with the US OSG to enable support for our enmr.eu VO. We have already successfully deployed software and run test jobs on OSG sites, demonstrating interoperability of EU and US grids.
This is an educational project that aims to introduce CLOUD computing into chemistry teaching at the bachelor level within the chemistry curriculum of Utrecht University. In the second year of the chemistry curriculum, students can choose a 'molecular modelling and mathematics' course in which the principles of molecular simulation techniques and force fields are introduced in parallel with the necessary mathematical background. The course contains a practical part in which students learn the basics of molecular dynamics simulations by simulating a protein using Gromacs and studying the effect of mutations on structure and dynamics. In this project, CLOUD computers are used so that students can perform the entire work (setup, production, analysis) on a single, dedicated system. In that way we introduce e-Science into teaching so that students can get acquainted with CLOUD computing. 'Education in the clouds' should be a novel and attractive concept for students.
applicant:
Alexandre Bonvin, Utrecht University
results:
application not available
status:
ongoing
team:
Alexandre Bonvin, Adrien Melquiond, Alain van Hoof
Metagenomics analyses are based on next-generation sequence data. The assembly of reads into contigs and the functional annotation of either contigs or reads in next-generation sequencing require significant computing resources. Creating Grid and Cloud computing pipeline solutions for next-generation sequence data analysis would be a beneficial contribution to effective metagenomics research.
applicant:
Sacha van Hijum, Center for Molecular and Biomolecular Informatics
results:
The e-BioGrid programmer's position has recently been filled by Jumamurat Bayjan, who is now ready to start developing a pipeline for their next-generation sequencing platform. This will be done in close collaboration with the NBIC NGS taskforce and the other eBioGrid team members involved with NGS data analysis.
status:
ongoing
team:
Victor de Jager (CMBI), Machiel Jansen, Niek Bosch, Jumamurat Bayjan
In this project we would like to explore a solution that enables high-throughput processing of next-gen sequence data on a grid or cloud. We have two large next-gen sequence datasets that are available or arriving in April or May 2011: the Dutch Genome Project (250 Dutch trios, parents plus child) and the Leiden Longevity Studies (222 individuals with a longevity phenotype), with raw data of about 60 TB and 100 TB, respectively. A simplified pipeline has been prepared for a local cluster to process the pilot data of the Dutch Genome Project, and this pipeline will be the starting point for exploring a comprehensive solution to port all necessary tools to a grid environment.
In a genome-wide association (GWA) analysis, genetic variants (Single Nucleotide Polymorphisms: SNPs) across the whole genome are tested for association with a certain trait (such as body weight or a certain disorder). With the data that is currently available, this means that 1.5 to 4.5 million tests are performed. These tests can be set up using structural equation modeling, in which covariance structures with fixed effects are analyzed. Because of the large number of tests, GWA analysis is computationally expensive. Because genomic data are produced at increasing density and rapidly decreasing cost, the need to apply state-of-the-art high performance computing methods in GWA analyses is becoming urgent. Approaches to solving this problem are to use grid technology, and to use the computer hardware more efficiently, either by making use of GPUs or by optimizing the algorithms used.
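Because each SNP is tested independently, the workload splits naturally into per-SNP computations that can be distributed over grid nodes or vectorised on a single machine. The following is a minimal sketch of such a per-SNP test, assuming a simple linear regression of the trait on the genotype rather than the structural equation models described above. The function names and the 0.05 significance level are illustrative assumptions, not part of the project.

```python
import numpy as np

def snp_association_stats(genotypes, trait):
    """Per-SNP slope and t-statistic for the regression trait ~ genotype.

    genotypes: array of shape (n_individuals, n_snps), e.g. allele counts 0/1/2
    trait:     array of shape (n_individuals,)
    All SNPs are handled in one vectorised pass; each column is independent.
    """
    g = genotypes - genotypes.mean(axis=0)      # centre each SNP column
    y = trait - trait.mean()                    # centre the trait
    sxx = (g ** 2).sum(axis=0)                  # genotype sum of squares per SNP
    sxy = g.T @ y                               # genotype-trait cross product per SNP
    beta = sxy / sxx                            # regression slope per SNP
    n = len(trait)
    residual_ss = (y ** 2).sum() - beta * sxy   # residual sum of squares per SNP
    std_err = np.sqrt(residual_ss / (n - 2) / sxx)
    return beta, beta / std_err

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests
```

With 1.5 million tests and a family-wise level of 0.05, `bonferroni_threshold` gives a per-test threshold of about 3.3e-8, which illustrates why the number of tests matters for both computation and interpretation.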
applicant:
Han Rauwerda, University of Amsterdam
results:
A 20- to 40-fold gain in computing time, achieved by an algorithm using symbolic algebra
status:
completed
team:
Marijn van Eupen, Matthijs Kattenberg, Michel Nivard, Han Rauwerda, Dorret Boomsma
The e-infrastructure for bioscience research e-bioinfra is routinely used by researchers at the AMC to perform analysis of genomics data on the Dutch Grid, in particular for Next Generation Sequencing (NGS). The analysis steps are implemented as workflows that are executed on the grid in an automated fashion. Bioinformaticians at the AMC primarily run these workflows using the VBrowser, which also facilitates data manipulation on the grid storage. Selected applications are also available at the web interface of the e-bioinfra gateway for novice users. The goal of the project is to enable and enhance genomics research via advanced tools for data analysis. This is achieved in close collaboration with bioinformaticians.
applicant:
Silvia Olabarriaga, on behalf of the VLEMED VO, Amsterdam Medical Centre / University of Amsterdam
results:
Angela CM Luyf, Barbera DC van Schaik, Michel de Vries, Frank Baas, Antoine HC van Kampen, Silvia D Olabarriaga. Initial steps towards a production platform for DNA sequence analysis on the grid. BMC Bioinformatics 2010 Dec, 14;11:59
status:
ongoing
team:
Barbera van Schaik, Antoine van Kampen, Angela Luyf, Marcel Willemsen, Aldo Jongejan, Silvia Olabarriaga, Mark Santcroos, Jan Just Keijser, Shayan Shahand, Vladimir Korkhov, Souley Madougou
We want to investigate the use of Linux VMs with specific phylogenomics and population genomics software installed to do whole genome coalescent and phylogenetic analyses that are currently outside of our computational reach. The most important request here is for CPU time (e.g. several weeks of wall time with 24 processors).
applicant:
Hendrik Jan Megens, Wageningen University, Animal Breeding and Genomics Centre
The goal of the project is to have an annotation pipeline for microbial genomes. A pipeline connecting freely available software is already available as a stand-alone version. Ideally it should be upgraded to a faster version (currently a step involving BLAST is limiting) and should be made accessible to other users through a web interface.
With powerful machines available at the Netherlands Centre for Electron Microscopy, various forms of 3D reconstruction in EM can be realized. Reconstruction methods range from near-atomic resolution to the level of ultra-structure. These methods are computationally intensive. Parallelization and Grid computing have only been partially adopted in this field. Particle reconstruction has been implemented on a cluster using the IMAGIC software. X-ray crystallography uses the CCP4 suite, and this suite has been partially adapted for the Grid. Computerized EM tomography uses computer clusters running the IMOD software. Here, special attention needs to be given to enhancement of the 3D reconstruction. The goal of this project is to bundle 3D-reconstruction tools into a Grid-enabled e-science problem solving environment for nanoscopy.
applicant:
Fons Verbeek, Leiden Institute of Advanced Computer Science
results:
Currently, the commonly used open source EMAN and IMOD software packages for single particle reconstruction and tomography, respectively, as well as IMAGIC, are available to run on the BiG Grid cloud. A web interface has been built for users to upload and analyse data. Further input from EM researchers on requirements is welcome. Storage capabilities and further IT infrastructure regarding accounts and data access are being reviewed in view of the upcoming use of the NeCEN microscope.
status:
ongoing
team:
Fons Verbeek, Floris Sicking, N. Pannu, Jan-Pieter Abrahams, Bram Koster
Advanced analyses, easier
Electron microscopy is of invaluable importance for the study of the complex organization and architecture of cellular structures. In recent years, new (mostly automated) electron microscopy techniques have been developed, such as electron tomography (ET) and focused ion beam scanning electron microscopy (FIB-SEM), that provide 3-dimensional (3D) information about the cell. This added dimension has given new insights into cellular structures and into the interrelatedness between organelles and cellular processes. Automated 3D analysis methods are still in their infancy. Data extraction still depends mainly on manual segmentation techniques and is therefore time-consuming and subjective.
Structured storage
Modern 3D electron microscopic recording methods and techniques (not only (S)TEM tomography, but also FIB-SEM, ILEM and SEM) will have to cope with ever-growing amounts of data. This data needs to be stored in a structured and well-organized manner in order to be and remain accessible to its users. Currently, data storage is handled by individual researchers in many different ways and forms. A better structured and more uniform way of storing data and organizing data management is a precondition for the primary science case, but it also creates the opportunity for long-term use, enabling future reuse of electron microscopy information by research institutes and their distant collaborators.
The primary objective of this BiG Grid project is to improve compute-intensive analysis methods, including 3D template matching of 3D electron microscopy tomography data. In addition, the improvements include providing the 3D electron microscopy community with broad access to these methods. The secondary objective is to implement a data storage system for 3D electron microscopy data.
In order to achieve these objectives, an intensive collaboration of expert partners is needed to establish an infrastructure for the analysis of 3D electron microscopy data.
This project contains 5 main activities:
1. Reduce the total compute time and improve the compute capacity by implementing existing 3D analysis algorithms on GPUs
2. Improve the 3D analysis process by implementing additional analysis and information management tools, objectifying and improving the reliability of 3D data mining of electron tomography volumes
3. Improve accessibility by creating an intuitive user environment
4. Improve the storage, retrieval, archival and controlled sharing of 3D data for electron microscopy data analysis
5. Ensure the continuity and availability of the created solutions during and after the project
This project stems from a previous collaboration (IOP: IGE03012/VL-E: CellTom) in an IOP genomics programme. The SARA e-science support team has assisted in creating the project proposal and has established the project organization.
The foundation of this project is a proposal by the 3D electron microscopy group of Utrecht University. This proposal is supported by the Leiden University Medical Center.
Nearly all micro-array experiments are unique due to differences in experiment design, experimental procedure, level of completeness of the data and cellular responses. The methods with which array studies are analyzed are also the topic of much bioinformatics research. Therefore state-of-the-art array analysis is subject to change and must be highly flexible. Another aspect of array analysis is that at some points much computing power is needed. Here we propose to set up an architecture that implements Problem Solving Environments for six problem areas from array design to downstream expression analysis. We will do this by setting up web services and by configuring dedicated Virtual Laboratory Machine Images that can be instantiated in a High Performance Compute Cloud. These Machine Images can be shared and dynamically scaled to a large virtual computer cluster. The content of these Machine Images will be documented and stored in a public Document Management Server. The resources used, input data and experimental results can be stored in a private Result Representation Server.
applicant:
Timo Breit, University of Amsterdam
results:
Progenius is a new tool for micro-array design that was not yet available to the public
Common haplotypes are identified from SNP data using a tool prepared by the NBIC BRS team. The tool runs in Galaxy and will be made available on the BiG Grid computing infrastructure.
applicant:
Cisca Wijmenga, University of Groningen
results:
status:
ongoing
team:
Leon Mei, Marcel Kempenaar, Marc van Driel, Gerard te Meerman, Andre de Vries
The e-infrastructure for bioscience research, e-bioinfra, is routinely used by researchers at the AMC to perform medical image analysis on the Dutch Grid. The image analysis pipelines are implemented as workflows that are executed on the grid in an automated fashion. Various neuroimaging applications have been ported to this platform and made available for researchers from the Radiology, Psychiatry and other clinical departments at the AMC. The web interface of the e-bioinfra gateway provides easy access to novice users to applications such as FreeSurfer (brain surface segmentation) and DTI atlas construction. The goal of the project is to enable and enhance medical imaging research via advanced tools for data analysis. This is achieved in close collaboration with medical imaging researchers.
applicant:
Silvia Olabarriaga, on behalf of the VLEMED VO, Amsterdam Medical Centre / University of Amsterdam
results:
Users of the e-BioInfra Gateway, the web interface to the e-BioInfra platform for non-expert users, can now analyse their medical imaging data with FSL BEDPOSTX on the BiG Grid resources. The e-BioInfra Gateway is now equipped with four medical imaging and three sequencing data analysis applications that are used by AMC researchers. For more information please refer to the gateway documentation.
S.D. Olabarriaga, T. Glatard, P.T. de Boer, "A Virtual Laboratory for Medical Image Analysis", IEEE Transactions on Information Technology In Biomedicine (TITB), 2010 Apr 5.
M.W.A. Caan, F.M. Vos, L.J. van Vliet, A.H.C. van Kampen, S.D. Olabarriaga. "Gridifying a Diffusion Tensor Imaging Analysis Pipeline". Proceedings of the 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid 2010) Melbourne, VIC, Australia, May 17-May 20. IEEE Computer Society. pp.733-738, 2010.
T. Glatard, R. S. Soleman, D. J. Veltman, A.J. Nederveen, S. D. Olabarriaga. Large scale functional MRI study on a production grid, Future Generation Computer Systems vol. 26, no. 4, pp. 685-692, 2010
status:
ongoing
team:
Matthan Caan, Silvia Olabarriaga, Antoine van Kampen, Mark Santcroos, Jan Just Keijser, Shayan Shahand, Vladimir Korkhov, Souley Madougou
Magnetic resonance imaging is a modern technique for recording brain activity. The analysis of this data, both functional and anatomical, will undoubtedly bring new insights into the neural basis of cognitive functioning. Up to now the complexity of analysis has been determined by the available computational power, and certain types of approaches have been avoided in typical analyses. In this project we want to evaluate whether it is feasible to use computationally heavy approaches to noise reduction, and anatomical and functional connectivity methods, for the normal experiments that are being conducted at the Spinoza Center for NeuroImaging.
applicant:
Steven Scholte, University of Amsterdam
results:
Anticipated results: a Matlab module that serves as a standard fMRI preprocessing tool for all brain and cognition researchers, and a DTI analysis tool for a new type of connectivity analysis
status:
ongoing
team:
Steven Scholte, Sennay Ghebreab, Lourens Waldorp, Matthan Caan
Metabolomics is a rapidly growing discipline, relying on bioinformatics for data processing. Large amounts of data are being generated in metabolomics studies. The process of extracting biological information can be seen as an integrated workflow. It is recognized that this workflow can benefit highly from coordinated (and automated) handling and processing of the data. The need for tools and applications to support the data handling and biological interpretation is huge, but online availability of metabolomics data and tools is poor. This is hampering progress and standardization of the scientific field. The Netherlands Metabolomics Centre, in collaboration with the Netherlands Bioinformatics Centre, has a dedicated project that supports the development of an infrastructure to share metabolomics data and tools: the NMC Data Support Platform. This project addresses two major bottlenecks for metabolomics research. The first is sharing of metabolomics studies and data. The second addresses the accessibility of dedicated processing and biostatistics tools. This goal has a number of bioinformatics and e-science challenges: some tools require high performance computing, and the tools also need to be integrated into a data processing tool chain. This project proposes collaboration between eBioGrid and the NBIC/NMC data support platform taskforce of programmers, with the aim to tackle the e-science challenges. The output of the project will be an online computing environment where sets of preprocessing, biostatistics and quality control tools are made accessible for all NMC biologists and biostatisticians, and any interested users from the international community.
applicant:
Theo Reijmers, LACDR, Leiden University
results:
The e-BioGrid programmer's position has recently been filled by Mahdi Jaghouri, who is now ready to start developing an infrastructure to share metabolomics data and tools, with access to high-performance computing whenever required.
status:
ongoing
team:
Theo Reijmers, Margriet Hendriks, Kees van Bochove, M. van Vliet, G. Zwanenburg, J. Bouwman, J. Wesbeek, S. Sikkema, T. Abma, Mahdi Jaghouri
In this project modular workflows are developed for robust, automated and efficient analysis of LC-MS data. The goal in this project is to develop a suite of efficient programs to eliminate existing bottlenecks in the high-throughput analysis of LC-MS data, i.e. to develop and implement robust parallel software for chromatographic alignment, retention time prediction, calibration of MS data and extraction of quantitative information from LC-MS datasets. A typical analysis workflow contains at least one component for matching tandem mass spectra to predicted peptide fragmentation patterns. Examples would be X!Tandem or Crux but we are also developing our own software. We have already integrated the serial version of X!Tandem in a Taverna workflow with PeptideProphet and some of our own tools for alignment and calibration, such as pepAlign and msRecal. All these tools/algorithms are open source and have been described in recent literature, although only X!Tandem has been parallelized previously.
applicant:
Magnus Palmblad, Leiden University Medical Center
results:
Different workflows and workflow components for proteomics data analysis have been implemented to run on the cloud. A paper describing the use of scientific workflow management systems in proteomics has also been published (de Bruin, Deelder, and Palmblad, Scientific Workflow Management in Proteomics, Mol. Cell. Proteomics. 2012). Two other manuscripts are in preparation.
status:
ongoing
team:
Magnus Palmblad, Yassene Mohammed, Andre M. Deelder
The FOM institute AMOLF is currently a part of the COMMIT project for e-biobanking of large mass spectrometric datasets. One aim of this project is the collection, storage and analysis of large mass spectrometric imaging datasets. The highest performance mass spectrometers, Fourier transform ion cyclotron resonance (FT-ICR MS), offer unrivalled chemical specificity. This high performance requires large (4 MB-16 MB) individual data files for each mass spectrum. A full MS imaging scan of a biological tissue usually requires ~4,000 individual spectra, yielding complete datasets with a size of 15 GB-100 GB. This data is then processed, which entails a zero-filling of the data (increasing the individual data size by 2x), application of an apodization function (CPU/time intensive) and a Fast Fourier transform. BigImage will use hardware resources to roll out work-flow based data analysis software onto the BiG Grid. The requested core hours will support testing on BiG Grid, as well as analysis of FT-ICR MS imaging datasets of breast cancer tissues. After basic processing on BiG Grid, the core hours will be used to extend the data analysis capabilities of the work-flow based software to include multi-variate statistical analysis tools.
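The three processing steps named above can be sketched for a single transient as follows. This is a minimal NumPy illustration, not the BigImage software: the Hann window is an assumed choice of apodization function, the window is applied before zero-filling (the description does not fix the order), and `process_transient` is a hypothetical name.

```python
import numpy as np

def process_transient(transient, zero_fill_factor=2):
    """Apodize, zero-fill and Fourier transform one time-domain transient.

    Returns the magnitude spectrum. With zero_fill_factor=2 the padded
    signal is twice as long as the input, matching the 2x size increase
    mentioned in the text.
    """
    n = len(transient)
    window = np.hanning(n)                    # assumed apodization: Hann window
    apodized = np.asarray(transient) * window
    padded = np.zeros(n * zero_fill_factor)   # zero-filling: append zeros
    padded[:n] = apodized
    return np.abs(np.fft.rfft(padded))        # FFT of a real-valued signal
```

Each spectrum is processed independently of the others, so the roughly 4,000 spectra of one imaging scan can be handled as separate jobs.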
applicant:
Donald Smith, FOM-Instituut voor Atoom-Molecuulfysica
results:
The use of BiG Grid for analysis of large FT-ICR MS imaging datasets will yield a dramatic decrease in analysis time. In addition, the ability to apply advanced algorithms will improve the mass spectral performance and will be applied for the first time to FT-ICR MS imaging datasets. Combined, the results will yield a unique capability for unrivalled rapid data analysis of high resolution FT-ICR MS imaging datasets. New statistical analysis modules in Chameleon should result in unique classifiers for diseased tissues based on integrated multi-modal data processing on BiG Grid.
status:
ongoing
team:
Donald Smith, Ron Heeren, Carl Schultz, Nadine Mascini
The Galaxy server, serviced by NBIC, is used to run generic bioinformatics tools for sequence and proteomics analysis through a standard web user interface. As many of the tools are CPU and storage demanding, running the Galaxy server on a high-performance computing cloud will expand its compute capacity.
applicant:
Rob Hooft, Netherlands Bioinformatics Centre
results:
Galaxy images are being built to be installed on the clouds.
The Microarray Department/Integrative Bioinformatics Unit at the University of Amsterdam will set up the HPC Cloud environment as a flexible and scalable environment for microarray design and analysis. From a local R session we want to be able to initialize an HPC Cloud computer cluster on the fly, use it from the local R session and shut the cluster down when it is no longer needed.
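The start, use and shut-down cycle described above can be sketched as follows. This is a Python illustration of the pattern rather than the R workflow used in the project: `temporary_cluster` and `normalise` are hypothetical names, and a local thread pool stands in for the HPC Cloud cluster, whose provisioning interface is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

@contextmanager
def temporary_cluster(n_workers):
    """Start a pool of workers, hand it to the caller, and always shut it down."""
    # Assumption: a local thread pool stands in for cloud virtual machines.
    cluster = ThreadPoolExecutor(max_workers=n_workers)
    try:
        yield cluster
    finally:
        # Runs even if the analysis raises, so no workers are left running.
        cluster.shutdown(wait=True)

def normalise(probe_intensities):
    """Toy per-array task: scale intensities so that they sum to 1."""
    total = sum(probe_intensities)
    return [x / total for x in probe_intensities]

arrays = [[1.0, 3.0], [2.0, 2.0], [5.0, 15.0]]
with temporary_cluster(n_workers=2) as cluster:
    normalised = list(cluster.map(normalise, arrays))
```

Tying the shut-down to the end of the `with` block means the cluster exists only for as long as the analysis needs it.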
applicant:
Timo Breit, University of Amsterdam
results:
Cloud computing can be initiated and terminated from a local R statistics session whenever required.
The e-BioInfra platform provides facilities to run large data analysis experiments on the Dutch Grid. The project includes software and system design, development and deployment as services for the AMC research community. The platform is based on workflow technology and also includes data transfer, monitoring and provenance services. The team also provides support to researchers who wish to perform experiments on the grid infrastructure. The web interface of the e-bioinfra gateway provides easy access for novice users. BiGGrid funds one member of the e-bioscience team (Mark Santcroos) to improve the link between the e-Bioinfra and the Dutch grid resources and services. Activities involve development and integration of new middleware tools, user support, definition of guidelines and best practices, and platform dissemination to a larger community of biomedical and life science researchers.
applicant:
Silvia Olabarriaga, Antoine van Kampen and Jan Just Keijser, Amsterdam Medical Centre / University of Amsterdam
results:
S.D. Olabarriaga, T. Glatard, P.T. de Boer, "A Virtual Laboratory for Medical Image Analysis", IEEE Transactions on Information Technology In Biomedicine (TITB), 2010 Apr 5.
M.W.A. Caan, F.M. Vos, L.J. van Vliet, A.H.C. van Kampen, S.D. Olabarriaga. Gridifying a Diffusion Tensor Imaging Analysis Pipeline. Proceedings of the 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid 2010) Melbourne, VIC, Australia, May 17-May 20. IEEE Computer Society. pp.733-738, 2010.
S. Shahand, M. Santcroos, Y. Mohammed, V. Korkhov, A. Luyf, A. van Kampen and S. Olabarriaga. Front-ends to Biomedical Data Analysis on Grids. Proceedings of HealthGrid 2011 (in press), 2011
status:
ongoing
team:
Mark Santcroos, Silvia Olabarriaga, Jan Just Keijser, Antoine van Kampen, Shayan Shahand, Vladimir Korkhov, Souley Madougou
The project develops a generic infrastructure program, the Data Analysis Framework (DAF), to simplify the processing of data-intensive tasks on the Grid. DAF provides each user with quota-limited disk space to upload input files and store processing results. Data processing tasks can be executed on the data in the user's disk space using command-line tools integrated in DAF. All data processing services, including file I/O to and from the user's disk space, can be accessed via web service requests. DAF uses gLite for job submission and an enhanced ToPoS pilot job system to reduce job submission errors. We plan to integrate DAF with data management software such as Molgenis or OpenBIS to develop a fully integrated data analysis platform. We also plan to provide an easy-to-use web interface, in which generic web pages are created for the integrated tools and workflows, together with a scientific visualization platform providing visualization support. See also the application to a proteomics data analysis infrastructure here.
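The general idea behind a pilot job system can be sketched as follows: work items (tokens) sit in a shared pool, a running pilot pulls them one at a time, and a token whose processing fails goes back into the pool instead of being lost. This is an in-memory illustration of the pattern only. It does not use the ToPoS or gLite interfaces, and `run_pilot` and `max_attempts` are hypothetical names.

```python
from collections import deque

def run_pilot(tokens, process, max_attempts=3):
    """Process tokens from a pool, returning failed ones to the pool for a retry.

    tokens:  iterable of hashable work-item identifiers
    process: callable applied to each token; may raise on failure
    Returns (results, failed), where failed lists tokens that never succeeded.
    """
    pool = deque((token, 0) for token in tokens)   # (token, attempts so far)
    results, failed = {}, []
    while pool:
        token, attempts = pool.popleft()
        try:
            results[token] = process(token)
        except Exception:
            if attempts + 1 < max_attempts:
                pool.append((token, attempts + 1))  # requeue for another try
            else:
                failed.append(token)                # give up on this token
    return results, failed
```

Because a failure affects only one token, a transient error costs a retry of that item rather than a resubmission of the whole job.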
applicant:
Ishtiaq Ahmad, University of Groningen, Department of Pharmacy, Analytical Biochemistry
results:
In its current state, DAF is already used to provide a high-throughput time alignment service¹ based on the Warp2D tool² for LC-MS peak lists, accessible at http://www.nbpp.nl/warp2d.html. The msCompare⁶ workflow has been integrated in the NBIC Galaxy server. Other results relate to the tools and workflows mentioned above that we intend to integrate in DAF.
1. Ahmad I, Suits F, Hoekman B, Swertz MA, Byelas H, Dijkstra M, Hooft R, Katsubo D, van Breukelen B, Bischoff R, Horvatovich P., A high-throughput processing service for retention time alignment of complex proteomics and metabolomics LC-MS data, Bioinformatics, 2011, 27(8):1176-1178, PMID: 21349866
2. Suits F, Lepre J, Du P, Bischoff R, Horvatovich P., Two-dimensional method for time aligning liquid chromatography-mass spectrometry data, Anal Chem., 2008, 80(9):3095-3104, PMID: 18396914
3. Christin C, Hoefsloot HC, Smilde AK, Suits F, Bischoff R, Horvatovich PL., Time alignment algorithms based on selected mass traces for complex LC-MS data, J Proteome Res., 2010, 9(3):1483-1495, PMID: 20070124
4. Christin C, Smilde AK, Hoefsloot HC, Suits F, Bischoff R, Horvatovich PL., Optimized time alignment algorithm for LC-MS data: correlation optimized warping using component detection algorithm-selected mass chromatograms, Anal Chem., 2008, 80(18):7012-7021, PMID: 18715018
5. Christin, C., Hoefsloot, H. C. J., Smilde, A. K., Hoekman, B., Bischoff, R., Horvatovich, P., A critical assessment of statistical methods for biomarker discovery in clinical proteomics, manuscript submitted to Molecular & Cellular Proteomics.
6. Hoekman, B., Breitling, R., Suits, F., Bischoff, R., Horvatovich, P., msCompare: a framework for quantitative analysis of label-free LC-MS data for comparative
status:
ongoing
team:
Ishtiaq Ahmad, Berend Hoekman, Peter Horvatovich, Rainer Bischoff and collaborators from Gaining Momentum Initiative
The SHIWA VO was created for the SHIWA project (shiwa-workflow.eu). It will be used for testing the SHIWA Simulation Platform (SSP), which will enable scientists to share and run workflows on distributed computing infrastructures (DCIs). The project develops solutions for interoperable workflows, including management of credentials across DCIs. The initial tests on the SHIWA VO resources are intended to show the viability of the adopted solutions on production infrastructures. This VO is already supported by the French NGI, but we need more sites to support it to enable testing under more realistic conditions.
We want to test a fail-over system for a Danish web service, using a virtual machine that can perform the same task. This virtual machine will be parked at a computer centre in Munich and at one in the UK. At fixed times we will let the web service in Denmark go down, at which point our calling program will, one way or another, launch one of those two virtual machines to perform the one-second call on your cloud.
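The calling program's fail-over logic can be sketched as follows. This is a minimal illustration under stated assumptions: the services are modelled as plain Python callables, `call_with_failover` is a hypothetical name, and how a parked virtual machine is actually launched is left to the `launch_backup` functions supplied by the caller.

```python
def call_with_failover(primary, backup_launchers):
    """Call the primary service; if it fails, launch backups in order.

    primary:          callable performing the web service call
    backup_launchers: callables that each start a backup (for example a
                      parked virtual machine) and return a callable that
                      performs the same call on it
    """
    try:
        return primary()
    except Exception as primary_error:
        errors = [primary_error]
    for launch_backup in backup_launchers:
        try:
            service = launch_backup()   # start the stand-in service
            return service()            # perform the same call there
        except Exception as backup_error:
            errors.append(backup_error)
    raise RuntimeError(f"all {len(errors)} endpoints failed")
```

Passing the launchers as an ordered list keeps the choice between the two parked machines outside the fail-over logic itself.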
The Netherlands has 150 biobanks with over 400,000 samples in total. To exploit these billions' worth of material, they are now embarking on large scale genetic profiling. An example is the highly visible Genome of the Netherlands project, which will completely sequence the DNA of 750 Dutch individuals to elucidate the genetic diversity in the Dutch population, and impute this new information onto the existing, sparser genetic information of 100,000 Dutch individuals. However, the data handling and computational needs are enormous, and Dutch institutes are struggling to use the available hardware infrastructures effectively.
This e-BioGrid subproject will overcome this barrier by interfacing the data processing tools used in biobanking to the existing BiGGrid infrastructure and by supporting the Dutch biobanking community in deploying these tools for their large data and processing challenges. Applications include (1) high-throughput genetics studies from next generation sequencing of biobank samples (>750), (2) genome wide imputation and association studies (>100,000) and (3) follow-up BBMRI-NL projects that are currently being drafted by the BBMRI-NL steering committee. We also aim to pilot knowledge discovery across biobanks by connecting this new information with existing information from over 150 existing Dutch biobanks.
Envisioned short-term results are high-impact scientific publications of these biobank studies. Long-term results are the availability of flexible and scalable e-Science tools for large scale biobanking and an optimized GRID infrastructure for biobanks within BigGrid, SARA, NBIC and the Life Sciences community, to bring the Netherlands to the forefront of the next generation of genetics and population research.
applicant:
Morris Swertz, University Medical Center Groningen
results:
Whereas some of the analysis pipelines are already available on Grid resources (e.g. imputation), others are ongoing and planned (SNP calling, analysis of structural variations). The read alignment pipeline is further being developed such that other genome sequencing projects can make use of the analysis infrastructure as well. See also the wiki for progress updates.
status:
ongoing
team:
Jan Bot, Pieter Neerincx, Abhishek Narain, George Byelas, Tom Visser