Information Gathering

- Bioinformatics Workflow / Biological Workflow

Bioinformatics Workflow / Biological Workflow

A workflow is the operational aspect of a work procedure: how tasks are structured, who performs them, what their relative order is, how they are synchronized, how information flows to support the tasks, and how tasks are tracked ... Scientific workflows found wide acceptance in the fields of bioinformatics and cheminformatics in the early 2000s, where they successfully met the need for multiple interconnected tools, handling of multiple data formats and large data quantities (Wikipedia).

A bioinformatics workflow development environment is a software tool designed specifically to compose and execute workflows in bioinformatics (Wikipedia).
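
To make the "relative order" and "information flow" parts of this definition concrete, here is a minimal sketch that models a workflow as a dependency graph and runs the tasks in a valid order. The task names and the `run_workflow` helper are hypothetical and chosen only for illustration; none of the systems listed on this page is claimed to work exactly this way. It assumes Python 3.9 or later for the standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each one maps to the set of tasks it depends on.
dependencies = {
    "fetch_sequences": set(),
    "convert_format": {"fetch_sequences"},
    "align_sequences": {"convert_format"},
    "build_tree": {"align_sequences"},
    "render_report": {"align_sequences", "build_tree"},
}


def execution_order(deps):
    """Return one valid order in which the tasks can run."""
    return list(TopologicalSorter(deps).static_order())


def run_workflow(deps, actions):
    """Run every task after its dependencies, handing it their results."""
    results = {}
    for task in execution_order(deps):
        # Information flow: a task only sees the outputs of its own inputs.
        inputs = {name: results[name] for name in sorted(deps[task])}
        results[task] = actions[task](inputs)
    return results


# Placeholder actions that only record which inputs each task received.
actions = {
    task: (lambda inputs, task=task: f"{task}({','.join(inputs)})")
    for task in dependencies
}

print(execution_order(dependencies))
print(run_workflow(dependencies, actions)["render_report"])
```

A real workflow environment adds much more on top of this (tracking, scheduling, remote services), but the dependency graph is the part that fixes the order of the tasks.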

Why Bioinformatics Workflow?

Bioinformatics has a huge amount of information to deal with (DNA and protein sequences, structure data, bioinformatics literature, etc.).
The amount of information is increasing exponentially.
The data is distributed over networks, in different formats, and in heterogeneous data structures and information systems.
Bioinformatics is a growing field; new research areas and data are discovered daily.
To enable bioinformatics workers to work effectively through multiple interconnected tools, and to handle multiple data formats and large data quantities, workflow management needs to be integrated into their daily working process.

A bioinformatics workflow needs to address the following issues:

Heterogeneity: how to deal with different data types, different data sources, and heterogeneous systems? How to represent data so that different users can use it?
Data conversion: bioinformatics workflows are very data-driven. (They may not have complex control-flow dependencies, but they have a lot of data representation issues.)
Processing power: bioinformatics workflows need a lot of it (e.g. Grid computing).
Tools for non-computing users: tools should be easy to use and should help visualize data.
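
As a small illustration of the data conversion issue, the sketch below converts sequence records from FASTA text into a tab-separated table that a downstream tool could read. The function names `parse_fasta` and `to_tsv` and the column layout are assumptions made for this example; real workflows usually rely on an established library for this step.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (identifier, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            # Keep only the identifier: the first word after '>'.
            parts = line[1:].split()
            header, chunks = (parts[0] if parts else ""), []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_tsv(records):
    """Render records as tab-separated text with a header row."""
    lines = ["id\tlength\tsequence"]
    for identifier, sequence in records:
        lines.append(f"{identifier}\t{len(sequence)}\t{sequence}")
    return "\n".join(lines)


example = ">seq1 example record\nACGT\nAC\n>seq2\nGG\n"
print(to_tsv(parse_fasta(example)))
```

Even in this tiny case the conversion loses information (the free-text description after the identifier is dropped), which is typical of the representation problems a workflow system has to manage.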

Conferences / Workshops (related to bio-workflow)

NETTAB - Network Tools and Applications in Biology (NETTAB) workshops. This is a series of workshops focused on the most promising and innovative ICT tools and on their usefulness in bioinformatics.

2005 - Bioinformatics Workflows - 5-7 October, 2005, Second University of Naples, Naples, Italy. Program
2006 - Grid Applications - July 10-13, 2006, Santa Margherita di Pula (Sardinia, Italy). Program
2007 - Semantic Web - June 12-15, 2007, University of Pisa, Italy

Note: From my observation, this is a bioinformatics conference / workshop organized by an Italian organizing committee, running since 2001. All the conference venues are in Italy, mostly at Italian universities. The focus is on "ICT tools and their usefulness in Bioinformatics", with a yearly theme.

Bioinformatics Workflow Development Environments

System	Description	Organizations	People	Papers

Taverna	The Taverna workbench is an open source software tool for designing and executing bioinformatics workflows, created by the myGrid project. Taverna allows users to integrate many different software tools, including web services.	European Bioinformatics Institute (EBI); IT Innovation; School of Computer Science, University of Newcastle; School of Computer Science at the University of Manchester; Nottingham University Mixed Reality Lab	See University info below	Taverna: Lessons ... Paper

Triana	The Triana project is an open source problem solving environment developed at Cardiff University that combines an intuitive visual interface with powerful data analysis tools.	Cardiff University	Team with e-mail	Papers

Pegasus	Pegasus is a flexible framework, developed at the Information Sciences Institute at the University of Southern California, that enables the mapping of complex scientific workflows onto the Grid.	University of Southern California, CS Dept	People Page of Pegasus site	Publications page of Pegasus site

Kepler	The Kepler workflow system enables scientists in a variety of disciplines like biology, ecology and astronomy to compose and execute workflows. Kepler is based on the Ptolemy II system for heterogeneous, concurrent modeling and design. Ptolemy II was developed by the members of the Ptolemy project at University of California Berkeley.		Members (bottom of pg)	Publications

DiscoveryNet	DiscoveryNet is a £2m EPSRC-funded project at Imperial College London to build an e-Science platform for scientific discovery from the data generated by a wide variety of high-throughput devices.	The London e-Science Centre, Imperial College London	Team	Papers

BioSense	BioSense is the national program designed to improve the nation's capabilities for real-time biosurveillance and situational awareness at a time when the vast number of health-related information systems that exist nationally vary in their ability to share data to support immediate biosurveillance needs. By providing access to data from hospitals and healthcare systems in major metropolitan cities across the nation, BioSense is connecting existing health information to public health in a way not previously possible.	The Centers for Disease Control and Prevention (CDC)	Not sure; contact biosensehelp@cdc.gov	Papers

Wildfire	Wildfire is a distributed, Grid-enabled workflow construction and execution environment. It has a graphical user interface for constructing and running workflows. Wildfire borrows user interface features from Jemboss and adds a drag-and-drop interface allowing the user to compose EMBOSS (and other) programs into workflows. For execution, Wildfire uses GEL, the underlying workflow execution engine, which can exploit available parallelism on multiple-CPU machines including Beowulf-class clusters and Grids.		See Wildfire link

MolQuest	"... is a desktop application for sequence analysis and molecular biology data management. MolQuest includes the fastest and most accurate family of eukaryotic genefinding programs, fgenesh/fgenesh+, for a variety of different genomes, as well as pipelines for fully automatic annotation of eukaryotic (fgenesh++) and bacterial (fgenesb) genomes, that were widely used in scientific publications and are well-known for their accuracy. The package provides a user-friendly interface for sequence editing, primer design, internet database searches, gene prediction, promoter identification, regulatory elements mapping, patterns discovery, protein analysis, multiple sequence alignment, phylogenetic reconstruction, and a wide variety of other functions..."	SoftBerry, Inc	Not sure; link to contact page

Bioclipse	The Bioclipse project is aimed at creating a Java-based, open source, visual platform for chemo- and bioinformatics based on the Eclipse Rich Client Platform (RCP). Bioclipse will provide functionality for chemo- and bioinformatics, and extension points that can easily be extended by plugins to provide added functionality. The first version of Bioclipse includes a CDK-plugin (bc_cdk) to provide a chemoinformatic backend, a Jmol-plugin (bc_jmol) for 3D-visualization and a general logging plugin.	Bioclipse is developed as a collaboration between the Proteochemometric Group, Dept. of Pharmaceutical Biosciences, Uppsala University, Sweden, and the Research Group for Molecular Informatics at Cologne University Bioinformatics Center (CUBIC).	Uppsala U Team (Scroll Down); Cologne U Team	Papers Papers

LabVIEW	LabVIEW (short for Laboratory Virtual Instrumentation Engineering Workbench) is a platform and development environment for a visual programming language from National Instruments. The graphical language is named "G". Originally released for the Apple Macintosh in 1986, LabVIEW is commonly used for data acquisition, instrument control, and industrial automation on a variety of platforms including Microsoft Windows, various flavors of UNIX, Linux, and Mac OS.	National Instruments. (There are a lot of visual programming languages, but not many are for bioinformatics.)	Not sure about the development team; there is a large LabVIEW user community

University / Organization	Name	E-mail	Related Project	Papers (as starting point)

School of Computer Science at the University of Manchester	Informatics Process Group (IPG); Bio-Health Informatics	Their main Staff Directory	myGrid, Taverna, ISPIDER, Qurator, etc.; CLEF, ComparaGRID, CO-ODE / HyontUse, Sealife

School of Computer Science, University of Newcastle	The North-East Regional e-Science Centre (NEReSC)	Their main Staff Directory	Dynasoar, myGrid, Microbase, BASIS, etc.	Taverna: a tool for the composition and enactment of bioinformatics workflows (paper)

Stanford Medical Informatics	People

Other Conferences (Related to Bioinformatics)

Conferences	Venue / Date	Scale	Description

International Society for Computational Biology	ISMB, annual PSB, annual ECCB, RECOMB, CHI conferences, and regional and commercial conferences. Past Conferences.	International, Huge	Organizes many conferences; PSB is one of them.

The Fourth Asia Pacific Bioinformatics Conference	13-16 Feb, 2006, Taiwan	Annual

9th International Northern European Bioinformatics Conference	June 4 - 7, 2007, Umeå, Sweden	(annual) 9th meeting organised by the Society for Bioinformatics in Northern Europe (SocBiN)

German Conference on Bioinformatics 2006	Tübingen, 20-22 September 2006	Annual, International organised by German organizations	organized by the Center for Bioinformatics Tübingen (ZBIT) and MPI for Developmental Biology.

2006 LSS Computational Systems Bioinformatics Conference	Aug 14 - 18, 2006, Stanford University, California	Fifth conference. Annual

BeNeLux BioInformatics Conference (BBC2006)	17th and 18th of October 2006, WICC - hotel and conference centre - Wageningen, The Netherlands	2nd conference. Annual	organized by ... KNCV Workgroup BioInformatics, Belgium Bioinformatics Groups, Local Committee Wageningen

Ohio Collaborative Conference on Bioinformatics	Miami University, Oxford, Ohio, July 9-11, 2007	Annual, regional	Aims to foster long-term collaborative relationships among informatics and life sciences researchers from academia, government and industry, spanning interests across Ohio

International Conference on Bioinformatics and its Applications (ICBA’04)	December 16-19, 2004 Nova Southeastern University Fort Lauderdale, Florida, USA	One Time Conference

O'Reilly Bioinformatics Technology Conference 2003	Westin Horton Plaza, San Diego, CA, Feb 3-6, 2003	One Time Conference

Some thoughts:

Long term goal:
Use the conference as a forum for international bioinformatics workflow system users / developers to present, discuss and agree on commonly acceptable data formats, operating platforms, or simply a way to organize and perform bioinformatics workflow tasks.

Discussion forum:
Potential future research / project opportunities for bioinformatics workflow systems.
Potential enhancement opportunities for existing systems.
Potential enhancements to existing systems to achieve a common format.

Workshop / tutorial (1 to 2 hours):
Invite commercial (e.g. MolQuest) / academic () workflow system developers to conduct a tutorial workshop on their system.

NETTAB changes its focus (theme) each year to attract papers from different areas of bioinformatics workflow. This conference could instead divide workflow into different areas (e.g. Grid, web services, semantic web, workflow applications, health care implications, visual language community, etc.) to attract people from each of them.