whereas others were made Open Source by releasing in-
house codebases under liberal licenses. When the Blue
Obelisk was established five years ago, the primary
toolkits under active development were the Chemistry
Development Kit (CDK) [5,6], Open Babel [7], and JOE-
Lib [8]. Of these, both the CDK and Open Babel con-
tinue to be actively developed.
The CDK project has been under regular development
over the last five years. Several features have been
implemented, ranging from core components such as an
extensible SMARTS matching system and a new graph
(and subgraph) isomorphism method [9], to more
application-oriented components such as 3D pharmacophore
searching and matching, and a variety of structural-key
and hashed fingerprints. In addition, a number of
second-generation tools have been developed on top of
the CDK (see below). Besides its use in these tools,
the CDK has been deployed in the form of web
services [10] and has formed the basis of a variety of
web applications.
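To make the idea of a hashed fingerprint concrete, the following is a minimal sketch of a linear-path fingerprint with a Tanimoto comparison. It is not the CDK implementation: the function names, the toy atom/bond representation, the CRC32 hash, and the default sizes are all assumptions chosen for illustration, and bond orders and hydrogens are ignored.

```python
import zlib


def path_fingerprint(atoms, bonds, max_atoms=4, n_bits=1024):
    """Return the set of bit positions set by a hashed linear-path fingerprint.

    atoms: list of element symbols, e.g. ["C", "C", "O"] (hypothetical format)
    bonds: list of (i, j) atom-index pairs; bond orders are ignored here
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    bits = set()

    def walk(path):
        # Use the smaller of the forward and reversed symbol lists so that
        # a path and its reverse hash to the same bit.
        symbols = [atoms[k] for k in path]
        label = "-".join(min(symbols, symbols[::-1]))
        bits.add(zlib.crc32(label.encode("ascii")) % n_bits)
        # Extend the path to unvisited neighbours, up to max_atoms atoms.
        if len(path) < max_atoms:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return bits


def tanimoto(a, b):
    """Tanimoto coefficient of two sets of bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


# Toy heavy-atom graphs (illustrative input only).
ethanol = path_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
propanol = path_fingerprint(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
similarity = tanimoto(ethanol, propanol)
```

Because every linear path in ethanol also occurs in 1-propanol, the ethanol bits are a subset of the propanol bits. This subset property is what allows hashed fingerprints to be used as a fast pre-screen before a full substructure match.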
Since 2006, major new features of Open Babel include
3D structure generation and 2D structure-diagram
generation, UFF and MMFF94 force fields, and significantly
expanded support for computational chemistry
calculations. In addition, a major focus of Open Babel
development has been to provide accurate conversion and
representation in the areas of stereochemistry, kekulisation,
and canonicalisation. The project has also grown in
terms of new contributors, new support from commercial
companies, and second-generation tools applying
Open Babel to a variety of end-user applications, from
molecular editors to chemical database systems.
Two new Open Source cheminformatics toolkits have
appeared since the original paper. In 2006 Rational Dis-
covery, a cheminformatics service company (since closed
down), released RDKit [11] under the BSD License. This
is a C++ library with Python and (more recently) Java
bindings. RDKit is actively developed and includes code
donated by Novartis. Recent developments include the
Java bindings, as well as performance improvements for
its database cartridge.
More recently, GGA Software Services (a contract
programming company) released the Indigo toolkit [12]
and associated software in 2009 under the GPL. Indigo
is a C++ library with high-level wrappers in C, Java,
Python, and the .NET environment. Like RDKit and
other toolkits, Indigo provides support for tetrahedral
and cis-trans stereochemistry, 2D coordinate generation,
exact/substructure/SMARTS matching, fingerprint gen-
eration, and canonical SMILES computation. It also pro-
vides some less common functionality, like matching
tautomers and resonance substructures, enumeration of
subgraphs, finding the maximum common substructure of
N input structures, and enumerating reaction products.
Second-generation tools
Although feature-rich and robust cheminformatics
toolkits are useful in and of themselves, they can also be
seen as providing a base layer on which additional tools
and applications can be built. This is one of the reasons
that cheminformatics toolkits are so important to the
Open Source ‘ecosystem’; their availability lowers the
barrier for the development of a ‘second generation’ of
chemistry software that no longer needs to concern
itself with the low-level details of manipulating chemical
structures, and can focus on providing additional
functionality and ease-of-use. A wide range of
chemistry software has been built using Blue Obelisk
components (see, for example, the “Related Software”
link on the Open Babel website [13], which lists over 40
projects as of this writing, or “Software using CDK” at the
CDK website). In this section, however, we focus on
second-generation tools that have themselves been
developed by members of the Blue Obelisk.
Bioclipse [14] (v2.4 released in Aug 2010) and Avoga-
dro [15] (v1.0 in Oct 2009) are two examples of such
software, based on the CDK and Open Babel,
respectively. Bioclipse (Figure 1) is an award-winning
molecular workbench for life sciences that wraps
cheminformatics functionality behind user-friendly
interfaces and graphical editors, while Avogadro (Figure 2) is
a 3D molecular editor and viewer aimed at preparing
and analysing computational chemistry calculations.
Both projects are designed to be extended or scripted by
users through the provision of a plugin architecture and
scripting support (using the Bioclipse Scripting Language
[16], or Python in the case of Avogadro). An interesting
aspect of both Avogadro and Bioclipse is that they share
some developers with the underlying toolkits, and this
has driven the development of new features in the CDK
and Open Babel.
Both products in turn act as extensible platforms for
other software. Bioclipse, for example, is used by
software such as Brunn [17], a laboratory information
system for microplate-based high-throughput screening.
Brunn provides a graphical interface for handling
different plate layouts and dilution series and can
automatically generate dose-response curves and calculate
IC50 values. Avogadro is used by Kalzium [18], a periodic
table and chemical editor in KDE, and XtalOpt [19,20],
an evolutionary algorithm for crystal structure predic-
tion. XtalOpt provides a graphical interface using Avo-
gadro and submits calculations using a range of solid-
state simulation software to predict stable polymorphs.
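As an illustration of what calculating an IC50 value from a dilution series involves, the sketch below estimates the concentration at which the response crosses 50% by interpolating linearly on a log10 concentration scale. This is not Brunn's method, which the text does not describe; the function name, the input layout (percent activity remaining at each concentration), and the example numbers are assumptions made up for illustration. A production tool would more likely fit a sigmoidal model to the full curve.

```python
import math


def ic50_by_interpolation(concentrations, responses):
    """Estimate IC50 by log-linear interpolation between bracketing points.

    concentrations: positive values in any consistent unit
    responses: percent activity remaining (100 = no inhibition)
    Returns None if the measured curve never crosses 50%.
    """
    points = sorted(zip(concentrations, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        # Look for a pair of neighbouring points that brackets 50%.
        crosses = (r_lo - 50.0) * (r_hi - 50.0) <= 0
        if crosses and r_lo != r_hi:
            frac = (r_lo - 50.0) / (r_lo - r_hi)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None


# Made-up dilution series (illustrative numbers, not measured data).
doses = [0.1, 1.0, 10.0, 100.0]
activity = [95.0, 75.0, 25.0, 5.0]
estimate = ic50_by_interpolation(doses, activity)
```

In this made-up series the 50% crossing lies halfway between 1 and 10 on the log scale, so the estimate is the geometric mean of the two bracketing concentrations.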
A final example of second-generation Blue Obelisk
software is the AMBIT2 [21,22] software, which was
developed to facilitate registration of chemicals for the
REACH EU directive, and is based on the CDK. It was
distributed initially as a standalone Java Swing GUI, and
O’Boyle et al. Journal of Cheminformatics 2011,3:37
http://www.jcheminf.com/content/3/1/37