Installing Bioperl on Windows

* 6 Bioperl on Windows
* 7.1 Setting environment variables
* 7.2 Installing bioperl-db
* 9 bioperl-db in Cygwin
* 11 MySQL and DBD::mysql
* 13 Directory for temporary files

This installation guide was written by Barry Moore, Nathan Haigh
and other Bioperl authors based on the original work of Paul Boutros. The
guide was updated for the BioPerl wiki by Chris Fields and Nathan
Haigh.

Please report problems and/or fixes to the BioPerl mailing list.

An up-to-date version of this document can be found on the BioPerl wiki:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

1) Only ActivePerl >= 5.8.8.819 is supported by the Bioperl team.
Earlier versions may work, but we do not support them.

One of the reasons for this requirement is that ActivePerl >= 5.8.8.819 now
uses Perl Package Manager 4 (PPM4). PPM4 is superior to earlier
versions and also includes a Graphical User Interface (GUI). In short,
it's easier for us to produce and maintain a package for installation via
PPM and also easier for you to do the install! Proceed with earlier
versions at your own risk.

1) Download the ActivePerl MSI from ActiveState.

2) Run the ActivePerl Installer (accepting all defaults is fine).

3) Start the Perl Package Manager GUI from the Start menu.

4) Go to Edit >> Preferences and click the Repositories tab. Add a
new repository for each of the following:

+----------------------------------------------------------------+
|Name                      | Location                            |
|--------------------------+-------------------------------------|
|BioPerl-Release Candidates| http://bioperl.org/DIST/RC          |
|--------------------------+-------------------------------------|
|BioPerl-Regular Releases  | http://bioperl.org/DIST             |
|--------------------------+-------------------------------------|
|Kobes                     | http://theoryx5.uwinnipeg.ca/ppms   |
|--------------------------+-------------------------------------|
|Bribes                    | http://www.Bribes.org/perl/ppm      |
+----------------------------------------------------------------+

5) Select View >> All Packages.

6) In the search box type bioperl.

7) Right click the latest version of Bioperl available and choose

7a) This package will be sufficient for the main
functionality of Bioperl. However, if you require
full functionality, you should also install the
latest Bundle-BioPerl package.

8) Click the green arrow (Run marked actions) to complete the

9) Go to the Bioperl Wiki and start reading documentation.

Bioperl is a large collection of Perl modules (extensions to the
Perl language) that aid in the task of writing Perl code to deal
with sequence data in a myriad of ways. Bioperl provides objects for
various types of sequence data and their associated features and
annotations. It provides interfaces for analysis of these sequences with a
wide variety of external programs (BLAST, FASTA, clustalw and
EMBOSS to name just a few). It provides interfaces to various types of
databases, both remote (GenBank, EMBL etc.) and local (MySQL,
flat files, GFF etc.), for storage and retrieval of
sequences. And finally, with its associated documentation and
mailing lists, Bioperl represents a community of bioinformatics
professionals working in Perl who are committed to supporting both
development of Bioperl and the new users who are drawn to the project.

While most bioinformatics and computational biology applications are
developed in UNIX/Linux environments, more and more programs are
being ported to other operating systems like Windows, and many users
(often biologists with little background in programming) are looking for
ways to automate bioinformatics analyses in the Windows environment.

Perl and Bioperl can be installed natively on Windows NT/2000/XP.
Most of the functionality of Bioperl is available with this type of
install. Much of the heavy lifting in bioinformatics is done by programs
originally developed in lower level languages like C and Pascal
(e.g. BLAST, clustalw, Staden etc). Bioperl simply acts as
a wrapper for running and parsing output from these external programs.

Some of those programs (BLAST for example) are ported to Windows.
These can be installed and work quite happily with Bioperl in the native
Windows environment. Some external programs such as Staden and the
EMBOSS suite of programs can only be installed on Windows by using
Cygwin and its gcc C compiler (see Bioperl in Cygwin, below).
Recent attempts to port EMBOSS to Windows, however, have been mostly

If you have a fairly simple project in mind, want to start using Bioperl
quickly, only have access to a computer running Windows, and/or don't mind
bumping up against some limitations then Bioperl on Windows may be a
good place for you to start. For example, downloading a bunch of sequences
from GenBank and sorting out the ones that have a particular
annotation or feature works great. Running a bunch of your sequences
against remote or local BLAST, parsing the output and storing it
in a MySQL database would be fine also.

Be aware that most Bioperl developers are working in some type of a
UNIX environment (Linux, OS X, Cygwin). If you have
problems with Bioperl that are specific to the Windows environment, you
may be blazing new ground and your pleas for help on the Bioperl mailing
list may get few responses (you can but try!) - simply because no one
knows the answer to your Windows specific problem. If this is or becomes a
problem for you then you are better off working in some type of UNIX-like
environment. One solution to this problem that will keep you working on a
Windows machine is to install Cygwin, a UNIX emulation environment for
Windows. A number of Bioperl users are using this approach successfully
and it is discussed in more detail below.

There are a couple of ways of installing Perl on a Windows machine. The
most common and easiest is to get the most recent build from
ActiveState, a software company that provides free builds of Perl for
Windows users. The current (October 2006) build is ActivePerl 5.8.8.819.
Bioperl also works on Perl 5.6.x but due to installation problems etc,
only ActivePerl 5.8.8.819 or later is supported. To install ActivePerl on

1) Download the ActivePerl MSI from
http://www.activestate.com/Products/ActivePerl/.

2) Run the ActivePerl Installer (accepting all defaults is fine).

You can also build Perl yourself (which requires a C compiler) or download
one of the other binary distributions. The Perl source for building it
yourself is available from CPAN, as are a few other binary
distributions that are alternatives to ActiveState. This approach is not
recommended unless you have specific reasons for doing so and know what
you're doing. If that's the case you probably don't need to be reading

Cygwin is a UNIX emulation environment for Windows and comes with
its own copy of Perl.

Information on Cygwin and Bioperl is found below.

Perl is a programming language that has been extended a lot by the
addition of external modules.

These modules work with the core language to extend the functionality of

Bioperl is one such extension to Perl. These modular extensions to
Perl sometimes depend on the functionality of other Perl modules and this
creates a dependency. You can't install module X unless you have already
installed module Y. Some Perl modules are so fundamentally useful that the
Perl developers have included them in the core distribution of Perl - if
you've installed Perl then these modules are already installed. Other
modules are freely available from CPAN, but you'll have to install them
yourself if you want to use them. Bioperl has such dependencies.

Bioperl is actually a large collection of Perl modules (over 1000
currently) and these modules are split into seven packages. These seven

+------------------------------------------------------------------------+
| Bioperl Group        | Functions                                       |
|----------------------+-------------------------------------------------|
|bioperl (the core)    |Most of the main functionality of Bioperl        |
|----------------------+-------------------------------------------------|
|bioperl-run           |Wrappers to a lot of external programs           |
|----------------------+-------------------------------------------------|
|bioperl-ext           |Interaction with some alignment functions and the|
|----------------------+-------------------------------------------------|
|bioperl-db            |Using Bioperl with BioSQL and local relational   |
|----------------------+-------------------------------------------------|
|bioperl-microarray    |Microarray specific functions                    |
|----------------------+-------------------------------------------------|
|bioperl-pedigree      |Manipulating genotype, marker, and individual    |
|                      |data for linkage studies                         |
|----------------------+-------------------------------------------------|
|bioperl-gui           |Some preliminary work on a graphical user        |
|                      |interface to some Bioperl functions              |
+------------------------------------------------------------------------+

The Bioperl core is what most new users will want to start with. Bioperl
(the core) and the Perl modules that it depends on can be easily installed
with the Perl Package Manager (PPM). PPM is an ActivePerl utility for
installing Perl modules on systems using ActivePerl. PPM will look online
(you have to be connected to the internet of course) for files (these
files end with .ppd) that tell it how to install the modules you want and
what other modules your new modules depend on. It will then download and
install your modules and all dependent modules for you.

These .ppd files are stored online in PPM repositories. ActiveState
maintains the largest PPM repository, and when you installed ActivePerl, PPM
was installed with directions for using the ActiveState repositories.
Unfortunately the ActiveState repositories are far from complete, and other
ActivePerl users maintain their own PPM repositories to fill in the gaps.
Installing Bioperl will require you to direct PPM to look in the
additional repositories listed in the Installation Guide.

Once PPM knows where to look for Bioperl and its dependencies you simply
tell PPM to search for packages with a particular name, select those of
interest and then tell PPM to install the selected packages.

You may find that you want some of the features of other Bioperl groups
like bioperl-run or bioperl-db. Currently, plans include setting up PPM
packages for installing these parts of Bioperl; check this by doing a
Bioperl search in PPM. If these are not available, though, you can use
the following instructions for installing the other distributions.

For this you will need a Windows version of the program make, called nmake:

http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe

You will also want to have a willingness to experiment. You'll have to
read the installation documents for each component that you want to
install, and use nmake wherever the instructions call for make (for
example, 'nmake test' in place of 'make test').

'nmake test' will likely produce lots of warnings; many of these can be
safely ignored (they stem from the excessively paranoid '-w' flag in
ActivePerl). You will have to determine from the installation documents
what dependencies are required, and you will have to get them, read their
documentation and install them first. It is recommended that you look
through the PPM repositories for any modules before resorting to
nmake, as there is no guarantee that modules built using nmake will work.
The details of this are beyond the scope of this guide. Read the
documentation. Search Google. Try your best, and if you get stuck consult
with others on the BioPerl mailing list.

Setting environment variables

Some modules and tools, such as Bio::Tools::Run::StandAloneBlast and
clustalw, require that environment variables are set; a few examples
are listed in the INSTALL document. Different versions of Windows utilize
different methods for setting these variables. NOTE: The instructions that
come with the BLAST executables for setting up BLAST on Windows are
out-of-date. Go to the following web address for instructions on setting
up standalone BLAST for Windows:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/pc_setup.html

* For Windows XP, use Control Panel >> System >> Advanced >>
  Environment Variables. This does not require a reboot, but shells that
  are already open will not reflect the changes.
* For older versions (Windows 95 to ME), generally editing the
  C:\autoexec.bat file to add a variable works. This requires a reboot.

  set BLASTDB=C:\blast\data

For either case, you can check the variable this way:

  C:\Documents and Settings\Administrator>echo %BLASTDB%

Some versions of Windows may have problems differentiating forward and
back slashes used for directories. In general, always use backslashes (\).
If something isn't working properly try reversing the slashes to see if it

For setting up Cygwin environment variables quirks, see an example

Installing bioperl-db

bioperl-db now works on Windows without installing Cygwin. This has
primarily been tested on WinXP using MySQL5, but it is expected that other
bioperl-db supported databases (PostgreSQL, Oracle) should work.

You will need Bioperl rel. 1.5.2, a relational database (I use MySQL5 here
as an example), and the Perl modules DBI and DBD::mysql, which
can be installed from PPM as described above (make sure the additional
repositories for Kobes and Bribes are added; they will have the latest
releases). Do NOT try using nmake with these modules as they will not
build correctly under Windows! The PPM builds, by Randy Kobes, have been
modified and tested specifically for Windows and ActivePerl.

NOTE: we plan on having a PPM for bioperl-db available along with the
regular bioperl 1.5.2 release PPM. We will post instructions at that
time on using PPM to install bioperl-db.

To begin, follow the instructions detailed in the Installation Guide for
adding the three new repositories (Bioperl, Kobes and Bribes). Then
install the following packages:

The next step involves creating a database. The following steps are for

  >mysqladmin -u root -p create bioseqdb
  Enter password: **********

The database needs to be loaded with the BioSQL schema, which can be
downloaded as a tarball here.

  >mysql -u root -p bioseqdb < biosqldb-mysql.sql
  Enter password: **********

Download bioperl-db from CVS. Use the following to install the

Now, for testing out bioperl-db, make a copy of the file
DBHarness.conf.example in the bioperl-db test subdirectory (bioperl-db\t).
Rename it to DBHarness.biosql.conf, and modify it for your database setup
(particularly the user, password, database name, and driver). Save the
file, change back to the main bioperl-db directory, and run 'nmake test'.
You may see lots of the following lines,

  Subroutine Bio::Annotation::Reference::(eq redefined at C:/Perl/lib/overload.pm line 25,
  Subroutine new redefined at C:\Perl\src\bioperl\bioperl-live/Bio\Annotation\Reference.pm line 80,

which can be safely ignored (again, these come from ActivePerl's paranoid
'-w' flag). All tests should pass. NOTE: tests should be run with
a clean database with the BioSQL schema loaded, but without taxonomy loaded

It is recommended that you load the taxonomy database using the script
load_ncbi_taxonomy.pl included in biosql-schema\scripts. You will need to
download the latest taxonomy files. This can be accomplished using the
-download flag in load_ncbi_taxonomy.pl, but it will not 'untar' the file
correctly unless you have GNU tar present in your PATH (which most Windows
users will not have), thus causing the following error:

  >load_ncbi_taxonomy.pl -download -driver mysql -dbname bioseqdb -dbuser root -dbpass **********
  The system cannot find the path specified.
  Loading NCBI taxon database in taxdata:
  ... retrieving all taxon nodes in the database
  ... reading in taxon nodes from nodes.dmp
  Couldn't open data file taxdata/nodes.dmp: No such file or directory rollback ineffective with
  AutoCommit enabled at C:\Perl\src\bioperl\biosql-schema\scripts\load_ncbi_taxonomy.pl line 818.
  Rollback ineffective while AutoCommit is on at
  C:\Perl\src\bioperl\biosql-schema\scripts\load_ncbi_taxonomy.pl line 818.
  rollback failed: Rollback ineffective while AutoCommit is on

Use a file decompression utility like 7-Zip to 'untar' the files in
the folder (if using 7-Zip, this can be accomplished by right-clicking on
the file and using the option 'Extract here'). Rerun the script without
the -download flag to load the taxonomic information. Be patient, as this
can take quite a while:

  >load_ncbi_taxonomy.pl -driver mysql -dbname bioseqdb -dbuser root -dbpass **********
  Loading NCBI taxon database in taxdata:
  ... retrieving all taxon nodes in the database
  ... reading in taxon nodes from nodes.dmp
  ... insert / update / delete taxon nodes
  ... (committing nodes)
  ... rebuilding nested set left/right values
  ... reading in taxon names from names.dmp
  ... deleting old taxon names
  ... inserting new taxon names
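
If GNU tar is available (for example from a Cygwin shell), the manual
extraction step described above can be scripted instead of using 7-Zip.
The following is only a sketch: extract_taxdump is a hypothetical helper
name, and the archive name taxdump.tar.gz and the taxdata directory are
assumptions based on the loader output shown above. Adjust both to match
what the -download step actually left on disk.

```shell
#!/bin/sh
# Hypothetical helper: unpack an NCBI taxonomy archive and confirm that the
# dump files named in the loader output (nodes.dmp, names.dmp) are present.
extract_taxdump() {
    archive=$1              # e.g. taxdata/taxdump.tar.gz (assumed name)
    dest=${2:-taxdata}      # directory the loader reads from (assumed)

    mkdir -p "$dest" || return 1
    tar -xzf "$archive" -C "$dest" || return 1

    for f in nodes.dmp names.dmp; do
        if [ ! -f "$dest/$f" ]; then
            echo "missing $dest/$f" >&2
            return 1
        fi
    done
    echo "taxonomy files ready in $dest"
}

# Usage (from the directory where load_ncbi_taxonomy.pl was run):
#   extract_taxdump taxdata/taxdump.tar.gz taxdata
```

Once the function reports that the files are ready, rerun
load_ncbi_taxonomy.pl without the -download flag as shown above.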

Now, load the database with your sequences using the script
load_seqdatabase.pl, in the bioperl-db\scripts\biosql directory:

  C:\Perl\src\bioperl\bioperl-db\scripts\biosql>load_seqdatabase.pl -driver mysql
  -dbname bioseqdb -dbuser root -dbpass **********
  Loading NP_249092.gpt ...

You may see occasional errors depending on the sequence format, which is a
non-platform-related issue. Many of these are due to not having an updated
taxonomic database and may be rectified by updating the taxonomic
information as detailed in load_ncbi_taxonomy.pl's POD.

Thanks to Baohua Wang, who found the initial Windows-specific problem in
Bio::Root::Root that led to this fix, to Sendu Bala for fixing
Bug #1938, and to Hilmar Lapp for his input.

Cygwin is a Unix emulator and shell environment available free at
http://www.cygwin.com. Bioperl v. 1.* runs well within Cygwin. Some
users claim that installation of Bioperl is easier within Cygwin than
within Windows, but these may be users with UNIX backgrounds. A note on
Cygwin: it doesn't write to your Registry, it doesn't alter your system or
your existing files in any way, it doesn't create partitions, it simply
creates a cygwin/ directory and writes all of its files to that directory.
To uninstall Cygwin just delete that directory.

One advantage of using Bioperl in Cygwin is that all the external modules
are available through CPAN - the same cannot be said of ActiveState's PPM

To get Bioperl running first install the basic Cygwin package as well as
the Cygwin perl, make, binutils, and gcc packages. Clicking the View
button in the upper right of the installer window enables you to see
details on the various packages. Then start up Cygwin and follow the
Bioperl installation instructions for UNIX in Bioperl's INSTALL file
(for example, THE BIOPERL BUNDLE and INSTALLING BIOPERL THE EASY WAY USING

This package is installed using the instructions contained in the package,
without modification. Since PostgreSQL is a package within Cygwin, it is
probably the easiest of the three database platforms supported in
bioperl-db to install (PostgreSQL, MySQL, Oracle).

If you can, install Cygwin on a drive or partition that's
NTFS-formatted, not FAT32-formatted. When you install Cygwin on
a FAT32 partition you will not be able to set permissions and ownership
correctly. In most situations this probably won't make any difference but
there may be occasions where this is a problem.

If you're trying to use some application or resource outside of the Cygwin
directory and you're having a problem, remember that Cygwin's path syntax
may not be the correct one. Cygwin understands /home/jacky or
/cygdrive/e/cygwin/home/jacky (when referring to the E: drive) but the
external resource may want E:/cygwin/home/jacky. So your *rc files may end
up with paths written in these different syntaxes, depending on which
program reads them.
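
To make the two path syntaxes concrete, here is a minimal sketch of the
conversion from the /cygdrive form to the drive-letter form. The function
name to_mixed_path is hypothetical and exists only for illustration; inside
Cygwin the bundled cygpath utility (for example 'cygpath -m <path>') does
the same job more robustly and also handles paths such as /home/jacky that
are not under /cygdrive.

```shell
#!/bin/sh
# Hypothetical helper: rewrite /cygdrive/<letter>/... as <letter>:/...
# Paths that are not under /cygdrive are printed unchanged.
to_mixed_path() {
    case "$1" in
        /cygdrive/?/*)
            drive=${1#/cygdrive/}     # e/cygwin/home/jacky
            rest=${drive#?}           # /cygwin/home/jacky
            drive=${drive%"$rest"}    # e
            printf '%s:%s\n' "$drive" "$rest"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

to_mixed_path /cygdrive/e/cygwin/home/jacky   # prints e:/cygwin/home/jacky
```

The result uses forward slashes, matching the E:/cygwin/home/jacky style
mentioned above.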

You may want to install a relational database in order to use
bioperl-db, BioSQL or OBDA. The easiest way to install MySQL is to use
the Windows binaries available at http://www.mysql.com. Note that
Windows does not have Unix sockets, so you need to force the MySQL
connections to use TCP/IP instead. Do this by using the -h, or host,
option from the command-line. Example:

  >mysql -h 127.0.0.1 -u <user> -p<password> <database>
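
Because it is easy to forget the host option, one approach is to wrap the
client in a small shell function in the Cygwin shell. This is only a
sketch: the name mysql_tcp is hypothetical, and it assumes the server
listens on 127.0.0.1 as in the example above. The --protocol=TCP option of
the mysql client makes the choice of transport explicit.

```shell
#!/bin/sh
# Hypothetical wrapper: always reach the Windows MySQL server over TCP/IP.
# All remaining arguments are passed through to the mysql client unchanged.
mysql_tcp() {
    mysql --protocol=TCP -h 127.0.0.1 "$@"
}

# Usage:
#   mysql_tcp -u <user> -p <database>
```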

Alternatively you could install PostgreSQL instead of MySQL; PostgreSQL is
already a package in Cygwin.

One known issue is that DBD::mysql can be tricky to install in Cygwin
and this module is required for the bioperl-db, BioSQL, and
bioperl-pipeline external packages. Fortunately there's some good

* Instructions included with DBD::mysql:

  http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin

* Additional instructions if you run into any problems; this
  information is more up-to-date, covers post-2.9 DBD::mysql quirks in

  http://rage.against.org/installingdbdmysqlInCygwin

Note that expat comes with Cygwin (it's used by the modules
XML::Parser and XML::SAX::ExpatXS, which are used by certain

Directory for temporary files

Set the environment variable TMPDIR; programs like BLAST and
clustalw need a place to create temporary files. For example:

  setenv TMPDIR e:/cygwin/tmp     # csh, tcsh
  export TMPDIR=e:/cygwin/tmp     # sh, bash

This is not the syntax that Cygwin understands, which would be something
like /cygdrive/e/cygwin/tmp or /tmp; this is the syntax that a Windows

If this variable is not set correctly you'll see errors like this when you
run Bio::Tools::Run::StandAloneBlast:

  ------------- EXCEPTION: Bio::Root::Exception -------------
  MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory
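
Before running the BLAST wrapper it can help to confirm that TMPDIR really
points at a directory where files can be created. The check below is a
rough sketch for the Cygwin shell, not part of Bioperl: check_tmpdir and
the probe file name are hypothetical, and it assumes that Cygwin accepts
the drive-letter path form used in the examples above.

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm TMPDIR (or the directory given as
# the first argument) exists and that a file can actually be created there.
check_tmpdir() {
    dir=${1:-${TMPDIR:-}}
    if [ -z "$dir" ]; then
        echo "TMPDIR is not set" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "TMPDIR does not exist: $dir" >&2
        return 1
    fi
    probe="$dir/bioperl_tmp_probe.$$"
    if ! touch "$probe" 2>/dev/null; then
        echo "cannot create files in: $dir" >&2
        return 1
    fi
    rm -f "$probe"
    echo "TMPDIR looks usable: $dir"
}

# Usage:
#   check_tmpdir                  # checks $TMPDIR
#   check_tmpdir e:/cygwin/tmp    # checks an explicit directory
```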

If you want to use BLAST we recommend that the Windows binary be obtained
from NCBI (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST/ - the
file will be named something like blast-2.2.13-ia32-win32.exe). Then
follow the Windows instructions in README.bls. You will also need to set
the BLASTDIR environment variable to reflect the directory which holds the
blast executable and data folder. You may also want to set other variables
to reflect the location of your databases and substitution matrices if
they differ from the location of your blast executables; see
Installing Bioperl for Unix for more details.

Although we've recommended using the BLAST and MySQL binaries,
you should be able to compile just about everything else from source code
using Cygwin's gcc. You'll notice when you're installing Cygwin that many
different libraries are also available (gd, jpeg, etc.).