INSTALL.WIN

                         Installing Bioperl on Windows

   Contents

     * 1 Introduction
     * 2 Requirements
     * 3 Installation Guide
     * 4 Bioperl
     * 5 Perl on Windows
     * 6 Bioperl on Windows
     * 7 Beyond the Core

          * 7.1 Setting environment variables
          * 7.2 Installing bioperl-db

     * 8 Bioperl in Cygwin
     * 9 bioperl-db in Cygwin
     * 10 Cygwin tips
     * 11 MySQL and DBD::mysql
     * 12 Expat
     * 13 Directory for temporary files
     * 14 BLAST
     * 15 Compiling C code

Introduction

   This installation guide was written by Barry Moore, Nathan Haigh
   and other Bioperl authors based on the original work of Paul Boutros. The
   guide was updated for the BioPerl wiki by Chris Fields and Nathan
   Haigh.

   Please report problems and/or fixes to the BioPerl mailing list.

   An up-to-date version of this document can be found on the BioPerl wiki:

   http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Requirements

           1) Only ActivePerl >= 5.8.8.819 is supported by the Bioperl team.
           Earlier versions may work, but we do not support them.

   One of the reasons for this requirement is that ActivePerl >= 5.8.8.819
   uses Perl Package Manager 4 (PPM4). PPM4 is superior to earlier
   versions and also includes a Graphical User Interface (GUI). In short,
   it's easier for us to produce and maintain a package for installation via
   PPM, and easier for you to do the install! Proceed with earlier
   versions at your own risk.

Installation Guide

           1) Download the ActivePerl MSI from ActiveState.

           2) Run the ActivePerl Installer (accepting all defaults is fine).

           3) Start the Perl Package Manager GUI from the Start menu.

           4) Go to Edit >> Preferences and click the Repositories tab. Add a
           new repository for each of the following:

                              Repositories to add
       +----------------------------------------------------------------+
       |           Name           |              Location               |
       |--------------------------+-------------------------------------|
       |BioPerl-Release Candidates|    http://bioperl.org/DIST/RC       |
       |--------------------------+-------------------------------------|
       |BioPerl-Regular Releases  |    http://bioperl.org/DIST          |
       |--------------------------+-------------------------------------|
       |Kobes                     |    http://theoryx5.uwinnipeg.ca/ppms|
       |--------------------------+-------------------------------------|
       |Bribes                    |    http://www.Bribes.org/perl/ppm   |
       +----------------------------------------------------------------+

           5) Select View >> All Packages.

           6) In the search box type bioperl.

           7) Right-click the latest version of Bioperl available and choose
           Install.

                        7a) This package will be sufficient for the main
                        functionality of Bioperl. However, if you require
                        full functionality, you should also install the
                        latest Bundle-BioPerl package.

           8) Click the green arrow (Run marked actions) to complete the
           installation.

           9) Go to the Bioperl Wiki and start reading documentation.

Bioperl

   Bioperl is a large collection of Perl modules (extensions to the
   Perl language) that aid in the task of writing Perl code to deal
   with sequence data in a myriad of ways. Bioperl provides objects for
   various types of sequence data and their associated features and
   annotations. It provides interfaces for analysis of these sequences with a
   wide variety of external programs (BLAST, FASTA, clustalw and
   EMBOSS, to name just a few). It provides interfaces to various types of
   databases, both remote (GenBank, EMBL etc.) and local (MySQL,
   flat-file databases, GFF etc.), for storage and retrieval of
   sequences. And finally, with its associated documentation and
   mailing lists, Bioperl represents a community of bioinformatics
   professionals working in Perl who are committed to supporting both
   development of Bioperl and the new users who are drawn to the project.

   While most bioinformatics and computational biology applications are
   developed in UNIX/Linux environments, more and more programs are
   being ported to other operating systems like Windows, and many users
   (often biologists with little background in programming) are looking for
   ways to automate bioinformatics analyses in the Windows environment.

   Perl and Bioperl can be installed natively on Windows NT/2000/XP.
   Most of the functionality of Bioperl is available with this type of
   install. Much of the heavy lifting in bioinformatics is done by programs
   originally developed in lower-level languages like C and Pascal
   (e.g. BLAST, clustalw, Staden etc.). Bioperl simply acts as
   a wrapper for running and parsing output from these external programs.

   Some of those programs (BLAST, for example) have been ported to Windows.
   These can be installed and work quite happily with Bioperl in the native
   Windows environment. Some external programs, such as Staden and the
   EMBOSS suite, can only be installed on Windows by using
   Cygwin and its gcc C compiler (see Bioperl in Cygwin, below).
   Recent attempts to port EMBOSS to Windows, however, have been mostly
   successful.

   If you have a fairly simple project in mind, want to start using Bioperl
   quickly, only have access to a computer running Windows, and/or don't mind
   bumping up against some limitations, then Bioperl on Windows may be a
   good place for you to start. For example, downloading a bunch of sequences
   from GenBank and sorting out the ones that have a particular
   annotation or feature works great. Running a bunch of your sequences
   against remote or local BLAST, parsing the output and storing it
   in a MySQL database would also be fine.

   Be aware that most Bioperl developers are working in some type of
   UNIX environment (Linux, OS X, Cygwin). If you have
   problems with Bioperl that are specific to the Windows environment, you
   may be blazing new ground, and your pleas for help on the Bioperl mailing
   list may get few responses (you can but try!), simply because no one
   knows the answer to your Windows-specific problem. If this is or becomes a
   problem for you, then you are better off working in some type of UNIX-like
   environment. One solution that will keep you working on a
   Windows machine is to install Cygwin, a UNIX emulation environment for
   Windows. A number of Bioperl users are using this approach successfully,
   and it is discussed in more detail below.

Perl on Windows

   There are a couple of ways of installing Perl on a Windows machine. The
   most common and easiest is to get the most recent build from
   ActiveState, a software company that provides free builds of Perl for
   Windows users. The current (October 2006) build is ActivePerl 5.8.8.819.
   Bioperl also works on Perl 5.6.x, but because of installation problems
   only ActivePerl 5.8.8.819 or later is supported. To install ActivePerl on
   Windows:

           1) Download the ActivePerl MSI from
           http://www.activestate.com/Products/ActivePerl/.

           2) Run the ActivePerl Installer (accepting all defaults is fine).

   You can also build Perl yourself (which requires a C compiler) or download
   one of the other binary distributions. The Perl source for building it
   yourself is available from CPAN, as are a few other binary
   distributions that are alternatives to ActiveState. This approach is not
   recommended unless you have specific reasons for doing so and know what
   you're doing. If that's the case, you probably don't need to be reading
   this guide.

   Cygwin is a UNIX emulation environment for Windows and comes with
   its own copy of Perl. Information on Cygwin and Bioperl is found below.

Bioperl on Windows

   Perl is a programming language whose functionality has been greatly
   extended by external modules that work with the core language.

   Bioperl is one such extension to Perl. These modular extensions to
   Perl sometimes depend on the functionality of other Perl modules, and this
   creates a dependency: you can't install module X unless you have already
   installed module Y. Some Perl modules are so fundamentally useful that the
   Perl developers have included them in the core distribution of Perl; if
   you've installed Perl, then these modules are already installed. Other
   modules are freely available from CPAN, but you'll have to install them
   yourself if you want to use them. Bioperl has such dependencies.

   Bioperl is actually a large collection of Perl modules (currently over
   1000), and these modules are split into seven packages:

   +------------------------------------------------------------------------+
   |    Bioperl Group     |                    Functions                    |
   |----------------------+-------------------------------------------------|
   |bioperl (the core)    |Most of the main functionality of Bioperl        |
   |----------------------+-------------------------------------------------|
   |bioperl-run           |Wrappers to a lot of external programs           |
   |----------------------+-------------------------------------------------|
   |bioperl-ext           |Interaction with some alignment functions and the|
   |                      |Staden package                                   |
   |----------------------+-------------------------------------------------|
   |bioperl-db            |Using Bioperl with BioSQL and local relational   |
   |                      |databases                                        |
   |----------------------+-------------------------------------------------|
   |bioperl-microarray    |Microarray-specific functions                    |
   |----------------------+-------------------------------------------------|
   |bioperl-pedigree      |Manipulating genotype, marker, and individual    |
   |                      |data for linkage studies                         |
   |----------------------+-------------------------------------------------|
   |bioperl-gui           |Some preliminary work on a graphical user        |
   |                      |interface to some Bioperl functions              |
   +------------------------------------------------------------------------+

   The Bioperl core is what most new users will want to start with. Bioperl
   (the core) and the Perl modules that it depends on can be easily installed
   with the Perl Package Manager (PPM), an ActivePerl utility for
   installing Perl modules on systems using ActivePerl. PPM will look online
   (you have to be connected to the internet, of course) for files ending
   in .ppd that tell it how to install the modules you want and
   what other modules your new modules depend on. It will then download and
   install your modules and all dependent modules for you.

   These .ppd files are stored online in PPM repositories. ActiveState
   maintains the largest PPM repository, and when you installed ActivePerl,
   PPM was installed with directions for using the ActiveState repositories.
   Unfortunately the ActiveState repositories are far from complete, and other
   ActivePerl users maintain their own PPM repositories to fill in the gaps.
   Installing Bioperl therefore requires you to direct PPM to look in the
   additional repositories listed in the Installation Guide.

   Once PPM knows where to look for Bioperl and its dependencies, you simply
   tell PPM to search for packages with a particular name, select those of
   interest and then tell PPM to install the selected packages.

Beyond the Core

   You may find that you want some of the features of other Bioperl groups
   like bioperl-run or bioperl-db. There are plans to provide PPM
   packages for installing these parts of Bioperl; check for them by doing a
   Bioperl search in PPM. If they are not available, you can use
   the following instructions for installing the other distributions.

   For this you will need nmake, a Windows version of the program make:

   http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe

   You will also need a willingness to experiment. You'll have to
   read the installation documents for each component that you want to
   install, and use nmake wherever the instructions call for make, like so:

 perl Makefile.PL
 nmake
 nmake test
 nmake install

   'nmake test' will likely produce lots of warnings; many of these can be
   safely ignored (they stem from the excessively paranoid '-w' flag in
   ActivePerl). You will have to determine from the installation documents
   which dependencies are required, then get them, read their
   documentation and install them first. It is recommended that you look
   through the PPM repositories for any modules before resorting to
   nmake, as there is no guarantee that modules built using nmake will work.
   The details of this are beyond the scope of this guide. Read the
   documentation. Search Google. Try your best, and if you get stuck, consult
   with others on the BioPerl mailing list.

    Setting environment variables

   Some modules and tools, such as Bio::Tools::Run::StandAloneBlast and
   clustal_w, require that environment variables are set; a few examples
   are listed in the INSTALL document. Different versions of Windows use
   different methods for setting these variables. NOTE: The instructions that
   come with the BLAST executables for setting up BLAST on Windows are
   out-of-date. Go to the following web address for instructions on setting
   up standalone BLAST for Windows:
   http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/pc_setup.html

     * For Windows XP, use the Environment Variables dialog (Control Panel >>
       System >> Advanced tab >> Environment Variables). This does not
       require a reboot, but command windows that are already open will not
       see the changes; open a new one afterwards.
     * For older versions (Windows 95 to ME), generally editing the
       C:\autoexec.bat file to add a variable works. This requires a reboot.
       Here's an example:

 set BLASTDB=C:\blast\data

   In either case, you can check the variable this way:

 C:\Documents and Settings\Administrator>echo %BLASTDB%
 C:\blast\data

   Some versions of Windows may have problems differentiating forward and
   back slashes used in directory paths. In general, always use backslashes
   (\). If something isn't working properly, try reversing the slashes to
   see if it helps.

   For Cygwin-specific quirks when setting environment variables, see
   Directory for temporary files, below.

    Installing bioperl-db

   bioperl-db now works on Windows without installing Cygwin. This has
   primarily been tested on WinXP using MySQL5, but other
   bioperl-db-supported databases (PostgreSQL, Oracle) are expected to work.

   You will need Bioperl rel. 1.5.2, a relational database (I use MySQL5 here
   as an example), and the Perl modules DBI and DBD::mysql, which
   can be installed from PPM as described above (make sure the additional
   repositories for Kobes and Bribes are added; they will have the latest
   releases). Do NOT try using nmake with these modules, as they will not
   build correctly under Windows! The PPM builds, by Randy Kobes, have been
   modified and tested specifically for Windows and ActivePerl.

   NOTE: we plan on having a PPM for bioperl-db available along with the
   regular bioperl 1.5.2 release PPM. We will post instructions at that
   time on using PPM to install bioperl-db.

   To begin, follow the instructions in the Installation Guide for
   adding the additional repositories (Bioperl, Kobes and Bribes). Then
   install the following packages:

           1) DBI
           2) DBD-mysql

   The next step involves creating a database. The following steps are for
   MySQL5:

 >mysqladmin -u root -p create bioseqdb
 Enter password: **********

   The database then needs to be loaded with the BioSQL schema, which can be
   downloaded as a tarball from the BioSQL project:

 >mysql -u root -p bioseqdb < biosqldb-mysql.sql
 Enter password: **********

   Download bioperl-db from CVS. Use the following to build the
   modules:

 perl Makefile.PL
 nmake

   Now, to test bioperl-db, make a copy of the file
   DBHarness.conf.example in the bioperl-db test subdirectory (bioperl-db\t).
   Rename it to DBHarness.biosql.conf, and modify it for your database setup
   (particularly the user, password, database name, and driver). Save the
   file, change back to the main bioperl-db directory, and run 'nmake test'.
   You may see lots of lines like the following,

 ....
 Subroutine Bio::Annotation::Reference::(eq redefined at C:/Perl/lib/overload.pm line 25,
     <GEN0> line 1.
 Subroutine new redefined at C:\Perl\src\bioperl\bioperl-live/Bio\Annotation\Reference.pm line 80,
     <GEN0> line 1.
 ....

   which can be safely ignored (again, these come from ActivePerl's paranoid
   '-w' flag). All tests should pass. NOTE: tests should be run against
   a clean database with the BioSQL schema loaded, but without taxonomy
   loaded (see below).

   To install, run:

 nmake install

   It is recommended that you load the taxonomy database using the script
   load_ncbi_taxonomy.pl included in biosql-schema\scripts. You will need to
   download the latest taxonomy files. This can be done using the
   -download flag of load_ncbi_taxonomy.pl, but the script will not 'untar'
   the file correctly unless you have GNU tar in your PATH (which most
   Windows users will not have), causing the following error:

 >load_ncbi_taxonomy.pl -download -driver mysql -dbname bioseqdb -dbuser root -dbpass **********
 The system cannot find the path specified.
 Loading NCBI taxon database in taxdata:
         ... retrieving all taxon nodes in the database
         ... reading in taxon nodes from nodes.dmp
 Couldn't open data file taxdata/nodes.dmp: No such file or directory rollback ineffective with
 AutoCommit enabled at C:\Perl\src\bioperl\biosql-schema\scripts\load_ncbi_taxonomy.pl line 818.
 Rollback ineffective while AutoCommit is on at
 C:\Perl\src\bioperl\biosql-schema\scripts\load_ncbi_taxonomy.pl line 818.
 rollback failed: Rollback ineffective while AutoCommit is on

   Use a file decompression utility like 7-Zip to 'untar' the files in
   the taxdata folder (with 7-Zip, right-click on the file and choose
   'Extract here'; a .tar.gz file may need to be extracted twice, once for
   the .gz and once for the .tar inside it). Rerun the script without the
   -download flag to load the taxonomic information. Be patient, as this
   can take quite a while:

 >load_ncbi_taxonomy.pl -driver mysql -dbname bioseqdb -dbuser root -dbpass **********

 Loading NCBI taxon database in taxdata:
         ... retrieving all taxon nodes in the database
         ... reading in taxon nodes from nodes.dmp
         ... insert / update / delete taxon nodes
         ... (committing nodes)
         ... rebuilding nested set left/right values
         ... reading in taxon names from names.dmp
         ... deleting old taxon names
         ... inserting new taxon names
         ... cleaning up
 Done.

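   If you also have Cygwin (or another shell with gzip and tar) installed,
   the dump can be unpacked from the command line instead of with 7-Zip. A
   minimal sketch, assuming the -download run left NCBI's taxdump.tar.gz
   inside the taxdata folder; extract_taxdump is a hypothetical helper name,
   not something shipped with biosql-schema.

```sh
#!/bin/sh
# extract_taxdump [DIR] [ARCHIVE]: unpack DIR/ARCHIVE in place
# (defaults: taxdata and taxdump.tar.gz, both assumptions) and check that
# the files load_ncbi_taxonomy.pl reads are present afterwards.
extract_taxdump() {
    dir=${1:-taxdata}
    archive=$dir/${2:-taxdump.tar.gz}
    if [ ! -f "$archive" ]; then
        echo "extract_taxdump: $archive not found" >&2
        return 1
    fi
    # gzip -dc | tar avoids relying on tar's own -z option.
    gzip -dc "$archive" | tar -xf - -C "$dir" || return 1
    # nodes.dmp and names.dmp are the two files named in the script output.
    for f in nodes.dmp names.dmp; do
        if [ ! -f "$dir/$f" ]; then
            echo "extract_taxdump: $dir/$f missing after extraction" >&2
            return 1
        fi
    done
}
```

   For example, run extract_taxdump taxdata from the directory in which you
   ran load_ncbi_taxonomy.pl, then rerun the script without -download.
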
   Now, load the database with your sequences using the script
   load_seqdatabase.pl, found in bioperl-db\scripts\biosql:

 C:\Perl\src\bioperl\bioperl-db\scripts\biosql>load_seqdatabase.pl -driver mysql
                               -dbname bioseqdb -dbuser root -dbpass ********** NP_249092.gpt
 Loading NP_249092.gpt ...
 Done.

   You may see occasional errors depending on the sequence format; this is
   not a platform-related issue. Many of these are due to an out-of-date
   taxonomic database and may be fixed by updating the taxonomic
   information as detailed in load_ncbi_taxonomy.pl's POD.

   Thanks to Baohua Wang, who found the initial Windows-specific problem in
   Bio::Root::Root that led to this fix, to Sendu Bala for fixing
   Bug #1938, and to Hilmar Lapp for his input.

Bioperl in Cygwin

   Cygwin is a Unix emulator and shell environment available free at
   http://www.cygwin.com. Bioperl v. 1.* runs well within Cygwin. Some
   users claim that installing Bioperl is easier within Cygwin than
   within Windows, but these may be users with UNIX backgrounds. A note on
   Cygwin: it doesn't write to your Registry, it doesn't alter your system or
   your existing files in any way, and it doesn't create partitions; it simply
   creates a cygwin/ directory and writes all of its files to that directory.
   To uninstall Cygwin, just delete that directory.

   One advantage of using Bioperl in Cygwin is that all the external modules
   are available through CPAN; the same cannot be said of ActiveState's PPM
   utility.

   To get Bioperl running, first install the basic Cygwin package as well as
   the Cygwin perl, make, binutils, and gcc packages. Clicking the View
   button in the upper right of the installer window lets you see
   details on the various packages. Then start up Cygwin and follow the
   Bioperl installation instructions for UNIX in Bioperl's INSTALL file
   (for example, THE BIOPERL BUNDLE and INSTALLING BIOPERL THE EASY WAY USING
   CPAN).

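   Before following the UNIX instructions, it can help to confirm that the
   packages above actually put their programs on your PATH. A minimal sketch
   for the Cygwin bash shell; check_tools is a hypothetical helper, and
   using ld to stand in for binutils is an assumption (any binutils program
   would do).

```sh
#!/bin/sh
# check_tools NAME...: print whether each program is on PATH.
# The return status is the number of missing programs (0 = all found).
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found    $tool"
        else
            echo "MISSING  $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# perl, make and gcc come from the packages named above; ld from binutils.
check_tools perl make gcc ld ||
    echo "Install the missing packages with the Cygwin installer."
```
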
bioperl-db in Cygwin

   This package is installed using the instructions contained in the package,
   without modification. Since PostgreSQL is available as a Cygwin package,
   it is probably the easiest of the three databases supported by bioperl-db
   (PostgreSQL, MySQL, Oracle) to install.

Cygwin tips

   If you can, install Cygwin on a drive or partition that's
   NTFS-formatted, not FAT32-formatted. When you install Cygwin on
   a FAT32 partition you will not be able to set permissions and ownership
   correctly. In most situations this probably won't make any difference, but
   there may be occasions where it is a problem.

   If you're trying to use some application or resource outside of the Cygwin
   directory and you're having a problem, remember that Cygwin's path syntax
   may not be the correct one. Cygwin understands /home/jacky or
   /cygdrive/e/cygwin/home/jacky (when referring to the E: drive), but the
   external resource may want E:/cygwin/home/jacky. So your *rc files may end
   up with paths written in these different syntaxes, depending on which
   program reads each path.

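   The /cygdrive/<letter>/... to <LETTER>:/... rewrite described above is
   mechanical, as this minimal sketch shows. cyg_to_win is a hypothetical
   helper for illustration only; inside Cygwin the cygpath utility does the
   job properly (cygpath -m prints the E:/... form, cygpath -w the backslash
   form) and also handles paths such as /home/jacky.

```sh
#!/bin/sh
# cyg_to_win PATH: rewrite /cygdrive/<letter>/rest as <LETTER>:/rest.
cyg_to_win() {
    case "$1" in
        /cygdrive/[a-zA-Z] | /cygdrive/[a-zA-Z]/*)
            # "/cygdrive/" is 10 characters, so character 11 is the drive.
            drive=$(printf '%s\n' "$1" | cut -c11 | tr '[:lower:]' '[:upper:]')
            rest=$(printf '%s\n' "$1" | cut -c12-)
            printf '%s:%s\n' "$drive" "${rest:-/}"
            ;;
        *)
            # Paths like /home/jacky depend on where Cygwin is installed,
            # which only cygpath knows, so they are returned unchanged.
            printf '%s\n' "$1"
            ;;
    esac
}
```

   For example, cyg_to_win /cygdrive/e/cygwin/home/jacky prints
   E:/cygwin/home/jacky.
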
MySQL and DBD::mysql

   You may want to install a relational database in order to use
   bioperl-db, BioSQL or OBDA. The easiest way to install MySQL is to use
   the Windows binaries available at http://www.mysql.com. Note that
   Windows does not have the Unix-domain sockets that the MySQL client uses
   by default for local connections, so you need to force MySQL connections
   to use TCP/IP instead. Do this by using the -h (host) option on the
   command line. Example:

 >mysql -h 127.0.0.1 -u <user> -p<password> <database>

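   If you connect often from a Cygwin shell, a small function saves you from
   forgetting the -h option. A minimal sketch; mysql_tcp and DB_HOST are
   hypothetical names, while -h and --protocol=TCP are standard mysql client
   options.

```sh
#!/bin/sh
# mysql_tcp ARGS...: run the mysql client over TCP/IP instead of a
# Unix-domain socket. DB_HOST (hypothetical) overrides the default host.
mysql_tcp() {
    mysql -h "${DB_HOST:-127.0.0.1}" --protocol=TCP "$@"
}
```

   For example, mysql_tcp -u root -p bioseqdb connects to the bioseqdb
   database created earlier.
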
   Alternatively, you could install PostgreSQL instead of MySQL; it is
   already available as a Cygwin package.

   One known issue is that DBD::mysql can be tricky to install in Cygwin,
   and this module is required for the bioperl-db, BioSQL, and
   bioperl-pipeline external packages. Fortunately there are some good
   instructions online:

     * Instructions included with DBD::mysql:

       http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin

     * Additional instructions if you run into any problems; this
       information is more up-to-date and covers post-2.9 DBD::mysql quirks
       in Cygwin.

       http://rage.against.org/installingdbdmysqlInCygwin

Expat

   Note that expat comes with Cygwin (it's used by the modules
   XML::Parser and XML::SAX::ExpatXS, which are used by certain
   Bioperl modules).

Directory for temporary files

   Set the environment variable TMPDIR; programs like BLAST and
   clustalw need a place to create temporary files. For example:

 setenv TMPDIR e:/cygwin/tmp     # csh, tcsh
 export TMPDIR=e:/cygwin/tmp    # sh, bash

   This is not the syntax that Cygwin understands, which would be something
   like /cygdrive/e/cygwin/tmp or /tmp; it is the syntax that a Windows
   application expects.

   If this variable is not set correctly, you'll see errors like this when
   you run Bio::Tools::Run::StandAloneBlast:

   ------------- EXCEPTION: Bio::Root::Exception -------------
   MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory
   STACK: Error::throw
   ..........

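   Rather than typing the drive-letter path by hand, you can let Cygwin work
   it out. A minimal sketch for ~/.bashrc (or another sh-style startup
   file), assuming your Cygwin /tmp directory is the one you want; cygpath
   -m prints a path in the forward-slash, drive-letter form shown above.

```sh
#!/bin/sh
# Point TMPDIR at Cygwin's /tmp, written the way Windows-native programs
# such as BLAST and clustalw expect (e.g. E:/cygwin/tmp).
if [ -d /tmp ] && [ -w /tmp ]; then
    if command -v cygpath >/dev/null 2>&1; then
        TMPDIR=$(cygpath -m /tmp)
    else
        # Not running under Cygwin: the POSIX path is already correct.
        TMPDIR=/tmp
    fi
    export TMPDIR
else
    # Without a writable directory, StandAloneBlast fails with
    # "Could not open /tmp/...: No such file or directory".
    echo "warning: /tmp is missing or not writable; TMPDIR left unchanged" >&2
fi
```
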
BLAST

   If you want to use BLAST, we recommend that you obtain the Windows binary
   from NCBI (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST/; the
   file will be named something like blast-2.2.13-ia32-win32.exe). Then
   follow the Windows instructions in README.bls. You will also need to set
   the BLASTDIR environment variable to the directory that holds the
   blast executable and data folder. You may also want to set other variables
   to reflect the location of your databases and substitution matrices if
   they differ from the location of your blast executables; see
   Installing Bioperl for Unix for more details.

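   To confirm from a Cygwin shell that these variables are visible and point
   at real directories, something like the following minimal sketch can
   help. check_env_dir is a hypothetical helper; BLASTDIR and BLASTDB are
   the variable names used in this guide, so add any others your setup
   needs.

```sh
#!/bin/sh
# check_env_dir NAME: report whether variable NAME is set and names an
# existing directory; returns non-zero if not.
check_env_dir() {
    # Indirect lookup (POSIX sh has no ${!name}); only pass trusted names.
    eval "value=\${$1:-}"
    if [ -z "$value" ]; then
        echo "$1 is not set"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$1=$value (no such directory)"
        return 1
    fi
    echo "$1=$value"
}

for var in BLASTDIR BLASTDB; do
    check_env_dir "$var" || echo "  -> set $var before running StandAloneBlast"
done
```
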
Compiling C code

   Although we've recommended using the BLAST and MySQL binaries,
   you should be able to compile just about everything else from source code
   using Cygwin's gcc. You'll notice when you're installing Cygwin that many
   different libraries are also available (gd, jpeg, etc.).