Map data is a critical component in many aspects of genetics and
biological research. Well-defined toolkits for manipulating map data
do not exist at this point, so we propose to build a system for
manipulating most types of map data (Genetic, RH, RFLP, Sequence, and

This document proposes an object hierarchy for maps, markers, and

 * A Map is an object which contains mappable elements.
 * A Map can be defined for a given organism or population of individuals.
 * A Mappable element is an element with a position within a map.

Background information
Maps are made up of elements which are mappable. This includes
genetic and physical markers.

A genetic map consists of markers which have a given recombination
distance between them. This distance is usually given in
centimorgans (cM), where 1 cM corresponds to 1% recombination between
markers. Other distances include ... Examples of these are the
publicly available Marshfield and Genethon maps.

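As a rough illustration of the unit, a distance in centimorgans can be estimated from observed recombinant counts. This is only a sketch: the function name and the counts are hypothetical, and reading 1% recombination directly as 1 cM is an approximation that holds for short distances, where double crossovers are rare.

```python
def recombination_to_cm(recombinants: int, total: int) -> float:
    """Estimate map distance in cM as the percentage of recombinant offspring.

    Assumption: the two markers are close enough that the recombination
    fraction is approximately proportional to map distance.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= recombinants <= total:
        raise ValueError("recombinants must be between 0 and total")
    return 100.0 * recombinants / total


# Hypothetical counts: 12 recombinants among 400 scored offspring.
distance = recombination_to_cm(12, 400)
print(f"{distance:.1f} cM")
```
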
Radiation hybrid maps consist of markers which have been mapped to
radiation hybrid panels. Typically these markers are STSes which
have been processed on RH panels. The distance between markers is
calculated in centiRays (cR), which represent ... Examples of these
include the Whitehead STS map and GeneMap '99.

Restriction Enzyme (RE) maps are used to describe RE cut points in a
given sequence and can be used to "fingerprint" sections of DNA
(typically BAC clones). Clones which share a statistically significant
collection of fingerprints (based on the known frequency of RE
cutting) are likely to overlap. Additionally

Physical maps or BAC/PAC/YAC maps represent clone fragment overlap.
These maps are used to represent how clones overlap and form a
consensus sequence of a genomic or cDNA region.

Sequence maps represent the known consensus sequence for a given
region of typically genomic DNA.

LD and Haplotype maps ...

Comparisons between maps from different organisms can yield useful
observations about trends in evolution. Additionally, comparisons of
maps for the same species can provide insight into information such
as recombination hot spots and DNA stability.

Maps are objects which are made up of mappable elements. A mappable
element has a position on a map and can be tested for equality and
relative position to other mappable element positions.

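The relationship between a map and its mappable elements can be sketched in a few lines. This is an illustration only, not the proposed BioPerl API: the class names `SimpleMap` and `Mappable`, the marker ids, and the positions are all hypothetical, and passing the map explicitly to each comparison is an assumption made here so that one element can sit on several maps.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SimpleMap:
    """A map holding mappable elements, keyed by element id (hypothetical)."""
    name: str
    units: str
    positions: Dict[str, float] = field(default_factory=dict)

    def add(self, element_id: str, position: float) -> None:
        self.positions[element_id] = position

    def position(self, element_id: str) -> Optional[float]:
        # None means the element is not placed on this map.
        return self.positions.get(element_id)


@dataclass(frozen=True)
class Mappable:
    """An element that can report and compare its position on a given map."""
    element_id: str

    def position(self, a_map: SimpleMap) -> Optional[float]:
        return a_map.position(self.element_id)

    def equals(self, other: "Mappable", a_map: SimpleMap) -> bool:
        mine, theirs = self.position(a_map), other.position(a_map)
        return mine is not None and mine == theirs

    def less_than(self, other: "Mappable", a_map: SimpleMap) -> bool:
        mine, theirs = self.position(a_map), other.position(a_map)
        if mine is None or theirs is None:
            raise ValueError("both elements must be placed on the map")
        return mine < theirs


# Hypothetical markers placed on a hypothetical genetic map.
genetic = SimpleMap(name="example genetic map", units="cM")
a, b = Mappable("marker_a"), Mappable("marker_b")
genetic.add(a.element_id, 10.5)
genetic.add(b.element_id, 42.0)
print(a.less_than(b, genetic), a.equals(b, genetic))
```

Keeping positions on the map rather than on the element is one possible design; it lets the same marker appear on a genetic map and a sequence map without the marker needing to know about either.
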
These are some baseline interface and object definitions. Other work
has been done by Philip Lijnzaad, Emmanuel Barillot and OMG folks to
create definitions for maps.

    string getID                    // unique identifier -- this goes with Juha's
                                    // identifiable property?

Bio::AliasableI isa Bio::NameableI

Bio::Map::MapI isa Bio::NameableI isa Bio::Identifiable
    MapIterator getAllElements      // for in-order iterator access
    ?Bio::ChromosomeI? chromosome   // Should maps be built one per
                                    // chromosome and aggregated for
                                    // a whole report set?
    Bio::SpeciesI species           // use existing BP species object
                                    // which may need to be more robust
    numeric length                  // not sure what to return for
                                    // relative or RFLP maps
    string units                    // Map units
    string name                     // Map Name

    // Where to handle the fact that RFLP
    // Markers have multiple Map positions
    PositionI position(MapI)
    boolean equals(MappableI)
    boolean less_than(MappableI)
    boolean greater_than(MappableI)

    // may be undef to handle relative maps [RE].
    // This is where a known position for a marker can be retrieved.
    // Multiple positions are possible for RE on a sequence map.
    Array<string> positionValues

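The position comments above describe two cases: an element with no known coordinate (undef, for relative maps) and an element with several coordinates (an RE site on a sequence map). A minimal sketch of a value holder covering both is shown here; the class name `Position`, its methods, and the sample offsets are hypothetical, and Python's `None` stands in for undef.

```python
from typing import List, Optional


class Position:
    """Holds zero or more position values for one element on one map."""

    def __init__(self, values: Optional[List[str]] = None) -> None:
        # None (undef in the proposal) marks an element whose position
        # is only relative, with no known coordinate.
        self._values = None if values is None else list(values)

    def position_values(self) -> Optional[List[str]]:
        # Return a copy so callers cannot modify the stored values.
        return None if self._values is None else list(self._values)

    def is_known(self) -> bool:
        return self._values is not None and len(self._values) > 0


# Hypothetical restriction site that cuts a sequence map at three offsets.
cut_sites = Position(["1204", "5310", "9977"])
# Hypothetical marker placed only relative to its neighbours.
relative = Position()
print(cut_sites.position_values(), relative.is_known())
```
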
Bio::MarkerI isa Bio::MappableI isa Bio::AliasableI

// heikki to help fill in Variant and Allele information
Bio::LiveSeq::AlleleI

Bio::LiveSeq::VariantI isa Bio::MarkerI
    Bio::PrimarySeqI getFwdPrimer()
    Bio::PrimarySeqI getRevPrimer()
    // I assume there should always be a primary set
    // of markers which define start/end points
    // should this be hidden inside more methods to
    Bio::LiveSeq::AlleleI getAlleles()

Bio::Marker::RestrictionEnzyme isa Bio::MarkerI
Bio::Marker::STS isa Bio::MarkerI
Bio::Marker::Microsat isa Bio::LiveSeq::VariantI
Bio::Marker::CytogeneticBand isa Bio::MarkerI
Bio::Marker::VLTR isa Bio::MarkerI

Bio::Map::Cytogenetic isa Bio::Map::MapI

Bio::Map::RadiationHybrid

    string getSex       // code as a string? - only

Bio::Map::Sequence      // Should probably be Bio::Assembly, or these two
                        // need to work together. Sequence Map could
                        // be built with Bio::Assemblies
Bio::Map::Haplotype     // what would this entail -- SNP components?

Caveats, questions, etc
-----------------------
Namespace is very flexible here.

An important useful result of this toolkit will be the ability to
programmatically go from one map to another. So querying maps for a
marker, perhaps based on that marker's unique id, will allow one to
compare distances on different maps or go from genetic to sequence

Not sure if we should be doing a Bio::ChromosomeI or can just code
with a string/numeric? Does polyploidy cause any problems in maps or
just in population/allele issues?
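
The cross-map lookup mentioned in this section (querying maps for a marker by its unique id and comparing distances) might look roughly like the following. Everything here is assumed for illustration: the two dictionaries stand in for real map objects, and the marker ids and positions are made up.

```python
from typing import Dict, Optional, Tuple

# Hypothetical maps: marker id -> position, in each map's own units.
genetic_map: Dict[str, float] = {"marker_a": 10.5, "marker_b": 13.0}          # cM
sequence_map: Dict[str, float] = {"marker_a": 1_200_000.0, "marker_b": 3_700_000.0}  # bp


def distance_on(a_map: Dict[str, float], first: str, second: str) -> Optional[float]:
    """Return the distance between two markers, or None if either is unmapped."""
    if first not in a_map or second not in a_map:
        return None
    return abs(a_map[second] - a_map[first])


def compare_distances(first: str, second: str) -> Tuple[Optional[float], Optional[float]]:
    """Look up the same marker pair on both maps by unique id."""
    return (
        distance_on(genetic_map, first, second),
        distance_on(sequence_map, first, second),
    )


cm, bp = compare_distances("marker_a", "marker_b")
print(cm, bp)
```

The shared unique id is what makes the comparison possible, which is one argument for settling the identifier question early in the interface design.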