1 Trajectory format specification
3 1. Release notes
4 =============
6 General notes:
7 --------------
9 \* Currently the API is lacking many data retrieval and getter
10 functions. This will be fixed soon.
12 \* It might be problematic searching for a specific time if frames are
13 not consecutive. This problem is somewhat hypothetical, but might be
14 good to point out.
16 \* Currently we cannot track a specific molecule in a grand canonical
17 ensemble.
19 \* We should request comments on the specs from other groups - after v1.
21 Erik's comments:
23 \* Keep the number of calls small
25 \* Use a prefix for the calls, e.g. tng\_open, tng\_close, etc.; use
26 e.g. adv\_ prefix for advanced API
28 \* Help routine to return the full list of atom types
30 \* In general info:
32 - number of stride pointers
34 - for each pointer, the number of frame sets that it skips
36 =\> have 3 stride pointers with defaults 1, 100 and 10,000 frame sets
38 \* include zlib as a supported compressor
40 To do:
41 ------
43 \* Make a drawing
45 \* Include API function for signing of the trajectory
47 \* Sanity checklist:
49 1. Block header size
50 2. Block contents size
51 3. Compare hashes (currently doesn’t abort, only prints a warning)
53 1. maybe add a flag to choose between abort/warning
54 2. empty hash is always accepted
56 4.
58 Version 0.98
59 ------------
61 \* Use signed int instead of unsigned.
63 \* Changed names from ‘trg’ to ‘tng’.
65 Version 0.97
66 ------------
68 \* Added chains and residues to the molecule description.
70 Version 0.96
71 ------------
73 \* Renamed “Trajectory Box Shape”, “Trajectory Positions” etc.
75 \* Renamed “Trajectory Index Block” to “Particle Mapping Block”
77 Version 0.95
78 ------------
80 \* Changed name of ‘Atom name block’ to ‘Molecule block’ and included
81 connectivity in that block. The number of each molecule is also
82 specified there, unless using variable number of particles (grand
83 canonical ensemble).
85 \* Removed ‘variable number of particles’ and ‘variable number of
86 values’ flags from data blocks.
88 \* Updated the API and headers
90 Version 0.9
91 -----------
93 \* Removed endianness/string block
95 \* Added PGP signature to General Info block
97 \* Removed reserved ‘user data blocks’. Those are already handled as
98 normal ‘data blocks’. Reserved are only box, positions, velocities and
99 forces.
101 Version 0.8
102 -----------
103 \* Use only MD5 hashes
105 \* Changed time (in general info block) to 64 bit int.
107 Version 0.7
108 -----------
110 \* New hierarchy rules
112 \* Changed "Trajectory Info" to "General Info" block
114 \* Moved "BOX SHAPE" before the index and trajectory frame blocks
116 \* general, int., bin32, bin64 user data is explicitly not frame
117 dependent
119 \* Variable number of atoms will be supported in version 1
121 \* Made more clear the nesting in the "trajectory group blocks"
123 \* Allow header only files, i.e. no trajectory blocks
125 \* Removed number of frames from the "index block" description
127 \* Removed ASCII recommendation from the string description, i.e. [1]
129 \* Added molecule ID to “atom name block”
131 Version 0.6
132 -----------
134 Changed endianness block to endianness and string length block.
136 Changed order of hash specifications in the header block.
138 Added initial API specifications
140 Version 0.5
141 -----------
143 Included an additional header block that specifies the endianness. It
144 comes before all other blocks.
146 2. Specifications
147 ==============
149 General specifications are given below. Others are described in the
150 relevant block sections.
152 1. The file contains a number of blocks.
153 2. The order of the blocks follows the order specified in section 4
154 “Description of blocks”
155 3. All integer and floating point (float and double) values are
156 stored using big endian byte ordering. Conversion to and from the
157 native format of the computer is performed during reading and
158 writing of numerical fields.
159 4. MD5 hashes are used to verify the integrity of the data.
160 5. Strings are limited to a max length of 1023 characters and are
161 terminated by a null character (‘\\000’). If longer text data must
162 be saved a data block containing multiple entries of general
163 (character/string) data can be used.
164 6. If a trajectory converter program encounters errors while reading a
165 block whose format is not recognized, the block is to be written out
166 as a binary object without modification.
167 7. Each group of blocks is to contain a table of contents which lists
168 the blocks included in the group.^[[a]](#cmnt1)^
169 8. No compression (in ver.1) for general data streams such as integers,
170 floating point numbers, particle indices etc.
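
As a minimal sketch of rules 3 and 5, the helpers below encode a signed 64 bit integer in big endian order and a zero terminated UTF-8 string. The function names are hypothetical, and the 1023 limit is interpreted here as a byte count excluding the terminator, which is an assumption; the specification only says "characters".

```python
import struct

# Assumption: the 1023 limit is applied to encoded bytes, terminator excluded.
MAX_STR_LEN = 1023


def pack_int64(value: int) -> bytes:
    """Encode a signed 64 bit integer using big endian byte ordering."""
    return struct.pack(">q", value)


def pack_string(text: str) -> bytes:
    """Encode a string as UTF-8 followed by a single null terminator."""
    raw = text.encode("utf-8")
    if len(raw) > MAX_STR_LEN:
        raise ValueError("string exceeds the 1023 limit")
    if b"\x00" in raw:
        raise ValueError("embedded null characters are not representable")
    return raw + b"\x00"


def unpack_string(buf: bytes, offset: int = 0):
    """Return (text, offset of the byte following the terminator)."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("utf-8"), end + 1
```

Longer text would, as rule 5 says, be split over several entries of a character data block rather than stored in one string.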
172 3. Each block contains the following fields (header):
173 ==================================================
175 1. 64 bit size of the header
176 2. 64 bit size of the block contents (except header)
177 3. 64 bit block type identifier
178 4. 16 character MD5 hash (or 16 “\\0” characters)
179 5. name[1]
180 6. 64 bit version of the block ^[[b]](#cmnt2)^(allows addition of more
181 fields in the future to existing blocks, although old fields should
182 never be removed, to allow older readers to read new files)
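
The header layout above can be sketched as follows. This is an illustration under stated assumptions, not the reference implementation: the function names are hypothetical, the header size is assumed to count all six header fields including itself, and the MD5 hash is assumed to cover the block contents only.

```python
import hashlib
import struct


def build_block(block_id, name, contents, version=1, with_hash=True):
    """Serialize header fields 1-6 followed by the block contents."""
    name_bytes = name.encode("utf-8") + b"\x00"
    # Assumption: the hash covers the contents; 16 zero bytes mean "no hash".
    digest = hashlib.md5(contents).digest() if with_hash else b"\x00" * 16
    # Assumption: the header size includes the size field itself.
    header_size = 8 + 8 + 8 + 16 + len(name_bytes) + 8
    header = (
        struct.pack(">qqq", header_size, len(contents), block_id)
        + digest
        + name_bytes
        + struct.pack(">q", version)
    )
    return header + contents


def parse_header(buf):
    """Read the header fields back from the start of a serialized block."""
    header_size, contents_size, block_id = struct.unpack_from(">qqq", buf, 0)
    digest = buf[24:40]
    name_end = buf.index(b"\x00", 40)
    name = buf[40:name_end].decode("utf-8")
    (version,) = struct.unpack_from(">q", buf, name_end + 1)
    return {
        "header_size": header_size,
        "contents_size": contents_size,
        "block_id": block_id,
        "md5": digest,
        "name": name,
        "version": version,
    }
```

Because both sizes come first, a reader that does not recognize a block type can still skip it or copy it through unchanged.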
184 4. Description of blocks (each with a unique 64 bit identifier and a matching "name"):
185 ===================================================================================
187 1. info block (1) "GENERAL INFO" (required)
188 2. molecules block (2) "MOLECULES" (optional)
189 3. trajectory ids and names (4) “TRAJECTORY IDS AND NAMES” (optional)
190 4. trajectory frames, box shape block (10000) "BOX SHAPE" (optional,
191 can be present before the frame sets if it does not change or inside
192 the frame sets if it varies)
193 5. trajectory frame set block (5) "TRAJECTORY FRAME SET" (required)
194 (multiple “trajectory frame sets” are allowed)
196 1. trajectory table of contents block (6) “BLOCK TABLE OF CONTENTS”
197 (required)
198 2. trajectory frames, box shape block (10000) "BOX SHAPE" (optional,
199 can be present before the frame sets if it does not change or inside
200 the frame sets if it varies)
201 3. trajectory particle mapping block (7) "PARTICLE MAPPING" (required
202 if there are trajectory frames blocks) (multiple particle mapping
203 blocks with corresponding trajectory frames blocks are allowed, e.g.
204 to allow parallel writes of different atom sets)
206 1. trajectory frames, positions, block (10001) "POSITIONS" (optional)
207 2. trajectory frames, velocities, block (10002) "VELOCITIES" (optional)
208 3. trajectory frames, forces, block (10003) "FORCES" (optional)
210 6. ...other specified blocks, both non-trajectory and trajectory
211 blocks, each with unique id & name
213 Data blocks can be used to store whatever data is needed. Data blocks
214 with IDs in the range 10000 to 10999 are reserved for standard data
215 (such as box shape, positions, velocities etc.), whereas IDs from 11000
216 and above can be used for any kind of user data.
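
The ID ranges can be expressed as a small lookup. The table only lists the IDs named in this section; the function name and the category labels are hypothetical.

```python
# Block IDs and names as listed in section 4.
KNOWN_BLOCKS = {
    1: "GENERAL INFO",
    2: "MOLECULES",
    4: "TRAJECTORY IDS AND NAMES",
    5: "TRAJECTORY FRAME SET",
    6: "BLOCK TABLE OF CONTENTS",
    7: "PARTICLE MAPPING",
    10000: "BOX SHAPE",
    10001: "POSITIONS",
    10002: "VELOCITIES",
    10003: "FORCES",
}


def block_id_kind(block_id: int) -> str:
    """Classify a block ID by the ranges given in the text."""
    if 10000 <= block_id <= 10999:
        return "standard data"
    if block_id >= 11000:
        return "user data"
    return "non-data"  # label chosen for this sketch
```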
218 5. NOTES ABOUT BLOCK KINDS:
219 ========================
221 In the standard case, with fixed charges throughout the simulation
222 (which is the case for version 1 of the format), there can be only one
223 block at the beginning of the file. For simulations where charges vary,
224 each frame will include a block with the values.
226 The trajectory frame blocks must be collected into frame sets. Each
228 such frame set has as its first block the "trajectory frame set block".
229 Each frame set will contain (multiple) particle mapping blocks,
230 positions, velocities, forces etc.^[[c]](#cmnt3)^
232 6. Requirements on block order:
233 ============================
235 1. The order follows section 4. “Description of blocks” with the
236 corresponding nesting of multiple “trajectory frame sets” and
237 “particle mapping blocks”.
238 2. All non-trajectory frame blocks (e.g. user ones) must appear before
239 the trajectory blocks
240 3. Trajectory particle mapping blocks are optional. If they are
241 present, they must appear before the corresponding trajectory data
242 blocks. If there are multiple trajectory data blocks, the
243 corresponding particle mapping blocks come right before them. I.e.
244 ParticleMappingBlock1-\>DataBlock1-\>P.MappingBlock2-\>DataBlock2.
245 4. Blocks within a group of blocks are ordered by their ID.
247 7. Other requirements:
248 ===================
250 1. Most blocks are optional except for the “general info” blocks
251 2. No limit on the number of times that trajectory related blocks are
252 allowed to appear
254 8. Specification of the block contents (all blocks have the same header as described above) for version 1 of each block type.
255 ==========================================================================================================================
257 BLOCK: general info block
258 -------------------------
260 1. name and version of the program used to perform the simulation (upon
261 file creation)[1]
262 2. name and version of the program used when finishing the file[1]
263 3. name of the force field used to perform the simulation
264 [1]^[[d]](#cmnt4)^
265 4. name of the person who created the file [1]
266 5. name of the person who last modified the file [1]
267 6. 64 bit time of initial file creation, seconds since 1970
268 7. 64 bit time of completing the simulation, seconds since 1970
269 8. name of computer/other info where the file was created [1]
270 9. name of computer/other info where the file was completed [1]
271 10. PGP signature (optional and 0 terminated string)
272 11. 8 bit flag: use variable number of atoms.
273 12. 64 bit number of frames in each frame set (this is the expected
274 number of frames in each set, but it does not have to be constant,
275 it is OK to have frame sets with fewer or more frames, e.g. after
276 concatenating multiple trajectory files. This avoids the need to
277 recompress all data after a concatenation, but it means that
278 searching for a specific frame might need a few more steps between
279 frame sets.). For simulations using a grand canonical ensemble it is
280 best to set this to 1 so that the number of atoms in the frame sets
281 can be updated regularly.
282 13. 64 bit pointer from the beginning of the info block to the beginning
283 of the first trajectory frame set [2]
284 14. 64 bit pointer from the beginning of the info block to the beginning
285 of the last trajectory frame set [2] (updated when finishing writing
286 the trajectory file - otherwise set to -1)
287 15. 64 bit length of steps (number of “trajectory frame set blocks”) for
288 long stride pointers (default 100 “trajectory frame set blocks”).
290 BLOCK: molecules block (optional)
291 ---------------------------------
293 1. 64 bit number of molecules
294 2. For each molecule:
296 1. 64 bit Molecule ID
297 2. Molecule name [1]
298 3. 64 bit quaternary structure, e.g. 1 means monomeric, 4 means
299 tetrameric etc.
300 4. 64 bit number of molecules of this kind - only if not using
301 “variable number of atoms” in the “general info block”.
302 5. 64 bit number of chains in the molecule
303 6. 64 bit number of residues in the molecule
304 7. 64 bit number of atoms in the molecule^[[e]](#cmnt5)^
305 8. For each chain:
307 1. 64 bit Chain ID (unique in molecule)
308 2. Chain name [1]
309 3. 64 bit number of residues in the chain
310 4. For each residue:
312 1. 64 bit Residue ID (unique in the chain)
313 2. Residue name [1]
314 3. 64 bit number of atoms in the residue
315 4. For each atom:
317 1. 64 bit Atom ID (unique in the molecule)
318 2. Atom name [1]
319 3. Atom type [1]
321 9. 64 bit number of bonds in the molecule
322 10. For each bond:
324 5. 64 bit integer From Atom ID.
325 6. 64 bit integer To Atom ID.
327 BLOCK: trajectory frame set block
328 ---------------------------------
330 1. 64 bit number of first frame (zero based numbering)
331 2. 64 bit number of frames (NF)
332 3. Array of 64 bit integers specifying the count of each molecule type.
333 The molecule types are listed in the “molecules block” and should
334 be listed in the same order here. This should only be present when
335 the variable number of atoms flag in the “General info block” is set
336 to TRUE. This is used for e.g. simulations using a grand canonical
337 ensemble (in which case the number of frames in each frame set
338 should be 1).
339 4. 64 bit pointer to the next “trajectory frame set block”.
340 5. 64 bit pointer to the previous “trajectory frame set block”.
341 6. 64 bit long stride pointer to the next e.g. 100th “trajectory frame
342 set block”. (Stride length specified in “general info” block.)
343 7. 64 bit long stride pointer to the previous e.g. 100th “trajectory
344 frame set block”.
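
To illustrate why the long stride pointers help, here is a hedged sketch of a forward search for the frame set that contains a given frame. It works on an in-memory dictionary standing in for the file (offset to frame set fields); the field names, the made-up offsets and both function names are assumptions, and frame numbers are assumed to increase along the list.

```python
END_OF_LIST = 0  # per note [2], a zero pointer typically ends the list


def build_demo_chain(n_sets, frames_per_set, stride):
    """Build an in-memory stand-in for a file: offset -> frame set fields."""
    offsets = [1000 * (i + 1) for i in range(n_sets)]  # made-up file offsets
    chain = {}
    for i, off in enumerate(offsets):
        chain[off] = {
            "first_frame": i * frames_per_set,
            "n_frames": frames_per_set,
            "next": offsets[i + 1] if i + 1 < n_sets else END_OF_LIST,
            "next_long": offsets[i + stride] if i + stride < n_sets else END_OF_LIST,
        }
    return chain


def find_frame_set(chain, first_offset, target_frame):
    """Return (offset of the containing frame set or None, pointer hops used)."""
    offset, hops = first_offset, 0
    while offset > 0:  # stops on 0 (end of list) and on -1 (not set)
        fs = chain[offset]
        if fs["first_frame"] <= target_frame < fs["first_frame"] + fs["n_frames"]:
            return offset, hops
        long_off = fs["next_long"]
        # Take the long stride only if it does not overshoot the target.
        if long_off > 0 and chain[long_off]["first_frame"] <= target_frame:
            offset = long_off
        else:
            offset = fs["next"]
        hops += 1
    return None, hops
```

With 250 frame sets of 10 frames each and a stride of 100, frame 2345 is reached in 36 hops (2 long, 34 short) instead of 234 short ones.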
346 BLOCK: trajectory table of contents
347 -----------------------------------
349 1. 64 bit number of blocks
351 Contains a listing of all data blocks \_present\_ in the frame set. It
352 is possible to have multiple blocks with the same ID, but the ID is only
353 listed once in the “trajectory table of contents” block.
355 It includes for each block type:
357 1. Block name [1]
359 BLOCK: data blocks
360 ------------------
362 Frame dependent data blocks should come after the frame set block to
363 which they belong. Frame and particle dependent data blocks should come
364 after the relevant particle mapping block (if using any particle mapping
365 block).
367 1. Char data type flag. 0 = character/string data, 1 = 64 bit integer
368 data, 2 = float data (32 bit), 3 = double data (64 bit)
369 2. Char dependency flag. 1 = frame dependent, 2 = particle dependent.
370 Can be combined, i.e. 3 = frame and particle dependent.
371 3. Char sparse data flag to signify if not all frames in the frame sets
372 have data entries in this data block, e.g. energies and positions
373 might be saved at different intervals meaning that at least one of
374 them would be saved as sparse data. Only present if the data is
375 frame dependent.
376 4. 64 bit number of values.
377 5. 64 bit id of the CODEC used to store the data
378 6. Double (64 bit) multiplier for integers to obtain the appropriate
379 floating point number, for compressed frames [3] [\*\*] (only
380 present if the above CODEC id is \> 0 and if the data type is double
381 or float)
383 If using sparse data the following fields are required:
385 1. 64 bit number of first frame containing data.
386 2. 64 bit number of frames between data points
388 Particle dependent data blocks contain the following fields:
390 1. 64 bit number of first particle (particle number as stored in the
391 trajectory, zero based numbering) (J). This must be the same as in the
392 preceding trajectory particle mapping block, if present.
394 2. 64 bit number of particles in block. This must be the same as in the
395 preceding trajectory particle mapping block, if present.
397 Example 1:
399 Box shape block (10000) in a frame set with frames 0-99:
401 1. Data type: 3 (double)
402 2. Dependency: 1 (frame dependent)
403 3. Sparse data: 1
404 4. Number of values: 9
405 5. Codec ID: 0
406 6. First frame containing data: 0
407 7. Number of frames between data points: 50
408 8. For each frame (2 frames with data in this block):
410 1. 9 double (64 bit) values describing the shape of the box
412 Example 2:
414 Positions block (ID 10001) in a frame set with frames 1000-1099:
416 1. Data type: 2 (float)
417 2. Dependency: 3 (frame and particle dependent)
418 3. Sparse data: 1
419 4. Number of values: 3 (x, y and z)
420 5. Codec ID: 0
421 6. First frame containing data: 1000
422 7. Number of frames between data points: 10
423 8. Number of first particle: 0
424 9. Number of particles in block: 1000
425 10. For each frame (10 frames with data in this block):
427 1. For each particle (1000 particles):
429 1. 32 bit float x coordinate
430 2. 32 bit float y coordinate
431 3. 32 bit float z coordinate
433 Example 3:
435 Forces block (ID 10003) in a frame set with frames 0-99:
437 1. Data type: 2 (float)
438 2. Dependency: 3 (frame and particle dependent)
439 3. Sparse data: 0
440 4. Number of values: 3 (x, y and z)
441 5. Codec ID: 0
442 6. Number of first particle: 0
443 7. Number of particles in block: 100
444 8. For each frame (100 frames with data in this block):
446 1. For each particle (100 particles):
448 1. 32 bit float x coordinate
449 2. 32 bit float y coordinate
450 3. 32 bit float z coordinate
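
The frame counts and payload sizes in Examples 1 and 3 can be checked with a short calculation. This sketch covers uncompressed numeric data (codec 0) only; the function names are hypothetical, and the "number of frames between data points" is read as a plain stride, which matches the frame counts quoted in the examples.

```python
# Bytes per value for the numeric data type flags (1, 2, 3).
# Character data (flag 0) has no fixed size and is left out.
DTYPE_SIZES = {1: 8, 2: 4, 3: 8}


def frames_with_data(set_first, set_n_frames, first_data_frame, stride):
    """List the frames of a frame set that carry data in a sparse block."""
    return list(range(first_data_frame, set_first + set_n_frames, stride))


def payload_bytes(dtype, n_values, n_frames, n_particles=1):
    """Size of the value payload that follows the data block fields."""
    return DTYPE_SIZES[dtype] * n_values * n_frames * n_particles


# Example 1: box shape, frames 0-99, data every 50 frames.
box_frames = frames_with_data(0, 100, 0, 50)
box_size = payload_bytes(3, 9, len(box_frames))

# Example 3: forces for 100 particles in each of 100 frames.
forces_size = payload_bytes(2, 3, 100, n_particles=100)
```

Here `box_frames` holds the two frames 0 and 50, `box_size` is 144 bytes and `forces_size` is 120000 bytes.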
452 BLOCK: particle mapping block
453 -----------------------------
455 1. 64 bit number of first particle (particle number as stored in the
456 trajectory, zero based numbering) (J)
457 2. 64 bit number of particles in this particle mapping block (M)
458 3. 64 bit array of particle numbers^[[f]](#cmnt6)^ (M values):
460 1. Each value is the number of the real particle corresponding to the
461 particle number as stored in the trajectory.
463 If no particle mapping block is present, the mapping is the identity:
465 the number of the real particle equals the particle number as stored in the trajectory.
467 It is possible to have several trajectory/velocities etc. frame blocks
468 within a frame set, e.g. when faster parallel writes or memory
469 considerations are needed. In that case a separate particle mapping
470 block is needed for each of the trajectory/velocities etc. blocks.
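
A minimal sketch of the lookup, assuming each mapping block is held in memory as a pair (J, list of M real particle numbers). The function name is hypothetical, and raising an error for an index that no block covers is a choice made for this sketch, not something the specification states.

```python
def real_particle(stored_index, mapping_blocks):
    """Map a particle number as stored in the trajectory to the real one.

    mapping_blocks: list of (first_particle, real_numbers) pairs,
    one per particle mapping block of the frame set.
    """
    if not mapping_blocks:
        return stored_index  # no mapping block: identity mapping
    for first, real_numbers in mapping_blocks:
        if first <= stored_index < first + len(real_numbers):
            return real_numbers[stored_index - first]
    raise KeyError(f"particle {stored_index} is not covered by any mapping block")
```

For instance, with blocks `(0, [4, 2])` and `(2, [7, 3, 9])`, stored particle 3 corresponds to real particle 3 via the second block, and stored particle 1 to real particle 2 via the first.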
472 Relation between trajectory blocks:
473 ===================================
475 Particle mapping blocks contain the remapping between the actual particle
476 index and the particle index as it appears in the trajectory file. They are
477 optional. If they are not given, there is no remapping. All trajectory
478 blocks for the same set of particles must follow each other, i.e.
479 positions for particle 0-99, then velocities for particle 0-99, then
480 positions for particle 100-199, then velocities for particle 100-199.
481 All non-particle trajectory blocks must appear before any particle
482 containing trajectory blocks.
484 Limitations on the number of particles in trajectory frame blocks: In
485 order to be able to read and uncompress data there must be a limit on
486 the number of particles in each trajectory frame block, therefore most
487 trajectory frame sets will contain multiple particle mapping / positions
488 / velocities / ... blocks. The limit on the number of particles per
489 trajectory frame blocks should be XXXX. This should be a good value, and
490 not allowed to be set by the user, since this may prevent reading of the
491 files on smaller memory machines.
493 CODEC specifications (id) "name"
494 ================================
496 1. uncompressed (0) "UNCOMPRESSED"
497 2. XTC positions (1) "XTC"
498 3. TNG (2) "TNG"
499 4. …
503 [\*\*] Storage of compressed positions / velocities / ...: These are now
504 all converted to integers before being stored. In order to facilitate
505 recompression without loss of precision it is essential that these are
506 visible as integers. Therefore the compression blocks must all contain
507 somewhere a conversion factor from integer to float.
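
The role of the multiplier can be sketched as below. This only illustrates the integer conversion step, not any of the listed codecs; the function names are hypothetical, and rounding to the nearest integer is an assumption.

```python
def quantize(values, multiplier):
    """Convert floating point values to the integers a codec would store."""
    return [round(v / multiplier) for v in values]


def dequantize(ints, multiplier):
    """Recover floating point values: stored integer times multiplier."""
    return [i * multiplier for i in ints]


stored = quantize([1.25, -0.5, 3.0], 0.125)
restored = dequantize(stored, 0.125)
```

Keeping the stored values visible as integers means a file can be recompressed with another codec without a second rounding step.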
509 Notes
510 =====
512 [1] UTF-8 text string. Make all text strings zero terminated.
514 [2] 64 bit pointer format: -1UL (all ones), means "not set", which is
515 what should be written whenever a pointer needs to be written when the
516 appropriate value is not yet known, while 0 (all zeros), typically means
517 the end of the list.
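
The two special pointer values in note [2] look like this when written as signed 64 bit big endian integers. The constant and function names are hypothetical.

```python
import struct

NOT_SET = -1      # written when the real value is not yet known
END_OF_LIST = 0   # typically marks the end of the list


def pack_pointer(offset: int) -> bytes:
    """Encode a pointer as a signed 64 bit big endian integer."""
    return struct.pack(">q", offset)


def describe_pointer(raw: bytes) -> str:
    """Interpret an 8 byte pointer field."""
    (value,) = struct.unpack(">q", raw)
    if value == NOT_SET:
        return "not set"
    if value == END_OF_LIST:
        return "end of list"
    return f"offset {value}"
```

Packing `NOT_SET` yields eight `0xff` bytes, which is the "all ones" pattern the note refers to.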
519 [3] Floating point format is big-endian IEEE-754, float (32 bit) or
520 double (64 bit).
525 (The API should be separated into one high- and one low-level API, using
526 e.g. a tng\_low tag for the low-level functions.)
528 API documentation is generated using the -DTNG_BUILD_DOCUMENTATION=ON option
529 when running cmake. Requires a doxygen installation.
532 ^[[g]](#cmnt7)^
534 [[a]](#cmnt_ref1)magnus.lundborg:
536 Currently sizes and offsets are not in the TOC block. I think it needs
537 further testing to decide if it is good or not.
539 * * * * *
541 Sander Pronk:
543 So how can you find out where the block is?
545 * * * * *
547 magnus.lundborg:
549 If the offsets are not listed in the TOC block you would have to read
550 the whole frame set, or at least all the block headers in the frame set,
551 which shouldn't be too bad.
553 [[b]](#cmnt_ref2)Roland Schulz:
555 I suggest not to use a version number. This is already a problem with
556 the tpx version and branches. Instead I suggest to have a bitvector
557 where each bit says whether a certain feature is present in this file.
558 Given a central registry of meaning of the bits, this allows different
559 groups/branches/software to add features. Which would be difficult with
560 a version number approach which has an inherently linear ordering. The
561 last bit probably should be reserved to signify whether the 64bit
562 bitvector is extended by another 64bit. The data should be stored in
563 the order of the bitvectors. A reader which doesn't support a certain
564 bit, cannot read any of the following data if the bit is on.
566 [[c]](#cmnt_ref3)magnus.lundborg:
568 We will have a problem if we want to add data to a frame set in a file.
569 All subsequent frame sets will need to be rewritten. One alternative
570 would be to have a list of pointers to each block of each block type in
571 the frame set table of contents block. But we will have a problem adding
572 rows to that block as well, which in turn could be fixed by having a
573 pointer from the frame set block to the "current" table of contents
574 block and just let the old one remain. We could actually have a flag in
575 block headers to show if the block is "up-to-date". But there is a risk
576 that these pointers will be slow - especially when it comes to writing.
578 [[d]](#cmnt_ref4)Rossen Apostolov:
580 somehow the name of the FF doesn't fit naturally with the rest of the
581 info here :)
583 How about including the simulation setup in the file in a separate
584 block? That will be needed if the file can be used for restarts too.
586 [[e]](#cmnt_ref5)magnus.lundborg:
588 This introduces a bit of redundancy, but helps keeping track of the
589 data.
591 [[f]](#cmnt_ref6)Roland Schulz:
593 might be good to make this optional. And if it isn't given then the
594 numbering is consecutive. This would still give the flexibility that one
595 can specify the first and no. of particles, which isn't possible without
596 index block.
598 * * * * *
600 Daniel Spångberg:
602 if this is made optional, the comment below the section can be removed.
603 and particle mapping blocks required, since it will not cost much extra
604 to have it.
606 [[g]](#cmnt_ref7)Rossen Apostolov:
608 We should think of a different name for the traj. group block, it's
609 confusing