README

# Status of this project

This is an early preview. I think the structure of the API is mostly
stable (i.e. the relationship of data types should not change radically),
but names might change for better consistency, and semantics (how errors
are handled, side effects like logging, etc.) will change in subtle ways
too. And of course lots of symbols will be added to make a consistent
and convenient interface.

What is currently completely missing is algorithms for data processing
(e.g. filtering), standard feedback loops, etc. While these are clearly
within the scope of this project, it turns out many applications don't
need them. Therefore implementation (and API design) will happen later.
(Hopefully somehow fitting with the existing API.)



# What libautomation is: Design decisions and lessons learned

libautomation was created by abstracting common code from several
applications of industrial and home automation. It is assumed that this
code will be useful for writing more automation applications.

libautomation is intended to do as little as possible, relying on Linux
functionality and external libraries (like libev) as much as possible.
It is written entirely in C and every part of it should be easily
replaceable by custom code if the application has special needs.

The goal here is to make easy tasks simple and complex projects
possible and easily manageable.

As such libautomation is not only (maybe even not mainly) about the
code provided, but very much about the lessons learned from writing
automation applications: How to do things and what not to do because
it leads to problems down the road. These lessons have been turned
into design decisions.


## Use a system daemon instead of a watchdog

While it might seem tempting on embedded automation solutions to use the
watchdog directly in the main application, because this automatically
tests the entire software stack, this approach is too inflexible: Any
additional features need to be integrated into the main application,
which conflicts with trying to keep libautomation very modular.

Instead a system daemon (like procd from OpenWrt) should feed the
watchdog and in turn monitor all applications running on the system.


## An automation system should be resilient against crashing/hanging processes

libautomation aims for custom-made (home-brewed) automation solutions.
These often don't run on a 10k Euro PLC, yet might be used in an industrial
environment with high levels of EMI, which might cause occasional hardware
faults in cheap hardware. It should be easy to write automation
applications in such a way that the system can recover from such errors
by restarting processes or resetting the system.
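
To illustrate the restart idea, here is a minimal sketch of a supervisor
loop. It is not part of libautomation, and on a real system this job
belongs to the system daemon (e.g. procd). The names `run_supervised`,
`demo_worker_ok` and `demo_worker_crash` are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run worker() in a child process and restart it whenever it crashes or
 * exits with a non-zero status. Gives up after max_restarts restarts.
 * Returns the number of restarts performed, or -1 if fork()/waitpid() fail.
 */
int run_supervised(int (*worker)(void), int max_restarts)
{
    int restarts = 0;

    assert(worker != NULL);
    for (;;) {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(worker());        /* child: do the real work */

        while (waitpid(pid, &status, 0) < 0)
            if (errno != EINTR)     /* interrupted waits are simply retried */
                return -1;

        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return restarts;        /* clean shutdown: nothing to recover */
        if (restarts >= max_restarts)
            return restarts;        /* give up; a real system might reboot */
        restarts++;
    }
}

/* Two hypothetical workers for demonstration purposes. */
int demo_worker_ok(void)    { return 0; }
int demo_worker_crash(void) { abort(); }
```

With these workers, `run_supervised(demo_worker_ok, 3)` returns 0, while
`run_supervised(demo_worker_crash, 2)` starts the worker three times in
total and returns 2.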


## Library functions might crash instead of fail

This might seem like an unconventional choice at first. But in addition to
the remarks about reliability above, think about the following points:
 * Usually nobody attends the automation system to read error messages.
 * If a syscall returns an error that can be handled, the library should
   handle it. If it can't be handled, then restarting the application or
   the system seems to be a plausible fix.
 * The library API is a lot easier to use when functions can't fail ... ;)

Of course the above is only a guideline, not a principle. If in doubt,
failing and non-failing versions of the same function should be provided.
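
As an illustration of this guideline (not taken from the library
sources), the following sketch wraps `read(2)` in a failing and a
non-failing version. The names `try_read_full` and `read_full` are
hypothetical. The handleable error (`EINTR`) is dealt with inside the
wrapper, everything else either gets reported or crashes the process.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Failing version: returns 0 on success, -1 on error or premature EOF. */
int try_read_full(int fd, void *buf, size_t len)
{
    char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = read(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;           /* handleable: just retry */
            return -1;              /* not handleable here */
        }
        if (n == 0) {
            errno = EIO;            /* EOF before len bytes arrived */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Non-failing version: crashes so that the supervisor can restart us. */
void read_full(int fd, void *buf, size_t len)
{
    if (try_read_full(fd, buf, len) < 0) {
        perror("read_full");
        abort();
    }
}
```

Callers of `read_full` need no error handling at all, which is the point
of the third bullet above. Callers that can do something sensible on
failure use `try_read_full` instead.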


## Multiple processes need to cooperate

There are many constraints on how an automation algorithm is split into
threads and processes:
* Some actions can cause interference with reading sensors, so you typically
  want to have everything in one thread, to control relative timing of
  operations.
* However not every IO can be made non-blocking, e.g. reading a sensor
  via sysfs might block for a long time until it runs into some timeout.
  Therefore at least multiple threads are needed, if not multiple processes.
* Some input is expensive to read (might come over the network or a slow
  bus), so the effort should not be duplicated needlessly.
* Sometimes it becomes desirable to run the same thread twice, controlling
  two machines at the same time. When people write multithreaded applications,
  they often don't anticipate this case, and if they do, it usually is untested
  and more complicated than simply running the application twice.

To make it possible to meet all these constraints, clearly an ability to
share (sensor) data between multiple processes is necessary. To that end
libautomation provides a shared memory interface.



# What to find in the directory tree

The repository currently contains the sources and header files of the
library itself in the folder `lib` and some small demo applications,
which are either helpful utilities or small automation applications,
in the folders `tools` and `examples`.


## Tools

### atmdump SHMID
The `atmdump` tool prints the contents of the libautomation shared memory
domain SHMID in human-readable form to standard output. This is mostly
useful for ad-hoc testing and inspection, but also serves as a demo for
shared memory clients.

### atmd /path/to/configfile
The `atmd` (for libautomation daemon) reads any number of data sources
(i.e. devices) from the configuration file specified on the command line
and makes their periodically updated values available via shared memory.
This is the demo for providing values in shared memory.

`atmd` is also a very important part of the libautomation ecosystem:
In the case where reading a sensor might block for an unacceptably long
time, `atmd` allows reading the sensor in a separate process, without
any change to the primary application.

Furthermore `atmd` allows organizing data sources into so-called
data source groups, each of which can have its own policy regarding
handling of errors when reading a sensor. Each data source group lives
in its own config file. Config files can be loaded recursively with the
`dsgrp` keyword.


## Examples

### humiditycontrol /path/to/configfile
is a small air humidity regulator. It measures the relative humidity and
temperature at two places (typically inside a building and outdoors) and
calculates the absolute humidities. If the absolute humidity outside
is lower than inside, then a fan is turned on.

There are several configuration settings to control target values, allowed
energy loss when it is cold outside, etc. This demo is actually useful
for real world applications and can easily be extended and customized to
specific needs, like adding a dehumidifier, etc.

### line_monitor /path/to/configfile
is a building block for an alert system. It monitors some input (typically
a GPIO) indicating the status of some equipment. When the input changes
state, it triggers execution of some external program, like a script
placing a mobile call.

The alert command is executed periodically until either the error condition
is fixed or the operator acknowledges the error.

This application is actually used verbatim in a district heating plant.
Well, since I changed the libautomation API after deploying the system,
it isn't actually verbatim any longer ...



# API Overview

## Important data structures

### struct ATM_VALUE
A value (machine-size integer) together with a timestamp of when the
value was obtained.

### struct ATM_DS
Data Source: Descriptor of how to obtain/update a single value. This is
mostly used internally. Users typically either update an ATM_VALUE
manually or have this managed completely by libautomation.

### struct ATM_DSGRP
A data source group is a list of data sources together with a policy for
handling errors when reading data sources. Some useful policies are
predefined, however it is possible to implement arbitrary policies in the
application via callback.

Also data source groups can be stacked by treating data source groups as
pseudo data sources. This allows building complex policies: E.g. you can
have an inner data source group that retries reading a sensor three times
before failing, and an outer data source group that resets the data bus
to recover from errors.

In the above example, you would have all sensors of the bus as children
of the data source group, to prevent sensor access while the bus is down.
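
The stacking idea can be sketched in plain C as follows. This does not
use the real `struct ATM_DSGRP` API (which is not shown in this README).
All names here (`read_fn`, `read_with_retry`, `read_with_bus_reset`,
`struct fake_bus`, ...) are hypothetical stand-ins. Also, this sketch
retries immediately after the reset, whereas a real policy would
probably wait for the bus to initialize first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a data source: returns 0 on success. */
typedef int (*read_fn)(void *ctx, int *value);

/* Inner policy: try up to max_tries times before reporting failure. */
int read_with_retry(read_fn read_sensor, void *ctx, int *value, int max_tries)
{
    assert(read_sensor != NULL && value != NULL);
    for (int i = 0; i < max_tries; i++)
        if (read_sensor(ctx, value) == 0)
            return 0;
    return -1;
}

/* Outer policy: if the inner policy fails, reset the bus and try again. */
int read_with_bus_reset(read_fn read_sensor, void *ctx, int *value,
                        void (*reset_bus)(void *ctx))
{
    assert(reset_bus != NULL);
    if (read_with_retry(read_sensor, ctx, value, 3) == 0)
        return 0;
    reset_bus(ctx);
    return read_with_retry(read_sensor, ctx, value, 3);
}

/* Fake bus for demonstration: reads fail while failures_left > 0. */
struct fake_bus {
    int failures_left;
    int resets;
};

int fake_read(void *ctx, int *value)
{
    struct fake_bus *bus = ctx;

    if (bus->failures_left > 0) {
        bus->failures_left--;
        return -1;
    }
    *value = 42;
    return 0;
}

void fake_reset(void *ctx)
{
    struct fake_bus *bus = ctx;

    bus->failures_left = 0;     /* a reset cures the fake bus */
    bus->resets++;
}
```

A bus that glitches twice is handled by the inner policy alone, without
any reset. Only a bus that keeps failing triggers the outer policy.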

### struct ATM_TASK
A task is a repeating timer together with a data source group and an
optional function. Every time the timer fires, all values in the data
source group are updated. As the last step the optional function is called.

A task can be type cast to its ev_timer and thus can be started and
stopped using the usual libev facilities.
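
Such a cast works in C when the timer is the first member of the task
structure. The following sketch shows the idiom with stand-in types, so
that it compiles without libev. `struct demo_timer` and
`struct demo_task` are hypothetical and do not reflect the actual layout
of `ev_timer` or `struct ATM_TASK`.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for libev's ev_timer; the real type comes from <ev.h>. */
struct demo_timer {
    double repeat;
    int active;
};

/* Hypothetical task layout: the timer must be the FIRST member. */
struct demo_task {
    struct demo_timer timer;
    void (*callback)(void);
};

/* Generic timer code that knows nothing about tasks. */
void demo_timer_start(struct demo_timer *t)
{
    assert(t != NULL);
    t->active = 1;
}

void demo_timer_stop(struct demo_timer *t)
{
    assert(t != NULL);
    t->active = 0;
}
```

Given a `struct demo_task task`, the call
`demo_timer_start((struct demo_timer *)&task)` is well defined, because
a pointer to a structure may be converted to a pointer to its first
member.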

libautomation automatically initializes a task with the global symbol
atm_main_task. This task is started when atm_main() is called.

### TODO: Write something about filters


## Shared memory interface

As noted above, sharing sensor data between applications is an important
requirement for automation applications. Therefore the shared memory
interface is a core part of libautomation and most features make use of
it.

Each shared memory region has a unique id. Typically an application will
create one shared memory region to export its data and connect to N other
memory regions for data input. The first shared memory region created is
assigned to the global symbol `atm_shm_stdmem`.

Each shared memory region is organized as a key-value database, where keys
can be arbitrary strings and values are of type `struct ATM_VALUE`.
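
To give an idea of what such a key-value region can look like, here is a
minimal sketch. It is an assumption for illustration only and not the
layout libautomation actually uses: all `demo_*` names, the fixed table
size and the key length limit are made up, and locking between processes
is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ENTRIES 16
#define DEMO_KEY_LEN 32

/* Stand-in for struct ATM_VALUE: a value plus its timestamp. */
struct demo_value {
    int v;
    int ts;
};

struct demo_entry {
    char key[DEMO_KEY_LEN];     /* empty string marks a free slot */
    struct demo_value value;
};

/* Contains no pointers, so the layout stays valid no matter at which
 * address each process maps the region. */
struct demo_region {
    struct demo_entry entries[DEMO_MAX_ENTRIES];
};

/* Look up key; returns NULL if it is not present. */
struct demo_value *demo_lookup(struct demo_region *r, const char *key)
{
    assert(r != NULL && key != NULL);
    for (size_t i = 0; i < DEMO_MAX_ENTRIES; i++)
        if (r->entries[i].key[0] != '\0' &&
            strcmp(r->entries[i].key, key) == 0)
            return &r->entries[i].value;
    return NULL;
}

/* Register key (or find the existing entry). Returns NULL if the key is
 * empty or too long, or if the table is full. */
struct demo_value *demo_register(struct demo_region *r, const char *key)
{
    struct demo_value *found = demo_lookup(r, key);

    if (found != NULL)
        return found;
    if (key[0] == '\0' || strlen(key) >= DEMO_KEY_LEN)
        return NULL;
    for (size_t i = 0; i < DEMO_MAX_ENTRIES; i++) {
        if (r->entries[i].key[0] == '\0') {
            strcpy(r->entries[i].key, key);
            return &r->entries[i].value;
        }
    }
    return NULL;
}
```

A zero-initialized `struct demo_region` is an empty table. The pointer
returned by `demo_register` stays valid for the lifetime of the region,
so a writer can keep it and update the value in place.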

### Creating shared memory regions
`atm_shm_create(id)` creates a new shared memory region with the given
unique id. It is an error if a memory region with the same id has already
been created by another process. If `id` is NULL, then a memory region is
allocated on the heap instead, making the memory region effectively private.

It is good policy to use the name of the config file as id. This is
under the assumption that two instances of the same application surely
would need different config files, because they should not access the same
sensors directly.

### Exporting locally calculated values
`atm_shm_register(shmr, key)` registers a new key-value pair with shared
memory region `shmr` and returns a pointer to the value.

Use `atm_shm_update(struct ATM_VALUE *var, int value)` to set the value
and update its timestamp automatically.

### Exporting sensor data
Registering a new data source with
`atm_ds_register(struct ATM_DSGRP *grp, const char *url, const char *key)`
automatically registers a key-value pair in `atm_shm_stdmem`.

### Reading data from shared memory
`atm_ds_register(struct ATM_DSGRP *grp, const char *url, const char *key)`
where `url` is of the form "shm:id/key" returns a pointer to the
associated `struct ATM_VALUE`. In this case, the value of `key` is
ignored and no key-value pair is exported.

Alternatively there is also `atm_shm_get(const char *id, const char *key)`
with the same effect.


## Some notes on time keeping

There are three different conventions to handle time. Sorry for the
inconvenience:

### ev_tstamp from libev
`ev_tstamp` is a double precision floating point data type used by libev
(it is what `ev_time()` returns) and typically stores the time in
something close to seconds since UNIX epoch. The upside is: no overflow
danger and good precision at the same time. The downside is: floating
point arithmetic. :-(

Since we are using libev, we have to use this.

### atm_time, atm_timestamp()
This uses native integer variables. `atm_timestamp()` returns the time
since booting the system in 1/10 seconds. This also is quite safe from
overflows (assuming at least 32-bit integers) and precise enough to keep
track of typical hardware like relays.

`atm_time` is a global variable and automatically set from `atm_timestamp()`
each time an ATM_TASK is run. The idea is to have a rough estimate of the
current time without having to force a context switch (i.e. calling into the
OS) all the time.
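
For a feeling of the numbers: a signed 32-bit counter of 1/10 seconds
overflows after 2^31 / 10 seconds, which is about 6.8 years of uptime.
The sketch below shows one way to obtain such a timestamp. It assumes
`CLOCK_MONOTONIC` as the time base, which is a guess, because this
README does not say which clock `atm_timestamp()` reads. The `demo_*`
names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L     /* for clock_gettime() */
#include <assert.h>
#include <time.h>

/* Convert a timespec to whole tenths of a second (rounding down). */
long demo_tenths(const struct timespec *ts)
{
    assert(ts != NULL);
    return (long)ts->tv_sec * 10 + ts->tv_nsec / 100000000L;
}

/* Tenths of a second on the monotonic clock, or -1 on error. */
long demo_timestamp(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return demo_tenths(&ts);
}
```

For example, 12.34 seconds are converted to 123 tenths. Caching the
result in a global variable once per task run, as described above for
`atm_time`, avoids calling `clock_gettime()` over and over.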

### ATM_TIMER_RES
This macro is defined to 0.001, meaning one millisecond.

Data sources (or actually data source groups) return a positive integer
value when they need to get called again to complete their operation.
(E.g. because they had to reset some bus and need to wait for everything to
initialize.) This return value times `ATM_TIMER_RES` is the waiting time
in seconds until resuming the task.

This allows for finer-grained control than 1/10 of a second. Think of
1/10 second as the fastest sensible interval to repeat a task (but nothing
stops you from repeating faster). Then obviously interruptions of tasks
have to support a much shorter time scale.
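
As a small worked example of this conversion: a return value of 250
means "call me again in 250 * 0.001 = 0.25 seconds". The sketch below
mirrors the documented value of `ATM_TIMER_RES` in a local macro, so it
compiles without the library headers. `DEMO_TIMER_RES` and
`demo_resume_delay` are hypothetical names.

```c
#include <assert.h>

#define DEMO_TIMER_RES 0.001    /* same value as documented for ATM_TIMER_RES */

/* Seconds to wait before resuming; ret is the positive return value
 * of a data source group. */
double demo_resume_delay(int ret)
{
    assert(ret > 0);
    return ret * DEMO_TIMER_RES;
}
```

The result is a plain double in seconds, which is the unit libev timers
work with.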



# Getting in touch

Libautomation is currently hosted at https://repo.or.cz/libautomation.git -
please ask me if you want push access. I'm easily reachable via e-mail.

If there is sufficient interest, I can open a mailing list for this
project, but at the moment you need to send all questions and bug
reports to me personally.