doc/warehouse_api_final.md

Image Warehouse API
===================

Version 0.8
-----------

This is my attempt to document the REST API to the image warehouse,
both in its proposed final state and (in an appendix) in its current
messy state. It does not cover authentication, fully dynamic
configuration, or lesser items such as

* reverse replication (pull from slave/downstream warehouse);

* direct copy;

* cache control;

* HTTP chunked encoding.

In general, data other than object bodies can be returned in either
XML or JSON format, defaulting to XML unless an "Accept" header
containing "/json" is present.
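
The format rule above is simple enough to sketch in a few lines. This is
only an illustration of the rule as stated, not the daemon's actual code;
the function name `response_format` is made up for the example.

```python
def response_format(accept_header=None):
    """Return "json" if the Accept header contains "/json", else "xml"."""
    # XML is the default, including when no Accept header is sent at all.
    if accept_header and "/json" in accept_header:
        return "json"
    return "xml"


print(response_format("*/json"))
print(response_format(None))
```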

For examples, the convention in this document is for the first
line of an indented block to be the command you would issue, while
the remainder is the output you might expect.

API Root Operations
-------------------

The only operation for the API root is to fetch information about
other API components, including buckets and special endpoints
such as the provider list. For example:

    $ curl http://fserver-1:9090
    <api service="image_warehouse" version="1.0">
        <bucket_factory path="http://fserver-1:9090/_new"/>
        <provider_list path="http://fserver-1:9090/_providers"/>
        <bucket path="http://fserver-1:9090/junk2"/>
        <bucket path="http://fserver-1:9090/data"/>
    </api>

The "service" and "version" attributes identify this version
of the warehouse API. Special API endpoints are distinguished
by a leading underscore in the path, as with the "bucket_factory"
endpoint ("_new") for creating new buckets and the "provider_list"
endpoint ("_providers") for manipulating cloud-provider information.
The remainder are actual buckets.
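
A client might use the root document to discover everything else. The
following is a minimal sketch that parses the sample response above with
Python's standard XML parser; the helper name `split_root` is my own, and
the XML is copied from the example rather than fetched from a server.

```python
import xml.etree.ElementTree as ET

ROOT_XML = """<api service="image_warehouse" version="1.0">
    <bucket_factory path="http://fserver-1:9090/_new"/>
    <provider_list path="http://fserver-1:9090/_providers"/>
    <bucket path="http://fserver-1:9090/junk2"/>
    <bucket path="http://fserver-1:9090/data"/>
</api>"""


def split_root(xml_text):
    """Separate special endpoints from buckets in an API root document."""
    root = ET.fromstring(xml_text)
    special, buckets = {}, []
    for child in root:
        path = child.get("path")
        if child.tag == "bucket":
            buckets.append(path)
        else:
            # Anything that is not a <bucket> is treated as a special endpoint.
            special[child.tag] = path
    return special, buckets


special, buckets = split_root(ROOT_XML)
print(special["bucket_factory"])
print(buckets)
```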

Provider Operations
-------------------

It is possible to list providers, and to change login credentials
for those providers. Listing is very simple:

    $ curl http://fserver-1:9090/_providers
    <providers>
        <provider name="my tabled">
            <type>s3</type>
            <port>80</port>
            <username>foo</username>
            <password>bar</password>
        </provider>
        <provider name="backup">
            <type>http</type>
            <host>localhost</host>
            <port>9091</port>
        </provider>
    </providers>

This shows two providers, named "my tabled" (our primary/local
store) and "backup" (a secondary/remote store). The types can
be:

* http: our own API as described in this document

* s3: S3 - includes Amazon S3, tabled, Walrus, ParkPlace, Google
  Storage

* cf: CloudFiles or OpenStack Storage ("swift")

For the time being, "s3" is the only fully functional type for
a primary store, while any type can be used for a secondary store.
Slave stores can also be started with the "-f" flag, which uses
a directory as a primary store but does no metadata/replication
operations. Eventually, all of these options - including a directory
on a local or distributed filesystem - will be supported as either
primary or secondary stores.

The only modifying operation for providers is an update of the
username and password (both must be supplied at once). For example:

    $ curl -d provider="my tabled" -d username=yyy -d password=zzz \
        http://fserver-1:9090/_providers
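
For clients not using curl, the same update can be sketched with Python's
standard library. This only builds the request and does not send it; the
function name is hypothetical, and I'm assuming the server accepts ordinary
URL-encoded form data (so the space in "my tabled" is sent as "+").

```python
from urllib.parse import urlencode
from urllib.request import Request


def provider_update_request(base_url, provider, username, password):
    """Build (but do not send) the POST that updates a provider's login."""
    # Username and password always travel together, per the rule above.
    form = urlencode(
        {"provider": provider, "username": username, "password": password}
    )
    return Request(base_url + "/_providers", data=form.encode("ascii"), method="POST")


req = provider_update_request("http://fserver-1:9090", "my tabled", "yyy", "zzz")
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```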

Bucket Operations
-----------------

Buckets can be created, listed, and deleted. The create command
is like this (using POST).

    $ curl -d name=my_bucket http://fserver-1:9090/_new

Deletion requires that the bucket be empty, but is similarly
simple.

    $ curl -X DELETE http://fserver-1:9090/my_bucket

Here's a listing of a bucket's contents, using JSON just for variety.

    $ curl -H "Accept: */json" http://fserver-1:9090/my_bucket
    [
        {
            "type": "query",
            "path": "http://fserver-1:9090/my_bucket/_query"
        },
        {
            "type": "object",
            "name": "file1",
            "path": "http://fserver-1:9090/my_bucket/file1"
        },
        {
            "type": "object",
            "name": "file2",
            "path": "http://fserver-1:9090/my_bucket/file2"
        }
    ]

The query object is used to do complex queries, which will be described
later. The remainder are regular objects.
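
To show how a client might consume such a listing, here is a small sketch
that separates the query endpoint from the regular objects. The JSON is the
sample listing for "my_bucket" (with each object pointing at its own path);
`split_listing` is an illustrative name, not part of the API.

```python
import json

LISTING = """[
    {"type": "query", "path": "http://fserver-1:9090/my_bucket/_query"},
    {"type": "object", "name": "file1",
     "path": "http://fserver-1:9090/my_bucket/file1"},
    {"type": "object", "name": "file2",
     "path": "http://fserver-1:9090/my_bucket/file2"}
]"""


def split_listing(text):
    """Return (query_paths, {object_name: object_path}) for a bucket listing."""
    entries = json.loads(text)
    query_paths = [e["path"] for e in entries if e["type"] == "query"]
    objects = {e["name"]: e["path"] for e in entries if e["type"] == "object"}
    return query_paths, objects


query_paths, objects = split_listing(LISTING)
print(query_paths)
print(sorted(objects))
```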

Object and Attribute Operations
-------------------------------

Objects are represented as small directory trees, with several
elements as shown here:

    $ curl http://fserver-1:9090/my_bucket/file1
    <object>
        <object_body path="http://fserver-1:9090/my_bucket/file1/body"/>
        <object_attr_list path="http://fserver-1:9090/my_bucket/file1/attrs"/>
        <object_attr name="xyz" path="http://fserver-1:9090/my_bucket/file1/attr_xyz"/>
    </object>

The object body can be stored and retrieved using PUT and GET respectively,
and can have any HTTP/MIME type. The attribute-list element
can be used to fetch or set multiple attributes (including values)
at once. To fetch:

    $ curl http://fserver-1:9090/my_bucket/file1/attrs
    <attributes>
        <attribute name="color">blue</attribute>
        <attribute name="flavor">lemon</attribute>
    </attributes>

To set both of these attributes at once:

    $ curl -d color="blue" -d flavor="lemon" http://fserver-1:9090/my_bucket/file1/attrs

Single-attribute operations are also supported. To fetch a
single attribute:

    $ curl http://fserver-1:9090/my_bucket/file1/attr_color
    <attribute name="color">blue</attribute>

The attribute can also be set with a PUT to the same URL.

    $ printf green | curl -T - http://fserver-1:9090/my_bucket/file1/attr_color

Lastly, objects and attributes can be deleted (object deletes
are propagated to secondary warehouses).

    $ curl -X DELETE http://fserver-1:9090/my_bucket/file2
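
The attribute operations can be sketched from the client side as follows.
This assumes the "attr_<name>" URL form used by the object listing and the
single-attribute fetch above; the helper names are mine, and the PUT
request is only built, not sent.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request


def attr_url(object_url, name):
    """URL of a single attribute, following the attr_<name> layout."""
    return object_url + "/attr_" + name


def attr_put_request(object_url, name, value):
    """Build (but do not send) the PUT that sets one attribute."""
    return Request(attr_url(object_url, name), data=value.encode("utf-8"), method="PUT")


def parse_attributes(xml_text):
    """Turn an <attributes> document into a plain dict of name -> value."""
    root = ET.fromstring(xml_text)
    return {a.get("name"): a.text for a in root.findall("attribute")}


req = attr_put_request("http://fserver-1:9090/my_bucket/file1", "color", "green")
attrs = parse_attributes(
    '<attributes><attribute name="color">blue</attribute>'
    '<attribute name="flavor">lemon</attribute></attributes>'
)
print(req.get_method(), req.full_url)
print(attrs)
```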

Queries
-------

Queries are supported as in the design doc. Queries can contain
the following features, which are also supported for evaluating
replication policies:

* Literal integers, strings, and dates

* Object-attribute access: $attr

* Indirect object-attribute access: @link_on_cur_obj.link_target_attr

* Site-attribute access (for replication policies only): #attr

* Comparisons: <, <=, ==, !=, >=, >

* Booleans: &&, ||, !

The syntax to issue a query is as follows.

    $ curl -d '($color == "green") && ($flavor == "lemon")' \
        http://fserver-1:9090/my_bucket/_query
    <objects>
        <object>
            <bucket>my_bucket</bucket>
            <key>file1</key>
        </object>
    </objects>
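
As a rough illustration of working with queries from code, the sketch
below builds the simplest kind of query (string equality terms joined with
&&) and parses a result document like the one above. Both helper names are
invented for this example, and it makes no attempt to escape quotes inside
values, since this document doesn't say how the query syntax handles them.

```python
import xml.etree.ElementTree as ET


def equality_query(**attrs):
    """AND together ($name == "value") terms, one per keyword argument."""
    terms = ['(${} == "{}")'.format(name, value) for name, value in attrs.items()]
    return " && ".join(terms)


def parse_results(xml_text):
    """Return a list of (bucket, key) pairs from an <objects> document."""
    root = ET.fromstring(xml_text)
    return [(o.findtext("bucket"), o.findtext("key")) for o in root.findall("object")]


query = equality_query(color="green", flavor="lemon")
results = parse_results(
    "<objects><object><bucket>my_bucket</bucket><key>file1</key></object></objects>"
)
print(query)
print(results)
```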

Replication Policies
--------------------

Replication policies are stored as "_policy" attributes on
objects. To set a policy, use the same mechanism as for other attributes.

    $ printf '$color == "green"' | curl -T - http://fserver-1:9090/my_bucket/file1/_policy

This will cause the warehouse daemon to replicate the object to all
secondary warehouses whenever it is subsequently changed (including
attribute changes). You probably want to set the policy first,
before sending the body. This is entirely allowable using any of the
attribute-setting mechanisms described above: an empty object is
created, and the subsequent body PUT is then replicated. The above
example is probably not what you want, for two other reasons:

1. Because the policy only refers to object attributes, it will
   replicate to all secondary warehouses.

2. It's cumbersome and inefficient to set separate replication
   policies for every object individually.

To specify selective replication, matching object attributes
with secondary-warehouse attributes, you would do this instead.

    $ printf '$color == #color' | curl -T - http://fserver-1:9090/my_bucket/file1/_policy

To set a default replication policy for all objects within a bucket,
use the "_default" pseudo-object.

    $ printf '$color == #color' | curl -T - http://fserver-1:9090/my_bucket/_default/_policy

With this policy, any modification to a green object will be replicated
to green remote warehouses, but never to purple warehouses, and blue
objects will never reach green warehouses. Note that the default
replication policy for a bucket is overridden by any specific
per-object policy.
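
To make the policy semantics concrete, here is a toy model of how the two
example policies pick replication targets. It is emphatically not the
daemon's evaluator: it understands only the two shapes shown above
(`$x == "literal"` and `$x == #y`), all names are hypothetical, and
treating a missing attribute as "no match" is my own assumption.

```python
import re

# Only the two policy shapes used in this section are recognized.
_POLICY = re.compile(r'^\s*\$(\w+)\s*==\s*(?:"([^"]*)"|#(\w+))\s*$')


def effective_policy(object_policy, bucket_default):
    """A per-object policy overrides the bucket's _default policy."""
    return object_policy if object_policy is not None else bucket_default


def matches(policy, obj_attrs, site_attrs):
    """Evaluate one policy for one (object, secondary warehouse) pair."""
    m = _POLICY.match(policy)
    if m is None:
        raise ValueError("unsupported policy: " + policy)
    obj_attr, literal, site_attr = m.groups()
    left = obj_attrs.get(obj_attr)
    right = literal if literal is not None else site_attrs.get(site_attr)
    # Assumption: a missing attribute on either side never matches.
    return left is not None and left == right


def replication_targets(policy, obj_attrs, sites):
    """Names of the secondary warehouses the object would be sent to."""
    return [name for name, attrs in sites.items() if matches(policy, obj_attrs, attrs)]


sites = {"backup": {"color": "green"}, "archive": {"color": "purple"}}
print(replication_targets('$color == #color', {"color": "green"}, sites))
print(replication_targets('$color == "green"', {"color": "green"}, sites))
```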

Appendix 1: Major Divergences
-----------------------------

The current code doesn't implement exactly the API described
above. There are many differences in the exact format of data
returned for the API root, provider list, or object listings.
More importantly, the actual URLs and methods used for various
operations are still pending reconciliation with what's described
here. Here are the current equivalents, in approximately the
same order as mentioned above:

* bucket creation: PUT on .../my_bucket

* object-body fetch: GET on .../my_bucket/file1

* object-body store: PUT on .../my_bucket/file1

* multi-attribute set: POST on .../my_bucket with key=file1

* bucket and attribute deletes are not yet implemented

There are also a couple of special control operations, implemented
as POST methods on the object. The first of these is to force re-evaluation
of the relevant replication policies and trigger re-replication
to appropriate remote warehouses (equivalent to a PUT on the
object body except that there's no data transfer from the client).

    $ curl -d op=push http://fserver-1:9090/my_bucket/file1

The second control operation is used to determine whether replication
to a specific remote warehouse has finished.

    $ curl -d op=check -d loc=backup http://fserver-1:9090/my_bucket/file1

This will return a 404 (Not Found) if the object has not been replicated
to that location, or a 200 (OK) if it has.
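
A client waiting for replication would typically poll the check operation.
The sketch below shows one way to do that; `fetch_status` is a hypothetical
callable that performs the POST and returns the HTTP status code, and the
retry count and delay are arbitrary. If you implement `fetch_status` with
Python's `urllib.request.urlopen`, remember that it raises
`urllib.error.HTTPError` for a 404, so the status has to be taken from the
exception's `code`.

```python
import time


def is_replicated(status):
    """Interpret the status code returned by the op=check operation."""
    if status == 200:
        return True
    if status == 404:
        return False
    raise ValueError("unexpected status from check: {}".format(status))


def wait_for_replication(fetch_status, attempts=10, delay=1.0):
    """Poll until the check reports 200, or give up after `attempts` tries."""
    for attempt in range(attempts):
        if is_replicated(fetch_status()):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Stand-in for a real server: not replicated twice, then replicated.
fake_statuses = iter([404, 404, 200])
done = wait_for_replication(lambda: next(fake_statuses), attempts=5, delay=0)
print(done)
```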

Appendix 2: JSON Configuration Format
-------------------------------------

The initial configuration for the image warehouse is pulled
from a JSON configuration file, repo.json in the current directory
by default. This defines a set of required attributes plus any
others that the user might want to use in replication policies.
Here's an example:

    [
        {
            "name": "my tabled",
            "type": "s3",
            "host": "localhost",
            "port": 80,
            "key": "foo",
            "secret": "bar",
            "color": "blue"
        },
        {
            "name": "backup",
            "type": "http",
            "host": "localhost",
            "port": 9091
        }
    ]

This defines a primary (local) warehouse named "my tabled" which
is using S3 on localhost. In this case the user name and password
are required - named "key" and "secret" in the file for legacy
reasons. We also have a secondary (remote) warehouse named "backup"
that we'll replicate to, and we don't care what back end it uses.
Since our interface to it is our own HTTP-based protocol, we don't
(currently) need a user name and password. Lastly, we've defined
our own "color" attribute to be used in making replication decisions.
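
For tools that need to read the same file, here is a sketch of loading the
configuration and pulling out the user-defined attributes. The set of
"built-in" field names is my guess based on the example above, not an
authoritative list, and the function name is made up.

```python
import json

# Assumed built-in fields, taken from the example; everything else is
# treated as a user-defined attribute usable in replication policies.
BUILTIN_FIELDS = {"name", "type", "host", "port", "key", "secret"}

CONFIG = """[
    {"name": "my tabled", "type": "s3", "host": "localhost", "port": 80,
     "key": "foo", "secret": "bar", "color": "blue"},
    {"name": "backup", "type": "http", "host": "localhost", "port": 9091}
]"""


def custom_attributes(text):
    """Map each warehouse name to its user-defined attributes."""
    result = {}
    for entry in json.loads(text):
        # S3 back ends need credentials ("key" and "secret" in the file).
        if entry.get("type") == "s3" and not all(k in entry for k in ("key", "secret")):
            raise ValueError("s3 warehouse needs key and secret: " + entry["name"])
        result[entry["name"]] = {
            k: v for k, v in entry.items() if k not in BUILTIN_FIELDS
        }
    return result


attrs_by_site = custom_attributes(CONFIG)
print(attrs_by_site)
```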