sqlite3/www/version3.tcl

   1 #!/usr/bin/tclsh
   2 source common.tcl
   3 header {SQLite Version 3 Overview}
   4 puts {
   5 <h2>SQLite Version 3 Overview</h2>
   6
   7 <p>
   8 SQLite version 3.0 introduces important changes to the library, including:
   9 </p>
  10
  11 <ul>
  12 <li>A more compact format for database files.</li>
  13 <li>Manifest typing and BLOB support.</li>
  14 <li>Support for both UTF-8 and UTF-16 text.</li>
  15 <li>User-defined text collating sequences.</li>
  16 <li>64-bit ROWIDs.</li>
  17 <li>Improved Concurrency.</li>
  18 </ul>
  19
  20 <p>
  21 This document is a quick introduction to the changes for SQLite 3.0
  22 for users who are already familiar with SQLite version 2.8.
  23 </p>
  24
  25 <h3>Naming Changes</h3>
  26
  27 <p>
  28 SQLite version 2.8 will continue to be supported with bug fixes
  29 for the foreseeable future.  In order to allow SQLite version 2.8
  30 and SQLite version 3.0 to peacefully coexist, the names of key files
  31 and APIs in SQLite version 3.0 have been changed to include the
  32 character "3".  For example, the include file used by C programs
  33 has been changed from "sqlite.h" to "sqlite3.h".  And the name of
  34 the shell program used to interact with databases has been changed
  35 from "sqlite.exe" to "sqlite3.exe".  With these changes, it is possible
  36 to have both SQLite 2.8 and SQLite 3.0 installed on the same system at
  37 the same time.  And it is possible for the same C program to link
  38 against both SQLite 2.8 and SQLite 3.0 at the same time and to use
  39 both libraries at the same time.
  40 </p>
  41
  42 <h3>New File Format</h3>
  43
  44 <p>
  45 The format used by SQLite database files has been completely revised.
  46 The old version 2.1 format and the new 3.0 format are incompatible with
  47 one another.  Version 2.8 of SQLite will not read a version 3.0 database
  48 files and version 3.0 of SQLite will not read a version 2.8 database file.
  49 </p>
  50
  51 <p>
  52 To convert an SQLite 2.8 database into an SQLite 3.0 database, have
  53 ready the command-line shells for both version 2.8 and 3.0.  Then
  54 enter a command like the following:
  55 </p>
  56
  57 <blockquote><pre>
  58 sqlite OLD.DB .dump | sqlite3 NEW.DB
  59 </pre></blockquote>
  60
  61 <p>
  62 The new database file format uses B+trees for tables.  In a B+tree, all
  63 data is stored in the leaves of the tree instead of in both the leaves and
  64 the intermediate branch nodes.  The use of B+trees for tables allows for
  65 better scalability and the storage of larger data fields without the use of
  66 overflow pages.  Traditional B-trees are still used for indices.</p>
  67
  68 <p>
  69 The new file format also supports variable pages sizes between 512 and
  70 32768 bytes.  The size of a page is stored in the file header so the
  71 same library can read databases with different pages sizes, in theory,
  72 though this feature has not yet been implemented in practice.
  73 </p>
  74
  75 <p>
  76 The new file format omits unused fields from its disk images.  For example,
  77 indices use only the key part of a B-tree record and not the data.  So
  78 for indices, the field that records the length of the data is omitted.
  79 Integer values such as the length of key and data are stored using
  80 a variable-length encoding so that only one or two bytes are required to
  81 store the most common cases but up to 64-bits of information can be encoded
  82 if needed.
  83 Integer and floating point data is stored on the disk in binary rather
  84 than being converted into ASCII as in SQLite version 2.8.
  85 These changes taken together result in database files that are typically
  86 25% to 35% smaller than the equivalent files in SQLite version 2.8.
  87 </p>
  88
  89 <p>
  90 Details of the low-level B-tree format used in SQLite version 3.0 can
  91 be found in header comments to the
  92 <a href="http://www.sqlite.org/cvstrac/getfile/sqlite/src/btree.c">btree.c</a>
  93 source file.
  94 </p>
  95
  96 <h3>Manifest Typing and BLOB Support</h3>
  97
  98 <p>
  99 SQLite version 2.8 will deal with data in various formats internally,
 100 but when writing to the disk or interacting through its API, SQLite 2.8
 101 always converts data into ASCII text.  SQLite 3.0, in contrast, exposes
 102 its internal data representations to the user and stores binary representations
 103 to disk when appropriate.  The exposing of non-ASCII representations was
 104 added in order to support BLOBs.
 105 </p>
 106
 107 <p>
 108 SQLite version 2.8 had the feature that any type of data could be stored
 109 in any table column regardless of the declared type of that column.  This
 110 feature is retained in version 3.0, though in a slightly modified form.
 111 Each table column will store any type of data, though columns have an
 112 affinity for the format of data defined by their declared datatype.
 113 When data is inserted into a column, that column will make at attempt
 114 to convert the data format into the columns declared type.   All SQL
 115 database engines do this.  The difference is that SQLite 3.0 will
 116 still store the data even if a format conversion is not possible.
 117 </p>
 118
 119 <p>
 120 For example, if you have a table column declared to be of type "INTEGER"
 121 and you try to insert a string, the column will look at the text string
 122 and see if it looks like a number.  If the string does look like a number
 123 it is converted into a number and into an integer if the number does not
 124 have a fractional part, and stored that way.  But if the string is not
 125 a well-formed number it is still stored as a string.  A column with a
 126 type of "TEXT" tries to convert numbers into an ASCII-Text representation
 127 before storing them.  But BLOBs are stored in TEXT columns as BLOBs because
 128 you cannot in general convert a BLOB into text.
 129 </p>
 130
 131 <p>
 132 In most other SQL database engines the datatype is associated with
 133 the table column that holds the data - with the data container.
 134 In SQLite 3.0, the datatype is associated with the data itself, not
 135 with its container.
 136 <a href="http://www.paulgraham.com/">Paul Graham</a> in his book
 137 <a href="http://www.paulgraham.com/acl.html"><i>ANSI Common Lisp</i></a>
 138 calls this property "Manifest Typing".
 139 Other writers have other definitions for the term "manifest typing",
 140 so beware of confusion.  But by whatever name, that is the datatype
 141 model supported by SQLite 3.0.
 142 </p>
 143
 144 <p>
 145 Additional information about datatypes in SQLite version 3.0 is
 146 available
 147 <a href="datatype3.html">separately</a>.
 148 </p>
 149
 150 <h3>Support for UTF-8 and UTF-16</h3>
 151
 152 <p>
 153 The new API for SQLite 3.0 contains routines that accept text as
 154 both UTF-8 and UTF-16 in the native byte order of the host machine.
 155 Each database file manages text as either UTF-8, UTF-16BE (big-endian),
 156 or UTF-16LE (little-endian).  Internally and in the disk file, the
 157 same text representation is used everywhere.  If the text representation
 158 specified by the database file (in the file header) does not match
 159 the text representation required by the interface routines, then text
 160 is converted on-the-fly.
 161 Constantly converting text from one representation to another can be
 162 computationally expensive, so it is suggested that programmers choose a
 163 single representation and stick with it throughout their application.
 164 </p>
 165
 166 <p>
 167 In the current implementation of SQLite, the SQL parser only works
 168 with UTF-8 text.  So if you supply UTF-16 text it will be converted.
 169 This is just an implementation issue and there is nothing to prevent
 170 future versions of SQLite from parsing UTF-16 encoded SQL natively.
 171 </p>
 172
 173 <p>
 174 When creating new user-defined SQL functions and collating sequences,
 175 each function or collating sequence can specify it if works with
 176 UTF-8, UTF-16be, or UTF-16le.  Separate implementations can be registered
 177 for each encoding.   If an SQL function or collating sequences is required
 178 but a version for the current text encoding is not available, then
 179 the text is automatically converted.  As before, this conversion takes
 180 computation time, so programmers are advised to pick a single
 181 encoding and stick with it in order to minimize the amount of unnecessary
 182 format juggling.
 183 </p>
 184
 185 <p>
 186 SQLite is not particular about the text it receives and is more than
 187 happy to process text strings that are not normalized or even
 188 well-formed UTF-8 or UTF-16.  Thus, programmers who want to store
 189 IS08859 data can do so using the UTF-8 interfaces.  As long as no
 190 attempts are made to use a UTF-16 collating sequence or SQL function,
 191 the byte sequence of the text will not be modified in any way.
 192 </p>
 193
 194 <h3>User-defined Collating Sequences</h3>
 195
 196 <p>
 197 A collating sequence is just a defined order for text.  When SQLite 3.0
 198 sorts (or uses a comparison operator like "<" or ">=") the sort order
 199 is first determined by the data type.
 200 </p>
 201
 202 <ul>
 203 <li>NULLs sort first</li>
 204 <li>Numeric values sort next in numerical order</li>
 205 <li>Text values come after numerics</li>
 206 <li>BLOBs sort last</li>
 207 </ul>
 208
 209 <p>
 210 Collating sequences are used for comparing two text strings.
 211 The collating sequence does not change the ordering of NULLs, numbers,
 212 or BLOBs, only text.
 213 </p>
 214
 215 <p>
 216 A collating sequence is implemented as a function that takes the
 217 two strings being compared as inputs and returns negative, zero, or
 218 positive if the first string is less than, equal to, or greater than
 219 the first.
 220 SQLite 3.0 comes with a single built-in collating sequence named "BINARY"
 221 which is implemented using the memcmp() routine from the standard C library.
 222 The BINARY collating sequence works well for English text.  For other
 223 languages or locales, alternative collating sequences may be preferred.
 224 </p>
 225
 226 <p>
 227 The decision of which collating sequence to use is controlled by the
 228 COLLATE clause in SQL.  A COLLATE clause can occur on a table definition,
 229 to define a default collating sequence to a table column, or on field
 230 of an index, or in the ORDER BY clause of a SELECT statement.
 231 Planned enhancements to SQLite are to include standard CAST() syntax
 232 to allow the collating sequence of an expression to be defined.
 233 </p>
 234
 235 <h3>64-bit ROWIDs</h3>
 236
 237 <p>
 238 Every row of a table has a unique rowid.
 239 If the table defines a column with the type "INTEGER PRIMARY KEY" then that
 240 column becomes an alias for the rowid.  But with or without an INTEGER PRIMARY
 241 KEY column, every row still has a rowid.
 242 </p>
 243
 244 <p>
 245 In SQLite version 3.0, the rowid is a 64-bit signed integer.
 246 This is an expansion of SQLite version 2.8 which only permitted
 247 rowids of 32-bits.
 248 </p>
 249
 250 <p>
 251 To minimize storage space, the 64-bit rowid is stored as a variable length
 252 integer.  Rowids between 0 and 127 use only a single byte.
 253 Rowids between 0 and 16383 use just 2 bytes.  Up to 2097152 uses three
 254 bytes.  And so forth.  Negative rowids are allowed but they always use
 255 nine bytes of storage and so their use is discouraged.  When rowids
 256 are generated automatically by SQLite, they will always be non-negative.
 257 </p>
 258
 259 <h3>Improved Concurrency</h3>
 260
 261 <p>
 262 SQLite version 2.8 allowed multiple simultaneous readers or a single
 263 writer but not both.  SQLite version 3.0 allows one process to begin
 264 writing the database while other processes continue to read.  The
 265 writer must still obtain an exclusive lock on the database for a brief
 266 interval in order to commit its changes, but the exclusive lock is no
 267 longer required for the entire write operation.
 268 A <a href="lockingv3.html">more detailed report</a> on the locking
 269 behavior of SQLite version 3.0 is available separately.
 270 </p>
 271
 272 <p>
 273 A limited form of table-level locking is now also available in SQLite.
 274 If each table is stored in a separate database file, those separate
 275 files can be attached to the main database (using the ATTACH command)
 276 and the combined databases will function as one.  But locks will only
 277 be acquired on individual files as needed.  So if you redefine "database"
 278 to mean two or more database files, then it is entirely possible for
 279 two processes to be writing to the same database at the same time.
 280 To further support this capability, commits of transactions involving
 281 two or more ATTACHed database are now atomic.
 282 </p>
 283
 284 <h3>Credits</h3>
 285
 286 <p>
 287 SQLite version 3.0 is made possible in part by AOL developers
 288 supporting and embracing great Open-Source Software.
 289 </p>
 290
 291
 292 }
 293 footer {$Id: version3.tcl,v 1.5 2005/03/31 21:02:46 drh Exp $}