doc/src/sgml/xtypes.sgml

   1 <!-- $PostgreSQL$ -->
   2
   3  <sect1 id="xtypes">
   4   <title>User-Defined Types</title>
   5
   6   <indexterm zone="xtypes">
   7    <primary>data type</primary>
   8    <secondary>user-defined</secondary>
   9   </indexterm>
  10
  11   <para>
  12    As described in <xref linkend="extend-type-system">,
  13    <productname>PostgreSQL</productname> can be extended to support new
  14    data types.  This section describes how to define new base types,
  15    which are data types defined below the level of the <acronym>SQL</>
  16    language.  Creating a new base type requires implementing functions
  17    to operate on the type in a low-level language, usually C.
  18   </para>
  19
  20   <para>
  21    The examples in this section can be found in
  22    <filename>complex.sql</filename> and <filename>complex.c</filename>
  23    in the <filename>src/tutorial</> directory of the source distribution.
  24    See the <filename>README</> file in that directory for instructions
  25    about running the examples.
  26   </para>
  27
  28  <para>
  29   <indexterm>
  30    <primary>input function</primary>
  31   </indexterm>
  32   <indexterm>
  33    <primary>output function</primary>
  34   </indexterm>
  35   A user-defined type must always have input and output
  36   functions.<indexterm><primary>input function</primary><secondary>of
  37   a data type</secondary></indexterm><indexterm><primary>output
  38   function</primary><secondary>of a data type</secondary></indexterm>
  39   These functions determine how the type appears in strings (for input
  40   by the user and output to the user) and how the type is organized in
  41   memory.  The input function takes a null-terminated character string
  42   as its argument and returns the internal (in memory) representation
  43   of the type.  The output function takes the internal representation
  44   of the type as argument and returns a null-terminated character
  45   string.  If we want to do anything more with the type than merely
  46   store it, we must provide additional functions to implement whatever
  47   operations we'd like to have for the type.
  48  </para>
  49
  50  <para>
  51   Suppose we want to define a type <type>complex</> that represents
  52   complex numbers. A natural way to represent a complex number in
  53   memory would be the following C structure:
  54
  55 <programlisting>
  56 typedef struct Complex {
  57     double      x;
  58     double      y;
  59 } Complex;
  60 </programlisting>
  61
  62   We will need to make this a pass-by-reference type, since it's too
  63   large to fit into a single <type>Datum</> value.
  64  </para>
  65
  66  <para>
  67   As the external string representation of the type, we choose a
  68   string of the form <literal>(x,y)</literal>.
  69  </para>
  70
  71  <para>
  72   The input and output functions are usually not hard to write,
  73   especially the output function.  But when defining the external
  74   string representation of the type, remember that you must eventually
  75   write a complete and robust parser for that representation as your
  76   input function.  For instance:
  77
  78 <programlisting><![CDATA[
  79 PG_FUNCTION_INFO_V1(complex_in);
  80
  81 Datum
  82 complex_in(PG_FUNCTION_ARGS)
  83 {
  84     char       *str = PG_GETARG_CSTRING(0);
  85     double      x,
  86                 y;
  87     Complex    *result;
  88
  89     if (sscanf(str, " ( %lf , %lf )", &x, &y) != 2)
  90         ereport(ERROR,
  91                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  92                  errmsg("invalid input syntax for complex: \"%s\"",
  93                         str)));
  94
  95     result = (Complex *) palloc(sizeof(Complex));
  96     result->x = x;
  97     result->y = y;
  98     PG_RETURN_POINTER(result);
  99 }
 100 ]]>
 101 </programlisting>
 102
 103   The output function can simply be:
 104
 105 <programlisting><![CDATA[
 106 PG_FUNCTION_INFO_V1(complex_out);
 107
 108 Datum
 109 complex_out(PG_FUNCTION_ARGS)
 110 {
 111     Complex    *complex = (Complex *) PG_GETARG_POINTER(0);
 112     char       *result;
 113
 114     result = (char *) palloc(100);
 115     snprintf(result, 100, "(%g,%g)", complex->x, complex->y);
 116     PG_RETURN_CSTRING(result);
 117 }
 118 ]]>
 119 </programlisting>
 120  </para>
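 <para>
  Note that the <function>sscanf</function> test in
  <function>complex_in</function> succeeds as soon as two numbers have been
  converted, so a missing closing parenthesis or trailing text goes
  unnoticed.  The following is a minimal sketch of a stricter check, written
  as plain C outside the server so that it can be tried on its own.  The
  name <function>parse_complex_strict</function> is hypothetical and is not
  part of the tutorial sources; a real input function would report failure
  with <function>ereport</function> rather than return a flag.
 </para>

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of PostgreSQL: parse "(x,y)" and accept
 * the string only if nothing but whitespace follows the closing ")".
 */
bool
parse_complex_strict(const char *str, double *x, double *y)
{
    int         consumed = -1;

    assert(str != NULL && x != NULL && y != NULL);

    /*
     * %n records how many characters have been read, but it is reached
     * only if the literal ")" before it matched.  It does not count as a
     * conversion, so a full match still returns 2.
     */
    if (sscanf(str, " ( %lf , %lf )%n", x, y, &consumed) != 2 || consumed < 0)
        return false;

    /* Allow trailing whitespace, reject anything else. */
    while (isspace((unsigned char) str[consumed]))
        consumed++;
    return str[consumed] == '\0';
}
```

 <para>
  With this check, <literal>(1.5,-2)</literal> is accepted, while
  <literal>(1,2</literal> and <literal>(1,2)junk</literal> are rejected.
 </para>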
 121
 122  <para>
 123   You should be careful to make the input and output functions inverses of
 124   each other.  If you do not, you will have severe problems when you
 125   need to dump your data into a file and then read it back in.  This
 126   is a particularly common problem when floating-point numbers are
 127   involved.
 128  </para>
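 <para>
  The floating-point hazard can be seen with the <literal>%g</literal>
  format used by <function>complex_out</function> above, which prints only
  six significant digits by default.  The sketch below is plain C that runs
  outside the server.  It formats a value, parses it back, and reports
  whether the result is unchanged.  The helper name is hypothetical, and the
  claim that 17 significant digits are enough assumes that
  <type>double</type> is an IEEE 754 64-bit value.
 </para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: print "value" with the given number of significant
 * digits, read the text back with strtod(), and compare with the original.
 */
bool
survives_text_round_trip(double value, int precision)
{
    char        buf[64];
    int         len;

    assert(precision > 0 && precision <= 30);
    len = snprintf(buf, sizeof(buf), "%.*g", precision, value);
    assert(len > 0 && (size_t) len < sizeof(buf));  /* not truncated */
    return strtod(buf, NULL) == value;
}
```

 <para>
  A precision of 6 matches plain <literal>%g</literal>.  A value such as
  <literal>1.0 / 3.0</literal> does not survive at that precision but does
  at a precision of 17.
 </para>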
 129
 130  <para>
 131   Optionally, a user-defined type can provide binary input and output
 132   routines.  Binary I/O is normally faster but less portable than textual
 133   I/O.  As with textual I/O, it is up to you to define exactly what the
 134   external binary representation is.  Most of the built-in data types
 135   try to provide a machine-independent binary representation.  For
 136   <type>complex</type>, we will piggy-back on the binary I/O converters
 137   for type <type>float8</>:
 138
 139 <programlisting><![CDATA[
 140 PG_FUNCTION_INFO_V1(complex_recv);
 141
 142 Datum
 143 complex_recv(PG_FUNCTION_ARGS)
 144 {
 145     StringInfo  buf = (StringInfo) PG_GETARG_POINTER(0);
 146     Complex    *result;
 147
 148     result = (Complex *) palloc(sizeof(Complex));
 149     result->x = pq_getmsgfloat8(buf);
 150     result->y = pq_getmsgfloat8(buf);
 151     PG_RETURN_POINTER(result);
 152 }
 153
 154 PG_FUNCTION_INFO_V1(complex_send);
 155
 156 Datum
 157 complex_send(PG_FUNCTION_ARGS)
 158 {
 159     Complex    *complex = (Complex *) PG_GETARG_POINTER(0);
 160     StringInfoData buf;
 161
 162     pq_begintypsend(&buf);
 163     pq_sendfloat8(&buf, complex->x);
 164     pq_sendfloat8(&buf, complex->y);
 165     PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
 166 }
 167 ]]>
 168 </programlisting>
 169  </para>
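 <para>
  One way to make a binary representation machine-independent is to fix the
  byte order used on the wire.  The following sketch, in plain C outside the
  server, writes a <type>double</type> as eight bytes with the most
  significant byte first and reads it back.  It illustrates the idea only
  and is not the implementation of the <function>pq_</function> routines
  used above.  The function names are hypothetical, and the code assumes a
  64-bit IEEE 754 <type>double</type>.
 </para>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store "value" in buf[0..7], most significant byte first. */
void
encode_double_be(double value, unsigned char buf[8])
{
    uint64_t    bits;
    int         i;

    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&bits, &value, sizeof(bits));    /* copy the bit pattern */
    for (i = 0; i < 8; i++)
        buf[i] = (unsigned char) (bits >> (56 - 8 * i));
}

/* Hypothetical helper: the inverse of encode_double_be(). */
double
decode_double_be(const unsigned char buf[8])
{
    uint64_t    bits = 0;
    double      value;
    int         i;

    for (i = 0; i < 8; i++)
        bits = (bits << 8) | buf[i];
    memcpy(&value, &bits, sizeof(value));
    return value;
}
```

 <para>
  Because the shifts operate on the integer value rather than on memory, the
  bytes produced are the same on little-endian and big-endian hosts.
 </para>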
 170
 171  <para>
 172   Once we have written the I/O functions and compiled them into a shared
 173   library, we can define the <type>complex</type> type in SQL.
 174   First we declare it as a shell type:
 175
 176 <programlisting>
 177 CREATE TYPE complex;
 178 </programlisting>
 179
 180   This serves as a placeholder that allows us to reference the type while
 181   defining its I/O functions.  Now we can define the I/O functions:
 182
 183 <programlisting>
 184 CREATE FUNCTION complex_in(cstring)
 185     RETURNS complex
 186     AS '<replaceable>filename</replaceable>'
 187     LANGUAGE C IMMUTABLE STRICT;
 188
 189 CREATE FUNCTION complex_out(complex)
 190     RETURNS cstring
 191     AS '<replaceable>filename</replaceable>'
 192     LANGUAGE C IMMUTABLE STRICT;
 193
 194 CREATE FUNCTION complex_recv(internal)
 195     RETURNS complex
 196     AS '<replaceable>filename</replaceable>'
 197     LANGUAGE C IMMUTABLE STRICT;
 198
 199 CREATE FUNCTION complex_send(complex)
 200     RETURNS bytea
 201     AS '<replaceable>filename</replaceable>'
 202     LANGUAGE C IMMUTABLE STRICT;
 203 </programlisting>
 204  </para>
 205
 206  <para>
 207   Finally, we can provide the full definition of the data type:
 208 <programlisting>
 209 CREATE TYPE complex (
 210    internallength = 16,
 211    input = complex_in,
 212    output = complex_out,
 213    receive = complex_recv,
 214    send = complex_send,
 215    alignment = double
 216 );
 217 </programlisting>
 218  </para>
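 <para>
  The <literal>internallength</literal> and <literal>alignment</literal>
  values must agree with the C structure.  As a sketch, the compile-time
  checks below restate that assumption in C, so that a later change to the
  structure cannot silently diverge from the SQL definition.  The structure
  is repeated from above.  The checks are an illustration and are not part
  of the tutorial sources, and they assume a C11 compiler and an 8-byte
  <type>double</type>.
 </para>

```c
#include <assert.h>
#include <stddef.h>

typedef struct Complex
{
    double      x;
    double      y;
} Complex;

/* Compilation fails if the struct no longer matches the CREATE TYPE values. */
static_assert(sizeof(Complex) == 16,
              "internallength = 16 must equal sizeof(Complex)");
static_assert(offsetof(Complex, y) == sizeof(double),
              "no padding is expected between x and y");
static_assert(_Alignof(Complex) == _Alignof(double),
              "alignment = double must match the struct's alignment");
```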
 219
 220  <para>
 221   When you define a new base type,
 222   <productname>PostgreSQL</productname> automatically provides support
 223   for arrays of that
 224   type.<indexterm><primary>array</primary><secondary>of user-defined
 225   type</secondary></indexterm>  The array type typically
 226   has the same name as the base type with the underscore character
 227   (<literal>_</>) prepended.
 228  </para>
 229
 230  <para>
 231   Once the data type exists, we can declare additional functions to
 232   provide useful operations on the data type.  Operators can then be
 233   defined atop the functions, and if needed, operator classes can be
 234   created to support indexing of the data type.  These additional
 235   layers are discussed in following sections.
 236  </para>
 237
 238  <para>
 239    <indexterm>
 240     <primary>TOAST</primary>
 241     <secondary>and user-defined types</secondary>
 242    </indexterm>
 243   If the values of your data type vary in size (in internal form), you should
 244   make the data type <acronym>TOAST</>-able (see <xref
 245   linkend="storage-toast">). You should do this even if the data are always
 246   too small to be compressed or stored externally, because
 247   <acronym>TOAST</> can save space on small data too, by reducing header
 248   overhead.
 249  </para>
 250
 251  <para>
 252   To do this, the internal representation must follow the standard layout for
 253   variable-length data: the first four bytes must be a <type>char[4]</type>
 254   field which is never accessed directly (customarily named
 255   <structfield>vl_len_</>). You
 256   must use <function>SET_VARSIZE()</function> to store the size of the datum
 257   in this field and <function>VARSIZE()</function> to retrieve it. The C
 258   functions operating on the data type must always be careful to unpack any
 259   toasted values they are handed, by using <function>PG_DETOAST_DATUM</>.
 260   (This detail is customarily hidden by defining type-specific
 261   <function>GETARG_DATATYPE_P</function> macros.) Then, when running the
 262   <command>CREATE TYPE</command> command, specify the internal length as
 263   <literal>variable</> and select the appropriate storage option.
 264  </para>
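 <para>
  The layout described above can be sketched as follows.  This is plain C
  that runs outside the server.  The type <type>Polyline</type> is
  hypothetical, and <function>demo_set_size</function> and
  <function>demo_get_size</function> are simplified stand-ins for
  <function>SET_VARSIZE()</function> and <function>VARSIZE()</function>,
  not the real macros.  Code for an actual data type should use the real
  macros, since the stand-ins only store a plain byte count.
 </para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length type: a header followed by npoints doubles. */
typedef struct Polyline
{
    char        vl_len_[4];     /* total size in bytes; never accessed directly */
    int32_t     npoints;
    double      coords[];       /* flexible array member */
} Polyline;

/* Simplified stand-in for SET_VARSIZE(); not the real macro. */
static void
demo_set_size(void *datum, uint32_t size)
{
    memcpy(datum, &size, sizeof(size));
}

/* Simplified stand-in for VARSIZE(); not the real macro. */
static uint32_t
demo_get_size(const void *datum)
{
    uint32_t    size;

    memcpy(&size, datum, sizeof(size));
    return size;
}

/* Allocate a zeroed value with room for npoints coordinates and record its size. */
Polyline *
polyline_alloc(int32_t npoints)
{
    size_t      size;
    Polyline   *result;

    assert(npoints >= 0);
    size = offsetof(Polyline, coords) + sizeof(double) * (size_t) npoints;
    result = calloc(1, size);   /* a server function would use palloc0() */
    if (result == NULL)
        return NULL;
    demo_set_size(result, (uint32_t) size);
    result->npoints = npoints;
    return result;
}
```

 <para>
  The size is computed from <function>offsetof</function> rather than
  <function>sizeof</function>, so that it covers exactly the header plus the
  coordinates actually stored.
 </para>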
 265
 266  <para>
 267   If the alignment is unimportant (either just for a specific function or
 268   because the data type specifies byte alignment anyway) then it's possible
 269   to avoid some of the overhead of <function>PG_DETOAST_DATUM</>. You can use
 270   <function>PG_DETOAST_DATUM_PACKED</> instead (customarily hidden by
 271   defining a <function>GETARG_DATATYPE_PP</> macro) and use the macros
 272   <function>VARSIZE_ANY_EXHDR</> and <function>VARDATA_ANY</> to access
 273   a potentially-packed datum.
 274   Note that the data returned by these macros is not aligned, even if the data
 275   type definition specifies an alignment. If the alignment is important you
 276   must go through the regular <function>PG_DETOAST_DATUM</> interface.
 277  </para>
 278
 279  <note>
 280   <para>
 281    Older code frequently declares <structfield>vl_len_</> as an
 282    <type>int32</> field instead of <type>char[4]</>.  This is OK as long as
 283    the struct definition has other fields that have at least <type>int32</>
 284    alignment.  But it is dangerous to use such a struct definition when
 285    working with a potentially unaligned datum; the compiler may take it as
 286    license to assume the datum actually is aligned, leading to core dumps on
 287    architectures that are strict about alignment.
 288   </para>
 289  </note>
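 <para>
  The alignment caveats above apply to any C code that reads fields out of
  a buffer whose alignment is not guaranteed.  As a generic illustration in
  plain C, not specific to the server's macros, the sketch below copies the
  bytes of a field instead of dereferencing a cast pointer.  The helper name
  is hypothetical.
 </para>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fetch a double that starts "offset" bytes into a
 * buffer of unknown alignment.  Copying the bytes is always safe, whereas
 * dereferencing (const double *) (data + offset) may fault on
 * architectures that are strict about alignment.
 */
double
read_double_unaligned(const char *data, size_t offset)
{
    double      value;

    assert(data != NULL);
    memcpy(&value, data + offset, sizeof(value));
    return value;
}
```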
 290
 291  <para>
 292   For further details see the description of the
 293   <xref linkend="sql-createtype" endterm="sql-createtype-title"> command.
 294  </para>
 295 </sect1>