1 <title>sqlgg: SQL Guided (code) Generator</title>
2 <span class="right border">[ygrek.org.ua](http://ygrek.org.ua/) / [p](http://ygrek.org.ua/p/) / [sqlgg](http://ygrek.org.ua/p/sqlgg/)</span>
4 sqlgg: SQL Guided (code) Generator
5 ==================================
7 *SQL query parser and binding code generator for C#, C++, Java, Objective Caml.*
10 * [Solution](#solution)
11 * [Complete example](#example)
13 * [Download](#download)
20 Writing database layer code is usually tedious and error-prone, due to the mix of different
21 languages. SQL queries constructed dynamically need to bind external data (from application), and
22 the resulting rowset must be decomposed into application native data. The impedance mismatch between
23 database layer and application code, as well as the absence of static (compile-time) checking,
24 frequently leads to runtime failures (and sometimes even security holes). One
25 can factor out all common database communication code, hide the database under some
26 application-specific abstraction, but one always needs to manually specify correspondence between
27 SQL query binding slots (or resulting rowset columns) and code variables. This mapping should be
28 updated manually every time SQL query is modified.
34 SQL query parser and code generator which ensures that application code and database queries are in
35 sync. It analyzes SQL query and determines the set of input parameters (values for INSERT, run-time
36 substitution parameters) and the set of resulting columns (for SELECT). Then it generates the code
37 in host language, mapping query parameters on function arguments with corresponding native data
38 types. So basically you provide an SQL query and generator creates a function which takes the set of
39 typed parameters as required to fill slots in a query. Generated code binds provided parameters into
40 query and executes it. The code for SELECT statements additionally invokes a strongly-typed callback
41 function for every row of the resulting rowset. The most fruitful consequence of such
42 approach is that the host language compiler will itself check that functions generated from SQL
43 queries will be called correctly (i.e. all parameters bound with correct types). So if you modify
44 the query and forget to update the code -- the compiler will point on erroneous parts.
50 Suppose we have the database of some money relationships for the group of people. We use the
51 following SQL tables and queries:
54 CREATE TABLE person (id INTEGER PRIMARY KEY AUTOINCREMENT,name TEXT,surname TEXT);
56 INSERT INTO person (name,surname) VALUES;
59 CREATE TABLE money (src INTEGER, dst INTEGER, amount INTEGER);
61 INSERT INTO money VALUES;
64 SELECT name || ' ' || surname AS fullname, SUM(amount) as total FROM person JOIN money ON src = id GROUP BY id;
66 SELECT DISTINCT surname FROM person JOIN money ON src = id AND dst = (SELECT id FROM person WHERE surname LIKE ?) LIMIT ?;
68 DROP TABLE IF EXISTS person;
69 DROP TABLE IF EXISTS money;
71 Generate the binding C++ code (boilerplate omitted, only function prototypes shown):
72 <span class="right also">
73 (corresponding [C#][demo_csharp_gen] [C++][demo_cxx_gen] [Java][demo_java_gen]
74 [OCaml][demo_caml_gen] [XML][demo_xml_gen])
77 // DO NOT EDIT MANUALLY
79 // generated by sqlgg 0.2.2+ (f5b76ac7) on 2009-05-30T19:22Z
80 // visit http://ygrek.org.ua/p/sqlgg/
84 template <class Traits>
89 create_person(typename Traits::connection db);
91 }; // struct create_person
95 add_person(typename Traits::connection db);
96 bool operator()(typename Traits::Text const& name, typename Traits::Text const& surname);
97 }; // struct add_person
101 create_money(typename Traits::connection db);
103 }; // struct create_money
107 add_money(typename Traits::connection db);
108 bool operator()(typename Traits::Int const& src, typename Traits::Int const& dst, typename Traits::Int const& amount);
109 }; // struct add_money
113 calc_total(typename Traits::connection db);
115 bool operator()(T result);
116 }; // struct calc_total
120 list_donors(typename Traits::connection db);
122 bool operator()(typename Traits::Text const& _0, typename Traits::Int const& _1, T result);
123 }; // struct list_donors
127 drop_person(typename Traits::connection db);
129 }; // struct drop_person
133 drop_money(typename Traits::connection db);
135 }; // struct drop_money
137 create_person create_person;
138 add_person add_person;
139 create_money create_money;
141 calc_total calc_total;
142 list_donors list_donors;
143 drop_person drop_person;
144 drop_money drop_money;
146 sqlgg(typename Traits::connection db) : create_person(db), add_person(db), create_money(db), add_money(db), calc_total(db), list_donors(db), drop_person(db), drop_money(db)
151 Things to note above:
153 2. Syntax. All queries are written in plain SQL. Every statement should be terminated with
154 semicolon. There is no need to write (?,?) after VALUES in INSERT statement. The annotation
155 `[sqlgg] name=function_name` (or simply `@function_name`) before the query specifies the name
156 of the generated function (this annotation is optional but very convenient).
157 1. The generated code is parametrized by database-specific class `Traits`. It specifies the
158 correspondence between SQL and native types, provides types for database connection and
159 implements actual code to execute statements. `Traits` should be implemented
160 once for every specific database API.
161 1. Each statement has a corresponding `struct` which takes connection object as a constructor
162 argument. This allows to reuse prepared statements several times (if database supports this
164 1. `add_money()` function call operator takes three data parameters -- the values to INSERT into
165 table (note the names and types).
166 3. `calc_total()` returns data via `result` parameter, which is a callback that gets executed for
167 each row of the resulting rowset, with statically typed parameters binding each column of a row.
168 In this case the parameters are `fullname` of type `Traits::Text` and `total` of type
170 4. The types of parameters for `list_donors` were inferred correctly (`limit` is `Int` and `surname`
171 is `Text`. SQL is not a statically-typed language so the inferred types are based on some
172 reasonable assumptions.
173 5. Statements of arbitrary depth across many tables are supported.
174 6. Statements are checked for correctness as far as generator is concerned, so it will detect
175 syntax errors, non-existent columns in expressions, mismatched columns in compound statements,
176 ambiguous column names etc.
178 Then manually-written C++ code boils down to (without error-checking):
179 <span class="right also">
180 (corresponding [C#][demo_csharp_mysql] [C++][demo_cxx_sqlite3] [Java][demo_java_mysql]
181 [OCaml][demo_caml_sqlite3])
184 #include "../impl/sqlite3_traits.hpp" // sqlite3 traits
185 #include "demo_cxx_gen.hpp" // generated
191 typedef sqlgg<sqlite3_traits> gen_t;
192 typedef long long int64;
194 struct output_transfers
196 void operator()(std::string const& fullname, int total)
198 cout << fullname << " = " << total << endl;
204 void operator()(std::string const& surname)
206 cout << surname << endl;
213 sqlite3_open(":memory:", &db);
221 // add all person records
222 gen.add_person("John","Black");
223 int64 john = sqlite3_last_insert_rowid(db);
224 gen.add_person("Ivan","Petrov");
225 int64 ivan = sqlite3_last_insert_rowid(db);
226 gen.add_person("Sancho","Alvares");
227 int64 sancho = sqlite3_last_insert_rowid(db);
229 // add money relations
230 gen.add_money(john,ivan,200);
231 gen.add_money(john,sancho,100);
232 gen.add_money(john,sancho,250);
233 gen.add_money(sancho,ivan,300);
235 // summarize by person
236 cout << "Total transfers:" << endl;
237 gen.calc_total(output_transfers());
240 cout << "Donors:" << endl;
241 gen.list_donors("petrov",100,output_donors());
243 // properly close database
249 The code is straightforward and free of most database-specific details. More importantly it is
250 statically checked by the compiler -- suppose we change the database schema and add one more field
251 to `money` table (e.g. timestamp of transfer). Compilation rightfully fails:
253 demo_cxx.cpp: In function `int main()':
254 demo_cxx.cpp:29: error: no matching function for call to
255 `sqlgg<sqlite3_traits>::add_money(sqlite3*&, int64&, int64&, int)'
256 demo_cxx_gen.hpp:75: note: candidates are: static bool sqlgg<Traits>::add_money(typename
257 Traits::connection, const typename Traits::Int&, const typename Traits::Int&, const typename
258 Traits::Int&, const typename Traits::Int&) [with Traits = sqlite3_traits]
264 Distinguishing feature of **sqlgg** is that it starts off with actual SQL queries, not object models
265 or SQL table descriptions. Ideally, it generates {meta,template,abstract,generic} code which is later
266 specialized for the actual database/environment by the means of the target language. The
267 generated code doesn't require any special runtime support and doesn't impose any restrictions on
270 The main idea is that the generator should take care only of semantic binding between SQL and code
271 sides, being as unobtrusive as possible. So the choice of the specific database and API is a
272 programmer's choice. Similarly, queries to the database are expressed in plain SQL, so that the
273 generator can be easily plugged in any existing project -- just move all SQL statements used in the
274 code to a separate file and feed it to generator. The generated code can be easily inspected just as
275 any other code in the project, and you can even edit it manually if such need arises (though it is
276 highly not recommended). In order to keep code and database layer in sync
277 the generator must be invoked every time SQL queries are modified.
279 For now the status of this project is **works for me** . It is evolving and there is
280 plenty of room for improvement. The generated code and traits interfaces may change from time to
281 time. I am interested in opinions on output code structure/naming best fitting the target
284 This project was started when I found myself editing existing code
285 with tons of C++ wrappers for SQL queries, each binding several parameters and decomposing results.
286 I used it for C++ desktop applications with [sqlite3](http://sqlite.org) engine and
287 C#/[Mono](http://mono-project.com) server-side
288 component with [mysql](http://dev.mysql.com) database.
290 **Sqlgg** understands the subset of the standard SQL language with arbitrary database-specific
291 extensions. The goal is to cover as much SQL as possible in order to accept most of the real-world
292 SQL statements unmodified.
294 Available output languages:
296 * **C#**: code for System.Data ([generated code example][demo_csharp_gen],
297 [usage example][demo_csharp_mysql]).
298 * **C++**: code is parametrized with template class for database specific code
299 ([generated code example][demo_cxx_gen], [sqlite3 traits][sqlite3_cxx],
300 [usage example (sqlite3)][demo_cxx_sqlite3], [mysql traits][mysql_cxx],
301 [usage example (mysql)][demo_cxx_mysql]).
302 * **Java**: code for JDBC ([generated code example][demo_java_gen],
303 [usage example][demo_java_mysql]).
304 * **OCaml**: functor parametrized with module for database specific code
305 [Sqlgg\_traits.M][sqlgg_caml] ([generated code example][demo_caml_gen],
306 [OCaml-SQLite3 traits][sqlite3_caml], [usage example][demo_caml_sqlite3]).
307 * **XML**: input parameters and rowset schema with types and names -- everything needed
308 to further generate code (e.g. using XSLT) in some other language
309 ([generated xml example][demo_xml_gen])
311 [OCaml-SQLite3]: http://caml.inria.fr/cgi-bin/hump.en.cgi?contrib=471
313 [sqlite3_cxx]: impl/sqlite3_traits.hpp
314 [mysql_cxx]: impl/mysql_traits.hpp
315 [sqlite3_caml]: impl/sqlgg_sqlite3.ml
316 [sqlgg_caml]: impl/sqlgg_traits.ml
318 [demo_cxx_gen]: demo/demo_cxx_gen.hpp
319 [demo_caml_gen]: demo/demo_caml_gen.ml
320 [demo_xml_gen]: demo/demo_xml_gen.xml
321 [demo_java_gen]: demo/demo_java_gen.java
322 [demo_csharp_gen]: demo/demo_csharp_gen.cs
324 [demo_cxx_sqlite3]: demo/demo_cxx.cpp
325 [demo_cxx_mysql]: demo/demo_cxx_mysql.cpp
326 [demo_caml_sqlite3]: demo/demo_caml.ml
327 [demo_java_mysql]: demo/demo_java.java
328 [demo_csharp_mysql]: demo/demo_csharp.cs
334 sqlgg 0.2.2 for [Windows](dist/sqlgg-0.2.2.zip) and [Linux](dist/sqlgg-0.2.2.tar.gz) (released on 29 June 2009). [Changes history](dist/changelog).
336 Try it [online](sqlgg.cgi).
338 Get the [source](http://repo.or.cz/w/sqlgg.git).
339 <!--: `git clone git://repo.or.cz/sqlgg.git`
340 ([rss](http://repo.or.cz/w/sqlgg.git?a=rss))-->
346 * allow to parametrize SQL syntax itself (ORDER BY ASC|DESC) : unsafe and enumeration
347 * allow to tolerate errors in query
348 * special code for queries returning single values (COUNT(*) and the like)
349 * support BULK INSERTs (how?)
350 * decide what to do with unquoted identifiers matching keywords
351 * provide ways to extend lexer with tokens in runtime
352 * split overview into several pages
353 * some database API are inadequately strict about data types (notably ADO.NET), need native-type annotations in queries
354 * choose better names for some common cases (WHERE id = ? etc)
355 * fix line numbers in error output
356 * enhance error reporting
357 * calculate types more precisely, type-inferrer is too primitive
359 * detect statements on single tables and group the corresponding generated code together
360 * check function names for uniqueness
361 * support/test other SQL engines
362 * generate code for more languages
364 * distinguish predicates and expressions
365 * type check expressions
369 <span class="right">[sqlgg@ygrek.org.ua](mailto:sqlgg@ygrek.org.ua)</span>
372 .right { position: absolute; right: 1em; }
373 .also { margin-top: 1em; font-style: italic; font-size: 80%; }
374 .border { padding: 0.5em; border: 1px solid black; }
375 code { font-family: monospace; }
376 pre { background-color: #eee; border: 1px solid #0f0; }
377 :not(pre) > code { font-size: 1em; }
378 html { font-family: sans-serif; }