   1 /*!
   2  * @mainpage Definition-Use Chain and Type System
   3  *
   4  * Overview | \ref duchain-design "Design" | \ref Implementing "Implementing" | \ref Using "Using"
   5  *
   6  * The definition-use chain and type system provide a language-neutral
   7  * representation of source code structure, used to provide language-based
   8  * features to all implemented languages in a generic manner.
   9  *
  10  * An introduction to the duchain can be found in the \ref duchain-design document.
  11  *
  12  * Details about how to provide a duchain and type system for your favourite
  13  * language can be found here: \ref Implementing.
  14  *
  15  * @licenses
  16  * @lgpl
  17  *
 * For questions and discussions about the duchain, contact either the author
  19  * or the <a href="mailto:kdevelop-devel@kdevelop.org">kdevelop-devel@kdevelop.org</a>
  20  * mailing list.
  21  */
  22
  23 /** \page duchain-design Definition-Use Chain Design
  24
  25  * @ref index "Overview" | Design | \ref Implementing "Implementing" | \ref Using "Using"
  26
  27 \section overview Overview
  28
  29 The duchain is a sequence of contexts in a code file, and the associated definitions which occur in those contexts.  A simplified way of thinking about it is that for each set of brackets (curly {} or not ()), there is a separate context.  Each context is represented by a \ref KDevelop::DUContext.  Each context will have one parent context (except in the case of the top level context which has none), and any number of child contexts (including none).  Additionally, each context can import any number of other contexts.  The reason for this will become clear later.  Thus, the \ref KDevelop::DUContext structure resembles a directed acyclic graph, for those familiar with the concept.
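
As a rough illustration of the structure described above, the following self-contained sketch models contexts with a parent, children and imports.  ToyContext, buildForLoopExample() and declaredOrImported() are hypothetical names invented for this example; they are not part of the \ref KDevelop::DUContext API.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a duchain context (not KDevelop::DUContext).
struct ToyContext {
    std::string label;
    ToyContext* parent = nullptr;                       // null only for the top level context
    std::vector<std::unique_ptr<ToyContext>> children;  // nested contexts, owned by the parent
    std::vector<const ToyContext*> imports;             // contexts whose declarations are visible here
    std::vector<std::string> declarations;              // names declared directly in this context

    ToyContext* addChild(const std::string& childLabel) {
        children.push_back(std::make_unique<ToyContext>());
        ToyContext* child = children.back().get();
        child->label = childLabel;
        child->parent = this;
        return child;
    }
};

// True if `name` is declared in `context` itself or in a context it imports.
bool declaredOrImported(const ToyContext& context, const std::string& name) {
    auto declares = [&name](const ToyContext& c) {
        return std::find(c.declarations.begin(), c.declarations.end(), name)
               != c.declarations.end();
    };
    if (declares(context))
        return true;
    for (const ToyContext* imported : context.imports)
        if (declares(*imported))
            return true;
    return false;
}

// Models: for (int i = 0; i < n; ++i) { use(i); }
// The () part and the {} part are sibling contexts; the body imports the conditions.
std::unique_ptr<ToyContext> buildForLoopExample() {
    auto top = std::make_unique<ToyContext>();
    top->label = "top";
    ToyContext* conditions = top->addChild("for-conditions");
    conditions->declarations.push_back("i");
    ToyContext* body = top->addChild("for-body");
    body->imports.push_back(conditions);
    return top;
}
```

In this sketch the loop body does not declare i itself, but sees it through the import, which is one reason why imports exist in addition to the parent/child links.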
  30
  31
  32 \section parsing Parsing
  33
These \ref KDevelop::DUContext "DUContexts" are created on the first pass after parsing the code to an AST (abstract syntax tree).  In this stage the data types are also parsed, and any declarations which are encountered are recorded against the context in which they occur.  Each declaration is represented by a \ref KDevelop::Declaration "Declaration".
  35
Parsing code is arranged into builder classes, which subclass the AST visitor.  They are designed to be able to subclass each other, so that a single pass can achieve several goals (as described in the above paragraph).
  37
  38 For most languages, the first pass is accomplished by the \ref KDevelop::AbstractContextBuilder "AbstractContextBuilder", \ref KDevelop::AbstractTypeBuilder "AbstractTypeBuilder", and \ref KDevelop::AbstractDeclarationBuilder "AbstractDeclarationBuilder".  The customised builder class is a subclass of each of these classes. Thus, in the first pass, the \ref KDevelop::AbstractContextBuilder "AbstractContextBuilder" creates the \ref KDevelop::DUContext "DUContext" tree, the \ref KDevelop::AbstractTypeBuilder "AbstractTypeBuilder" records which \ref KDevelop::AbstractType "types" are encountered, and the \ref KDevelop::AbstractDeclarationBuilder "AbstractDeclarationBuilder" creates \ref KDevelop::Declaration "Declaration" instances which are associated with the current type and context.
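
The layering of builders can be sketched with plain C++ templates.  This is a minimal sketch under stated assumptions: ToyNode, ToyContextBuilder, ToyTypeBuilder and ToyDeclarationBuilder are hypothetical classes written for this example and do not reproduce the signatures of the real abstract builders.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy AST node: a block that may declare one typed name and contain nested blocks.
struct ToyNode {
    std::string declaredName;   // empty if the block declares nothing
    std::string typeName;       // empty if no type appears in the block
    std::vector<ToyNode> children;
};

// Bottom layer: walks the tree and counts one context per block.
class ToyContextBuilder {
public:
    virtual ~ToyContextBuilder() = default;
    void build(const ToyNode& root) { visit(root); }
    int contextsOpened = 0;

protected:
    virtual void visit(const ToyNode& node) {
        ++contextsOpened;
        for (const ToyNode& child : node.children)
            visit(child);   // virtual call, so every layer sees every node
    }
};

// Middle layer: records the types it encounters, then defers to the layer below.
template <class Base>
class ToyTypeBuilder : public Base {
public:
    std::vector<std::string> typesSeen;

protected:
    void visit(const ToyNode& node) override {
        if (!node.typeName.empty())
            typesSeen.push_back(node.typeName);
        Base::visit(node);
    }
};

// Top layer: records declarations, then defers to the layer below.
template <class Base>
class ToyDeclarationBuilder : public Base {
public:
    std::vector<std::string> declarations;

protected:
    void visit(const ToyNode& node) override {
        if (!node.declaredName.empty())
            declarations.push_back(node.typeName + " " + node.declaredName);
        Base::visit(node);
    }
};

// One traversal of the AST now feeds all three layers.
using ToyFirstPassBuilder = ToyDeclarationBuilder<ToyTypeBuilder<ToyContextBuilder>>;
```

Because each layer takes its base class as a template parameter, the same layers can be stacked in a different order or on top of a language-specific context builder.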
  39
The second pass is the creation of uses, accomplished by a subclass of both the \ref KDevelop::AbstractContextBuilder and the \ref KDevelop::AbstractUseBuilder.  On the second pass, we only iterate previously parsed contexts (as they are already created).  Then, as variable uses are encountered, a \ref KDevelop::Use is created for each.  A \ref KDevelop::Declaration is searched for in the current context, and if one is found, the use and the declaration are associated with each other.
  41
  42
  43 \section classes Classes and their purposes
  44
  45 \li \ref KDevelop::DUChain - a global object which keeps track of all loaded source files and the top level context of their definition-use chains.
  46
\li \ref KDevelop::DUContext - an object which represents a single context in a source file, and stores information about parent and child \ref KDevelop::DUContext "DUContexts", and the \ref KDevelop::Declaration "Declarations", \ref KDevelop::Definition "Definitions" and \ref KDevelop::Use "Uses" which occur in them.  Also provides convenience methods for searching the chain.
  48
  49 \li \ref KDevelop::Declaration - an object which represents a single declaration.  Has several subclasses which store more information specific to the type of declaration which is being represented.
  50
  51 \li \ref KDevelop::Definition - an object which represents a definition corresponding to a \ref KDevelop::Declaration "Declaration".
  52
  53 \li \ref KDevelop::Use - an object which represents a use of a particular declaration.
  54
  55 \li \ref KDevelop::SymbolTable - a hash which stores identifiers available in the top level context of a source file and their respective \ref KDevelop::Declaration "Declarations".
  56
  57 \li KDevelop::*Builder - objects whose purpose is to iterate the parsed AST and produce instances of the duchain objects.
  58
  59 \li \ref KDevelop::AbstractType - the base class for types.
  60
  61
  62 \section searching Definition-use chain searching
  63
Because iterating a complete definition-use chain can become expensive when the chain is large, a search (eg. for a declaration corresponding to a certain identifier) is first performed up to the top level context, and then the symbol table is consulted.  The symbol table is a hash of all identifiers which are known to the entire duchain.  All potential matches are evaluated to see if they are visible from the location of the use.
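
The two-step search can be sketched as follows.  This is only an illustration: ToyDeclaration, ToyContext, ToySymbolTable and findDeclaration() are hypothetical, and the visibility rule used here (a symbol table match is visible if its file is in a given set of files) is an assumption made to keep the example small.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

struct ToyDeclaration {
    std::string identifier;
    std::string file;   // file in which the declaration appears
};

struct ToyContext {
    const ToyContext* parent = nullptr;   // null for the top level context
    std::vector<ToyDeclaration> localDeclarations;
};

// Hypothetical symbol table: identifier -> declarations known to the entire chain.
using ToySymbolTable = std::unordered_multimap<std::string, ToyDeclaration>;

const ToyDeclaration* findDeclaration(const ToyContext& start,
                                      const ToySymbolTable& symbols,
                                      const std::string& identifier,
                                      const std::set<std::string>& visibleFiles) {
    // Step 1: search from the starting context up to the top level context.
    for (const ToyContext* c = &start; c; c = c->parent)
        for (const ToyDeclaration& d : c->localDeclarations)
            if (d.identifier == identifier)
                return &d;

    // Step 2: consult the symbol table and keep only matches that are visible.
    auto range = symbols.equal_range(identifier);
    for (auto it = range.first; it != range.second; ++it)
        if (visibleFiles.count(it->second.file) > 0)
            return &it->second;

    return nullptr;   // no visible declaration found
}
```

The hash lookup in step 2 avoids walking every context of every other file, which is the expense the paragraph above refers to.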
  65
  66
  67 \section locking Locking
  68
  69 The duchain is designed to operate in a multithreaded environment.  This means that multiple parse jobs may be operating simultaneously, reading from and writing to the duchain.  Thus, locking is required.
  70
  71 A single read-write lock is used to serialise writes to the chain and allow concurrent reads.  Thus, to call non-const methods, you must hold a write lock, and for const methods, a read lock.  Customised read/write lockers have been created, called DUChainWriteLocker and DUChainReadLocker.  You must not request a write lock while holding a read lock, or you could cause a deadlock.
  72
  73 Also, when manipulating text editor ranges, the \ref KTextEditor::SmartInterface must be locked. \warning You must <em>never</em> attempt to acquire the duchain read or write lock when holding the smart lock, else you may cause a deadlock. See code in \ref KDevelop::AbstractContextBuilder::openContextInternal and \ref KDevelop::DUChainBase.
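
The read-write locking discipline described in this section can be illustrated with the standard library.  The sketch below uses std::shared_mutex with std::shared_lock and std::unique_lock in place of DUChainReadLocker and DUChainWriteLocker; ToyChain and runParseJobs() are hypothetical and say nothing about how the real lock is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical chain guarded by a single read-write lock.
class ToyChain {
public:
    std::shared_mutex& lock() const { return m_lock; }

    // const method: the caller must hold a read (shared) lock.
    std::size_t declarationCount() const { return m_declarations.size(); }

    // non-const method: the caller must hold the write (exclusive) lock.
    void addDeclaration(const std::string& name) { m_declarations.push_back(name); }

private:
    mutable std::shared_mutex m_lock;
    std::vector<std::string> m_declarations;
};

// Runs `jobs` concurrent "parse jobs"; each one writes once and then reads.
std::size_t runParseJobs(ToyChain& chain, int jobs) {
    std::vector<std::thread> threads;
    for (int i = 0; i < jobs; ++i) {
        threads.emplace_back([&chain, i] {
            {
                std::unique_lock<std::shared_mutex> writeLock(chain.lock());
                chain.addDeclaration("decl" + std::to_string(i));
            }   // the write lock is released here, before any read lock is taken

            // Several threads may hold the read lock at the same time.
            std::shared_lock<std::shared_mutex> readLock(chain.lock());
            (void)chain.declarationCount();
        });
    }
    for (std::thread& t : threads)
        t.join();

    std::shared_lock<std::shared_mutex> readLock(chain.lock());
    return chain.declarationCount();
}
```

Note that the sketch never requests the write lock while a read lock is held, in line with the rule stated above.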
  74
  75
  76 \section plugin-interface Interface for plugins
  77
  78 As plugins will be accessing the \ref KDevelop::DUChain from the main thread, they will need to hold a read lock.  In order to be notified of changes to the \ref KDevelop::DUChain, an observer interface is offered.  See \ref KDevelop::DUChainObserver.
  79
  80
  81 \section text-editor-integration Text editor integration
  82
  83 The main classes are subclasses of a base class, \ref KDevelop::DUChainBase.  This object holds a reference to the text range.  When the source file is opened in an editor, the \ref KDevelop::EditorIntegrator will create smart text ranges, which are bound to the editor's copy of the document.  From there, highlighting can be applied to these ranges, as well as other advanced functions (see the \ref KTextEditor documentation for possibilities).  The language support will convert these ranges to smart ranges when the corresponding document is loaded into an editor.
  84
  85
  86 \section future Future features - ideas
  87
  88 The completed duchain should allow for code refactoring, intelligent navigation, improved automatic code generation (eg. "create switch statement"), context-sensitive code completion, integration of documentation, debugger integration, a code structure view, call graph, static code analysis etc.
  89
  90 */
  91
  92 /**
  93  * \page Implementing Implementing Definition-Use Chains for a specific language
  94  *
  95  * \ref index "Overview" | \ref duchain-design "Design" | Implementing | \ref Using "Using"
  96  *
  97  * \section create Creating the Definition-Use Chain
  98  *
  99  * To create a definition-use chain for a programming language, you need the following:
 100  * \li a parser for the language,
 101  * \li a context builder,
 102  * \li a type builder,
 103  * \li a declaration builder,
 104  * \li and a use builder.
 105  *
 * Once you have everything up to the declaration builder, your language's classes, functions etc.
 * will automatically appear in the class browser, and limited refactoring will be available.
 108  *
 109  * Once you have the use builder, you will automatically have full support for context browsing.
 110  *
 * Code completion support requires further work specific to your language; see \ref cc.
 112  *
 113  * \subsection parser Parser
 114  * Parsers in %KDevelop can be created in any way as long as they produce an AST (abstract
 115  * syntax tree).  Most supported languages have parsers generated by kdevelop-pg-qt.
 * This is an LL parser generator, and allows you to specify the grammar from which the
 * parser and AST are generated.  The parser will also need a lexer; common solutions are to
 * use flex to create one for you, or to create one by hand.
 119  *
 * \subsection generic Generic DUChain Builders
 121  *
 122  * The abstract builder classes (detailed below) provide convenience functions for creating a
 123  * definition-use chain.  They are template classes which require 2 or 3 class
 124  * types:
 125  * - T: your base AST node type
 126  * - NameT: your identifier AST node type, if you have only one, or your base AST node type
 127  *          if more than one exist
 128  * - Base class: your base class, eg. for your use builder, you will usually supply your custom
 129  *               context builder here.
 130  *
 131  * \subsection context Context Builder
 132  * By subclassing \ref KDevelop::AbstractContextBuilder "AbstractContextBuilder", you will have everything you need to
 133  * keep track of contexts as you iterate the AST.  When a new context is encountered, such
 134  * as a new block (eg. between {} brackets), create a new context with KDevelop::AbstractContextBuilder::openContext(),
 135  * and close it with KDevelop::AbstractContextBuilder::closeContext().
 136  *
 137  * Some languages do not need a context to be created for each block, for example languages
 138  * where declarations are visible after the block in which they were defined (eg. php).
 139  *
 140  * \subsection type Type Builder
 * By subclassing \ref KDevelop::AbstractTypeBuilder "AbstractTypeBuilder", you can create a type
 * whenever one is encountered in your AST by calling openType().  Again, you need to call closeType()
 * when the type is exited.  Complex types are built up this way by creating the type at each node.
 * For example, with int[], first an array type is opened, then an integral type representing an integer
 * is opened and closed; then, when the array type is closed, you can retrieve the lastType()
 * and set that as the type which is being made into an array.
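 *
 * The int[] walk-through above can be sketched with a small type stack.  This is a hedged
 * illustration only: ToyType, ToyTypeStack and buildIntArray() are hypothetical, and only the
 * names openType(), closeType(), currentType() and lastType() are borrowed from the text above;
 * the real signatures may differ.
 *
```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type representation (not KDevelop::AbstractType).
struct ToyType {
    std::string kind;                  // "integral" or "array"
    std::string name;                  // eg. "int" for integral types
    std::shared_ptr<ToyType> element;  // element type, set for arrays only

    std::string toString() const {
        if (kind == "array")
            return (element ? element->toString() : std::string("?")) + "[]";
        return name;
    }
};

// Hypothetical stack of open types, mirroring the open/close calls described above.
class ToyTypeStack {
public:
    void openType(std::shared_ptr<ToyType> type) { m_open.push_back(std::move(type)); }

    void closeType() {
        m_last = m_open.back();   // remember the type that was just closed
        m_open.pop_back();
    }

    std::shared_ptr<ToyType> currentType() const {
        return m_open.empty() ? nullptr : m_open.back();
    }

    std::shared_ptr<ToyType> lastType() const { return m_last; }

private:
    std::vector<std::shared_ptr<ToyType>> m_open;
    std::shared_ptr<ToyType> m_last;
};

// Builds the type for "int[]" in the order described in the text.
std::shared_ptr<ToyType> buildIntArray(ToyTypeStack& stack) {
    auto array = std::make_shared<ToyType>();
    array->kind = "array";
    stack.openType(array);                 // the array type is opened first

    auto integer = std::make_shared<ToyType>();
    integer->kind = "integral";
    integer->name = "int";
    stack.openType(integer);               // the integral type is opened...
    stack.closeType();                     // ...and closed again

    array->element = stack.lastType();     // lastType() is now the integral type
    stack.closeType();                     // finally the array type is closed
    return stack.lastType();
}
```
 *
 * Calling toString() on the result of buildIntArray() yields "int[]" in this sketch.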
 147  *
 148  * \subsection declaration Declaration Builder
 149  * By subclassing \ref KDevelop::AbstractDeclarationBuilder "AbstractDeclarationBuilder", you can create
 150  * declarations when they are encountered in your AST.  Usually you will assign the lastType()
 151  * or currentType() to them within closeDeclaration().
 152  *
 153  * \subsection use Use Builder
 154  * By subclassing \ref KDevelop::AbstractUseBuilder "AbstractUseBuilder", you can create uses when they are encountered
 155  * in your AST, and they will be automatically registered with the current context.
 156  *
 157  * \section cc Implementing Code Completion
 158  *
 159  * To provide code completion for your language, you will need to implement the following:
 160  * \todo complete this section
 161  */
 162
 163 /**
 164  * \page Using Using already created Definition-Use Chains in plugins
 165  *
 166  * \ref index "Overview" | \ref duchain-design "Design" | \ref Implementing "Implementing" | Using
 167  *
 168  * \section intro Introduction
 169  * This section is designed for developers who want to use definition-use chains, for example to provide
 170  * code generation, refactoring, or other advanced language-specific functionality.  First some important
 171  * fundamentals of using the duchain classes will be covered.
 172  *
 173  * \subsection pointers Definition-use chain pointers and references
 * As the definition-use chain is a dynamic entity, safe pointers (DU*Pointer) and indirect references (Indexed*)
 175  * are required to reference objects in a thread-safe way, and in a way that allows minimisation of memory use by saving
 176  * non-referenced chains to disk.  While you do not hold the KDevelop::DUChain::lock(),
 177  * these pointers and references should not be accessed, because the objects they will return may be
 178  * modified by other threads.
 179  *
 180  * The KDevelop::DUChain::lock() is a read-write lock, which means that if you don't intend to
 181  * change the chain (which you won't, unless you are a language plugin developer), you only need a read-lock.
 182  * This has the advantage of allowing multiple threads to safely read from the chain simultaneously.
 183  * The easiest way to acquire this lock is to use KDevelop::DUChainReadLocker:
 * \code
 *    KDevelop::TopDUContextPointer topContext;
 *
 *    // Retrieve the top context for myUrl (see explanation below)
 *    topContext = KDevelop::DUChainUtils::standardContextForUrl( myUrl );
 *
 *    // Lock the duchain for reading
 *    KDevelop::DUChainReadLocker readLock( KDevelop::DUChain::lock() );
 *
 *    // Check if the top context pointer is valid
 *    if ( topContext ) {
 *      ...
 *    }
 * \endcode
 198  * Before accessing the top context, this code will block until a read-only lock has been acquired.
 199  * It is then safe to access const functions of all duchain objects.  The lock will continue to be held
 200  * until readLock goes out of scope.
 201  *
 202  * \note It is safe to recursively acquire a read-lock (or a write-lock), but not safe to request a write lock
 203  * once a read lock is held (this may result in a deadlock).
 204  * \note You must not attempt to acquire the duchain lock when you already hold the smart lock (this may result in a deadlock).
 205  *
 206  * In debug builds, if you attempt to access something in the duchain which you do not hold the proper lock
 207  * for, you will encounter an assert (usually triggered by the ENSURE_CHAIN_READ_LOCKED or ENSURE_CHAIN_WRITE_LOCKED macros).
 208  *
 209  * For more information about duchain pointers, see KDevelop::DUChainPointer.
 210  *
 211  * \section accessing Accessing a definition-use chain
 212  * The first step in using a duchain is to retrieve the chain that you are interested in.
 213  * Presumably you will know the URL of the file for which you want to retrieve the chain.
 214  * Some languages (notably C and C++) can have several different chains for one file depending on
 215  * what the definitions of macros were when the files were parsed.  Because of this, the recommended
 216  * way to access the duchain for a document is via KDevelop::DUChainUtils.
 217  *
 218  * \subsection topcontext Accessing top level contexts
 219  * \todo include a note on how to request loading of contexts from disk, and requesting parsing of files which
 220  *  are not currently in the duchain.
 221  *
 222  * Top level contexts (TopDUContext) can be retrieved through KDevelop::DUChainUtils::standardContextForUrl().
 223  * This is the context which is presented to the user when the file is opened (for highlighting, completion etc.).
 224  * In case it is not the context which you are after, all contexts for a file can be retrieved via
 225  * KDevelop::DUChain::chainsForDocument().
 226  *
 * \subsection declaration-at-location Accessing declarations at a specific location
 * If you have a URL and a cursor location, you can attempt to retrieve the declaration located at that position
 229  * with KDevelop::DUChainUtils::itemUnderCursor().
 230  *
 231  * \section navigating Navigating a definition-use chain
 232  * \subsection navigating-duobject All duchain objects
 233  * All duchain objects inherit from KDevelop::DUChainBase.  This is in turn a subclass of KDevelop::DocumentRangeObject.
 234  * Thus, you can retrieve the text range of every object via KDevelop::DocumentRangeObject::range().  If the document
 235  * is currently loaded in a text editor, it will likely have a smart range (KTextEditor::SmartRange), which tracks the position
 236  * of the range when the document is changed.  This can be accessed via KDevelop::DocumentRangeObject::smartRange().
 237  *
 238  * \subsection navigating-contexts Contexts
 239  * Now that you have a chain, you'll probably want to be able to navigate around it.  You can iterate contexts
 240  * by using KDevelop::DUContext::childContexts().  You can then retrieve from each
 241  * context a list of local declarations with KDevelop::DUContext::localDeclarations(), and a list of all
 242  * uses in the context with KDevelop::DUContext::uses().
 243  *
 244  * Imported contexts are usually contexts which have declarations which are visible in the current context.
 245  * For example:
 * \code
 *   for (int i = 0; i < count(); ++i) {
 *     kDebug() << i;
 *   }
 * \endcode
 * The context which contains the debug statement will import the context of the for loop's conditions, which contains the declaration
 * of i.  Thus, i's declaration is visible to the debug statement.
 253  *
 254  * Usually, you will not have to worry about these details, as the search functions already take them into account.
 255  * If you want to find a declaration for a given identifier in a given context, you can use one of the
 256  * KDevelop::DUContext::findDeclarations() or KDevelop::DUContext::findLocalDeclarations() functions.
 257  *
 258  * \subsection navigating-declarations Declarations
 259  * Declarations always occur within a context, which can be accessed through KDevelop::Declaration::parentContext().
 260  * Some declarations (eg. namespaces, classes) create a new context which can then contain child declarations, eg.
 261  * variables and functions within a class.  For these declarations, the associated context which contains these
 262  * child declarations can be accessed through KDevelop::Declaration::internalContext(), if one exists.
 263  *
 264  * \subsection navigating-uses Uses
 265  * Uses are instances where a declaration is referenced in the code.  All uses for a declaration can be calculated
 266  * from the duchain, although this can potentially be a time-consuming task. KDevelop::Declaration::uses() will return
 267  * all uses for a declaration, and KDevelop::Declaration::smartUses() will return smart ranges which represent all uses
 268  * in the currently opened documents.
 269  *
 270  * \subsection navigating-types Types
 271  * Declarations may have a type, which can be retrieved through KDevelop::Declaration::abstractType().  Types can then
 272  * be visited using KDevelop::TypeVisitor, or manually with the corresponding calls in the type subclasses.  Types can be
 273  * compared for equality using KDevelop::AbstractType::equals().
 274  *
 275  * \section changes Monitoring chains for changes
 276  * KDevelop::DUChain::notifier() provides support for monitoring chains for changes.  It emits three signals, notifying
 277  * that a branch of a chain has been added, modified, or removed.  It is up to your code to iterate the chain and react
 278  * to any changes that have occurred, if desired.
 279  *
 280  * \section efficiency DUChain efficiency issues
 * It was confirmed during the implementation of the DUChain that there is too much information to store the duchain for
 * an entire project in memory at the same time (KDevPlatform itself was more than 1 GB).  Consequently, saving the chains to disk
 * has been implemented.  Following are some of the ramifications of this design.
 284  *
 285  * \subsection referenced-topcontexts Top Context Referencing
 286  * In order to determine which chains can be unloaded from memory, a referenced pointer was introduced called
 287  * KDevelop::ReferencedTopDUContext.  If you are using duchain objects outside of a duchain lock, and you need them to
 288  * remain in memory, you should create a KDevelop::ReferencedTopDUContext for the top context of each of the chains you need.
 289  * This will ensure it is not unloaded.  However, do not use this excessively or %KDevelop will have the same problem
 290  * of using large amounts of memory.
 291  *
 * \code
 *    KDevelop::TopDUContextPointer topContext;
 *    KDevelop::ReferencedTopDUContext topReferenced;
 *
 *    topContext = KDevelop::DUChainUtils::standardContextForUrl( myUrl );
 *    topReferenced = KDevelop::DUChainUtils::standardContextForUrl( myOtherUrl );
 *
 *    // Both of these pointers may be valid here
 *
 *    // Sleep
 *    sleep(10);
 *
 *    // Lock the duchain for reading
 *    KDevelop::DUChainReadLocker readLock( KDevelop::DUChain::lock() );
 *
 *    // topContext may not be valid any more, because it may have been saved to disk and unloaded from memory.
 *    // topReferenced will still be valid if it was valid when it was retrieved (above).
 * \endcode
 310  *
 311  * \subsection code-model Code Model
 312  * In order to facilitate easy access to top level declarations, a list of top level declarations is available from
 313  * KDevelop::CodeModel.  For each parsed file, you can call KDevelop::CodeModel::items() to retrieve a list of declarations
 314  * and some basics about their type.  If you need any further information, the chain must be loaded from disk.
 * \code
 *   uint count;
 *   const CodeModelItem* items;
 *   IndexedString file = \<yourFile\>;
 *
 *   // Retrieve the items for the given file
 *   KDevelop::CodeModel::self().items(file, count, items);
 *
 *   // count is unsigned, so the loop index is unsigned as well
 *   for (uint i = 0; i < count; ++i) {
 *     // items points to const data, so the item pointer must be const too
 *     const CodeModelItem* thisItem = items++;
 *
 *      // Use the item here.
 *      ...
 *   }
 * \endcode
 330  *
 331  * To access the declaration for each item, use KDevelop::PersistentSymbolTable::declarations().
 332  *
 333  * \section inadequate When the duchain doesn't contain all the information
 334  * If you need more information than is available in the duchain, you're most likely looking at using the AST generated by the
 335  * language support.  Note that this is obviously not language-independent, so it should be a last resort in cases where
 336  * the functionality being supplied is not language-specific.  If the duchain is missing some information that would make
 337  * sense to add, please raise it with the %KDevelop developers.
 338  *
 339  * \todo add mechanism to get at the AST
 340  * \todo keep the AST in memory for loaded files
 341  */
 342
 343
 344 // DOXYGEN_REFERENCES = language/editor
 345 // DOXYGEN_SET_WARN_LOGFILE=language/duchain/doxygen.log
 346 // DOXYGEN_SET_RECURSIVE=yes