Debian Code Search requires an index in order to search large amounts of source code efficiently. Because of the index's structure, it cannot be updated incrementally: it needs to be re-created whenever the source code changes. This document explains how to create a new index and how to update an existing one.

The instructions assume that:

1. There is a directory called `/dcs` in which all source and index files are stored.
2. DCS runs under a separate user account (called `dcs` in this document), which owns all files in `/dcs`.

## Creating a new index (first deployment)

Beware: these steps are untested. Please submit any corrections, and ask if you have any questions.

Create two PostgreSQL databases, `udd` and `dcs`, both owned by the `dcs` user:

```
createdb -O dcs -T template0 -E SQL_ASCII udd
createdb -E utf8 -O dcs dcs
```

First of all, we need to mirror the Debian archive:

```
$ [ -d ~/.gnupg ] || mkdir ~/.gnupg
$ [ -e ~/.gnupg/trustedkeys.gpg ] || cp /usr/share/keyrings/debian-archive-keyring.gpg ~/.gnupg/trustedkeys.gpg
$ debmirror --diff=none --progress --verbose -a none --source -s main -h deb-mirror.de -r /debian source-mirror
$ debmirror --diff=none --exclude-deb-section='.*' --include golang-mode --nocleanup --progress --verbose -a none --arch amd64 -s main -h deb-mirror.de -r /debian source-mirror
```

(The second `debmirror` run downloads the `golang-mode` binary package because it is small: debmirror has to download at least one binary package, otherwise it does not keep the `binary-amd64/Packages` files we are interested in.)
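
To check that the second run kept the package index, a small helper could look for it in the mirror. This is a minimal sketch, assuming debmirror's usual `dists/<dist>/main/binary-amd64/` layout; the function name is hypothetical and not part of DCS.

```sh
# Hypothetical helper (not part of DCS): succeed if the debmirror tree in $1
# contains at least one binary-amd64 Packages index for the main component.
has_amd64_packages_index() {
    mirror=$1
    for f in "$mirror"/dists/*/main/binary-amd64/Packages*; do
        # An unmatched glob stays literal, so check that the file really exists.
        if [ -f "$f" ]; then
            return 0
        fi
    done
    return 1
}
```

For example: `has_amd64_packages_index /dcs/source-mirror || echo 'no binary-amd64 Packages file' >&2`.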
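
Next, run `compute-ranking` on the mirror:

```
compute-ranking -mirrorPath=/dcs/source-mirror
```

Then, use `dcs-unpack` to unpack every source package so that its source code can be served and indexed. Since there is no previously unpacked tree on a first deployment, `-oldUnpackPath` is set to a placeholder (`/invalid`):

```
dcs-unpack \
    -mirrorPath=/dcs/source-mirror \
    -newUnpackPath=/dcs/unpacked \
    -oldUnpackPath=/invalid
```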

Now that all source packages are unpacked, we need to create the actual index. Depending on the size of your source code, you need more or fewer shards. Currently, I use 6 shards of about 1.2 GiB each. A shard cannot be larger than 2 GiB.
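
The 2 GiB limit gives a lower bound on the number of shards: divide the expected total index size by 2 GiB and round up. For example, 6 shards of about 1.2 GiB add up to roughly 7.2 GiB, which would need at least 4 shards; using 6 leaves headroom. The sketch below does that arithmetic, assuming the index splits evenly across shards. The function name is hypothetical, and the total size has to be estimated, e.g. from the previous index.

```sh
# Hypothetical helper: smallest shard count such that, if the index splits
# evenly, no shard exceeds the 2 GiB limit. $1 is the expected total index
# size in bytes (an integer).
min_shards() {
    limit=$((2 * 1024 * 1024 * 1024))   # 2 GiB in bytes
    total=$1
    # Integer ceiling division: (total + limit - 1) / limit.
    echo $(( (total + limit - 1) / limit ))
}
```

With GNU du, the size of an existing index can be fed in like this: `min_shards "$(du -cb /dcs/index.*.idx | tail -n 1 | cut -f 1)"`.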

Also note that `dcs-index` writes a large amount of temporary data (at least 7 GiB). If `/tmp` is mounted as a tmpfs, you might need to set `TMPDIR=/some/path` to place the temporary files in a different directory.
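
One way to pick a `TMPDIR` is sketched below. It assumes GNU coreutils (`stat -f`, `df --output`); the helper names and the fallback directory are hypothetical, and the 7 GiB threshold is the minimum mentioned above rather than a measured requirement.

```sh
# Hypothetical helpers (not part of DCS) for choosing where dcs-index puts
# its temporary files. Require GNU coreutils.

# Print the filesystem type of directory $1, e.g. "tmpfs" or "ext2/ext3".
fs_type() {
    stat -f -c %T "$1"
}

# Succeed if the filesystem holding directory $1 has at least $2 bytes free.
has_free_bytes() {
    avail=$(df --output=avail -B1 "$1" | tail -n 1 | tr -d ' ')
    [ "$avail" -ge "$2" ]
}

# Print /tmp if it is not a tmpfs and has at least 7 GiB free,
# otherwise print the fallback directory $1.
pick_tmpdir() {
    need=$((7 * 1024 * 1024 * 1024))
    if [ "$(fs_type /tmp)" != tmpfs ] && has_free_bytes /tmp "$need"; then
        echo /tmp
    else
        echo "$1"
    fi
}
```

For example, `mkdir -p /dcs/tmp && export TMPDIR="$(pick_tmpdir /dcs/tmp)"` before running `dcs-index`, where `/dcs/tmp` is only an example path.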

```
dcs-index \
    -unpackedPath=/dcs/unpacked/ \
    ...
```

Finally, start one index backend process per shard (`seq 0 5` covers the 6 shards used here), the source backend process, and dcs-web itself:

```
for i in $(seq 0 5); do
    systemctl start dcs-index-backend@$i.service
done
systemctl start dcs-source-backend.service
systemctl start dcs-web.service
```
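
To confirm that everything came up, a sketch like the following could query systemd for each unit. It assumes the unit names used above and one index backend per shard; the function name is hypothetical.

```sh
# Hypothetical helper (not part of DCS): check that all DCS units are active.
# $1 is the number of index shards (6 in this document). Prints every
# inactive unit to stderr and returns non-zero if there was at least one.
check_dcs_units() {
    shards=$1
    units="dcs-source-backend.service dcs-web.service"
    i=0
    while [ "$i" -lt "$shards" ]; do
        units="$units dcs-index-backend@$i.service"
        i=$((i + 1))
    done
    status=0
    for unit in $units; do
        if ! systemctl is-active --quiet "$unit"; then
            echo "not active: $unit" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_dcs_units 6` stays silent and returns 0 when all eight units are running.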

## Updating an existing index

First of all, change to `/dcs` and re-create the `NEW` and `OLD` directories. I like to leave them around after each update, just in case something is wrong with the new index, so the data from the previous update is only deleted at this point. On my machine, this step took about 27 minutes (`20,69s user 184,01s system 12% cpu 27:20,75 total`).
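
The directory re-creation described above might look like the following sketch. It assumes that "re-create" means deleting both directories and creating them empty; the function name is hypothetical. It uses absolute paths instead of `cd` and refuses an empty or missing root, so a typo cannot turn it into `rm -rf /NEW /OLD`.

```sh
# Hypothetical sketch: delete and re-create the NEW and OLD working
# directories below the DCS root $1 (e.g. /dcs). Deleting OLD discards the
# previous index and unpacked tree, which is why this can take a while.
recreate_work_dirs() {
    root=$1
    # Guard against an empty or nonexistent root directory.
    if [ -z "$root" ] || [ ! -d "$root" ]; then
        return 1
    fi
    rm -rf "$root/NEW" "$root/OLD"
    mkdir "$root/NEW" "$root/OLD"
}
```

For example: `recreate_work_dirs /dcs`.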

Next, get an updated copy of the popularity contest data from UDD and import it into the `udd` database. The dump re-creates the `popcon` and `popcon_src` tables, so drop the old copies first:

```
echo 'DROP TABLE IF EXISTS popcon; DROP TABLE IF EXISTS popcon_src;' | psql udd
wget -qO- http://udd.debian.org/udd-popcon.sql.xz | xz -d -c | psql udd
```

Then, update your copy of the Debian source mirror. With a fast mirror, this should not take much longer than 15 minutes:

```
$ debmirror --diff=none --progress --verbose -a none --source -s main -h deb-mirror.de -r /debian source-mirror
$ debmirror --diff=none --exclude-deb-section='.*' --include golang-mode --nocleanup --progress --verbose -a none --arch amd64 -s main -h deb-mirror.de -r /debian source-mirror
97,56s user 110,12s system 26% cpu 12:50,20 total
```

Next, run `compute-ranking` again on the updated mirror:

```
compute-ranking -mirrorPath=/dcs/source-mirror
```

Now, `dcs-unpack` creates a new directory called `unpacked-new`, which contains the unpacked source mirror. To save time and space, packages that have not changed since the last index are hard-linked from the previous tree (`-oldUnpackPath`) instead of being unpacked again:

```
dcs-unpack \
    -mirrorPath=/dcs/source-mirror \
    -newUnpackPath=/dcs/unpacked-new \
    -oldUnpackPath=/dcs/unpacked
1165,39s user 448,32s system 23% cpu 1:55:54,96 total
```
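
To see how much of the new tree was hard-linked rather than unpacked again, one can count files whose inode has more than one link. This is a hypothetical helper, not part of DCS.

```sh
# Hypothetical helper (not part of DCS): count regular files below $1 that
# have more than one hard link. Files of unchanged packages in unpacked-new
# share their inode with the copy in unpacked, so they are counted here.
count_hardlinked_files() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}
```

For example, `count_hardlinked_files /dcs/unpacked-new`. Running `du -sh /dcs/unpacked /dcs/unpacked-new` as a single invocation also shows the savings, because GNU du counts each hard-linked file only once.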

Now that all source packages are unpacked, create the new index in `/dcs/NEW`. The same considerations as for the first deployment apply: choose the number of shards so that no shard grows larger than 2 GiB (I currently use 6 shards of about 1.2 GiB each), and because `dcs-index` writes at least 7 GiB of temporary data, set `TMPDIR=/some/path` if `/tmp` is mounted as a tmpfs.

```
dcs-index \
    -mirrorPath=/dcs/NEW/ \
    -unpackedPath=/dcs/unpacked-new/ \
    ...
3418,80s user 1111,40s system 24% cpu 5:05:19,09 total
```

For the next step, it is recommended to create a simple shell script that automates the steps and keeps downtime as short as possible. Note that the script hardcodes the number of shards (`seq 0 5` for 6 shards). It also needs to run as root, because it calls systemctl directly to restart the index backend processes.
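
If you would rather not hardcode the shard count, a helper could count the index files instead. This is a minimal sketch, assuming one `index.N.idx` file per shard as in the `mv` commands of the script; the function name is hypothetical.

```sh
# Hypothetical helper (not part of DCS): count the index.*.idx files in
# directory $1, i.e. the number of shards that dcs-index produced.
count_shards() {
    n=0
    for f in "$1"/index.*.idx; do
        # An unmatched glob stays literal, so only count files that exist.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

In the script, `seq 0 $(( $(count_shards /dcs) - 1 ))` could then replace `seq 0 5`, as long as it runs after the new index files have been moved into `/dcs`.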
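
```
#!/bin/sh
# Swap in the new index and unpacked tree, keeping the old ones in /dcs/OLD.
# Must run as root because it restarts the systemd units directly.
set -e  # stop at the first failing step instead of half-swapping
mv /dcs/index.*.idx /dcs/OLD/
mv /dcs/NEW/index.*.idx /dcs/
mv /dcs/unpacked /dcs/OLD/unpacked
mv /dcs/unpacked-new /dcs/unpacked
# One index backend per shard: seq 0 5 assumes 6 shards.
for i in $(seq 0 5); do
    systemctl restart dcs-index-backend@$i.service
done
```

That’s it! Your index is now up to date. Verify that search still works, and enjoy your new index.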