Production Server Maintenance
In the following it is assumed, that a production server is setup with Apache and PostgreSQL as described in the previous section. This is mostly a writeup of actions taken to update the production instance with regard to an issue tracked on Github.
Setting up a Development/Maintenance Server
This section presents a possible approach to make a near identical clone of the production server (if not already set up), in order to use it for development/maintenance/updating/testing purposes together with the live production server.
On Production Server
Before cloning most parts of the production server, it is a good idea to get an
idea of how big the individual tables in the database are. For PostgreSQL, we
can do this by executing the following SQL query (e.g. in a psql
command
shell or via pgadmin3
GUI):
jane@jane:~$ psql jane
psql (9.4.15)
Type "help" for help.
jane=> SELECT relname as "Table",
jane-> pg_size_pretty(pg_total_relation_size(relid)) As "Size",
jane-> pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as "External Size"
jane-> FROM pg_catalog.pg_statio_user_tables
jane-> ORDER BY pg_total_relation_size(relid) DESC;
jane=> \q
..which might result in something like the following result:
Table | Size | External Size
---------------------------------------------+---------+---------------
waveforms_continuoustrace | 127 GB | 99 GB
waveforms_file | 4885 MB | 4548 MB
documents_document | 94 MB | 90 MB
documents_documentindexattachment | 87 MB | 87 MB
documents_documentindex | 11 MB | 3128 kB
spatial_ref_sys | 3360 kB | 176 kB
waveforms_path | 2880 kB | 1776 kB
django_admin_log | 1808 kB | 1008 kB
waveforms_mapping | 208 kB | 200 kB
django_session | 112 kB | 88 kB
documents_documenttype | 104 kB | 96 kB
djangoplugins_plugin | 96 kB | 88 kB
auth_permission | 96 kB | 80 kB
documents_documenttype_validators | 88 kB | 80 kB
documents_documenttype_retrieve_permissions | 88 kB | 80 kB
auth_user_user_permissions | 72 kB | 64 kB
waveforms_restriction | 64 kB | 64 kB
auth_user | 56 kB | 48 kB
documents_documenttype_upload_permissions | 40 kB | 40 kB
django_content_type | 40 kB | 32 kB
auth_group_permissions | 32 kB | 32 kB
djangoplugins_pluginpoint | 32 kB | 24 kB
django_migrations | 32 kB | 24 kB
auth_user_groups | 32 kB | 32 kB
waveforms_restriction_users | 32 kB | 32 kB
auth_group | 24 kB | 24 kB
(26 rows)
So we see that we can easily clone everything but the data content in the two
largest tables. By cloning instead of setting up tables manually, we make sure
that the database state on the development server is exactly the same as with
the production server, which is what we want when creating django migrations
that can be applied on the production server. So first we dump all the smaller
tables (as user postgres
). For simplicity we will use the same postgres user
and database names on the development server (otherwise additional options like
--no-user
/--no-owner
might have to be added and additional steps on the
testing server when setting up the dumped databases will likely be necessary):
jane@jane:~$ su
Password:
root@jane:~# su postgres
postgres@jane:~$ cd /tmp
postgres@jane:/tmp$ pg_dump jane --exclude-table-data '*.waveforms_continuoustrace' --exclude-table-data '*.waveforms_file' --file jane.sql
For the big tables, it would be overkill to clone all data, so instead we
extract a (more or less random) subset of the original data (as user
postgres
). In this example we select every 500th waveform file entry,
limiting output to a maximum of 10000 entries (actually the output we get are
only 3703 lines anyway).
postgres@jane:/tmp$ psql jane
psql (9.4.15)
Type "help" for help.
jane=# COPY (
jane(# SELECT t.*
jane(# FROM (
jane(# SELECT *, row_number()
jane(# OVER(ORDER BY id ASC) AS row
jane(# FROM waveforms_file) t
jane(# WHERE t.row % 500 = 0 LIMIT 10000)
jane-# TO '/tmp/jane_waveforms_file.csv'
jane-# CSV HEADER;
COPY 3703
Now we also select a subset of the waveforms_contiguoustrace
table, while
making sure that we select rows that match the previously extracted waveform
files from waveforms_file
table (it might actually not be necessary for the
maintenance operations we perform later that these entries match, this is just
to be on the safe side).
jane=# COPY (
jane(# SELECT *
jane(# FROM waveforms_continuoustrace
jane(# WHERE file_id IN (
jane(# SELECT t.id
jane(# FROM (
jane(# SELECT *, row_number()
jane(# OVER(ORDER BY id ASC) AS row
jane(# FROM waveforms_file) t
jane(# WHERE t.row % 500 = 0 LIMIT 10000)
jane(# LIMIT 50000)
jane-# TO '/tmp/jane_waveforms_ct.csv'
jane-# CSV HEADER;
COPY 50000
jane=# \q
The previous command has added an unwanted additional column row
that we need
to remove from the dumped csv file e.g. using sed
.
postgres@jane:/tmp$ cat jane_waveforms_file.csv | sed -e 's#,[^,]*$##' > jane_waveforms_file_clean.csv
On Development Server
The following will assume, that postgres is running as a database backend in
the same version on the development server (in this case two Debian 8 Jessie
installations). Now on the development/maintenance server, we restore the
dumped database (using same user/table name as on production) as user
"postgres" (copied from production server to development server e.g. via
scp
). First we need to create the appropriate postgres user (if not yet
present). If an old development database already existed we might have to drop
if first (careful, all data is lost) and create a new empty database into which
we then import the dumped table data.
tremor@unzen:~$ su
Password:
root@unzen:~# su postgres
postgres@unzen:~$ createuser --encrypted --pwprompt jane
postgres@unzen:~$ #dropdb jane # careful!
postgres@unzen:~$ createdb -T template0 --owner jane jane
postgres@unzen:~$ psql jane < /tmp/jane.sql
SET
SET
SET
SET
SET
SET
CREATE EXTENSION
COMMENT
CREATE EXTENSION
COMMENT
SET
[...]
ALTER TABLE
ALTER TABLE
ALTER TABLE
REVOKE
REVOKE
GRANT
GRANT
postgres@unzen:~$
Now in addition we also have to import those subsample csv files for the largest tables (again, as user "postgres").
postgres@unzen:~$ psql jane
jane=# \copy waveforms_file FROM '/tmp/jane_waveforms_file_clean.csv' DELIMITER ',' CSV HEADER
COPY 3703
jane=# \copy waveforms_continuoustrace FROM '/tmp/jane_waveforms_ct.csv' DELIMITER ',' CSV HEADER
COPY 50000
jane=# \q
Now what is left is to also setup an identical Python installation on the
development server. In this case the production server is setup as described in
the previous section, i.e. relying on Debian Python packages and some
additional packages installed in user space with pip install --user
(on
Debian this ends up in ~/.local
by default).
So first of all, install the same Debian Python 3 packages as on the production server. The manually installed packages can be looked up on the production server by doing e.g.
jane@jane:~$ aptitude search python3 | egrep '^i '
i python3-all - package depending on all supported Python
i python3-all-dev - package depending on all supported Python
i python3-defusedxml - XML bomb protection for Python stdlib modu
i python3-doc - documentation for the high-level object-or
i python3-examples - examples for the Python language (default
i python3-flake8 - code checker using pycodestyle and pyflake
i python3-gdal - Python 3 bindings to the Geospatial Data A
i python3-geopy - geocoding toolbox for Python3
i python3-markdown - text-to-HTML conversion library/tool (impl
i python3-minimal - minimal subset of the Python language (def
i python3-obspy - ObsPy: A Python Toolbox for seismology
i python3-pip - alternative Python package installer - Pyt
i python3-psycopg2 - Python 3 module for PostgreSQL
i python3-yaml - YAML parser and emitter for Python3
After installing these packages on the development server, we can then simply copy the additionally installed packages, e.g. using rsync or scp. In this example we deliberately copy the additional packages to a different directory that will not be picked up by Python by default, in order to not interfere with other tasks that are running on the development server.
tremor@unzen:~$ rsync -a jane@jane:.local/ .local-jane-dev # trailing slash makes a difference!
We can later tell python to regard this directory explicitly for an individual
Python prompt / program execution by means of the PYTHONUSERBASE
environment
variable..
tremor@unzen:~$ PYTHONUSERBASE=$HOME/.local-jane-dev /usr/bin/python3
Now finally, the jane installation (which is assumed to be a local git
repository, connected to a github remote repository) can also be cloned using
rsync
(addresses of registered remotes might have to be adapted if using
remotes with GPG private/public key authentication).
tremor@unzen:~$ rsync -av jane@jane:jane/ jane-dev # trailing slash makes a difference!
Now some settings have to be adjusted in the local settings on the development
server (most likely only database connection as shown here), so edit key
DATABASES
in file ~/jane-dev/src/jane/local_settings.py
on development
server (e.g. adjust password if using something else than on production
server).
We can then run the Jane development instance on the development server (without Apache backend) and allowing connections from all other machines in the network:
tremor@unzen:~$ cd ~/jane-dev/src
tremor@unzen:~/jane-dev/src$ PYTHONUSERBASE=$HOME/.local-jane-dev /usr/bin/python3 manage.py runserver 0.0.0.0:8000
The development Jane instance can be then accessed from workstations at web
address http://unzen:8000
.
Updating Jane / Doing Migrations
In the following it is summarized how to perform updates on the Jane production instance, especially when database changes are involved which is handled by so called django "migrations". In general, these migrations are created and applied on the development server (which therefore obviously should be in the same state with regard to database schema/layout), committed to version control and then applied on the production server.
On Development Server
- stop jane (usually is not running anyway, as it is not served via apache)
-
make sure all current migrations are applied
tremor@unzen:~/jane-dev/src$ export PYTHONUSERBASE=$HOME/.local-jane-dev tremor@unzen:~/jane-dev/src$ /usr/bin/python3 manage.py migrate Operations to perform: Apply all migrations: documents, admin, waveforms, contenttypes, sessions, djangoplugins, auth Running migrations: No migrations to apply.
-
git pull
in new changes from github and/or make local changes and commit them, see e.g. here for a commit that affects the database schema and that needs a migration -
take notes what should change after a successful update
- feature changes in the web pages, GIS etc.
- changes to database layout
- ...
-
when on a clean commit state of the deployed git branch, have django set up migrations and optionally rename them to have a meaningful name (only ever rename not-yet-applied migrations!!), see e.g. here for a commit with a migration created in this way
tremor@unzen:~/jane-dev/src$ /usr/bin/python3 manage.py makemigrations Migrations for 'waveforms': 0003_auto_20180821_1331.py: - Alter unique_together for continuoustrace (1 constraint(s)) tremor@unzen:~/jane-dev/src$ mv jane/waveforms/migrations/0003_auto_{20180821_1331,waveform_continuoustrace_unique_constraint}.py
-
apply new migration (check what django wants to do beforehand)
tremor@unzen:~/jane-dev/src$ /usr/bin/python3 manage.py showmigrations admin [X] 0001_initial [X] 0002_logentry_remove_auto_add auth [X] 0001_initial [X] 0002_alter_permission_name_max_length [X] 0003_alter_user_email_max_length [X] 0004_alter_user_username_opts [X] 0005_alter_user_last_login_null [X] 0006_require_contenttypes_0002 [X] 0007_alter_validators_add_error_messages contenttypes [X] 0001_initial [X] 0002_remove_content_type_name djangoplugins [X] 0001_initial documents [X] 0001_initial [X] 0002_auto_20161018_0646 sessions [X] 0001_initial waveforms [X] 0001_initial [X] 0002_auto_20160706_1508 [ ] 0003_auto_waveform_continuoustrace_unique_constraint tremor@unzen:~/jane-dev/src$ /usr/bin/python3 manage.py migrate Operations to perform: Apply all migrations: documents, sessions, waveforms, contenttypes, admin, djangoplugins, auth Running migrations: Rendering model states... DONE Applying waveforms.0003_auto_waveform_continuoustrace_unique_constraint... OK
-
if any index fields in the indexed json document data has changed, kick of a reindexing of the related document type! (currently there is no means to request a reindexing of existing documents, so one has to download all documents, delete them on server and upload them again..)
- check that all changes were performed successfully
- check database model/table changes (e.g. new columns, changed unique constraints, ...)
- after running the development server check that any web page/GIS feature changes are reflected correctly
- commit new migration to git repository and push to github
On Production Server
-
stop any processes acting on the postgres databases
- stop jane (i.e. stop apache in our case)
- stop indexers (e.g.
pkill --signal 1 -f 'manage.py index_waveforms '
) -
check that postgres is indeed idle and there are no long running active queries, e.g. doing a SQL query:
postgres@unzen:~$ psql psql (9.4.15) Type "help" for help. postgres=# SELECT * FROM pg_stat_activity; postgres=# SELECT pid, now() - a.query_start AS duration, query, state postgres-# FROM pg_stat_activity AS a postgres-# WHERE state = 'active'; pid | duration | query | state -------+----------+-------------------------------------------------------------+-------- 12535 | 00:00:00 | SELECT pid, now() - a.query_start AS duration, query, state+| active | | FROM pg_stat_activity AS a +| | | WHERE state = 'active'; | (1 row) postgres=# \q
-
if necessary, kill long running queries that would hold up the update (obviously only if those are queries that only read the database, not add new information!):
postgres=# SELECT pg_cancel_backend(<PID of query, see above request that lists active queries>)
-
ideally make backup of database, or at least of the parts that would would hurt if lost, or at least check that scheduled backups are done regularly
-
make sure all current migrations are applied
jane@jane:~/jane/src$ /usr/bin/python3 manage.py migrate
-
git pull
latest changes on deployed branch (created and pushed above on development server) -
apply new migration, if any
jane@jane:~/jane/src$ /usr/bin/python3 manage.py migrate
-
if static page contents have changed, update exported static pages
jane@jane:~/jane/src$ /usr/bin/python3 manage.py collectstatic
-
start jane (i.e. start apache in our case)
- start waveform indexers, if necessary
- check if everything works as expected
- party!