Troubleshooting Continuwuity
Docker can be difficult to use and debug. It's common for Docker misconfigurations to cause issues, particularly with networking and permissions. Please check that your issues are not due to problems with your Docker setup.
Continuwuity issues
Slow joins to rooms
Some slowness is to be expected if you're the first person on your homeserver to join a room (which will
always be the case for single-user homeservers). In this situation, your homeserver has to verify the signatures of
all of the state events sent by other servers before your join. To make this process as fast as possible, make sure you have
multiple fast, trusted servers listed in trusted_servers in your configuration, and ensure
query_trusted_key_servers_first_on_join is set to true (the default).
If you need suggestions for trusted servers, ask in the Continuwuity main room.
However, very slow joins, especially to rooms with only a few users in them or rooms created by another user on your homeserver, may be caused by issue !779, which is a longstanding bug with synchronizing room joins to clients. In this situation, you did succeed in joining the room, but the bug caused your homeserver to forget to tell your client. To fix this, clear your client's cache. Both Element and Cinny have a button to clear their cache in the "About" section of their settings.
Configuration not working as expected
Sometimes a mistake in your configuration means that
options don't reach Continuwuity the way you intended.
This is particularly easy to do with environment variables.
To check what configuration Continuwuity actually sees, you can
use the !admin server show-config command in your admin room.
Beware that this prints out any secrets in your configuration,
so you might want to delete the result afterwards!
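Because environment variables are easy to get wrong, it can also help to list which configuration-looking variables the process will actually inherit. This is a minimal sketch with a hypothetical helper. The accepted prefixes (CONTINUWUITY_, plus the older CONDUWUIT_ and CONDUIT_) are an assumption, so check them against the configuration docs for your version. Values whose names suggest secrets are redacted so that the output is safer to share.

```python
import os

# Assumption: these are the prefixes Continuwuity reads config from
# (CONTINUWUITY_ plus the older CONDUWUIT_ and CONDUIT_ names).
PREFIXES = ("CONTINUWUITY_", "CONDUWUIT_", "CONDUIT_")
# Name fragments that suggest a secret value that should not be shown.
SECRET_HINTS = ("TOKEN", "PASSWORD", "SECRET")


def config_env_vars(environ=None) -> dict:
    """Return config-looking variables, with likely secrets redacted."""
    environ = os.environ if environ is None else environ
    found = {}
    for name in sorted(environ):
        if not name.startswith(PREFIXES):
            continue
        secret = any(hint in name for hint in SECRET_HINTS)
        found[name] = "<redacted>" if secret else environ[name]
    return found
```

Run it in the same environment that launches Continuwuity (for Docker, inside the container), for example by printing each name=value pair from config_env_vars(). A variable you expected to see but that is missing here was never passed through.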
Lost access to admin room
You can reinvite yourself to the admin room through the following methods:
- Use the --execute "users make_user_admin <username>" Continuwuity binary argument once to invite yourself to the admin room on startup
- Use the Continuwuity console/CLI to run the users make_user_admin command
- Or specify the emergency_password config option to allow you to temporarily log into the server account (@conduit) from a web client
DNS issues
DNS server overload
If your server experiences any of the following symptoms:
- Spurious server log entries with "DNS No connections available", "mismatching responding nameservers", or "error sending request"
- Excessively long room joins (30+ minutes) as seen from server logs
- Partial or non-functional outbound federation
This is likely due to your DNS server being overloaded. These problems are most often encountered in the following scenarios:
- Homeservers hosted on a machine that uses systemd-resolved.
- Docker deployments that use the bridge network's forwarding resolver.
Matrix federation is extremely heavy and generates a very large number of DNS requests, which makes general-purpose resolvers like the ones above unsuitable for it. The best fix is to self-host a high-quality caching DNS resolver such as Unbound and configure Continuwuity to use it.
Follow the DNS tuning guide for details on setting it up.
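On Linux, a quick way to see whether a host or container falls into one of these scenarios is to inspect the nameserver entries in /etc/resolv.conf. The sketch below parses that file's text and flags two well-known local stub addresses: 127.0.0.53 (the systemd-resolved stub listener) and 127.0.0.11 (Docker's embedded DNS server on user-defined networks). The function names are hypothetical. An address that is not flagged does not prove that the resolver can handle federation load, and if you point Continuwuity at a specific DNS server, this file may not be what it uses.

```python
# Well-known local stub resolver addresses (treat this list as incomplete).
KNOWN_STUBS = {
    "127.0.0.53": "systemd-resolved stub resolver",
    "127.0.0.11": "Docker embedded DNS server",
}


def nameservers(resolv_conf: str) -> list[str]:
    """Extract nameserver addresses from resolv.conf text."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        # Comment lines start with '#' or ';', so fields[0] won't match.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def stub_warnings(resolv_conf: str) -> list[str]:
    """Return a warning for each nameserver that is a known local stub."""
    return [
        f"{ns} looks like the {KNOWN_STUBS[ns]}"
        for ns in nameservers(resolv_conf)
        if ns in KNOWN_STUBS
    ]
```

For example, read /etc/resolv.conf into a string and print each entry of stub_warnings(text). Run it inside the container for Docker deployments, since the container's file can differ from the host's.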
Intermittent federation failures to a specific server
Sometimes outbound connections to one particular server fail, often because of a bad DNS cache entry. In such cases, running !admin debug ping <SERVER_NAME> returns errors.
To fix this, you can run !admin query resolver flush-cache <SERVER_NAME> to clear the bad cache for that domain, and outbound requests should work again.
You may also use !admin server clear-caches or !admin query resolver flush-cache -a to clear all server/resolver caches, in case of failures with many domains. However, note that this significantly increases your server load for a short period.
RocksDB / database issues
Database corruption
If your database is corrupted and fails to start (e.g. with a checksum mismatch), it may be recoverable, but careful steps must be taken and recovery is not guaranteed.
The first thing that can be done is launching Continuwuity with the
rocksdb_repair config option set to true. This will tell RocksDB to attempt to
repair itself at launch. If this does not work, disable the option and continue
reading.
RocksDB has the following recovery modes:
- TolerateCorruptedTailRecords
- AbsoluteConsistency
- PointInTime
- SkipAnyCorruptedRecord
By default, Continuwuity uses TolerateCorruptedTailRecords as generally these may
be due to bad federation and we can re-fetch the correct data over federation.
The RocksDB default is PointInTime which will attempt to restore a "snapshot"
of the data when it was last known to be good. This data can be either a few
seconds old, or multiple minutes prior. PointInTime may not be suitable for
default usage due to clients and servers possibly not being able to handle
sudden "backwards time travels", and AbsoluteConsistency may be too strict.
AbsoluteConsistency will fail to start the database if any sign of corruption
is detected. SkipAnyCorruptedRecord will skip all forms of corruption unless
the corruption is so severe that the database cannot be opened at all. Using
SkipAnyCorruptedRecord voids any support, as it may cause more damage and/or
leave your database in a permanently inconsistent state, but it may help as a
last-ditch effort if PointInTime does not work.
With this in mind:
- First start Continuwuity with the PointInTime recovery method. See the example config for how to do this using rocksdb_recovery_mode
- If your database successfully opens, clients are recommended to clear their client cache to account for the rollback
- Leave Continuwuity running in PointInTime for at least 30-60 minutes so that as much of the corruption as possible is recovered
- If all goes well, you should be able to switch back to TolerateCorruptedTailRecords, and you have successfully recovered your database
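Because these steps involve editing the same config file several times, they can be scripted. This is a minimal sketch with hypothetical helper names. It assumes the integer encoding of rocksdb_recovery_mode listed in the example config (0 = AbsoluteConsistency, 1 = TolerateCorruptedTailRecords, 2 = PointInTime, 3 = SkipAnyCorruptedRecord), which you should verify against the example config for your version. It only handles simple one-line key = value entries. The same set_global_option helper can also toggle rocksdb_repair.

```python
import re

# Assumed integer encoding of rocksdb_recovery_mode, as listed in the
# example config; verify it against the example config for your version.
RECOVERY_MODES = {
    "AbsoluteConsistency": 0,
    "TolerateCorruptedTailRecords": 1,  # Continuwuity's default
    "PointInTime": 2,
    "SkipAnyCorruptedRecord": 3,
}


def set_global_option(config_text: str, key: str, value: str) -> str:
    """Set `key = value`, uncommenting an existing line or appending one.

    Simplified: handles one-line entries only, and appends at the end of
    the file, so it assumes the file ends inside the [global] table.
    """
    pattern = re.compile(
        rf"^#?[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE
    )
    line = f"{key} = {value}"
    if pattern.search(config_text):
        # A function replacement avoids backslash-escape surprises.
        return pattern.sub(lambda _match: line, config_text, count=1)
    return config_text.rstrip("\n") + "\n" + line + "\n"


def recovery_mode_config(config_text: str, mode: str) -> str:
    """Return config text with rocksdb_recovery_mode set to `mode`."""
    value = str(RECOVERY_MODES[mode])
    return set_global_option(config_text, "rocksdb_recovery_mode", value)
```

For example, apply recovery_mode_config(text, "PointInTime") before the recovery start, then recovery_mode_config(text, "TolerateCorruptedTailRecords") once the database is healthy again. Stop Continuwuity before editing, and keep a backup of the original file.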
Debugging
Note that users should not normally need to debug things. If you find yourself
debugging and track down the issue, please let us know what it was and, if possible, how we can fix it.
Various debug commands can be found in !admin debug.
Debug/Trace log level
Continuwuity builds without debug or trace log levels at compile time by default
for substantial performance gains in CPU usage and improved compile times. If
you need to access debug/trace log levels, you will need to build without the
release_max_log_level feature or use our provided static debug binaries.
Changing log level dynamically
Continuwuity supports changing the tracing log environment filter on-the-fly using
the admin command !admin debug change-log-level <log env filter>. This accepts
an unquoted string in the same format as the log config option.
Example: !admin debug change-log-level debug
This can also accept complex filters such as:
!admin debug change-log-level info,conduit_service[{dest="example.com"}]=trace,ruma_state_res=trace
!admin debug change-log-level info,conduit_service[{dest="example.com"}]=trace,conduit_service[send{dest="example.org"}]=trace
And to reset the log level to the one that was set at startup / last config
load, simply pass the --reset flag.
!admin debug change-log-level --reset
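Complex filters like the ones above are comma-separated directives, each of the form target[span{field=value}]=level, following tracing-subscriber's EnvFilter directive syntax. To see how a filter breaks down, here is a simplified Python sketch. It is an illustration, not the parser Continuwuity uses. It assumes every directive other than a bare level ends in =level, and it does not handle brackets inside quoted values.

```python
def split_directives(filter_str: str) -> list[str]:
    """Split a filter on top-level commas.

    Commas inside [...] or {...} belong to a span/field selector, so they
    do not start a new directive.
    """
    directives, current, depth = [], [], 0
    for ch in filter_str:
        if ch in "[{":
            depth += 1
        elif ch in "]}":
            depth -= 1
        if ch == "," and depth == 0:
            directives.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        directives.append("".join(current))
    return directives


def describe(directive: str) -> tuple[str, str, str]:
    """Return (target, span selector, level) for a single directive.

    A bare level such as "info" applies to all targets, shown as "*".
    """
    target_part, _, level = directive.rpartition("=")
    target, bracket, rest = target_part.partition("[")
    return (target or "*", bracket + rest, level)


example = 'info,conduit_service[{dest="example.com"}]=trace,ruma_state_res=trace'
for directive in split_directives(example):
    print(describe(directive))
```

Applied to the first example filter, this shows a global info level plus two trace-level directives, one of which is scoped to spans whose dest field is example.com.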
Pinging servers
Continuwuity can ping other servers using !admin debug ping <server>. This takes
a server name, goes through the server discovery process, and queries
/_matrix/federation/v1/version. Any errors are printed.
While it does measure the latency of the request, this is not indicative of server performance on either side, as that endpoint is completely unauthenticated and simply returns a small static JSON response. It is very cheap in terms of both bandwidth and computation.
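To reproduce roughly what the ping does from outside Continuwuity, for example from another machine, here is a minimal Python sketch. It is not Continuwuity's implementation, and the function names are hypothetical. Discovery is simplified: it follows only /.well-known/matrix/server delegation, falls back to port 8448, skips SRV records, and assumes the server name has no explicit port.

```python
import json
import time
import urllib.request


def delegated_host(well_known_body: str, server_name: str) -> str:
    """Choose host:port from a /.well-known/matrix/server response body.

    Simplified: ignores SRV records and assumes server_name has no port.
    Falls back to server_name on the default federation port 8448.
    """
    try:
        host = json.loads(well_known_body).get("m.server")
    except (ValueError, AttributeError):
        host = None  # empty, invalid, or non-object JSON
    if not isinstance(host, str) or not host:
        return f"{server_name}:8448"
    # Add the default port unless one is given (looking only after "]"
    # skips the colons inside an IPv6 literal like "[::1]").
    if ":" not in host.split("]")[-1]:
        host += ":8448"
    return host


def version_url(host_port: str) -> str:
    return f"https://{host_port}/_matrix/federation/v1/version"


def ping(server_name: str, timeout: float = 10.0) -> tuple[dict, float]:
    """Fetch the federation version endpoint; return (JSON body, seconds)."""
    well_known = f"https://{server_name}/.well-known/matrix/server"
    try:
        with urllib.request.urlopen(well_known, timeout=timeout) as resp:
            body = resp.read().decode()
    except (OSError, ValueError):
        body = ""  # no usable delegation; use the fallback
    url = version_url(delegated_host(body, server_name))
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload, time.monotonic() - start
```

Calling ping("example.com") returns the version JSON and the elapsed time. As with the admin command, that time includes connection and TLS setup, so it says little about either server's performance.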
Enabling backtraces for errors
Continuwuity can capture backtraces (stack traces) for errors to help diagnose issues. Backtraces show the exact sequence of function calls that led to an error, which is invaluable for debugging.
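To enable backtraces, set the RUST_BACKTRACE environment variable (for example, RUST_BACKTRACE=1) before starting Continuwuity.
For systemd deployments, add Environment="RUST_BACKTRACE=1" to the [Service] section of your service file.
Backtrace capture has a performance cost, so avoid leaving it enabled for longer than you need it.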
You can also enable it only for panics by setting
RUST_BACKTRACE=1 and RUST_LIB_BACKTRACE=0.
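If you start Continuwuity from a script rather than systemd, you can pass the same variables to the child process. This is a minimal sketch with hypothetical helper names, and the binary path is a placeholder for wherever your Continuwuity executable lives. The panics-only mode relies on Rust's documented behaviour that RUST_LIB_BACKTRACE, when set, takes precedence over RUST_BACKTRACE for library-captured backtraces.

```python
import os
import subprocess


def backtrace_env(panics_only: bool = False, base=None) -> dict:
    """Copy the environment (or `base`) with backtrace capture enabled."""
    env = dict(os.environ if base is None else base)
    env["RUST_BACKTRACE"] = "1"
    if panics_only:
        # RUST_LIB_BACKTRACE overrides RUST_BACKTRACE for library-captured
        # backtraces, so 0 here leaves only panic backtraces enabled.
        env["RUST_LIB_BACKTRACE"] = "0"
    else:
        env.pop("RUST_LIB_BACKTRACE", None)  # don't inherit a stale override
    return env


def run_with_backtraces(binary: str, *args: str, panics_only: bool = False):
    """Start `binary` (a placeholder path) with backtraces enabled."""
    return subprocess.run(
        [binary, *args], env=backtrace_env(panics_only), check=False
    )
```

For example, run_with_backtraces("/path/to/continuwuity", panics_only=True) starts the server with panic backtraces only. Remember to drop these variables again once you have captured what you need.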
Allocator memory stats
When using jemalloc with jemallocator's stats feature (--enable-stats), you
can see Continuwuity's high-level allocator stats at the bottom of the output
of !admin server memory-usage.
If you are a developer, you can also view the raw jemalloc statistics with
!admin debug memory-stats. Note that this output is extremely large, so it
may only be viewable in the Continuwuity console CLI due to PDU size limits,
and it is not easy for non-developers to understand.