5.1. On-disk layout

Everything the TOA server keeps for itself lives under toa-server.dataRoot (see Server settings). The layout is plain folders and files - no database - so an operator with shell access can answer support questions and do manual cleanup with ordinary tools.

5.1.1. Top-level structure

<dataRoot>/
  <domain.code>/                  one folder per configured domain
    cmserver2.xml                 cached templates (URL-based domains only)
    yyyy-MM-dd/                   one folder per server-local calendar day
      HHmm-xxxxxxxx/              one folder per import (HHmm + 8 hex chars)
        import.json
        doc-1/
          pages/
            page-1.bin
            page-1.meta.json      EML pages only
            page-2.bin
            ...
        doc-2/                    sibling document, see below
          pages/
            page-1.bin
            ...

The per-domain folder is created on startup. The yyyy-MM-dd folder is created the first time an import lands on that calendar day. The HHmm-xxxxxxxx folder is created when the add-in calls POST /import/<domain>. Nothing else writes into <dataRoot>.

The yyyy-MM-dd and HHmm parts use the server-local time zone. The same zone governs the retention cutoff, so a misconfigured zone has visible consequences both here and in Retention and cleanup. Pin the JVM time zone explicitly on every TOA server instance - see Time zone.
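The effect of the zone on folder names can be illustrated with a small sketch. It is not the server's code: the function name import_folder_parts is hypothetical, fixed UTC offsets stand in for real zone rules, and secrets.token_hex is only an assumed stand-in for however the server generates the suffix.

```python
from datetime import datetime, timedelta, timezone
import secrets


def import_folder_parts(instant: datetime, zone: timezone) -> tuple[str, str]:
    """Return (date folder, import folder) for an instant as seen in `zone`."""
    local = instant.astimezone(zone)
    suffix = secrets.token_hex(4)  # 8 random hex characters (assumed generator)
    return local.strftime("%Y-%m-%d"), f"{local:%H%M}-{suffix}"


# One and the same instant, viewed from two differently configured servers.
instant = datetime(2024, 3, 1, 23, 30, tzinfo=timezone.utc)
utc_date, utc_folder = import_folder_parts(instant, timezone.utc)
plus2_date, plus2_folder = import_folder_parts(instant, timezone(timedelta(hours=2)))

print(utc_date, utc_folder[:4])      # 2024-03-01 2330
print(plus2_date, plus2_folder[:4])  # 2024-03-02 0130
```

An import that arrives late in the evening can therefore land in a different date folder depending on the zone, which is why the zone should be pinned rather than inherited from the host.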

5.1.2. Import identifier

The API-level import id has the form:

yyyy-MM-dd_HHmm-xxxxxxxx

The two halves correspond directly to the date folder and the import folder on disk. Given an id, an operator can locate the import on disk without searching:

<dataRoot>/<domain>/<yyyy-MM-dd>/<HHmm-xxxxxxxx>/

The xxxxxxxx suffix is 8 random hex characters (32 bits); it makes the id hard to guess in download URLs and keeps imports created within the same minute distinct.
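The mapping from id to folder can be sketched as a small helper. The function name import_path, the data root /var/lib/toa and the domain code acme are hypothetical; the sketch also assumes the hex suffix may appear in either case.

```python
import re
from pathlib import Path

# yyyy-MM-dd, an underscore, then HHmm-xxxxxxxx (8 hex characters).
IMPORT_ID = re.compile(r"(\d{4}-\d{2}-\d{2})_(\d{4}-[0-9a-fA-F]{8})")


def import_path(data_root: str, domain: str, import_id: str) -> Path:
    """Translate an API-level import id into its folder under the data root."""
    match = IMPORT_ID.fullmatch(import_id)
    if match is None:
        # Rejecting malformed ids also keeps "../" tricks out of the path.
        raise ValueError(f"not a valid import id: {import_id!r}")
    date_folder, import_folder = match.groups()
    return Path(data_root) / domain / date_folder / import_folder


print(import_path("/var/lib/toa", "acme", "2024-03-01_2330-1a2b3c4d").as_posix())
# /var/lib/toa/acme/2024-03-01/2330-1a2b3c4d
```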

5.1.3. What each file means

import.json

Single source of truth for an import. Operator-relevant fields:

status - DRAFT, PENDING, SUBMITTED or FAILED. FAILED carries an error message; SUBMITTED carries damisBatchId for cross-referencing the storage server.
userName / userEmail - whoever created the import in Outlook. userEmail is also the ownership key enforced on subsequent mutations.
documents[] - the list of documents in this import; each entry carries its template id, attribute values and pages[] metadata (filename, byte size, content type, sidecar filename if any).
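For a quick overview of an import, the fields above can be pulled out with a few lines of Python. This is a sketch under assumptions: it takes status, userEmail, damisBatchId and documents to be top-level JSON keys spelled exactly as in the list, with pages nested inside each document entry. The helper name summarize_import is hypothetical.

```python
import json
from pathlib import Path


def summarize_import(import_dir: str) -> dict:
    """Collect the operator-relevant fields of one import into a flat dict."""
    raw = (Path(import_dir) / "import.json").read_text(encoding="utf-8")
    data = json.loads(raw)
    documents = data.get("documents", [])
    return {
        "status": data.get("status"),
        "userEmail": data.get("userEmail"),
        "damisBatchId": data.get("damisBatchId"),  # expected for SUBMITTED only
        "documents": len(documents),
        "pages": sum(len(doc.get("pages", [])) for doc in documents),
    }
```

The helper only reads the file, so it is safe to run against a live server.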

The file is rewritten atomically (write to import.json.tmp, then ATOMIC_MOVE). If you ever see import.json.tmp left over, the server crashed mid-rewrite - it is safe to delete; the previous import.json is intact.
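Leftover temporary files can be located with a short script. The sketch below is an operator-side helper, not part of the server; the name leftover_tmp_files and the five-minute age threshold are assumptions, the latter chosen so that a rewrite in progress is not mistaken for crash debris.

```python
import time
from pathlib import Path


def leftover_tmp_files(data_root: str, min_age_seconds: float = 300.0) -> list[Path]:
    """List import.json.tmp files older than the given age."""
    cutoff = time.time() - min_age_seconds
    # <dataRoot>/<domain>/<yyyy-MM-dd>/<HHmm-xxxxxxxx>/import.json.tmp
    pattern = "*/*/*/import.json.tmp"
    return sorted(
        tmp for tmp in Path(data_root).glob(pattern)
        if tmp.stat().st_mtime <= cutoff
    )
```

The function only reports candidates; deleting them is left as a deliberate manual step.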

doc-N/

One folder per document inside the import. doc-1 always exists and corresponds to the original message the user uploaded. doc-2, doc-3, … are sibling documents created by the “extract attachments” flow - each holds the attachments split out of an EML page in doc-1 (or a later sibling).

doc-N/pages/page-M.bin

Raw page payload. The byte stream is whatever the client posted - typically an EML message for page-1 of doc-1, or a single extracted attachment for sibling-document pages. The .bin extension is intentional; the semantic content type lives in import.json and (for EML) in the sidecar.

doc-N/pages/page-M.meta.json

Sidecar produced for message/rfc822 pages only. Contains the decoded from / subject and the list of MIME attachments (filename, content type, decoded size). It is a convenience index - if the sidecar is missing or unreadable, the .bin is still the source of truth and the server falls back to re-parsing on demand. Sidecars are rewritten when attachments are extracted, so they always match the on-disk EML.
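To see at a glance which pages of an import have a sidecar, the file naming convention is enough. The sketch below relies only on the page-M.bin / page-M.meta.json names described above; the helper name pages_with_sidecar_state is hypothetical.

```python
from pathlib import Path


def pages_with_sidecar_state(import_dir: str) -> list[tuple[str, bool]]:
    """Return (relative page path, sidecar present?) for every page binary."""
    root = Path(import_dir)
    rows = []
    for page in sorted(root.glob("doc-*/pages/page-*.bin")):
        # page-1.bin -> page-1.meta.json in the same folder
        sidecar = page.with_name(page.stem + ".meta.json")
        rows.append((page.relative_to(root).as_posix(), sidecar.is_file()))
    return rows
```

A missing sidecar is normal for non-EML pages; it only deserves attention when import.json records the page as message/rfc822.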

cmserver2.xml

Only present for domains whose templates are loaded from a URL (see Template catalogue). Cached copy of the last successfully downloaded catalogue; used as the fallback when the next refresh fails. Safe to delete - the next refresh re-downloads it. If you delete it while the remote URL is also unreachable, the domain has no catalogue until either the URL recovers or you drop in a copy by hand.

5.1.4. Atomicity guarantees

The layout is designed so that an operator’s mental model matches the filesystem state without race conditions:

A page binary existing on disk implies the upload completed. Interrupted uploads leave no page-M.bin at all - never a half-written one. The controller streams the request body to a sibling .tmp file and then moves it into place with ATOMIC_MOVE.
import.json and the page binaries inside the same import folder are mutated under a per-import lock held by DomainStorage, so concurrent addPage / createDocumentFromAttachments / submit calls cannot interleave their writes.
Atomicity is per-import. Two different imports under the same date folder are independent; backing up or deleting one never affects the other.
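The write-then-move pattern behind these guarantees can be sketched in Python for readers who want to reproduce it in their own tooling. This is an analogue, not the server's implementation: os.replace plays the role of ATOMIC_MOVE here, the fsync call is an added precaution that the text above does not claim the server performs, and the name write_atomically is hypothetical.

```python
import os
from pathlib import Path


def write_atomically(target: Path, payload: bytes) -> None:
    """Write payload to a sibling .tmp file, then rename it over the target."""
    tmp = target.with_name(target.name + ".tmp")
    with open(tmp, "wb") as handle:
        handle.write(payload)
        handle.flush()
        os.fsync(handle.fileno())  # assumption: flush to disk before the rename
    # The rename stays within one directory, so readers see either the old
    # file or the complete new one, never a partial write.
    os.replace(tmp, target)
```

If the process dies before os.replace, only the .tmp file is left behind and the previous target is untouched, which mirrors the behaviour described for import.json.tmp.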

5.1.5. Manual operations

Because the layout has no database, the following are all safe shell operations, as long as the server is not actively writing to the target import:

Inspect an import: cat <importPath>/import.json, ls <importPath>/doc-*/pages/.
Archive an import: tar or zip the HHmm-xxxxxxxx folder. If you then remove the folder, the server will not notice it is gone until the next API call references it.
Delete a single import: rm -rf the HHmm-xxxxxxxx folder. The corresponding API id will then return 404.
Bulk delete old date folders: see Retention and cleanup.
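The archive step from the list above can also be scripted. The sketch below uses Python's tarfile module; the helper name archive_import and the choice to name the archive after the API id are assumptions, not server conventions.

```python
import tarfile
from pathlib import Path


def archive_import(import_dir: str, archive_dir: str) -> Path:
    """Pack one HHmm-xxxxxxxx folder into <yyyy-MM-dd>_<HHmm-xxxxxxxx>.tar.gz."""
    source = Path(import_dir)
    # The parent folder is the date folder, so this reproduces the import id.
    archive = Path(archive_dir) / f"{source.parent.name}_{source.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)
    return archive
```

The function leaves the original folder in place; removing it afterwards is a separate, deliberate step.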

Do not rename folders or hand-edit import.json while the server is running - the per-import lock is in-process only, so it does not protect against changes made from outside the server, and an external rename or edit can be observed mid-operation. Stop the server first.