Scanning documents and managing them has never been easier - docker container installation

Paperless NGX, Synology and Office Docx

Containers May 18, 2022

Paperless NG has been around for a while, and is an extremely powerful tool to manage your scans. Not only can it read and store most file types (some by converting them to PDF), it can also learn the types of documents you regularly use, apply metadata automatically, and make everything you have searchable.

This actually took me a bit of research and time to get right. On its own, the basic Paperless image handles only a handful of file types, and you cannot upload plaintext, .docx or .xlsx documents to it. With the setup below, you can.

💡
These steps work on a Synology NAS through docker. There is a lot of documentation and discussion of issues on GitHub, so if you are using docker on another platform, the below may not necessarily work for you. I explain the small changes I made to make this suitable for Synology in the section "The difference for Synology" at the bottom of the article.

Let's get started.


Prerequisites

You should have:

  1. Docker and docker-compose installed on your machine
  2. The ability to SSH / use CLI/terminal on your machine, or use Portainer to spin up your stacks
  3. Some sort of access and relevant permissions to manage and create new directories
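
Before going further, it can be worth confirming that the tools from the list above are actually on your PATH. The following is a minimal sketch; `check_prereqs` is a hypothetical helper name, not something that ships with docker.

```shell
# Report whether each required command is on the PATH.
# Prints one line per command and returns non-zero if any are missing.
check_prereqs() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

Run it as `check_prereqs docker docker-compose` over SSH. If either shows as missing, sort that out before continuing.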

If you want to access Paperless remotely over a reverse proxy:

  • You should have your own domain name with a fully functioning reverse proxy (check out my SWAG article, and I recommend Cloudflare as an incredibly feature-packed, not to mention free CDN)
  • A proxy docker network (either called proxy or something else, this is the network you use for your reverse proxy/SWAG container to communicate with the services it acts as RP for)
  • You know how to create the relevant records and reverse proxy configurations
  • Note that for nginx configurations there are some requirements, you can read about them here

Creating the containers

This service needs 5 separate containers:

  1. Redis (the message broker Paperless uses to queue its background tasks)
  2. Postgres database (we're going to call the service 'db')
  3. The webserver (paperless app itself)
  4. Gotenberg (required for converting documents to pdf)
  5. Tika (required for reading openoffice/word documents)

Prepping your file system

  • SSH into your machine, and navigate to your docker or compose directory
  • Copy/paste the following command:
mkdir paperless && cd paperless && mkdir app db redis && touch .env docker-compose.yml && cd app && mkdir consume data export media
This creates our folders, compose and .env files in a new paperless folder, and then the last 4 folders in the 'app' folder

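Your folder structure should look like this:

paperless/
├── .env
├── docker-compose.yml
├── app/
│   ├── consume/
│   ├── data/
│   ├── export/
│   └── media/
├── db/
└── redis/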

  • Type cd .. followed by Enter to go up one folder level back to your paperless directory
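
If you would rather not check the folders by eye, a small function can do it. This is only a sketch that assumes the layout created by the mkdir command above; `check_layout` is a made-up name.

```shell
# Check that the expected folders and files exist under the given directory.
# Assumes the layout created by the mkdir/touch command in this article.
check_layout() {
    root=${1:-.}
    status=0
    for d in app/consume app/data app/export app/media db redis; do
        [ -d "$root/$d" ] || { echo "missing folder: $d"; status=1; }
    done
    for f in .env docker-compose.yml; do
        [ -f "$root/$f" ] || { echo "missing file: $f"; status=1; }
    done
    return "$status"
}
```

From inside the paperless directory, `check_layout .` prints nothing when everything is in place, and lists whatever is missing otherwise.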

Populating our docker files

We're now going to edit our docker-compose and .env files. You can do this using nano or vim from your CLI/terminal if you like, or through SMB or whatever method you like. Either way, open your docker-compose.yml file and copy paste the following:

services:
  redis:
    image: redis:6.2
    container_name: paperless-redis
    restart: unless-stopped
    volumes:
      - ./redis:/data

  db:
    image: postgres:14
    container_name: paperless-db
    restart: unless-stopped
    volumes:
      - ./db:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: $PGDB
      POSTGRES_USER: $PGUSER
      POSTGRES_PASSWORD: $PGPW

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    container_name: paperlessngx
    restart: unless-stopped
    depends_on:
      - db
      - redis
      - gotenberg
      - tika
    ports:
      - 8777:8000
    volumes:
      - ./app/data:/usr/src/paperless/data
      - ./app/media:/usr/src/paperless/media
      - ./app/export:/usr/src/paperless/export
      - ./app/consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://redis:6379
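      PAPERLESS_DBHOST: db
      PAPERLESS_DBNAME: $PGDB #must match POSTGRES_DB in the db service
      PAPERLESS_DBUSER: $PGUSER #must match POSTGRES_USER in the db service
      PAPERLESS_DBPASS: $PGPW #must match POSTGRES_PASSWORD in the db service
      USERMAP_UID: $PUID
      USERMAP_GID: $PGID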
      PAPERLESS_TIME_ZONE: $TZ
      PAPERLESS_ADMIN_USER: $ADMINUSER
      PAPERLESS_ADMIN_PASSWORD: $ADMINPW
      PAPERLESS_OCR_LANGUAGE: deu+eng #change as necessary, read the documentation for more info
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000 #don't change
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998 #don't change
    networks:
      - default
#uncomment the following line and change the network name as necessary to add the webserver to your reverse proxy network
      #- proxy

  gotenberg:
    image: gotenberg/gotenberg:7.4
    restart: unless-stopped
    container_name: gotenberg
    ports:
      - 3000:3000 # change the port mapping if you need
    command:
      - "gotenberg"
      - "--chromium-disable-routes=true"
  
  tika:
    image: ghcr.io/paperless-ngx/tika:latest
    container_name: tika
    ports:
      - 9998:9998 # change the port mapping if you need
    restart: unless-stopped    

networks:
  default:
    name: paperless
    ipam:
      config:
        - subnet: 172.xx.0.0/29 #change the subnet as necessary
#uncomment the following lines and change the network name as necessary to add your reverse proxy network
  #proxy: 
    #external: true
This compose will create the 5 containers necessary for a functional paperless experience, as well as a dedicated bridge network
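
A note on the subnet size in the compose file: a /29 is a tight fit. The arithmetic below is a rough sketch, assuming Docker sets aside the network address, the broadcast address and one address for the bridge gateway; `container_slots` is a hypothetical helper.

```shell
# Addresses left for containers in an IPv4 subnet of the given prefix length:
# total addresses, minus network and broadcast, minus one for the gateway.
# Assumption: the bridge network reserves exactly those three addresses.
container_slots() {
    prefix=$1
    echo $(( (1 << (32 - prefix)) - 3 ))
}
```

Under that assumption, `container_slots 29` gives 5, which is exactly the number of containers in this stack. If you plan to add anything else to the same network, pick a larger subnet such as a /28.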
  • Make the changes flagged by the comments in the compose file (OCR language, subnet, and the proxy network if you use one)
  • Save the file
  • We also need to populate the .env too. Open that file, then copy paste the following:
#environment
PUID=
PGID=
TZ=[Continent/City] #change as necessary, removing the `[]` brackets
PGDB=
PGUSER=
PGPW=
ADMINUSER=
ADMINPW=
Do not leave any spaces after the = symbol
  • Complete each line as required
  • Save the file
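
An empty value, a stray space after the = sign, or a placeholder left in brackets is easy to miss in the .env file. Here is a minimal sketch of a check for those three cases; `check_env` is a hypothetical name and the rules are only the ones mentioned in this article.

```shell
# Flag .env lines with an empty value, a space directly after "=",
# or a value still wrapped in [ ] placeholder brackets.
# Blank lines and comment lines are skipped.
check_env() {
    file=${1:-.env}
    status=0
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;
        esac
        key=${line%%=*}
        value=${line#*=}
        case $value in
            '')   echo "empty value: $key"; status=1 ;;
            ' '*) echo "space after =: $key"; status=1 ;;
            '['*) echo "placeholder not replaced: $key"; status=1 ;;
        esac
    done < "$file"
    return "$status"
}
```

Run `check_env .env` from the paperless directory. For PUID and PGID, `id -u` and `id -g` print the numeric user and group IDs of the account you are logged in with.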

You should now be in a good place to spin up your containers by typing docker-compose -p "paperless" up -d into your CLI/terminal while in the paperless directory.

The -p flag lets us name our docker stack; the name goes inside the "" quotation marks
  • Check the logs of the containers as you spin them up to make sure there are no errors
  • Once done, you should now be able to access your paperless instance at http://yourMachineIP:8777
The webserver will take the longest to spin up. It is the last container to be created, and it also needs some time to migrate the database. Don't be surprised if you can't access it immediately; you can follow along in the logs while it does what it needs to do.
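
To follow the webserver while it starts, `docker-compose -p paperless logs -f webserver` streams its logs. If you would rather be told when it is ready, a small retry loop works too. The sketch below is generic; `wait_for` is a hypothetical helper, and the attempt count and delay are arbitrary.

```shell
# Retry a command until it succeeds or the attempts run out.
# usage: wait_for <attempts> <delay-seconds> <command> [args...]
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "ready after $i attempt(s)"
            return 0
        fi
        # Only pause if another attempt is coming.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    echo "gave up after $attempts attempt(s)"
    return 1
}
```

For example, `wait_for 30 10 curl -fsS http://yourMachineIP:8777` tries every 10 seconds for up to about 5 minutes, assuming curl is available on the machine you run it from.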

When you're in you should see this page:

Login page

Log in using the username and password you set for ADMINUSER and ADMINPW, and it will take you to the dashboard which will be initially empty, save for some 'first use' tips:

Paperless Dashboard

From here you can check out the Admin and Settings pages, and begin uploading documents by either dragging and dropping into the box on the right, or clicking the Browse files button.
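
Besides the browser upload, Paperless also picks up files dropped into its consume folder, which in this setup is ./app/consume on the host. A minimal sketch for queueing files from the command line follows; `queue_docs` is a hypothetical helper, and it assumes the copied files are readable by the user you set as PUID.

```shell
# Copy files into the consume folder so Paperless can pick them up.
# usage: queue_docs <consume-dir> <file> [file...]
queue_docs() {
    dest=$1
    shift
    for f in "$@"; do
        cp -- "$f" "$dest"/ && echo "queued: ${f##*/}"
    done
}
```

Something like `queue_docs ./app/consume ~/scans/letter.docx` (the source path is just an example) copies the file in. It should then show up in Paperless once it has been processed.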

There's a lot of functionality you can add to the documents you upload, which is what makes Paperless NGX so good. Read through the official documentation to get a better idea of what suits you.


The difference for Synology

For some reason, internal docker networking doesn't seem to work on Synology for communication between the webserver and both gotenberg and tika, mainly because the front end depends on reaching them over the http:// protocol. Normally I'd still expect the http://[service]:[port] method to work, and apparently it does on other platforms, just not on Synology.

We got around this by publishing the ports for both gotenberg and tika. If you need to change the port mapping, that's fine, but don't change the http address for them in the webserver environment block.
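
To check that the published ports are actually answering, you can probe them from the NAS. This is a sketch only: `probe` is a hypothetical helper, and the /health route on gotenberg and the /version route on tika are assumptions about those images that you should verify against their documentation.

```shell
# Print whether an HTTP endpoint answers with a success status.
# usage: probe <label> <url>
probe() {
    if curl -fsS -o /dev/null --max-time 5 "$2"; then
        echo "$1: up"
    else
        echo "$1: down"
    fi
}
```

With the default port mappings, that would be `probe gotenberg http://yourMachineIP:3000/health` and `probe tika http://yourMachineIP:9998/version`. Adjust the ports if you changed the mapping.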



PTS

PTS fell down the selfhosted rabbit hole after buying his first NAS in October 2020, only intending to use it as a Plex server. Find him on the Synology discord channel https://discord.gg/vgSq5pcT
