Don't know about OLKi? Have a look here.

Here is a tutorial on how to set up your own OLKi instance.


How to set up an OLKi instance

First, create the environment file for the instance (it is referenced as .env.docker in the docker-compose file below):

DATABASE_URL=postgresql://olki:olki@postgres/olki
CACHE_URL=redis://redis:6379
OLKI_WEBSERVER_HOST=olki.cerisara.fr
OLKI_WEBSERVER_PORT=443
OLKI_WEBSERVER_HTTPS=true
OLKI_TRUST_PROXY=["127.0.0.1", "loopback", "172.18.0.0/16"]
OLKI_ADMIN_EMAIL=your@email.com
INSTANCE_NAME="yet another science data repository"
INSTANCE_HOSTNAME="olki.cerisara.fr"
INSTANCE_REGISTRATIONS_OPEN=True
EMAIL_SENDER="olki@cerisara.fr"
EMAIL_URL="smtp+tls://user:password@smtp.example.org:587"
AUTH_LDAP_ENABLE=True
AUTH_LDAP_NO_NEW_USERS=False
#AUTH_LDAP_SERVER_URI="ldap://localhost:389"
#AUTH_LDAP_USER_DN_TEMPLATE="cn=%(user)s,ou=people,dc=planetexpress,dc=com"
#AUTH_LDAP_REQUIRE_GROUP="group"
LOGLEVEL="debug"
SENTRY_ENABLE=False
#SENTRY_DSN=""
If the password contains characters that are reserved in URLs, it must be percent-encoded before being embedded in a DSN such as EMAIL_URL. For example, with node:

   alias urlencode='node -e "console.log(encodeURIComponent(process.argv[1]))"'
   urlencode 'funnypassword'
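If node is not installed, the same encoding can be done with Python's standard library. A sketch (the password 'p@ss:word/1' is just an illustrative example, not a value OLKi expects):

```shell
# Percent-encode every reserved character with Python's urllib,
# so the password fits safely in a DSN.
python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' 'p@ss:word/1'
# prints: p%40ss%3Aword%2F1
```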
Then describe the stack in a docker-compose.yml file:

version: "3.3"

services:

  olki:
    # If you don't want to use the official image, build one from the sources instead:
    # build: ../../
      # network: host
      # context: .
      # dockerfile: ./Dockerfile
    image: ${OLKI_IMAGE-rigelk/olki}
    env_file:
      - .env.docker
    # For local access without passing through the reverse proxy (insecure access)
    # Do not enable when deploying with docker stack
    # ports:
    #   - "${OLKI_EXTERNAL_PORT-127.0.0.1:5000}:5000"
    volumes:
      - media:/app/olki_back/olki/media
      - type: bind
        source: ./.env
        target: /app/.env
    depends_on:
      - postgres
      - redis
      - postfix
    restart: ${RESTART_POLICY-unless-stopped}
    networks:
      - default
      - inner

  postgres:
    image: sameersbn/postgresql:10-2
    environment:
      DB_USER: olki
      DB_PASS: olki
      DB_NAME: olki
      DB_EXTENSION: 'unaccent,pg_trgm'
    volumes:
      - postgres:/var/lib/postgresql/data
    restart: ${RESTART_POLICY-unless-stopped}
    networks:
      - inner

  redis:
    image: redis:5-alpine
    volumes:
      - redis:/data
    restart: ${RESTART_POLICY-unless-stopped}
    networks:
      - inner

  postfix:
    image: mwader/postfix-relay
    environment:
      - POSTFIX_myhostname=${OLKI_WEBSERVER_HOST}
    labels:
      traefik.enable: "false"
    restart: ${RESTART_POLICY-unless-stopped}
    networks:
      - inner

networks:
  default:
  inner:

volumes:
  media:
  postgres:
  redis:
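With the environment file and this docker-compose.yml in place, the stack can be checked and started with the standard docker-compose commands (on newer installations, substitute `docker compose`):

```shell
# Validate the compose file, then start the whole stack in the background.
docker-compose config -q
docker-compose up -d

# Follow the application logs while the instance boots.
docker-compose logs -f olki
```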
Finally, put the instance behind a TLS-terminating reverse proxy. An example Apache configuration:

<VirtualHost *:80>
  ServerName olki.cerisara.fr
  Redirect permanent / https://olki.cerisara.fr/
</VirtualHost>
<VirtualHost *:443>
  ServerName olki.cerisara.fr
  # SSLEngine On and the SSLCertificateFile/SSLCertificateKeyFile
  # directives for your certificate go here

  RewriteEngine On
  RewriteCond %{HTTP:Upgrade} =websocket [NC]
  RewriteRule /(.*)           ws://localhost:5000/$1 [P,L]

  ProxyPass / http://localhost:5000/
  ProxyPassReverse / http://localhost:5000/
  ProxyRequests Off
</VirtualHost>
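This Apache configuration relies on a few modules that are not always enabled by default. On a Debian/Ubuntu host (an assumption; other distributions configure Apache differently) they can be activated with:

```shell
# Enable the proxy, WebSocket tunnelling and rewrite modules used above,
# plus SSL for the :443 virtual host, then reload Apache.
sudo a2enmod proxy proxy_http proxy_wstunnel rewrite ssl
sudo systemctl reload apache2
```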

Useful commands


How to dev the backend

This procedure has been tested on Ubuntu 20.04. Python dependencies are managed with poetry.

sudo apt install python3-dev postgresql libpq-dev
virtualenv -p python3 env
source env/bin/activate
pip install setuptools==44
pip install poetry
sudo su - postgres
createuser --interactive --pwprompt   # create a role named "olki"
createdb -O olki olki
exit                                  # back to your own user
make serve-dev-be
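If you prefer a non-interactive setup over `createuser --interactive`, the same role and database can be created with psql. A sketch, assuming the olki/olki credentials from the env file above:

```shell
# Create the database role and the database without interactive prompts.
sudo -u postgres psql -c "CREATE USER olki WITH PASSWORD 'olki';"
sudo -u postgres createdb -O olki olki
```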

Towards MLaaS

For now, the OLKi platform can store corpora in a federated way, period. These features (storage and federation) can serve as a basis for various applications. The first one I'd like to implement is to build, on top of an OLKi network, a platform for Machine Learning as a Service, i.e., a platform that provides scripts, datasets, models and the associated computing power to partner scientists.

I think this is very important to push deep learning research forward in Europe. Indeed, although many state-of-the-art deep learning models and codebases are distributed freely, this is not enough, because all training and inference scripts have to be adapted to each particular computing center, and this adaptation is far from easy! So a new researcher who would like to start experimenting with deep learning faces two main difficulties:

Both of these difficulties constitute the main bottleneck holding back deep learning research in academia and small companies, and they must be addressed by sharing knowledge in a federated way.

There is another way to share generic deep learning expertise globally and specific expertise within our community: OLKi.

I have started to develop OLKi-ML clients, in both CLI and GUI versions, to experiment with this MLaaS vision. If you're interested, please drop me a line at @cerisara@mastodon.etalab.gouv.fr.