Backup & Recovery#

The following guide describes the process of performing backup and restore operations.

Note

Please contact your Intel® Geti™ account representative or technical support personnel if you have any questions.

On-Prem Server Backup & Recovery#

Below, you will find the process of performing backup/restore operations for Intel Geti On-premises servers.

We propose two strategies for backing up an On-Prem Server before a major upgrade to a new OS version or a new Intel Geti version.

  • Strategy 1: Backup the Entire Server Disk

  • Strategy 2: Backup Intel Geti Data Folder

Strategy 1: Backup the Entire Server Disk#

The customer can use any preferred Linux tools for performing backup/restore operations.

Examples of Linux backup/restore tools:

  • Terminal tools: dd, rsync

  • UI-based tools: GNOME Disks or Timeshift

Note

Backup snapshots should be saved on an external disk.

Note

Avoid using the server during backup/restore operations. We recommend booting the machine into an Ubuntu live session to perform them.

Warning

We encourage customers to implement a disaster recovery plan for Intel® Geti™ servers. A simple approach is to take regular snapshots of the whole disk, daily or weekly depending on your preferred policy. Also consider taking a snapshot before major changes such as an OS update or an Intel® Geti™ platform update.

Strategy 2: Backup Intel® Geti™ Data Folder#

Before starting the backup, connect an external SSD with enough free space to your server.
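As a rough way to check the free space, the sketch below compares the size of the two folders that Strategy 2 copies with the space available on the drive. It assumes GNU coreutils (as on Ubuntu) and a locally mounted SSD; the function name `has_enough_space` is hypothetical, and reading the data folder may require root privileges.

```shell
# Succeeds if the filesystem holding DEST has at least as many free bytes
# as the binary_data and mongodb folders under SRC occupy.
# $1 = data folder chosen during installation, $2 = mount point of the SSD.
has_enough_space() {
    src="${1%/}"
    dest="$2"
    # du -sbc prints one line per folder plus a final "total" line (bytes).
    needed="$(du -sbc "$src/binary_data" "$src/mongodb" | tail -n 1 | awk '{ print $1 }')"
    # df prints a header line followed by the available bytes.
    available="$(df --output=avail -B1 "$dest" | awk 'NR == 2 { print $1 }')"
    [ "$available" -ge "$needed" ]
}
```

The function only reports success or failure through its exit status, so it can be used in a condition such as `has_enough_space SRC DEST && echo "enough space"`, with the real paths substituted.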

Ensure that there is no training or inference running on the platform.

Then, on the server, open a terminal and execute the following steps.

Step 1: Run the command below to extract database usernames and passwords:#

sudo kubectl -n impt get secret impt-mongodb --output json

The result should look like the following:

{
   "apiVersion": "v1",
   "data": {
       "dataset-ie-mongodb-password": "UGRFbFltd0JBNER0",
       "dataset-ie-mongodb-username": "VFpPUXZW",
       "director-mongodb-password": "dG1WdGttNVBTS002",
       "director-mongodb-username": "WXRMWUtE",
       "gateway-mongodb-password": "RGM5R0pnVjducHpR",
       "gateway-mongodb-username": "RWlNWEVS",
       "inference-job-mongodb-password": "cXVLNjRzdGdZZzdk",
       "inference-job-mongodb-username": "WVN0UldC",
       "inference-operator-mongodb-password": "NHY4R3V6RnpQTEpz",
       "inference-operator-mongodb-username": "eEV5amxh",
       "inference-server-mongodb-password": "NHVCckx6VUZja08w",
       "inference-server-mongodb-username": "bERSaVdD",
       "jobs-mongodb-password": "ZERqdVJyb01qNUpB",
       "jobs-mongodb-username": "S3VITm9J",
       "jobs-ms-mongodb-password": "SXNxbUk1cUdrZTFC",
       "jobs-ms-mongodb-username": "Umd2cG1M",
       "jobs-scheduler-mongodb-password": "SkRlZXh0NzVXTVVz",
       "jobs-scheduler-mongodb-username": "YWpPT3J3",
       "mongodb-password": "OVBDQktWODJIYWZ4",
       "mongodb-username": "aFd3Qm15",
       "project-ie-mongodb-password": "QlRpcU95ZXEwTkFX",
       "project-ie-mongodb-username": "bU1RelpM",
       "resource-mongodb-password": "OXUzUk1PNE9WQ2Y4",
       "resource-mongodb-username": "YktJQVpE",
       "spice-db-mongodb-password": "eEhlMkNvd1dvMHds",
       "spice-db-mongodb-username": "YWJkU2Fl",
       "training-operator-mongodb-password": "OVFRTEhhSWs2cGky",
       "training-operator-mongodb-username": "Y1lyYlhB",
       "workload-configuration-mongodb-password": "M01ScHhvR0t0ZElL",
       "workload-configuration-mongodb-username": "Y0tTa3VR"
   }
}

Store this information for later use, for example on an external drive.
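One possible way to keep the output is to redirect it to a file, as sketched below. The helper names `save_mongodb_secret` and `decode_secret_value` are hypothetical. Kubernetes stores the values in the data section base64-encoded, so the second helper is only needed to read a value in plain text.

```shell
# Hypothetical helper: dump the secret to a file readable only by its owner.
# $1 = output path, for example a file on the mounted external drive.
save_mongodb_secret() {
    ( umask 077 && sudo kubectl -n impt get secret impt-mongodb --output json > "$1" )
}

# Decode a single base64-encoded value copied from the "data" section.
decode_secret_value() {
    printf '%s' "$1" | base64 --decode
}
```

For instance, `decode_secret_value VFpPUXZW` prints `TZOQvV` for the sample username shown above. Keep the saved file in its encoded form, since the recovery procedure expects the encoded values.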

Step 2: Run the following command on the on-prem machine to stop components producing data:#

sudo kubectl scale deployment/impt-project-ie deployment/impt-director deployment/impt-resource deployment/impt-dataset-ie deployment/impt-training-operator deployment/impt-inference-operator deployment/impt-mongodb deployment/impt-media-ms -n impt --replicas=0

Wait until the following deployments are stopped: impt-project-ie, impt-director, impt-resource, impt-dataset-ie, impt-training-operator, impt-inference-operator, impt-mongodb, impt-media-ms.

You can achieve this by executing the following command and verifying that all listed deployments display a 0/0 value in the READY column.

sudo kubectl get deployments -n impt
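Instead of re-running the command by hand, the check can be scripted. The following is a minimal sketch, not part of the product: `check_ready` and `wait_for_ready` are hypothetical helpers that parse the NAME and READY columns of the listing and poll every 5 seconds.

```shell
# Deployments scaled in this procedure.
DEPLOYMENTS="impt-project-ie impt-director impt-resource impt-dataset-ie impt-training-operator impt-inference-operator impt-mongodb impt-media-ms"

# Reads `kubectl get deployments --no-headers` output on stdin and succeeds
# only if every deployment in $DEPLOYMENTS shows the expected READY value.
# $1 = expected READY value, for example 0/0.
check_ready() {
    expected="$1"
    listing="$(cat)"
    for name in $DEPLOYMENTS; do
        # Column 1 is NAME, column 2 is READY.
        ready="$(printf '%s\n' "$listing" | awk -v n="$name" '$1 == n { print $2 }')"
        [ "$ready" = "$expected" ] || return 1
    done
}

# Polls the cluster until all listed deployments show the expected value.
wait_for_ready() {
    until sudo kubectl get deployments -n impt --no-headers | check_ready "$1"; do
        sleep 5
    done
}
```

Calling `wait_for_ready 0/0` blocks until all eight deployments are stopped. The same helper can be called with `1/1` after the components are started again.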

Step 3: Run the following command to copy data to the external SSD drive (this will take a while!):#

Note

We use rsync because it preserves the permissions of every file and directory in the copied tree.

sudo rsync -avb --include='/binary_data/***' --include='/mongodb/***' \
   --exclude='*' /<data_folder>/ destination_username@ip-address:/<SSD_drive>

<data_folder> is the folder where the platform's data is stored. It was chosen during installation.
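Because the copy can run for a long time, it may be worth previewing what the filters select before starting it. The sketch below wraps the same rsync options in a hypothetical function, `copy_geti_data`, that accepts extra rsync options such as `--dry-run`. The `RSYNC_CMD` variable is also an assumption of this sketch and defaults to `sudo rsync`.

```shell
# Hypothetical wrapper around the rsync invocation from this step.
# $1 = source data folder, $2 = destination, remaining arguments = extra rsync options.
copy_geti_data() {
    src="$1"
    dest="$2"
    shift 2
    # Copy only binary_data and mongodb from the top level of the source.
    # The trailing slash on the source makes the filters apply to its contents.
    ${RSYNC_CMD:-sudo rsync} -avb "$@" \
        --include='/binary_data/***' --include='/mongodb/***' \
        --exclude='*' "${src%/}/" "$dest"
}
```

Passing `--dry-run` as a third argument lists the files that would be transferred without copying anything. Omitting it performs the actual copy.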

Step 4: Once copying is done, start the components stopped in Step 2:#

sudo kubectl scale deployment/impt-project-ie deployment/impt-director deployment/impt-resource deployment/impt-dataset-ie deployment/impt-training-operator deployment/impt-inference-operator deployment/impt-mongodb deployment/impt-media-ms -n impt --replicas=1

After verifying that all of these components are running (each should now show 1/1 in the READY column), you can start using the platform again.

Recovery from Backed Up Data#

To recover from a backup, open a terminal and execute the following steps:

Step 1: Stopping Components#

Run the following command on the on-prem machine to stop components producing data:

sudo kubectl scale deployment/impt-project-ie deployment/impt-director deployment/impt-resource deployment/impt-dataset-ie deployment/impt-training-operator deployment/impt-inference-operator deployment/impt-mongodb deployment/impt-media-ms -n impt --replicas=0

Wait until the following deployments are stopped:

  • impt-project-ie

  • impt-director

  • impt-resource

  • impt-dataset-ie

  • impt-training-operator

  • impt-inference-operator

  • impt-mongodb

  • impt-media-ms

To verify, execute the command:

sudo kubectl get deployments -n impt

Make sure all listed deployments display a 0/0 value in the READY column.

Step 2: Verifying Data Folders#

Ensure that the folders /<data_folder>/binary_data/ and /<data_folder>/mongodb/ are empty. If they are not, remove their contents or move them elsewhere.
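A minimal sketch of this check is shown below. Both function names are hypothetical, and the `.old-<timestamp>` suffix is only an illustrative choice. The second function relies on GNU `mv -t` and moves the existing contents aside rather than deleting them. Run it with sufficient privileges for the data folder.

```shell
# Succeeds if the directory exists and contains no entries, hidden ones included.
is_empty_dir() {
    [ -d "$1" ] && [ -z "$(find "$1" -mindepth 1 -maxdepth 1 | head -n 1)" ]
}

# Moves the current contents of a directory into a timestamped sibling folder.
set_aside_contents() {
    dir="${1%/}"
    aside="$dir.old-$(date +%Y%m%d%H%M%S)"
    mkdir -p "$aside"
    # -exec ... {} + runs mv only if there is something to move.
    find "$dir" -mindepth 1 -maxdepth 1 -exec mv -t "$aside" {} +
}
```

A typical use is to call `is_empty_dir` on each of the two folders and, where it fails, call `set_aside_contents` on that folder before restoring.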

Step 3: Restoring Data#

Run the command to restore the data from the external SSD drive:

Note

We use rsync because it preserves the permissions of every file and directory in the copied tree.

Warning

This process can take some time.

sudo rsync -avb --include='/binary_data/***' --include='/mongodb/***' \
--exclude='*' destination_username@ip-address:/<SSD_drive>/datasets_current/ \
/<data_folder>/

Step 4: Restarting Components#

Once copying is done, start the components stopped in Step 1:

sudo kubectl scale deployment/impt-project-ie deployment/impt-director deployment/impt-resource deployment/impt-dataset-ie deployment/impt-training-operator deployment/impt-inference-operator deployment/impt-mongodb deployment/impt-media-ms -n impt --replicas=1

Step 5: Editing MongoDB Configuration#

Run the command to edit the MongoDB configuration:

sudo kubectl -n impt edit secret impt-mongodb --output json

This opens the default text editor with the configuration in JSON format. Replace the MongoDB usernames and passwords with the ones extracted earlier (see Step 1 of Strategy 2). The values in the data section are base64-encoded, so paste them exactly as they were saved.

{
   "apiVersion": "v1",
   "data": {
      "dataset-ie-mongodb-password": "UGRFbFltd0JBNER0",
      "dataset-ie-mongodb-username": "VFpPUXZW",
      "director-mongodb-password": "dG1WdGttNVBTS002",
      "director-mongodb-username": "WXRMWUtE",
      "gateway-mongodb-password": "RGM5R0pnVjducHpR",
      "gateway-mongodb-username": "RWlNWEVS",
      "inference-job-mongodb-password": "cXVLNjRzdGdZZzdk",
      "inference-job-mongodb-username": "WVN0UldC",
      "inference-operator-mongodb-password": "NHY4R3V6RnpQTEpz",
      "inference-operator-mongodb-username": "eEV5amxh",
      "inference-server-mongodb-password": "NHVCckx6VUZja08w",
      "inference-server-mongodb-username": "bERSaVdD",
      "jobs-mongodb-password": "ZERqdVJyb01qNUpB",
      "jobs-mongodb-username": "S3VITm9J",
      "jobs-ms-mongodb-password": "SXNxbUk1cUdrZTFC",
      "jobs-ms-mongodb-username": "Umd2cG1M",
      "jobs-scheduler-mongodb-password": "SkRlZXh0NzVXTVVz",
      "jobs-scheduler-mongodb-username": "YWpPT3J3",
      "mongodb-password": "OVBDQktWODJIYWZ4",
      "mongodb-username": "aFd3Qm15",
      "project-ie-mongodb-password": "QlRpcU95ZXEwTkFX",
      "project-ie-mongodb-username": "bU1RelpM",
      "resource-mongodb-password": "OXUzUk1PNE9WQ2Y4",
      "resource-mongodb-username": "YktJQVpE",
      "spice-db-mongodb-password": "eEhlMkNvd1dvMHds",
      "spice-db-mongodb-username": "YWJkU2Fl",
      "training-operator-mongodb-password": "OVFRTEhhSWs2cGky",
      "training-operator-mongodb-username": "Y1lyYlhB",
      "workload-configuration-mongodb-password": "M01ScHhvR0t0ZElL",
      "workload-configuration-mongodb-username": "Y0tTa3VR"
   }
}

Step 6: Verifying Installation#

Open the Intel Geti UI and ensure that everything is functioning correctly.