Train a custom model with exported data from the Intel® Geti™ platform#

You can export an annotated dataset from the Intel® Geti™ platform and use it to train your own custom algorithm. The following example shows how to do that for a single-class object detection model. We use PyTorch Lightning to train a Faster R-CNN model.

Preparation#

Export the dataset in COCO format and unzip it to a directory named data.
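The snippets in this tutorial build on each other and assume the imports below. This import block is not part of the original example, so treat it as a minimal sketch and adjust it to your environment.

from pathlib import Path
from typing import Union

import pytorch_lightning as pl
import torch
import torchvision
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import CocoDetection
from torchvision.models.detection import FasterRCNN_ResNet50_FPN_Weights
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor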

Create PyTorch Dataset#

COCO annotations are stored in a JSON file. You can find it in data/annotations/instances_default.json.

There are five root items in the JSON file:

  1. licenses

  2. info

  3. categories

  4. images

  5. annotations

In this tutorial, we will focus on images and annotations. Each entry in images contains, among other things, the file name of an image and its id. Every annotation contains the following items: id, image_id (a reference to the id of the image it belongs to), category_id, segmentation, area, bbox, iscrowd. See the COCO format documentation for more details.
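To get a feel for the structure, you can inspect the file directly. A quick sketch; the exact contents depend on your dataset:

import json

with open("data/annotations/instances_default.json") as annotations_file:
    coco = json.load(annotations_file)

print(coco.keys())             # the five root items listed above
print(coco["images"][0])       # file name, id, width and height of the first image
print(coco["annotations"][0])  # id, image_id, category_id, bbox, ... of the first annotation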

Since we use single-class detection, the category_id is always 1 and there are no segmentation annotations. The annotation item we are interested in is bbox. The COCO format specifies bbox elements as [x, y, width, height], while the Faster R-CNN network expects bounding box annotations in [x1, y1, x2, y2] format, so you need to create a function to convert the boxes.

def convert_boxes(annotations: list) -> list:
    """
    Convert annotation boxes in COCO format with absolute coordinates (x, y, width, height)
    to boxes in [x1, y1, x2, y2] format.

    :param annotations: annotations for one image, in COCO format
    """
    boxes = []
    for annotation in annotations:
        xmin, ymin, width, height = annotation["bbox"]
        xmax = xmin + width
        ymax = ymin + height
        boxes.append([xmin, ymin, xmax, ymax])
    return boxes
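For example, a 30 pixel wide and 40 pixel high box at position (10, 20) is converted like this (a hypothetical annotation, shown only to illustrate the conversion):

annotations = [{"bbox": [10, 20, 30, 40]}]
print(convert_boxes(annotations))  # [[10, 20, 40, 60]]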

TorchVision contains a CocoDetection dataset for loading data in COCO format, so we do not have to write our own data loading code; we only override the __getitem__ method to return boxes in the format expected by the network:

class CustomCocoDetection(CocoDetection):
    """
    Custom CocoDetection dataset that returns boxes in [x1, y1, x2, y2] format
    """

    def __getitem__(self, index):
        image, annotations = super().__getitem__(index)
        boxes = convert_boxes(annotations)
        if len(boxes) > 0:
            labels = {
                # Faster R-CNN expects float boxes and int64 class labels
                "boxes": torch.as_tensor(boxes, dtype=torch.float32),
                "labels": torch.as_tensor([1 for _ in range(len(boxes))]),
            }
        else:
            # images without annotations get an empty target;
            # they are filtered out in the DataModule below
            labels = []

        return image, labels
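You can verify the dataset output on a single sample. A quick check, assuming the directory layout from the Preparation step and at least one annotation on the first image:

dataset = CustomCocoDetection(
    root="data/images/default",
    annFile="data/annotations/instances_default.json",
)
image, target = dataset[0]
print(target["boxes"].shape)  # torch.Size([N, 4]) for N boxes
print(target["labels"])       # tensor of ones, one label per box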

Create PyTorch Lightning DataModule and Model#

You need to create a PyTorch Lightning DataModule that provides DataLoaders for the CustomCocoDetection dataset.

class DataModule(pl.LightningDataModule):
    """
    DataModule for the CustomCocoDetection dataset

    :param data_dir: Path to directory which contains subdirectories "annotations" and "images"
    :param batch_size: Batch size
    """

    def __init__(self, data_dir: Union[str, Path], batch_size: int):
        super().__init__()
        self.data_dir = Path(data_dir)
        self.batch_size = batch_size

    def setup(self, stage=None):
        ds = CustomCocoDetection(
            root=self.data_dir / "images/default",
            annFile=self.data_dir / "annotations/instances_default.json",
        )

        # skip images without annotations
        ds = [item for item in ds if len(item[1]) > 0]

        # select 80% of the dataset as training data, using a fixed seed
        # so that the split is reproducible
        train_size = int(len(ds) * 0.8)
        val_size = len(ds) - train_size
        self.dataset_train, self.dataset_val = random_split(
            ds, [train_size, val_size], generator=torch.Generator().manual_seed(1414213)
        )

    def train_dataloader(self):
        return DataLoader(
            self.dataset_train,
            batch_size=self.batch_size,
            shuffle=True,
            # keep images and targets as tuples instead of stacking them,
            # since detection images can have different sizes
            collate_fn=lambda batch: tuple(zip(*batch)),
        )
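A quick way to check that batching works is to draw one batch from the train DataLoader. A sketch, again assuming the data directory from the Preparation step:

datamodule = DataModule(data_dir="data", batch_size=4)
datamodule.setup()
images, targets = next(iter(datamodule.train_dataloader()))
print(len(images))          # 4: one PIL image per batch element
print(targets[0]["boxes"])  # boxes for the first image in the batch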

The PyTorch LightningModule wraps the Faster R-CNN model. In this simple example, we only provide a training step. See the PyTorch Lightning documentation for more options.

class DetectionModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.weights = FasterRCNN_ResNet50_FPN_Weights.COCO_V1
        model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
            weights=self.weights
        )
        num_classes = 2  # 1 class (dog) + background
        # replace the pre-trained box predictor head with a new one
        # for our number of classes
        in_features = model.roi_heads.box_predictor.cls_score.in_features
        model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
        self._model = model

    def forward(self, x):
        images, targets = x
        # apply the preprocessing transforms that match the pre-trained weights
        preprocess = self.weights.transforms()
        images = [preprocess(image) for image in images]
        targets = [{k: v for k, v in t.items()} for t in targets]
        return self._model(images, targets)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self._model.parameters())
        return optimizer

    def training_step(self, batch, batch_idx):
        # in training mode, the torchvision detection model returns a
        # dictionary of losses; minimize their sum
        loss_dict = self.forward(batch)
        total_loss = sum(loss for loss in loss_dict.values())
        self.log("train_loss", total_loss, prog_bar=True)
        return total_loss
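Before starting a full training run, you can sanity-check the loss computation on a single batch. In training mode, the torchvision Faster R-CNN returns a dictionary of losses; this snippet reuses the datamodule from the previous example:

model = DetectionModel()
batch = next(iter(datamodule.train_dataloader()))
loss_dict = model(batch)
print(loss_dict)  # {'loss_classifier': ..., 'loss_box_reg': ..., 'loss_objectness': ..., 'loss_rpn_box_reg': ...}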

Train the model#

To train the model, create instances of DetectionModel and DataModule, together with a Trainer. The Trainer.fit() method trains the model with the provided dataset, and save_checkpoint() saves a checkpoint of the trained model for later use.

model = DetectionModel()
data = DataModule(data_dir="data", batch_size=4)

trainer = pl.Trainer(max_epochs=5)
trainer.fit(model, data)
trainer.save_checkpoint("model.pth")
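After training, the checkpoint can be restored for inference. A minimal sketch; the image file name is a placeholder, and in evaluation mode the underlying torchvision model returns predictions instead of losses:

from PIL import Image

model = DetectionModel.load_from_checkpoint("model.pth", map_location="cpu")
model.eval()

image = Image.open("data/images/default/example_image.jpg")  # placeholder file name
preprocess = model.weights.transforms()
with torch.no_grad():
    predictions = model._model([preprocess(image)])

print(predictions[0]["boxes"])   # predicted boxes in [x1, y1, x2, y2] format
print(predictions[0]["scores"])  # confidence score for each box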

References#

PyTorch finetuning tutorial