Object Detection Datasets: COCO

Dec 31, 2020

Technical Note In Construction

Part 1: Introduction

COCO provides annotations for multi-object labeling, segmentation masks, image captioning, keypoint detection and panoptic segmentation. For object detection it defines 81 categories (80 object categories plus background), which makes it a versatile, multi-purpose dataset.

A short summary of this data set is as follows:

  • 81 object categories (including background)
  • 120k images in trainval, 20k images in test-dev
  • COCO can be used for multiple tasks: object detection, keypoint detection, image captioning, etc.
  • Annotations are stored in a JSON file, while the original images are kept in directories as PNG/JPEG/TIF files.

Part 2: Files in COCO

1. General Introduction to JSON Annotation File

The JSON annotation file contains the following sections:

  • info: contains high-level information about the dataset.
  • licenses: contains a list of image licenses that apply to images in the dataset.
  • categories: contains a list of categories. Each category can belong to a supercategory.
  • images: contains the information for every image in the dataset, without bounding box or segmentation information. Image ids need to be unique.
  • annotations: contains a list of every individual object annotation from every image in the dataset.
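To make the overall layout concrete, here is a minimal sketch of an annotation file skeleton built in Python. The values inside info and licenses are made up for illustration; only the five top-level section names come from the list above.

```python
import json

# Hypothetical, nearly empty COCO-style annotation file.
# Only the five top-level keys are taken from the format description;
# the values inside "info" and "licenses" are placeholders.
coco_skeleton = {
    "info": {"description": "Toy dataset", "version": "1.0", "year": 2020},
    "licenses": [{"id": 1, "name": "Example license", "url": ""}],
    "categories": [],
    "images": [],
    "annotations": [],
}

# Serialize to JSON text and read it back, as one would with a real file.
json_text = json.dumps(coco_skeleton, indent=2)
top_level_keys = sorted(json.loads(json_text).keys())
print(top_level_keys)
```

The later sections fill in the categories, images and annotations lists one by one.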

2. Categories in JSON Annotation File

Each category id must be unique. A category can belong to a super-category. For example, in a dataset built to identify flowers and fruits, flower would be a super-category, and rose, lily and tulip would be the names of the flower categories we want to detect.

Template and example for Categories section of the JSON for COCO
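As a minimal sketch, the categories section for the flowers-and-fruits example could look like the following Python list (it serializes directly to JSON). The specific ids and the apple entry are assumptions made for illustration.

```python
# Hypothetical categories for a flowers-and-fruits dataset.
# Each entry has a unique "id", a "name", and a "supercategory".
categories = [
    {"id": 1, "name": "rose", "supercategory": "flower"},
    {"id": 2, "name": "lily", "supercategory": "flower"},
    {"id": 3, "name": "tulip", "supercategory": "flower"},
    {"id": 4, "name": "apple", "supercategory": "fruit"},
]

# Category ids must be unique, so check for duplicates.
category_ids = [c["id"] for c in categories]
ids_are_unique = len(category_ids) == len(set(category_ids))

# Group category names under their supercategory.
by_supercategory = {}
for c in categories:
    by_supercategory.setdefault(c["supercategory"], []).append(c["name"])

print(ids_are_unique, by_supercategory)
```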

(1) Images in JSON Annotation File

This section contains a list of all the images in the dataset. Each image id must be unique. The fields flickr_url, coco_url and date_captured are optional.

Template and example for images section of the json for COCO
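A minimal sketch of the images section, again as a Python list. The file names, sizes and dates are invented; the second entry omits the optional fields. The helper find_duplicate_ids is a hypothetical name, included only to show one way to check the uniqueness requirement.

```python
from collections import Counter

# Hypothetical image records. The second one leaves out the optional
# flickr_url, coco_url and date_captured fields.
images = [
    {
        "id": 1,
        "file_name": "000001.jpg",
        "width": 640,
        "height": 480,
        "license": 1,
        "flickr_url": "",
        "coco_url": "",
        "date_captured": "2020-12-31 00:00:00",
    },
    {"id": 2, "file_name": "000002.jpg", "width": 800, "height": 600, "license": 1},
]


def find_duplicate_ids(records):
    """Return the sorted list of ids that occur more than once."""
    counts = Counter(r["id"] for r in records)
    return sorted(i for i, n in counts.items() if n > 1)


print(find_duplicate_ids(images))
```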

(2) Annotations in JSON Annotation File

This section contains a list of every individual object annotation from every image in the dataset. It is the section that holds the bounding boxes and segmentation masks used for object detection.

If an image has 4 objects that we want to detect then we will have annotations for all 4 objects.

If the entire dataset consists of 150 images and has a total of 200 objects then we will have 200 annotations.

(a) Segmentation contains the x and y coordinates for the vertices of the polygon around every object instance for the segmentation masks.

(b) Area is the area of the object's segmentation mask, measured in pixels.

(c) iscrowd: for a single object, iscrowd is set to 0 and the segmentation is given as polygons. A single object (iscrowd=0) may require multiple polygons if it is occluded. For a collection of objects in the image, we set iscrowd=1, in which case the segmentation is stored with RLE (Run Length Encoding) and contains the attributes counts and size.

(d) image_id: the id of the image that contains the object being annotated. It corresponds to the id of that image in the images section.

(e) bbox: the bounding box in COCO is given as [x, y, width, height], where x and y are the coordinates of the top-left corner.

(f) category_id: the id of the object's category, as specified earlier in the categories section.

(g) id: the unique id of the annotation.
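Putting the fields together, a single-object annotation might look like the sketch below. All numeric values are made up. The helper bbox_to_corners is a hypothetical name; it shows how the top-left-plus-size box converts to corner coordinates, which many detection libraries expect. The crowd_segmentation dictionary illustrates the counts and size attributes used when iscrowd=1, assuming size is given as [height, width].

```python
# Hypothetical annotation for one object in image 1 (iscrowd=0).
# The polygon is a flat list [x1, y1, x2, y2, ...]; an occluded object
# could carry several such lists.
annotation = {
    "id": 1,
    "image_id": 1,
    "category_id": 1,
    "iscrowd": 0,
    "segmentation": [[10.0, 10.0, 60.0, 10.0, 60.0, 40.0, 10.0, 40.0]],
    "bbox": [10.0, 10.0, 50.0, 30.0],  # x, y of top-left corner, then width, height
    "area": 1500.0,
}

# Shape of an RLE segmentation for a crowd region (iscrowd=1).
# Assumption: "size" is [height, width], so the counts sum to height * width.
crowd_segmentation = {"counts": [1, 3, 2], "size": [2, 3]}


def bbox_to_corners(bbox):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


corners = bbox_to_corners(annotation["bbox"])
rle_pixels = sum(crowd_segmentation["counts"])
print(corners, rle_pixels)
```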

3. Run Length Encoding (RLE)

RLE is a compression method that works by replacing repeating values by the number of times they repeat.

For example, the sequence 0 11 0 111 00 (that is, 011011100) would become 1 2 1 3 2: one 0, two 1s, one 0, three 1s and two 0s.

The COCO data format provides a segmentation mask for every object instance, as described above for the segmentation field.

In COCO, only objects that are marked as crowd (iscrowd=1) are encoded with RLE.
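The idea can be sketched in a few lines of plain Python. This is an illustrative encoder for a flat sequence of 0s and 1s, not the implementation used by the COCO tools. It assumes the convention that the first count always refers to 0s, so a sequence that starts with 1 gets a leading count of 0.

```python
def rle_encode(bits):
    """Run-length encode a flat sequence of 0/1 values.

    The first count always refers to zeros, so the counts alternate
    zeros, ones, zeros, ... regardless of the first value.
    """
    counts = []
    current = 0  # value whose run is being counted
    run = 0
    for b in bits:
        if b == current:
            run += 1
        else:
            counts.append(run)
            current = b
            run = 1
    counts.append(run)
    return counts


def rle_decode(counts):
    """Invert rle_encode: expand alternating runs of zeros and ones."""
    bits = []
    value = 0
    for c in counts:
        bits.extend([value] * c)
        value = 1 - value
    return bits


example = [0, 1, 1, 0, 1, 1, 1, 0, 0]
encoded = rle_encode(example)
print(encoded)
```

For a two-dimensional mask, the pixels first have to be flattened into a single sequence before encoding; the flattening order is part of the format and is not shown here.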

Part 3: Coco Python

Coco Python is a Python package (installed as pycocotools) for loading and managing COCO datasets. The repository of this project is: https://github.com/cocodataset/cocoapi

COCO class member

The most important concept is the set of IDs used throughout the class: image ID, annotation ID and category ID.

  • self.dataset is a dictionary that holds the contents of the annotation .json file.
  • self.imgToAnns is a dictionary that maps each image ID to the list of all annotations for that image. Given an image ID, you can retrieve the list of that image's annotations.
  • self.anns is a dictionary that maps each annotation ID to a single annotation.
  • self.imgs is a dictionary that maps each image ID to the image information (file name, width, height and so on).
  • self.cats is a dictionary that maps each category ID to the category information. The category ID is an integer in the range 1 to 90 (inclusive); the IDs are not contiguous, since only 80 object categories are defined.
  • self.catToImgs is a dictionary that maps each category ID to the list of IDs of the images that contain that category.
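The relationship between these dictionaries can be illustrated with a simplified sketch that builds the same kind of lookups from a loaded annotation dictionary. The function build_index and the toy data are hypothetical; this is not the library's own code, only an approximation of the idea.

```python
from collections import defaultdict


def build_index(dataset):
    """Build ID-keyed lookups from a COCO-style dataset dictionary."""
    anns, imgs, cats = {}, {}, {}
    img_to_anns = defaultdict(list)
    cat_to_imgs = defaultdict(list)

    for ann in dataset.get("annotations", []):
        anns[ann["id"]] = ann                           # annotation ID -> annotation
        img_to_anns[ann["image_id"]].append(ann)        # image ID -> annotations
        cat_to_imgs[ann["category_id"]].append(ann["image_id"])  # category ID -> image IDs

    for img in dataset.get("images", []):
        imgs[img["id"]] = img                           # image ID -> image info

    for cat in dataset.get("categories", []):
        cats[cat["id"]] = cat                           # category ID -> category info

    return {
        "anns": anns,
        "imgs": imgs,
        "cats": cats,
        "imgToAnns": dict(img_to_anns),
        "catToImgs": dict(cat_to_imgs),
    }


# Toy dataset: two images, three annotations, two categories.
toy = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "categories": [{"id": 1, "name": "rose"}, {"id": 2, "name": "lily"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 2},
        {"id": 12, "image_id": 2, "category_id": 1},
    ],
}
index = build_index(toy)
print([a["id"] for a in index["imgToAnns"][1]], index["catToImgs"][1])
```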

COCO class functions

  • self.getAnnIds returns the list of annotation IDs for the given image IDs.
  • self.annToMask converts an annotation dictionary into a binary image mask.
  • self.showAnns visualizes the given annotations for an image.
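A short usage sketch follows, assuming pycocotools is installed and a valid annotation file is available. The function name load_masks_for_image and its parameters are hypothetical. It also uses loadAnns, which is not described above, to turn annotation IDs into annotation dictionaries.

```python
def load_masks_for_image(annotation_file, image_id, show=False):
    """Return the annotations and binary masks for one image.

    annotation_file: path to a COCO-format JSON file (supplied by the caller).
    image_id: ID of an image listed in that file.
    """
    # Imported inside the function so that defining it does not
    # require pycocotools to be installed.
    from pycocotools.coco import COCO

    coco = COCO(annotation_file)                  # loads the JSON and builds the index
    ann_ids = coco.getAnnIds(imgIds=[image_id])   # annotation IDs for this image
    anns = coco.loadAnns(ann_ids)                 # annotation dictionaries
    masks = [coco.annToMask(ann) for ann in anns] # one binary mask per annotation

    if show:
        # showAnns draws on the current matplotlib axes, so the image
        # itself would normally be displayed first (for example with imshow).
        coco.showAnns(anns)

    return anns, masks
```

Calling load_masks_for_image with the path to an annotation file and one of its image IDs would return the annotation list together with one mask array per object.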



Part 4: Create Your Own COCO Datasets

Part 5: Reference