Semantic Segmentation

Assigns a class label to each pixel, but does not distinguish between instances of the same class.
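As a minimal illustration, assume the network outputs one score map per class (a common convention that this section does not itself specify). The label map is then the per-pixel argmax over classes. The scores and the class names "background" and "car" are hypothetical. Both cars receive the same label because no instance IDs are assigned.

```python
from typing import List

def scores_to_label_map(scores: List[List[List[float]]]) -> List[List[int]]:
    """Turn per-class score maps, indexed scores[class][row][col], into a label map."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    labels = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Each pixel independently takes its highest-scoring class (argmax).
            labels[y][x] = max(range(num_classes), key=lambda c: scores[c][y][x])
    return labels

# Hypothetical 2x3 example: class 0 = background, class 1 = car.
# Columns 0 and 2 are two separate cars, yet both get label 1.
scores = [
    [[0.1, 0.8, 0.2], [0.3, 0.9, 0.1]],  # background scores
    [[0.9, 0.2, 0.8], [0.7, 0.1, 0.9]],  # car scores
]
print(scores_to_label_map(scores))  # [[1, 0, 1], [1, 0, 1]]
```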

3. Important Hyperparameters

Tuning hyperparameters is crucial to improving the performance of segmentation models. Below are the most important ones:

a. Learning Rate

  • Description: Controls the step size during optimization.
  • Typical Range: 0.0001 – 0.01, usually combined with a learning rate schedule such as cosine annealing or cosine annealing with warm restarts.
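The cosine annealing schedule can be sketched with the formula lr(t) = lr_min + ½(lr_max − lr_min)(1 + cos(π·t/T)). The defaults reuse the two ends of the range above. The fixed-length restart cycle is a simplifying assumption, because SGDR-style schedules can also lengthen each cycle. For real training loops, PyTorch provides `CosineAnnealingLR` and `CosineAnnealingWarmRestarts` in `torch.optim.lr_scheduler`.

```python
import math

def cosine_annealing_lr(step: int, total_steps: int,
                        lr_max: float = 0.01, lr_min: float = 0.0001) -> float:
    """Decay from lr_max at step 0 to lr_min at total_steps along a half cosine."""
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

def cosine_warm_restarts_lr(step: int, cycle_steps: int,
                            lr_max: float = 0.01, lr_min: float = 0.0001) -> float:
    """Cosine annealing that jumps back to lr_max every cycle_steps steps
    (fixed cycle length is an assumption; cycles can also grow over time)."""
    return cosine_annealing_lr(step % cycle_steps, cycle_steps, lr_max, lr_min)

for step in (0, 50, 99, 100, 150):
    print(step, round(cosine_warm_restarts_lr(step, cycle_steps=100), 6))
```

The restart at step 100 raises the learning rate sharply again. This can help the optimizer escape a poor region before annealing once more.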

b. Batch Size

  • Description: The number of samples processed before the model is updated.
  • Typical Range: 2 – 16 for large images, since GPU memory limits how many high-resolution samples fit in a single batch.
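A rough back-of-the-envelope calculation shows why large images limit the batch size. The sketch below computes the memory of a single activation tensor. It assumes float32 activations and a hypothetical 64-channel feature map at 512×512 resolution. A real network keeps many such tensors for backpropagation, so actual usage is a large multiple of this figure.

```python
def feature_map_bytes(batch_size: int, channels: int, height: int, width: int,
                      bytes_per_value: int = 4) -> int:
    """Memory of one activation tensor of shape (B, C, H, W); 4 bytes per float32 value."""
    return batch_size * channels * height * width * bytes_per_value

# Hypothetical layer: 64 channels kept at the full 512x512 input resolution.
for batch_size in (2, 8, 16):
    mib = feature_map_bytes(batch_size, 64, 512, 512) / 2**20
    print(f"batch size {batch_size:2d}: {mib:.0f} MiB for this one tensor")
```

Under these assumptions, one such tensor takes 64 MiB per image. That amounts to 128 MiB at batch size 2 and 1024 MiB at batch size 16.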

c. Number of Filters / Feature Maps

  • Description: Number of filters (output channels) in each convolutional layer. More filters increase model capacity, but also memory use and computation.
  • Typical Range: 32 – 512 per layer, depending on model depth and complexity.
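The helper below counts the parameters of a 2D convolution to show how filter counts drive capacity. Each layer has k × k × C_in × C_out weights plus C_out biases. The encoder layout is a hypothetical example, not a specific architecture. It assumes one 3×3 convolution per stage, with filters doubling from 32 to 512.

```python
def conv2d_params(in_channels: int, out_channels: int,
                  kernel_size: int = 3, bias: bool = True) -> int:
    """Learnable parameters of a 2D convolution: k*k*C_in*C_out weights plus C_out biases."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)

# Hypothetical encoder: one 3x3 conv per stage, filters doubling from 32 to 512,
# starting from a 3-channel RGB input.
channels = [3, 32, 64, 128, 256, 512]
for c_in, c_out in zip(channels, channels[1:]):
    print(f"{c_in:>3} -> {c_out:>3} filters: {conv2d_params(c_in, c_out):,} parameters")
```

Doubling both the input and output channels roughly quadruples the parameter count. For example, 64 → 128 gives 73,856 parameters, while 128 → 256 gives 295,168.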

d. Optimizer

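  • Description: The algorithm that uses the gradients to update the model weights.
  • Popular Choices:
    • Adam (adaptive per-parameter learning rates).
    • SGD with momentum (common when training on large-scale datasets).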
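The two update rules can be compared on a scalar parameter. This minimal sketch implements the standard formulas, with momentum written in the PyTorch style (v ← μv + g). It minimizes the toy function f(x) = x², whose gradient is 2x. The learning rate of 0.1 suits this toy problem and is not a recommendation for segmentation models.

```python
import math

def sgd_momentum_step(param: float, grad: float, velocity: float,
                      lr: float = 0.01, momentum: float = 0.9):
    """SGD with momentum (PyTorch-style): v <- momentum * v + g, then param <- param - lr * v."""
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity

def adam_step(param: float, grad: float, m: float, v: float, t: int,
              lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8):
    """One Adam update at step t (t starts at 1), using bias-corrected moment estimates."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Dividing by sqrt(v_hat) gives each parameter its own effective step size.
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# Toy problem: minimize f(x) = x**2, whose gradient is 2 * x, starting from x = 5.
x_sgd, velocity = 5.0, 0.0
x_adam, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    x_sgd, velocity = sgd_momentum_step(x_sgd, 2 * x_sgd, velocity, lr=0.1)
    x_adam, m, v = adam_step(x_adam, 2 * x_adam, m, v, t, lr=0.1)
print(f"SGD+momentum: {x_sgd:.6f}, Adam: {x_adam:.6f}")
```

Adam divides each step by a running estimate of the gradient's magnitude. As a result, its first step has a size close to lr regardless of the gradient scale, whereas an SGD step is proportional to the gradient itself.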

e. Weight Decay / L2 Regularization

  • Description: Helps prevent overfitting by penalizing large weights.
  • Typical Range: 0.0001 – 0.001.
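Adding an L2 penalty of (λ/2)·‖w‖² to the loss adds a λ·w term to the gradient, so every update shrinks the weights toward zero. Below is a minimal sketch over plain Python lists; the function names are illustrative. With SGD, this matches what the `weight_decay` argument of PyTorch's `torch.optim.SGD` does. With Adam, however, an L2 penalty and decoupled weight decay (as in AdamW) are not equivalent.

```python
from typing import List

def l2_regularized_loss(data_loss: float, weights: List[float],
                        weight_decay: float = 1e-4) -> float:
    """Total loss = data loss + (weight_decay / 2) * sum of squared weights."""
    return data_loss + 0.5 * weight_decay * sum(w * w for w in weights)

def sgd_step_with_weight_decay(weights: List[float], grads: List[float],
                               lr: float = 0.01, weight_decay: float = 1e-4) -> List[float]:
    """The L2 term contributes weight_decay * w to each gradient, pulling weights toward zero."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

# With a zero data gradient, each weight shrinks by a factor (1 - lr * weight_decay) per step.
weights = [1.0, -2.0]
for _ in range(1000):
    weights = sgd_step_with_weight_decay(weights, [0.0, 0.0], lr=0.1, weight_decay=1e-3)
print(weights)
```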

5. Other Important Topics

a. Data Augmentation

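  • Techniques: Random cropping, horizontal/vertical flipping, color jittering, and elastic deformation. Geometric transforms (crops, flips, deformations) must be applied identically to the image and its label mask. Any resampling of the mask should use nearest-neighbor interpolation so that it keeps valid class labels.
  • Purpose: Reduce overfitting and improve generalization.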
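The sketch below samples one random crop and one random horizontal flip, then applies both to an image and its mask together. Nested lists stand in for arrays, and the function name and grid representation are assumptions made for illustration. Photometric changes such as color jittering would be applied to the image only, because they do not move pixels.

```python
import random
from typing import List, Tuple

Grid = List[List[int]]

def random_crop_and_flip(image: Grid, mask: Grid, crop_h: int, crop_w: int,
                         rng: random.Random) -> Tuple[Grid, Grid]:
    """Apply one random crop and one random horizontal flip to both image and mask."""
    height, width = len(image), len(image[0])
    # Sample the random parameters ONCE so both inputs get the identical transform.
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    flip = rng.random() < 0.5

    def transform(grid: Grid) -> Grid:
        rows = [row[left:left + crop_w] for row in grid[top:top + crop_h]]
        return [row[::-1] for row in rows] if flip else rows

    return transform(image), transform(mask)

# Hypothetical 4x6 single-channel image and a mask labeling each column by x % 3.
image = [[10 * y + x for x in range(6)] for y in range(4)]
mask = [[x % 3 for x in range(6)] for _ in range(4)]
crop_img, crop_mask = random_crop_and_flip(image, mask, 3, 4, random.Random(0))
print(crop_img)
print(crop_mask)
```

Each pixel value encodes its original column, so you can check that every mask label still matches the pixel it belongs to after the transform.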

b. Post-Processing Techniques

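  • CRF (Conditional Random Field): A post-processing step that refines segmentation boundaries by enforcing spatial consistency, encouraging nearby pixels with similar colors to take the same label. DeepLab (Chen et al., 2017) uses a fully connected CRF for this purpose.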
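Inference in a fully connected CRF is too involved for a short example, so the sketch below swaps in a simplified version. It uses a CRF on a 4-connected pixel grid with a Potts pairwise term, which adds a fixed penalty whenever neighboring labels differ, and solves it with a few rounds of mean-field updates. This is not the fully connected CRF with color- and position-dependent kernels used by DeepLab. The function name and its parameters are hypothetical.

```python
import math
from typing import List

def grid_crf_mean_field(probs: List[List[List[float]]], pairwise_weight: float = 1.0,
                        iterations: int = 5) -> List[List[int]]:
    """Refine a label map with a 4-connected grid CRF (Potts model) via mean-field.

    probs[y][x][c] is the network's probability of class c at pixel (y, x).
    Unary energy: -log(prob). Pairwise energy: pairwise_weight for each neighbor
    whose label differs (Potts). Returns the refined per-pixel labels.
    """
    height, width, num_classes = len(probs), len(probs[0]), len(probs[0][0])
    unary = [[[-math.log(max(p, 1e-12)) for p in px] for px in row] for row in probs]
    q = [[list(px) for px in row] for row in probs]  # start from the network output
    offsets = ((-1, 0), (1, 0), (0, -1), (0, 1))
    for _ in range(iterations):
        new_q = []
        for y in range(height):
            new_row = []
            for x in range(width):
                neighbors = [(y + dy, x + dx) for dy, dx in offsets
                             if 0 <= y + dy < height and 0 <= x + dx < width]
                energies = []
                for c in range(num_classes):
                    # Expected Potts penalty: mass the neighbors place on classes other than c.
                    disagreement = sum(1.0 - q[ny][nx][c] for ny, nx in neighbors)
                    energies.append(unary[y][x][c] + pairwise_weight * disagreement)
                # Q(c) is proportional to exp(-energy); shift by the minimum for stability.
                lowest = min(energies)
                weights = [math.exp(lowest - e) for e in energies]
                total = sum(weights)
                new_row.append([w / total for w in weights])
            new_q.append(new_row)
        q = new_q  # all pixels are updated in parallel from the previous iteration
    return [[max(range(num_classes), key=lambda c: px[c]) for px in row] for row in q]

# Toy 3x3, 2-class map: confident class 0 everywhere except one noisy center pixel.
probs = [[[0.9, 0.1] for _ in range(3)] for _ in range(3)]
probs[1][1] = [0.4, 0.6]
raw = [[max(range(2), key=lambda c: px[c]) for px in row] for row in probs]
print(raw)                        # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(grid_crf_mean_field(probs)) # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

The isolated center label is overruled by its four confident neighbors. Setting `pairwise_weight=0.0` disables smoothing and reproduces the raw argmax.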

c. Evaluation Metrics

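  • Mean Intersection over Union (mIoU): The most widely used evaluation metric for segmentation tasks. For each class, IoU = TP / (TP + FP + FN), where TP, FP, and FN are that class's true-positive, false-positive, and false-negative pixel counts. In other words, IoU is the overlap between the predicted and ground-truth regions divided by their union, and mIoU is the average of IoU over classes.
  • Pixel Accuracy: The ratio of correctly predicted pixels to total pixels. It can be dominated by large classes such as background, which is one reason mIoU is usually preferred.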
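Both metrics can be computed from a confusion matrix. This minimal sketch works on flattened label maps. The `ignore_index` parameter mimics the common practice of excluding unlabeled pixels (PASCAL VOC, for instance, marks void pixels with 255). Skipping classes that appear in neither the prediction nor the ground truth is an assumption, one of several conventions for averaging.

```python
from typing import Dict, List, Optional

def segmentation_metrics(gt: List[int], pred: List[int], num_classes: int,
                         ignore_index: Optional[int] = None) -> Dict[str, float]:
    """Pixel accuracy and mIoU from flattened ground-truth and predicted label maps."""
    # confusion[g][p] counts pixels whose ground truth is g and prediction is p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if ignore_index is not None and g == ignore_index:
            continue  # unlabeled pixels are excluded from both metrics
        confusion[g][p] += 1
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(num_classes))
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(confusion[g][c] for g in range(num_classes)) - tp  # predicted c, truth differs
        fn = sum(confusion[c]) - tp                                 # truth c, predicted differs
        if tp + fp + fn > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / (tp + fp + fn))
    return {"pixel_accuracy": correct / total, "mIoU": sum(ious) / len(ious)}

gt   = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(segmentation_metrics(gt, pred, num_classes=3))
```

For this toy input, the per-class IoUs are 1/3, 2/3, and 1/2, so mIoU is 0.5, while pixel accuracy is 4/6 ≈ 0.67.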

6. References

  1. Long, J., Shelhamer, E., & Darrell, T. (2015). “Fully Convolutional Networks for Semantic Segmentation.” CVPR.
  2. Ronneberger, O., Fischer, P., & Brox, T. (2015). “U-Net: Convolutional Networks for Biomedical Image Segmentation.” MICCAI.
  3. Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2017). “DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs.” PAMI.
  4. Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017). “Pyramid Scene Parsing Network.” CVPR.
  5. Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., et al. (2020). “Deep High-Resolution Representation Learning for Visual Recognition.” PAMI.