Shadow Detection

Intro

As part of a larger contract with a client, I developed an image segmentation method to detect shadows on landscape imagery, primarily in mountainous regions. This particular task presented two primary challenges:

  • There are only a few datasets available for training a shadow segmentation network.
  • None of the available datasets contained mountain imagery.

SBU Shadow

To tackle the first point, the SBU Shadow dataset was used to train an initial shadow detection model. This dataset contains numerous images with pixel-level shadow labels. While the dataset is useful, its shadows only appear on flat surfaces and are quite distinct from their surroundings, as in the example below:

SBU Example Image

An initial UNet model was trained on this dataset. While our results on a holdout evaluation set from SBU were decent, we had no way to measure performance on mountain imagery.

Geopose3k

To address this issue, we used imagery from geopose3k. Geopose3k contains over three thousand mountain landscape images with precise camera poses. Several methods for automatically producing labeled images were attempted, but ultimately these were ineffective. Instead, shadow labels were applied manually to the original pose imagery as sparse pixel-level annotations. While these labels were not extremely accurate, they were fast to produce. Around 150 annotated images were created.

An example of an image from geopose3k with sparse annotations. In the annotation, red is "shadow" while green is "not shadow."
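A minimal sketch of how a red/green annotation like the one described above could be turned into per-pixel labels. The label values, the `threshold` parameter, and the `decode_annotation` helper are all assumptions made for illustration, and so is treating every pixel that is neither red nor green as unlabeled; the original annotation tooling is not described here.

```python
# Label values used in this sketch (assumed, not taken from the original project).
SHADOW, NOT_SHADOW, UNLABELED = 1, 0, 255


def decode_annotation(rgb_rows, threshold=128):
    """Map an RGB annotation image (rows of (r, g, b) tuples) to class labels.

    Mostly-red pixels become SHADOW, mostly-green pixels become NOT_SHADOW,
    and every other pixel is treated as UNLABELED.
    """
    labels = []
    for row in rgb_rows:
        out = []
        for r, g, b in row:
            if r >= threshold and g < threshold and b < threshold:
                out.append(SHADOW)
            elif g >= threshold and r < threshold and b < threshold:
                out.append(NOT_SHADOW)
            else:
                out.append(UNLABELED)
        labels.append(out)
    return labels


# Tiny hand-made annotation: two red strokes, one green stroke, the rest untouched.
annotation = [
    [(255, 0, 0), (0, 0, 0), (0, 255, 0)],
    [(0, 0, 0), (250, 10, 5), (0, 0, 0)],
]
labels = decode_annotation(annotation)
```

Keeping an explicit "unlabeled" value lets later training and evaluation steps skip the pixels the annotator never touched.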

Using this new labeled dataset, we were now able to evaluate the model trained on SBU Shadow. The segmentation model produced from SBU Shadow was then further trained on just the sparsely labeled data. This produced more accurate results overall at test time.
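One common way to train on sparse labels is to compute the loss only over annotated pixels, so unlabeled regions contribute nothing. The sketch below shows that idea with a masked binary cross-entropy in plain Python. It is an assumption about how such training can be set up, not a record of the loss actually used here; `masked_bce` and the `UNLABELED` value are hypothetical names.

```python
import math

UNLABELED = 255  # assumed marker for pixels without an annotation


def masked_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over labeled pixels only.

    probs  : rows of predicted shadow probabilities in [0, 1]
    labels : rows of 1 (shadow), 0 (not shadow), or UNLABELED
    """
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            if y == UNLABELED:
                continue  # unannotated pixels do not affect the loss
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


probs = [[0.9, 0.5, 0.1]]
labels = [[1, UNLABELED, 0]]
loss = masked_bce(probs, labels)
```

In a deep learning framework the same effect is usually obtained by multiplying a per-pixel loss by a binary mask and dividing by the number of labeled pixels.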

Ax

To provide even better results, hyperparameter tuning was performed with Ax. Ax is a hyperparameter search tool that uses Bayesian and bandit optimization to intelligently search for the best hyperparameters. This process helped to provide more stable training and better overall results.

Results

An original pose image (left) along with the corresponding sparse annotations (right).
The predicted segmentation for shadow/not shadow. White is shadow. Shown from left to right are the predictions of UNet trained on just SBU Shadow, UNet trained on SBU + the sparse annotations of geopose3k, and finally UNet trained on SBU + sparse annotations of geopose3k with hyperparameter tuning.
Model   mIoU
UNet SBU   0.4965
UNet SBU+G3K   0.5520
UNet SBU+G3K+Tuning   0.6195
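For reference, mean intersection-over-union (mIoU) on sparsely labeled images can be computed by skipping unlabeled pixels and averaging the per-class IoU. The sketch below is one way to do this. The `mean_iou` helper, the `UNLABELED` value, and the choice to average over both classes across all pixels are assumptions, since the exact evaluation protocol is not stated.

```python
UNLABELED = 255  # assumed marker for pixels without an annotation


def mean_iou(pred, target, classes=(0, 1), ignore=UNLABELED):
    """Average per-class IoU, counting only pixels that carry a label."""
    inter = {c: 0 for c in classes}
    union = {c: 0 for c in classes}
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if t == ignore:
                continue  # skip pixels the annotator did not mark
            for c in classes:
                in_pred, in_target = p == c, t == c
                inter[c] += in_pred and in_target
                union[c] += in_pred or in_target
    # Classes that never appear in prediction or target are left out of the mean.
    ious = [inter[c] / union[c] for c in classes if union[c] > 0]
    return sum(ious) / len(ious) if ious else float("nan")


pred = [[1, 1, 0, 0],
        [1, 0, 0, 1]]
target = [[1, 0, 0, UNLABELED],
          [1, 1, UNLABELED, 0]]
score = mean_iou(pred, target)
```

In this toy example the shadow class has IoU 2/5 and the non-shadow class 1/4, so the mean is 0.325.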

Quantitative results on an evaluation dataset are shown in the table above. Together, these changes gave us an absolute improvement of roughly 0.12 mIoU (about 12 percentage points, from 0.4965 to 0.6195) on a small holdout of the sparsely labeled geopose3k images. Some cleaner qualitative results can be seen below:
