Comparison of data augmentation techniques

Fall 2020

Takumi Okoshi

Data Scientist

As a Z by HP Data Science Global Ambassador, Takumi Okoshi's content is sponsored and he was provided with HP products.

Hello, I'm Takuoko, a Kaggle Grandmaster.

In this post, I compare data augmentation techniques for image classification, first with example images and then through comparative experiments. Data augmentation is a powerful technique in computer vision (CV) competitions, and I focused on image classification because it is the most standard task in them.

As a Z by HP Data Science Global Ambassador, I have been provided with high-powered HP workstations.

# Development environment

I use these systems for CV competitions. Compared with when I used a V100, I can now run the systems freely at any time and at higher speeds, so I can run far more experiments than before, including comparative tests of methods from various papers.

PyTorch, CUDA, and the rest of the environment come pre-installed on the systems, so I did not need to build the environment myself.

# Visual comparison of data augmentation techniques

Below, I illustrate each technique by mixing two CIFAR-100 images. Some of the figures are taken from the original papers.

## Mixup

[paper with code](https://paperswithcode.com/paper/mixup-beyond-empirical-risk-minimization)

Mixes two images pixel-wise as `lam * img1 + (1 - lam) * img2`, where `lam` is a mixing ratio between 0 and 1. The labels are mixed with the same ratio.
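A minimal NumPy sketch of Mixup for a single pair of images, assuming one-hot label vectors and drawing `lam` from Beta(`alpha`, `alpha`) as in the Mixup paper. The `alpha=0.4` default is illustrative, not the value used in my experiments, and the function name `mixup` is hypothetical.

```python
import numpy as np


def mixup(img1, img2, label1, label2, alpha=0.4, rng=None):
    """Mix two images and their one-hot labels with a Beta-distributed ratio.

    alpha=0.4 is an illustrative default (assumption), not a tuned value.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # mixing ratio in [0, 1]
    mixed_img = lam * img1 + (1 - lam) * img2  # pixel-wise interpolation
    mixed_label = lam * label1 + (1 - lam) * label2  # labels mixed with the same ratio
    return mixed_img, mixed_label, lam


# Example: two CIFAR-sized images (32x32x3) with 100-class one-hot labels
rng = np.random.default_rng(0)
img_a, img_b = rng.random((32, 32, 3)), rng.random((32, 32, 3))
mixed, target, lam = mixup(img_a, img_b, np.eye(100)[3], np.eye(100)[7], rng=rng)
```

In a PyTorch training loop, the same idea is usually applied to a whole batch by mixing it with a shuffled copy of itself.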

## Manifold Mixup

[paper with code](https://paperswithcode.com/paper/manifold-mixup-better-representations-by)

Applies Mixup to the hidden representations of an intermediate layer of the network instead of to the input images.
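To make the idea concrete, here is a NumPy sketch on a hypothetical toy MLP given as a list of `(W, b)` pairs: both inputs are propagated separately up to a randomly chosen layer `k`, their activations are mixed with a Beta-distributed `lam`, and the mixed representation continues through the remaining layers. Choosing `k = 0` reduces to input-level Mixup. The toy network, the `alpha=2.0` default, and the function names are assumptions for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def manifold_mixup_forward(x1, x2, y1, y2, weights, alpha=2.0, rng=None):
    """Forward two inputs through a toy MLP, mixing them at a random layer.

    weights: list of (W, b) pairs for a hypothetical MLP with ReLU between layers.
    k == 0 mixes the raw inputs; k > 0 mixes hidden activations.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    k = int(rng.integers(0, len(weights)))  # layer at which to mix
    h1, h2 = x1, x2
    for i in range(k):  # run both inputs separately up to layer k
        W, b = weights[i]
        h1, h2 = relu(h1 @ W + b), relu(h2 @ W + b)
    h = lam * h1 + (1 - lam) * h2  # mix the intermediate representations
    for i in range(k, len(weights)):  # continue with the mixed representation
        W, b = weights[i]
        h = h @ W + b
        if i < len(weights) - 1:  # no activation after the output layer
            h = relu(h)
    y = lam * y1 + (1 - lam) * y2  # labels mixed with the same ratio
    return h, y, lam, k
```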

## CutMix

[paper with code](https://paperswithcode.com/paper/cutmix-regularization-strategy-to-train)

Cuts a rectangular region (bounding box) out of one image and pastes it into the other. The labels are mixed in proportion to the pasted area.

There are two variants: corresponding paste, which pastes the patch at the same position it was cut from, and random paste, which pastes it at a random position.
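The sketch below implements both paste variants in NumPy, assuming `(H, W, C)` arrays and one-hot labels. Following the CutMix paper, `lam` is drawn from Beta(`alpha`, `alpha`) and the box side lengths are scaled by `sqrt(1 - lam)`. As a simplification, the box is kept fully inside the image instead of being clipped at the border, and the label weight is recomputed from the actual pasted area. The `random_paste` flag and the function names are hypothetical.

```python
import numpy as np


def rand_box(h, w, lam, rng):
    """Sample a box covering roughly (1 - lam) of an h x w image, kept fully inside it."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    top = rng.integers(0, h - cut_h + 1)
    left = rng.integers(0, w - cut_w + 1)
    return top, left, cut_h, cut_w


def cutmix(img1, img2, label1, label2, alpha=1.0, random_paste=False, rng=None):
    """Paste a box cut from img2 into img1 (corresponding or random position)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img1.shape[:2]
    lam = rng.beta(alpha, alpha)
    top, left, ch, cw = rand_box(h, w, lam, rng)
    patch = img2[top:top + ch, left:left + cw]  # region cut out of img2
    if random_paste:  # random paste: choose a new, independent position
        top = rng.integers(0, h - ch + 1)
        left = rng.integers(0, w - cw + 1)
    mixed = img1.copy()
    mixed[top:top + ch, left:left + cw] = patch  # corresponding paste keeps the cut position
    lam_adj = 1.0 - (ch * cw) / (h * w)  # label weight from the actual pasted area
    label = lam_adj * label1 + (1 - lam_adj) * label2
    return mixed, label, lam_adj
```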

## PatchUp

[paper with code](https://paperswithcode.com/paper/patchup-a-regularization-technique-for)

Applies CutMix-style patch mixing to the feature maps of an intermediate layer instead of to the input images.
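A simplified NumPy sketch of hard PatchUp-style mixing on intermediate feature maps of shape `(C, H, W)`. Square blocks are placed around positions sampled with probability `gamma` (a DropBlock-style mask), the masked positions are swapped in from the second sample, and the labels are weighted by the fraction of swapped positions. The single mask shared across channels, the label weighting, and the `block_size`/`gamma` defaults are my assumptions. The paper's exact loss and its soft variant, which interpolates inside the blocks instead of swapping, are not reproduced.

```python
import numpy as np


def block_mask(h, w, block_size, gamma, rng):
    """DropBlock-style mask: sample block centres with probability gamma, expand to squares."""
    centres = rng.random((h, w)) < gamma
    mask = np.zeros((h, w), dtype=bool)
    half = block_size // 2
    for r, c in zip(*np.nonzero(centres)):
        mask[max(r - half, 0):r + half + 1, max(c - half, 0):c + half + 1] = True
    return mask


def patchup_hard(feat1, feat2, y1, y2, block_size=3, gamma=0.1, rng=None):
    """Swap masked blocks of feat1 with feat2; both have shape (C, H, W).

    Simplification (assumption): one spatial mask shared by all channels.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = feat1.shape
    mask = block_mask(h, w, block_size, gamma, rng)
    mixed = np.where(mask[None, :, :], feat2, feat1)  # blocks taken from feat2
    frac = mask.mean()  # fraction of spatial positions swapped in
    label = (1 - frac) * y1 + frac * y2
    return mixed, label, frac
```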

## ResizeMix

[paper with code](https://paperswithcode.com/paper/resizemix-mixing-data-with-preserved-object)

Shrinks one entire image and pastes it into the other, so the pasted patch keeps the whole object rather than a crop of it.
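A minimal NumPy sketch, assuming `(H, W, C)` arrays and one-hot labels: `img2` is shrunk by a random scale `tau`, pasted at a random position in `img1`, and the labels are weighted by the pasted area. The nearest-neighbour resize keeps the example dependency-free (a real pipeline would use proper interpolation), and the `scale_range=(0.1, 0.8)` default is an assumption to check against the paper.

```python
import numpy as np


def nearest_resize(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]


def resizemix(img1, img2, label1, label2, scale_range=(0.1, 0.8), rng=None):
    """Shrink the whole of img2 and paste it at a random position in img1."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img1.shape[:2]
    tau = rng.uniform(*scale_range)  # side-length scale of the pasted image (assumed range)
    ph, pw = max(int(h * tau), 1), max(int(w * tau), 1)
    patch = nearest_resize(img2, ph, pw)
    top = rng.integers(0, h - ph + 1)
    left = rng.integers(0, w - pw + 1)
    mixed = img1.copy()
    mixed[top:top + ph, left:left + pw] = patch
    lam = 1.0 - (ph * pw) / (h * w)  # label weight from the pasted area
    label = lam * label1 + (1 - lam) * label2
    return mixed, label, lam
```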

## FMix

[paper with code](https://paperswithcode.com/paper/understanding-and-enhancing-mixed-sample-data)

Unlike CutMix, whose mask is always a rectangle, FMix generates masks of arbitrary shape by thresholding low-frequency Fourier noise, and mixes the two images with them.
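Here is a NumPy sketch of an FMix-style mask, assuming `(H, W, C)` arrays and one-hot labels: random complex noise is sampled in the Fourier domain, high frequencies are attenuated by `1 / f**decay_power`, the inverse FFT gives a smooth greyscale image, and its top `lam` fraction of pixels becomes the mask. The `decay_power=3.0` and `alpha=1.0` defaults are assumptions for illustration; refer to the paper's code for the exact settings.

```python
import numpy as np


def fourier_mask(h, w, lam, decay_power=3.0, rng=None):
    """Binary mask with round(lam * h * w) ones, shaped by low-frequency noise."""
    rng = np.random.default_rng() if rng is None else rng
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.rfftfreq(w)[None, :]
    freq = np.sqrt(fx ** 2 + fy ** 2)
    # Attenuate high frequencies: amplitude ~ 1 / f**decay_power (DC term clamped)
    scale = 1.0 / np.maximum(freq, 1.0 / max(h, w)) ** decay_power
    spectrum = scale * (rng.normal(size=freq.shape) + 1j * rng.normal(size=freq.shape))
    grey = np.fft.irfft2(spectrum, s=(h, w))  # smooth greyscale noise image
    # Set the top lam fraction of pixels (by value) to 1
    n_ones = int(round(lam * h * w))
    order = np.argsort(grey, axis=None)[::-1]
    mask = np.zeros(h * w)
    mask[order[:n_ones]] = 1.0
    return mask.reshape(h, w)


def fmix(img1, img2, label1, label2, alpha=1.0, rng=None):
    """Mix two (H, W, C) images with a Fourier-noise mask."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img1.shape[:2]
    lam = rng.beta(alpha, alpha)
    mask = fourier_mask(h, w, lam, rng=rng)[..., None]  # add a channel axis
    mixed = mask * img1 + (1 - mask) * img2
    lam_adj = mask.mean()  # exact share of pixels taken from img1
    label = lam_adj * label1 + (1 - lam_adj) * label2
    return mixed, label, lam_adj
```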

## SnapMix

[paper with code](https://paperswithcode.com/paper/snapmix-semantically-proportional-mixing-for)

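Uses class activation maps (CAM) to weight the labels of the mixed image by how much of each image's semantically important region it contains, which reduces the label noise caused by mixing in background regions. The figure is taken from the paper.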
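A simplified NumPy sketch of SnapMix's label weighting, assuming the class activation maps of each image's ground-truth class are already computed and upsampled to image size. Each CAM is normalized into a semantic percent map (SPM); the first label is weighted by the share of its map left after pasting, and the second by the share of its map inside the pasted box, so the two weights need not sum to 1. The original method cuts differently sized boxes from the two images and resizes between them; using one shared box here is a simplification, and the function names are hypothetical.

```python
import numpy as np


def semantic_percent_map(cam):
    """Normalize a CAM so it sums to 1 (assumes at least one positive value)."""
    cam = np.maximum(cam, 0.0)
    return cam / cam.sum()


def snapmix_same_box(img1, img2, cam1, cam2, label1, label2, box):
    """SnapMix-style mixing with one shared box (simplification).

    cam1, cam2: ground-truth-class CAMs at image resolution (assumed given).
    box: (top, left, height, width) region copied from img2 into img1.
    """
    top, left, bh, bw = box
    region = np.zeros(img1.shape[:2], dtype=bool)
    region[top:top + bh, left:left + bw] = True
    mixed = img1.copy()
    mixed[region] = img2[region]  # paste the box from img2
    spm1, spm2 = semantic_percent_map(cam1), semantic_percent_map(cam2)
    rho1 = 1.0 - spm1[region].sum()  # semantic share of img1 that survives
    rho2 = spm2[region].sum()  # semantic share of img2 that is pasted in
    label = rho1 * label1 + rho2 * label2  # weights need not sum to 1
    return mixed, label, rho1, rho2
```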

## PuzzleMix

[paper with code](https://paperswithcode.com/paper/puzzle-mix-exploiting-saliency-and-local-1)

Uses saliency to place and overlap the important regions of the two images, which reduces noise from background regions. The figure is taken from the paper.

# Comparative testing of data augmentation techniques

## Test settings

- Dataset: Kaggle's [Cassava Leaf Disease Classification](https://www.kaggle.com/c/cassava-leaf-disease-classification)
- Model: resnet34d
- Loss: CrossEntropyLoss
- Optimizer: Adam + Lookahead
- Image size: 256
- Batch size: 64
- Epochs: 20

## Results

| Technique | 5-fold CV |
|---|---|
| CutMix random paste | 0.8708 |
| CutMix corresponding paste | 0.8694 |
| PatchUp | 0.8694 |
| Manifold Mixup | 0.8692 |
| ResizeMix | 0.8686 |
| CutMix corresponding paste + Mixup | 0.868 |
| FMix | 0.8675 |
| PuzzleMix | 0.8662 |
| Mixup | 0.8661 |

## Summary

As reported in the ResizeMix paper, random paste gave higher accuracy than corresponding paste for CutMix. ResizeMix, on the other hand, scored lower than CutMix, which differs from the paper's result.

The FMix paper also reported that CutMix + Mixup and FMix + Mixup were more accurate than either method alone, but I could not reproduce this: in my test, CutMix corresponding paste + Mixup (0.868) scored below CutMix corresponding paste alone (0.8694).

As for the methods themselves, I felt that random paste and ResizeMix should be applied with care: on datasets where objects appear at a fixed position across all images, such as medical images, they may lower accuracy. Techniques such as PuzzleMix and SnapMix are likely to be more effective for tasks in which subtle details matter.

## Future plans

I will continue to use the systems provided to Z by HP Data Science Global Ambassadors to compare and test methods from various papers for CV competitions.
