# Music Auto-Tagger
Music auto-tagger using Keras

# WARNING! Alternatives available

* IF YOU WANT A TAGGER, please also look at:
  - [compact_cnn](https://github.com/keunwoochoi/music-auto_tagging-keras/tree/master/compact_cnn) 
* IF YOU WANT A FEATURE EXTRACTOR, look at either:
  - [compact_cnn](https://github.com/keunwoochoi/music-auto_tagging-keras/tree/master/compact_cnn) 
  - [transfer learning music](https://github.com/keunwoochoi/transfer_learning_music)

...because `MusicTaggerCNN` and `MusicTaggerCRNN` are based on an old (and slightly incorrect) implementation of Batch Normalization in old Keras versions (thank god it worked anyway), and this is quite tricky to fix.

## Keras Versions
* Use `keras == 1.0.6` for `MusicTaggerCNN`.
* Use `1.0.6 < keras <= 1.2` for `MusicTaggerCRNN`.
* Use `1.1 <= keras <= 1.2` for `compact_cnn`.
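
The version constraints above can be checked programmatically. The following is a minimal sketch, not part of this repo: `parse_version` and `compatible_models` are hypothetical helpers, they assume plain numeric version strings (such as the value of `keras.__version__`), and they read the `1.2` upper bound as covering any `1.2.x` release.

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain numeric version like '1.0.6' or '1.2' into a 3-tuple of ints.

    Assumption: no suffixes such as 'rc1' or '.dev0' (int() would fail on them).
    """
    parts = [int(p) for p in version.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)  # treat '1.2' as '1.2.0'
    return tuple(parts)


def compatible_models(keras_version: str) -> List[str]:
    """Return the models whose stated Keras range includes keras_version.

    Hypothetical helper. Assumption: the '<= 1.2' upper bound is read as
    'any 1.2.x release', so it is compared on (major, minor) only.
    """
    v = parse_version(keras_version)
    major_minor = v[:2]
    models = []
    if v == (1, 0, 6):
        models.append("MusicTaggerCNN")
    if v > (1, 0, 6) and major_minor <= (1, 2):
        models.append("MusicTaggerCRNN")
    if v >= (1, 1, 0) and major_minor <= (1, 2):
        models.append("compact_cnn")
    return models


if __name__ == "__main__":
    for version in ["1.0.6", "1.1.2", "2.0.0"]:
        print(version, compatible_models(version))
```

In practice you would pass `keras.__version__` instead of the literal strings; an empty list means none of the models is expected to work with that version.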

### Prerequisites (READ THIS!)
* You need [`keras`](http://keras.io) to run the examples (`example_tagging.py` and `example_feat_extract.py`).
  * To use your own audio file, you need [`librosa`](http://librosa.github.io/librosa/).
* The input data shape is `(None, channel, height, width)`, i.e., it follows the Theano convention. If you use TensorFlow as your backend, check that `image_dim_ordering` is set to `th` in `~/.keras/keras.json`, i.e.:

```json
"image_dim_ordering": "th",
```
* To use `compact_cnn`, you need to install [Kapre](https://github.com/keunwoochoi/kapre).
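
To illustrate the `(None, channel, height, width)` input layout described in the list above, here is a small NumPy sketch. The helper names and the 96 x 1366 array size are hypothetical, and treating height as frequency bins and width as time frames is an assumption rather than something this README states.

```python
import numpy as np


def add_batch_and_channel_axes(melgram: np.ndarray) -> np.ndarray:
    """(height, width) -> (1, 1, height, width): a batch of one single-channel input."""
    if melgram.ndim != 2:
        raise ValueError("expected a 2D array of shape (height, width)")
    return melgram[np.newaxis, np.newaxis, :, :]


def channels_last_to_first(batch: np.ndarray) -> np.ndarray:
    """(batch, height, width, channel) -> (batch, channel, height, width)."""
    return np.transpose(batch, (0, 3, 1, 2))


if __name__ == "__main__":
    # Hypothetical size: 96 frequency bins x 1366 time frames.
    melgram = np.zeros((96, 1366), dtype=np.float32)
    x = add_batch_and_channel_axes(melgram)
    print(x.shape)
```

If your data is stored channel-last (the TensorFlow convention), `channels_last_to_first` reorders the axes so that the channel axis comes right after the batch axis.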

### Files (1)
For `MusicTaggerCNN` and `MusicTaggerCRNN`.

* [example_tagging.py](https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/example_tagging.py): tagging example, [example_feat_extract.py](https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/example_feat_extract.py): feature extraction example
* [music_tagger_cnn.py](https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/music_tagger_cnn.py), [music_tagger_crnn.py](https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/music_tagger_crnn.py): Models

### Files (2)
For [compact_cnn](https://github.com/keunwoochoi/music-auto_tagging-keras/tree/master/compact_cnn):
* [`main.py`](https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/compact_cnn/main.py): examples.
* More info in the sub-directory's [`README.md`](https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/compact_cnn/README.md).


### Structures

Left: CNN (`compact_cnn`, `music_tagger_cnn`). Right: CRNN (`music_tagger_crnn`).
![alt text](https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/imgs/diagrams.png "structures")

#### MusicTaggerCNN
 * 5-layer 2D convolutions
 * Number of parameters: 865,950
 * AUC score: 0.8654
 * **WARNING:** with keras > 1.0.6, this model does not work properly. Please use `MusicTaggerCRNN` until it is updated!
(FYI: with 3M parameters, a deeper ConvNet showed 0.8595 AUC.)

#### MusicTaggerCRNN
 * 4-layer 2D convolutions + 2 GRU layers
 * Number of parameters: 396,786
 * AUC score: 0.8662

### How was it trained?
 * Using 29.1-second music clips from the [Million Song Dataset](http://labrosa.ee.columbia.edu/millionsong/)
 * Split setting: use [this repo for the split setting](https://github.com/keunwoochoi/MSD_split_for_tagging/) to get an identical setting.
 * See the [papers](#credits).
 * The 50 tags are:

```python
['rock', 'pop', 'alternative', 'indie', 'electronic', 'female vocalists', 
'dance', '00s', 'alternative rock', 'jazz', 'beautiful', 'metal', 
'chillout', 'male vocalists', 'classic rock', 'soul', 'indie rock',
'Mellow', 'electronica', '80s', 'folk', '90s', 'chill', 'instrumental',
'punk', 'oldies', 'blues', 'hard rock', 'ambient', 'acoustic', 'experimental',
'female vocalist', 'guitar', 'Hip-Hop', '70s', 'party', 'country', 'easy listening',
'sexy', 'catchy', 'funk', 'electro', 'heavy metal', 'Progressive rock',
'60s', 'rnb', 'indie pop', 'sad', 'House', 'happy']
```

### Which is the better predictor?
 * UPDATE: For the most efficient computation, use [`compact_cnn`](https://github.com/keunwoochoi/music-auto_tagging-keras/tree/master/compact_cnn). Otherwise, read below.
 * Training: `MusicTaggerCNN` is faster than `MusicTaggerCRNN` (wall-clock time)
 * Prediction: They are more or less the same. 
 * Memory usage: `MusicTaggerCRNN` has fewer trainable parameters. You can even decrease its number of feature maps and `MusicTaggerCRNN` still works quite well, i.e., the current setting is a bit rich (or redundant). With `MusicTaggerCNN`, you will see performance decrease if you reduce the number of parameters.

Therefore, if you just want to use the pre-trained weights, use `MusicTaggerCNN`. If you want to train a model yourself, it's up to you; in general, I would use `MusicTaggerCRNN` after downsizing it to around 0.2M parameters (its training time would then be similar to that of `MusicTaggerCNN`). To reduce the size, change the number of feature maps in the convolutional layers.
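
To see why the number of feature maps is the main lever, recall that a 2D convolution layer with a `k_h x k_w` kernel, `c_in` input maps and `c_out` output maps has `(k_h * k_w * c_in + 1) * c_out` trainable parameters (weights plus one bias per output map). The sketch below applies that formula to a stack of 3x3 convolutions. The widths are made up for illustration and are NOT the actual `MusicTaggerCRNN` configuration; batch-normalization and GRU parameters are ignored.

```python
from typing import Sequence


def conv2d_params(kernel_h: int, kernel_w: int, in_maps: int, out_maps: int) -> int:
    """Trainable weights plus biases of one 2D convolution layer."""
    return (kernel_h * kernel_w * in_maps + 1) * out_maps


def conv_stack_params(feature_maps: Sequence[int], in_channels: int = 1, kernel: int = 3) -> int:
    """Total parameters of a chain of square-kernel conv layers (conv weights only)."""
    total = 0
    prev = in_channels
    for maps in feature_maps:
        total += conv2d_params(kernel, kernel, prev, maps)
        prev = maps
    return total


if __name__ == "__main__":
    # Hypothetical widths, NOT the repo's actual configuration.
    full = [64, 128, 128, 128]
    half = [m // 2 for m in full]
    print(conv_stack_params(full), conv_stack_params(half))
```

Halving every width cuts the convolutional parameters to roughly a quarter (369,664 vs. 92,672 for these made-up widths), because each layer's count scales with the product of its input and output widths.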

### Which is the better feature extractor?
By setting `include_top=False`, you can get 256-dim (`MusicTaggerCNN`) or 32-dim (`MusicTaggerCRNN`) feature representations.

In general, I would recommend using `MusicTaggerCRNN` and its 32-dim features: for predicting 50 tags, 256 features sound a bit too large. I have only looked into the 32-dim features, not the 256-dim ones. I considered using PCA to reduce the dimension further, but ended up not applying it because the mean relative reconstruction error, `mean(abs(recovered - original) / original)`, was `.12` (32->16 dims) and `.05` (32->24 dims), which doesn't seem good enough.

The 256-dim features are probably either redundant (in which case you can reduce them effectively with PCA) or simply contain more information than the 32-dim ones (e.g., features at different hierarchical levels). If the dimension size does not matter, the 256-dim features are worth choosing.
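
The PCA check described above can be sketched with NumPy as follows, assuming the extracted features are stacked into an array of shape `(n_clips, dim)`. `pca_relative_error` is a hypothetical helper; it computes the same `mean(abs(recovered - original) / original)` metric, but it is not necessarily how the numbers above were produced.

```python
import numpy as np


def pca_relative_error(features: np.ndarray, n_components: int) -> float:
    """Project onto the top principal components, reconstruct, and return
    mean(abs(recovered - original) / original)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                      # (n_components, dim)
    recovered = centered @ basis.T @ basis + mean  # back to (n_clips, dim)
    # Elementwise division: entries where original == 0 give inf/nan.
    return float(np.mean(np.abs(recovered - features) / features))
```

For example, `pca_relative_error(feats, 16)` on a hypothetical `(n_clips, 32)` array `feats` corresponds to the "32->16" case. Because the metric divides by the original values, features that can be zero (e.g., after a ReLU) need special handling.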

### Usage
```bash
$ python example_tagging.py
$ python example_feat_extract.py
```

### Result
*Theano backend, `MusicTaggerCRNN`*
```python
data/bensound-cute.mp3
[('jazz', '0.444'), ('instrumental', '0.151'), ('folk', '0.103'), ('Hip-Hop', '0.103'), ('ambient', '0.077')]
[('guitar', '0.068'), ('rock', '0.058'), ('acoustic', '0.054'), ('experimental', '0.051'), ('electronic', '0.042')]

data/bensound-actionable.mp3
[('jazz', '0.416'), ('instrumental', '0.181'), ('Hip-Hop', '0.085'), ('folk', '0.085'), ('rock', '0.081')]
[('ambient', '0.068'), ('guitar', '0.062'), ('Progressive rock', '0.048'), ('experimental', '0.046'), ('acoustic', '0.046')]

data/bensound-dubstep.mp3
[('Hip-Hop', '0.245'), ('rock', '0.183'), ('alternative', '0.081'), ('electronic', '0.076'), ('alternative rock', '0.053')]
[('metal', '0.051'), ('indie', '0.028'), ('instrumental', '0.027'), ('electronica', '0.024'), ('hard rock', '0.023')]

data/bensound-thejazzpiano.mp3
[('jazz', '0.299'), ('instrumental', '0.174'), ('electronic', '0.089'), ('ambient', '0.061'), ('chillout', '0.052')]
[('rock', '0.044'), ('guitar', '0.044'), ('funk', '0.033'), ('chill', '0.032'), ('Progressive rock', '0.029')]
```
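
Each result above appears to list the 10 highest-scoring tags, printed as two rows of five, with scores rounded to three decimals. Below is a minimal sketch of producing that format from a prediction vector and the tag list. `top_k_tags` is a hypothetical helper, and the actual example scripts may implement this differently.

```python
from typing import List, Sequence, Tuple


def top_k_tags(scores: Sequence[float], tags: Sequence[str], k: int = 10) -> List[Tuple[str, str]]:
    """Return the k highest-scoring (tag, score) pairs, with scores formatted to 3 decimals."""
    if len(scores) != len(tags):
        raise ValueError("scores and tags must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(tags[i], "%.3f" % scores[i]) for i in order[:k]]


if __name__ == "__main__":
    # Shortened, hypothetical inputs; the real model outputs 50 scores.
    tags = ["rock", "pop", "jazz", "folk"]
    scores = [0.058, 0.010, 0.444, 0.103]
    result = top_k_tags(scores, tags, k=4)
    print(result[:2])  # first row
    print(result[2:])  # second row
```

With the full 50-tag list and `k=10`, printing `result[:5]` and `result[5:]` gives the two-row layout shown above.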

### And...

* More info on the CNN:
  * [this paper](https://arxiv.org/abs/1606.00298), or the [blog post](https://keunwoochoi.wordpress.com/2016/06/02/paper-is-out-automatic-tagging-using-deep-convolutional-neural-networks/).
  * Also, please take a look at the [slides](https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/slide-ismir-2016.pdf) from ISMIR 2016. They include some results that are not in the paper.
* More info on the CRNN:
  * [paper](https://arxiv.org/abs/1609.04243), or [blog post](https://keunwoochoi.wordpress.com/2016/09/15/paper-is-out-convolutional-recurrent-neural-networks-for-music-classification/)

### Reproduce the experiment
* [A repo for the split setting](https://github.com/keunwoochoi/MSD_split_for_tagging/) to reproduce the identical experimental setting of the [two papers](#credits).
* Audio files: find someone around you who happens to have the preview clips, or crawl the files yourself. I would recommend crawling your colleagues...

### Credits
* Compact CNN: will be updated.
* Convnet: [*Automatic Tagging using Deep Convolutional Neural Networks*](https://scholar.google.co.kr/citations?view_op=view_citation&hl=en&user=ZrqdSu4AAAAJ&citation_for_view=ZrqdSu4AAAAJ:3fE2CSJIrl8C), Keunwoo Choi, George Fazekas, Mark Sandler, 17th International Society for Music Information Retrieval Conference, New York, USA, 2016
* ConvRNN: [*Convolutional Recurrent Neural Networks for Music Classification*](https://scholar.google.co.kr/citations?view_op=view_citation&hl=en&user=ZrqdSu4AAAAJ&sortby=pubdate&citation_for_view=ZrqdSu4AAAAJ:ULOm3_A8WrAC), Keunwoo Choi, George Fazekas, Mark Sandler, Kyunghyun Cho, arXiv:1609.04243, 2016

* Test music items are from [http://www.bensound.com](http://www.bensound.com).