personnealienee

personnealienee t1_j82ygry wrote

Messing with target sound extraction by adding just a barebones masknet architecture on top of SampleRNN. I want to apply this architecture to extracting individual layers in electronic music: for example, picking out just the snare drum track from the full drum machine mix. It is easy to generate datasets using DawDreamer (currently I generate random drum patterns with a sampler). I'm considering adding conditioning on the output of a differentiable filter bank.
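A minimal sketch of the kind of (mixture, target) training pairs described above, in plain Python. Assumption: instead of a DawDreamer sampler, it synthesizes crude kick/snare/hat one-shots directly and places them on a random 16-step grid, so it does not use DawDreamer's API at all. The names `make_example`, `PROBS`, the 16 kHz sample rate and the 120 BPM grid are hypothetical choices for illustration.

```python
import math
import random

SAMPLE_RATE = 16_000          # assumed sample rate (Hz)
STEPS = 16                    # one bar of 16th notes
STEP_LEN = SAMPLE_RATE // 8   # samples per 16th note at 120 BPM (0.125 s)

# Hypothetical per-step hit probabilities for each drum voice.
PROBS = {"kick": 0.3, "snare": 0.25, "hat": 0.6}


def one_shot(kind, rng):
    """Synthesize a crude one-shot sample (stand-in for a real sampler)."""
    if kind == "kick":   # decaying 60 Hz sine, 0.2 s
        n = int(0.2 * SAMPLE_RATE)
        return [math.sin(2 * math.pi * 60 * i / SAMPLE_RATE) * math.exp(-i / 1500)
                for i in range(n)]
    if kind == "snare":  # decaying white noise, 0.15 s
        n = int(0.15 * SAMPLE_RATE)
        return [rng.uniform(-1, 1) * math.exp(-i / 800) for i in range(n)]
    # hat: short, fast-decaying noise, 0.05 s
    n = int(0.05 * SAMPLE_RATE)
    return [0.5 * rng.uniform(-1, 1) * math.exp(-i / 200) for i in range(n)]


def random_pattern(prob, rng):
    """Random on/off pattern over the 16-step grid."""
    return [rng.random() < prob for _ in range(STEPS)]


def render(pattern, hit):
    """Place the one-shot at every active step; overlapping tails are summed."""
    out = [0.0] * (STEPS * STEP_LEN)
    for step, on in enumerate(pattern):
        if not on:
            continue
        start = step * STEP_LEN
        for i, s in enumerate(hit):
            if start + i >= len(out):
                break  # truncate tails that run past the end of the bar
            out[start + i] += s
    return out


def make_example(seed):
    """Return (mixture, target): the full drum mix and the isolated snare track."""
    rng = random.Random(seed)
    tracks = {name: render(random_pattern(p, rng), one_shot(name, rng))
              for name, p in PROBS.items()}
    mix = [sum(samples) for samples in zip(*tracks.values())]
    return mix, tracks["snare"]
```

Each seed gives a new random pattern; a mask network would take `mix` as input and be trained to reproduce `target`. Rendering through DawDreamer with real drum samples instead of these synthetic hits would give far more realistic mixtures.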


personnealienee t1_j82n3k6 wrote

I'd say ditch everything but math and start implementing your models in PyTorch while reading papers and blogs. Someone with a CS background can pick up Python by doing. The field moves fast: the relevant work starts around 2014 and all of it is on arXiv and GitHub (both reference implementations and state-of-the-art code), so there are no up-to-date textbooks. This course:

https://uvadlc-notebooks.readthedocs.io/en/latest/index.html

It is about the only one I've encountered that teaches recent model architectures (reading their implementations helps too). Many of their models are mostly relevant to vision, but transformers and autoencoders are really useful in NLP as well. For more NLP-specific material, the Hugging Face tutorials are a good starting point for digging deeper:

https://huggingface.co/course/chapter1/
