Leo_D517 OP t1_jd7sq7l wrote
Reply to comment by r4and0muser9482 in [Project] Machine Learning for Audio: A library for audio analysis, feature extraction, etc by Leo_D517
OpenSMILE is mainly used for emotion analysis and classification of audio, while audioFlux focuses on broad audio feature extraction and is used to study a variety of audio tasks such as classification, separation, Music Information Retrieval (MIR), ASR, etc.
Leo_D517 OP t1_jd2k6u1 wrote
Reply to comment by rising_pho3nix in [Project] Machine Learning for Audio: A library for audio analysis, feature extraction, etc by Leo_D517
Thank you for your support. If you are interested, you can join our project. Suggestions and feedback are welcome.
Leo_D517 OP t1_jd2hhov wrote
Reply to comment by fanjink in [Project] Machine Learning for Audio: A library for audio analysis, feature extraction, etc by Leo_D517
First of all, we have noticed this issue, and it will be resolved in the next version. For now, you can install the package by compiling from source.
Please follow the steps in the documentation to compile the source code.
The steps are as follows:
- Installing dependencies on macOS
Install Command Line Tools for Xcode. Even if you install Xcode from the App Store, you must configure command-line compilation by running:
$ xcode-select --install
- Python setup:
$ python setup.py build
$ python setup.py install
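- Verify the installation (a quick sanity check; this assumes the package imports under the name audioflux):
$ python -c "import audioflux"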
Leo_D517 OP t1_jd2g6pg wrote
Reply to comment by CheekProfessional146 in [Project] Machine Learning for Audio: A library for audio analysis, feature extraction, etc by Leo_D517
First, librosa is a very good audio feature library.
The differences between audioFlux and librosa are:
- Systematic, multi-dimensional feature extraction and combination that can be used flexibly for research and analysis across many tasks.
- High performance: the core is implemented in C, with FFT hardware acceleration on different platforms, which makes large-scale feature extraction convenient.
- Mobile support: it meets the requirements of real-time computation on audio streams on mobile devices.
Our team wants to build mobile audio MIR related products, so all feature extraction operations must be fast and have cross-platform support for mobile.
For training, we used librosa at the time to extract CQT-related features. It took about 3 hours for 10,000 samples, which was really slow.
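For reference, here is a minimal sketch of CQT extraction in both libraries. The file name and parameters are illustrative, and the audioFlux calls follow the README-style `CQT` class from memory, so please check the current docs for exact signatures:

```python
import numpy as np
import librosa
import audioflux as af

# Load audio at the benchmark sampling rate; "sample.wav" is a placeholder.
audio_arr, sr = librosa.load("sample.wav", sr=32000)

# librosa CQT (the approach we originally used for training).
cqt_librosa = np.abs(librosa.cqt(audio_arr, sr=sr, n_bins=84, bins_per_octave=12))

# audioFlux CQT: create the CQT object once, then reuse it across samples.
cqt_obj = af.CQT(num=84, samplate=sr)
cqt_audioflux = np.abs(cqt_obj.cqt(audio_arr))
```

Reusing one `CQT` object across many samples avoids re-initializing the transform, which matters when extracting features at scale.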
Here is a simple performance comparison:
Server hardware:
- CPU: AMD Ryzen Threadripper 3970X 32-Core Processor
- Memory: 128GB
Each sample is 128 ms of audio (sampling rate: 32000 Hz, data length: 4096).
The table below shows the total time to extract features from 1000 samples.
| Package | audioFlux | librosa | pyAudioAnalysis | python_speech_features |
| --- | --- | --- | --- | --- |
| Mel | 0.777s | 2.967s | -- | -- |
| MFCC | 0.797s | 2.963s | 0.805s | 2.150s |
| CQT | 5.743s | 21.477s | -- | -- |
| Chroma | 0.155s | 2.174s | 1.287s | -- |
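If you want to reproduce this kind of measurement yourself, a minimal timing sketch along these lines should work (synthetic data and librosa's MFCC shown here; the audioFlux equivalent can be timed the same way):

```python
import time
import numpy as np
import librosa

# Synthetic benchmark data matching the setup above:
# 1000 samples, each 4096 points at 32000 Hz.
rng = np.random.default_rng(0)
samples = rng.standard_normal((1000, 4096)).astype(np.float32)

start = time.perf_counter()
for s in samples:
    librosa.feature.mfcc(y=s, sr=32000)
print(f"librosa MFCC, 1000 samples: {time.perf_counter() - start:.3f}s")
```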
Finally, audioFlux has been in development for about half a year and has been open source for just over two months, so there are certainly deficiencies and room for improvement. The team will continue to work hard and listen to community opinions and feedback.
Thank you for your participation and support. We hope the project keeps getting better.
Leo_D517 OP t1_jd7vszp wrote
Reply to comment by gootecks in [Project] Machine Learning for Audio: A library for audio analysis, feature extraction, etc by Leo_D517
Of course. You can use audioFlux to extract features from the sound-effect audio that needs to be detected, then build and train a model on those features.
After that, extract features in real time from the audio stream captured by the microphone and use the trained model for prediction, as in the sketch below.
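A minimal sketch of the real-time side. This assumes the third-party sounddevice package for microphone capture, and the feature extractor and classifier are placeholders you would swap for your own:

```python
import numpy as np
import sounddevice as sd  # third-party mic capture; an assumption for this sketch

SR = 32000
BLOCK = 4096  # 128 ms per block at 32000 Hz

def extract_features(block: np.ndarray) -> np.ndarray:
    # Placeholder: replace with the same audioFlux features the model was trained on.
    import librosa
    return librosa.feature.mfcc(y=block, sr=SR).mean(axis=1)

def predict(features: np.ndarray) -> None:
    # Placeholder: call your trained model here, e.g. model.predict([features]).
    pass

with sd.InputStream(samplerate=SR, channels=1, blocksize=BLOCK) as stream:
    while True:
        block, _overflowed = stream.read(BLOCK)  # shape: (BLOCK, 1)
        feats = extract_features(block[:, 0].astype(np.float32))
        predict(feats)
```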