一个简单任务的简单工具：从预先录制的演讲中删除填充声音（“ehm/嗯”）_python_Jupyter_代码

共13个文件

gitignore：3个

index：1个

dockerfile：1个

版权申诉

python

95 浏览量 2022-07-02 23:25:50 上传评论收藏 17.4MB ZIP 举报

资源推荐

资源详情

资源评论

收起资源包目录

simple-ehm-main.zip （13个子文件）

simple-ehm-main

trainingset

.gitignore 13B

model

variables

variables.data-00000-of-00001 18.6MB

variables.index 2KB

saved_model.pb 165KB

simple_ehm.ipynb 922KB

convert.sh 78B

Dockerfile 215B

LICENSE 1KB

requirements.txt 80B

.gitignore 57B

README.md 3KB

simple_ehm-runnable.py 13KB

training_data

.gitignore 0B

# simple-ehm A simple tool for a simple task: remove filler sounds ("ehm") from pre-recorded speeches. AI powered. Istruzioni in italiano in fondo al documento. # Usage Basic invokation should be enough: `./simple_emh-runnable.py /path/to/video/file` This will generate a subtitle track (`.srt`) for debugging and the output video in the same folder as the original file. For more info read the help: `./simple_emh-runnable.py --help` You can also run simple-ehm in a dockerized environment. Build the image using `docker build -t simple .` then, instead of using `./simple_ehm-runnable.py` use `./convert.sh` # Contributing to the model There are two ways you can contribute to the model: ## Contribute to the dataset By sending me at least 30 1-second long WAV pcm_s16le mono 16kHz clips for each class (silence, speech, ehm) [easy] - You can convert your clips to the right format with ffmpeg: `ffmpeg -i input-file -c:a pcm_s16le -ac 1 -ar 16000 -filter:a "dynaudnorm" output.wav` - You can extract ehm(s) and silences **along with erroneously classified sounds** (false positives) by passing `--generate-training-data` as an invocation parameter. You can then use the latter to improve your training set! ## Contribute to the training - By implementing transfer training logic on this model's python notebook - By retraining the current model with your dataset and make a PR with the updated one # ITA # simple-ehm Un semplice strumento per un semplice compito: rimuovere gli "ehm" (suoni di riempimento) da discorsi pre-registrati. # Utilizzo L'invocazione base dovrebbe essere sufficiente: `./simple_emh-runnable.py /percorso/al/file/video` Questo genererò una traccia di sottotitoli (`.srt`) per fini diagnostici e il video tagliato nella stessa cartella del file originale. Per maggiori informazioni sui parametri accettati, leggi la guida: `./simple_emh-runnable.py --help` Puoi anche utilizzare simple-ehm in un ambiente dockerizzato, per fare ciò crea l'immagine `docker build -t simple .`, e dove useresti `./simple_ehm-runnable.py` utilizza invece `./convert.sh` (N.B. per usare `./convert.sh` i file devono essere spostati prima in questa cartella) # Contribuire al modello Ci sono due modi in cui puoi contribuire al modello: ## Contribuisci al dataset Inviandomi almeno 30 clip in formato WAV (pcm_s16le) mono con campionamento a 16kHz per ciascuna classe (silenzio, parlato, ehm) [facile] - Puoi convertire le tue clip nel formato corretto con ffmpeg: `ffmpeg -i input-file -c:a pcm_s16le -ac 1 -ar 16000 -filter:a "dynaudnorm" output.wav` - Puoi estrarre gli ehm(s) e i silenzi **anche quelli classificati erroneamente** (falsi positivi) passando `--generate-training-data` come parametro di invocazione. Puoi usare le clip classificate erroneamente per migliorare il tuo training set! ## Contribuisci al training - Implementando la logica di transfer training sul notebook python di questo modello, e - Eseguendo il retraining della rete esistente con il tuo dataset ed inviandomi il modello aggiornato.

评论收藏

内容反馈

版权申诉