## Overview
Minor update:
- With Metal, auto-fallback to CPU if device does not support Apple7 family
- Add [server](https://github.com/ggerganov/whisper.cpp/tree/master/examples/server) example
## What's Changed
* ISSUE-1329: replace " with ' so it doesn't try to execute code in backticks by @spullara in https://github.com/ggerganov/whisper.cpp/pull/1364
* sync : ggml (ggml-alloc + linker + gguf fixes) by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1501
* Fixed with_state methods, to use the correct state by @sandrohanea in https://github.com/ggerganov/whisper.cpp/pull/1519
* #1517 Redistribute CUDA DLLs by @tamo in https://github.com/ggerganov/whisper.cpp/pull/1522
* whisper : reuse whisper_decode_with_state by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1521
* sdl : fix audio callback by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1523
* update deprecated example by @MightyStud in https://github.com/ggerganov/whisper.cpp/pull/1529
* Super Simple Whisper Server by @felrock in https://github.com/ggerganov/whisper.cpp/pull/1380
* Close file after writing in server application by @felrock in https://github.com/ggerganov/whisper.cpp/pull/1533
* bench : multi-thread memcpy by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1534
* Change temp file name for server application by @felrock in https://github.com/ggerganov/whisper.cpp/pull/1535
* Fixed Makefile for MacOS ARM 64 Go bindings by @gleicon in https://github.com/ggerganov/whisper.cpp/pull/1530
* Fixed metal build on macos-latest by @sandrohanea in https://github.com/ggerganov/whisper.cpp/pull/1544
* fix(server): typo in temperature parameter by @Okabintaro in https://github.com/ggerganov/whisper.cpp/pull/1545
* Request to add a new function to get the full language name by @bradmit in https://github.com/ggerganov/whisper.cpp/pull/1546
* server : add --print-realtime param by @ecneladis in https://github.com/ggerganov/whisper.cpp/pull/1541
* cuda : sync some minor stuff from llama.cpp by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1548
* metal : add backend function to check device family support by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1547
## New Contributors
* @spullara made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1364
* @MightyStud made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1529
* @felrock made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1380
* @gleicon made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1530
* @Okabintaro made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1545
* @bradmit made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1546
* @ecneladis made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1541
**Full Changelog**: https://github.com/ggerganov/whisper.cpp/compare/v1.5.0...v1.5.1
## Overview
This major release includes the following changes:
- Full GPU processing of the Encoder and the Decoder with CUDA and Metal is now supported
- Efficient beam-search implementation via batched decoding and unified KV cache
- Full quantization support of all available `ggml` quantization types
- Support for grammar constrained sampling
- Support for Distil Whisper models
- Support for Whisper Large-v3
- and more
### Full GPU support
On Apple Silicon, GPU support has been available to a large extent since [15 Sep](https://github.com/ggerganov/whisper.cpp/pull/1270). However, part of the Encoder was still being executed on the CPU due to the lack of MSL kernels for the convolution operations. These kernels are now available, resulting in an additional speed-up of the Encoder in this release:

*[Encoder performance on Apple M1 Max - before and after](https://github.com/ggerganov/whisper.cpp/pull/1472#issuecomment-1806788526) (plot by @dreness)*
For NVIDIA hardware, the entire computation can now be offloaded to the GPU, which results in a significant performance boost. For a detailed performance breakdown, check out the Benchmarks section below.
The GPU processing on Apple Silicon is enabled by default, while for NVIDIA you need to build with `WHISPER_CUBLAS=1`:
```bash
# Apple Silicon
make
# NVIDIA
WHISPER_CUBLAS=1 make
```
Implementation: https://github.com/ggerganov/whisper.cpp/pull/1472
Special credits to: @FSSRepo, @slaren
### Batched decoding + efficient Beam Search
At last, `whisper.cpp` supports efficient Beam Search decoding. The missing piece was the implementation of batched decoding, which now closely follows the [unified KV cache idea from llama.cpp](https://github.com/ggerganov/llama.cpp/pull/3228). On modern NVIDIA hardware, the performance with 5 beams is the same as with 1 beam, thanks to the large amount of computing power available. With Metal, 5 beams is a bit slower than 1 beam, but still significantly faster than the old naive implementation, which took 5x the single-batch time.
Beam Search is now enabled by default in `whisper.cpp` to match the OG implementation of OpenAI Whisper. For more performance details, check out the Benchmarks section below.
Implementation: https://github.com/ggerganov/whisper.cpp/pull/1486
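As a rough sketch of how beam search can be configured through the C API (illustrative only; check `whisper.h` in your checkout for the exact field names):

```c
#include "whisper.h"

// Illustrative sketch: enable beam-search decoding with 5 beams.
void transcribe_with_beams(struct whisper_context * ctx,
                           const float * pcm, int n_samples) {
    struct whisper_full_params wparams =
        whisper_full_default_params(WHISPER_SAMPLING_BEAM_SEARCH);

    wparams.beam_search.beam_size = 5; // number of beams kept per decoding step

    whisper_full(ctx, wparams, pcm, n_samples);
}
```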
### Quantization support
All `ggml` [quantization types](https://github.com/ggerganov/whisper.cpp/blob/ccc85b4ff8d250d0f25ebcac2be0e4a23401c885/ggml.h#L309-L331) are now supported, so quantization mixtures for the Whisper model can be implemented. It is still unclear how quality is affected by quantization - this is an interesting area to explore in the future.
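For reference, a model can be quantized with the bundled `quantize` tool; the type names follow the `ggml` enum, with `q5_0` shown here as one example:

```bash
# build the quantization tool and produce a 5-bit variant of the base.en model
make quantize
./quantize models/ggml-base.en.bin models/ggml-base.en-q5_0.bin q5_0
```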
### Grammar sampling
The decoder output can now be constrained with a [GBNF grammar](https://github.com/ggerganov/llama.cpp/blob/a6fc554e268634494f33b0de76f9dde650dd292f/grammars/README.md). This can be a useful technique for further improving the transcription quality in situations where the set of possible phrases is limited.
https://github.com/ggerganov/whisper.cpp/assets/377495/d24716e2-5e9c-441b-8c6b-395922dccbf4
Implementation: https://github.com/ggerganov/whisper.cpp/pull/1229
Special credits to @ejones
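As a rough illustration, a GBNF grammar constraining the output to a small command vocabulary might look like this (the phrase set here is invented for the example):

```
root    ::= command
command ::= "turn on the lights." | "turn off the lights." | "set a timer."
```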
### Distil Whisper
Recently, Distil Whisper models have been released: https://huggingface.co/distil-whisper
`whisper.cpp` offers support for these models, although it still lacks full implementation of the proposed chunking strategy. Performance details for distilled models are included in the Benchmarks section below.
Implementation: https://github.com/ggerganov/whisper.cpp/pull/1424
### Whisper Large-v3
Recently, OpenAI released version 3 of the Large model: https://github.com/openai/whisper/pull/1761
Implementation: https://github.com/ggerganov/whisper.cpp/pull/1444
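The new model can be fetched with the existing download script and used as usual (the `large-v3` model name is assumed to follow the script's naming convention):

```bash
# download the ggml-converted large-v3 model and transcribe a sample
bash ./models/download-ggml-model.sh large-v3
./main -m models/ggml-large-v3.bin -f samples/jfk.wav
```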
### Benchmarks
Below is a breakdown of the performance of `whisper.cpp` on Apple Silicon, NVIDIA and CPU. The tables show the Encoder and Decoder speed in `ms/tok`. The `Dec.` column corresponds to batch size 1. The `Bch5` column corresponds to batch size 5. The `PP` column corresponds to batch size 128.
For optimal Beam Search performance, the `Bch5` number should be 5 times smaller than `Dec.`
| Hw | Config | Model | Th | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M2 Ultra | METAL | tiny | 1 | 11.14 | 1.40 | 0.49 | 0.01 | ccc85b4 |
| M2 Ultra | METAL | tiny-q5_0 | 1 | 11.51 | 1.41 | 0.52 | 0.01 | ccc85b4 |
| M2 Ultra | METAL | tiny-q5_1 | 1 | 12.21 | 1.41 | 0.52 | 0.01 | ccc85b4 |
| M2 Ultra | METAL | base | 1 | 20.21 | 2.05 | 0.77 | 0.02 | ccc85b4 |
| M2 Ultra | METAL | base-q5_0 | 1 | 19.89 | 1.96 | 0.81 | 0.02 | ccc85b4 |
| M2 Ultra | METAL | base-q5_1 | 1 | 20.14 | 2.02 | 0.81 | 0.02 | ccc85b4 |
| M2 Ultra | METAL | small | 1 | 51.01 | 3.97 | 1.74 | 0.05 | ccc85b4 |
| M2 Ultra | METAL | small-q5_0 | 1 | 56.86 | 4.09 | 1.85 | 0.06 | ccc85b4 |
| M2 Ultra | METAL | small-q5_1 | 1 | 56.81 | 4.14 | 1.85 | 0.06 | ccc85b4 |
| M2 Ultra | METAL | medium | 1 | 141.21 | 8.47 | 3.98 | 0.13 | ccc85b4 |
| M2 Ultra | METAL | medium-q5_0 | 1 | 160.56 | 8.27 | 4.18 | 0.14 | ccc85b4 |
| M2 Ultra | METAL | medium-q5_1 | 1 | 160.52 | 8.40 | 4.15 | 0.14 | ccc85b4 |
| M2 Ultra | METAL | medium-dis | 1 | 128.14 | 1.13 | 0.43 | 0.02 | ccc85b4 |
| M2 Ultra | METAL | large-v2 | 1 | 248.73 | 11.96 | 6.08 | 0.22 | ccc85b4 |
| M2 Ultra | METAL | large-v2-q5_0 | 1 | 286.31 | 11.99 | 6.60 | 0.26 | ccc85b4 |
| M2 Ultra | METAL | large-v2-q5_1 | 1 | 284.56 | 12.42 | 6.47 | 0.26 | ccc85b4 |
| M2 Ultra | METAL | large-v2-dis | 1 | 224.31 | 1.26 | 0.49 | 0.02 | ccc85b4 |
| Hw | Config | Model | Th | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M2 Ultra | COREML METAL | tiny | 1 | 7.60 | 1.41 | 0.50 | 0.01 | ccc85b4 |
| M2 Ultra | COREML METAL | base | 1 | 11.90 | 2.07 | 0.78 | 0.02 | ccc85b4 |
| M2 Ultra | COREML METAL | small | 1 | 32.19 | 4.10 | 1.78 | 0.05 | ccc85b4 |
| M2 Ultra | COREML METAL | medium | 1 | 94.43 | 8.40 | 3.89 | 0.12 | ccc85b4 |
| M2 Ultra | COREML METAL | large-v2 | 1 | 179.78 | 12.12 | 6.07 | 0.22 | ccc85b4 |
| Hw | Config | Model | Th | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NVIDIA V100 | BLAS CUDA | tiny | 1 | 8.84 | 1.62 | 0.33 | 0.02 | ccc85b4 |
| NVIDIA V100 | BLAS CUDA | tiny-q5_0 | 1 | 8.43 | 1.19 | 0.31 | 0.02 | ccc85b4 |
| NVIDIA V100 | BLAS CUDA | tiny-q5_1 | 1 | 8.41 | 1.19 | 0.29 | 0.02 | ccc85b4 |
| NVIDIA V100 | BLAS CUDA | base | 1 | 14.79 | 2.31 | 0.46 | 0.03 | ccc85b4 |
| NVIDIA V100 | BLAS CUDA | base-q5_0 | 1 | 15.05 | 1.66 | 0.44 | 0.03 | ccc85b4 |
| NVIDIA V100 | BLAS CUDA | base-q5_1 | 1 | 15.01 | 1.68 | 0.46 | 0.03 | ccc85b4 |
| NVIDIA V100 | BLAS CUDA | small | 1 | 40.30 | 4.37 | 0.88 | 0.05 | ccc85b4 |
| NVIDIA V100 | BLAS CUDA | small-q5_0 | 1 | 41.17 | 3.11 | 0.94 | 0.05 | ccc85b4 |
| NVIDIA V100 | BLAS CUDA | small-q5_1 | 1 | 41.12 | 3.11 | 0.82 | 0.05 | ccc85b4 |
| NVIDIA V100 | BLAS CUDA | medium | 1 | 104.93 | 10.06 | 1.77 | 0.11 | ccc85b4 |
| NVIDIA V100 | BLAS CUDA | medium-q5_0 | 1 | 107.11 | 6.13 | 2.07 | 0.12 | ccc85b4 |
| NVIDIA V100 | BLAS CUDA | medium-q5_1 | 1 | 107.91 | 6.21 | 1.77 | 0.12 | ccc85b4 |
| NVIDIA V100 | BLAS CUDA | medium-dis | 1 | 103.45 | 1.11 | 0.24 | 0.02 | ccc85b4 |
| NVIDIA V100 | BLAS CUDA | large-v2 | 1 | 171.55 | 15.76 | 2.62 | 0.17 | ccc85b4 |
| NVIDIA V100 | BLAS CUDA | large-v2-q5_0 | 1 | 176.27 | 8.61 | 3.17 | 0.19 | ccc85b4 |
| NVIDIA V100 | BLAS CUDA | large-v2-q5_1 | 1 | 176.23 | 8.67 | 2.59 | 0.19 | ccc85b4 |
| Hw | Config | Model | Th | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AMD Ryzen 9 5950X | AVX2 | tiny | 8 | 197.47 | 1.22 | 0.44 | 0.25 | ccc85b4 |
| AMD Ryzen 9 5950X | AVX2 | tiny-q5_0 | 8 | 222.92 | 0.87 | 0.45 | 0.30 | ccc85b4 |
| AMD Ryzen 9 5950X | AVX2 | tiny-q5_1 | 8 | 221.25 | 0.89 | 0.45 | 0.30 | ccc85b4 |
| AMD Ryzen 9 5950X | AVX2 | base | 8 | 427.14 | 3.11 | 0.88 | 0.43 | ccc85b4 |
| AMD Ryzen 9 5950X | AVX2 | base-q5_0 | 8 | 474.96 | 1.41 | 0.72 | 0.51 | ccc85b4 |
| AMD Ryzen 9 5950X | AVX2 | base-q5_1 | 8 | 485.05 | 1.48 | 0.73 | 0.52 | ccc85b4 |
| AMD Ryzen 9 5950X | AVX2 | small | 8 | 1470.51 | 11.70 | 2.89 | 1.21 | ccc85b4 |
| AMD Ryzen 9 5950X | AVX2 | small-q5_0 | 8 | 1700.43 | 5.48 | 1.98 | 1.41 | ccc85b4 |
| AMD Ryzen 9 5950X | AVX2 | small-q5_1 | 8 | 1719.03 | 5.79 | 2.02 | 1.42 | ccc85b4 |
| AMD Ryzen 9 5950X | AVX2 | medium | 8 | 4417.70 | 35.13 | 8.14 | 3.24 | ccc85b4 |
| AMD Ryzen 9 5950X | AVX2 | medium-q5_0 | 8 | 5335.77 | 17.44 | 5.35 | 3.92 | ccc85b4 |
| AMD Ryzen 9 5950X | AVX2 | medium-q5_1 | 8 | 5372.26 | 18.36 | 5.42 | 3.88 | ccc85b4 |
| AMD Ryzen 9 5950X | AVX2 | medium-dis | 8 | 4070.25 | 4.86 | 1.16 | 0.53 | ccc85b4 |
| AMD Ryzen 9 5950X | AVX2 | large-v2 | 8 | 8179.09 | 66.89 | 15.45 | 5.88 | ccc85b4 |
| AMD Ryzen 9 5950X | AVX2 | large-v2-dis | 8 | 7490.45 | 7.06 | 1.63 | 0.70 | ccc85b4 |
### API Changes
- Add `struct whisper_context_params`
- Add `whisper_log_set`
- Deprecate:
- `whisper_init_from_file`
- `whisper_init_from_buffer`
- `whisper_init`
- `whisper_init_from_file_no_state`
- `whisper_init_from_buffer_no_state`
- `whisper_init_no_state`
- Add:
- `whisper_init_from_file_with_params`
- `whisper_init_from_buffer_with_params`
- `whisper_init_with_params`
- `whisper_init_from_file_with_params_no_state`
- `whisper_init_from_buffer_with_params_no_state`
- `whisper_init_with_params_no_state`
- Diff of `struct whisper_full_params`
```diff
struct whisper_full_params {
enum whisper_sampling_strategy strategy;
@@ -338,6 +435,7 @@ extern "C" {
bool translate;
bool no_context; // do not use past transcription (if any) as initial prompt for the decoder
+ bool no_timestamps; // do not generate timestamps
bool single_segment; // force single segment output (useful for streaming)
bool print_special; // print special tokens (e.g. <SOT>, <EOT>, <BEG>, etc.)
bool print_progress; // print progress information
@@ -355,8 +453,12 @@ extern "C" {
// [EXPERIMENTAL] speed-up techniques
// note: these can significantly reduce the quality of the output
bool speed_up; // speed-up the audio by 2x using Phase Vocoder
+ bool debug_mode; // enable debug_mode provides extra info (eg. Dump log_mel)
int audio_ctx; // overwrite the audio context size (0 = use default)
+ // [EXPERIMENTAL] [TDRZ] tinydiarize
+ bool tdrz_enable; // enable tinydiarize speaker turn detection
+
// tokens to provide to the whisper decoder as initial prompt
// these are prepended to any existing text context from a previous call
const char * initial_prompt;
@@ -365,6 +467,7 @@ extern "C" {
// for auto-detection, set to nullptr, "" or "auto"
const char * language;
+ bool detect_language;
// common decoding parameters:
bool suppress_blank; // ref: https://github.com/openai/whisper/blob/f82bc59f5ea234d4b97fb2860842ed38519f7e65/whisper/decoding.py#L89
@@ -403,11 +506,24 @@ extern "C" {
whisper_encoder_begin_callback encoder_begin_callback;
void * encoder_begin_callback_user_data;
+ // called each time before ggml computation starts
+ whisper_abort_callback abort_callback;
+ void * abort_callback_user_data;
+
// called by each decoder to filter obtained logits
whisper_logits_filter_callback logits_filter_callback;
void * logits_filter_callback_user_data;
+
+ const whisper_grammar_element ** grammar_rules;
+ size_t n_grammar_rules;
+ size_t i_start_rule;
+ float grammar_penalty;
};
```
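Putting the new `_with_params` initializers and the new `abort_callback` together, a minimal sketch (assuming the standard `whisper.h` declarations; `use_gpu` is the context-params field added in this release, and the callback is assumed to return `true` to abort):

```c
#include <stdbool.h>
#include <stddef.h>
#include "whisper.h"

static volatile bool g_abort = false; // flip from another thread to cancel

// Assumed semantics: returning true aborts the computation before the
// next ggml graph is evaluated.
static bool my_abort_cb(void * user_data) {
    (void) user_data;
    return g_abort;
}

int transcribe(const char * model_path, const float * pcm, int n_samples) {
    // the deprecated whisper_init_from_file(path) becomes:
    struct whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu = true; // set to false to force CPU-only processing

    struct whisper_context * ctx =
        whisper_init_from_file_with_params(model_path, cparams);
    if (ctx == NULL) {
        return 1;
    }

    struct whisper_full_params wparams =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    wparams.abort_callback           = my_abort_cb;
    wparams.abort_callback_user_data = NULL;

    const int ret = whisper_full(ctx, wparams, pcm, n_samples);

    whisper_free(ctx);
    return ret;
}
```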
There might be some instability around the API, especially with the existing language bindings. I wasn't able to test everything, so expect some issues and feel free to submit PRs with any kind of fixes that you find.
## Highlights and what's next
A lot of the updates in this release are possible thanks to the many contributions in [llama.cpp](https://github.com/ggerganov/llama.cpp) - huge shoutout to all the contributors and collaborators there!
Regarding future updates to `whisper.cpp`, I'm looking forward to the following things:
- Add server example similar to the one in `llama.cpp`
- Try to improve Metal's batched decoding performance
- Look for some interesting applications of the grammar sampling functionality
---
- **Latest performance of the [talk-llama](https://github.com/ggerganov/whisper.cpp/tree/master/examples/talk-llama) example**
https://github.com/ggerganov/whisper.cpp/assets/1991296/d97a3788-bf2a-4756-9a43-60c6b391649e
## What's Changed
* Fix quantize bug by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/842
* whisper.wasm : fix typo in readme by @BaffinLee in https://github.com/ggerganov/whisper.cpp/pull/832
* Adding --session support in examples/talk-llama by @herrera-luis in https://github.com/ggerganov/whisper.cpp/pull/845
* --detect-language mode by @CRD716 in https://github.com/ggerganov/whisper.cpp/pull/853
* talk-llama: updating session prompts load by @herrera-luis in https://github.com/ggerganov/whisper.cpp/pull/854
* CMake/Makefile : CLBlast support as in llama.cpp by @trholding in https://github.com/ggerganov/whisper.cpp/pull/862
* Instruction: Partial OpenCL GPU support via CLBlast by @trholding in https://github.com/ggerganov/whisper.cpp/pull/863
* Add cuBLAS build workflow and fix error causing lines in CMakeLists by @RelatedTitle in https://github.com/ggerganov/whisper.cpp/pull/867
* cmake : fix options disabling AVX and AVX2 flags by @blazingzephyr in https://github.com/ggerganov/whisper.cpp/pull/885
* Added large-v2. Added instructions on converting to GGML. Added --no-… by @cjheath in https://github.com/ggerganov/whisper.cpp/pull/874
* talk-llama: only copy used KV cache in get / set state by @herrera-luis in https://github.com/ggerganov/whisper.cpp/pull/890
* Fix define used for COREML_ALLOW_FALLBACK by @jcsoo in https://github.com/ggerganov/whisper.cpp/pull/893
* coreml : fix memory leak by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/899
* whisper.objc : enable Core ML in example & fix segmentation fault by @jhen0409 in https://github.com/ggerganov/whisper.cpp/pull/910
* Align --no-timestamps in help to actual behavior by @Miserlou in https://github.com/ggerganov/whisper.cpp/pull/908
* readme : improve Core ML model conversion guidance by @jhen0409 in https://github.com/ggerganov/whisper.cpp/pull/915
* Added support of large-v1 model into CoreML by @abCods in https://github.com/ggerganov/whisper.cpp/pull/926
* Update of Hebrew Language Code: 'iw' to 'he' by @ttv20 in https://github.com/ggerganov/whisper.cpp/pull/935
* java bindings by @nalbion in https://github.com/ggerganov/whisper.cpp/pull/931
* ci: Build with any BLAS compatible library by @akharlamov in https://github.com/ggerganov/whisper.cpp/pull/927
* [DOCS] highlight openblas support in https://github.com/ggerganov/whisper.cpp/pull/956
* Update elevenlabs example to use official python API by @DGdev91 in https://github.com/ggerganov/whisper.cpp/pull/837
* Update README.md by @genevera in https://github.com/ggerganov/whisper.cpp/pull/964
* Feature/java bindings2 by @nalbion in https://github.com/ggerganov/whisper.cpp/pull/944
* Support decode wav file has 2 channels. by @geniusnut in https://github.com/ggerganov/whisper.cpp/pull/972
* README.md: Corrected syntax for markdown link by @LarryBattle in https://github.com/ggerganov/whisper.cpp/pull/995
* Make convert-pt-to-ggml.py backwards compatible with older vocab.json tokenizer files by @akashmjn in https://github.com/ggerganov/whisper.cpp/pull/1001
* Fixing Accidental 'exit(0)' and Ensuring Proper 'return 1' in `examples/main/main.cpp` `whisper_params_parse` by @faker2048 in https://github.com/ggerganov/whisper.cpp/pull/1002
* Fix for issue #876 by @burningion in https://github.com/ggerganov/whisper.cpp/pull/1012
* Make cuBLAS compilation compatible with x86 as well as aarch64 by @byte-6174 in https://github.com/ggerganov/whisper.cpp/pull/1015
* feat(golang): improve progress reporting and callback handling by @appleboy in https://github.com/ggerganov/whisper.cpp/pull/1024
* Add support for whisper_full_lang_id() to go bindings by @jaybinks in https://github.com/ggerganov/whisper.cpp/pull/1010
* Add alternative java binding to readme by @GiviMAD in https://github.com/ggerganov/whisper.cpp/pull/1029
* diarization: add diarization support for all current output types by @colinc in https://github.com/ggerganov/whisper.cpp/pull/1031
* Fix cd statements to allow spaces in model path by @roddurd in https://github.com/ggerganov/whisper.cpp/pull/1041
* adding ggml_to_pt script by @simonMoisselin in https://github.com/ggerganov/whisper.cpp/pull/1042
* whisper: Fix build with -Werror=undef by @philn in https://github.com/ggerganov/whisper.cpp/pull/1045
* Fix talk-llama build after ggml sync (commit 5feb0dffbae5). by @przemoc in https://github.com/ggerganov/whisper.cpp/pull/1049
* Do not use _GNU_SOURCE gratuitously. by @przemoc in https://github.com/ggerganov/whisper.cpp/pull/1027
* whisper : `split_on_word` no longer trims by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1046
* Updated 'quantize-all.sh' to quantize all downloaded models by @thefinaldegree in https://github.com/ggerganov/whisper.cpp/pull/1054
* Fix talk-llama build on macOS. by @przemoc in https://github.com/ggerganov/whisper.cpp/pull/1062
* whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize by @akashmjn in https://github.com/ggerganov/whisper.cpp/pull/1058
* Minor: updated readme by @mwarnaar in https://github.com/ggerganov/whisper.cpp/pull/1064
* OpenVINO support by @RyanMetcalfeInt8 in https://github.com/ggerganov/whisper.cpp/pull/1037
* go bindings: fix context.Process call in examples by @mvrilo in https://github.com/ggerganov/whisper.cpp/pull/1067
* go: Call SetDuration appropriately by @tmc in https://github.com/ggerganov/whisper.cpp/pull/1077
* Multi platforms CI by @alonfaraj in https://github.com/ggerganov/whisper.cpp/pull/1101
* Add Vim plugin by @AustinMroz in https://github.com/ggerganov/whisper.cpp/pull/1131
* chore: move progress calculation out of whisper.cpp by @geekodour in https://github.com/ggerganov/whisper.cpp/pull/1081
* expose api to let user control log output by @evmar in https://github.com/ggerganov/whisper.cpp/pull/1060
* Add a larger (30min) sample by @vadi2 in https://github.com/ggerganov/whisper.cpp/pull/1092
* Sync opencl compilation fix in ggml by @goncha in https://github.com/ggerganov/whisper.cpp/pull/1111
* README.md: Add OpenVINO support details by @RyanMetcalfeInt8 in https://github.com/ggerganov/whisper.cpp/pull/1112
* Fix MSVC compile error C3688 on non-unicode Windows by @goncha in https://github.com/ggerganov/whisper.cpp/pull/1110
* Now make tests can be called as make tests base.en by @Jerry-Master in https://github.com/ggerganov/whisper.cpp/pull/1113
* Go binding: Implement SetSplitOnWord by @xdrudis in https://github.com/ggerganov/whisper.cpp/pull/1114
* set NVCC -arch flag by cuda version by @alonfaraj in https://github.com/ggerganov/whisper.cpp/pull/1115
* Fix CLBlast build on MacOS by @iceychris in https://github.com/ggerganov/whisper.cpp/pull/1120
* Fixed the issue of OpenBLAS not being enabled on Windows. by @bobqianic in https://github.com/ggerganov/whisper.cpp/pull/1128
* whisper : fix visibility warning of struct whisper_full_params by declaring in advance by @IronBlood in https://github.com/ggerganov/whisper.cpp/pull/1124
* Fix MSVC compile error C3688 by @bobqianic in https://github.com/ggerganov/whisper.cpp/pull/1136
* Add tinydiarization support for streaming by @DMcConnell in https://github.com/ggerganov/whisper.cpp/pull/1137
* quantize : fix load vocab crash when len is 128 by @jhen0409 in https://github.com/ggerganov/whisper.cpp/pull/1160
* Fix AVX etc. under GCC/CMake by @marmistrz in https://github.com/ggerganov/whisper.cpp/pull/1174
* Fix PowerPC build failures introduced in #1174 by @marmistrz in https://github.com/ggerganov/whisper.cpp/pull/1196
* Simplify Makefile by @alonfaraj in https://github.com/ggerganov/whisper.cpp/pull/1147
* Add precalculated values of sin/cos for speeding up FFT by @AlexandrGraschenkov in https://github.com/ggerganov/whisper.cpp/pull/1142
* Make build work on Linux machines supporting AVX1 not AVX2 by @lachesis in https://github.com/ggerganov/whisper.cpp/pull/1162
* Fix OpenBLAS detection under Arch Linux by @marmistrz in https://github.com/ggerganov/whisper.cpp/pull/1173
* Minor fixes by @csukuangfj in https://github.com/ggerganov/whisper.cpp/pull/1154
* New command line option by @jbyunes in https://github.com/ggerganov/whisper.cpp/pull/1205
* whisper.android : migrate from ndk-build to CMake by @JunkFood02 in https://github.com/ggerganov/whisper.cpp/pull/1204
* Significantly improve whisper.cpp inference quality by @bobqianic in https://github.com/ggerganov/whisper.cpp/pull/1148
* whisper : allow whisper_full from mel spectrogram - no audio by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1214
* ROCm Port by @ardfork in https://github.com/ggerganov/whisper.cpp/pull/1209
* Improvements to vim plugin and LSP server by @AustinMroz in https://github.com/ggerganov/whisper.cpp/pull/1144
* Detect SSSE3 by @przemoc in https://github.com/ggerganov/whisper.cpp/pull/1211
* ggml : fix compiling when SSE3 is available but not SSSE3 by @przemoc in https://github.com/ggerganov/whisper.cpp/pull/1210
* make : add support for building on DragonFlyBSD/NetBSD/OpenBSD by @przemoc in https://github.com/ggerganov/whisper.cpp/pull/1212
* make : use cpuinfo in MSYS2 to enable x86 ISA extensions on the host by @przemoc in https://github.com/ggerganov/whisper.cpp/pull/1216
* Fix CoreML memleak (fixes #1202) by @denersc in https://github.com/ggerganov/whisper.cpp/pull/1218
* whisper.android : fix cmake multiple libraries build by @jhen0409 in https://github.com/ggerganov/whisper.cpp/pull/1224
* Fix compilation errors incurred by -Werror by @shivamidow in https://github.com/ggerganov/whisper.cpp/pull/1227
* ci : enable java package publishing by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1228
* fix cmake commands in README #1225 by @wizardforcel in https://github.com/ggerganov/whisper.cpp/pull/1231
* ggml : sync (ggml-alloc, GPU, eps, etc.) by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1220
* make : improve cpuinfo handling on x86 hosts by @przemoc in https://github.com/ggerganov/whisper.cpp/pull/1238
* ggml : sync latest llama.cpp (view_src + alloc improvements) by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1247
* Posixify pagesize. by @przemoc in https://github.com/ggerganov/whisper.cpp/pull/1251
* Fix detection of AVX2 on macOS by @didzis in https://github.com/ggerganov/whisper.cpp/pull/1250
* Address ARM's big.LITTLE arch by checking cpu info. by @Digipom in https://github.com/ggerganov/whisper.cpp/pull/1254
* Bump gradle plugin and dependencies + a lint pass by @Digipom in https://github.com/ggerganov/whisper.cpp/pull/1255
* Add quantized models to download-ggml-model.sh by @nchudleigh in https://github.com/ggerganov/whisper.cpp/pull/1235
* Do not use _GNU_SOURCE gratuitously. by @przemoc in https://github.com/ggerganov/whisper.cpp/pull/1129
* ci : upgrade gradle to 2.4.2 by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1263
* sync : ggml (HBM + Metal + style) by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1264
* ci : try to fix gradle action by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1265
* Fixed signing of java artifact using gradle by @nalbion in https://github.com/ggerganov/whisper.cpp/pull/1267
* Faster `beam_search` sampling by @bobqianic in https://github.com/ggerganov/whisper.cpp/pull/1243
* whisper : fix bench regression by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1275
* whisper : Metal and ggml-alloc support by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1270
* bench: fix missing include by @nekr0z in https://github.com/ggerganov/whisper.cpp/pull/1303
* ruby : fix build by add missing ggml-alloc by @jhen0409 in https://github.com/ggerganov/whisper.cpp/pull/1305
* Update README.md. Adding missing options, remove `--speed-up`. by @Sogl in https://github.com/ggerganov/whisper.cpp/pull/1306
* Update README.md by @computerscienceiscool in https://github.com/ggerganov/whisper.cpp/pull/1290
* save the recorded audio to a file by @litongjava in https://github.com/ggerganov/whisper.cpp/pull/1310
* Python benchmark script by @nchudleigh in https://github.com/ggerganov/whisper.cpp/pull/1298
* Minor: fix example talk readme gpt-2 github url by @brunofaustino in https://github.com/ggerganov/whisper.cpp/pull/1334
* Missing speaker turn function in API by @didzis in https://github.com/ggerganov/whisper.cpp/pull/1330
* examples: Move wav_writer from stream.cpp to common.h by @bobqianic in https://github.com/ggerganov/whisper.cpp/pull/1317
* Better abort callback by @mkiol in https://github.com/ggerganov/whisper.cpp/pull/1335
* Add conversion scripts from HuggingFace models to CoreML by @AlienKevin in https://github.com/ggerganov/whisper.cpp/pull/1304
* Prefer pkg-config while looking for BLAS by @marmistrz in https://github.com/ggerganov/whisper.cpp/pull/1349
* Abort build if a feature was requested and could not be configured by @marmistrz in https://github.com/ggerganov/whisper.cpp/pull/1350
* Abort callback improvements by @mkiol in https://github.com/ggerganov/whisper.cpp/pull/1345
* Dockerfile for cublas by @joecryptotoo in https://github.com/ggerganov/whisper.cpp/pull/1286
* docs: fix typo by @jorismertz in https://github.com/ggerganov/whisper.cpp/pull/1362
* Expose the audio_ctx param through the Go binding by @JohanRaffin in https://github.com/ggerganov/whisper.cpp/pull/1368
* Clarify doc about where to compile from by @ai-at-home in https://github.com/ggerganov/whisper.cpp/pull/1400
* Faster download for models on windows using BitTransfer by @WhiteOlivierus in https://github.com/ggerganov/whisper.cpp/pull/1404
* JSON: allow outputting per-token data too by @akx in https://github.com/ggerganov/whisper.cpp/pull/1358
* Move up-to-date demo to top by @asadm in https://github.com/ggerganov/whisper.cpp/pull/1417
* Use absolute paths for the converted OpenVINO model by @bobqianic in https://github.com/ggerganov/whisper.cpp/pull/1356
* sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.) by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1422
* whisper : add support for new distilled Whisper models by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1424
* whisper : add context param for disable gpu by @jhen0409 in https://github.com/ggerganov/whisper.cpp/pull/1293
* talk-llama : fix n_gpu_layers usage by @jhen0409 in https://github.com/ggerganov/whisper.cpp/pull/1441
* talk-llama : fix n_gpu_layers usage again by @jhen0409 in https://github.com/ggerganov/whisper.cpp/pull/1442
* Fix variable names in GitHub actions config by @iamthad in https://github.com/ggerganov/whisper.cpp/pull/1440
* Reset ctx->t_start_us when calling whisper_reset_timings() by @bjnortier in https://github.com/ggerganov/whisper.cpp/pull/1434
* Decouple Android example into a library and app module by @tobrun in https://github.com/ggerganov/whisper.cpp/pull/1445
* whisper : add support for large v3 by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1444
* Add support for Swift Package Manager by @sindresorhus in https://github.com/ggerganov/whisper.cpp/pull/1370
* Reset mel time when resetting timings by @bjnortier in https://github.com/ggerganov/whisper.cpp/pull/1452
* coreml: use the correct n_mel by @jxy in https://github.com/ggerganov/whisper.cpp/pull/1458
* models : Fix `n_mel` mismatch in convert-whisper-to-openvino.py by @bobqianic in https://github.com/ggerganov/whisper.cpp/pull/1459
* Add '-l auto' to talk-llama example by @kubaracek in https://github.com/ggerganov/whisper.cpp/pull/1467
* Return with error from whisper_encode_internal and whisper_decode_int… by @bjnortier in https://github.com/ggerganov/whisper.cpp/pull/1456
* whisper : add full CUDA and Metal offloading by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1472
* examples : Enhanced compatibility with older Android versions using Java by @litongjava in https://github.com/ggerganov/whisper.cpp/pull/1382
* Add n_gpu_layers option to talk-llama example by @rlapray in https://github.com/ggerganov/whisper.cpp/pull/1475
* whisper : add grammar-based sampling by @ejones in https://github.com/ggerganov/whisper.cpp/pull/1229
* java : use tiny.en for tests by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1484
* whisper : add batched decoding by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1486
* java : fix test by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1492
* whisper : make large version explicit + fix data size units by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/1493
## New Contributors
* @BaffinLee made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/832
* @herrera-luis made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/845
* @CRD716 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/853
* @trholding made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/862
* @RelatedTitle made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/867
* @blazingzephyr made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/885
* @cjheath made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/874
* @jcsoo made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/893
* @Miserlou made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/908
* @abCods made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/926
* @ttv20 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/935
* @nalbion made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/931
* @akharlamov made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/927
* @geniusnut made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/972
* @LarryBattle made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/995
* @akashmjn made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1001
* @faker2048 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1002
* @burningion made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1012
* @byte-6174 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1015
* @appleboy made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1024
* @jaybinks made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1010
* @GiviMAD made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1029
* @colinc made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1031
* @roddurd made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1041
* @simonMoisselin made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1042
* @philn made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1045
* @przemoc made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1049
* @thefinaldegree made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1054
* @mwarnaar made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1064
* @RyanMetcalfeInt8 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1037
* @mvrilo made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1067
* @tmc made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1077
* @alonfaraj made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1101
* @AustinMroz made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1131
* @geekodour made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1081
* @evmar made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1060
* @vadi2 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1092
* @goncha made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1111
* @Jerry-Master made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1113
* @xdrudis made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1114
* @iceychris made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1120
* @bobqianic made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1128
* @IronBlood made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1124
* @DMcConnell made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1137
* @marmistrz made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1174
* @AlexandrGraschenkov made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1142
* @lachesis made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1162
* @csukuangfj made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1154
* @jbyunes made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1205
* @JunkFood02 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1204
* @ardfork made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1209
* @denersc made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1218
* @shivamidow made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1227
* @wizardforcel made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1231
* @didzis made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1250
* @nchudleigh made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1235
* @nekr0z made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1303
* @Sogl made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1306
* @computerscienceiscool made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1290
* @litongjava made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1310
* @brunofaustino made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1334
* @mkiol made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1335
* @AlienKevin made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1304
* @joecryptotoo made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1286
* @jorismertz made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1362
* @JohanRaffin made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1368
* @ai-at-home made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1400
* @WhiteOlivierus made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1404
* @akx made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1358
* @asadm made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1417
* @iamthad made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1440
* @bjnortier made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1434
* @tobrun made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1445
* @sindresorhus made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1370
* @jxy made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1458
* @kubaracek made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1467
* @rlapray made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/1475
**Full Changelog**: https://github.com/ggerganov/whisper.cpp/compare/v1.4.0...v1.5.0
This is a minor release. The main reason for it is that there hasn't been an official release for a few months now, and some small things have accumulated on the `master` branch that are worth upstreaming. I am planning a major `v1.5.0` release soon with some new and long-awaited functionality:
- Full CUDA offloading
- Efficient Beam-Search implementation
- Grammar support
The current version `v1.4.3` should be considered in beta, as I haven't worked intensively on `whisper.cpp` recently and some issues might have made their way into the code. I'll try to polish things in the coming days and prepare a stable `v1.5.0` release. In the meantime, any feedback will be highly appreciated.
***Detailed API changes, features and new contributor recognitions will be included in the `v1.5.0` release.***
## Overview
This is a new major release adding **integer quantization** and **partial GPU (NVIDIA)** support
### Integer quantization
This allows the `ggml` Whisper models to be converted from the default 16-bit floating point weights to 4, 5 or 8 bit integer weights.
The resulting quantized models are smaller on disk, use less memory, and can be processed faster on some architectures. The transcription quality is degraded to some extent - this has not been quantified yet.
- Supported quantization modes: `Q4_0`, `Q4_1`, `Q4_2`, `Q5_0`, `Q5_1`, `Q8_0`
- Implementation details: https://github.com/ggerganov/whisper.cpp/pull/540
- Usage instructions: [README](https://github.com/ggerganov/whisper.cpp#quantization)
- All WASM examples now support `Q5` quantized models: https://whisper.ggerganov.com
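The bits/weight column in the tables below follows directly from the on-disk block layout of each format: weights are quantized in blocks of 32, and each block additionally stores one or two floating-point scale/min fields. Here is a minimal sketch of that arithmetic - the exact field sizes are assumed from the `ggml` formats of this era, so treat them as illustrative:

```python
BLOCK = 32  # weights per quantization block

# (quant bits per weight, extra bits per block for scale/min fields)
# Field sizes are assumptions matching the formats as of this release.
FORMATS = {
    "Q4_0": (4, 32),  # 4-bit weights + one fp32 scale
    "Q4_1": (4, 64),  # 4-bit weights + fp32 scale + fp32 min
    "Q5_0": (5, 16),  # 5-bit weights (incl. high bits) + one fp16 scale
    "Q5_1": (5, 32),  # 5-bit weights + fp16 scale + fp16 min
    "Q8_0": (8, 32),  # 8-bit weights + one fp32 scale
}

def bits_per_weight(qbits: int, extra_bits: int, block: int = BLOCK) -> float:
    """Total bits stored per block divided by the number of weights."""
    return (qbits * block + extra_bits) / block

for name, (qbits, extra) in FORMATS.items():
    print(f"{name}: {bits_per_weight(qbits, extra):.1f} bits/weight")
```

With these layouts the computed values reproduce the bits/weight rows in the LLaMA table below (5.0, 6.0, 5.5, 6.0 and 9.0).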
Here is a quantitative evaluation of the different quantization modes applied to the [LLaMA](https://github.com/facebookresearch/llama) and [RWKV](https://github.com/BlinkDL/RWKV-LM) large language models. These results give an impression of the expected quality, size and speed of quantized Whisper models:
#### LLaMA quantization (measured on M1 Pro)
| Model | Measure | F16 | Q4_0 | Q4_1 | Q4_2 | Q5_0 | Q5_1 | Q8_0 |
|------:|--------------|-------:|-------:|-------:|-------:|-------:|-------:|-------:|
| 7B | perplexity | 5.9565 | 6.2103 | 6.1286 | 6.1698 | 6.0139 | 5.9934 | 5.9571 |
| 7B | file size | 13.0G | 4.0G | 4.8G | 4.0G | 4.4G | 4.8G | 7.1G |
| 7B | ms/tok @ 4th | 128 | 56 | 61 | 84 | 91 | 95 | 75 |
| 7B | ms/tok @ 8th | 128 | 47 | 55 | 48 | 53 | 59 | 75 |
| 7B | bits/weight | 16.0 | 5.0 | 6.0 | 5.0 | 5.5 | 6.0 | 9.0 |
| 13B | perplexity | 5.2455 | 5.3748 | 5.3471 | 5.3433 | 5.2768 | 5.2582 | 5.2458 |
| 13B | file size | 25.0G | 7.6G | 9.1G | 7.6G | 8.4G | 9.1G | 14G |
| 13B | ms/tok @ 4th | 239 | 104 | 113 | 160 | 176 | 185 | 141 |
| 13B | ms/tok @ 8th | 240 | 85 | 99 | 97 | 108 | 117 | 147 |
| 13B | bits/weight | 16.0 | 5.0 | 6.0 | 5.0 | 5.5 | 6.0 | 9.0 |
ref: https://github.com/ggerganov/llama.cpp#quantization
#### RWKV quantization
| Format | Perplexity (169M) | Latency, ms (1.5B) | File size, GB (1.5B) |
|-----------|-------------------|--------------------|----------------------|
| `Q4_0` | 17.507 | *76* | **1.53** |
| `Q4_1` | 17.187 | **72** | 1.68 |
| `Q4_2` | 17.060 | 85 | **1.53** |
| `Q5_0` | 16.194 | 78 | *1.60* |
| `Q5_1` | 15.851 | 81 | 1.68 |
| `Q8_0` | *15.652* | 89 | 2.13 |
| `FP16` | **15.623** | 117 | 2.82 |
| `FP32` | **15.623** | 198 | 5.64 |
ref: https://github.com/ggerganov/ggml/issues/89#issuecomment-1528781992
This feature is possible thanks to the many contributions in the [llama.cpp](https://github.com/ggerganov/llama.cpp) project: https://github.com/users/ggerganov/projects/2
### GPU support via cuBLAS
Using cuBLAS mainly improves Encoder inference speed. I haven't done proper timings, but one can expect Encoder evaluation to be at least 2-3 times faster on modern NVIDIA GPUs compared to CPU-only processing. Feel free to post your Encoder benchmarks in issue #89.
- Implementation details: https://github.com/ggerganov/whisper.cpp/pull/834
- Usage instructions: [README](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
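For reference, enabling cuBLAS is a build-time switch. This sketch follows the README instructions linked above (flag name taken from those instructions; the CUDA toolkit must be installed):

```shell
# Make build with cuBLAS enabled
make clean
WHISPER_CUBLAS=1 make -j

# or, equivalently, with CMake
cmake -B build -DWHISPER_CUBLAS=1
cmake --build build
```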
This is another feature made possible by the [llama.cpp](https://github.com/ggerganov/llama.cpp) project. Special recognition to @slaren for putting almost all of this work together.
---
This release remains in "beta" stage as I haven't verified that everything works as expected.
## What's Changed
* Updated escape_double_quotes() Function by @tauseefmohammed2 in https://github.com/ggerganov/whisper.cpp/pull/776
* examples : add missing #include <cstdint> by @pH5 in https://github.com/ggerganov/whisper.cpp/pull/798
* Flush upon finishing inference by @tarasglek in https://github.com/ggerganov/whisper.cpp/pull/811
* Escape quotes in csv output by @laytan in https://github.com/ggerganov/whisper.cpp/pull/815
* C++11style by @wuyudi in https://github.com/ggerganov/whisper.cpp/pull/768
* Optionally allow a Core ML build of Whisper to work with or without Core ML models by @Canis-UK in https://github.com/ggerganov/whisper.cpp/pull/812
* add some tips about in the readme of the android project folder by @Zolliner in https://github.com/ggerganov/whisper.cpp/pull/816
* whisper: Use correct seek_end when offset is used by @ThijsRay in https://github.com/ggerganov/whisper.cpp/pull/833
* ggml : fix 32-bit ARM NEON by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/836
* Add CUDA support via cuBLAS by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/834
* Integer quantisation support by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/540
## New Contributors
* @tauseefmohammed2 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/776
* @pH5 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/798
* @tarasglek made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/811
* @laytan made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/815
* @wuyudi made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/768
* @Canis-UK made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/812
* @Zolliner made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/816
* @ThijsRay made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/833
**Full Changelog**: https://github.com/ggerganov/whisper.cpp/compare/v1.3.0...v1.4.0
## Overview
This release should be considered in beta stage, since I haven't done a lot of testing and some regressions may have slipped through.
Overall, though, I believe both the performance and the quality are improved.
- Added Core ML support #566
- Restored decoding fallbacks with default size of 2 instead of 5 (f19e23fbd108ec3ac458c7a19b31c930719e7a94)
- Pad the audio with zeros instead of the spectrogram (5108b30e6daf361c856abb6b86e5038500bdbeb1)
- Added [talk-llama](https://github.com/ggerganov/whisper.cpp/tree/master/examples/talk-llama) example
- Added `whisper_state` which allows parallel transcriptions with a single model in memory (#523)
The C-style API has been extended significantly to support the new `whisper_state`, but in general it should remain backwards compatible.
The only breaking change is in the callback signatures.
Please provide feedback in the discussion if you observe any issues.
The next release `v1.4.0` will follow relatively soon and will provide 4-bit integer quantization support.
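The idea behind `whisper_state` - one large read-only model shared by many small mutable decoding states - can be sketched generically. The `Model`, `State` and `transcribe` names below are illustrative stand-ins for the pattern, not the actual C API:

```python
import threading

class Model:
    """Stands in for the large, read-only model weights (loaded once)."""
    def __init__(self, name: str):
        self.name = name

class State:
    """Stands in for whisper_state: the small, mutable per-transcription
    scratch data (KV cache, logits, timings) - one per thread."""
    def __init__(self):
        self.segments = []

def transcribe(model: Model, state: State, audio: str) -> None:
    # Reads the shared model (never written) and mutates only its own state,
    # so concurrent transcriptions do not interfere with each other.
    state.segments.append(f"{model.name}:{audio}")

model = Model("tiny")                 # loaded into memory exactly once
states = [State() for _ in range(4)]  # one state per parallel transcription
threads = [
    threading.Thread(target=transcribe, args=(model, st, f"clip{i}.wav"))
    for i, st in enumerate(states)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([st.segments for st in states])  # each state holds its own result
```

The design choice is the usual one for shared-nothing parallelism: keep the expensive immutable data in one place and move all mutation into cheap per-caller state objects.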
## What's Changed
* update csv output format to match OpenAI's Whisper dataframe output by @hykelvinlee42 in https://github.com/ggerganov/whisper.cpp/pull/552
* Go binding: NewContext now returns a clean context by @polarmoon in https://github.com/ggerganov/whisper.cpp/pull/537
* Added whisper state + default state on the whisper_context by @sandrohanea in https://github.com/ggerganov/whisper.cpp/pull/523
* whisper.android: Enable fp16 intrinsics (FP16_VA), which is supported by ARMv8.2 or later. by @tinoue in https://github.com/ggerganov/whisper.cpp/pull/572
* Add quality comparison helper by @venkr in https://github.com/ggerganov/whisper.cpp/pull/569
* whisper.android: Support benchmark for Android example. by @tinoue in https://github.com/ggerganov/whisper.cpp/pull/542
* Fix MUSL Linux build by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/576
* Change default encoding to UTF-8 by @Kamilake in https://github.com/ggerganov/whisper.cpp/pull/605
* Provide option for creating JSON output by @tuxpoldo in https://github.com/ggerganov/whisper.cpp/pull/615
* readme : add react-native bindings by @jhen0409 in https://github.com/ggerganov/whisper.cpp/pull/619
* Fixed language auto-detection for state provided processing. by @sandrohanea in https://github.com/ggerganov/whisper.cpp/pull/627
* xcodeproj : add `-O3 -DNDEBUG` in release mode by @jhen0409 in https://github.com/ggerganov/whisper.cpp/pull/640
* Nodejs Addon blocking main thread. Implemented Napi::AsyncWorker by @LucasZNK in https://github.com/ggerganov/whisper.cpp/pull/642
* Include link to R wrapper in README by @jwijffels in https://github.com/ggerganov/whisper.cpp/pull/626
* Add a cmake flag to disable F16C by @a5huynh in https://github.com/ggerganov/whisper.cpp/pull/628
* Add talk-llama example by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/664
* Add Alpaca support to talk-llama example by @ejones in https://github.com/ggerganov/whisper.cpp/pull/668
* Update README.md by @razodactyl in https://github.com/ggerganov/whisper.cpp/pull/682
* issue #470 - working 32-bit ARM by @clach04 in https://github.com/ggerganov/whisper.cpp/pull/486
* whisper : add initial_prompt param by @jhen0409 in https://github.com/ggerganov/whisper.cpp/pull/645
* fix typo in JSON output by @egorFiNE in https://github.com/ggerganov/whisper.cpp/pull/648
* Fix shell script ./models/download-ggml-model.sh to handle spaces and special characters in paths by @be-next in https://github.com/ggerganov/whisper.cpp/pull/677
* Fixed test to new async implementation by @LucasZNK in https://github.com/ggerganov/whisper.cpp/pull/686
* Minor: fixing usage message for talk-llama by @InconsolableCellist in https://github.com/ggerganov/whisper.cpp/pull/687
* Small typo by @ZiggerZZ in https://github.com/ggerganov/whisper.cpp/pull/688
* feat: add progress callback by @pajowu in https://github.com/ggerganov/whisper.cpp/pull/600
* ggml : fix q4_1 dot product types by @novag in https://github.com/ggerganov/whisper.cpp/pull/759
* Exposed various parts to the Go Interface by @bmurray in https://github.com/ggerganov/whisper.cpp/pull/697
* Adds shell command example for --print-colors by @bocytko in https://github.com/ggerganov/whisper.cpp/pull/710
* Makefile: disable avx in case f16c is not available by @duthils in https://github.com/ggerganov/whisper.cpp/pull/706
* Making the quick start instructions clearer. by @Onlyartist9 in https://github.com/ggerganov/whisper.cpp/pull/716
* Add lrc output support by @WhichWho in https://github.com/ggerganov/whisper.cpp/pull/718
* Corrects default speak.sh path in talk-llama by @mab122 in https://github.com/ggerganov/whisper.cpp/pull/720
* Add msvc compiler args /utf-8 fix error C3688 by @WhichWho in https://github.com/ggerganov/whisper.cpp/pull/721
* Changed convert-pt-to-ggml.py to use .tiktoken tokenizer files by @ivan-gorin in https://github.com/ggerganov/whisper.cpp/pull/725
* talk/talk-llama: add basic example script for eleven-labs tts by @DGdev91 in https://github.com/ggerganov/whisper.cpp/pull/728
* readme : add Unity3d bindings by @Macoron in https://github.com/ggerganov/whisper.cpp/pull/733
* Update stream.cpp by @AliAlameh in https://github.com/ggerganov/whisper.cpp/pull/501
* Fix typos in whisper.h by @GitAritron in https://github.com/ggerganov/whisper.cpp/pull/737
* Update LICENSE by @masguit42 in https://github.com/ggerganov/whisper.cpp/pull/739
* fix potential memory leaks by @baderouaich in https://github.com/ggerganov/whisper.cpp/pull/740
* readme: Add alternate swift bindings by @exPHAT in https://github.com/ggerganov/whisper.cpp/pull/755
* Fix the bug related to word splitting errors in the "tokenize" function. by @AfryMask in https://github.com/ggerganov/whisper.cpp/pull/760
* Do not launch threads for `log_mel_spectrogram` when singlethreaded by @maxilevi in https://github.com/ggerganov/whisper.cpp/pull/763
* Core ML support by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/566
* ggml : fix build on whisper.android (ARM_NEON) by @jhen0409 in https://github.com/ggerganov/whisper.cpp/pull/764
## New Contributors
* @hykelvinlee42 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/552
* @tinoue made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/572
* @venkr made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/569
* @Kamilake made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/605
* @tuxpoldo made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/615
* @jhen0409 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/619
* @LucasZNK made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/642
* @jwijffels made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/626
* @a5huynh made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/628
* @ejones made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/668
* @razodactyl made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/682
* @clach04 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/486
* @egorFiNE made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/648
* @be-next made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/677
* @InconsolableCellist made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/687
* @ZiggerZZ made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/688
* @pajowu made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/600
* @novag made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/759
* @bmurray made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/697
* @bocytko made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/710
* @duthils made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/706
* @Onlyartist9 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/716
* @WhichWho made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/718
* @mab122 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/720
* @ivan-gorin made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/725
* @DGdev91 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/728
* @Macoron made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/733
* @AliAlameh made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/501
* @GitAritron made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/737
* @masguit42 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/739
* @baderouaich made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/740
* @exPHAT made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/755
* @AfryMask made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/760
* @maxilevi made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/763
**Full Changelog**: https://github.com/ggerganov/whisper.cpp/compare/v1.2.1...v1.3.0
## Overview
This is a minor release. The main reason for it is a critical bug fix that causes the software to crash randomly when the language auto-detect option is used (i.e. `whisper_lang_auto_detect()`).
Other than that, the release includes refactoring of the examples, Ruby bindings and some minor changes to the C API.
You can provide feedback in the existing [v1.2.0 discussion](https://github.com/ggerganov/whisper.cpp/discussions/467).
## What's Changed
#### Core `ggml` / `whisper`
* `whisper` : add "split_on_word" flag when using "max_len" option by @mightymatth in #455 and @boolemancer in https://github.com/ggerganov/whisper.cpp/pull/476
* `whisper` : add whisper_full_lang_id() for getting the context lang by @kamranjon in https://github.com/ggerganov/whisper.cpp/pull/461
* `whisper` : fixed Beam Search Strategy and exposed whisper_pcm_to_mel_phase_vocoder by @sandrohanea in https://github.com/ggerganov/whisper.cpp/pull/474
* `whisper` : suppress non-speech-related token outputs by @shibukazu in https://github.com/ggerganov/whisper.cpp/pull/473
* `cmake` : install whisper.h header by @aviks in https://github.com/ggerganov/whisper.cpp/pull/485
* `whisper` : fix signedness compiler warning by @shikokuchuo in https://github.com/ggerganov/whisper.cpp/pull/506
* `whisper` : by default disable non-speech tokens suppression #473
* `whisper` : add API for applying custom logits filters during decoding 0d229163bbea769c7a3e0e500e45850c9a6e2e42
* `whisper` : fix uninitialized `exp_n_audio_ctx` by @Finnvoor in https://github.com/ggerganov/whisper.cpp/pull/520
#### Bindings
* `bindings` : add Ruby by @taf2 in https://github.com/ggerganov/whisper.cpp/pull/500
* `readme` : add .NET repos (#303)
* `readme` : add cython bindings (#9)
* `readme` : add pybind11 bindings by @aarnphm in https://github.com/ggerganov/whisper.cpp/pull/538
#### Examples
* `ci` : add node addon test and optimize compilation configuration by @chenqianhe in https://github.com/ggerganov/whisper.cpp/pull/468
* `yt-wsp.sh` : add unique filename generation by @genevera in https://github.com/ggerganov/whisper.cpp/pull/495
* `examples` : refactor in order to reuse code and reduce duplication by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/482
* `main` : fix stdin pipe stream by @conradg in https://github.com/ggerganov/whisper.cpp/pull/503
* `make` : add "-mcpu=native" when building for aarch64 (#532)
#### C-style API
* Add `whisper_pcm_to_mel_phase_vocoder()`
* Add `*(whisper_logits_filter_callback)()`
* Change `struct whisper_full_params`
* Add `whisper_full_lang_id()`
## New Contributors
* @mightymatth made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/455
* @kamranjon made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/461
* @sandrohanea made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/474
* @shibukazu made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/473
* @genevera made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/495
* @shikokuchuo made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/506
* @conradg made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/503
* @taf2 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/500
* @Finnvoor made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/520
* @aarnphm made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/538
* @FlippFuzz made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/532
**Full Changelog**: https://github.com/ggerganov/whisper.cpp/compare/v1.2.0...v1.2.1
## Highlights
Recently, I have been making progress on adding integer quantisation support to the `ggml` tensor library. This will eventually allow using quantised models, which require less memory and will hopefully run faster. I think the next major release `v1.3.0` will officially add quantisation support. For now, you can keep track of the progress in #540
---
- **MacWhisper by @jordibruin, powered by whisper.cpp**
https://goodsnooze.gumroad.com/l/macwhisper
<div align="center">
<a href="https://goodsnooze.gumroad.com/l/macwhisper"><img width="1663" alt="image" src="https://user-images.githubusercontent.com/1991296/223670514-5b482ec2-bee3-44c9-b90f-724da750cdf3.png"></a>
</div>
## Overview
In this release we significantly reduce the memory usage during inference by introducing "scratch" buffers to `ggml`.
The new memory requirements per model are as follows:
| Model | Disk | Mem (Old) | Mem (New) |
| --- | --- | --- | --- |
| tiny | 75 MB | ~390 MB | ~125 MB |
| base | 142 MB | ~500 MB | ~210 MB |
| small | 466 MB | ~1.0 GB | ~600 MB |
| medium | 1.5 GB | ~2.6 GB | ~1.7 GB |
| large | 2.9 GB | ~4.7 GB | ~3.3 GB |
It's a simple idea: instead of creating a new memory buffer for each new tensor in the computation, we reuse the memory of old tensors that are no longer needed. The implementation is in PR #431. It's not very clean - I think there is a better way to do this, but for now it works.
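The effect of this reuse on peak memory can be sketched with a toy calculation: if each layer only needs its input and its output alive at the same time, the peak is bounded by the largest adjacent pair of tensors rather than the sum of all intermediates. This illustrates the principle only, not the actual `ggml` scratch-buffer implementation:

```python
def peak_memory(input_size: int, layer_sizes: list[int]) -> tuple[int, int]:
    """Compare peak buffer usage: one buffer per intermediate tensor vs.
    reusing scratch buffers so only the current input/output pair is live."""
    naive_peak = input_size + sum(layer_sizes)  # every tensor kept alive
    scratch_peak = 0
    prev = input_size
    for out in layer_sizes:
        # With scratch reuse, only the previous output (this layer's input)
        # and the current output need to exist simultaneously.
        scratch_peak = max(scratch_peak, prev + out)
        prev = out
    return naive_peak, scratch_peak

naive, scratch = peak_memory(100, [80, 120, 60, 90])
print(naive, scratch)  # the scratch scheme's peak is far below the naive sum
```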
Additionally, there might be some inference speed improvements on Apple Silicon in the Decoder part of the transformer. I haven't done proper benchmarks, but there seems to be a ~30% performance boost. The results are identical to `v1.1.1`.
## What's Changed
#### Core `ggml` / `whisper`
* `whisper` : PPC64 big-endian support by @fitzsim in https://github.com/ggerganov/whisper.cpp/pull/398
* `whisper` : condition sampled timestamp tokens to be monotonically increasing by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/425
* `wasm` : fix typo in helper.js by @bhbs in https://github.com/ggerganov/whisper.cpp/pull/459
* `ggml`/`whisper` : reduce memory usage during inference by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/431
#### Bindings
* `ci` : run workflows on pull requests + bindings depend on .h by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/446
* `go` : added wrappers to reset and print timings by @glaslos in https://github.com/ggerganov/whisper.cpp/pull/436
* `go` : add WhisperLangAutoDetect method to go binding by @RobinXL in https://github.com/ggerganov/whisper.cpp/pull/451
* `go` : add wrapper for system info by @glaslos in https://github.com/ggerganov/whisper.cpp/pull/456
* `go` : support "auto" as an option when set language by @polarmoon in https://github.com/ggerganov/whisper.cpp/pull/462
#### Examples
* `whisper.wasm` : add labels for easier radio selection by @kokes in https://github.com/ggerganov/whisper.cpp/pull/435
* `livestream.sh` : run main with model arg instead of default by @EricTendian in https://github.com/ggerganov/whisper.cpp/pull/453
* `main` : CSV format export trimmed spaces fix by @alex-bacart in https://github.com/ggerganov/whisper.cpp/pull/444
* `addon.node` : using whisper as a Node.js addon by @chenqianhe in https://github.com/ggerganov/whisper.cpp/pull/443
## New Contributors
* @kokes made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/435
* @glaslos made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/436
* @EricTendian made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/453
* @RobinXL made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/451
* @alex-bacart made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/444
* @bhbs made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/459
* @polarmoon made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/462
* @chenqianhe made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/443
**Full Changelog**: https://github.com/ggerganov/whisper.cpp/compare/v1.1.1...v1.2.0
## Highlights
I'll use these release notes to write some random thoughts about the project - sort of a short blog post.
I'm really happy with how `whisper.cpp` has turned out so far. There has been a very positive reception in the ML community - most people seem to be excited by the simplicity of the implementation and the fact that it is quite self-contained. I receive a lot of questions about the project and about various ideas that it can be applied to. I really enjoy it and I try to respond to everyone!
I also find it very satisfying that there are so many contributions already happening by so many people. To me this illustrates the power of open-source collaboration. The contributions not only improve the functionality and the quality of the code, but also help to generate various new ideas and approaches to explore.
Another interesting thing is that the project keeps on giving. Every time I start to think that now is a good time to put it in the background for a while and focus on other stuff, some new cool idea pops up and I can't help but start working on it. Having this custom implementation allows me to interact with the model on a lower level which opens some interesting ways to explore it.
So far the development has been focused on improving the performance, expanding the platform coverage and having robust decoding strategies with a variety of examples. During this time, there have been several ideas that accumulated over-time which I find interesting to explore (diarization, token-level timestamps, improved timestamp accuracy, etc). I think I'll try to focus more on these in the future and see if I can achieve something interesting.
---
- **Windows port of `whisper.cpp` utilising vendor-agnostic GPGPU based on DirectCompute by @Const-me**
https://github.com/Const-me/Whisper
---
- **"The New Yorker" article featuring `whisper.cpp`**
<div align="center">
<h2><a href="https://www.newyorker.com/tech/annals-of-technology/whispers-of-ais-modular-future">Whispers of A.I.'s Modular Future</a></h2>
<a href="https://www.newyorker.com/tech/annals-of-technology/whispers-of-ais-modular-future"><img width="1663" alt="image" src="https://media.newyorker.com/photos/63d93e688b2aff35d30ef8e2/master/w_2560,c_limit/Somers_final.jpg"></a>
</div>
## Overview
Since the [v1.1.0](https://github.com/ggerganov/whisper.cpp/releases/tag/v1.1.0) pre-release there have been several reports of improved transcription quality.
Together with my own observations, I think we can declare version `v1.1.1` "stable".
A couple of bug fixes have also been implemented since `v1.1.0`, so make sure to update to `v1.1.1` for optimal results.
Another update is that the prototype for [v1.2.0](https://github.com/ggerganov/whisper.cpp/discussions/126) is almost ready: https://github.com/ggerganov/whisper.cpp/pull/431
Initial results indicate that the memory usage can be reduced by a factor of 2-3 for the smaller models.
You can provide feedback in the existing [v1.1.0 discussion](https://github.com/ggerganov/whisper.cpp/discussions/408).
## What's Changed
#### Core `ggml` / `whisper`
* `whisper` : perform entropy check only when we have at least 32 tokens 1a91c19af929d6dc614a9f3b03026fb23be002a6
* `whisper` : fix condition for providing past prompt (critical) 78f166174f126345ed87cc8f6941af1905c4a0f2
#### Bindings
* `go` : remove `sample_best` and `sample_timestamp` bindings by @Trojan295 in https://github.com/ggerganov/whisper.cpp/pull/409
#### Examples
* `main` : re-enable temperature fallback f583e2d2f5a60e6ebf5bb2819ba4c4d348d41ea2
* `main` : add an option to accept optional output filenames by @garychia in https://github.com/ggerganov/whisper.cpp/pull/424
* `whisper.android` : use AssetManager for Android by @Digipom in https://github.com/ggerganov/whisper.cpp/pull/415
* `whisper.wasm` : add small and small.en models 206fc93396936725bd362c93796cfdc8a87f8509
* `bench` : add memcpy and ggml_mul_mat benchmarks (experimental) 1290fc64572f434f2f36721d2e2b0913cec0178a
## New Contributors
* @Trojan295 made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/409
* @garychia made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/424
**Full Changelog**: https://github.com/ggerganov/whisper.cpp/compare/v1.1.0...v1.1.1
## Overview
The major change in this pre-release is the improved decoding implementation in `whisper.cpp`:
- Support for average logprob and entropy based criteria for fallback
- Support for temperature `T > 0`
- Improved Greedy decoder via `best_of` parameter for `T > 0`
- Add beam search decoding (a.k.a `beam_size`)
More information about the decoding changes can be found in #291
Additionally, there are a few performance improvements for Apple Silicon, WASM and non-F16C platforms.
Support for POWER9 architectures has been added.
The reason this is a pre-release rather than an official release is that the new implementation has not yet been sufficiently tested, and the existing bindings for other languages have not been updated for the API changes. The official `1.1.x` release will be created once there is enough feedback about the new decoding implementation and the bindings have been updated. So make sure to send your feedback in the [discussion](https://github.com/ggerganov/whisper.cpp/discussions/408) created for this pre-release. For now, the `1.0.4` release should be considered more stable.
## What's Changed
#### Core `ggml` / `whisper`
* `ggml` : POWER9 support by @fitzsim in #320, #349, #369
* `ggml` : simplify the SIMD code by @ggerganov in #324
* `ggml` : add SSE3 and fp16 conversion lookup table by @abitofevrything in #368
* `ggml` : utilise Accelerate's vDSP for some computations d51fc3ee0a0038cdf1522ca3d58b58299de41eb8
* `ggml` : speed-up softmax compute via Accelerate and loop unrolling d61d55cd4b9fe77511c8eea28d0220ce552f7008
* `ggml` : do not start extra threads when using BLAS d347a59a5f224f6a5ab0084ec95715451972d3b0
* `whisper` : do sample_to_timestamp calculation with 64 bit precision to avoid overflow by @boolemancer in #388
* `whisper` : various code clean-ups and improvements by @asmaloney in #317, #318, #319, #322, etc.
* `whisper` : improve decoding by @ggerganov in #291
* `whisper` : account for speed_up flag for short audio #405
#### C-style API
* Add loader class to allow loading from buffer and others by @prsyahmi in https://github.com/ggerganov/whisper.cpp/pull/353
* Add `whisper_token_data::plog`
* Add `whisper_init_from_file()`
* Add `whisper_init_from_buffer()`
* Change `whisper_init()`
* Remove `whisper_sample_best()`
* Remove `whisper_sample_timestamp()`
* Add `whisper_n_audio_ctx()`
* Add `whisper_get_logits()`
* Remove `whisper_get_probs()`
* Change `struct whisper_full_params`
#### Bindings
* Golang bindings by @djthorpe in #287, #379, #384
#### Examples
* `whisper.android` : remove android ABI constraint by @Digipom in #301
* `whisper.swiftui` : SwiftUI example by @Digipom in #308
* `main` : add `-ocsv`, aka `--output-csv` for writing CSV file containing millisecond timestamps by @NielsMayer in #340
* `command` : refactor to split command list & general transcription modes by @asmaloney in #331
* `command` : always-prompt mode by @dnhkng in #383
* `stream` : fix data race on bool + avoid division-by-zero a466c3404dc62dc221061bb37fb8f78741d749b8
* `stream` : fix a bug that inserted a lot of empty audio at the start a6dbd9188b13378dc36e2c669b9a22e17b4201d1
* `bench.wasm` : print system info fafd78945d5a7ea11ffa31fa6c05dd6593b7d031
## New Contributors
* @djthorpe made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/287
* @0xmohit made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/296
* @asmaloney made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/298
* @fitzsim made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/320
* @NielsMayer made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/340
* @aviks made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/345
* @eltociear made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/346
* @abitofevrything made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/368
* @Mike-Bell made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/381
* @dnhkng made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/383
* @prsyahmi made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/353
* @ianb made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/391
**Full Changelog**: https://github.com/ggerganov/whisper.cpp/compare/v1.0.4...v1.1.0
## Highlights
- **Sample SwiftUI application [examples/whisper.swiftui](https://github.com/ggerganov/whisper.cpp/tree/master/examples/whisper.swiftui)**
<img width="1663" alt="image" src="https://user-images.githubusercontent.com/1991296/212539216-0aef65e4-f882-480a-8358-0f816838fd52.png">
## What's Changed
#### Core `ggml` / `whisper`
* Make `ggml` compatible with C99 9955fa4ed7cc694d5d47fe0bb5f0d02066f9cbac | 0f117594066a213cc3cc9261c8906f316e6fb153
* Fix UB causing asserts in Debug when reading the model vocabulary 124c718c73f915f3e4235ae2af8841356e76177d
* Minor improvements in the Greedy decoding strategy 6a7c82501e3794724ba80bfb9a983810af036803
* Add Windows build without OpenBLAS by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/282
* Add `whisper_tokenize()` - basic text tokenization bf69b669a00e457b6bfa69b97f1fdf2578d3e403
* Language auto-detect option by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/286
* Add AVX, AVX2 support for `ggml_vec_scale_f32` by @katsu560 in https://github.com/ggerganov/whisper.cpp/pull/285
* Implement extra cases for `ggml_compute_forward_dup_f16()` a7047b2a28a8eccb94318eca8a3207894d3822c7
* Added Roadmap and updated F.A.Q. discussion #126
#### C-style API
* Add `whisper_tokenize()`
* Add `whisper_lang_max_id()`
* Add `whisper_lang_str()`
* Add `whisper_lang_auto_detect()`
* Add `whisper_token_lang()`
#### Examples
* `talk` : improve prompting a613f16aec81b7715cdbd4386ba62ab2ff1216b3
* `stream` : add "sliding window" mode b0f8013eb9f371b500abf1e3c506399ce7f59b11
* `whisper.android` : add Android sample by @Digipom in https://github.com/ggerganov/whisper.cpp/pull/277
* `command` : add guided mode by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/271
* `main` : add `--prompt` option b8065d90f5fdcdb445a8fb3f4717cba54c332cac
* `main` : add `--print-progress` option 32fbc8cd04912904cf84af7c5bd0e0e711a6f021
* `main` : add `--lang auto` option fba10a4c68f0533a339174ef81c6a18ea228d331
## New Contributors
* @Digipom made their first contribution in https://github.com/ggerganov/whisper.cpp/pull/277
**Full Changelog**: https://github.com/ggerganov/whisper.cpp/compare/1.0.3...1.0.4
## Highlights
- **Sample Android application [examples/whisper.android](https://github.com/ggerganov/whisper.cpp/tree/master/examples/whisper.android)**
<p align="center">
<img width="629" alt="image" src="https://user-images.githubusercontent.com/1991296/208256401-7ebab53f-b788-4b15-8860-71825ef578c4.png">
<img width="200" alt="image" src="https://user-images.githubusercontent.com/1991296/208154256-82d972dc-221b-48c4-bfcb-36ce68602f93.png">
</p>
- **General-purpose, short voice command detection on Raspberry Pi 4 using [examples/command](https://github.com/ggerganov/whisper.cpp/tree/master/examples/command)**:
https://user-images.githubusercontent.com/1991296/208255185-6e9d60ea-4bc8-4b64-b731-8ca9f3b7333b.mp4