Comments

simpleuserhere OP t1_jcpo1px wrote

I have tested Alpaca 7B model on Android (Google Pixel 7).

https://github.com/rupeshs/alpaca.cpp

49

schorhr t1_jcqwzek wrote

That's amazing!

Thank you for that link. With my old laptop and slow internet connection, I'm struggling to download Visual Studio and get everything to work. I do have the weights but am still figuring out why the build fails. Is there any way to download a prebuilt version?

8

simpleuserhere OP t1_jcrfjsh wrote

Thanks. What error are you getting? With the VS compiler and CMake we can easily build it.

2

schorhr t1_jct3v62 wrote

Thanks for your reply!

I have not used VS and CMake before, so I am probably making all the newbie mistakes. I've sorted out that some paths were not set, and that C:\mingw-32\bin\make.exe doesn't exist; it's now mingw32-make.exe.

Now I get the error that

   'C:/MinGW-32/bin/make.exe' '-?'

  failed with:

   C:/MinGW-32/bin/make.exe: invalid option -- ?

From the few things I've found online, I gathered that the MinGW version doesn't support that option and that I should use VS instead. I am a bit lost. Every time I manage to fix one issue, there's another one. :-)

2

simpleuserhere OP t1_jct4k2z wrote

I have updated the readme with Windows build instructions; please check https://github.com/rupeshs/alpaca.cpp#windows

2

schorhr t1_jct58nc wrote

Thanks!

Both sets of instructions (the Android ones, which I'm attempting, and also the Windows ones) end with the same C:/MinGW-32/bin/make.exe: invalid option -- ? error. I can't figure out which make version I should use instead, or how to change it.

1

simpleuserhere OP t1_jct9btk wrote

For the Android build, please use Linux (tested with Ubuntu 20.04).

2

schorhr t1_jctb6tz wrote

Okay. I don't have the capacity right now (old laptop, disk too small to really use a second OS). I appreciate the help! I will try once I get a new computer.

2

ninjasaid13 t1_jcu1odb wrote

I have a problem with

C:\Users\****\source\repos\alpaca.cpp\build>make chat
make: *** No rule to make target 'chat'.  Stop.

and

C:\Users\****\source\repos\alpaca.cpp>make chat
I llama.cpp build info:
I UNAME_S:  CYGWIN_NT-10.0
I UNAME_P:  unknown
I UNAME_M:  x86_64
I CFLAGS:   -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:       cc (GCC) 10.2.0
I CXX:      g++ (GCC) 10.2.0
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -c utils.cpp -o utils.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC chat.cpp ggml.o utils.o -o chat
chat.cpp: In function 'int main(int, char**)':
chat.cpp:883:26: error: aggregate 'main(int, char**)::sigaction sigint_action' has incomplete type and cannot be defined
  883 |         struct sigaction sigint_action;
chat.cpp:885:9: error: 'sigemptyset' was not declared in this scope
  885 |         sigemptyset (&sigint_action.sa_mask);
chat.cpp:887:47: error: invalid use of incomplete type 'struct main(int, char**)::sigaction'
  887 |         sigaction(SIGINT, &sigint_action, NULL);
chat.cpp:883:16: note: forward declaration of 'struct main(int, char**)::sigaction'
  883 |         struct sigaction sigint_action;
make: *** [Makefile:195: chat] Error 1

Using Windows.

1

simpleuserhere OP t1_jcu2ikl wrote

For Windows you need the Visual C++ compiler, so install the Visual Studio 2019 C++ build tools and follow the instructions here: https://github.com/rupeshs/alpaca.cpp#windows

2

ninjasaid13 t1_jcu9nfv wrote

I believe I already have the build tools.

I still get this error:

C:\Users\****\Downloads\alpaca\alpaca.cpp>make chat
I llama.cpp build info:
I UNAME_S:  CYGWIN_NT-10.0
I UNAME_P:  unknown
I UNAME_M:  x86_64
I CFLAGS:   -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:       cc (GCC) 10.2.0
I CXX:      g++ (GCC) 10.2.0
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC chat.cpp ggml.o utils.o -o chat
chat.cpp: In function 'int main(int, char**)':
chat.cpp:883:26: error: aggregate 'main(int, char**)::sigaction sigint_action' has incomplete type and cannot be defined
  883 |         struct sigaction sigint_action;
chat.cpp:885:9: error: 'sigemptyset' was not declared in this scope
  885 |         sigemptyset (&sigint_action.sa_mask);
chat.cpp:887:47: error: invalid use of incomplete type 'struct main(int, char**)::sigaction'
  887 |         sigaction(SIGINT, &sigint_action, NULL);
chat.cpp:883:16: note: forward declaration of 'struct main(int, char**)::sigaction'
  883 |         struct sigaction sigint_action;
make: *** [Makefile:195: chat] Error 1
1

simpleuserhere OP t1_jcu9x05 wrote

Are you using Cygwin?

1

ninjasaid13 t1_jcuajwh wrote

Yes, I have Cygwin.

2

simpleuserhere OP t1_jcubta9 wrote

I haven't tried Cygwin with alpaca.cpp.

1

ninjasaid13 t1_jcubyue wrote

So it won't work? Do I need to install MinGW?

1

simpleuserhere OP t1_jcuc25e wrote

Yes.

1

ninjasaid13 t1_jcufsqf wrote

I'm getting a new error:

C:\Users\ninja\source\repos\alpaca.cpp>make chat
process_begin: CreateProcess(NULL, uname -s, ...) failed.
process_begin: CreateProcess(NULL, uname -p, ...) failed.
process_begin: CreateProcess(NULL, uname -m, ...) failed.
'cc' is not recognized as an internal or external command, operable program or batch file.
'g++' is not recognized as an internal or external command, operable program or batch file.
I llama.cpp build info:
I UNAME_S:
I UNAME_P:
I UNAME_M:
I CFLAGS:   -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:
I CXX:
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC chat.cpp ggml.o utils.o -o chat
process_begin: CreateProcess(NULL, g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC chat.cpp ggml.o utils.o -o chat, ...) failed.
make (e=2): The system cannot find the file specified.
Makefile:195: recipe for target 'chat' failed
make: *** [chat] Error 2
1

Art10001 t1_jcwg7bv wrote

Try installing MSYS2.

1

ninjasaid13 t1_jcwwgt5 wrote

now what?

1

Art10001 t1_jcy2jck wrote

I was asleep, my apologies for not replying earlier.

Run pacman -Syu, then pacman -S base-devel (MSYS2's equivalent of build-essential), then cd to the build directory and follow the instructions.

1

Meddhouib10 t1_jcptalr wrote

What are the techniques for making such large models run on low resources?

15

simpleuserhere OP t1_jcpttav wrote

This model is 4-bit quantized, so it takes less RAM (model size is around 4 GB).

27
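As an illustration of why 4-bit helps, here is a minimal sketch of symmetric round-to-nearest quantization. This is a simplification: ggml's actual Q4 format works block-wise, with one scale per small block of weights, but the principle is the same.

```python
import numpy as np

def quantize_4bit(weights):
    """Map float weights to integers in [-8, 7] plus one shared scale."""
    scale = np.abs(weights).max() / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 2.0], dtype=np.float32)
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale)
# Each weight now needs 4 bits instead of 32: roughly an 8x reduction
# before counting the small per-block scale overhead.
```

The price is the rounding error visible in w_hat; quantization benchmarks measure how much that error costs in perplexity.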

timedacorn369 t1_jcqg4v6 wrote

What is the performance hit at various levels of quantization?

10

starstruckmon t1_jcrbf0m wrote

You can see some benchmarks here

https://github.com/qwopqwop200/GPTQ-for-LLaMa

11

Taenk t1_jcs53iw wrote

The results for LLaMA-33B quantised to 3-bit are rather interesting. That would be an extremely potent LLM capable of running on consumer hardware. A pity that there are no test results for the 2-bit version.

3
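A back-of-envelope check on those sizes, counting weights only and ignoring per-block quantization overhead (actual on-disk sizes will differ somewhat):

```python
def model_size_gb(n_params, bits_per_weight):
    """Approximate weight storage in GB, ignoring scales and metadata."""
    return n_params * bits_per_weight / 8 / 1e9

for bits in (16, 4, 3, 2):
    print(f"LLaMA-33B at {bits}-bit: ~{model_size_gb(33e9, bits):.1f} GB")
```

At 3 bits, 33B lands around 12.4 GB of weights, which is why it starts to look feasible on high-end consumer hardware.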

starstruckmon t1_jcswg1g wrote

I've heard from some experienced testers that the 33B model is shockingly bad compared to even the 13B one, despite what the benchmarks say; that we should either use the 65B one (very good, apparently) or stick to 13B/7B. Not for any technical reason, but because of the random luck/chance involved in training these models and the resulting quality.

I wonder if there's any truth to it. If you've tested it yourself, I'd love to hear what you thought.

5

Taenk t1_jctdmvi wrote

I haven't tried the larger models, unfortunately. However, I wonder how the model could be "shockingly bad" despite having almost three times the parameter count.

2

starstruckmon t1_jcte34d wrote

🤷

Sometimes models just come out crap. Like BLOOM, which has almost the same number of parameters as GPT-3 but is absolute garbage in nearly any practical use case. Like a kid from two smart parents who turns out dumb. Just blind chance.

Or they could be wrong. 🤷

3

baffo32 t1_jcronvh wrote

- offloading and accelerating (moving some parts to memory mapped disk or gpu ram, this can also make for quicker loading)

- pruning (removing parts of the model that didn’t end up impacting outputs after training)

- further quantization below 4 bits

- distilling to a mixture of experts?

- factoring and distilling parts out into heuristic algorithms?

- finetuning to specific tasks (e.g. distilling/pruning out all information related to non-relevant languages or domains) this would likely make it very small

EDIT:

- numerous techniques published in papers over the past few years

- distilling into an architecture not limited by e.g. a constraint of being feed forward

3
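Of the techniques listed above, pruning is the easiest to sketch. A minimal unstructured magnitude-pruning example (a toy illustration only: real LLM pruning is far more involved, and sparse weights only save memory with a sparse storage format):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of the weights."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the cutoff.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([0.9, -0.05, 0.4, 0.01, -0.7, 0.03], dtype=np.float32)
p = magnitude_prune(w, sparsity=0.5)
# The three smallest-magnitude weights (-0.05, 0.01, 0.03) are zeroed.
```

The premise, as the comment says, is that many weights barely affect the outputs after training, so removing them degrades quality less than the size reduction would suggest.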

Art10001 t1_jcwfyw8 wrote

I heard MoE is bad. I have no sources sadly.

1

baffo32 t1_jcxqr2i wrote

I visited CVPR last year and people were saying that MoE was mostly what was being used; I haven't tried these things myself though.

1

legendofbrando t1_jcpybhl wrote

Anyone gotten it to run on iOS?

4

1stuserhere t1_jcuyofc wrote

How fast is the model on android, u/simpleuserhere?

1

pkuba208 t1_jcvmhhm wrote

Depends on the hardware

1

Art10001 t1_jcwg5zg wrote

You can really see how phones beat 10-year-old computers, as their Geekbench 5 scores show.

1

pkuba208 t1_jcx3d9i wrote

Well... I run this model on a Raspberry Pi 4B, but you will need AT LEAST 8 GB of RAM.

1

Art10001 t1_jcy2sb5 wrote

Raspberry Pi 4 is far slower than modern phones.

Also, somebody else said it probably actually uses 4/6 GB.

1

pkuba208 t1_jcy717u wrote

I know, but Android itself uses 3-4 GB of RAM. I run it myself, so I know it uses 6-7 GB of RAM on the smallest model currently, with 4-bit quantization.

1

Art10001 t1_jcy7rqs wrote

Yes, that's why it was tried on a Pixel 7, which has 8 GB of RAM and maybe even swap.

1

pkuba208 t1_jcy83gf wrote

I use swap too. For now it can only run on flagships, though. You have to have at least 8 GB of RAM: running it directly on, say, 3 GB of RAM (with 3 GB used by the system) and 3-5 GB of swap may not even be possible, and if it is, it will be very slow and prone to crashing.

1

1stuserhere t1_jcxyj1o wrote

pixel 6 or 7 (or other modern phones from last 2-3 years)

1

pkuba208 t1_jcy7nxg wrote

It should be faster than one word per second. Judging by the fact that modern PCs run it at 5 words per second and a Raspberry Pi 4B runs it at 1 word per second, it should land somewhere around the 2.5 words per second mark.

1

Board_Stock t1_jczly8z wrote

Hello, I've recently run alpaca.cpp on my laptop, but I want to give it a context window so that it can remember conversations, and to make it voice-activated using Python. Can someone guide me on this?

1
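For the "remember conversations" part, a common trick is to keep a rolling window of past turns and rebuild the prompt on every call. A minimal sketch (build_prompt and the character budget are hypothetical illustration, not part of alpaca.cpp; actually feeding the prompt to the model, and the voice part, are separate problems):

```python
def build_prompt(history, user_input, max_chars=2000):
    """Append the new user turn, then trim the oldest turns to fit the budget."""
    history.append(f"User: {user_input}")
    # Drop the oldest turns once the prompt would exceed the context budget
    # (a real implementation would count model tokens, not characters).
    while sum(len(turn) + 1 for turn in history) > max_chars and len(history) > 1:
        history.pop(0)
    return "\n".join(history) + "\nAssistant:"

history = []
prompt = build_prompt(history, "What is quantization?")
# After the model replies, append the reply so the next prompt includes it:
# history.append(f"Assistant: {reply}")
```

The trimming loop is what gives the illusion of memory within the model's fixed context window: old turns fall out as new ones come in.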

ommerike t1_jddjvvn wrote

Is there an APK out there to sideload? It would be fun to try on my Pixel 6 Pro without having to become an expert at going through the make motions...

1