Comments

simpleuserhere OP t1_jcpo1px wrote

I have tested Alpaca 7B model on Android (Google Pixel 7).

https://github.com/rupeshs/alpaca.cpp

49

schorhr t1_jcqwzek wrote

That's amazing!

Thank you for that link. With my old laptop and slow internet connection, I'm struggling to download Visual Studio and get everything to work. I do have the weights but am still figuring out why the build fails. Is there any way to download a prebuilt version?

8

simpleuserhere OP t1_jcrfjsh wrote

Thanks. What error are you getting? With the VS compiler and CMake we can easily build it.

2

schorhr t1_jct3v62 wrote

Thanks for your reply!

I have not used VS and CMake before, so I am probably making all the newbie mistakes. I've sorted out that some paths were not set, and that C:\mingw-32\bin\make.exe doesn't exist; it's now mingw32-make.exe.

Now I get the error that

   'C:/MinGW-32/bin/make.exe' '-?'

  failed with:

   C:/MinGW-32/bin/make.exe: invalid option -- ?

From the few things I've found online, I gathered that the MinGW version doesn't support that option and that I should use VS instead. I am a bit lost. Every time I manage to fix one issue, there's another one. :-)

2

simpleuserhere OP t1_jct4k2z wrote

I have updated the readme with Windows build instructions; please check https://github.com/rupeshs/alpaca.cpp#windows

2

schorhr t1_jct58nc wrote

Thanks!

Both sets of instructions (the Android ones, which I'm attempting, and also the Windows ones) end with the same C:/MinGW-32/bin/make.exe: invalid option -- ? error. I can't figure out which make version I should use instead, or how to change it.

1

simpleuserhere OP t1_jct9btk wrote

For the Android build, please use Linux (tested with Ubuntu 20.04).

2

schorhr t1_jctb6tz wrote

Okay. I don't have the capacity right now (old laptop, disk too small to really use a second OS). I appreciate the help! I will try once I get a new computer.

2

ninjasaid13 t1_jcu1odb wrote

I have a problem with

C:\Users\****\source\repos\alpaca.cpp\build>make chat
make: *** No rule to make target 'chat'.  Stop.

and

C:\Users\****\source\repos\alpaca.cpp>make chat
I llama.cpp build info:
I UNAME_S:  CYGWIN_NT-10.0
I UNAME_P:  unknown
I UNAME_M:  x86_64
I CFLAGS:   -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:       cc (GCC) 10.2.0
I CXX:      g++ (GCC) 10.2.0
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -c utils.cpp -o utils.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC chat.cpp ggml.o utils.o -o chat
chat.cpp: In function 'int main(int, char**)':
chat.cpp:883:26: error: aggregate 'main(int, char**)::sigaction sigint_action' has incomplete type and cannot be defined
  883 |         struct sigaction sigint_action;
chat.cpp:885:9: error: 'sigemptyset' was not declared in this scope
  885 |         sigemptyset (&sigint_action.sa_mask);
chat.cpp:887:47: error: invalid use of incomplete type 'struct main(int, char**)::sigaction'
  887 |         sigaction(SIGINT, &sigint_action, NULL);
chat.cpp:883:16: note: forward declaration of 'struct main(int, char**)::sigaction'
  883 |         struct sigaction sigint_action;
make: *** [Makefile:195: chat] Error 1

Using Windows.

1

simpleuserhere OP t1_jcu2ikl wrote

For Windows you need the Visual C++ compiler, so install the Visual Studio 2019 C++ build tools and follow the instructions here: https://github.com/rupeshs/alpaca.cpp#windows

2

ninjasaid13 t1_jcu9nfv wrote

I believe I already have the build tools.

I still get this error:

C:\Users\****\Downloads\alpaca\alpaca.cpp>make chat
I llama.cpp build info:
I UNAME_S:  CYGWIN_NT-10.0
I UNAME_P:  unknown
I UNAME_M:  x86_64
I CFLAGS:   -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:       cc (GCC) 10.2.0
I CXX:      g++ (GCC) 10.2.0
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC chat.cpp ggml.o utils.o -o chat
chat.cpp: In function 'int main(int, char**)':
chat.cpp:883:26: error: aggregate 'main(int, char**)::sigaction sigint_action' has incomplete type and cannot be defined
  883 |         struct sigaction sigint_action;
chat.cpp:885:9: error: 'sigemptyset' was not declared in this scope
  885 |         sigemptyset (&sigint_action.sa_mask);
chat.cpp:887:47: error: invalid use of incomplete type 'struct main(int, char**)::sigaction'
  887 |         sigaction(SIGINT, &sigint_action, NULL);
chat.cpp:883:16: note: forward declaration of 'struct main(int, char**)::sigaction'
  883 |         struct sigaction sigint_action;
make: *** [Makefile:195: chat] Error 1
1

simpleuserhere OP t1_jcu9x05 wrote

Are you using Cygwin?

1

ninjasaid13 t1_jcuajwh wrote

Yes, I have Cygwin.

2

simpleuserhere OP t1_jcubta9 wrote

I haven't tried Cygwin with alpaca.cpp.

1

ninjasaid13 t1_jcubyue wrote

So it won't work? Do I need to install MinGW?

1

simpleuserhere OP t1_jcuc25e wrote

Yes.

1

ninjasaid13 t1_jcufsqf wrote

I'm getting a new error:

C:\Users\ninja\source\repos\alpaca.cpp>make chat
process_begin: CreateProcess(NULL, uname -s, ...) failed.
process_begin: CreateProcess(NULL, uname -p, ...) failed.
process_begin: CreateProcess(NULL, uname -m, ...) failed.
'cc' is not recognized as an internal or external command, operable program or batch file.
'g++' is not recognized as an internal or external command, operable program or batch file.
I llama.cpp build info:
I UNAME_S:
I UNAME_P:
I UNAME_M:
I CFLAGS:   -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:
I CXX:
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC chat.cpp ggml.o utils.o -o chat
process_begin: CreateProcess(NULL, g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC chat.cpp ggml.o utils.o -o chat, ...) failed.
make (e=2): The system cannot find the file specified.
Makefile:195: recipe for target 'chat' failed
make: *** [chat] Error 2
1

Art10001 t1_jcwg7bv wrote

Try installing MSYS2.

1

ninjasaid13 t1_jcwwgt5 wrote

now what?

1

Art10001 t1_jcy2jck wrote

I was asleep, my apologies for not replying earlier.

Run pacman -Syu, then pacman -S base-devel (MSYS2's equivalent of build-essential), then cd to the build directory and follow the instructions.

1

Meddhouib10 t1_jcptalr wrote

What are the techniques for making such large models run on low resources?

15

simpleuserhere OP t1_jcpttav wrote

This model is 4-bit quantized, so it takes less RAM (model size is around 4 GB).

27
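As an illustration of why 4-bit helps, here is a minimal sketch of symmetric round-to-nearest quantization. This is a simplification: ggml's actual Q4 format works block-wise, with one scale per small block of weights, but the principle is the same.

```python
import numpy as np

def quantize_4bit(weights):
    """Map float weights to integers in [-8, 7] plus one shared scale."""
    scale = np.abs(weights).max() / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 2.0], dtype=np.float32)
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale)
# Each weight now needs 4 bits instead of 32: roughly an 8x reduction
# before counting the small per-block scale overhead.
```

The price is the rounding error visible in w_hat; quantization benchmarks measure how much that error costs in perplexity.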

timedacorn369 t1_jcqg4v6 wrote

What is the performance hit at various levels of quantization?

10

starstruckmon t1_jcrbf0m wrote

You can see some benchmarks here

https://github.com/qwopqwop200/GPTQ-for-LLaMa

11

Taenk t1_jcs53iw wrote

The results for LLaMA-33B quantised to 3-bit are rather interesting. That would be an extremely potent LLM capable of running on consumer hardware. A pity that there are no test results for the 2-bit version.

3
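A back-of-envelope check on those sizes, counting weights only and ignoring per-block quantization overhead (actual on-disk sizes will differ somewhat):

```python
def model_size_gb(n_params, bits_per_weight):
    """Approximate weight storage in GB, ignoring scales and metadata."""
    return n_params * bits_per_weight / 8 / 1e9

for bits in (16, 4, 3, 2):
    print(f"LLaMA-33B at {bits}-bit: ~{model_size_gb(33e9, bits):.1f} GB")
```

At 3 bits, 33B lands around 12.4 GB of weights, which is why it starts to look feasible on high-end consumer hardware.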

starstruckmon t1_jcswg1g wrote

I've heard from some experienced testers that the 33B model is shockingly bad compared to even the 13B one, despite what the benchmarks say; that we should either use the 65B one (very good, apparently) or stick to 13B/7B. Not for any technical reason, but because of the random luck/chance involved in training these models and the resulting quality.

I wonder if there's any truth to it. If you've tested it yourself, I'd love to hear what you thought.

5

Taenk t1_jctdmvi wrote

I haven't tried the larger models, unfortunately. However, I wonder how the model could be "shockingly bad" despite having almost three times the parameter count.

2

starstruckmon t1_jcte34d wrote

🤷

Sometimes models just come out crap. Like BLOOM, which has almost the same number of parameters as GPT-3 but is absolute garbage in nearly any practical use case. Like a kid from two smart parents who turns out dumb. Just blind chance.

Or they could be wrong. 🤷

3

baffo32 t1_jcronvh wrote

- offloading and accelerating (moving some parts to memory mapped disk or gpu ram, this can also make for quicker loading)

- pruning (removing parts of the model that didn’t end up impacting outputs after training)

- further quantization below 4 bits

- distilling to a mixture of experts?

- factoring and distilling parts out into heuristic algorithms?

- finetuning to specific tasks (e.g. distilling/pruning out all information related to non-relevant languages or domains) this would likely make it very small

EDIT:

- numerous techniques published in papers over the past few years

- distilling into an architecture not limited by e.g. a constraint of being feed forward

3
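Of the techniques listed above, pruning is the easiest to sketch. A minimal unstructured magnitude-pruning example (a toy illustration only: real LLM pruning is far more involved, and sparse weights only save memory with a sparse storage format):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of the weights."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the cutoff.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([0.9, -0.05, 0.4, 0.01, -0.7, 0.03], dtype=np.float32)
p = magnitude_prune(w, sparsity=0.5)
# The three smallest-magnitude weights (-0.05, 0.01, 0.03) are zeroed.
```

The premise, as the comment says, is that many weights barely affect the outputs after training, so removing them degrades quality less than the size reduction would suggest.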

Art10001 t1_jcwfyw8 wrote

I heard MoE is bad. I have no sources sadly.

1

baffo32 t1_jcxqr2i wrote

I visited CVPR last year and people were saying that MoE was mostly what was being used; I haven't tried these things myself though.

1

legendofbrando t1_jcpybhl wrote

Anyone gotten it to run on iOS?

4

1stuserhere t1_jcuyofc wrote

How fast is the model on android, u/simpleuserhere?

1

pkuba208 t1_jcvmhhm wrote

Depends on the hardware

1

Art10001 t1_jcwg5zg wrote

You can really see how phones beat 10-year-old computers, as their Geekbench 5 scores show.

1

pkuba208 t1_jcx3d9i wrote

Well... I run this model on a Raspberry Pi 4B, but you will need AT LEAST 8 GB of RAM.

1

Art10001 t1_jcy2sb5 wrote

Raspberry Pi 4 is far slower than modern phones.

Also, somebody else said it probably actually uses 4/6 GB.

1

pkuba208 t1_jcy717u wrote

I know, but Android itself uses 3-4 GB of RAM. I run it myself, so I know it uses 6-7 GB of RAM on the smallest model currently, with 4-bit quantization.

1

Art10001 t1_jcy7rqs wrote

Yes, that's why it was tried on a Pixel 7, which has 8 GB of RAM and maybe even swap.

1

pkuba208 t1_jcy83gf wrote

I use swap too. For now it can only run on flagships, though. You have to have at least 8 GB of RAM: running it directly on, say, 3 GB of RAM (with 3 GB used by the system) and 3-5 GB of swap may not even be possible, and if it is, it will be very slow and prone to crashing.

1

1stuserhere t1_jcxyj1o wrote

pixel 6 or 7 (or other modern phones from last 2-3 years)

1

pkuba208 t1_jcy7nxg wrote

It should be faster than one word per second. Judging by the fact that modern PCs run it at 5 words per second and a Raspberry Pi 4B runs it at 1 word per second, it should land somewhere around the 2.5 words per second mark.

1

Board_Stock t1_jczly8z wrote

Hello, I've recently run alpaca.cpp on my laptop, but I want to give it a context window so that it can remember conversations, and to make it voice-activated using Python. Can someone guide me on this?

1
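For the "remember conversations" part, a common trick is to keep a rolling window of past turns and rebuild the prompt on every call. A minimal sketch (build_prompt and the character budget are hypothetical illustration, not part of alpaca.cpp; actually feeding the prompt to the model, and the voice part, are separate problems):

```python
def build_prompt(history, user_input, max_chars=2000):
    """Append the new user turn, then trim the oldest turns to fit the budget."""
    history.append(f"User: {user_input}")
    # Drop the oldest turns once the prompt would exceed the context budget
    # (a real implementation would count model tokens, not characters).
    while sum(len(turn) + 1 for turn in history) > max_chars and len(history) > 1:
        history.pop(0)
    return "\n".join(history) + "\nAssistant:"

history = []
prompt = build_prompt(history, "What is quantization?")
# After the model replies, append the reply so the next prompt includes it:
# history.append(f"Assistant: {reply}")
```

The trimming loop is what gives the illusion of memory within the model's fixed context window: old turns fall out as new ones come in.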

ommerike t1_jddjvvn wrote

Is there an APK out there to sideload? It would be fun to try on my Pixel 6 Pro without having to become an expert at going through the make motions...

1