![GPU-Accelerated LLM on a $100 Orange Pi](https://log4dev.com/articles/gpu-accelerated-llm-on-a-100-orange-pi-63/f857d8368ac9c745c2ce77e9ccd7cf11c75165b877eb3a993e5c4c6b46ca611d453e27e6adafdadefd9f7c6c136d426b01f31e4bbe30f0bb23897851d0ad2890.jpg)
# GPU-Accelerated LLM on a $100 Orange Pi
**TL;DR** This post shows a GPU-accelerated LLM running smoothly on an embedded device at a reasonable speed. More specifically, on a $100 Orange Pi 5 with a Mali GPU, we achieve 2.5 tok/sec for Llama2-7b and 5 tok/sec for RedPajama-3b through Machine Learning Compilation (MLC) techniques. Additionally, we are able to run a Llama-2 13b model at 1.5 tok/sec on a 16GB Orange Pi 5+ that costs under $150.

## Background

Progress in open language models has been catalyzing innovation across question-answering, translation, and creative tasks....