Sparse connectivity is ubiquitous in biological neural networks. In machine learning, there is a long history of training dense neural networks and then pruning them to obtain sparse connectivity. Recently, this line of research has been extended to pruning and sparsifying large language models (LLMs), for which many numerical methods have been developed. Previously, we adopted one such method to prune the Llama-2 7B model, with the goal of developing small language models for on-device AI assistants. We found that its sparsity limit is 60%: at this level, the pruned Llama-2 7B model could still generate fluent and helpful answers to everyday queries, despite occasional factual inaccuracies. In this article, we extended our earlier pruning and evaluation methods to the newly released Llama-3 8B model. We found that this Llama 3 model has a sparsity limit of 53%, at which it produced fewer factual inaccuracies than Llama 2 at their respective sparsity limits but required more parameters. We hypothesized that the accuracy improvement is due to the longer context window and larger training dataset used for Llama 3, while the increased parameter requirement is due to its versatility in generating both text and code output. Based on these results and insights, we proposed future directions for boosting model performance after pruning, paving the way for lightweight LLMs as on-device AI assistants.
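
The specific pruning method is not named in this summary; as a rough illustration of what "pruning to a target sparsity" means, the minimal sketch below applies simple layer-wise magnitude pruning, a stand-in technique rather than the method used in this work, to zero out 60% of the weights in every linear layer of a toy PyTorch model (the 0.60 value mirrors the Llama-2 7B limit reported above).

```python
# Minimal sketch: layer-wise unstructured magnitude pruning to a target sparsity.
# Illustration only; not the pruning algorithm used in this article.
import torch
import torch.nn as nn


def prune_to_sparsity(model: nn.Module, sparsity: float = 0.60) -> None:
    """Zero out the smallest-magnitude weights in each Linear layer, in place,
    so that roughly `sparsity` of that layer's weights become zero."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Linear):
                weight = module.weight
                k = int(sparsity * weight.numel())
                if k == 0:
                    continue
                # Threshold = magnitude of the k-th smallest weight in this layer.
                threshold = weight.abs().flatten().kthvalue(k).values
                mask = weight.abs() > threshold
                weight.mul_(mask)


# Usage: prune a toy two-layer network and report the achieved sparsity.
toy = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
prune_to_sparsity(toy, sparsity=0.60)
zeros = sum((m.weight == 0).sum().item() for m in toy.modules() if isinstance(m, nn.Linear))
total = sum(m.weight.numel() for m in toy.modules() if isinstance(m, nn.Linear))
print(f"achieved sparsity: {zeros / total:.2%}")
```

This sketch prunes each layer independently by magnitude; published LLM pruning methods typically use activation-aware or second-order criteria instead, which is why the reported sparsity limits depend on the chosen method.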