Text classification is a fundamental NLP task with many real-world applications. Large pre-trained language models such as BERT achieve state-of-the-art performance on many NLP tasks, including text classification. Although BERT boosts text classification performance, the common way of using it for classification fails to exploit many of its advantages. This work rethinks how BERT's final-layer and hidden-layer embeddings are used, proposing different aggregation architectures for text classification tasks such as sentiment analysis and sarcasm detection. It also proposes approaches for using BERT as a frozen feature extractor, without fine-tuning, that surpass their fine-tuned counterparts, as well as promising multi-task learning aggregation architectures to improve performance on related classification problems. Experiments with the different architectures show that freezing BERT can outperform fine-tuning it for sentiment analysis. They also show that multi-task learning with a frozen BERT boosts performance on still-difficult tasks such as sarcasm detection. The best-performing models achieved new state-of-the-art performance on the ArSarcasm-v2 dataset for Arabic sarcasm detection and sentiment analysis. With multi-task learning and a frozen BERT, a new SOTA F1-score of 64.41 was achieved for sarcasm detection, a 3.47% improvement, along with a near-SOTA FPN of 75.78 for sentiment classification. With single-task learning, a new SOTA FPN of 75.26 was achieved for sentiment, a 1.81% improvement.
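The idea of aggregating a frozen BERT's hidden-layer embeddings for classification can be sketched as follows. This is a minimal illustration, not the paper's exact architecture: it assumes the encoder is run separately under `torch.no_grad()` (so its weights stay frozen) and that the per-layer hidden states, such as the `hidden_states` tuple a BERT model returns with `output_hidden_states=True`, are passed in as a list; the aggregation here is a learned softmax-weighted mix over layers followed by a linear head on the [CLS] position.

```python
import torch
import torch.nn as nn

class HiddenLayerAggregator(nn.Module):
    """Hypothetical aggregation head: learns a weighted sum over a frozen
    encoder's hidden layers, then classifies the pooled [CLS] embedding."""
    def __init__(self, num_layers: int, hidden_size: int, num_classes: int):
        super().__init__()
        # One mixing weight per encoder layer, softmax-normalized in forward()
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_states: list) -> torch.Tensor:
        # hidden_states: list of (batch, seq_len, hidden_size) tensors,
        # one per encoder layer, produced by the frozen BERT encoder.
        stacked = torch.stack(hidden_states, dim=0)         # (layers, batch, seq, hidden)
        weights = torch.softmax(self.layer_weights, dim=0)  # (layers,)
        mixed = (weights[:, None, None, None] * stacked).sum(dim=0)  # (batch, seq, hidden)
        cls = mixed[:, 0, :]                                # [CLS] token representation
        return self.classifier(cls)                         # (batch, num_classes)

# Toy usage: random tensors stand in for frozen-BERT outputs
# (12 layers, batch of 2, sequence length 8, hidden size 16, 3 classes).
layers = [torch.randn(2, 8, 16) for _ in range(12)]
model = HiddenLayerAggregator(num_layers=12, hidden_size=16, num_classes=3)
logits = model(layers)
print(logits.shape)  # torch.Size([2, 3])
```

Because only the mixing weights and the linear head are trainable, the expensive encoder forward pass can be cached once per example, which is one practical appeal of the feature-extraction setup over full fine-tuning.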