This research compares the performance of deep learning models and large language models in time series traffic flow prediction. The evaluation shows that model performance varies with both the dataset and the input length. Deep learning models perform similarly across datasets, whereas large language models improve significantly when trained on post-COVID-19 data. Longer input sequences generally improve predictions: the error metrics of large language models decrease consistently as input length increases, while those of deep learning models remain relatively stable. These findings highlight the superior capacity of large language models to process and utilise long-term dependencies in time series prediction.
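
The input-length comparison described above can be illustrated with a minimal sketch. It is not the evaluation pipeline used in this research: the series is synthetic, the function names (`make_windows`, `evaluate_input_lengths`) and the chosen input lengths are hypothetical, and a linear least-squares predictor stands in for the deep learning and large language models. It only shows how MAE and RMSE might be computed on a common test set while the input length varies.

```python
import numpy as np


def make_windows(series, input_len, horizon=1):
    """Slice a 1-D series into (input window, target) pairs."""
    n = len(series) - input_len - horizon + 1
    X = np.stack([series[i:i + input_len] for i in range(n)])
    y = np.array([series[i + input_len + horizon - 1] for i in range(n)])
    return X, y


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def evaluate_input_lengths(series, input_lens, test_size=200):
    """Fit a stand-in linear model per input length and report test errors.

    The last `test_size` windows are held out, so every input length is
    scored on the same final `test_size` targets of the series.
    """
    results = {}
    for input_len in input_lens:
        X, y = make_windows(series, input_len)
        X_train, y_train = X[:-test_size], y[:-test_size]
        X_test, y_test = X[-test_size:], y[-test_size:]

        # Append a constant column so the linear model has an intercept.
        A_train = np.hstack([X_train, np.ones((len(X_train), 1))])
        A_test = np.hstack([X_test, np.ones((len(X_test), 1))])

        # Ordinary least squares; a placeholder for the models under study.
        weights, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
        pred = A_test @ weights

        results[input_len] = {"mae": mae(y_test, pred), "rmse": rmse(y_test, pred)}
    return results


# Synthetic "traffic flow": a daily cycle (288 five-minute steps) plus noise.
rng = np.random.default_rng(0)
t = np.arange(2000)
series = 100 + 30 * np.sin(2 * np.pi * t / 288) + rng.normal(0, 5, size=t.size)

results = evaluate_input_lengths(series, input_lens=[6, 12, 24, 48])
for input_len, metrics in results.items():
    print(f"input_len={input_len:3d}  MAE={metrics['mae']:.3f}  RMSE={metrics['rmse']:.3f}")
```

Holding out the same final targets for every input length keeps the error metrics comparable across lengths, which is the property the comparison above depends on.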