One of the most talked-about applications of artificial intelligence is writing computer code. A recent study examining ChatGPT’s performance at this task found it to be quite proficient, though with notable limitations.
Study Overview and Findings
The study, published in the June issue of IEEE Transactions on Software Engineering, tested GPT-3.5 on 728 coding tasks from the LeetCode benchmarking platform. The tasks covered five programming languages: C, C++, Java, JavaScript, and Python.
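To make that test setup concrete, here is a minimal sketch of how this kind of pass/fail evaluation could be automated. It is an illustration only, not the study’s actual evaluation harness; the model name, prompt wording, and the toy two_sum task are assumptions standing in for a real LeetCode problem, and it uses the OpenAI Python client.

```python
# Minimal illustration only: NOT the study's evaluation harness.
# Shows how a pass/fail check on model-generated code could be scripted,
# assuming the OpenAI Python client and a toy "two_sum" task.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROBLEM = (
    "Write a Python function two_sum(nums, target) that returns the indices "
    "of the two numbers in nums that add up to target. Reply with code only."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": PROBLEM}],
)
generated = response.choices[0].message.content

# Strip a Markdown code fence if the model wrapped its answer in one.
if "```" in generated:
    generated = generated.split("```")[1].removeprefix("python")

# Run the generated code in a fresh namespace (sandbox untrusted code in practice).
namespace = {}
exec(generated, namespace)
solution = namespace["two_sum"]

# Grade the solution against a few hand-written test cases.
tests = [(([2, 7, 11, 15], 9), [0, 1]), (([3, 2, 4], 6), [1, 2])]
passed = sum(1 for args, expected in tests if list(solution(*args)) == expected)
print(f"{passed}/{len(tests)} test cases passed")
```

A full benchmark along these lines would repeat the same check across hundreds of problems and several languages, then report the pass rate per difficulty level, which is what the figures below summarize.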
On problems added to LeetCode before 2021, ChatGPT solved easy problems 89% of the time, medium problems 71% of the time, and hard problems 40% of the time. On problems added to LeetCode after 2021, however, its success rates fell to 52% for easy, 40% for medium, and just 0.66% for hard problems. This drop is likely tied to the model’s training data: ChatGPT was initially trained on data up to 2021, and its knowledge base was only expanded at the end of 2023.
“When it comes to algorithm issues post-2021, ChatGPT’s ability to generate functionally correct code suffers. Sometimes it cannot understand the meaning of questions even for easy-level problems,” says Yutian Tang, a lecturer at the University of Glasgow who took part in the study. “A reasonable hypothesis for why ChatGPT is better at addressing algorithm problems through 2021 is that these problems occur frequently in the training dataset.”
Advantages and Limitations
The researchers also note that ChatGPT is better at correcting errors in human-written code than errors in its own, and that 50% of the time it generates code with lower runtime and memory use than a human-written solution, adds NIX Solutions. The code ChatGPT generated did contain a fair number of errors, although “many of them were easily fixable,” IEEE Spectrum writes. “The generated C code was the most complex, followed by C++ and Python, which was similar in complexity to human-written code.”
We’ll keep you updated on further developments in AI coding capabilities as more research emerges.