The link between artificial intelligence (AI) and software engineering

News - 24 May 2023 - Webredactie Communication

Developments are rapid around data, algorithms, machine learning and artificial intelligence (AI), especially since the launch of ChatGPT late last year. Software engineering is highly relevant here, because AI systems are essentially made up of software, and also because the two fields influence each other. A conversation about the relationship between AI and software with Geert-Jan Houben, pro-vice rector of AI, Data and Digitalisation, leader of the TU Delft AI Initiative and professor of Web Information Systems, and Arie van Deursen, professor of Software Engineering at TU Delft's Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS).

How is software concerned with data, algorithms, machine learning, deep learning, ChatGPT and AI?

Arie van Deursen: Classic software development involved telling a machine, step by step, what it should do. With large amounts of data, machine learning has taken off, and we can now learn a lot from data, such as patterns and behaviour.
Geert-Jan Houben: You hear a lot about learning from data, and about algorithms and their need for data. But all of that ultimately runs in software systems, which have to be made for the application. The fact that data and machine learning are now in software on a large scale does not change this. Ultimately, you still have to make good software systems. They may be different from before, but the concepts of software engineering can be applied when engineering new AI- or machine learning-based software. After all, it's still a process of designing, building, testing and releasing software for end users and clients – and that's what we have been doing at TU Delft for years.


The concepts of software engineering can be applied when engineering new AI- or machine-learning-based software. After all, it's still a process of designing, building, testing and releasing software for end users and clients – and that's what we have been doing at TU Delft for years.

Geert-Jan Houben

How has software engineering been changed by the availability of data and AI?

Arie van Deursen: Originally, an algorithm involved knowing exactly how to solve a problem and writing it out as a series of instructions. In other words, a kind of recipe. Nowadays, when we use the word ‘algorithm’ we usually mean a machine-learning algorithm, trained on all kinds of data. The trained result is called a model, which can then make certain recommendations algorithmically.

So instead of using a step-by-step recipe in advance, AI and machine learning enable self-learning algorithms. Do you have an example?

Arie van Deursen: Think of product recommendations in an online bookstore. Previously, you would write an algorithm, for example recommending the young adult category for someone under 25 and novels for someone older. You then translated that algorithm into a programming language, used to build software that followed exactly this recipe. Nowadays, the availability of a lot of data allows for self-learning. You can look at customers' past behaviour: which books they look at, mark as favourites or buy. Recommendations can then be based on that.
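The contrast Van Deursen describes can be sketched in a few lines of code. This is purely illustrative (the function names and categories are made up for the example, not taken from any real bookstore system):

```python
from collections import Counter

# Classic approach: the "recipe" is written out in advance by a developer.
def recommend_rule_based(age: int) -> str:
    if age < 25:
        return "young adult"
    return "novels"

# Data-driven approach: recommend whichever category the customer has
# interacted with most, based on logged behaviour (views, favourites, purchases).
def recommend_from_history(events: list[str]) -> str:
    counts = Counter(events)
    return counts.most_common(1)[0][0]

print(recommend_rule_based(19))                                    # young adult
print(recommend_from_history(["novels", "thrillers", "novels"]))   # novels
```

In the first function the rule is fixed when the software is written; in the second, the behaviour follows from the data, so it changes as the data changes.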

Software changes used to be rolled out in new releases. Is that still necessary with self-learning algorithms?

Arie van Deursen: In fact, existing software does still have to be renewed. With software engineering you can apply learning algorithms to some extent, but not all software has something to learn. Sometimes there is just legislation that tells you how something should be done, and then we have to programme it exactly that way. So you still need that more classical, static part in software engineering.

Geert-Jan Houben: It's not a question of classical software versus machine-learned software, or even software versus AI. Software can contain pre-programmed elements and machine-learned elements, and therein lies the design challenge: how do you make sure that the whole thing still functions properly and can be understood? Take a self-driving car. Something like how the steering wheel should react can be programmed in advance. But there is also a part that might be too complicated to programme in advance. In that case you can choose to let the car's software actually collect data and start learning. Both approaches are in there.

It's not a question of classical software versus machine-learned software, or even software versus AI. Software has pre-programmed elements and machine-learned elements, and therein lies the design challenge: how do you make sure that the whole thing still functions properly and can be understood?

Geert-Jan Houben

Is ChatGPT and generative AI overall just hype, or is something revolutionary going on?

Arie van Deursen: The large language model on which ChatGPT is based (GPT-3 and, by now, GPT-4) emerged from the so-called Transformer paper from 2017. One could say that this is when the generative AI revolution started, especially in data science. Since November last year, ChatGPT has been widely accessible to the general public, with a pleasant chat interface. As a result, everyone can see how big the revolution is.

Geert-Jan Houben: There are two aspects that now come together: on the one hand large language models, and on the other the interface that makes an application attractive. Instead of getting a list of 10 or 20 answers, you now get one. It’s written in a quasi-human form, and can build on previous answers. Great strides are being made on both aspects, and we have yet to discover what exactly they will mean.

How does all of this relate to software engineering?

Arie van Deursen: Programming languages appear to be very similar to ordinary languages. They satisfy the same statistical properties as natural language. All of the techniques in language models can also be applied to source code. After all, when writing software there is also a pattern. And that, in turn, can be machine-learned. This is what we are researching at TU Delft. We look at how large language models can help software developers to be more productive. For tasks that happen 90% of the time, these kinds of language models work well enough (think of AI programming assistants such as GitHub Copilot and ChatGPT). There is, however, the problem we call ‘hallucinating’ with these kinds of language models. Answers given by AI tools sound very sensible, but are sometimes incorrect or incomplete. The future lies in combining these kinds of language models with ways to verify the outcomes.
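One minimal way to picture "combining language models with ways to verify the outcomes" is to run a suggested function against test cases before accepting it. The sketch below is illustrative only: `generated_sum` stands in for a model's suggestion, and `accept_suggestion` for the verification step; neither comes from any real tool.

```python
def generated_sum(xs):
    # Imagine this body was proposed by an AI programming assistant.
    total = 0
    for x in xs:
        total += x
    return total

def accept_suggestion(fn, test_cases) -> bool:
    """Accept a suggested function only if it passes every test case."""
    return all(fn(inp) == expected for inp, expected in test_cases)

# The suggestion is checked against known input/output pairs, not trusted blindly.
ok = accept_suggestion(generated_sum, [([1, 2, 3], 6), ([], 0)])
print(ok)  # True
```

A hallucinated suggestion that merely sounds plausible would fail such checks, which is the point: the verification step catches what the language model cannot guarantee.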

Programming languages appear to be very similar to ordinary languages. They satisfy the same statistical properties as natural language. All of the techniques in language models can also be applied to source code.

Arie van Deursen

Geert-Jan Houben: This reminds me of the development that internet search has gone through. First, we searched by words or terms to find information. Then we discovered that words alone didn't always get you to the right place. For example, searching for ‘apple’ returned both the fruit and the computer: you missed the meaning behind the word. The same seems to be happening now with large language models, which are first based on text representation. We will have to discover where that works and where it doesn't.

TU Delft works with many organisations testing and researching what’s going on in practice. Arie, what are you looking into?

Arie van Deursen: With the AI for FinTech research project, we are looking at explainability, integration of different data sources and software engineering at ING. 50,000 people work at ING, including 15,000 software developers. Among other things, we are looking at using AI to test the software systems that ING builds – and how we can ensure that testing takes less time and energy.

Software of large companies, government agencies, or implementing organisations consists of millions of lines of code. If a developer needs to change a piece of code, it can help to find the person who has worked on it in the past, for example to perform a code review. We placed organisations’ data on who did what and when (and who was present) into a model, which then learned who would be best to engage and assess. We know from previous research that it helps to give that person a 'nudge', reminding them to review the piece of code. This speeds up the modification process. Besides doing research with organisations such as ING and Microsoft on improving software development processes, we also look at keeping software systems up and running in those processes. In doing so, we draw on run-time data, as well as incident data.
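The simplest version of the idea described above, learning "who did what and when" from version-control history, can be sketched as a count of past changes per file, per author. This is an illustrative toy, not the actual model used in the research with ING or Microsoft:

```python
from collections import defaultdict

def build_ownership(history):
    """Count past changes per file, per author.

    history: iterable of (author, file_path) pairs, e.g. from commit logs.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for author, path in history:
        counts[path][author] += 1
    return counts

def suggest_reviewer(counts, path):
    """Suggest the author who changed this file most often in the past."""
    authors = counts.get(path)
    if not authors:
        return None
    return max(authors, key=authors.get)

history = [("alice", "payments.py"), ("bob", "payments.py"), ("alice", "payments.py")]
counts = build_ownership(history)
print(suggest_reviewer(counts, "payments.py"))  # alice
```

A real system would weigh recency and availability as well (the "who was present" signal mentioned above), and would deliver its suggestion as the nudge that reminds the chosen reviewer to act.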

Geert-Jan Houben: The great thing about this kind of collaboration is that you can test real-world problems, and learn from them. It’s how we can promote both science and its practical impacts – the modern way of doing science. The result is a set of insights and recommendations that improves software and accelerates and enhances the software development process.

About that 'nudge': does software engineering involve more than technology?

Arie van Deursen: Definitely. At TU Delft, we do empirical software development. That's about building software on the one hand, but also about understanding why things are sometimes difficult. We investigate this by analysing data, but also by interviewing people. So while the process is about something technical, like software, it's also mostly about people and how to improve processes at scale.

AI is increasingly automating the more predictable tasks, and the more human-centric tasks are becoming more intensive, with pre-programmed software and machine learning software working well together. I think this makes software development more interesting.

Arie van Deursen

What do developments like generative AI mean for software engineering education at TU Delft?

Arie van Deursen: Programming will remain, but in the future you’ll be helped more and more by AI programming assistants. We need to train students to work sensibly with those kinds of tools. Students start with simple programming tasks, which they complete themselves. After all, if they start with help from GitHub Copilot or ChatGPT they won't learn the basics of programming. Eventually they should be able to work with such tools, however, and extract value from the answers.

Geert-Jan Houben: To enable students and professionals to use this wisely, we need to research it. By looking at how ChatGPT makes suggestions, we can investigate what exactly is happening and what such a large language model bases its answers on. Tips and tricks can then follow from our research, showing how to use these kinds of tools.


Where do you see AI and software engineering going next?

Arie van Deursen: Software development depends on humans, whose work revolves a lot around communication and natural language. AI is increasingly automating the more predictable tasks, and the more human-centric tasks are becoming more intensive. Ultimately, you want software that fits both users’ and clients’ needs, with pre-programmed software and machine learning software working well together. I think this makes software development more interesting. Building simple systems is becoming easier and easier, and in that sense it is even democratising: soon anyone will be able to create software with natural language as an interface. That is a higher goal we have always pursued, and a goal that I like.

More about AI at TU Delft

At Delft University of Technology, we believe that AI technology is vital to create a more sustainable, safer and healthier future. We research, design and engineer AI technology and study its application in society. AI technology plays a key role in each of our eight faculties and is an integral part of the education of our students. Through AI education, research and innovation we create impact for a better society. Visit our website to find out what is happening in AI research, education and innovation in AI, Data & Digitalisation: www.tudelft.nl/ai