From Web Data to Information

The Web Information Systems (WIS) research group in the Software Technology (ST) department at Delft University of Technology (TU Delft) focuses its research on the engineering and science of the Web. The research specifically considers the role of Web data in the engineering of Web-based information systems.

Core focus:

In the Web Information Systems group, we aim at making web information system engineering more effective in processing, retrieving and interpreting human‐generated web data. Extracting semantics from people's actions and behaviors on the web enables systems that are personalized and adaptive. Our first research objective is to understand what human‐generated web data represents in terms of people's actions, interests, intents, and behaviors on the web. Through data science and web science research, we develop theories on how web data can provide insights about system users. Our second objective is to develop new solutions to meet the fundamental challenges in how systems effectively attribute semantics to human‐generated data, given the size and dynamic nature of the web. We implement and analyze approaches for augmenting web systems with the capability to process, retrieve, and interpret web data, and subsequently adapt to their users.

In this context, we investigate three research topics:

  • Web‐based user modeling: How can human‐generated data that are retrieved from the web and semantically enriched be used to enhance systems’ interpretation of people’s actions for user adaptation?
  • Web data processing and retrieval: How to efficiently process data at web‐scale and design effective information retrieval mechanisms for domain‐specific web tasks?
  • Human‐enhanced web systems: What are effective ways to augment web‐based systems with automated, large‐scale human computation and interaction?

This research includes topics in processing, retrieving, and interpreting Web data, with a focus on the special properties of Web data: 1) a large portion of Web data is human-made, e.g. in social networks, natural language, or Twitter streams, and this brings scientific challenges in how to effectively attribute meaning to Web data and how systems interface with users, e.g. with conversations or explanations; 2) the size of the Web brings challenges in how to efficiently store, index, and analyze data at Web scale. WIS researchers and students strive to advance the state of the art in relevant disciplines like user modeling, Web science, information retrieval, natural language processing, Web engineering, Web data management, user interaction, and crowdsourcing.

At WIS, we are enthusiastic about doing science on the Web and in particular the Social Web, and we love to investigate the power of Web data technologies.

Research within WIS is conducted along four research lines:

  • Data Management
    • Data Management is one of the central challenges in developing modern software systems. The need for sophisticated Data Management is even greater in the current era of Artificial Intelligence and Big Data-based systems, which have more demanding data requirements than traditional Data Management had to consider. At Web Information Systems, led by assistant professors Christoph Lofi and Asterios Katsifodimos in collaboration with prof. Geert-Jan Houben, we focus on two core aspects of modern Data Management: a) Data Engineering, and b) Scalable Data Management.
      • In Data Engineering, we focus on preparing data for its deployment or usage in a complex AI/data-driven system. This covers, for example, discovering data, cleaning data, transforming data, or integrating data from heterogeneous sources. There is also a focus on (domain-specific) meta-data creation and management. Furthermore, data biases and the societal issues that may arise from them, like misrepresentation and unfairness, are becoming a focus area. Data Engineering topics are often seen in the context of their application domains, like Digital Humanities and medicine, but also business applications like banking.
      • For Scalable Data Management, the focus is on how to cope with the ever-increasing demand for storage and processing power by scaling data operations. This covers, for example, methods for stream processing, but also flexible distribution schemes or the deployment of scalable AI models.
    • Keywords: Data Management, Data Engineering, Data Integration, Meta-Data Management, Data Extraction, Stream Processing, Scalable Data Processing, Data-Driven AI, Serverless Computing, Cloud-Native Data Management.
    • Example student project topics: Data Integration, Meta-Data Management for Digital Libraries, Data Enrichment for Digital Humanities, Real-Time Data Analytics, Deployment of ML-Models in Edge Devices, Bias in ML Data.
  • Crowd Computing
    • Crowd Computing is a computational paradigm that advocates for the adoption of human intelligence at scale, in particular the exploitation of the complementary strengths of humans and machines, to improve the performance of Web data processing, retrieval, and interpretation systems in terms of: accuracy; adherence to people’s values, goals and needs; seamless interaction within complex social settings; and robustness and adaptability to changing, open-world environments. The Crowd Computing theme is led by assistant professors Ujwal Gadiraju and Jie Yang, in collaboration with prof. Alessandro Bozzon and prof. Geert-Jan Houben. Activities in this research line focus on the creation of mathematical models and computational methods for Crowd Computing, to address both problems of analysis and design of this class of computational systems. Our goal is to seek answers to questions such as: How can humans and machines better collaborate in the creation, analysis, enrichment, and interpretation of (Web) data? How to control and accelerate the knowledge creation process at scale? How to best leverage human intelligence in the management of machine learning training data?
    • Keywords: Crowdsourcing, Human Computation, Crowd Computing, Human-Centered AI.
    • Example student project topics: Content Privatization in Crowdsourcing Tasks, Crowd Computing Techniques for Bias-aware Machine Learning, Conversational Interfaces for Crowd Computing, Understanding and Leveraging Communities for Subjective Inference, Steered Learning from Large Behavioral Data, Human-in-the-Loop Training Data Augmentation.
  • Information Retrieval
    • Lambda-Lab, led by associate professor Claudia Hauff, brings together research expertise in information retrieval (IR), data science and natural language processing. Core IR topics are investigated as well as applications of data science and IR to application domains, most prominently the domain of Massive Open Online Learning. Among the covered topics are:
      • Collaborative search: this research direction is all about making collaborative search (i.e. the search of several users together) efficient in terms of the design of user interface elements and retrieval ranking models.
      • Conversational search: here, search is not restricted to just query/document ranking, but instead the search system engages in a conversation with the user to elicit the information need and present/generate the search results in the manner that is best suited for the user.
      • Search as learning: while standard search systems are optimized for relevance, in this research direction we are interested in optimising the search system for human learning.
      • Neural ranking models: large-scale neural models often do not perform well when applied to IR problems. Here we explore how best to design neural ranking models for specific IR tasks.
    • Keywords: collaborative search, neural ranking models, conversational search, search as learning, exploratory search, complex search, natural language processing.
  • Intelligent User Interaction
    • The E(psilon)-lab, led by assistant professor Nava Tintarev, is a research line within the Web Information Systems group and is concerned with human interaction with artificial advice givers, and specifically explanations to support decision making. The E-lab takes a user-centered approach to research, and evaluates the quality of human decision making to drive both interface and algorithm design. The research is currently driven by two applied challenges: 1) Explainable algorithms; and 2) Interactive interfaces for explanations.
    • Keywords: Explanations, Diversity, News Recommendations, Natural Language Processing
    • Example student project topics: Explaining news recommendations on disputed topics, Algorithms to help users discover unexplored news articles, Fair and explainable news summarization, Crowdsourcing explainable annotations for diversity of perspectives, Explaining recommendations to groups with different preferences, Novel interfaces and interactions for explanations.

Our vision on web data drives our research:

  • Vision: Web Data and its Role in Information Systems
    • One characteristic property of Web data is that it transforms the way we need to design and build information systems. Data has always been the main ingredient in an information system to represent the world or the process that the system is serving, both inbound to detect what is going on in that world and outbound to support and run that world. In the traditional approach, the complexity was in the software to make all of this happen, and data was designed to fit the software. The Web has brought an abundance of data to 'use', and it enables us to build systems with a much better, more encompassing, and more accurate view of what is going on in the world the system is meant to serve. This implies a new complexity: the complexity is now in the data and relates to understanding how the data can be used in the system. We therefore need to understand the data and what software can make of the data in terms of useful knowledge. In other words, to build systems that effectively use web data, we need to fit the software to the data. As scientists, we are inspired to study data and technology for making sense of data, so that information system engineering can make full use of web data.
    • This reversal of the paradigm also comes with a much higher degree of user-centeredness of systems. Web data is often used to assess how system users, like students, customers, travellers, patients, etc., can be served better and in a more tailored way. The abundance of Web data makes it possible to unlock more knowledge, which allows a higher degree of customisation and adaptation of systems to users. As scientists, we are therefore especially enthusiastic about studying how data and technology for making sense of data help us towards better user adaptation. In a broad interpretation, this is all part of research into user modeling, and comes with a variety of data processing research challenges.


In addition to the research lines and teams mentioned above, WIS also hosts the research team involved in LDE-CEL, the Leiden-Delft-Erasmus Centre for Education and Learning, led by professor Marcus Specht.

Examples of Research Activities

  • Example: Learning Analytics and Search as Learning
    • MOOCs (Massive Open Online Courses) represent a form of web information systems that is revolutionising the way education is brought to people. MOOCs offer great opportunities to change how people learn. By analyzing learners' interactions with the course materials and adapting/personalizing the materials and platforms to the learners, we should be able to improve (online) learning. Another area of research here is search as learning, i.e. the exploration of how learners rely on search engines and search processes to facilitate learning.
    • The WIS group uses its expertise in information retrieval, data science and web user modeling for learning analytics at scale, with the objective of providing education at scale that fits the learner. Example: Lambda-Lab, ImREAL.
  • Example: Urban Analytics
    • Given that most of the world population lives in cities, solving urban problems improves the quality of life for a considerable number of people. The challenge is to improve the understanding of urban environments by enriching and interpreting big urban data through social sensing. Social data generated by users in online social networks has recently attracted a lot of attention as a potential source of knowledge to better understand urban environments, citizens, and (natural) phenomena. However, as of today, there is no general theory able to establish the effectiveness of social data for predictive or descriptive purposes: intuitively, solutions that work for a given city might not be as valuable for another one.
    • The WIS group uses its expertise in social web data analytics to study to what extent social data can be used to represent or describe the targeted reality, possibly in combination with other data sources such as census data. Research subjects include resolution of data, bias of cultural or technological nature in data and conclusions drawn from the data, and crowdsourcing for creating and interpreting social data. Example: SocialGlass, AMS.
  • Example: Explainable Advice-Giving systems
    • Artificial Advice Givers support people in making choices and decisions: they propose and evaluate options while involving their human users in the decision-making process. Examples include recommender systems and (semi-)autonomous systems. Current systems offer limited capabilities when it comes to properties such as transparency and diversity.
    • The WIS group uses its expertise in human-computer interaction and intelligent user interfaces to improve the transparency and decision support of artificial advice givers. Research subjects include: visualizing consumption blind-spots, recommending sequences of (diverse) recommendations, and strategies for supporting critical thinking in classrooms. Example: ENSURE, SuSPECT.
  • Example: Analytics for Academic Document Repositories
    • Academic publications are a central repository of human knowledge and are at the core of scientific advancement, both in academia itself and in industrial progress. Tapping into this vast repository of knowledge is a daunting and challenging task, as the number of available publications is growing with tremendous speed: without proper support, it is often hard or even impossible to find relevant publications related to a given problem in a timely fashion. The near-exponential growth of content in recent years, together with the shift to digital resources, has invalidated many well-proven workflows and demands new solutions suitable for the current age.
    • The WIS group uses its expertise in information extraction and query processing to discover meaningful and useful meta-data for annotating academic document repositories, and to develop user-centered query capabilities utilizing that meta-data for innovative new query paradigms, such as visual exploration or faceted navigation. See examples on Christoph Lofi under Research & Project Semantic Digital Libraries and Semantic Open Courseware.
  • Example: Democratising Scalable Data Science
    • Given the sheer size of the data that has to be processed in order to train machine learning (ML) models at scale, both data management and ML gain an extra degree of complexity: scalability. The classic single-node database systems are replaced by distributed filesystems and scalable analysis tools (e.g., Hadoop MapReduce, Apache Flink or Spark). To set up and operate on such a plethora of systems, data scientists require expertise in parallel algorithms and scalable data management operations (e.g., parallel joins). Most importantly, data scientists need to fully understand the mathematical properties of parallel ML algorithms and the strategies for scaling those algorithms out. As a result, data scientists of today have to be a “jack of all trades” and have deep knowledge in i) data modelling and cleaning, ii) distributed systems, iii) data management, and iv) machine learning, as well as possess Big Data systems programming skills.
    • The WIS group uses its expertise in database systems and scalable data management in order to: 1) develop query languages for bridging data management and machine learning to remove the systems complexity and make data science more accessible to “laymen” data scientists; 2) design novel scalable parallel database operators that are well suited for mixed data management & machine learning pipelines; 3) use and contribute to systems such as Apache Flink, Spark and SystemML by adding new features or applying optimisation techniques.

Selected Research Activities

The current and recent research activities and projects of the WIS group in this area include:

  • AMS - Amsterdam Institute for Advanced Metropolitan Solutions
  • ATSearch - Adaptive Faceted Search on Twitter
  • CrossUM - Cross-system User Modeling on the Social Semantic Web
  • Data Bridges - Smart (user) data services in digital cities
  • Delft Data Science - Data Science & Big Data research at TU Delft
  • Devising Metrics for Assessing Echo Chambers, Incivility, and Intolerance
  • ENSURE - ExplaiNing SeqUences in REcommendations
  • GeniUS - Generic user modeling on the Social Semantic Web
  • GRAPPLE - Adaptation in technology-enhanced learning
  • Hera - Semantics-based adaptation engineering
  • ImREAL - Augmenting user models with real-world information in training and learning
  • Net2 - Semantic-based methodologies for networked web and knowledge engineering
  • PoliMedia - Linked Open Politics (winner of LinkedUp Challenge)
  • RDF Gears - Data integration for the Semantic Web
  • SEEQR - Structural indexing and EfficiEnt Query processing on massive RDF data sets
  • SocialGlass - Your City Through the Social Data Lens
  • SuSPECT - Scaffolding Student PErspectives for Critical Thinking
  • TweetUM - Analyzing and modeling user behavior on Twitter for recommending trending news on the Social Web
  • Twitcident - Using relevant tweets during big incidents
  • Twinder - Finding interesting information in Social Web streams
  • U-Sem - Holistic User Modeling on the Social Semantic Web
  • WUDE - Web user demand elicitation in cultural heritage access

Research Community & Service

A selection of upcoming and recently organized conferences or workshops that WIS group members helped to organise and chair:

  • TheWebConference 2020 - The Web Conference (formerly known as WWW conference), Taipei, 20-24 April 2020.
  • TheWebConference 2019 - The Web Conference (formerly known as WWW conference), San Francisco, USA, 13-17 May, 2019.
  • ACM Web Science 2018 - 10th ACM Conference on Web Science, Amsterdam, the Netherlands, 27-30 May 2018.
  • CitRec 2017 - Recsys Workshop on Citizens' Recsys
  • IntRS 2017 - Recsys Workshop on Interfaces and Human Decision Making for Recommender Systems
  • MSR Challenge 2017 - 14th Conference on Mining Software Repositories: mining challenge
  • BeyondMR 2015 - EDBT Workshop on Algorithms and Systems for MapReduce and Beyond, 27 March, 2015
  • EDBT Summer School 2015 - The EDBT Summer School on Graph Data Management 2015, Aug 31 - Sept 4, 2015
  • ACM HT 2015 - 26th ACM Conference on Hypertext and Social Media, Cyprus, September 1-4, 2015.
  • ICWE 2015 - 15th International Conference on Web Engineering, Rotterdam, the Netherlands, June 22-26, 2015.
  • UMAP 2014 - 22nd Conference on User Modeling, Adaptation and Personalization, Aalborg, Denmark, July 7-11, 2014.
  • ICWE 2014 - 14th International Conference on Web Engineering, Toulouse, France, July 1-4, 2014.
  • WebSci 2014 - ACM Web Science 2014 Conference, Bloomington, USA, June 23-26, 2014.
  • UMAP 2013 - 21st Conference on User Modeling, Adaptation and Personalization, Rome, Italy, June 10-14, 2013.
  • Web Engineering at WWW2013 - 22nd International World Wide Web Conference, Rio de Janeiro, Brazil, 13-17 May 2013.

The following is a non-exhaustive list of research events that WIS group members are or have been involved in as chair, organizer or committee member:

  • AIMSA2020, BNAIC2020, CIKM2020, ESWC2020, HAAPIE2020, ICWE2020, IUI2020, LAK20, SEM20-EU, SocInfo20, TheWebConference2020, UMAP2020, WebSci2020
  • BNAIC2019, ESWC2019, HAAPIE2019, HT19, ICWE2019, ISWC2019, IUI2019, LAK19, MSM19, Semantics2019, SocInfo19, TheWebConference2019, UMAP2019, WebSci19
  • IJCAI-ECAI2018, CHI2018, HT2018, ICWE2018, ICWSM2018, ISWC2018, IUI2018, LAK2018, UMAP 2018, WebSci18, WWW2018, Recsys2018
  • ESWC2017, EvalUMAP2017, Hypertext2017, ICWE2017, ICSME2017, ISWC2017, L@S2017, MSM2017, MSR2017, Recsys2017, Semantics2017, UMAP2017
  • AIMSA2016, BLINKS2016, EvalUMAP2016, ESWC2016, Hypertext2016, ICWE2016, ISWC2016, MSM2016, UMAP2016, USEWOD2016, WebSci2016, WWW2016
  • CSSWS2015, DeCAT2015, ESWC2015, HT2015, I3E2015, ICWE2015, ISWC2015, MSM2015, PATCH2015, RDSM2015, SPS2015, UMAP2015, USEWOD2015, WebSci2015, WWW2015, GRADES 2015
  • CrowdSens2014, CSSWS2014, ICWE2014, IESD2014, ISWC2014, SP2014, UMAP2014, USEWOD2014, WebSci2014, WISM2014
  • ComposableWeb2013, CulTEL2013, EDBT2013, HT2013, i-KNOW2013, ICWE2013, IESD2013, ISWC2013, MDWE2013, MSM2013, RAMSS2013, SALAD2013, SMERST2013, UMAP2013, WISM2013, WWW2013
  • AAAI2012, ECIR2012, ESWC2012, HT2012, ICWE2012, I-Semantics2012, LAPIS2012, MultiA-Pro2012, RAMSS2012, UMAP2012, WebSci12, WISM2012, WWW2012
  • AUM2011, BEWEB 2011, CHI 2011, CIKM 2011, ComposableWeb2011, DAH2011, EDBT 2011, ESWC 2011, EUROITV2011, FOMI2011, HT2011, ICWE 2011, IJCAI2011, I-Semantics 2011, ISWC 2011, LISC2011, MDWE2011, MMM2011, MODIQUITOUS-2011, MSW2011, SASWeb2011, SIGIR 2011, SocialObjects2012, S3T 2012, UMAP2011, USEWOD2011, UWEB2011, VISSW 2011, WebSci11, WeRE 2011, WIN2011
  • AIMSA 2010, ComposableWeb'10, Coopis 2010, EDBT 2010, EIS 2010, EKAW2010, ESWC2010, HT2010, ICDKE 2010, ICWE2010, KMIS 2010, LUPAS2010, MDWE 2010, PODS 2010, QWE'10, RecsysTEL-2010, SASweb 2010, SLE 2010, SOFSEM2010, UDISW2010, UMAP2010, WABBWUAS2010, WANDS 2010, WebSci10, WECU2010, WISH2010, WSW2010
  • ABIS2009, CIKM 2009, ComposableWeb2009, DAH2009, EDBT 2009, ESWC 2009, HT 2009, ICOODB 2009, ICSC 2009, ICWE 2009, KMIS 2009, Mashups09, MDWE2009, MMM2009, SWIM 09, UMAP09, WISE 2009, WISM2009, WWW2009
  • AH2008, CIKM2008, EDBT 2008, HT 2008, ICWE 2008, MDWE2008, MMM2008, PATCH 2008, SOFSEM 08, WISE 2008, WISM2008, WWW2008

Further, WIS group members are board members of the following journals:


And they have been involved in reviewing and special issue editing for these and many other relevant journals, for example JoDI, JWS, SWJ, DKE, IEEE IntSys, IJHCS, IJSWIS, TiiS, UMUAI.