From Web Data to Information

The role of Web data in the engineering of Web-based information systems.

Core focus:

In the Web Information Systems group, we aim to make web information system engineering more effective at processing, retrieving, and interpreting human-generated web data. Extracting semantics from people's actions and behaviors on the web enables systems that are personalized and adaptive. Our first research objective is to understand what human-generated web data represents in terms of people's actions, interests, intents, and behaviors on the web. Through data science and web science research, we develop theories on how web data can provide insights about system users. Our second objective is to develop new solutions to the fundamental challenge of how systems can effectively attribute semantics to human-generated data, given the size and dynamic nature of the web. We implement and analyze approaches that augment web systems with the capability to process, retrieve, and interpret web data, and subsequently to adapt to their users.

In this context, we investigate three research topics:

  • Web-based user modeling: How can human-generated data, retrieved from the web and semantically enriched, be used to improve how systems interpret people's actions for user adaptation?
  • Web data processing and retrieval: How can we efficiently process data at web scale and design effective information retrieval mechanisms for domain-specific web tasks?
  • Human-enhanced web systems: What are effective ways to augment web-based systems with automated, large-scale human computation and interaction?

This research covers processing, retrieving, and interpreting Web data, with a focus on the special properties of Web data: 1) a large portion of Web data is human-made, e.g. in social networks or Twitter streams, which raises scientific challenges in how to effectively attribute meaning to Web data; 2) the size of the Web raises challenges in how to efficiently store, index, and analyze data at Web scale. WIS researchers and students strive to advance the state of the art in relevant disciplines such as user modeling, Web science, information retrieval, Web engineering, Web data management, user interaction, and crowdsourcing.

At WIS, we are enthusiastic about doing science on the Web and in particular the Social Web, and we love to investigate the power of Web data technologies.

A few examples of research activities:

  • Example: Learning Analytics and Search as Learning

    • MOOCs (Massive Open Online Courses) represent a form of web information systems that is revolutionising the way education is brought to people. MOOCs offer great opportunities to change how people learn. By analyzing learners' interactions with the course materials and adapting/personalizing the materials and platforms to the learners, we should be able to improve (online) learning. Another area of research here is search as learning, i.e. the exploration of how learners rely on search engines and search processes to facilitate learning.
    • The WIS group uses its expertise in information retrieval, data science, and web user modeling to perform learning analytics at scale, with the objective of providing education at scale that fits the learner. Examples: Lambda-Lab, ImREAL.

  • Example: Urban Analytics

    • Given that most of the world's population lives in cities, solving urban problems means improving the quality of life of a considerable number of people. The challenge is to improve the understanding of urban environments by enriching and interpreting big urban data through social sensing. Social data generated by users in online social networks has recently attracted a lot of attention as a potential source of knowledge for better understanding urban environments, citizens, and (natural) phenomena. However, as of today, there is no general theory able to establish the effectiveness of social data for predictive or descriptive purposes: intuitively, solutions that work for a given city might not be as valuable for another one.
    • The WIS group uses its expertise in social web data analytics to study to what extent social data can be used to represent or describe the targeted reality, possibly in combination with other data sources such as census data. Research subjects include the resolution of data, bias of a cultural or technological nature in data and in the conclusions drawn from it, and crowdsourcing for creating and interpreting social data. Examples: SocialGlass, AMS.

  • Example: Explainable Advice-Giving systems

    • Artificial advice givers support people in making choices and decisions; they propose and evaluate options while involving their human users in the decision-making process. Examples include recommender systems and (semi-)autonomous systems. Current systems offer limited capabilities when it comes to properties such as transparency and diversity.
    • The WIS group uses its expertise in human-computer interaction and intelligent user interfaces to improve the transparency and decision support of artificial advice givers. Research subjects include visualizing consumption blind spots, recommending sequences of (diverse) recommendations, and strategies for supporting critical thinking in classrooms. Examples: ENSURE, SuSPECT.

  • Example: Analytics for Academic Document Repositories

    • Academic publications are a central repository of human knowledge and are at the core of scientific advancement, both in academia itself and in industry. Tapping into this vast repository of knowledge is a daunting task, as the number of available publications is growing at tremendous speed: without proper support, it is often hard or even impossible to find publications relevant to a given problem in a timely fashion. The near-exponential growth of content in recent years, together with the shift to digital resources, has invalidated many well-proven workflows and demands new solutions suited to the present day.
    • The WIS group uses its expertise in information extraction and query processing to discover meaningful and useful metadata for annotating academic document repositories, and to develop user-centered query capabilities that use this metadata for innovative query paradigms such as visual exploration or faceted navigation. See the examples on Christoph Lofi's page under Research & Project: Semantic Digital Libraries and Semantic Open Courseware.

  • Example: Democratising Scalable Data Science

    • Given the sheer size of the data that has to be processed in order to train machine learning (ML) models at scale, both data management and ML gain an extra degree of complexity: scalability. Classic single-node database systems are replaced by distributed file systems and scalable analysis tools (e.g., Hadoop MapReduce, Apache Flink, or Spark). To set up and operate such a plethora of systems, data scientists require expertise in parallel algorithms and scalable data management operations (e.g., parallel joins). Most importantly, they need to fully understand the mathematical properties of parallel ML algorithms and the strategies for scaling those algorithms out. As a result, today's data scientists have to be a "jack of all trades", with deep knowledge of i) data modelling and cleaning, ii) distributed systems, iii) data management, and iv) machine learning, as well as Big Data systems programming skills.
    • The WIS group uses its expertise in database systems and scalable data management to: 1) develop query languages that bridge data management and machine learning, removing the systems complexity and making data science more accessible to "lay" data scientists; 2) design novel scalable parallel database operators that are well suited to mixed data management and machine learning pipelines; 3) use and contribute to systems such as Apache Flink, Spark, and SystemML by adding new features or applying optimisation techniques.

Our vision on web data drives our research:

  • Vision: Web Data and its Role in Information Systems

    • One characteristic property of Web data is that it transforms the way we need to design and build information systems. Data has always been the main ingredient of an information system, representing the world or the process that the system serves: inbound, to detect what is going on in that world, and outbound, to support and run that world. In the traditional approach, the complexity was in the software that made all of this happen, and data was designed to fit the software. The Web has brought an abundance of data to use, and it enables systems with a much better, much more encompassing, and much more accurate view of what is going on in the world they are meant to serve. This implies a new complexity: it lies in the data and relates to understanding how the data can be used in the system. We need to understand the data and what software can make of it in terms of useful knowledge. In other words, to build systems that effectively use web data, we need to fit the software to the data. As scientists, we are inspired to study data, and technology for making sense of data, so that information system engineering can make full use of web data.
    • This reversal of the paradigm also comes with a much higher degree of user-centeredness of systems. Web data is often used to assess how system users, such as students, customers, travellers, and patients, can be served better and in a more tailored way. The abundance of Web data allows us to unlock more knowledge, which in turn enables a higher degree of customisation and adaptation of systems to users. As scientists, we are therefore especially enthusiastic about studying how data, and technology for making sense of data, help us towards better user adaptation. In a broad interpretation, this is all part of research into user modeling, and it comes with a variety of data processing research challenges.

Selected Research Activities

Current and recent research activities and projects of the WIS group in this area include:

  • AMS - Amsterdam Institute for Advanced Metropolitan Solutions
  • ATSearch - Adaptive Faceted Search on Twitter
  • CrossUM - Cross-system User Modeling on the Social Semantic Web
  • Data Bridges - Smart (user) data services in digital cities
  • Delft Data Science - Data Science & Big Data research at TU Delft
  • Devising Metrics for Assessing Echo Chambers, Incivility, and Intolerance
  • ENSURE - ExplaiNing SeqUences in REcommendations
  • GeniUS - Generic user modeling on the Social Semantic Web
  • GRAPPLE - Adaptation in technology-enhanced learning
  • Hera - Semantics-based adaptation engineering
  • ImREAL - Augmenting user models with real-world information in training and learning
  • Net2 - Semantic-based methodologies for networked web and knowledge engineering
  • PoliMedia - Linked Open Politics (winner of LinkedUp Challenge)
  • RDF Gears - Data integration for the Semantic Web
  • SEEQR - Structural indexing and EfficiEnt Query processing on massive RDF data sets
  • SocialGlass - Your City Through the Social Data Lens
  • SuSPECT - Scaffolding Student PErspectives for Critical Thinking
  • TweetUM - Analyzing and modeling user behavior on Twitter for recommending trending news on the Social Web
  • Twitcident - Using relevant tweets during big incidents
  • Twinder - Finding interesting information in Social Web streams
  • U-Sem - Holistic User Modeling on the Social Semantic Web
  • WUDE - Web user demand elicitation in cultural heritage access

Research Community & Service

A selection of upcoming and recently organized conferences or workshops that WIS group members helped to organise and chair:

  • TheWebConference 2020 - The Web Conference (formerly known as WWW conference), Taipei, 20-24 April 2020.
  • TheWebConference 2019 - The Web Conference (formerly known as WWW conference), San Francisco, USA, 13-17 May, 2019.
  • ACM Web Science 2018 - 10th ACM Conference on Web Science, Amsterdam, the Netherlands, 27-30 May 2018.
  • CitRec 2017 - Recsys Workshop on Citizens' Recsys
  • IntRS 2017 - Recsys Workshop on Interfaces and Human Decision Making for Recommender Systems
  • MSR Challenge 2017 - 14th Conference on Mining Software Repositories: mining challenge
  • BeyondMR 2015 - EDBT Workshop on Algorithms and Systems for MapReduce and Beyond, 27 March, 2015
  • EDBT Summer School 2015 - The EDBT Summer School on Graph Data Management 2015, Aug 31 - Sept 4, 2015
  • ACM HT 2015 - 26th ACM Conference on Hypertext and Social Media, Cyprus, September 1-4, 2015.
  • ICWE 2015 - 15th International Conference on Web Engineering, Rotterdam, the Netherlands, June 22-26, 2015.
  • UMAP 2014 - 22nd Conference on User Modeling, Adaptation and Personalization, Aalborg, Denmark, July 7-11, 2014.
  • ICWE 2014 - 14th International Conference on Web Engineering, Toulouse, France, July 1-4, 2014.
  • WebSci 2014 - ACM Web Science 2014 Conference, Bloomington, USA, June 23-26, 2014.
  • UMAP 2013 - 21st Conference on User Modeling, Adaptation and Personalization, Rome, Italy, June 10-14, 2013.
  • Web Engineering at WWW2013 - 22nd International World Wide Web Conference, Rio de Janeiro, Brazil, 13-17 May 2013.

The following is a non-exhaustive list of research events that WIS group members are or have been involved in as chair, organizer or committee member:

  • IUI2020, TheWebConference2020
  • IUI2019, TheWebConference2019, UMAP2019
  • IJCAI-ECAI2018, CHI2018, HT2018, ICWE2018, ICWSM2018, ISWC2018, IUI2018, LAK2018, UMAP 2018, WebSci18, WWW2018, Recsys2018
  • ESWC2017, EvalUMAP2017, Hypertext2017, ICWE2017, ICSME2017, ISWC2017, L@S2017, MSM2017, MSR2017, Recsys2017, Semantics2017, UMAP2017
  • AIMSA2016, BLINKS2016, EvalUMAP2016, ESWC2016, Hypertext2016, ICWE2016, ISWC2016, MSM2016, UMAP2016, USEWOD2016, WebSci2016, WWW2016
  • CSSWS2015, DeCAT2015, ESWC2015, HT2015, I3E2015, ICWE2015, ISWC2015, MSM2015, PATCH2015, RDSM2015, SPS2015, UMAP2015, USEWOD2015, WebSci2015, WWW2015, GRADES 2015
  • CrowdSens2014, CSSWS2014, ICWE2014, IESD2014, ISWC2014, SP2014, UMAP2014, USEWOD2014, WebSci2014, WISM2014
  • ComposableWeb2013, CulTEL2013, EDBT2013, HT2013, i-KNOW2013, ICWE2013, IESD2013, ISWC2013, MDWE2013, MSM2013, RAMSS2013, SALAD2013, SMERST2013, UMAP2013, WISM2013, WWW2013
  • AAAI2012, ECIR2012, ESWC2012, HT2012, ICWE2012, I-Semantics2012, LAPIS2012, MultiA-Pro2012, RAMSS2012, UMAP2012, WebSci12, WISM2012, WWW2012
  • AUM2011, BEWEB 2011, CHI 2011, CIKM 2011, ComposableWeb2011, DAH2011, EDBT 2011, ESWC 2011, EUROITV2011, FOMI2011, HT2011, ICWE 2011, IJCAI2011, I-Semantics 2011, ISWC 2011, LISC2011, MDWE2011, MMM2011, MODIQUITOUS-2011, MSW2011, SASWeb2011, SIGIR 2011, SocialObjects2012, S3T 2012, UMAP2011, USEWOD2011, UWEB2011, VISSW 2011, WebSci11, WeRE 2011, WIN2011
  • AIMSA 2010, ComposableWeb'10, Coopis 2010, EDBT 2010, EIS 2010, EKAW2010, ESWC2010, HT2010, ICDKE 2010, ICWE2010, KMIS 2010, LUPAS2010, MDWE 2010, PODS 2010, QWE'10, RecsysTEL-2010, SASweb 2010, SLE 2010, SOFSEM2010, UDISW2010, UMAP2010, WABBWUAS2010, WANDS 2010, WebSci10, WECU2010, WISH2010, WSW2010
  • ABIS2009, CIKM 2009, ComposableWeb2009, DAH2009, EDBT 2009, ESWC 2009, HT 2009, ICOODB 2009, ICSC 2009, ICWE 2009, KMIS 2009, Mashups09, MDWE2009, MMM2009, SWIM 09, UMAP09, WISE 2009, WISM2009, WWW2009
  • AH2008, CIKM2008, EDBT 2008, HT 2008, ICWE 2008, MDWE2008, MMM2008, PATCH 2008, SOFSEM 08, WISE 2008, WISM2008, WWW2008

Further, WIS group members are board members of the following journals:


They have also been involved in reviewing for, and editing special issues of, these and many other relevant journals, for example JoDI, JWS, SWJ, DKE, IEEE IntSys, IJHCS, IJSWIS, TiiS, and UMUAI.