The Web Information Systems (WIS) research group in the Software Technology (ST) department at Delft University of Technology (TU Delft) focuses its research on the engineering and science of the Web. The research specifically considers the role of Web data in the engineering of Web-based information systems.
In the Web Information Systems group, we aim at making web information system engineering more effective in processing, retrieving and interpreting human‐generated web data. Extracting semantics from people's actions and behaviors on the web enables systems that are personalized and adaptive. Our first research objective is to understand what human‐generated web data represents in terms of people's actions, interests, intents, and behaviors on the web. Through data science and web science research, we develop theories on how web data can provide insights about system users. Our second objective is to develop new solutions to meet the fundamental challenges in how systems effectively attribute semantics to human‐generated data, given the size and dynamic nature of the web. We implement and analyze approaches for augmenting web systems with the capability to process, retrieve, and interpret web data, and subsequently adapt to their users.
In this context, we investigate three research topics:
- Web‐based user modeling: How can human‐generated data that are retrieved and semantically enriched from the web be used to enhance systems’ interpretation of people’s actions for user adaptation?
- Web data processing and retrieval: How to efficiently process data at web‐scale and design effective information retrieval mechanisms for domain‐specific web tasks?
- Human‐enhanced web systems: What are effective ways to augment web‐based systems with automated, large‐scale human computation and interaction?
This research includes topics in processing, retrieving, and interpreting Web data, with a focus on the special properties of Web data: 1) a large portion of Web data is human-made, e.g. in social networks, natural language, or Twitter streams, and this brings scientific challenges in how to effectively attribute meaning to Web data and how systems interface with users, e.g. through conversations or explanations; 2) the size of the Web brings challenges in how to efficiently store, index, and analyze data at Web scale. WIS researchers and students strive to advance the state of the art in relevant disciplines like user modeling, Web science, information retrieval, natural language processing, Web engineering, Web data management, user interaction, and crowdsourcing.
WIS activities & news
Jie Yang is joining WIS as assistant professor. Here is Jie introducing himself:
I am joining the Web Information Systems group as Assistant Professor. Before coming back to Delft, I was a Machine Learning Scientist at Alexa Shopping, Amazon Research, based in Seattle, and a Senior Researcher at the eXascale Infolab, University of Fribourg-Switzerland. I received my Ph.D. from TU Delft in 2017, M.Sc. from TU Eindhoven in 2013, and B.Eng. from Zhejiang University in 2011. During my M.Sc., I also spent some time at Philips Research.
My research focuses on human-centered machine learning for Web-scale information systems, aiming at leveraging the joint power of human and machine intelligence for understanding and making use of data in large-scale information systems. Over the past few years, I have worked on integrating human computation with model training in active learning, transfer learning, and weakly-supervised learning settings, to allow models to effectively and efficiently learn from small, sparse, and noisy data. More recently, I have been focusing on developing human-centered approaches for better-performing, more robust machine learning systems.
We are seeking motivated individuals interested in pursuing a PhD in Computer Science. Ideal candidates would have an MSc degree in Computer Science, Mathematics or a related field, and be broadly interested in Human Computation and Crowdsourcing and in building hybrid human-machine systems to tackle problems of societal importance. We aim to attract multiple PhD candidates to the Crowd Computing team headed by Jie Yang and Ujwal Gadiraju at the Web Information Systems group of the EEMCS faculty at TU Delft.
Crowd computing involves algorithmic engagement and coordination of people by means of Web-enabled platforms. The tasks involved are complex and mainly focused on the creation, enrichment, and interpretation of data, making crowd computing a building block of data science: from Facebook to Microsoft, from Google to IBM, from Spotify to Pandora, all major companies employ crowd computing to fulfil their data needs, both by involving employees and by reaching out to anonymous crowds through online marketplaces like Amazon Mechanical Turk. The Crowd Computing team will focus on leveraging human intelligence in combination with machines to solve important problems that span multiple disciplines including Machine Learning, Information Retrieval, Computer Vision and NLP.
We are of course dependent on the availability of funding, but we have a number of projects lined up, so you can make your interest known to us.
In order to indicate your interest, please send your complete CV to Dr. Ujwal Gadiraju (Assistant Professor) and Prof. Dr. Geert-Jan Houben (Full Professor and head of the Web Information Systems group) prefixing the email subject with [Crowd Computing - PhD Application].
A few papers from the Sigma team on crowdsourcing in spatial contexts were accepted recently:
- Sihang Qiu, Ujwal Gadiraju, Alessandro Bozzon. Estimating Conversational Styles in Conversational Microtask Crowdsourcing. Full paper at CSCW 2020.
- Shahin Sharifi Noorian, Sihang Qiu, Achilleas Psyllidis, Alessandro Bozzon, Geert-Jan Houben. Detecting, Classifying, and Mapping Retail Storefronts Using Street-level Imagery. Special session paper at ICMR 2020.
- Sihang Qiu, Ujwal Gadiraju, Alessandro Bozzon. Just the Right Mood for HIT! Analyzing the Role of Worker Moods in Conversational Microtask Crowdsourcing. Full paper at ICWE 2020.
- Sihang Qiu, Alessandro Bozzon, Geert-Jan Houben. VirtualCrowd: A Simulation Platform for Microtask Crowdsourcing Campaigns. Demo paper at WWW 2020.
A full paper will be presented at the 2020 edition of the ACM CHI Conference on Human Factors in Computing Systems (CHI 2020) in Honolulu, on the island of Oahu, Hawaiʻi, USA. The paper is titled "Improving Worker Engagement Through Conversational Microtask Crowdsourcing", by Sihang Qiu, Ujwal Gadiraju, and Alessandro Bozzon. The paper is a follow-up to previous work on conversational microtask crowdsourcing; we show that chatbots can offer an engaging and effective working environment for crowd workers, thus demonstrating their suitability as a crowd work platform.
We are involved with a European Training Network called NL4XAI: Interactive Natural Language Technology for Explainable Artificial Intelligence.
NL4XAI will train 11 creative, entrepreneurial and innovative early-stage researchers (ESRs), who will face the challenge of making AI self-explanatory and thus contribute to translating knowledge into products and services for economic and social benefit, with the support of Explainable AI (XAI) systems. The project consortium consists of 10 beneficiaries and 7 partner organizations.
Paper accepted to the journal User Modeling and User-Adapted Interaction: "Effects of Personal Characteristics in Control-oriented User Interfaces for Music Recommender Systems", by Yucheng Jin, Nava Tintarev, Nyi Nyi Htun, and Katrien Verbert.
ContextPlay: User Control for Context-Aware Music Recommendation. With Yucheng Jin, Nava Tintarev, Nyi Nyi Htun, and Katrien Verbert
A full paper will be presented at the 23rd International Conference on Theory and Practice of Digital Libraries (TPDL 2019) in Oslo (Norway) this year. The paper is titled "Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications", by Daniel Vliegenthart, Sepideh Mesbah, Christoph Lofi, Akiko Aizawa and Alessandro Bozzon. The paper is the result of a collaboration between the group and researchers from the University of Tokyo.
A demonstration paper titled "Stateful Functions as a Service in Action" by Adil Akhter (collaborator from ING), Marios Fragkoulis and Asterios Katsifodimos will be presented at this year's VLDB conference in Los Angeles, CA. In short, this paper presents a novel method to deploy, execute and scale Stateful Functions in the cloud, using stateful streaming dataflows.
Within a span of four weeks, three PhD students from Lambda-Lab defended their PhD theses in the area of learning analytics: Yue Zhao (Learning Analytics Technology to Understand Learner Behavioral Engagement in MOOCs), Guanliang Chen (MOOC Analytics: Learner Modeling and Content Generation) and Dan Davis (Large-Scale Learning Analytics: Modeling Learner Behavior & Improving Learning Outcomes in Massive Open Online Courses), supported by TU Delft's Extension School, the Leiden-Delft-Erasmus Centre for Education & Learning and a CSC scholarship.
Asterios Katsifodimos received the best paper award at EDBT 2019 (International Conference on Extending Database Technology) with his collaborators from TU Berlin for their paper “Efficient Window Aggregation with General Stream Slicing” by Jonas Traub, Philipp M. Grulich, Alejandro Rodriguez Cuellar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl and Volker Markl.
Moreover, WIS members published a paper on how to use streaming technology to execute scalable services in the cloud: "Operational Stream Processing: Towards Scalable and Consistent Event-Driven Applications" by Asterios Katsifodimos and Marios Fragkoulis.
Members of Lambda-Lab will present two papers at the European Conference on Information Retrieval (ECIR): a demo paper titled "node-indri: moving the Indri toolkit to the modern Web stack", by Felipe Moraes & Claudia Hauff and a full paper titled "An Axiomatic Approach to Diagnosing Neural IR Models" by Daan Rennings, Felipe Moraes & Claudia Hauff.
The WIS group will present two papers at the 2019 edition of The Web Conference (WWW 2019), to be held in San Francisco, USA: the full paper titled "Crowd-Mapping Urban Objects from Street-Level Imagery" by Sihang Qiu, Achilleas Psyllidis, Alessandro Bozzon, and Geert-Jan Houben, and the short paper titled "Evaluating Neural Text Simplification in the Medical Domain" by Laurens van den Bercken, Robert-Jan Sips and Christoph Lofi.
A full paper will be presented at CIKM in Turin, Italy this year titled "Contrasting search as a learning activity with instructor-designed learning", by Felipe Moraes, Sindunuraga Rikarno Putra and Claudia Hauff.
A full paper will be presented at RecSys in Vancouver this year titled "Effects of Personal Characteristics on the Music Recommender with Different Levels of Controllability", by Yucheng Jin, Nava Tintarev and Katrien Verbert.
Two demonstration papers have been accepted at the 2018 edition of SIGIR: SearchX: Empowering Collaborative Search Research (Sindunuraga Rikarno Putra, Felipe Moraes and Claudia Hauff) and A/B Testing with APONE (Mónica Marrero and Claudia Hauff).
A full paper will be presented at UMAP in Singapore this year titled "Effects of Individual Traits on Diversity-aware Music Recommender User Interfaces", by Yucheng Jin, Nava Tintarev and Katrien Verbert.
Lambda-Lab will be present at ACM Hypertext in Baltimore this year with a full paper titled IntelliEye: Enhancing MOOC Learners’ Video Watching Experience with Real-Time Attention Tracking, by Tarmo Robal, Yue Zhao, Christoph Lofi and Claudia Hauff.
Full paper titled "How do Crowdworker Communities and Microtask Markets Influence Each Other? A Data-Driven Study on Amazon Mechanical Turk" by Jie Yang, Carlo van der Valk, Tobias Hoßfeld, Judith Redi, and Alessandro Bozzon has been accepted at the sixth AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2018) to be held in Zurich, 5th-8th of July 2018. The paper stems from the MSc thesis work of Carlo van der Valk.
Full paper titled "Social Gamification in Enterprise Crowdsourcing" has been accepted at the 10th ACM Conference on Web Science (WebSci 2018) to be held in Amsterdam, 27th-30th of May 2018. The paper stems from the MSc thesis work of Gregory Afentoulidis, performed in collaboration with the IBM Benelux Center of Advanced Studies, in the context of the TU Delft - IBM CIC on Big Data Science.
Lambda-Lab PhD student Guanliang Chen created a dataset for learning question generation; the accompanying paper (aptly named LearningQ: A Large-scale Dataset for Educational Question Generation co-authored by Guanliang Chen, Jie Yang, Claudia Hauff and Geert-Jan Houben) has been accepted at ICWSM '18. The dataset will be made public in the coming weeks.
Two demonstration papers have been accepted at the 2018 edition of The Web Conference (27th edition of the former WWW conference): SmartPub: A Platform for Long-Tail Entity Extraction from Scientific Publications (Sepideh Mesbah, Alessandro Bozzon, Christoph Lofi and Geert-Jan Houben) and Social Smart Meter: Identifying Energy Consumption Behavior in User-Generated Content (Andrea Mauri, Achilleas Psyllidis and Alessandro Bozzon).
The proceedings of the International Workshop on Citizens for Recommender Systems (CitRec 2017) are now available on the ACM Digital Library.
Last month Jie Yang defended his Ph.D. research with his thesis ‘Crowd Knowledge Creation Acceleration’. The focus of his thesis was on better understanding crowd knowledge creation processes and developing novel methods and tools to accelerate these processes.
Full paper titled "Leveraging Crowdsourcing Data For Deep Active Learning - An Application: Learning Intents in Alexa" by Jie Yang, Thomas Drake (Amazon, Seattle), Andreas Damianou (Amazon, Cambridge), and Yoelle Maarek (Amazon) was accepted at the 2018 edition of The Web Conference (27th edition of the former WWW conference).
In January, Geert-Jan Houben gives the Foundation Day Lecture at the 176th Dies Natalis of TU Delft: Data and Science we can rely on.
Our research titled "The Half-Life of MOOC Knowledge: A Randomized Trial Evaluating the Testing Effect in MOOCs" was accepted as a full paper at the 8th International Learning Analytics & Knowledge Conference! This is a collaboration between WIS members Dan Davis, Claudia Hauff and Geert-Jan Houben as well as René Kizilcec (U Stanford). Our work on "Webcam-based attention tracking in Online Learning: A Feasibility Study" was accepted as a full paper at IUI 2018 (the 23rd annual meeting of the Intelligent User Interfaces community); the work was a collaboration between Tarmo Robal (Tallinn University of Technology) and WIS members Yue Zhao, Christoph Lofi and Claudia Hauff.
The paper titled "Quality Control in Crowdsourcing: A Survey of Quality Attributes, Assessment Techniques and Assurance Actions" was accepted to the ACM Computing Surveys journal. The paper is co-authored by Florian Daniel (POLIMI), Pavel Kucherbaev (WIS TU Delft), Cinzia Cappiello (POLIMI), Boualem Benatallah (UNSW), and Mohammad Allahbakhsh (University of Zabol).
Alessandro Bozzon has been selected for the 2017 IBM Faculty Award, in the section Cognitive Computing and IoT, for his work on Enterprise Crowdsourcing. The work has been performed in collaboration with the IBM Benelux Center for Advanced Studies. The IBM Faculty Award is a competitive worldwide program intended to foster collaboration between researchers at leading universities worldwide and those in IBM research.
The paper titled "Human Aided Bots" was accepted to the Internet Computing magazine as a Spotlight. The paper is co-authored by Pavel Kucherbaev, Alessandro Bozzon, and Geert-Jan Houben.
The paper titled "Interacting Attention-gated Recurrent Networks for Recommendation" was accepted at CIKM '17. The paper is co-authored by Wenjie Pei, Jie Yang, Zhu Sun, Jie Zhang, Alessandro Bozzon, and David M.J. Tax. The paper will be presented in November 2017 in Singapore.
The paper "Clarity is a Worthwhile quality – On the Role of Task Clarity in Microtask Crowdsourcing", authored by Ujwal Gadiraju, Jie Yang, and Alessandro Bozzon, won the "Douglas Engelbart best paper award" at Hypertext '17.
Claudia Hauff has been awarded an NWO VIDI Grant for her proposal SearchX: Integrating search and sensemaking into large-scale open online learning. 2 PhD and 1 Postdoc vacancies for this project will be announced soon.
The paper accepted at Hypertext'17, titled "Clarity is a Worthwhile quality – On the Role of Task Clarity in Microtask Crowdsourcing" and authored by Ujwal Gadiraju, Jie Yang, and Alessandro Bozzon, has been nominated for the "Douglas Engelbart best paper award". The paper will be presented on Thursday July 6th in Prague, Czech Republic.
Two submissions on recommender systems were respectively accepted at IJCAI'17 and AAAI'17. The paper titled "MRLR: Multi-level Representation Learning for Personalized Ranking in Recommendation" was accepted at IJCAI'17 (Melbourne, Australia). The paper titled "Exploiting both Vertical and Horizontal Dimensions of Feature Hierarchy for Effective Recommendation" was accepted at AAAI'17 (San Francisco, CA, US).
Two submissions were accepted in the Technology-Enhanced Adaptive Learning track, co-written by members of the Lambda-Lab: "Measuring student behaviour dynamics in a large interactive classroom setting" (in collaboration with researchers from Lugano) was accepted as full paper; "Certificate Achievement Unlocked: Exploring MOOC Learners' Behaviour Before & After Passing" was accepted as a late-breaking result paper.
The paper "Nudge your Workforce. A Study on the Effects of Task Notification Strategies in Enterprise Mobile Crowdsourcing" has been accepted as full paper in the Personalized Social Web track. The paper has been co-written with researchers from the IBM Amsterdam Centre of Advanced Studies, in the context of the IBM-EWI CIC collaboration. The paper is a result of the master thesis of our former WIS master student Sarah Bashirieh.
Our submission "Sequences of Diverse Song Recommendations: An exploratory study in a commercial system" was accepted as an extended abstract to the Recommender Systems track.
The paper titled "Clarity is a Worthwhile quality – On the Role of Task Clarity in Microtask Crowdsourcing" was accepted at Hypertext'17 (Prague, Czech Republic). Authors: Ujwal Gadiraju, Jie Yang, Alessandro Bozzon.
The papers co-written by members of the Pi-lab: "Structure and evolution of package dependency networks" (with collaborators from U. Tartu) and "Oops, my tests broke the build: An explorative analysis of Travis CI with GitHub" (with collaborators from SERG/TU Delft) were accepted as full papers at MSR '17.
WIS research is prominently featured in the official EEMCS faculty Research video.
See our Vacancies page.
The paper titled "Describing Data Processing Pipelines in Scientific Publications for Big Data Injection" was accepted at the SWM workshop of the WSDM conference in Cambridge, UK. The paper titled "Semantic Annotation of Data Processing Pipelines in Scientific Publications" was accepted at the ESWC 2017 conference in Portoroz, Slovenia.
Claudia Hauff has been awarded an NWO Top Grant to conduct “innovative or high-risk scientific research that addresses questions of high quality and urgency.” The grant will push the boundaries of large-scale collaborative search.
Alessandro's work on social data science has been recognised as a 2015 highlight of TU Delft. The piece mentions the successes obtained in the research lines related to urban analytics (Social Glass), crowd knowledge generation (SealincMedia and WUDE), and the Inclusive Enterprise (with IBM Benelux).
On Sunday October 4th, World Animal Day, 40 ornithologists, bird watchers and bird enthusiasts assembled in the library of the Rijksmuseum in Amsterdam for digital birdwatching. These bird experts were invited to use their knowledge and skills to identify birds depicted on prints, paintings and other digital heritage objects. The event was an initiative of researchers from Delft University of Technology, VU University Amsterdam and the Centrum Wiskunde & Informatica (CWI), assembled in the COMMIT/ research project SEALINCMedia, together with the Rijksmuseum, Naturalis and Wikimedia.
The digital bird watching event attracted quite some media attention, both in paper media (Trouw) and digital media (RTL, Parool, Telegraaf) and was even featured on the Dutch TV news (NOS, from 10:49). The full article can be read here.
Under the umbrella of AMS, the Social Glass team and the research group led by Prof. Serge Hoogendoorn (Faculty of Civil Engineering and Geoscience) developed and executed a pilot live experiment in Crowd Management during SAIL 2015, the international nautical event which took place from the 19th till the 23rd of August in Amsterdam. The pilot study used a combination of various methods of real-time data collection to give an optimum picture of pedestrian flows along the SAIL route and its different areas of interest. The main focus of the study was on how to gain reliable information on pedestrian flows during large-scale public events, such as SAIL, and use this effectively for crowd management. This experiment is a first step towards the development of a real-time monitoring system that can be utilized as a support tool by stakeholders responsible for the smooth running of such large-scale events. The experiment ran parallel to the established crowd-management methods used by the municipality and, therefore, had no effect on the tasks of the crowd managers.
Our work together with IBM on Inclusive Enterprise, led by Alessandro, was featured on the Wall Street Journal website: blogs.wsj.com/cio/2015/07/08/ibm-researchers-try-to-measure-employee-well-being-using-technology/
Geert-Jan Houben has become a member of the editorial board of UMUAI, User Modeling and User-Adapted Interaction: The Journal of Personalization Research.
Geert-Jan Houben has been appointed as KIVI-chair on Big Data Science, connecting the TU Delft research to the engineering in practice.
The WIS group is present with its social data science research in AMS, the Amsterdam Institute for Advanced Metropolitan Solutions. An example is the work on SocialGlass.
Together with our partners in the COMMIT project SEALINCMedia we released a video showing the Accurator platform for the art collection of the Rijksmuseum. The platform itself is demonstrated at accurator.nl.
WIS is involved with its research on social data computing and user aspects of social data in Delft Data Science (DDS). Examples are the work in SocialGlass on urban data analytics, for Accurator at the Rijksmuseum on nichesourcing and social annotation, and for the Extension School on learning analytics.
Mathematics & Computer Science (EE)MCS
Building 28 - Van Mourik Broekmanweg 6