Assistant Professor at the Web Information Systems group of the Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS/EWI), Delft University of Technology.
My research is on Semantics-based Data Engineering methods and techniques. The general problem in data engineering is that the source data available to a data-driven system is often not fit for its purpose because the data is scattered across different sources, is of low quality, or carries important semantic information only implicitly. This is a central challenge in the emerging data-driven economy and a major detrimental aspect in many AI- or Data Science-driven systems. Data processing pipelines are needed to overcome these issues, producing the required target data from the available source data.
In the context of this challenge, I focus on problem scenarios where there is a semantic mismatch between mostly unstructured source data and structured target data. My research enables the engineering of sophisticated data processing pipelines for these scenarios which can tackle non-trivial issues with respect to integrating data from different sources, transforming data to more suitable granularities or more explicit representations, and augmenting data with additional data points and properties. This is a challenging aspect of data engineering, and solutions rely on recent methods for facilitating semantic enrichment like natural language processing, information extraction, crowd computing, or AI-driven data analysis.
I chose several application domains for validating my research, focusing on domains where data produced by humans needs to be analyzed and processed. These domains typically exhibit a strong semantic and structural mismatch between available and desired data, and thus are effective testing grounds for my research. Examples are digital libraries and enterprise digital text repositories, online education information systems, and information systems for supporting digital humanities research.
2016: Assistant Professor at Delft University of Technology
2014-2016: PostDoc at Technische Universität Braunschweig
2012-2014: PostDoc at National Institute of Informatics, Tokyo, Japan
2011-2012: PostDoc at Technische Universität Braunschweig
2011: Defense of Doctoral Thesis at Technische Universität Braunschweig
2008-2011: Ph.D. Researcher at Technische Universität Braunschweig
2006-2008: Ph.D. Researcher at L3S Research Center, Leibniz University Hannover
2005: Diploma Thesis at Collaborative Software Development Laboratory, University of Hawai’i, Honolulu - Manoa, Certificate issued by University Kaiserslautern
2002-2004: Assistant Researcher at Fraunhofer Institute for Experimental Software Engineering, Kaiserslautern
2000-2004: Student at University Kaiserslautern
Agathe Balayn, Panagiotis Soilis, Christoph Lofi, Jie Yang, and Alessandro Bozzon. "What do You Mean? Interpreting Image Classification with Crowdsourced Concept Extraction and Analysis". In The Web Conference (WWW). 2021. PDF.
Christos Koutras, George Siachamis, Andra Ionescu, Kyriakos Psarakis, Jerry Brons, Marios Fragkoulis, Christoph Lofi, Angela Bonifati, and Asterios Katsifodimos. "Valentine: Evaluating Matching Techniques for Dataset Discovery". In Int. Conf. on Data Engineering (ICDE). 2021. PDF.
Agathe Balayn, Christoph Lofi, and Geert-Jan Houben. "Managing bias and unfairness in data for decision support: a survey of machine learning and data engineering approaches to identify and mitigate bias and unfairness within data management and analytics systems". The VLDB Journal, 2021. PDF, DOI.
Ioannis Petros Samiotis, Christoph Lofi, and Alessandro Bozzon. "Hybrid Annotation Systems for Music Transcription". In Int. Workshop on Reading Music Systems. 2021. PDF.
Tom Harting, Sepideh Mesbah, and Christoph Lofi. "LOREM: Language-consistent Open Relation Extraction from Unstructured Text". In The Web Conference (WWW). Taipei, Taiwan, apr 2020. PDF.
Christos Koutras, Marios Fragkoulis, Asterios Katsifodimos, and Christoph Lofi. "REMA: Graph Embeddings-based Relational Schema Matching". In EDBT/ICDT Workshops. 2020. PDF.
Ioannis Petros Samiotis, Sihang Qiu, Andrea Mauri, Cynthia CS Liem, Christoph Lofi, and Alessandro Bozzon. "Microtask crowdsourcing for music score Transcriptions: an experiment with error detection". In International Society for Music Information Retrieval Conference. 2020. PDF.
Sepideh Mesbah, Jie Yang, Robert-Jan Sips, Manuel Valle Torre, Christoph Lofi, Alessandro Bozzon, and Geert-Jan Houben. "Training Data Augmentation for Detecting Adverse Drug Reactions in User-Generated Content". In Int. Conf. on Empirical Methods in Natural Language Processing (EMNLP). Hong Kong, China, nov 2019. PDF.
Daniel Vliegenhart, Sepideh Mesbah, Christoph Lofi, Akiko Aizawa, and Alessandro Bozzon. "Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications". In Int. Conf. on Theory and Practice of Digital Libraries (TPDL). Oslo, Norway, sep 2019. PDF.
M.V. Torre, M. Ye, and C. Lofi. "Perceptual relational attributes: Navigating and discovering shared perspectives from user-generated reviews". In Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft für Informatik (GI), volume P-289. 2019. DOI.
Manuel Valle Torre, Mengmeng Ye, and Christoph Lofi. "Perceptual Relational Attributes: Navigating and Discovering Shared Perspectives from User-Generated Reviews". In Datenbanksysteme für Business, Technologie und Web (BTW). Rostock, Germany, 2019. PDF.
Laurens Van Den Bercken, Robert-Jan Sips, and Christoph Lofi. "Evaluating neural text simplification in the medical domain". In The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019. 2019. PDF, DOI.
Tarmo Robal, Yue Zhao, Christoph Lofi, and Claudia Hauff. "Towards Real-time Webcam-based Attention Tracking in Online Learning". In ACM Annual Meeting of Interactive User Interfaces (IUI). Tokyo, Japan, 2018. PDF.
Yue Zhao, Tarmo Robal, Christoph Lofi, and Claudia Hauff. "Towards MOOC2GO: The impact of mobile learning on learner performance in MOOCs". In Conf. on User Modeling, Adaptation and Personalization (UMAP). Singapore, 2018. PDF.
Sepideh Mesbah, Christoph Lofi, Manuel Valle Torre, Alessandro Bozzon, and Geert-Jan Houben. "TSE-NER: An Iterative Approach for Long-Tail Entity Extraction in Scientific Publications". In Int. Semantic Web Conference (ISWC). Monterey, California, USA, 2018. PDF.
Sepideh Mesbah, Alessandro Bozzon, Christoph Lofi, and Geert-Jan Houben. "SmartPub: A Platform for Long-Tail Entity Extraction from Scientific Publications". In The Web Conference (WWW) Demo Track. Lyon, France, 2018. PDF.
Tarmo Robal, Yue Zhao, Christoph Lofi, and Claudia Hauff. "IntelliEye: Enhancing MOOC Learners' Video Watching Experience with Real-Time Attention Tracking". In ACM Conf. on Hypertext and Social Media. Baltimore, Maryland, USA, 2018. PDF.
Sepideh Mesbah, Guanliang Chen, Manuel Valle Torre, Alessandro Bozzon, Christoph Lofi, and Geert-Jan Houben. "Concept Focus: Semantic Meta-Data For Describing MOOC Content". In Europ. Conf. on Technology Enhanced Learning (EC-TEL). Leeds, UK, 2018. PDF.
Yue Zhao, Tarmo Robal, Christoph Lofi, and Claudia Hauff. "Can I have a Mooc2Go, please? On The Viability of Mobile vs. Stationary Learning". In Europ. Conf. on Technology Enhanced Learning (EC-TEL). Leeds, UK, 2018. PDF.
Jan-Christoph Kalo, Christoph Lofi, René Pascal Maseli, and Wolf-Tilo Balke. "Semantic Query Processing: Estimating Relational Purity". In Lernen Wissen Daten Analysen (LWDA) Conference. Rostock, Germany, sep 2017. PDF.
Yue Zhao, Christoph Lofi, and Claudia Hauff. "Scalable Mind-Wandering Detection for MOOCs: A Webcam-Based Approach". In European Conf. on Technology Enhanced Learning (EC-TEL). Tallinn, Estonia, sep 2017. PDF.
Mengmeng Ye, Christoph Lofi, and Nava Tintarev. "Memorability of Semantically Grouped Online Reviews". In Semantics 2017. Amsterdam, Netherlands, sep 2017. PDF.
Sepideh Mesbah, Kyriakos Fragkeskos, Christoph Lofi, Alessandro Bozzon, and Geert-Jan Houben. "Facet Embeddings for Explorative Analytics in Digital Libraries". In Int. Conf. on Theory and Practice of Digital Libraries (TPDL). Thessaloniki, Greece, sep 2017. PDF.
Christoph Lofi and Nava Tintarev. "Towards Analogy-based Recommendation: Benchmarking of Perceived Analogy Semantics". In Workshop on Recommendation in Complex Scenarios @ RecSys. Como, Italy, aug 2017. PDF.
Nava Tintarev and Christoph Lofi. "Sequences of Diverse Song Recommendations". In User Modelling, Adaptation and Personalization (UMAP). Bratislava, Slovakia, jul 2017. PDF.
Yue Zhao, Dan Davis, Guanliang Chen, Christoph Lofi, Claudia Hauff, and Geert-Jan Houben. "Certificate Achievement Unlocked: How Does MOOC Learners' Behaviour Change?". In User Modelling, Adaptation and Personalization (UMAP), 83–88. Bratislava, Slovakia, jul 2017. PDF.
Sepideh Mesbah, Kyriakos Fragkeskos, Christoph Lofi, Alessandro Bozzon, and Geert Jan Houben. "Semantic annotation of data processing pipelines in scientific publications". In European Semantic Web Conference (ESWC), volume 10249 LNCS, 321–336. Portoroz, Slovenia, may 2017. PDF, DOI.
Sepideh Mesbah, Alessandro Bozzon, Christoph Lofi, and Geert-Jan Houben. "Describing data processing pipelines in scientific publications for Big Data injection". In Workshop on Scholarly Web Mining (SWM). Cambridge, UK, feb 2017. PDF, DOI.
Christoph Lofi and Manuel Valle Torre. "Perceptual Perspectives for Experience Items: Representation and Query Processing". In Dutch-Belgian DataBase Day (DBDBD). Utrecht, Netherlands, 2017. PDF.
Christoph Lofi and Wolf-Tilo Balke. "Large Scale Cooperation Scenarios – Crowdsourcing and its Societal Implication". Transactions on Internet Research (TIR), 12(1):03–14, jan 2016. PDF.
Christoph Lofi. "Towards Human-Centered Database Query Processing using on Perceptual Properties". In Dutch-Belgian DataBase Day (DBDBD). Mons, Belgium, 2016. PDF.
Christoph Lofi, Athiq Ahamed, Pratima Kulkarni, and Ravi Thakkar. "Benchmarking semantic capabilities of analogy querying algorithms". In Int. Conf. on Database Systems for Advanced Applications (DASFAA), volume 9642, 463–478. Dallas, TX, USA, 2016. PDF, DOI.
Nestor Alvaro, Mike Conway, Son Doan, Christoph Lofi, John Overington, and Nigel Collier. "Crowdsourcing Twitter annotations to identify first-hand experiences of prescription drug use". Journal of Biomedical Informatics, 58:280–287, nov 2015. PDF.
Christoph Lofi and Philipp Wille. "Exploiting social judgements in big data analytics". In 13th Lernen Wissen Adaption (LWA) Conference, volume 1458, 444–455. Trier, Germany, oct 2015. PDF.
Philipp Wille, Christoph Lofi, and Wolf-Tilo Balke. "Towards Narrative Information Systems". In Web-Age Information Management (WAIM). Qingdao, Shandong, China, jun 2015. PDF.
Kinda El Maarry. "Crowdsourcing for Query Processing on Web Data: A Case Study on the Skyline Operator". Journal of Computing and Information Technology (CIT), mar 2015. PDF.
Christoph Lofi and Christian Nieke. "“I would like to watch something like ‘The Terminator’…” Cooperative Query Personalization Based on Perceptual Similarity". In 18th International Conference on Extending Database Technology (EDBT). Brussels, Belgium, 2015. PDF.
Jiyin He, Kai Kunze, Christoph Lofi, K. Madria Sanjay, and Stephan Sigg. "Towards Mobile Sensor-Aware Crowdsourcing: Architecture, Opportunities and Challenges". In DASFAA Workshop on Uncertain and Crowdsourced Data. Bali, Indonesia, 2014. PDF.
Christoph Lofi and Christian Nieke. "Exploiting Perceptual Similarity: Privacy-Preserving Cooperative Query Personalization". In Int. Conf. on Web Information System Engineering (WISE). Thessaloniki, Greece, 2014. PDF.
C. Lofi, C. Nieke, and N. Collier. "Discriminating Rhetorical Analogies in Social Media". In Conf. of the Europ. Chapter of the Association for Computational Linguistics (EACL). Gothenburg, Sweden, 2014. PDF.
Christoph Lofi and Kinda El Maarry. "Design Patterns for Hybrid Algorithmic-Crowdsourcing Workflows". In 16th IEEE Conf. on Business Informatics (CBI). Geneva, Switzerland, 2014. PDF.
Christoph Lofi, Kinda El Maarry, and Wolf-Tilo Balke. "Skyline Queries over Incomplete Data - Error Models for Focused Crowd-Sourcing". In Int. Conf. on Conceptual Modeling (ER). Hong Kong, China, 2013. PDF.
Christoph Lofi, Kinda El Maarry, and Wolf-Tilo Balke. "Skyline Queries in Crowd-Enabled Databases". In Int. Conf. on Extending Database Technology (EDBT). Genoa, Italy, 2013. PDF.
Christoph Lofi and Christian Nieke. "Modeling Analogies for Human-Centered Information Systems". In 5th Int. Conf. On Social Informatics (SocInfo). Kyoto, Japan, 2013. PDF.
Christoph Lofi. "Just ask a human? – Controlling Quality in Relational Similarity and Analogy Processing using the Crowd". In CDIM Workshop at Database Systems for Business Technology and Web (BTW). Magdeburg, Germany, 2013. PDF.
Christoph Lofi. "Analogy Queries in Information Systems – A New Challenge". Journal of Information & Knowledge Management (JIKM), 2013. PDF.
Christoph Lofi and Ralf Krestel. "iParticipate: Automatic Tweet Generation from Local Government Data". In 17th International Conference on Database Systems for Advanced Applications (DASFAA). Busan, South Korea, 2012. PDF.
Joachim Selke, Christoph Lofi, and Wolf-Tilo Balke. "Pushing the Boundaries of Crowd-Enabled Databases with Query-Driven Schema Expansion". Int. Conf. on Very Large Data Bases (VLDB), 5(2):538–549, 2012. PDF.
Christoph Lofi and Wolf-Tilo Balke. "On Skyline Queries and how to Choose from Pareto Sets". In Advanced Query Processing in Intelligent Systems Reference Library (ISRL 36), chapter 2, pages 15–36. Springer, 2012. PDF.
Christoph Lofi, Wolf-Tilo Balke, and Ulrich Güntzer. "Malleability-Aware Skyline Computation on Linked Open Data". In 17th International Conference on Database Systems for Advanced Applications (DASFAA). Busan, South Korea, 2012. PDF.
Christoph Lofi, Wolf-Tilo Balke, and Ulrich Güntzer. "Equivalence Heuristics for Malleability-Aware Skylines". Journal of Computing Science and Engineering (JCSE), 6(3):207–218, 2012. PDF.
Silviu Homoceanu, Michael Loster, Christoph Lofi, and Wolf-Tilo Balke. "Will I like it? – Providing Product Overviews based on Opinion Excerpts". In IEEE Conference on Commerce and Enterprise Computing (CEC). Luxembourg, Luxembourg, 2011. PDF.
Christoph Lofi, Wolf-Tilo Balke, and Ulrich Güntzer. "Eliciting Customer Wishes using Example-Based Heuristics in E-Commerce Applications". In IEEE Conference on Commerce and Enterprise Computing (CEC). Luxembourg, Luxembourg, 2011. PDF.
Christoph Lofi. "Choosing the Right Thing: Cooperative Trade-Off Enhanced Skyline Queries". In PhD Workshop at the 27th International Conference On Data Engineering (ICDE). Hannover, Germany, 2011. PDF.
Christoph Lofi and Wolf-Tilo Balke. "Preference Trade-Offs – Towards Manageable Skylines". In 22. GI-Workshop Grundlagen von Datenbanken (GvD). Bad Helmstedt, Germany, 2010. PDF.
Christoph Lofi, Christian Nieke, and Wolf-Tilo Balke. "Mobile Product Browsing Using Bayesian Retrieval". In IEEE Conf. Commerce and Enterprise Comp. (CEC). Shanghai, China, 2010. PDF.
Joachim Selke, Christoph Lofi, and Wolf-Tilo Balke. "Highly Scalable Multiprocessing Algorithms for Preference-Based Database Retrieval". In 15th International Conference on Database Systems for Advanced Applications (DASFAA). Tsukuba, Japan, 2010. PDF.
Christoph Lofi, Ulrich Güntzer, and Wolf-Tilo Balke. "Efficient Computation of Trade-Off Skylines". In 13th International Conference on Extending Database Technology (EDBT). Lausanne, Switzerland, 2010. PDF.
Christoph Lofi, Wolf-Tilo Balke, and Ulrich Güntzer. "Efficient Skyline Refinement Using Trade-Offs". In 3rd International IEEE Conference on Research Challenges in Information Science (RCIS). Fès, Morocco, 2009. PDF, DOI.
Christoph Lofi, Wolf-Tilo Balke, and Ulrich Güntzer. "Efficient Skyline Refinement Using Trade-Offs Respecting Don't-Care Attributes". International Journal of Computer Science and Applications (IJCSA), 6(5):1–29, 2009. PDF.
Christoph Lofi, Wolf-Tilo Balke, and Ulrich Güntzer. "Efficiently Performing Consistency Checks for Multi-Dimensional Preference Trade-Offs". In 2nd International IEEE Conference on Research Challenges in Information Science (RCIS), volume 5, 271–278. Marrakech, Morocco, 2008. IEEE. PDF, DOI.
Christoph Lofi, Wolf-Tilo Balke, and Ulrich Güntzer. "Consistency Check Algorithms for Multi-Dimensional Preference Trade-Offs". International Journal of Computer Science & Applications (IJCSA), 5(3):165–185, 2008. PDF.
Wolf-Tilo Balke, Christoph Lofi, and Ulrich Güntzer. "User Interaction Support for Incremental Refinement of Preference-Based Queries". In 1st International IEEE Conference on Research Challenges in Information Science (RCIS). Ouarzazate, Morocco, 2007. PDF.
Christoph Lofi and Wolf Siberski. "Service Oriented Architectures for Open E-Learning Systems: An Overview of the Prolix Project". In eTeaching & eScience. Hannover, Germany, 2007. PDF.
Wolf-Tilo Balke, Christoph Lofi, and Ulrich Güntzer. "Incremental Trade-Off Management for Preference Based Queries". International Journal of Computer Science & Applications (IJCSA), 4(2):75–91, 2007. PDF.
Wolf-Tilo Balke, Ulrich Güntzer, and Christoph Lofi. "Eliciting Matters - Controlling Skyline Sizes by Incremental Integration of User Preferences". In 12th International Conference on Database Systems for Advanced Applications (DASFAA). Bangkok, Thailand, 2007. PDF.
Juri L. De Coi, Eelco Herder, Arne Koesling, Christoph Lofi, Daniel Olmedilla, Odysseas Papapetrou, and Wolf Siberski. "A Model for Competence Gap Analysis". In 3rd International Conference on Web Information Systems and Technologies (WEBIST). Barcelona, Spain, 2007. PDF, DOI.
Christoph Lofi. "cGQM - Ein zielorientierter Ansatz für kontinuierliche, automatisierte Messzyklen". In 4th National Conference on Software Measurement and Metrics (DASMA MetriKon 2005). Kaiserslautern, Germany, 2005. PDF.
Data is widely accepted as one of the most valuable assets for many industrial and governmental organizations. At the heart of the data-driven economy lies the ability to provide value-adding services based on that data, which typically rely on purposefully analyzing and providing data to users and stakeholders.
Data needs to be fit for its intended purpose. In particular, the shape and properties of data must suit the chosen data analysis methods (e.g., machine-learning based techniques) and human-data interaction paradigms. However, the required information, if available at all, is often scattered across heterogeneous sources, is incomplete or of bad quality, lacks semantic richness, or is only implicitly available. Here, the role of Data Engineering research is to enable the development of data processing pipelines that bridge the gap between the available source data and the required target data.
My research focuses on Semantics-based Data Engineering methods and techniques, and especially on scenarios where the gap between source and target is rooted in a semantic and structural mismatch. This is a dominant issue when dealing with data produced directly or indirectly by humans: most analysis and query techniques for tackling the requirements of such application domains demand explicitly structured, high-quality data. However, human-produced data like natural text is typically unstructured and only implicitly mentions the desired information. The semantics of such unstructured data are often unclear or ambiguous and cannot easily be processed. Furthermore, human data is typically produced in a distributed fashion. Thus, the resulting challenge is to model what kind of target data is desired by the application domain, and then to engineer data processing pipelines which integrate, transform, and augment the existing implicit and unstructured source data to comply with that model. My research focuses on the principled methods and insights that enable and support this engineering process.
As a simplified example, consider the following scenario: in a sophisticated scientific document management system that recommends relevant research papers to users, information on research papers can be found to a limited extent in structured bibliographic datasets, but mostly in the unstructured full texts, on social media, or in talks and lectures. These highly heterogeneous sources first need to be integrated and cleaned, and then transformed and augmented such that recommendation or query algorithms can use the data effectively, e.g. by extracting structured data from full texts or by analyzing and summarizing social media posts. Then, this data can be used for semantically rich exploration, recommendation, or visualization.
For realizing pipelines for integrating, transforming, and augmenting data, I employ a toolbox of different semantic enhancement methods. These cover:
- Natural Language Processing methods, and especially language embeddings for analyzing natural text. I use these techniques to represent unstructured text fragments in latent spaces such that they can be analyzed, clustered, and classified using other data analytics techniques.
- Information Extraction methods, and especially named entity recognition and named entity typing techniques for extracting keywords of specific types for summarizing or describing unstructured text content and linking them to semantic knowledge bases.
- Crowd Computing to support all aspects of the developed data processing pipelines where fully automated methods fail. Crowd computing is used to train or verify other algorithms, but also for developing hybrid processing pipelines where users work together with automated methods. This is typically required as the semantic gap exhibited by many application domains is quite large, and automated techniques will typically need human guidance and cooperation.
- AI-driven Data Analysis methods for drawing conclusions and reasoning during individual steps of the data processing pipeline.
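The first item in this toolbox, representing text fragments as vectors so that they can be compared and grouped, can be illustrated with a minimal sketch. It is only a toy under stated assumptions: a bag-of-words count vector stands in for a learned language embedding, and the function names (`embed`, `cosine`, `nearest`) and the sample fragments are hypothetical, not taken from any actual pipeline described here.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a language embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(query: str, fragments: list[str]) -> str:
    """Return the fragment whose vector is most similar to the query vector."""
    q = embed(query)
    return max(fragments, key=lambda f: cosine(q, embed(f)))


# Hypothetical text fragments, e.g. titles harvested from a document collection.
fragments = [
    "skyline queries over incomplete data",
    "named entity recognition in scientific publications",
    "attention tracking for online learning videos",
]
best = nearest("entity extraction from publications", fragments)
print(best)
```

In a realistic setting, `embed` would be replaced by a trained embedding model, while the similarity-based comparison step would stay structurally the same.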
The foundational challenge of researching and developing semantics-based data engineering methods must also be seen in the light of the intended application domains, as the notion of “data being fit to purpose” is domain dependent. I chose application domains which strongly exhibit the problem features relevant to my research focus: source data is typically highly unstructured and implicit in its semantics, and target data requires structure and explicit semantics.
The core applications I currently explore are digital libraries, online education information systems, and information systems for supporting digital humanities research.
- In the digital library domain, I investigate data engineering workflows for augmenting existing metadata repositories with additional metadata types which allow for more meaningful exploration and visualization of large scientific document collections. To this end, natural language processing methods for extracting information from documents are combined with semantic modelling and crowd computing. In this domain, I cover both digital offerings of traditional libraries and enterprise digital text repositories like medical information systems.
- In the domain of information systems for digital humanities, I research data engineering workflows for integrating and transforming data from different sources of low quality (e.g. digital document scans) to higher quality formats (unified and annotated semantic digital documents). Due to the low quality and diversity of the raw data sources, resulting data engineering pipelines make heavy use of crowd-computing techniques.
- In the online education domain, I focus on data engineering methods for integrating and augmenting repositories with semantically more meaningful metadata. This metadata is also used to develop richer exploration and interaction paradigms.
Fields: Database Query Processing, Subspace Clustering, Probabilistic Databases, Sentiment Analysis, Recommender Systems.
Some of the most valuable features of relational databases are clearly defined schemas with crisp semantics, which allow for rich and complex declarative queries. However, this comes at a cost: the underlying schema must be carefully designed upfront to support the queries expected to fulfill the information needs of future users, and the structured schema should represent the actual nature and semantics of the represented real-world entities in such a way that it naturally aligns with the internalized semantics of the users issuing the queries. In some application scenarios, this focus on strict schemas can become problematic. As an example, consider an e-commerce scenario focusing on selling experience products like movies, books, music, or games. Here, the perceived properties describing the user experience those products entail (which, for most people, is the deciding factor for buying the product) are difficult to capture using relational schemas, which often leads to a focus on more objective and crisp properties like production year, actor names, or rough genre labels. Thus, many queries users would naturally ask are not supported by the system, for example queries for movies which “feel” like a given example movie, movies which feature a “thought-provoking plot”, or movies which are “educational” or “suitable for children”. We call those queries human-centered queries, as they are the queries most humans would use in a natural conversation with another human, but they are often not supported by information systems. One of the challenges around perceived properties of experience products is that it is very hard to foresee at schema design time which properties will be relevant for users and how users perceive them (i.e., the challenge of obtaining values for the properties).
In particular, many of these properties may be subjective, and thus the perceptions of different users might differ or even conflict (e.g., there might be conflicting views on how “funny” a given movie is).
I claim that most of the perceptual information required to support such human-centered queries can be obtained from user-generated judgements such as ratings, comments, or reviews. This form of feedback, which can be seen as self-motivated crowdsourcing, is a promising source of information, as such judgements usually cover the perceptual properties and aspects deemed important by the creator of the judgement. However, integrating this rich source of information into the query process is hard due to the aforementioned challenges, and many applications choose not to attempt an integration at all: in most applications (for example web shops), user reviews are simply displayed for manual consumption, or user ratings might be used within a recommender system. Usually, however, it is not possible to access the richness of information contained in human judgements in a declarative and explicit relational fashion.
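To make the idea of declarative access to perceptual information concrete, the following is a minimal sketch using Python's built-in `sqlite3` module. The schema (`movie`, `judgement`), the property name `funny`, the score scale from 0 to 1, and the threshold of 0.5 are all hypothetical assumptions for illustration; the sketch also assumes that per-judgement scores have already been extracted from ratings or reviews, which is the hard part and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
    -- One row per user judgement of one perceptual property (hypothetical schema).
    CREATE TABLE judgement (movie_id INTEGER, property TEXT, score REAL);
""")
conn.executemany("INSERT INTO movie VALUES (?, ?)", [(1, "Movie A"), (2, "Movie B")])
conn.executemany(
    "INSERT INTO judgement VALUES (?, ?, ?)",
    [(1, "funny", 0.9), (1, "funny", 0.8), (2, "funny", 0.2), (2, "funny", 0.3)],
)

# Human-centered query: "movies that are perceived as funny".
rows = conn.execute("""
    SELECT m.title, AVG(j.score) AS funny
    FROM movie m JOIN judgement j ON j.movie_id = m.id
    WHERE j.property = 'funny'
    GROUP BY m.id
    HAVING AVG(j.score) >= 0.5
    ORDER BY funny DESC
""").fetchall()
funny_titles = [title for title, _ in rows]
print(funny_titles)
```

Averaging is the simplest possible aggregation and ignores subjectivity and extraction uncertainty entirely; it serves only to show what querying a perceptual property declaratively could look like.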
In this line of work, we are exploring the challenge of supporting such human-centered queries focusing on perceptual properties from a database query processing perspective.
The outlined contributions are as follows:
- Developing a general vision of a database system using perceptual properties, and discussing a high-level model of how to integrate perceptual properties into a suitable data model.
- A special focus will be on consensual perceptual properties to deal with subjectivity in user perception, i.e., properties of entities for which the values emerge from a consensus in the perception of a larger user base. Also, we introduce multi-consensual properties, for which there is not a single consensual value but several.
- Research into both explicit and latent properties. Here, explicit properties have a real-world interpretation which is explainable to users, while latent properties are opaque but still can be used for several query types like similarity queries.
- Investigating how perceptual properties are represented within a database system. A promising candidate is adapting probabilistic databases, coupled with subspace clustering and exploration to deal with both subjectivity and uncertainty of extraction.
- Developing multiple prototype implementations of systems which can extract, store, and process perceptual properties. Each of these implementations focuses on a specific subset of the challenge, e.g., extracting explicit properties, or dealing with multi-consensual values. The long-term goal is to aggregate and combine these individual systems into a larger demonstrator which can be used to showcase the research results.
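The distinction between consensual and multi-consensual properties in the list above can be sketched as follows. This is an illustrative toy and not the method used in the cited work: it assumes star ratings on a 1 to 5 scale, treats a low standard deviation as consensus, and otherwise splits the judgements into two camps with a simple one-dimensional two-means procedure. The function name `consensual_values` and the threshold of 1.0 are hypothetical choices.

```python
from statistics import mean, pstdev


def consensual_values(scores: list[float], spread_threshold: float = 1.0) -> list[float]:
    """Return one consensual value, or two if the judgements split into camps."""
    if pstdev(scores) <= spread_threshold:
        # Users largely agree: a single consensual value.
        return [round(mean(scores), 2)]
    # Users disagree: 1-D two-means, seeded with the extreme judgements.
    low, high = min(scores), max(scores)
    while True:
        camp_low = [s for s in scores if abs(s - low) <= abs(s - high)]
        camp_high = [s for s in scores if abs(s - low) > abs(s - high)]
        new_low, new_high = mean(camp_low), mean(camp_high)
        if (new_low, new_high) == (low, high):
            # Assignments are stable: report one value per camp.
            return [round(low, 2), round(high, 2)]
        low, high = new_low, new_high


agreed = consensual_values([4.0, 5.0, 4.0, 4.0])
polarized = consensual_values([1.0, 1.0, 2.0, 5.0, 5.0, 4.0])
print(agreed, polarized)
```

A real system would additionally need to handle more than two camps and the uncertainty of the extraction step, which is where probabilistic databases and subspace clustering come in.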
Fields: Knowledge Extraction, Digital Libraries, Ontology Design.
Research Focus: This research line is an application of the fundamental theory and practice of human-centered information systems as developed in the fundamental research line. It handles (unstructured) text documents and their related (structured) metadata in the context of digital libraries. In this scenario, the relevant metadata which would be required to perform human-centric queries is unavailable. Therefore, that missing information again needs to be extracted both from external sources like user judgements and from the actual textual document itself. Thus, the focus of this research line is on the domain-specific knowledge extraction and linking techniques required to realize the vision of human-centered information systems for digital libraries.
Domain-Specific Pitch: Academic publications are a central repository of human knowledge, and are at the core of scientific advancement in academia as well as of industrial progress. However, tapping into this vast repository of knowledge is a daunting and challenging task, as the number of available publications is growing with tremendous speed. Without proper support, it is often hard or even impossible to find relevant publications related to a given problem in a timely fashion. Providing this support has always been the domain of libraries. However, the near-exponential growth of content in recent years, together with the shift to digital resources, has invalidated many well-proven workflows, demanding new solutions suitable for the current age and time. Efforts to make highly specialized academic knowledge accessible need to go beyond simple bibliographic metadata, the current state of the art. Instead, most information search by human users is inherently entity-centric: entities are the aspect of publications that people perceive as most relevant. Some domains like medicine and chemistry realized this trend early on, and invested heavily in annotating scientific publications with their most relevant entities, such as genes and proteins, chemical structures and molecules, or drug names, to support more meaningful search and exploration. However, these efforts are very costly as they still rely heavily on manual curation and semi-manual workflows, and are thus prohibitive for many domains which lack the resources for such measures. Thus, in many domains the query capabilities and metadata availability are insufficient to cope with users' information demands.
Therefore, in this line of work, I propose to design, develop, and evaluate novel techniques for extracting entity-centric metadata from research publications for human-centered queries in a mostly automatic fashion, and to showcase the effectiveness of our approaches in domains which currently lack support for rich semantic academic metadata. Beyond the obvious contributions like providing entity-centric search, offering faceted browsing capabilities, and realizing semantically meaningful recommendation and exploration of content, I can also use the extracted metadata for contributions to the digital library domain itself by tracking trends or the change of topics in a visually appealing and comprehensive fashion.
Outlined contributions are as follows:
- Extending the current state of the art of systems research in the digital library domain by covering challenges like analyzing and annotating educational content, sequencing educational content into micro-learning objects, and developing both recommendation and query capabilities
- Developing a demonstrator prototype system which can augment current digital libraries with additional human-centric metadata
- Developing human-centered query capabilities which use that metadata for new query paradigms, for example visual exploration or faceted navigation
Example of Domain-Specific Content-Related Metadata
Example of possible analysis techniques: Visualization of Corpus, Trend-tracking
Fields: Knowledge Extraction, Online Education, Recommendation.
Research Focus: This line of research pushes for human-centered information systems in the educational domain, i.e. instead of simply offering learning materials with limited metadata, content is analyzed with respect to which parts human users consider relevant for a certain information need. Thus, this is another application of my fundamental theory to domain-specific challenges. From a technology point of view, this shares many similarities with the previous line of work on entity-centric extraction in digital libraries, as many methods, insights, and results can be shared across both research lines. Ultimately, I envision that both research lines can even be unified into a shared prototype implementation which can also bridge and integrate scientific knowledge into university education tailored to individual users/students.
Domain-Specific Pitch: Online education has seen tremendous growth in recent years, covering e-learning offers ranging from traditional online courses to Massive Open Online Courses (MOOCs) and Small Private Online Courses (SPOCs). Additionally, nearly all higher education institutions support their on-site courses by providing the necessary materials like slides or other course material digitally. At the heart of this development are courseware infrastructures, i.e. platforms handling the communication between learners and teachers, and storing and distributing all relevant learning objects. Those learning objects, usually created, curated, and tailored with great care and at great cost by educational or domain experts, represent a significant investment. Therefore, it has been a long-term challenge to provide these learning objects as open educational resources to a wide public in order to maximize their impact. The target audiences are varied: traditional learners in university or online courses, professionals who need to obtain focused competencies, and educators at higher education institutions, for whom reusing high-quality material frees up valuable personal resources. However, while there is nowadays a multitude of platforms offering whole courses with varying degrees of openness, this vision of easy access to fine-grained open educational resources has still not been fully realized. One reason for this is that current courseware platforms lack the semantic and analytic capabilities to support the sophisticated query, search, and recommendation requirements necessary to efficiently serve the specific information needs of individual learners or educators: courseware platforms are mostly used as repositories for storing and statically serving learning resources alongside manually created metadata, following fixed learning paths provided by the course designer.
The goal of this project is to complement current courseware platforms with state-of-the-art semantic analysis capabilities to obtain a deep understanding of both the users and the resources stored in a courseware platform, and thus to provide personalized access tailored to the individual information needs of users. At the core of my suggested solution are micro-learning objects, i.e. the smallest units of thematic cohesion found in learning content. In this project, I aim at identifying, extracting, and semantically annotating such micro-learning objects in an automated fashion. This annotation will cover multiple facets perceived as relevant by both learners and educators, like topics, didactic intent, required expertise, or perceived attributes (i.e. attributes based on user judgements).
(2012-2015) MovieExplore - Discovering Movies in a Human-Centric Fashion
This project focuses on discovering the perceptual properties of movies (and also other experience items) for enabling human-centric interaction paradigms. This will allow users to interact with complex product spaces in a more natural and easier-to-understand fashion.
This project is funded by the presidential office of TU Braunschweig in order to foster and encourage innovation in university didactics. In this project, we develop novel concepts for teaching relational query languages like SQL, unifying approaches from modern didactics with mobile gamification. The goal of this project is to develop and establish a “serious” online game for supporting the B.Sc. lecture “Relational Databases 1”.
see http://www.ifis.cs.tu-bs.de/content/sqlalchemist (in German)
(2013-2015) Anaqonda Project - Analogy Queries by Ontology-based Data Analytics
The Anaqonda project deals with intelligent queries and personalization, and focuses especially on analogy queries. This project was funded by DAAD and executed at NII Tokyo.
For more information, see http://www.ifis.cs.tu-bs.de/content/anaqonda
(2006-2010) APIS - Advanced Personalization in Information Services
The APIS project (Advanced Personalization in Information Services) investigates the impact of personalization technology on future information provisioning. The APIS group is interdisciplinary, spanning computer science and cognitive science, and its basic belief is that modern information provisioning needs advanced query processing and optimization techniques that use and understand human preferences, usage patterns, conceptual views, and (domain) ontologies. At the same time, the architectures for information provisioning have to move from monolithic database systems to more open service-oriented infrastructures.
Source Code for our Skyline Simulator & Datasets can be found here.
For more information, see http://www.ifis.cs.tu-bs.de/content/apis
(2006 - 2008) Prolix - Process Oriented Learning and Information Exchange