Jay Pujara's Research

Research

Summary

We inhabit a vast, uncertain, and dynamic universe. To succeed in such an environment, machine learning approaches must handle massive amounts of noisy, changing evidence. My research tackles these challenges: I devise scalable algorithms for big data, design probabilistic models that gracefully capture uncertainty, and develop techniques for streaming inference that provide theoretical guarantees.

In addition to addressing fundamental challenges in machine learning, my work is also driven by important, practical questions in artificial intelligence. How can we use the wealth of knowledge on the Web to construct structured knowledge bases? When a user rates a new item, how can we update recommendations so other similar users benefit? What can we learn about the invisible influences of organizations from the social media activity of their followers? The central thread connecting these diverse questions is the need to exploit relationships and dependencies between instances -- whether they are facts in a knowledge base, items in a product catalog, or users of a social network.

Interests

Scalable Machine Learning
Probabilistic Models
Statistical Relational Learning
Knowledge Graph Construction
Streaming and Online Inference
Natural Language Processing
Social Network Analysis

Research Projects

Knowledge Graph Construction

Transforming noisy text into useful knowledge by combining statistics and semantics.

The Web is a vast repository of knowledge, but automatically extracting that knowledge at scale has proven to be a formidable challenge. My research identifies the common failure patterns in information extraction (IE) projects and proposes solutions that use statistical signals from NLP pipelines and semantic knowledge from ontologies to build better knowledge graphs (KGs). These techniques can improve the F1 measure over IE approaches by 25% and, in a parallel implementation, require only ten minutes on KGs with millions of facts!
Learn more: ISWC13, AIMag15, Thesis16, AKBC13, GitHub

Streaming Collective Inference

Coping with change as only probabilistic models can!

A key challenge in many artificial intelligence problems is that the evidence grows and changes over time, requiring updates to inferences. Every time a user rates a new movie on Netflix, posts a status update on Twitter, or adds a connection on LinkedIn, inferences about preferences, events, or relationships must be updated. My work investigates approximate updates as new evidence arrives, providing theoretical guarantees and offering practical algorithms.
Learn more: UAI15, AKBC14, StaRAI15, GitHub

Entity Resolution for Relational Domains

Easy recipes for delicious collective entity resolution.

The world is filled with ambiguous references to entities. These ambiguities show up everywhere, from blurry photos to pronouns and titles in news articles to misspelled names. Often, relational information exists that can help resolve the ambiguity of these references. Unfortunately, most relational entity resolution systems require painstaking effort to adapt to new problems. My work provides a unified approach to entity resolution that can be applied to any problem and is easily customized to incorporate domain knowledge.
Learn more: StaRAI16, BayLearn14

Optimizer-Guided Active Inference

Pushing active learning into the optimizer for efficiency and generality.

Learn more: MLG16, UAI15

Social Network Analysis

I get insights with a little help from my friends.

Learn more: ASONAM16, SOCIAL12

Large-Scale Hierarchical Topic Models

Quickly finding the hidden structure in text over millions of documents.

Learn more: BigLearn12, EMNLP15

Coarse-to-Fine Active Feature Acquisition

Feature-efficient classifier cascades for structured prediction tasks.

Learn more: CEAS11, NIPS10