We inhabit a vast, uncertain, and dynamic universe. To succeed in such an environment, machine learning approaches must handle massive amounts of noisy, changing evidence. My research tackles these challenges: I devise scalable algorithms for big data, design probabilistic models that gracefully capture uncertainty, and develop techniques for streaming inference that provide theoretical guarantees.

Beyond addressing fundamental challenges in machine learning, my work is driven by important, practical questions in artificial intelligence. How can we use the wealth of knowledge on the Web to construct structured knowledge bases? When a user rates a new item, how can we update recommendations so that similar users benefit? What can we learn about the invisible influences of organizations from the social media activity of their followers? The thread connecting these diverse questions is the need to exploit relationships and dependencies between instances -- whether they are facts in a knowledge base, items in a product catalog, or users of a social network.


  • Scalable Machine Learning
  • Probabilistic Models
  • Statistical Relational Learning
  • Knowledge Graph Construction
  • Streaming and Online Inference
  • Natural Language Processing
  • Social Network Analysis

Research Projects

  • Knowledge Graph Construction

    Transforming noisy text into useful knowledge by combining statistics and semantics.

    The Web is a vast repository of knowledge, but automatically extracting that knowledge at scale has proven to be a formidable challenge. My research identifies the common failure patterns in information extraction (IE) projects and proposes solutions that combine statistical signals from NLP pipelines with semantic knowledge from ontologies to build better knowledge graphs (KGs). These techniques can improve the F1 measure over IE approaches by 25% and, in a parallel implementation, require only ten minutes on KGs with millions of facts!

    Learn more: ISWC13, AIMag15, Thesis16, AKBC13, GitHub
  • Streaming Collective Inference

    Coping with change as only probabilistic models can!

    A key challenge in many artificial intelligence problems is that the evidence grows and changes over time, requiring updates to inferences. Every time a user rates a new movie on Netflix, posts a status update on Twitter, or adds a connection on LinkedIn, inferences about preferences, events, or relationships must be updated. My work investigates approximate updates as new evidence arrives, providing theoretical guarantees and offering practical algorithms.

    Learn more: UAI15, AKBC14, StaRAI15, GitHub
  • Entity Resolution for Relational Domains

    Easy recipes for delicious collective entity resolution.

    The world is filled with ambiguous references to entities. These ambiguities show up everywhere, from blurry photos to pronouns and titles in news articles to misspelled names. Relational information often exists that can help resolve these references. Unfortunately, most relational entity resolution systems require painstaking effort to adapt to new problems. My work provides a unified approach to entity resolution that can be applied to any problem and is easily customized to incorporate domain knowledge.

    Learn more: StaRAI16, BayLearn14
  • Optimizer-Guided Active Inference

    Pushing active learning into the optimizer for efficiency and generality.

    Learn more: MLG16, UAI15
  • Social Network Analysis

    I get insights with a little help from my friends.

    Learn more: ASONAM16, SOCIAL12
  • Large-Scale Hierarchical Topic Models

    Quickly finding the hidden structure in text over millions of documents.

    Learn more: BigLearn12, EMNLP15
  • Coarse-to-Fine Active Feature Acquisition

    Feature-efficient classifier cascades for structured prediction tasks.

    Learn more: CEAS11, NIPS10