HCI, IR and the search for better search with Dr. Susan Dumais

This post has been republished via RSS; it originally appeared at: Microsoft Research.

Dr. Susan Dumais

Episode 90, September 18, 2019

Dr. Susan Dumais knows you have things to do, and if you need help finding stuff to get them done (and you probably do) then her long and illustrious career in search technologies has been worth it. Situated firmly in Louis Pasteur’s quadrant of the research grid (the square where you answer “yes” to both the quest for fundamental understanding and use-based applications) the Microsoft Technical Fellow, and Deputy Lab Director of MSR AI, has made finding information the focus of her career, and has probably made your life a little more productive in the process.

Today, Dr. Dumais tells us how the landscape of information retrieval has evolved over the past twenty years; reminds us that queries don’t fall from the sky but are grounded in the context of real people, real events and real time; talks about her current interest in non-web-based search (or how I can easily put my hands on my own digital belongings) and reveals what apples and Michael Jordan have in common with search research.


Transcript

Susan Dumais: I think, more and more, information retrieval is moving from helping people find information to helping people get things done. I’ve spent a lot of my life thinking about search. It is nobody’s end goal. You don’t get up in the morning and say, I’m going to search for the next two minutes. You’re trying to accomplish a task. And search is a means by which you do that. And I think we shouldn’t ever forget that.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: Dr. Susan Dumais knows you have things to do, and if you need help finding stuff to get them done (and you probably do) then her long and illustrious career in search technologies has been worth it. Situated firmly in Louis Pasteur’s quadrant of the research grid (the square where you answer “yes” to both the quest for fundamental understanding and use-based applications) the Microsoft Technical Fellow, and Deputy Lab Director of MSR AI, has made finding information the focus of her career, and has probably made your life a little more productive in the process.

Today, Dr. Dumais tells us how the landscape of information retrieval has evolved over the past twenty years; reminds us that queries don’t fall from the sky but are grounded in the context of real people, real events and real time; talks about her current interest in non-web-based search (or how I can easily put my hands on my own digital belongings) and reveals what apples and Michael Jordan have in common with search research. That and much more on this episode of the Microsoft Research Podcast.

Host: Susan Dumais, welcome to the podcast!

Susan Dumais: Thank you, Gretchen.

Host: Listen, I’ve been waiting a long time to get you on! Way back in 2017, Eric Horvitz said, you gotta get Susan on the podcast. And I guess you are kind of like a hot Manhattan restaurant: you have to book two years out!

Susan Dumais: Well, it’s finally come true and it’s fun to be here.

Host: I like to start by situating my guests and their research, so let’s get situated. You’re a Microsoft Technical Fellow and the Deputy Managing Director of Microsoft Research AI, and your work lives at the intersection of information retrieval and human computer interaction. Actually, as we’ve noted, it’s a much larger intersection than that, but we’ll keep it at those two roads for now. And you have more papers, patents and honors than it would be prudent to list in a half hour podcast. But it’s worth noting that there’s a common theme running through all the accomplishments and accolades. So, tell us in broad strokes, what’s the driving motivation behind the work you do and why you do it. What gets you up in the morning?

Susan Dumais: Yeah, I think there are two commonalities and themes in my work. One is topical. So, as you said, I’m really interested in understanding problems from a very user-centric point of view. I care a lot about people, their motivations, the problems they have. I also care about solving those problems with new algorithms, new techniques and so on. So, a lot of my work involves this intersection of people and technology, thinking about how work practices co-evolve with new technological developments. And so thematically, that’s an area that I really like. I like this ability to go back and forth between understanding people, how they think, how they reason, how they learn, how they find information, and finding solutions that work for them. In the end, if something doesn’t work for people, it doesn’t work. In addition to topically, I approach problems in a way that is motivated, oftentimes, by things that I find frustrating. We may talk a little bit later about my work in latent semantic indexing, but that grew out of a frustration with trying to learn the Unix operating system. Work I’ve done on email spam grew out of a frustration in mitigating the vast amount of junk that I was getting. So, I tend to be motivated by problems that I have now, or that I anticipate that our customers, and people will have in general, given the emerging technology trends.

Host: Right.

Susan Dumais: And I approach it, not just from a use-based perspective, understanding situations that will likely happen, but also try to generalize a bit and provide a more theoretical and generalizable foundation. Donald Stokes wrote a fascinating book about basic science and technology innovation and he talks about Pasteur’s quadrant, which is use-based, fundamental research. And I characterize myself as living in Pasteur’s quadrant.

Host: That’s a good place to live.

Susan Dumais: Yeah.

Host: I love the idea that you talk about things that frustrate you and you want to solve them because if it frustrates you, it’s probably frustrating me, too. And so, I’m glad to know that you are approaching it from that perspective.

Susan Dumais: Well actually, as an HCI person, I think the other thing we need to constantly remind ourselves of is that we’re not the typical person. In fact, when we started this spam work, most people didn’t get a lot of spam emails. I’m motivated by things that frustrate me. I try to understand how broadly applicable those ideas are. But there are things that frustrate me that, if I spent, you know, a career solving them, would not benefit lots of other people. But my work is really very much motivated by pain points that I see either in myself or in others or that I anticipate seeing in technology.

Host: Well, as your work is anchored in information retrieval and search, let’s do a little “then and now” on the search landscape…

Susan Dumais: Okay.

Host: …because contrary to what we experience today, high-quality search results were not always a click away. So, give us a snapshot of the field twenty years ago and tell us how things have evolved, in part because of the work you’ve done, over the ensuing decades.

Susan Dumais: Yeah, you are absolutely right. If you’re under twenty years of age, you have probably not lived in a world where you don’t have at-your-fingertips access to an increasingly broad set of information, 24/7. Even in, let’s say, the mid-90s, the first web search engines were just starting. And by web search engine, I mean a system that crawls for content, indexes that content and provides it in a browser. We clearly had libraries. We had library catalogues. But the ability to have, at your fingertips, an amazing breadth of information is really fairly new. Some of the early search engines, things like Infoseek, AltaVista, Lycos, were operating in a very different time. Lycos, I think, in the mid-90s, indexed a few hundred thousand web pages. They had a thousand or two thousand queries a day. Fast forward to today, and there are billions of web pages, billions of queries per day. And so, the world has evolved, you know, a lot in terms of size. It’s evolved a lot in terms of diversity of content.

Host: Hmm.

Susan Dumais: Mostly the web then was HTML pages. It wasn’t videos, it wasn’t images, it wasn’t news. And so, more and more, a variety of different kinds of information are there. The depth of the analysis that’s provided has changed tremendously. We used to just look at simple key words. More and more, we’re going beyond key words to do a deeper understanding of the language, the objects, the entities. And think about something like your phone, when you’re on the go. You’re asking queries verbally, often.

Host: Yeah.

Susan Dumais: That’s just such a far cry from typing in 2.1 words into a rectangle…

Host: Right.

Susan Dumais: …on the screen. How it’s presented, how you iterate through it, it’s becoming much more of a dialogue. So, the world has gone from a situation where search was really this arcane skill. So, you needed almost a graduate degree in library science to – there were librarians. We went to them and…

Host: Absolutely!

Susan Dumais: …asked for information – to a case where, today, search is just ubiquitous. You expect it to be there and when it’s not, it’s incredibly frustrating. So, we’ve gone from something which was a real specialty skill to something that’s just a core fabric of everything we do. You use it to find information.

Host: Yeah.

Susan Dumais: You use it to buy things, to learn about medical conditions, to learn about household or electronic troubleshooting…

Host: To find someone you are looking for…

Susan Dumais: Exactly, yeah. Sure! And that was available in different ways…

Host: Absolutely!

Susan Dumais: …not through web search engines. The ubiquity, I think, makes it more exciting for me in many ways. It’s more important to understand people, what they’re trying to accomplish and, really, to help them generate, make sense of, and find information.

Host: Well that’s an amazing segue into what you’re actually doing about it because there’s a lot that went on behind-the-scenes, from being a very specialty thing to something that I can use very, very easily every day. And in fact, my sister’s three-year-old grandchild can do it better than I can.

Susan Dumais: That’s right.

Host: What do they call a magazine? An iPad that doesn’t work.

Susan Dumais: That’s right!

Host: So, I want to talk specifically today about three areas where your research contributions have, as you say, built bridges among several communities, notably human computer interaction, information retrieval, or IR, and web. So, first, let’s start with the work you did way back at Bell Labs, before you even came to Microsoft Research, in what you referred to a little bit earlier as latent semantic indexing, or LSI. So, this work addresses what’s known as “vocabulary mismatch” in IR systems. You’ll unpack that for us.

Susan Dumais: I will.

Host: Explain the problem first, how you addressed it, and then tell us why this work from the 1990s is still relevant and highly cited today…

Susan Dumais: The last century.

Host: Yeah, right? A century ago…

Susan Dumais: In graduate school, I pursued research interest in cognitive science, so a lot of my work there revolved around building models of how people learn and retrieve information from their own memories. And when I moved to Bell Labs and really started interacting much more with what was becoming a very ubiquitous computer industry at the time, I got very interested in how people find information from external sources. So, not their own heads, but other people, computers… And one of the problems that kept coming up over and over and over again was this kind of impedance mismatch between the way that I seek information and the way that you, as an author, might have written that information.

Host: Mmm.

Susan Dumais: It was very acute at Bell Labs because I was trying to learn the Unix operating system and I wanted to find the function that allowed me to find a word in a document that I had. And it was called GREP, for Generate Regular Expression. Who, in their right mind, would have done that?

Host: An engineer!

Susan Dumais: Well, somebody who did not understand the broad set of users who might wind up using those systems.

Host: Right.

Susan Dumais: And so, there are two aspects to the problem, and they’re both due to fundamental characteristics of how people generate text. The first is called synonymy. That we use many different words to describe the same object. So, you might refer to a medical professional as a doctor or a physician. Apple means fruit, and in the last forty years or so, it’s meant a computer system.

Host: Right.

Susan Dumais: Even people, like Michael Jordan… There’s a very famous computer scientist named Michael Jordan. There’s also a more famous basketball player named Michael Jordan.

Host: Sad for the computer scientist…

Susan Dumais: No, no…! Actually, we take care of him in web search engines!

Host: I bet you do.

Susan Dumais: And so, one problem is that there are lots of ways of saying the same thing.

Host: Right.

Susan Dumais: And the other problem, which I just mentioned, is that the same word can have many different meanings. And both of those present problems for retrieval. I think the key insight in latent semantic indexing was that we tried to represent words not as isolated tokens, but as a richer representation of the context in which they appear. So, we projected words into a much lower dimensional space and the impact was, it brought together words that shared similar context. So, “physician” and “doctor” occur in the same company.

Host: Right.

Susan Dumais: And that allowed those words to be very similar in this reduced dimension, or what we call semantic space. There’s been a tremendous resurgence of interest in these word embeddings, or context embeddings, in the last five years or so. Many of the modern word embedding techniques, whether it’s word2vec or GloVe or BERT or GPT-2, really share the same goal of uncovering latent structure. That problem still exists because people write and read and understand text. And there’s tremendous variability in that. What has changed, tremendously, are the data resources. It’s easy to get billions of web pages, hundreds of thousands of Wikipedia pages. The computational capabilities have increased and really the representational richness of the models has changed tremendously, by orders and orders of magnitude.

Host: Right.

Susan Dumais: And so, I think there’s been a resurgence in rethinking what you can do with some of these approaches.
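
The idea described above, projecting words into a lower-dimensional space so that words sharing context end up close together, can be sketched with a truncated singular value decomposition of a term-document matrix. This is only a minimal illustration, assuming a made-up seven-document corpus and a rank of 2; the corpus, the variable names and the choice of rank are assumptions for the example, not details from the original LSI work.

```python
import numpy as np

# Toy corpus (hypothetical). "doctor" and "physician" never share a document,
# but they appear alongside the same neighbouring words.
docs = [
    "doctor hospital patient",
    "physician hospital patient",
    "doctor nurse patient",
    "physician nurse hospital",
    "apple fruit orchard",
    "apple fruit pie",
    "orchard fruit pie",
]

vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}

# Term-document count matrix: one row per word, one column per document.
A = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        A[index[word], j] += 1


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Truncated SVD: keep only the k strongest latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # assumed rank for this toy example
term_vecs = U[:, :k] * s[:k]  # each row is a word in the reduced space

raw_sim = cosine(A[index["doctor"]], A[index["physician"]])
lsi_sim = cosine(term_vecs[index["doctor"]], term_vecs[index["physician"]])
cross_sim = cosine(term_vecs[index["doctor"]], term_vecs[index["apple"]])

print(f"raw cosine(doctor, physician) = {raw_sim:.2f}")
print(f"LSI cosine(doctor, physician) = {lsi_sim:.2f}")
print(f"LSI cosine(doctor, apple)     = {cross_sim:.2f}")
```

In the raw counts the two words look unrelated because they never co-occur in a document; in the reduced space they sit close together because they keep the same company, while a word from an unrelated topic stays far away.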

(music plays)

Host: Well, another area in which you and your colleagues have made a significant contribution is in the area of context in search. Context in anything makes a difference with language…

Susan Dumais: Right.

Host: …and this is integrally linked to the idea of personalization, which is a buzz word in almost every area of computer science research these days: how can we give people a “valet service” experience with their technical devices and systems? So, tell us about the technical approaches you’ve taken on context in search, and how they’ve enabled machines to better recognize or understand the rich contextual signals, as you call them, that can help humans improve their access to information?

Susan Dumais: If you take a step back and consider what a web search engine is, it’s incredibly difficult to understand what somebody is looking for given, typically, two to three words. These two to three words appear in a search box and what you try to do is match those words against billions of documents. That’s a really daunting challenge. That challenge becomes a little easier if you can understand things about where the query is coming from. It doesn’t fall from the sky, right? It’s issued by a real live human being. They have searched for things in the longer term, maybe more acutely in the current session. It’s situated in a particular location in time. All of those signals are what we call context…

Host: Yeah.

Susan Dumais: …that help understand why somebody might be searching and, more importantly, what you might do to help them, what they might mean by that. You know, again, it’s much easier to understand queries if you have a little bit of context about it. If I search for Michael Jordan, and you know I’m a computer scientist, that provides you a signal. If, today, I type in Hong Kong Airport, I probably don’t want to know about all the concession stores in the Hong Kong Airport, I want to know about ongoing protests there.

Host: Right.

Susan Dumais: A lot of searches are motivated by things that happen in the real world. And so that’s what context means, just trying to understand a little bit about where the request is coming from, what larger task it might be embedded in, what contextual situation it might be embedded in. If you have a single web search engine and you return exactly the same results for the same query to everyone, at every point in time, in every location, you’re going to have suboptimal performance.

Host: All right. So, going a little deeper on the technical approaches that you’ve taken…

Susan Dumais: Right.

Host: …to bring context. I’m leaving traces. Wherever I go, online, I’ll leave a little footprint or fingerprint, and that becomes part of this inferred data about who I am, what I’m doing. And like you said, if I searched the Hong Kong airport maybe six months ago, I wouldn’t get the same results today.

Susan Dumais: Right, newsworthy today, right. What you just highlighted is what I would call contextualization. So, in that case, there are spikes in queries. Queries do not occur uniformly over time.

Host: Right.

Susan Dumais: And so, when a query starts spiking, things like Hong Kong airport or Hong Kong in general, you better figure out what’s going on. In many cases it’s driven by external events.

Host: Right.

Susan Dumais: That’s not you as an individual, it’s the aggregate of people who are approaching search engines, asking different queries over time. So, you can think about it at an aggregate level. Um, you know, at a more personal level, or in a session, if you’ve asked a query that’s related to basketball, and then you ask about Michael Jordan…

Host: Mmm-hmmm.

Susan Dumais: …that gives you a hint about how to handle what might be, otherwise, a very ambiguous query.
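
The query spikes mentioned above can be illustrated with a very simple aggregate check: compare today's count for a query against its recent history. This is a rough sketch under stated assumptions; the function name `is_spiking`, the seven-day window and the threshold are all hypothetical choices for illustration, and production systems presumably use far richer time-series models.

```python
from statistics import mean, pstdev


def is_spiking(daily_counts, window=7, threshold=3.0):
    """Return True if the latest daily count sits well above recent history.

    daily_counts: counts for one query, oldest first, with today last.
    window and threshold are illustrative defaults, not tuned values.
    """
    if len(daily_counts) < window + 1:
        return False  # not enough history to judge
    history = daily_counts[-(window + 1):-1]
    today = daily_counts[-1]
    baseline = mean(history)
    # Floor the spread at 1.0 so a perfectly flat history cannot
    # turn a one-query wobble into a "spike".
    spread = max(pstdev(history), 1.0)
    return today > baseline + threshold * spread


# Hypothetical aggregate counts for a single query string.
quiet_week = [100, 110, 95, 105, 98, 102, 101, 104]
news_event = [100, 110, 95, 105, 98, 102, 101, 900]

print(is_spiking(quiet_week))
print(is_spiking(news_event))
```

Note that the signal here is purely aggregate: it says nothing about any individual searcher, only that many people started asking the same thing at once.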

Host: Well, a third area of contribution I want to talk about has to do with the temporal dynamics of information. This rests on the notion that information isn’t static and when you say it out loud it seems kind of like a no-brainer. Of course, it isn’t static! But the tools we’ve traditionally used tend to focus on snapshots of information rather than the dynamic nature of our information. So, tell us again, what technical approaches you’ve explored to help people interact with the reality of dynamic information.

Susan Dumais: Okay, so, you know, as you said, the world is constantly changing around us, whether it’s the world of information or the physical world in which we live. In web search, what’s changing is the content. The web is not static. We’re crawling new content all the time. The questions people ask are changing as a function of events that are going on in the world, as a function of events in their personal lives. And what’s most interesting is that what’s relevant changes. So, let me just give you an example to ground the pervasiveness. If you typed in the query “US Open,” do you mean last year or this year? It’s an event.

Host: Or do I mean tennis or golf?

Susan Dumais: Exactly. Even if you said US Open 2019, what’s relevant depends on where you are relative to that event. So right now, you’re probably not interested in the scores and results because they don’t exist. You want to buy tickets. During the event, you care about the results. And we’ve done a couple of things to try to address that. One is on the algorithmic front. So, we’ve tried to model things like how the content on web pages changes. We also model how people’s interactions change, the queries they issue, what’s clicked on. And by combining those in a kind of time-series analysis, you can understand how to weight new information versus older information.

Host: Right.

Susan Dumais: Search engines learn from people as they interact with things, what’s relevant to a particular query. But that means new information is disadvantaged because it doesn’t have that historical interaction data. And so, by being smart and modeling things as a time series, knowing how things change over time, you can do a much better job of finding information.

Host: Yeah.

Susan Dumais: We also built what I think is a really fun system. It’s still one of my favorite systems. It was a browser plug-in called Diff-IE. Not a very well-named system. I complained about GREP earlier. This is not a lot better.

Host: Still coming up with dumb names.

Susan Dumais: Right, exactly. This was a prototype we built to help people understand how the world was changing around them. And what I mean by that is, the system, all in the browser, as you visited a web page, would look at how that was different than the version of the page that was in the web cache and highlight those changes to you.

Host: Wow!

Susan Dumais: So… Yeah, it was totally a fun system. So, imagine going to a news site and you’d see what the changes were, not relative to what a news editor thought, but since you had last been there.

Host: Sure.

Susan Dumais: If you hadn’t been there in two days, it might be what the headlines were. If you were following a story, it would just show you what was different. It really brought to light for people how information changes in ways that they had never seen before.

Host: Right.

Susan Dumais: So, if I would go to somebody’s web page, I might see new publications highlighted. I might see a new job title. And that really brought the dynamics to people in ways that were, really, previously hidden. And so that was a really fun project that touched not so much on the underlying algorithms, but how we can help people understand and experience that change.
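
The behavior described for Diff-IE, comparing the page you are visiting with the version you saw last and highlighting what is new, can be approximated in a few lines with a line-level diff. This is not the Diff-IE implementation; it is a minimal sketch that assumes pages are available as plain text, and the function name `changes_since_last_visit` and the sample page are invented for the example.

```python
import difflib


def changes_since_last_visit(cached_text, current_text):
    """Return the lines of the current page that are new or modified
    relative to the cached copy from the previous visit."""
    old_lines = cached_text.splitlines()
    new_lines = current_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    highlights = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" both introduce text the reader has not seen.
        if tag in ("replace", "insert"):
            highlights.extend(new_lines[j1:j2])
    return highlights


# Hypothetical researcher home page, before and after an update.
cached = "Jane Doe\nTitle: Researcher\nPublications:\n- Paper A"
current = "Jane Doe\nTitle: Senior Researcher\nPublications:\n- Paper A\n- Paper B"

for line in changes_since_last_visit(cached, current):
    print("CHANGED:", line)
```

The important design point is that the baseline is personal: the diff is taken against what this reader last saw, not against a global notion of what is new.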

Host: The interesting thing about the temporal dynamics of things… I mean, just yesterday, my husband came home, and he said, there was a huge accident on 405. So, I go and search “accident on 405.” Well… 5 days ago? 7 years ago?

Susan Dumais: Right.

Host: I-405 in California? It’s like, there’s still a lot of work that needs to be done on this temporal dynamics thing…

Susan Dumais: Absolutely. But, and that’s a really interesting trade off between things that are, a priori, really important, that you want to make sure to continue to retrieve, and the dynamics of information. In that case, it’s also possible that the content wasn’t there. But the fact that you thought about going to web search suggests that you expect to find that kind of information there. And as you say, there’s a long way to go in a lot of this, so…

Host: Well, you know where it led me was to the Washington State DOT Twitter feed, which is immediate. You know, somebody’s on that but it doesn’t hit the web as news, necessarily, if it’s just happened in the last half hour.

Susan Dumais: Right. That gets to the point of trying to integrate different sources of information. But you need to stay on top of that.

Host: Yeah.

Susan Dumais: How do web search engines decide what to crawl and what frequency to crawl it at, or is some of the information pushed? This highlights a couple of different dimensions. One is getting the data in the first place.

Host: Right.

Susan Dumais: And then how you take all this stuff from web pages to news to maybe Twitter feeds to structured data like Wikipedia feeds and compose those into an environment or representation that can really help people. And that’s much harder if you’re on a phone.

Host: Oh, yeah!

Susan Dumais: Because if you have a big screen in front of you, you can show a lot of information, you can allow people’s visual systems to quickly scan it. If you’re on a phone, you need to take your best guess, iterate, start a conversation with people. It’s a much more temporal processing of the information than a spatial one.

Host: Also, you look at the generational aspect of this. My daughter rarely goes on her computer unless she’s doing something for school. She’s on her phone. That is her primary source.

Susan Dumais: Yes.

Host: And so that data point has got to be where a lot of researchers’ brains are heading: well, what is the mobile-first generation, how are we going to adapt something innovative that we did into this milieu?

Susan Dumais: The world’s constantly changing, and you need to evolve. We’ve clearly gone, as an industry, in search and even beyond that, from the desktop into the real world.

Host: Right.

Susan Dumais: And I think that raises all sorts of interesting opportunities as well as challenges.

Host: We’re not even going to talk about HoloLens or any of the other wearable technologies that I’ve had other researchers in the booth about saying hey, even your phone, looking at your rectangle, is going to be obsolete sooner than you think, so… Susan, I can find almost any piece of general information by searching the web, but my own information is fragmented, it’s scattered everywhere on apps, bookmarks, email folders, devices, etc. Tell us how your current interest in non-web search applications is going to help people like me access my personal information better.

Susan Dumais: Right. It’s interesting, I think the search industry, for a while, was focused, actually, on finding information on your desktop, finding information in email. And with the advent of the web, a lot of public information moved online. And you’ve seen a tremendous set of innovations in that arena. But search is really much more prevalent and I… a particular pain point for me – I told you I was motivated…

Host: Yeah.

Susan Dumais: …by things that annoy me – is that, uh, you know, we haven’t done as good a job of helping people make sense of their own, kind of, personal space of information, is the way I like to think about it. In many ways, it’s stuff you’ve seen before, stuff you’ve interacted with. It’s web pages, it’s email, it’s documents, apps of all kinds. There are so many times when you say I know I saw this article or I saw this photo, where is it?

Host: Yeah, was it on Twitter, was it on the web, was it on Instagram, was it on Facebook?

Susan Dumais: And there’s no reason that you should have to remember that. And so, I think the challenge is providing people with unified access to that information without necessarily making copies of it everywhere. At Microsoft, we are certainly working on it from within the Microsoft ecosystem. It’s increasingly easy to find not just files, but shared files, email, with the click of a button. In Research, Shane Williams and others have developed a prototype called TaskEasy that tries to improve that. But it’s an area that I think still has a lot of opportunity for improvement.

Host: Let me ask you a little off-script question.

Susan Dumais: Sure.

Host: Because this is a frustration of mine. When I do a web search and I misspell something by accident, it tells me “did you mean…” or “looking for results for…” or… On other websites, if I spell your name wrong, no results. I get nothing…

Susan Dumais: It’s a pain point. And the same was true in web search twenty years ago. If you mistyped something, you didn’t get anything. One of the… You didn’t. Or you got some – somebody else who randomly typed things in the same way. One of the things that search engines and lots of other web services do is understand what people are looking for in the ways in which they are doing it. Web search engines have gotten better at searching, not because the algorithms are better, but because you can observe, in aggregate, lots of people searching for things, failing to find them. There were some really interesting observations that folks published very early on about web search. They were things that were unexpected to people who were in the search industry. We all thought that people would go to web search and type in these beautiful informational requests. The most common queries at that time, in the late 90s, were things like eBay, Hotmail, Pokémon, weather, horoscope… They weren’t asking for information, they were using the web to navigate to things.

Host: Ahhh.

Susan Dumais: Getting back to your spelling example, there were many queries – things like, I think, Abercrombie and Fitch, Arnold Schwarzenegger – that are misspelled more than they’re spelled correctly.

Host: Right.

Susan Dumais: But it’s learning by people typing things incorrectly, looking at their reformulations and then figuring out how to improve the spelling correction…

Host: Right.

Susan Dumais: …to handle those cases.
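
Learning spelling corrections from reformulations, as described above, can be sketched as counting how often a query is immediately followed in the same session by a very similar query. The log format, the function name `learn_corrections` and the thresholds below are assumptions made for illustration, not a description of any production search engine. The sketch deliberately relies on the direction of the rewrite rather than on which spelling is more common, since some queries are misspelled more often than they are spelled correctly.

```python
import difflib
from collections import Counter, defaultdict


def learn_corrections(sessions, min_similarity=0.8, min_count=2):
    """Map a query to the similar query people most often retype it as.

    sessions: list of sessions, each an ordered list of query strings.
    min_similarity and min_count are illustrative thresholds.
    """
    rewrites = defaultdict(Counter)
    for queries in sessions:
        for first, second in zip(queries, queries[1:]):
            if first == second:
                continue
            similarity = difflib.SequenceMatcher(None, first, second).ratio()
            if similarity >= min_similarity:
                rewrites[first][second] += 1

    corrections = {}
    for query, targets in rewrites.items():
        target, count = targets.most_common(1)[0]
        if count >= min_count:
            corrections[query] = target
    return corrections


# Hypothetical session logs.
sessions = [
    ["abercrombie and fich", "abercrombie and fitch"],
    ["abercrombie and fich", "abercrombie and fitch"],
    ["abercrombie and fich", "weather"],
    ["pokemon", "horoscope"],
]

print(learn_corrections(sessions))
```

A fuller version would also check whether the rewritten query led to a click, which is a stronger sign that the second spelling was the one the person wanted.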

Host: We’ve talked about personalization…

Susan Dumais: Right.

Host: …which in theory is something we all want, but there’s always some big trade-offs here. We’ll get to the pitfalls in a second, and the discussion of the downside of large-scale behavior analysis. But right now, tell us about the potential of large-scale behavioral analysis that helps you contextualize things.

Susan Dumais: One of the things that’s happened over the last two decades is that web-based services – whether it’s a website that you go to, travel sites, shopping sites – web-based services like this, because they see lots and lots of information, have provided this really new lens onto how people are interacting with their systems. They provide insights about how you can improve those systems. This is a lens onto people’s behavior that we just never had before. Even when I joined Microsoft, when I first joined, folks from Office Help came and said, help us fix Office Help Search. And so, my first question was, what are the most common queries? And they go, we don’t know. What are people looking for? We don’t know. The reason they didn’t know is Search for Office Help happened on your desktop machine. All the Office Help was downloaded to your desktop. All of the searching was done on your desktop. We knew nothing about what people were asking or whether they were being successful.

Host: Because your desktop was Las Vegas, whatever happened there, stayed there.

Susan Dumais: Exactly. My desktop is a little cleaner than Las Vegas, but yeah!

Host: Good to know.

Susan Dumais: And the minute they moved Office Search onto the web, you learned all sorts of things. And so, by knowing what people are seeking, doing, we can create the relevant content. We can create the relevant algorithms. And so, this has been an amazingly rich lens, this virtuous feedback cycle between delivering content and using it, to understand what it is that people are looking for and where the failure points are. It’s hard to overstate how much systems have really changed because of that.

(music plays)

Host: Right. Well, like a recurring nightmare, here we are again at “what could possibly go wrong?”

Susan Dumais: Right.

Host: And you’ve done a lot and seen a lot over the course of your career. One thing that’s of great interest to me is this idea that, in order to help us get better search results – and I want that. I want personalization, on the one hand – but the things I have to give up about my own personal information, my privacy, I’m giving up to the web to help you make my search better. So, talk about the potential pitfalls here, because I know you are thinking about them. What keeps you up at night?

Susan Dumais: Yeah, sure. In general, there’s really a need to balance, in a very thoughtful and responsible way, the benefits that accrue from seeing various kinds of interaction, understanding how people are interacting with systems, and the potential risks for storing information about individuals that enable these services. For some of the things that we talked about, in terms of spelling correction, the fact that there were navigational queries that people had not anticipated, those happen at the aggregate level. And frankly, a lot of the insights happen at the aggregate level, or group level. And some of them happen at the individual level, but many of them happen at a much higher level. We all make these tradeoffs every day. You know, I give credit card information to some services because it’s easier for me.

Host: Mmm-hmm.

Susan Dumais: I want to save my purchase history in some places because it’s much easier to go back and refine things. And I think, you know, as a company, Microsoft is tremendously invested in protecting people’s privacy, the security of information that people entrust us with. So, I think, as an industry, what we need to do is work, first and foremost, to protect whatever data we have. Also, to be clearer on what information is being stored, be transparent about it, and provide people with ways of opting out of that. When I type in the beginning of a name, I would like it to auto complete.

Host: Sure.

Susan Dumais: When I move to a new computer where that’s not the case, I find it frustrating. And again, there are ways that data can be stored over different time horizons. It can be aggregated and anonymized. I think search engines, in particular, but really almost any web service, tries to strike the right balance between understanding things at a very fine level and then aggregating things where that’s relevant and appropriate.

Host: All right. It’s story time. I happen to know you didn’t start out thinking, I’m going to be a computer scientist or the Deputy Lab Director of MSR AI. So, tell us how it all began for you, maybe not back to when you were a baby, but, you know, kind of academically, and how you landed here at MSR in your leadership role today?

Susan Dumais: Well, what you just said is certainly true. Microsoft didn’t exist when I was in high school and in graduate school…

Host: Me neither!

Susan Dumais: …so I had no aspirations of being there! If I did, I would be incredibly wealthy right now. Um, yeah. When I look back on my career, I think it’s fun to reflect on a few pivot points, because the road from where I was as a high school student and undergraduate in Maine, to Redmond, Washington, and the tech industry, is not one that I had planned, from end-to-end, and I was able and lucky enough to be in environments where I could take some risks and take some turns. So, let me just tell you a few of them…

Host: Yeah!

Susan Dumais: …that really stand out in my mind. I started college as a math major intending to go to law school. I wanted to do environmental law. I took a course when I was a junior called Mathematical Psychology which was a course that talked about how people learn information, how they recall information, and how you can precisely describe the evolution of learning and retrieval of information from memory. And I was just smitten. I just thought it was the most fascinating thing, blending algorithms with the ability to understand people and how they worked. And so, I just decided that I was going to go to psychology graduate school. I had no idea what it was. My parents were even more concerned, but I did it. I had a blast doing it. And then, when I finished my PhD, I had every intent of teaching at a university. And when I was looking for jobs, I got a call from Bell Labs. And they had just started the industry’s first Human Computer Interaction lab. And I was still all set on going to a university and my undergraduate advisor called and said, I hear you are going to Bell Labs. And I said, no, I don’t think so. He literally said, you ought to have your head examined! And I asked why, and he made a very good point which was, you really have nothing to lose by this, and a lot to gain. You’re at the beginning of something that could be a really important future direction. And, if you decide you don’t like it, you can leave, and two years later you’ll be better off than you are now in looking for jobs. And almost forty years later, you can say that it suited me very well! And my transition from Bell Labs to Microsoft was also based on opportunities that I decided to seize. We had had a post doc at Bell Labs who was a product manager in Office at the time on FindFast, and he said, hey, Microsoft Research is looking for somebody in information retrieval. I told them they should reach out to you. And again, I said, I’m not looking to move. But I came, I really enjoyed meeting the people. The problems, the scale of problems, I could just see being very, very different from what I had.

Host: Mmm.

Susan Dumais: And again, now twenty-two years later, it’s been, maybe, one of the best decisions in my life, in part because what I’m interested in, helping people create, find, manage, make sense of information, is exactly what Microsoft is about.

Host: Right.

Susan Dumais: So, every question I have, every innovation I have, has really natural outlets. So, I find that really sort of exciting and fun. MSR is also just this amazingly vibrant intellectual environment that I love, people from lots of different perspectives coming together.

Host: Well, as we close – and I’m sad that we’re closing, because you’re fun – there are a handful of people who’ve really earned the right through length, depth and quality of career to give advice to people, and you are one of those people. Let’s frame the final question in terms of your leadership role in cultivating the next generation of talent here at MSR. Tell our audience, from your perspective, what’s on the horizon in the field and why is now a good time to be a researcher?

Susan Dumais: When I think back on my career and I look at other successful people, I think we all share some traits that I think are important to think about. One is, have a purpose, but also be willing to seize a new opportunity. And I just told you several times…

Host: Right.

Susan Dumais: …how really pivotal points in my life came from having a true north, but also be willing to take not the obvious and straight path to it.

Host: So, Jack Sparrow’s compass?

Susan Dumais: Exactly.

Host: Wherever.

Susan Dumais: No, no, and actually not wherever. I had a goal, but I was also willing to deviate when there were opportunities.

Host: Sure.

Susan Dumais: The second is to be passionate about what you do. I think I’m incredibly fortunate to be in an environment where my passion and what people pay me to do align, but, in any endeavor, you’re going to work hard, you’re going to work long hours. Find something that speaks to you. It might be an application area. It might be a particular theoretical framework, a methodology. But make sure that, at the end of the day, when you’ve worked really hard, you’re proud of that outcome. And perhaps the most important thing is to persevere, be persistent in what you do. There is no straight path to an aspiration and how you get there. And I think it’s often deceptive because students will see this brilliant talk by somebody who is very well-known in the field, and go, oh my gosh, this person is just brilliant. Sure, they may be brilliant, but they’ve also worked hard behind-the-scenes. They’ve tried lots of things that failed. And I think it’s really important to stick with it and learn from failures, but also celebrate successes. In terms of, I think, some of the interesting areas moving forward, let me just mention three. One of them is that I think, more and more, information retrieval is moving from helping people find information to helping people get things done. I’ve spent a lot of my life thinking about search. It is nobody’s end goal. You don’t get up in the morning and say, I’m going to search for the next two minutes. You’re trying to accomplish a task. And search is a means by which you do that. And I think we shouldn’t ever forget that. So really, trying to go from finding information to using that information in a way that helps you solve the problem. The other one we mentioned briefly before, it’s moving off the desktop into the world. More and more, our systems are interacting. There’s this interesting mix of digital and physical worlds. And I guess the last is a personal one. I think there are really interesting opportunities, moving forward, to combine insights from computation, cognitive science and neuroscience. It’s an area that I haven’t had as much time to spend as I would like, but I think there’s some interesting things coming together in that space.

Host: You know, I’m glad that you’re passionate and persistent about what you’re doing because it’s helped my life in many, many ways. You are right, I don’t get up and say, I’m going to go search. I have to find something, and I need that click to be the one I want. Susan Dumais, thank you so much, FINALLY, for coming on the podcast!

Susan Dumais: Thanks Gretchen. It’s really been fun to talk with you.

(music plays)

To learn more about Dr. Susan Dumais and how the search for better search goes on, visit Microsoft.com/research

The post HCI, IR and the search for better search with Dr. Susan Dumais appeared first on Microsoft Research.
