Blog
Alumni Profiles Series: Animesh Srivastava
Dr. Animesh Srivastava received his Ph.D. in computer science from Duke in 2017, specializing in user privacy, following a B.S. in computer science from Haldia Institute of Technology and an M.S. in computer science from the Indian Institute of Technology, Kharagpur. During his graduate studies, he developed notable projects on visual privacy for Android devices, building a custom operating-system module to enhance data security. After graduating, Dr. Srivastava briefly joined a Bay Area startup focused on smart home technology, contributing his expertise in performing computation on resource-constrained devices. He is currently part of an applied privacy research team at Google, where he works on solutions for managing and analyzing the overwhelming volume of user feedback. His recent work includes a pipeline that identifies and categorizes privacy-related issues from millions of user submissions using state-of-the-art NLP and clustering techniques. Dr. Srivastava's contributions are helping to advance how tech companies understand and respond to user privacy needs.
When you began your graduate study, what professional or career plans did you have in mind?
When I started my Ph.D., I wasn't thinking about career goals; at that time I was mostly interested in working on mobile devices. Mobile computing was a very new area, and the kinds of papers coming out were very exciting. What stood out to me was how much personal information you could infer from the things shared with these small devices. As I continued working with the technology, mobile devices became even more personal, so I was just excited about working in mobile computing with a privacy angle.
By the end of my second year, I had also accrued some teaching experience by working with my advisor. He gave me plenty of good opportunities, and as I interacted with more students I found that I really enjoyed working with them. I began to think, “Maybe I'll try to go towards academia.”
But towards the end of my Ph.D., I became more inclined towards building large-scale systems in the form of products that can reach users. In the final years of my program, I thought more seriously about joining industry, building products, and shipping them to users to help them in a way that hadn't been done before.
Did a particular moment or experience during your time at Duke impact your choices or direction?
There actually was not one big moment, but several small interesting moments. I remember reading papers from different companies; there was one really good paper from Amazon about distributed storage. I also saw some really good papers from Google, and what was most intriguing was that some of these papers could only have been done at Google, because you'd need such a huge amount of data to build your hypotheses and run large-scale experiments. And of course, the goal is to build a system that works not just for hundreds of thousands of people, not even millions of people; it should work for billions of people with 99.9% availability. Now, those are really interesting avenues to work in!
As I read these papers, and I saw the kind of work that was going on, I slowly became more inclined to work in these directions. I don't think there was one thing that made me say, “Yeah, I want to go to industry.” I saw that in industry, you get access to so much data and so much exposure. You receive so many signals from such a vast number of users, and that can help you guide a large-scale system with the goal that it should work for everyone, ideally. It’s unsatisfactory to have to say, “This thing I created works for this group of people, but it does not work for that group of people.”
You mentioned being influenced by papers from big tech companies. How did you wind up working at a startup?
As I was building computer vision modules during my Ph.D., I would integrate them into the default Android operating system; in a way, I was building my own Android operating system. When I was just about to graduate, I got the opportunity to talk to the CEO of a Bay Area startup building smart homes: homes that have a lot of sensors and interact with their residents. Think cameras, infrared cameras, audio sensors, all different kinds of sensors. In a lot of ways, it was very similar to what I was working on with Android devices; they were building an operating system that could control all these sensors to make the residents' experience of the home a comfortable one, sensitive to their needs. There was a very good overlap between my interests, this smart home product's goal, and the skills I had gained during my Ph.D., so it felt like a natural extension.
Was the transition from Durham to the Bay Area a natural one to make at that stage in your career?
During grad school, I interned at HP Labs twice, first in 2014 and then again in 2016, so I had already gotten to experience the West Coast. Silicon Valley is commonly known to be the place where stuff happens in tech, and people move to this side of the country for certain kinds of jobs, so I knew a bit about the culture beforehand.
But being here myself, I was experiencing it for the first time rather than just hearing about it. It was very common to see people in restaurants talking about startup ideas. And these people are not just CEOs and CFOs; they're undergraduate students, or they're working on their Ph.D. degrees. And they're out at a restaurant talking about these great ideas that could then become a startup. You might also see a venture capitalist sitting at the same table with them, listening to the idea, and everybody is involved in planning something very organic and home-grown.
Of course, it's common knowledge that most startups fail sometime within their first two to five years, but the spirit is that we keep trying even with that knowledge. Some of it will work, and most of it won't. But that's okay, as long as we keep that spirit of, “Okay, let's try something new.” I had the opportunity to experience all that, and having slowly ramped myself up over those two internships, I was looking forward to returning to that environment after being in Durham for so long.
Have you ever thought of creating your own startup? Is that something that's on the horizon for you?
Definitely, I have ideas, and I've talked about some of them with the people around me. When the right time comes, with the right group of people, I would love to start my own startup.
What advice would you give to current Duke graduate students?
Talk about your project with as many people as you can. Talk to other students, to faculty, and to people beyond your advisor. Tell them what you are working on. Get feedback as early as possible and listen well, but remember that you have the final authority to decide what to incorporate and what not to, because you are the one working in this particular field. You are more passionate about it than others; that's why you are working on this problem. Don't be shy when talking about your project, especially if you feel it's not yet developed: this is how it gets developed, not by staying alone and just reading papers.
You recently authored a paper at Google about a deep learning system for navigating privacy feedback at scale. Can you talk about this project and its genesis?
Google receives over three million user feedback submissions daily, so manually processing them all for privacy concerns is impractical. The subjective nature of privacy, coupled with varying cultural and regional understandings, further complicates the identification of relevant feedback.
In 2019, two people from my team were working on problems related to natural language, and they were wondering, “Is there a way to uncover users' privacy concerns at scale?” That was really the genesis of this project. It was very clear that it would not be feasible to manually go through three million daily feedback submissions. Of course, there are machine learning models that are really good at recognizing these kinds of [privacy] patterns. But we wanted to do something beyond just pattern recognition: we don't want to just match and find keywords, because my home address can get mixed up with my IP address, and a home address can be super sensitive whereas an IP address may not be. There has to be some semantic contextualization of what, specifically, the privacy problem is.
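To make the difference between keyword matching and semantic understanding concrete, here is a minimal sketch of what semantic triage could look like, using the open-source sentence-transformers library. The checkpoint name, seed examples, and similarity threshold are illustrative assumptions for this sketch; they are not the actual (Google-internal) models or data behind the paper.

```python
# Minimal sketch: flag privacy-related feedback by meaning, not keywords.
# Assumptions: the sentence-transformers package and the public
# "all-MiniLM-L6-v2" checkpoint; the seed examples and the 0.5
# threshold are illustrative, not the paper's real pipeline.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Seed examples of genuine privacy concerns. Embeddings capture meaning,
# so "home address" and "IP address" land in different regions of the
# space even though both contain the keyword "address".
privacy_seeds = [
    "My home address is visible to strangers in the app.",
    "I can't delete my search history.",
    "A photo of me was shared without my consent.",
]
seed_embeddings = model.encode(privacy_seeds, convert_to_tensor=True)

def is_privacy_related(feedback: str, threshold: float = 0.5) -> bool:
    """Flag feedback whose meaning is close to a known privacy concern."""
    emb = model.encode(feedback, convert_to_tensor=True)
    # Highest cosine similarity against the seed set.
    score = util.cos_sim(emb, seed_embeddings).max().item()
    return score >= threshold

print(is_privacy_related("How do I erase everything I've searched for?"))
print(is_privacy_related("The app crashes when I rotate my phone."))
```

In practice, a production system would fine-tune a classifier on labeled feedback rather than compare against a handful of seeds, but the core idea is the same: decisions happen in embedding space, where semantics, not surface keywords, determine similarity.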
We developed a model that could understand text and extract privacy-related issues, and it was very successful; we shared our results with other teams that were working on similar problems, and we told them, “Hey, our model can now find thousands of privacy-related feedback submissions in an hour.” That was a big, big jump, and they became our first customer, so to speak.
At this point in the project, we had a model that could separate user feedback about privacy concerns from the rest of the feedback corpus. But even though it's now all privacy-related, it's still a huge amount of feedback; no one is going to read it all. So what can we do? We thought, “Can we cluster similar issues together?” For example, if many feedback submissions turn out to be about search history deletion, we create a cluster of just those submissions, so you don't have to manually read all the feedback.
Then the challenge became different: people talk about the same problem in different ways based on their own understanding, so could we group them together? We developed an algorithm that automatically generates theme titles based on the contents: the first theme title could be “search history deletion,” and the second could be “sensor issues.” Now our engineers can easily see the problem and its volume, and they can begin to understand and compartmentalize.
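As a rough illustration of these clustering and theme-titling steps, here is a sketch that groups submissions by embedding similarity and titles each cluster with its most distinctive terms. The specific choices here (scikit-learn's KMeans, TF-IDF top terms, a fixed two clusters) are stand-in assumptions, not the algorithm the team actually built.

```python
# Sketch: cluster privacy feedback and auto-title each cluster.
# Assumptions: scikit-learn and sentence-transformers; KMeans and
# TF-IDF top terms stand in for the team's internal methods.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

feedback = [
    "I want to delete my search history but can't find the option.",
    "How do I erase everything I've searched for?",
    "Please remove my past searches.",
    "The camera light stays on even when no app is open.",
    "My microphone seems active in the background.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(feedback)

# Group submissions that mean the same thing, even when worded differently.
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(embeddings)

# Derive a rough theme title per cluster from its most distinctive terms.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(feedback)
terms = vectorizer.get_feature_names_out()

for cluster_id in range(kmeans.n_clusters):
    rows = [i for i, label in enumerate(kmeans.labels_) if label == cluster_id]
    # Sum TF-IDF weights over the cluster; the top terms become the title.
    weights = tfidf[rows].sum(axis=0).A1
    top_terms = [terms[i] for i in weights.argsort()[::-1][:3]]
    print(f"Theme {cluster_id}: {' / '.join(top_terms)} ({len(rows)} submissions)")
```

With real data, one would likely prefer a density-based method that discovers the number of clusters on its own, and a language model to phrase the titles, but summing TF-IDF weights already shows how a readable label can fall out of a cluster's contents.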
Then there was one last thing that we felt we needed to address. Typically, when you look at bugs that affect a lot of people, you want to address them in a way that helps the greatest number of people. However, privacy issues are unlike other bugs: while technical bugs might affect specific functions, privacy issues can erode trust and have wide-ranging consequences for user experience.
Sometimes, there are particular privacy concerns, such as, “My picture was posted without my consent. It appears in the Google search results. I may lose my job because of this,” or “In my culture, this is considered very bad, and there can be drastic effects, such that I'm afraid for my life.” We realized that there are privacy issues that have fear or anger attached to them. Those are more serious. So we built an emotion model; now, when feedback comes in, we not only label its relation to privacy and the scope of the issue being talked about, but we also extract emotion.
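Here, as an illustrative sketch only, is how an emotion signal might be attached to triaged feedback, using a public Hugging Face checkpoint as a stand-in for the team's internal emotion model.

```python
# Sketch: attach an emotion label so fear/anger feedback is escalated.
# Assumption: the public "j-hartmann/emotion-english-distilroberta-base"
# checkpoint is an illustrative stand-in for Google's internal model.
from transformers import pipeline

emotion_model = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
)

feedback = "My picture was posted without my consent and I may lose my job."
result = emotion_model(feedback)[0]  # e.g. {"label": "fear", "score": ...}

# Feedback carrying fear or anger is treated as more serious.
if result["label"] in {"fear", "anger"}:
    print(f"Escalate: {result['label']} ({result['score']:.2f})")
else:
    print(f"Standard triage: {result['label']} ({result['score']:.2f})")
```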
In the process of doing all this, we built state-of-the-art algorithms: really scalable models that can operate on millions of feedback submissions within hours. We felt that this was a really strong contribution to the research community, and that's why we published the paper. We explained our full methodology, including how we built these models, so that others can reproduce them.
Author
Jabari Kwesi
Ph.D. student, Computer Science
Jabari Kwesi is currently a Ph.D. student in the Department of Computer Science at Duke University. He received his B.S. in Computer Science, also from Duke University, in May of 2023. His research involves studying the development, implementation, and effects of data collection and surveillance technologies in American communities. In his free time, he enjoys playing and watching basketball, as well as spending time with friends and family.