We ask Shilpa Lawande, a data engineer at Facebook Boston, all about data and its impact on our lives during this Pulsar podcast brought to you by #MOSatHome. We ask questions submitted by listeners, so if you have a question you'd like us to ask an expert, send it to us at firstname.lastname@example.org.
ERIC: In the 1990s, you could fill up an entire 3 and 1/2 inch floppy disk by typing for three straight days. Today, the equivalent for storing data, a USB drive, would have room to spare even after you typed for 4,000 years.
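The comparison above can be checked with some quick arithmetic. The sketch below is only a rough estimate under assumptions that are not stated in the episode: a high-density 3.5-inch floppy holding 1,474,560 bytes, one byte per typed character, and the same nonstop typing speed in both cases.

```python
# Assumptions (not from the episode): a 3.5-inch high-density floppy holds
# 1,474,560 bytes, and each typed character takes exactly one byte.
FLOPPY_BYTES = 1_474_560
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25

# Typing speed implied by filling one floppy in three days of nonstop typing.
typing_rate = FLOPPY_BYTES / (3 * SECONDS_PER_DAY)

# Bytes produced by typing at that same speed for 4,000 years.
bytes_in_4000_years = typing_rate * 4000 * DAYS_PER_YEAR * SECONDS_PER_DAY

print(f"Implied typing rate: {typing_rate:.1f} characters per second")
print(f"Typed in 4,000 years: {bytes_in_4000_years / 1e9:.0f} GB")
```

Under these assumptions the typist produces a little under six characters per second, and 4,000 years of typing comes to roughly 0.7 terabytes, so the drive in the comparison would need to hold about a terabyte to have room to spare.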
Storage is just one aspect of data management that we've gotten drastically better at in the last few decades. Modern technology allows us to collect, organize, and analyze incredibly large quantities of data.
Today on Pulsar, we explore what data really is and how we can put today's modern data sets to work improving our lives. Thanks to Facebook Boston for supporting this episode of Pulsar.
My guest today is Shilpa Lawande, an engineering director at Facebook Boston. Shilpa, thanks so much for joining me today.
SHILPA: Thank you for having me.
ERIC: So to start with, what made you interested in a career in computer science?
SHILPA: So I grew up in India. My parents were both in the field of science, my mom a math teacher and my dad a physicist. I grew up hearing stories of Richard Feynman, and Schrodinger's cat, and so on. So I always assumed that I was going to do physics as my career.
So when it came time to choosing a major, I was surprised when my dad said, hey, when you are going to be 10, 15 years from now in the workplace, you will realize that all science is done with computers.
And so why not study computer science and get really good at it? And then you will be able to make an impact on whatever field of science you like, not just physics.
So that advice resonated with me, even though I had never touched a computer in my life. I studied computer science in engineering school. And there, I found that I was the only girl in my class. So that was a further shock.
But eventually, I realized computer science has this really nice way to blend logic and creativity. And so that was what made me continue and pursue a career in the technology field.
ERIC: Can you tell us about what you're working on now?
SHILPA: I'm a data engineer. I work with data infrastructure. So if you think of an app that you might use, there are complex systems underneath that app that are responsible for storing, processing, and moving data around. I'm part of a team that builds infrastructure of that sort.
At Facebook, my team is involved in supporting data infrastructure for the Facebook family of apps. So if you're using Instagram, posting pictures on Facebook, or sending messages with Messenger or Messenger Kids, then my team is involved in some of the data bits that are underneath those apps.
Previously, I also worked on a system called Vertica, which was able to query very large amounts of data in a fraction of a second.
ERIC: When we ask our listeners questions about a subject, they often want to know more about what the subject actually is. So can you give us a good definition of data from a computer science standpoint?
SHILPA: Data is basically a collection of facts. You could call it numbers, observations, information about the sort of interactions that people have with the physical world, or digital footprints of interactions in the digital world.
Really, any piece of information that is usually processed, stored by a computer, we refer to as data.
If you're a teacher taking attendance in class and turning it into report cards, we call that data. When you look at Google Maps and you find your way, the maps are data. Data is, really, all around us-- weather, sports statistics, census, water information. It really comes in many different forms.
ERIC: We got a question from Megan, who said that the human brain is so good at analyzing data and looking for patterns, she was wondering why we need computers to analyze that data for us.
SHILPA: Human beings are really great at solving problems where creativity is involved. We are not really great at doing the same thing over and over.
So as far as coming up with calculations or computations, human beings are actually pretty slow compared to what a computer might be able to do. Millions, billions, even quadrillions of operations per second is what you could do with a computer.
Human beings also get bored. We get tired. We need to sleep. Computers need none of those things. What is interesting to me is that behind every computer program I've written, there is a human being.
Computers are just really better at following those instructions and doing repetitive tasks. And so that's why we need them.
ERIC: And a few people asked us about the term big data, which has been used more and more often. So can you explain what that is?
SHILPA: Before the time of the internet, many of the interactions we had were people generating data. Like, you go to a grocery store, and the person enters your grocery list into the computer.
And maybe that's used to make charts for sales or to check which kind of apple is more popular, or things like that.
Now, as people are interacting more and more online, there is more and more opportunity to collect data about these digital interactions. And this data is collected at a much faster pace and a much finer grain. So imagine every click that you do on a website - imagine the data that that produces.
One of the reasons that you might collect such data is that by putting data of many people's interactions together, you may be able to come up with models that then are used to create more personal experiences.
So for example, when you go to YouTube or Netflix and watch a movie or a video, it may show you right on the side, other videos or other movies that you might like.
How do you do that? It is by analyzing what is done by people watching that video. What other videos do they like? By collecting a large aggregate of data, you can create better experiences.
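The idea of looking at what other viewers of the same video also watched can be sketched in a few lines of Python. This is a minimal illustration, not how YouTube or Netflix actually build recommendations; the viewing histories, video names, and the functions `build_co_watch_counts` and `recommend` are all made up for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_watch_counts(histories):
    """For every video, count how often each other video was watched by the same viewer."""
    co_counts = defaultdict(Counter)
    for watched in histories:
        # Use a set so a video watched twice by one viewer is counted once.
        for a, b in permutations(set(watched), 2):
            co_counts[a][b] += 1
    return co_counts


def recommend(co_counts, video, k=3):
    """Return up to k videos most often watched alongside `video`."""
    return [other for other, _ in co_counts.get(video, Counter()).most_common(k)]


# Hypothetical viewing histories, one list per viewer.
histories = [
    ["cats", "dogs", "birds"],
    ["cats", "dogs"],
    ["cats", "dogs", "cooking"],
    ["cooking", "baking"],
]

co_counts = build_co_watch_counts(histories)
print(recommend(co_counts, "cats", k=1))
```

With only four viewers the counts are tiny, which is the point Shilpa makes: the more interactions are aggregated, the more reliable the "people who watched this also watched" signal becomes.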
Another reason that we generate what is called big data is because of connected devices or sensors. So if you think of the Nest thermostat that is, maybe, in your home, or smart energy meters that are in your yard, these are things that can be used to collect machine telemetry - such as temperature, or energy readings, and so on - which may be used to create more efficient modes of operation.
So like, it may turn down your heat when you're not home. And so this data is much bigger in size, created at a much bigger velocity, and so hence the name big data.
ERIC: Heather asked us about the good that can be done using these large data sets and collection methods. So can you give us some examples of how data can be used to improve our lives?
SHILPA: The one that is probably closest to what we are experiencing is right now, we are in the middle of a pandemic. There's a lot of good, publicly available data on COVID-19, how it spreads, where the occurrences are, and so on.
A lot of that data could be analyzed and visualized. If you want to check out one of these visualizations, try looking at domo.com. They have a coronavirus tracker that updates every 10 minutes using all the publicly available data.
It actually uses the software I built previously, called Vertica, to analyze this data under the covers.
That's a great example of the use of data for public health. Similarly, if you have a Kinsa thermometer at home, to you, it's just an everyday-use device. But the aggregate readings of temperatures across communities and so on can be used to predict outbreaks of contagious diseases, like the flu.
Another great area, which is somewhat emerging right now, is the area of digital farming. So if you can imagine the problem of feeding the world while also being sustainable and being environmentally friendly, digital farming is about taking weather data, sensor data about the moisture in the soil, and the temperature, and so on, and using it to improve productivity of farms. And they do that with smart tractors, and aerial imaging of farms, and so on. So that's really like cutting-edge food production.
I'll leave you with one last fun example. One place that you may not think about where data could be useful is art. A good friend of mine, Jason Bailey, has an art blog called artnome.com. He has managed to collect a database of artworks and data about artworks, the largest art database in the world, that he collected painstakingly by himself.
You can sort of find things like van Gogh made 96 paintings a year, roughly one painting every four days. Why is this interesting? Because people who are art collectors, before this, were flying blind. They had no idea how to decide how to value paintings. So now they have an actual quantitative way of getting different artists and new ways of looking at how to value artwork.
ERIC: Those are all really great. Finally, if some of our listeners are interested in learning more about computer science and coding, what can they do?
SHILPA: At Facebook, we sponsor Girls Who Code. It's a great nonprofit organization that has many programs for girls and young women who want to study computer science. If you are younger, and are still in school, and would be interested, there are toys like Ozobot and Osmo that have fun ways to learn the concepts of programming without actually having to learn a programming language. This is called block programming. My kids love Scratch and my daughter, especially, recommends CodeCombat.
ERIC: Well, Shilpa, thank you so much for answering all our questions about data today.
SHILPA: Thank you for having me.
ERIC: You can visit the Hall of Human Life in the Green Wing of the Museum of Science and compare your own data against other visitors. Please visit engage.mos.org to support the Museum of Science and MOS at Home.
Until next time, keep asking questions.
Theme song by Destin Heilman