In this Tech Barometer podcast segment, Greg Diamos tells how an early passion for computing led him to a pioneering role at Baidu’s Silicon Valley AI lab, where he discovered ways to scale deep learning systems, then went on to co-found MLCommons and the AI company Lamini.
Find more enterprise cloud news, feature stories and profiles at The Forecast.
Transcript:
Greg Diamos: I think computing is a way to give people magical abilities, like superpowers. I’d like to live in a world where you can give that to everyone. Like, what if every single person had superpowers? I had basically ten years with nothing to do and just a pile of computers in my garage.
Jason Johnson: Just getting the ball rolling, I was wondering, Greg, could you tell me a little bit about what it is you do?
Greg Diamos: I’m one of the people who builds the software infrastructure that’s needed to run applications like ChatGPT or Copilot. Let’s see, I think it’s helpful to start from the beginning. My last name is Diamos. It’s a Greek name. It’s kind of cut in half. About a hundred years ago, my family immigrated to the U.S. They mostly settled in the West.
Before I was born, my father, he moved to Arizona. And his goal in life was to retire and find someplace relaxing and to go and play golf most of the time. And so he moved to Carefree, Arizona. Most people, when they think of Arizona, they think of Tucson or Sedona or places that normal people would go. You go to the Grand Canyon or something. Carefree is a place where no one ever goes. It’s right in the middle of the desert. There are not a lot of people, but it has a lot of golf courses and it’s very relaxing. It’s like a spa. So it’s a good place for that, but it was a very boring and brutally hot place to be as a 10-year-old boy.
[Related: AI Reorients IT Operations]
So I was bored, bored out of my mind. My mom worked as a database administrator for IBM. IBM at the time had a mainframe business, and they would basically throw away the last generation of stuff. She would take all the machines that were headed to the dumpster, put them in the car, drive them out and throw them in our garage, because it would be such a waste if such a thing were thrown away. I had basically 10 years with nothing to do and just a pile of computers in my garage. So it took me from when I was 10 years old until I was about 25 to kind of figure out how the machines worked. It seemed, like, magical to me that you could build such a thing, that you could do that as a little boy stuck in the desert. After that, I went to Georgia Tech, where I did a PhD in computer engineering.
Okay, let me tell the story of this one. I really love telling this story. I actually worked on the Baidu search engine. You could think of it as the Google of China. If you go to the search bar, there’s a little microphone on it. You press the microphone and you can talk to it. At the time, that feature was based on the last generation of machine learning technology: it still used machine learning, but it didn’t use deep learning. Voice search like that exists right now in all major search engines. Around 2014, 2015, Baidu was in the process of deploying their first deep learning system.
[Related: Alex Karargyris’s Path to Becoming a Pioneer of Medical AI]
One of the projects we had in the Baidu Silicon Valley AI lab was to apply deep learning, which had just been invented and was just starting to work, to improve that particular product. Along the way, we had a number of researchers. One of them was Jesse Engel. He was a materials scientist, not a deep learning researcher. Actually, none of us were deep learning researchers, because the field was so new that deep learning researchers barely existed. One of the things he did was perform sweeps over all the parameters in the system.
One of the sweeps produced a plot that seemed to show a relationship between the accuracy, or quality, of the system and two things: the amount of data that went into the model and the size of the model. Think of a neural network as just a collection of a ton of simulated neurons, not real neurons, trained on thousands and thousands of hours of recordings of people talking. It seemed like as you added more simulated neurons and more recordings of people talking, the system got smarter. And it didn’t happen in an arbitrary way. It happened in a very clear relationship you could fit with a physics equation, an E-equals-mc-squared kind of equation, a one-parameter equation. It was a very simple…
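What Greg is describing came to be known as a scaling law. As a minimal sketch of the idea, and not the Baidu team’s actual data or code, the snippet below fits a short power law to made-up sweep results and uses it to predict quality at a scale that hasn’t been trained yet. The numbers, units, and fitted form are illustrative assumptions.

```python
# A minimal sketch of the power-law relationship Greg describes:
# model quality improving predictably with training data.
# All data points here are made up for illustration.
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical sweep results: hours of transcribed speech vs. word error rate.
hours = np.array([100.0, 300.0, 1_000.0, 3_000.0, 10_000.0])
wer = np.array([0.32, 0.24, 0.17, 0.13, 0.095])

def power_law(x, a, b):
    # error = a * x^(-b): the simple form typically used in scaling-law fits.
    return a * x ** (-b)

# Fit the two constants of the power law to the sweep data.
(a, b), _ = curve_fit(power_law, hours, wer, p0=(1.0, 0.3))
print(f"fit: WER ~= {a:.2f} * hours^(-{b:.3f})")

# The payoff of such a fit: extrapolate to a sweep you haven't run yet.
print(f"predicted WER at 100,000 hours: {power_law(100_000, a, b):.3f}")
```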