Is that an image of a cat? It’s a simple question for human beings, but until recently it was a tough one for machines. Today, if you type “Siamese cats” into Google’s image search engine, voilà: you’ll be presented with scores of Siamese cats, categorized by coloring (“lilac point,” “tortie point,” “chocolate point”), as well as by other qualities, such as “kitten” or “furry.”
What’s key here is that while some of the images carry identifying, machine-readable text or metadata, many do not. Yet the search still found them. How? The answer is that the pictures (or, more accurately, a pattern in the pictures) were recognized as “Siamese cat” by a machine, without requiring a human to classify each instance.
This is machine learning. At its core, machine learning upends the traditional programming model, forgoing hard-coded “if this, then that” instructions and explicit rules. Instead, many of today’s systems use an artificial neural network (ANN), a statistical model inspired by biological neural networks, which is “trained” on a data set (the bigger, the better) so that it can then perform a task on similar but previously unseen data.
The data comes first in machine learning. The system finds its own way, adjusting and refining its model, iteratively.
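To make the contrast with hard-coded rules concrete, here is a minimal Python sketch of one of the oldest learning procedures, the perceptron update rule. Nobody writes the decision rule by hand: the weights start at zero and are nudged each time the model misclassifies a training example, until the examples are separated. The two-feature data set, the “cat”/“not cat” labels and the function names are hypothetical illustrations, not any particular library’s API.

```python
# Minimal perceptron: learns a linear decision rule from labeled examples
# instead of having the rule written by hand. Data and labels are hypothetical.


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Return (weights, bias) learned with the classic perceptron update rule.

    samples: list of feature tuples; labels: list of +1 / -1 targets.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            prediction = 1 if activation > 0 else -1
            if prediction != y:
                # Nudge the decision boundary toward the misclassified example.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                mistakes += 1
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


# Hypothetical two-feature examples: +1 = "cat", -1 = "not cat".
X = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (-1.0, -0.5), (-2.0, -1.0), (-1.5, -2.0)]
y = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(X, y)
print(predict(w, b, (3.0, 2.0)))    # a new, unseen point → 1
print(predict(w, b, (-2.5, -1.5)))  # → -1
```

On this toy data the loop converges after a single correction. With data that no straight line can separate, the perceptron never stops making mistakes, which is why the `epochs` cap is there.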
But back to Siamese cats. Computer vision researchers worked on image recognition for decades, but Google made dramatic gains within months once the company developed a machine-learning algorithm. Today, machine-learning facial recognition systems for mug shots and passport photos outperform human operators.
Not New But Definitely Now
In fact, machine learning, neural networks and pattern recognition aren’t new. In 1952, Arthur Samuel wrote a computer program that improved its checkers performance the more it played (by studying winning strategies and incorporating them into its own program). In 1957, Frank Rosenblatt designed the first neural network for computers, the Perceptron. In 1967, the “nearest neighbor” algorithm, which allowed a computer to do very basic pattern recognition, was created.
Indeed, some would say that the codebreaking machine Alan Turing designed to help break the German “Enigma” cipher during World War II was an instance of machine learning, in that it observed incoming data, analyzed it and extracted information.
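The nearest-neighbor idea is simple enough to sketch in a few lines. In this minimal Python version, the “model” is just the stored, labeled examples: a new item gets the label of whichever stored example is closest, measured here by Euclidean distance. The feature vectors are hypothetical stand-ins for measurements extracted from images.

```python
import math


def nearest_neighbor(train, query):
    """1-nearest-neighbor: return the label of the stored example closest to query.

    train: list of (feature_tuple, label) pairs. Uses Euclidean distance.
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical 2-D feature vectors standing in for simple image measurements.
training = [((0.9, 0.8), "cat"), ((0.8, 0.9), "cat"), ((0.1, 0.2), "dog"), ((0.2, 0.1), "dog")]
print(nearest_neighbor(training, (0.85, 0.7)))  # → cat
```

There is no separate training step here; all the work happens at query time, which is why nearest-neighbor methods get slower as the stored data set grows.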
So why has machine learning exploded on the scene now, pervading fields as diverse as marketing, health care, manufacturing, information security and transportation?
Researchers at Georgia Tech say the explanation is the confluence of three things:
1. Faster, more powerful computer hardware (parallel processors, GPUs, etc.)
2. Software algorithms to take advantage of these computational architectures
3. Loads and loads of data for training (digitized documents, internet social media posts, YouTube videos, GPS coordinates, electronic health records, and, the fastest-growing category, all those networked sensors and processors behind the much-heralded Internet of Things).
This digitalization began in earnest in the 1990s. According to IDC Research, digital data will grow at a compound annual growth rate of 42 percent through 2020. In the 2010-20 decade, the world’s data will grow by 50 times, from about one zettabyte (1 ZB) in 2010 to about 50 ZB in 2020.
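The two IDC figures can be related through the compound-growth formula, end = start × (1 + rate)^years. The short Python calculation here is an illustration, not part of IDC’s analysis: ten years at 42 percent a year works out to roughly a 33-fold increase, while going from 1 ZB to 50 ZB in ten years implies roughly 48 percent a year, so the two projections may rest on different periods or baselines.

```python
# Compound growth: end = start * (1 + rate) ** years


def growth_multiple(rate, years):
    """How many times larger a quantity becomes after compounding `rate` for `years`."""
    return (1 + rate) ** years


def implied_cagr(start, end, years):
    """Annual growth rate needed to go from `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


print(round(growth_multiple(0.42, 10), 1))      # ten years at 42% a year → 33.3
print(round(implied_cagr(1, 50, 10) * 100, 1))  # % per year for 1 ZB to 50 ZB → 47.9
```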
These oceans of data and data sources not only enable machine learning but also, in a sense, create an urgent need for it, offering a solution to the human-programmer bottleneck. “The usual way of programming computers these days is, you write a program,” says Irfan Essa, director of Tech’s new Center for Machine Learning. “Now we’re saying, that cannot scale.”
There are simply too many data sources, arriving too fast.
The ability of these systems to make inferences from data quickly and reliably has captured the attention of the world’s biggest technology companies and businesses, which have seen the commercial benefits and opportunities.
“It created a disruption,” says Essa, who also serves as associate dean of the College of Computing, a professor in the School of Interactive Computing and an adjunct professor in the School of Electrical and Computer Engineering.
As Jeff Bezos, CEO of Amazon, put it in his widely circulated April 2017 letter to company shareholders, Amazon’s use of machine learning in its autonomous delivery drones and speech-controlled assistant Alexa is only part of the story.
“Machine learning drives our algorithms for demand forecasting, product search ranking, product and deals recommendations, merchandising placements, fraud detection, translations and much more,” Bezos wrote. “Though less visible, much of the impact of machine learning will be of this type—quietly but meaningfully improving core operations.”
Two other drivers of machine learning’s rapid growth have been widely available open-source toolkits (such as Google’s TensorFlow) that make it possible to prototype a machine learning system rapidly, and cloud-based storage and computation services to host it.
This April, for instance, Amazon Web Services announced that Amazon Lex, the artificial intelligence (AI) service for building applications that interact with users via voice and text (and the technology behind Amazon Alexa), would be available to Amazon Web Services customers.
“You can build a startup very, very fast,” says Sebastian Pokutta, Georgia Tech’s David M. McKenney Family Associate Professor in the H. Milton Stewart School of Industrial and Systems Engineering, and associate director of the Center for Machine Learning (ML@GT). “Before, machine learning was very academic and somewhat esoteric. Now we have a toolbox that I can give a student, and within a week they can create something that’s usable.”
Natural Language: Going Deeper
Like image recognition, speech recognition has seen great strides thanks to machine learning. Consider Amazon’s Alexa or Google Home, two darlings in the speech-controlled appliance space.
Georgia Tech researchers aren’t competing with these new commercial efforts. “We’re working on things that we hope will be important components of systems in the much longer term,” says Jacob Eisenstein, assistant professor in the School of Interactive Computing, where he leads the Computational Linguistics Laboratory. “As a field right now, we’re the intersection of machine learning and linguistics.”
That said, Eisenstein points out that Google quietly incorporates increasingly sophisticated natural language processing into its search system every few months.
“What I think they’re doing is drawing ideas from the research literature, from the stuff that’s produced at universities like Georgia Tech,” he says.
Highlighting the market excitement over speech control, Eisenstein notes that five former Tech students are working at Amazon on Alexa development, as are a number of his undergraduate and master’s students.
So, what sorts of problems are Eisenstein and his colleagues working to solve?
“Imagine you are interested in some new area of research, and could have a system that summarizes the 15 most important papers in that field into a four-page document,” Eisenstein says.
But creating such a system goes far beyond word or phrase recognition. “We know that to understand language, you have to have some understanding of linguistic structure—how sentences are put together,” he explains. Language understanding is hard, from a machine standpoint, because it has very deep, nested structures.
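A deliberately naive baseline shows why word-level statistics are not enough. This Python sketch scores each sentence by how often its words recur in the whole text and keeps the top scorers, a classic frequency-based extractive approach. It has no notion of how sentences are put together, which is exactly the gap Eisenstein describes. The tiny stopword list and the function names are illustrative assumptions.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "that", "it", "for", "on", "with"}


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=2):
    """Naive extractive summary: keep the sentences whose words are most frequent overall."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    top = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in top)


doc = ("Neural networks learn from data. "
       "Networks with many layers are called deep networks. "
       "The weather was pleasant.")
print(summarize(doc, 1))  # → Networks with many layers are called deep networks.
```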
Tackling subjects like language or other complex, non-linear relationships has given rise to a subset of machine learning known as deep learning. A deep neural network is an artificial neural network with multiple hidden layers between the input and output layers.
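The layered structure itself is easy to show. In this minimal Python sketch, each layer computes weighted sums of its inputs plus a bias, and the hidden layers pass their results through a ReLU nonlinearity. The weights are hand-picked for illustration rather than trained: with two hidden layers, the network computes XOR, a simple nonlinear relationship that no single linear layer can represent. In a real deep network, training would find weights automatically from data.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def dense(weights, biases, inputs):
    """One fully connected layer: each output is a weighted sum of inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def forward(layers, inputs):
    """Run inputs through a list of (weights, biases) layers.

    ReLU is applied after every layer except the last (the output layer).
    """
    activation = inputs
    for i, (weights, biases) in enumerate(layers):
        activation = dense(weights, biases, activation)
        if i < len(layers) - 1:
            activation = relu(activation)
    return activation


# Hand-picked (not trained) weights: input -> hidden -> hidden -> output, computing XOR.
xor_net = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer 1: relu(a+b), relu(a+b-1)
    ([[1.0, -2.0]], [0.0]),                   # hidden layer 2: relu(h1 - 2*h2)
    ([[1.0]], [0.0]),                         # output layer: pass-through
]

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, forward(xor_net, [a, b])[0])
```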
Black Box Problems
However, those hidden layers give rise to a black box problem. That is, if the artificial neural network contains hidden layers, its processes aren’t transparent. To take a real-world example: how do we audit an autonomous car’s decision to swerve right, not left?
That’s an area of study for Dhruv Batra, an assistant professor in the School of Interactive Computing. His research aims to develop theory, algorithms and implementations for transparent deep neural networks that can explain their predictions, and to study how these networks and their explanations affect user trust and perceived trustworthiness.
According to Batra: “We have to be a little careful though, because if we tack on the explanatory piece—‘That’s why I’m calling this a cat’—the system may learn to produce an explanation, a post hoc justification that may not have anything to do with its choice.”
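One generic way to probe a trained model directly, rather than training it to produce a separate justification, is occlusion (or perturbation) analysis: hide one input at a time and measure how much the output changes. This Python sketch applies it to a hypothetical three-feature “cat score” model. It is a simple, model-agnostic illustration, not the method Batra’s group uses; in nonlinear networks, features interact, so single-feature occlusion gives only a partial picture.

```python
def occlusion_attribution(model, inputs, baseline=0.0):
    """Score each input feature by how much the model's output drops
    when that feature alone is replaced with a baseline value."""
    original = model(inputs)
    scores = []
    for i in range(len(inputs)):
        occluded = list(inputs)
        occluded[i] = baseline
        scores.append(original - model(occluded))
    return scores


# Hypothetical "cat score" model: a fixed weighted sum of three image features.
WEIGHTS = [0.5, 2.0, -1.0]


def cat_score(features):
    return sum(w * f for w, f in zip(WEIGHTS, features))


print(occlusion_attribution(cat_score, [1.0, 1.0, 1.0]))  # → [0.5, 2.0, -1.0]
```

For a linear model like this one, each score reduces to weight × (value − baseline), so the explanation matches exactly how the model computes its output. In image models the same idea is applied by graying out patches of pixels.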
Other problems range from the practical, “How can we remove human bias when setting up the algorithm?” to the unexpectedly philosophical, “How can we be sure these systems are, in fact, learning the right things?”
Tech researchers are hard at work on these fascinating questions.
Essa admits there’s a lot of hype around machine learning right now. But he notes that people are very good at overestimating the impact of technology in the short term, yet underestimating it in the long run.
While optical character recognition and, increasingly, speech recognition are taken for granted because they “just work,” other technologies are far from perfect.
“And we’d like them to be perfect, which is why research and development needs to continue,” Essa says.
Machine learning may even play a role in improving how Georgia Tech students are taught in the future.
“At Tech we have a lot of educational data,” he says. “How do we now use that data to learn more about and support our student body—learn more about their learning, and provide the right kinds of guidance and support?”
INSIDE MACHINE LEARNING @ GEORGIA TECH
“At Georgia Tech, we recognize machine learning to be a game-changer not just in computer science, but in a broad range of scientific, engineering, and business disciplines and practices,” writes Irfan Essa, the inaugural director of the Center for Machine Learning at Georgia Tech (ML@GT), in his welcome note on the Center’s web page.
Launched in June 2016, ML@GT is an interdisciplinary research center that combines assets from the College of Computing, the H. Milton Stewart School of Industrial and Systems Engineering and the School of Electrical and Computer Engineering. Its faculty, students and industry partners are working on research and real-world applications of machine learning in a variety of areas, including machine vision, information security, healthcare, logistics and supply chain, finance and education, among others.
The center truly is a collaborative effort across campus, with 125 to 150 Tech faculty involved, and more than 400 students, says Sebastian Pokutta, David M. McKenney Family Associate Professor in the School of Industrial and Systems Engineering, and an associate director of ML@GT. “Tech has always had a lot of researchers working on machine learning, but they’d been spread out, working in different departments independently,” Pokutta says. “There wasn’t a real community on campus.”
Echoing Essa’s message, Pokutta says the goal of the Center is straightforward and daring: “We want to become the leader in bringing together computing, learning, data and engineering.”
True, there are other machine learning centers in higher ed—MIT, Columbia, Carnegie Mellon—but most focus on combining computing and statistics.
“One of the unique things about Georgia Tech, since we’re a big engineering school, is our machine learning effort is really closely embedded with our engineering units,” Essa says. “We’re close to the sensor, close to the processor, close to the actuator.”
This matters because of what is known as “edge computing”: the concept of moving applications, data and services to the logical extremes of a network, so that knowledge generation can occur at the point of action.
The objective is to use Tech’s engineering prowess—and data-driven techniques—to help design the next generation of technologies and methodologies.
MACHINE LEARNING'S IMPACT ON PRECISION MEDICINE
Healthcare offers a rich source of data to machine learning researchers. There are scanned and electronic health records, claims data, procedure results, lab tests, genetics studies, and even telemetry from devices like heart monitors and wearables like Fitbits and smart watches.
A number of Georgia Tech’s researchers are mining this data to better understand health outcomes at scale and, ultimately, to figure out the right treatment for each individual patient. This is known as individualized or precision medicine.
Jacob Eisenstein, an assistant professor in the School of Interactive Computing, and Jimeng Sun, an associate professor in the School of Computational Science and Engineering, are pursuing that goal by mining the text in electronic health records.
Today, patients and doctors try rounds of treatments for ailments, looking for the best fit. “There’s a lot of trial and error,” Eisenstein explains. The project hopes to reduce that, by systematizing treatment based on a deeper understanding of patients, treatments and outcomes.
Last year, Sun was part of a group of researchers who developed a new, accurate-but-interpretable approach for machine learning in medicine.
Their Reverse Time Attention model (RETAIN) achieves high accuracy while remaining clinically interpretable. It is based on a two-level neural attention model that detects influential past visits and significant clinical variables within those visits (e.g., key diagnoses). RETAIN was tested on a large health system dataset with 14 million visits completed by 263,000 patients over an eight-year period and demonstrated predictive accuracy and computational scalability comparable to state-of-the-art methods such as recurrent neural networks, and ease of interpretability comparable to traditional models (logistic regression).
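The two-level attention idea can be sketched in a few lines. In this simplified Python version, each past visit has an embedding vector; a softmax over per-visit scores gives visit-level weights (alpha), and tanh-squashed gates give variable-level weights (beta) for each dimension of each visit. The weighted embeddings are summed into a context vector that feeds a single logistic output. In RETAIN the scores and gates come from recurrent networks run over the visits in reverse time order; here they are supplied directly as hypothetical numbers, and all names are illustrative. Because the output is linear in the context vector, the prediction splits exactly into one additive contribution per visit, the property that makes this style of model interpretable.

```python
import math

# Simplified, illustrative sketch of RETAIN-style two-level attention.
# Assumption: visit scores and variable gates are given directly; in RETAIN
# they are produced by RNNs run over the visits in reverse time order.


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_level_attention(visits, visit_scores, variable_gates, out_weights, out_bias):
    """Return (probability, logit, visit weights, per-visit contributions)."""
    alphas = softmax(visit_scores)  # visit-level attention, sums to 1
    betas = [[math.tanh(g) for g in gates] for gates in variable_gates]  # variable-level
    dim = len(out_weights)

    # Context vector: c = sum over visits of alpha_j * (beta_j elementwise-times v_j)
    context = [0.0] * dim
    for alpha, beta, v in zip(alphas, betas, visits):
        for k in range(dim):
            context[k] += alpha * beta[k] * v[k]

    logit = sum(w * c for w, c in zip(out_weights, context)) + out_bias
    probability = 1.0 / (1.0 + math.exp(-logit))

    # The output is linear in the context vector, so the logit (minus the bias)
    # splits exactly into one additive contribution per visit.
    contributions = [
        alpha * sum(w * b * vk for w, b, vk in zip(out_weights, beta, v))
        for alpha, beta, v in zip(alphas, betas, visits)
    ]
    return probability, logit, alphas, contributions


# Hypothetical numbers: three past visits, each with a 2-D embedding.
visits = [[1.0, 0.0], [0.5, 1.0], [0.2, 0.3]]
visit_scores = [0.1, 2.0, -1.0]
variable_gates = [[0.5, 0.5], [1.0, -0.5], [0.2, 0.2]]
out_weights, out_bias = [1.5, -0.7], -0.2

prob, logit, alphas, contributions = two_level_attention(
    visits, visit_scores, variable_gates, out_weights, out_bias
)
print("visit weights:", [round(a, 3) for a in alphas])
print("per-visit contributions:", [round(c, 3) for c in contributions])
print("predicted probability:", round(prob, 3))
```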
In other work, Tech professors and students are analyzing data from Geisinger, a hospital network in Pennsylvania, to help predict the risk for sepsis and septic shock in patients before they are admitted to the hospital. Other researchers within the School of Industrial and Systems Engineering’s Health Analytics group are collecting health care utilization data involving millions of individuals for events such as hospitalizations that can be used in estimating the cost savings of preventive care.
Why Facebook and Amazon Want to “See” Your Images Better
Facebook’s interest in having machines better assess the billions of images uploaded to its platform—in order to describe, rank or even delete objectionable images—is obvious.
Georgia Tech faculty Dhruv Batra and Devi Parikh, a married couple who also work together, are assistant professors in the College of Computing’s School of Interactive Computing who are currently serving as visiting researchers at Facebook Artificial Intelligence Research (FAIR).
At Facebook, the duo is working on ways to improve how people interact with machines and with the images posted to the social network. In April 2016, Facebook began automatically describing the content of photos to blind and visually impaired users. Called “automatic alternative text,” the feature was created by Facebook’s accessibility team. The technology also works in versions of Facebook for countries with limited internet speeds or that don’t allow visual content.
And last December, Batra and Parikh also received Amazon Academic Research Awards for a pair of projects they are leading in computer vision and machine learning. They received $100,000 each from Amazon—$80,000 in gift money and $20,000 in Amazon Web Services credit—for projects that aim to produce the next generation of artificial intelligence agents.
Batra and Parikh are using giant image data sets with human annotations collected through Mechanical Turk, Amazon’s crowdsourcing internet marketplace.
One project, Visual Dialog, led by Batra, aims to create an AI agent able to hold a meaningful dialogue with humans, in natural, conversational language, about visual content. Facebook can already generate automatic alternative text for an image, Batra explains, so a user can be told, “This picture may contain a mug, a person, a cat.” The goal, he says, is to go much further: not only to offer more information about the image but also to engage the user in a dialogue.
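To make that starting point concrete, here is a minimal Python sketch of turning an image classifier’s output into a one-shot caption of the kind Batra describes. It assumes hypothetical (label, confidence) pairs and an arbitrary 0.8 confidence threshold; it is not Facebook’s implementation. Visual Dialog aims to go beyond exactly this kind of fixed, one-sentence description.

```python
def with_article(label):
    """Prefix a label with 'a' or 'an' (a rough vowel rule, fine for simple nouns)."""
    return ("an " if label[0].lower() in "aeiou" else "a ") + label


def alt_text(predictions, threshold=0.8, max_items=5):
    """Build a hedged caption from hypothetical (label, confidence) pairs.

    Keeps labels at or above the threshold, most confident first.
    """
    confident = [p for p in predictions if p[1] >= threshold]
    confident.sort(key=lambda p: p[1], reverse=True)
    labels = [with_article(label) for label, _ in confident[:max_items]]
    if not labels:
        return "No description available."
    return "This picture may contain " + ", ".join(labels) + "."


# Hypothetical classifier output for one photo.
predictions = [("cat", 0.90), ("mug", 0.97), ("sofa", 0.41), ("person", 0.93)]
print(alt_text(predictions))  # → This picture may contain a mug, a person, a cat.
```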
Training the machine learning algorithm for the task requires a huge data set—as many as 200,000 conversations on the same set of images, each conversation including 10 rounds of questions and answers (or roughly 2 million question-and-answer pairs).
Another project, titled “Counting Everyday Objects in Everyday Scenes,” is led by Parikh, and aims to enable an AI to count the number of objects belonging to the same category. One particularly interesting approach will try to estimate the counts of objects in one try by just glancing at the image as a whole. This is inspired by “subitizing”—an ability humans inherently possess to see a small number of objects and know how many there are without having to explicitly count.
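For contrast, here is explicit counting in its simplest form: a Python sketch that counts objects one by one by flood-filling connected regions of a binary mask. It assumes, hypothetically, that each object has already been segmented into a separate blob of 1s. Real scenes, with overlapping and partly hidden objects, rarely cooperate, which is part of the appeal of estimating a count from the image as a whole.

```python
from collections import deque


def count_objects(grid):
    """Count 4-connected groups of 1s in a binary grid (an explicit, one-by-one count)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not seen[r][c]:
                count += 1  # found a new object; flood-fill it so it is counted once
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


# Hypothetical segmentation mask containing three separate objects.
mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
print(count_objects(mask))  # → 3
```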