
Sony’s racing AI destroyed its human competitors by being nice (and fast)

“Wait, what? How?” Emily Jones wasn’t used to being left behind. A top sim-racing driver with multiple wins to her name, Jones jerked the steering wheel in the esports rig, eyes fixed on the screen in front of her: “I’m pushing way too hard to keep up— How does it do that?” Her staccato commentary intercut with squealing tires, Jones flung her virtual car around the virtual track at 120 miles per hour—then 140, 150—chasing the fastest Gran Turismo driver in the world.

Built by Sony AI, a research lab launched by the company in 2020, Gran Turismo Sophy is a computer program trained to control racing cars inside the world of Gran Turismo, a video game known for its super-realistic simulations of real vehicles and tracks. In a series of events held behind closed doors last year, Sony put its program up against the best humans on the professional sim-racing circuit. 

What they discovered during those racetrack battles—and the ones that followed—could help shape the future of machines that work alongside humans, or join us on the roads. 

Back in July 2021, Jones, who is based in Melbourne, Australia, and races for the esports team Trans Tasman Racing, didn’t know what to expect. “I wasn’t told much about it,” she says a year later. “‘Don’t do any practice,’ they said, ‘Don’t look at its lap times.’ I was like, it’s obviously going to be good if they’re keeping it secret from me.” In the end, GT Sophy beat Jones’ best lap by 1.5 seconds. At a level where records are smashed in millisecond increments, 1.5 seconds is an age.

But Sony soon learned that speed alone wasn’t enough to make GT Sophy a winner. The program outpaced all human drivers on an empty track, setting superhuman lap times across three different virtual courses. Yet when Sony tested GT Sophy in a race against multiple human drivers, where intelligence as well as speed is needed, GT Sophy lost. The program was at times too aggressive, racking up penalties for reckless driving, and too timid, giving way when it didn’t need to.

Sony regrouped, retrained its AI, and set up a rematch in October. This time GT Sophy won with ease. What made the difference? It’s true that Sony came back with a larger neural network, giving its program more capabilities to draw from on the fly. But, ultimately, the difference came down to giving GT Sophy something that Peter Wurman, head of Sony AI America, calls “etiquette”: the ability to balance its aggression and timidity, picking the most appropriate behavior for the situation at hand.

This is also what makes GT Sophy relevant beyond Gran Turismo. Etiquette between drivers on a track is a specific example of the kind of dynamic, context-aware behavior that robots will be expected to have when they interact with people, says Wurman.

An awareness of when to take risks and when to play it safe would be useful for AI that is better at interacting with people, whether it be on the manufacturing floor, in home robots or driverless cars. 

“I don’t think we’ve learned general principles yet, about how to deal with human norms that you have to respect,” says Wurman. “But it’s a start and hopefully gives us some insight into this problem in general.”

Game changer

GT Sophy is just the latest in a line of AI systems that have beaten the world’s best human players at various games, from chess and Go to video games like Starcraft and DOTA. But Gran Turismo offered Sony a new kind of challenge. Unlike those other games, especially those that are turn-based, excelling at Gran Turismo means controlling a vehicle at the limits of what’s physically possible, in real time, and in close proximity with other players all trying to do the same.

Cars hurtle around corners at more than 100 miles per hour with only inches between them. At those speeds, the smallest errors can lead to a crash. Gran Turismo captures real-world physics in extreme detail, simulating the aerodynamics of a car and the friction of its tires on the track. The game is sometimes used to train and recruit drivers for real-world racing.

“It does an excellent job with the realism,” says Davide Scaramuzza, who leads the robotics and perception group at the University of Zurich in Switzerland. Scaramuzza was not involved with GT Sophy, but his team has used Gran Turismo to train a previous AI driver—though not one that was ever tested against humans. 

GT Sophy doesn’t get the same view of the game that human players do. Instead of reading pixels off a screen, the program takes in updates about the position of its car on the track and the positions of the cars around it. It also gets sent information about the virtual physical forces acting on its vehicle. In response, GT Sophy tells the car to turn or brake. This back and forth between GT Sophy and the game happens 10 times a second, which Wurman and his colleagues claim matches the reaction time of human players.
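The exchange described above can be pictured as a fixed-rate control loop. What follows is a minimal sketch in Python, not Sony's implementation: the `Observation` and `Action` fields, the `toy_policy` rule and the 5-meter gap are all hypothetical stand-ins chosen for illustration. Only the rate of 10 exchanges per second comes from the reporting.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

TICKS_PER_SECOND = 10  # from the article: 10 exchanges per second


@dataclass
class Observation:
    """Hypothetical state update sent by the game on each tick."""
    own_position: Tuple[float, float]
    other_positions: List[Tuple[float, float]]
    forces: Tuple[float, float]  # stand-in for the physics data


@dataclass
class Action:
    """Hypothetical command sent back to the car."""
    steering: float  # -1.0 is full left, +1.0 is full right
    throttle: float  # -1.0 is full brake, +1.0 is full throttle


def toy_policy(obs: Observation) -> Action:
    """Placeholder rule, NOT a learned policy: brake if any car is within 5 m."""
    too_close = any(
        math.hypot(x - obs.own_position[0], y - obs.own_position[1]) < 5.0
        for x, y in obs.other_positions
    )
    return Action(steering=0.0, throttle=-1.0 if too_close else 1.0)


def run_episode(
    policy: Callable[[Observation], Action],
    observations: List[Observation],
) -> List[Tuple[float, Action]]:
    """Pair each observation with an action and the simulated time it occurs."""
    return [
        (tick / TICKS_PER_SECOND, policy(obs))
        for tick, obs in enumerate(observations)
    ]
```

In a trained system, the placeholder rule would be replaced by a neural network that maps each observation to steering and throttle values.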

Sony used reinforcement learning to train GT Sophy from scratch via trial and error. At first the AI struggled to keep a car on the road. But after training on 10 PlayStation 4s, each running 20 instances of the program, GT Sophy matched Gran Turismo’s built-in AI, which amateur players use for practice, in around eight hours. In 24 hours it was laying down lap times near the very top of an online leaderboard of 17,700 human players.

It took nine days before GT Sophy stopped shaving fractions of a second off its lap times. By then it was faster than any human.

Sony’s AI learned how to drive at the limits of what the game allowed, pulling off moves that human players can only gawk at. In particular, Jones was struck by the way GT Sophy took corners, braking early before accelerating out on a much tighter line than she was.

“It used the curve in a weird way, doing stuff that I just didn’t even think of,” she says. For example, GT Sophy often drops a wheel onto the grass at the edge of the track and then skids into turns. “You don’t want to do that because you’ll make a mistake, it’s like a controlled crash,” she says. “I could maybe do that one in a hundred times.”

GT Sophy was quick to master the game’s physics. The bigger problem was the referees. At a professional level, Gran Turismo races are watched by human judges, who can award penalty points for dangerous driving. Racking up penalties was a key reason for GT Sophy losing the first round of races last July, despite being faster than any of the human drivers. And learning to avoid them made all the difference in round two.

Tough but fair

Wurman has been working on GT Sophy for several years. There’s a painting of two cars jostling for position hanging on the wall behind his desk. “It’s a GT Sophy car passing Yamanaka,” says Wurman, referring to Tomoaki Yamanaka, one of the four Japanese professional sim-racing drivers who competed against GT Sophy last year.

Wurman can’t recall which race the painting is taken from. If it’s the October event, Yamanaka may well be having a great time, pushing himself against a tough but fair opponent. If it’s the July event, he’s probably cussing at the computer.  

Yamanaka’s teammate, Takuma Miyazono, tells me about that July race via a translator. “There were a few times where we were pushed off the track because of how aggressively it would go into the corners,” he says. “That threw us off. The human drivers had to hold back on the turns to avoid being run off the road.”

Training the AI to play fair without losing its competitive edge was hard, says Wurman. The human referees make subjective judgements that depend on context, making it difficult to turn them into simple dos and don’ts that the AI can learn from.  

The Sony researchers tried giving the AI lots of different cues, adjusting them as they went, hoping to find a mix that worked. They tried penalizing it if it went off the track or bumped into a wall. They penalized it for crashes it caused, and for crashes where a referee’s call might go either way. They experimented with different-sized penalties for each and checked how it changed GT Sophy’s driving.
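In reinforcement-learning terms, this kind of tuning is often called reward shaping. The sketch below shows the general idea in Python, assuming a reward based on distance covered minus weighted penalties. The names `PenaltyWeights`, `StepEvents` and `shaped_reward`, and every numeric weight, are hypothetical. Sony's actual reward terms and values are not given here.

```python
from dataclasses import dataclass


@dataclass
class PenaltyWeights:
    """Hypothetical penalty sizes; these are the knobs a researcher would tune."""
    off_track: float = 1.0
    wall_contact: float = 1.0
    caused_collision: float = 5.0
    ambiguous_collision: float = 2.0


@dataclass
class StepEvents:
    """What happened during one time step of a simulated race (illustrative)."""
    progress_m: float  # distance gained along the track, in meters
    off_track: bool = False
    wall_contact: bool = False
    caused_collision: bool = False
    ambiguous_collision: bool = False  # a crash a referee could call either way


def shaped_reward(events: StepEvents, weights: PenaltyWeights) -> float:
    """Reward progress, then subtract a weighted penalty for each infraction."""
    reward = events.progress_m
    if events.off_track:
        reward -= weights.off_track
    if events.wall_contact:
        reward -= weights.wall_contact
    if events.caused_collision:
        reward -= weights.caused_collision
    if events.ambiguous_collision:
        reward -= weights.ambiguous_collision
    return reward
```

Raising a weight such as `caused_collision` would push a learning agent toward caution, while lowering it would allow more aggressive moves. Finding values that produce neither extreme is the balancing act described here.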

Sony also upped the competition GT Sophy faced in its training. Before, it had mostly trained against previous versions of itself. Leading into the October rematch, Sony tested its AI every week or two against top drivers, tweaking it constantly. “That gave us the kind of feedback we needed to find the right balance between aggression and timidity,” Wurman says.

It worked. When Miyazono went up against GT Sophy three months later, the aggression was gone—but nor was the AI simply backing down. “When you go into a corner with two cars side by side, it leaves just enough space for your car to go through,” he says. “It really does feel like you’re racing with another person.”

“You get a different sort of passion and fun from driving against something that reacts that way,” says Miyazono. “That was something that really left a big impression on my mind.”

Scaramuzza is impressed with Sony’s work. “We measure the progress of robotics against what humans can do,” he says. But Elia Kaufman, who works with Scaramuzza at the University of Zurich, points out that it is still human researchers who choose which of GT Sophy’s learned behaviors to bake in during training. “They’re the ones who judge what is good racing etiquette or not,” he says. “It would be really interesting if that could be done in an automated way.” Such a machine would not only have good manners but could recognise what good manners were, and be able to adapt its behavior to new settings.

Scaramuzza’s team is now applying its Gran Turismo research to real-world drone racing, training an AI to fly using raw video input instead of data from a simulation. Last month they invited two world-champion drone racers to take on the computer. No prizes for guessing who won. “It was very interesting to look at their faces after they saw our AI racing,” says Scaramuzza. “They were mind-blown.” 

Scaramuzza thinks that making the jump to the real world is essential for true progress in robotics. “There will always be a mismatch between simulation and the real world,” he says. “This is something that gets forgotten when people talk about AI making incredible progress. In terms of strategy, yes. In terms of real-world deployment, we are definitely not there yet.”

For now, Sony is sticking to games. It plans to put GT Sophy in a future version of Gran Turismo. “We’d like this to become part of the product,” says Peter Stone, executive director of Sony AI America. “Sony’s an entertainment company and we want this to make the game more entertaining.”

Jones thinks the sim-racing community could learn a lot from GT Sophy once more people get a chance to see it drive. “There will be tracks where we’re like, hang on a second, we’ve been doing this for years but there’s actually a faster way of doing it.” Miyazono says he’s already tried to copy some of the lines the AI takes around corners, now that it has shown him they can be done.

“If the benchmark changes, everybody rises up as well,” says Jones.
