Saya the Virtual Human, a CG high school girl, was created by marrying creativity with artificial intelligence (AI) technology. Drawn in 3D computer graphics, with clever use of facial expression recognition and line-of-sight recognition AI, the schoolgirl looks real enough to be a photograph. She made her sensational debut at the SXSW Trade Show in Austin, USA, in March 2018, where she delighted large numbers of visitors. We spoke with technical director Genta Nakahara, the team’s technical driving force, about what went on behind the scenes of Saya’s development.
I am currently in Hakuhodo’s Business Incubation Division, involved not just in mass-media advertising but also in work that challenges new business models—work that cannot be talked about in the context of traditional advertising. I majored in mathematics at university, and before coming to Hakuhodo I did pricing work using programming-based approximate calculation at a financial engineering firm. When I joined Hakuhodo in 2012, I came into contact with marketing methods using statistics for the first time, and found them fresh and interesting.
One day, in my private time, I went to 2045, a club event organized by Daito Manabe of Rhizomatiks and Nao Tokui of Qosmo, and was blown away. At the event, participants’ playlists were synced, the song genres analyzed, and AI given the task of selecting tunes in an experimental initiative where AI and a DJ created music alternately. In other words, I saw for myself the expansion of the use of AI and math into the creative realm. My interest in the use of AI in creative really picked up after that.
I was fortunate enough to then get to know Mr. Tokui of Qosmo and learn about AI from him individually, and I attended internal presentations given by Rhizomatiks people and learned about a live performance method using openFrameworks. I was hugely impressed that realistic expressions that would take days to produce with CG were now possible in real time. The hurdles to learning visual coding and AI models are probably slightly lower for those of us who’ve studied math than for others. I took to openFrameworks really quickly, and started experimenting with it on my own for all sorts of things, as if it were a hobby. I used programming-based output to put forward issues at in-house workshops, and created music videos for my musician friends.
Long story short [laughs], my first contact with the Saya team was in May 2017, when the Hakuhodo producer for Saya, who had heard about my activities, asked me to give a lecture to the team, as it didn’t have anyone versed in AI on it.
Using AI in the Saya Project came out of an idea from the project’s creative director, Takayuki Niizawa of TBWA\HAKUHODO. The first time he saw Saya, Niizawa-san immediately thought, “It would be a shame to just leave her as a simple pretty-girl CG character.” Saya had a kind of symbolism that could allow her to go beyond characterdom and become an icon. He thought that we could launch her as a symbol of the cutting edge of today’s technology. So he proposed to TELYUKA, the husband-and-wife unit made up of Saya’s “parents,” CG artists Teruyuki and Yuka Ishikawa, the idea of breathing life into Saya by integrating various technologies into her. His concept would also see Saya evolve as a character by spreading awareness, making her “content that grows.” The idea of creating something new by integrating technology found favor with TELYUKA, and the project began.
When I joined the Saya Team, they had pretty much decided to use AI in the project this way and were at the point of considering what to start with and how to proceed. So I suggested asking Tokui-san of Qosmo, who specializes in using AI in creative and has put out numerous media art works using AI, to collaborate. It was at that time that my participation on the team was decided, too.
The first thing that came to mind when we thought about using AI was language. In other words, to make Saya talk. But language is actually [laughs] really hard, and synchronizing lip movements and sound is exceptionally tricky. So we talked about whether another way of surprising people might be better. Then we hit on the idea of using facial recognition technology to have Saya communicate with emotions. With super-high-resolution visuals, it takes hours of rendering to create one second of material. So having Saya respond instantly and continuously to the expressions and movements of someone watching her—something that should take so long—would look like magic to someone with a level of understanding of visual technology, and for someone who didn’t understand the technology, seeing a high-res visual respond interactively in real time would be a surprise they’d never experienced before. We thought we would be able to make something interesting for everyone, whether they were tech literate or not.
What we did specifically when we exhibited Saya at the SXSW Trade Show was to categorize users’ faces into around seven expressions—say, 70% happy, 20% worried, 10% surprised—and plot them so the total was 100%.
It works by analyzing images of users’ faces, filmed in real time, and when each expression exceeds a certain threshold, the image in front of them changes. For the user, Saya instantly and naturally responded to their expressions by, for instance, becoming bashful or averting her gaze. Users stood in front of the 94” monitor in which Saya appears, but there was no separate booth or the like. Saya is set up to only recognize the face of the person standing in front of her, no matter how crowded it gets. We were able to do this thanks to a huge breakthrough we had in development. We didn’t have enough time to do it, but we wanted to create the experience of CG and human meeting eyes. Just between you and me [laughs], behind the scenes we were sneakily readying analytical AI that would track users’ lines of sight with a high degree of accuracy. I look forward to the day Saya looks into my eyes.
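The mechanism described above—normalizing per-expression scores to a 100% distribution, then triggering a reaction when a share crosses a threshold—can be sketched roughly as follows. This is purely illustrative: the function names, the seven categories, and the threshold values are assumptions, not the actual Saya pipeline, which has not been made public.

```python
# Illustrative sketch of the threshold-triggered reaction loop.
# All names and numbers here are hypothetical.

def normalize(scores):
    """Scale raw per-expression scores so their shares sum to 1.0 (100%)."""
    total = sum(scores.values())
    return {name: value / total for name, value in scores.items()}

def pick_reaction(scores, thresholds):
    """Return the strongest expression whose share exceeds its threshold,
    or None if no expression is confident enough to trigger a reaction."""
    shares = normalize(scores)
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        # Expressions with no configured threshold never trigger (default 1.0).
        if share >= thresholds.get(name, 1.0):
            return name
    return None

# Example frame from the article: 70% happy, 20% worried, 10% surprised.
raw = {"happy": 7.0, "worried": 2.0, "surprised": 1.0}
print(pick_reaction(raw, {"happy": 0.5, "worried": 0.4, "surprised": 0.3}))
# prints "happy" -> Saya could then play, say, a bashful reaction clip
```

In a real installation the `raw` scores would come from a facial-expression recognition model run on each camera frame, and the chosen expression would select which pre-rendered or real-time reaction Saya shows.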
I can’t go into too much detail, but the interactivity of Saya is the result of combining several cutting-edge technologies, more than a few little tricks, and clever use of computing power.
We set the goal of exhibiting at SXSW in the second half of 2017. From that point, we started full-scale development, culminating in the official launch at SXSW about half a year later. At SXSW, the techie types wanted to know how it worked, the CG whizzes were surprised at how realistic and life-like Saya’s expressions were, and kids and others just came to be entertained. They all enjoyed watching Saya respond to their expressions in real time. As we had hoped, we were able to get a good response from everyone, without restricting our customers. It was just great to have it work out so well.
We are using the feedback we received from the many people who experienced Saya at SXSW in her ongoing development. Many of them waved to Saya, so to enable her to respond to gestures from users, we later updated her with skeleton recognition AI so she can interact by waving back, etc.
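One simple way such gesture recognition can work, assuming keypoints from a pose-estimation (skeleton recognition) model, is a heuristic like the sketch below: a wave is a hand raised above the shoulder whose horizontal position oscillates across recent frames. The keypoint names and the heuristic itself are my illustrative assumptions, not the team's actual implementation.

```python
# Hypothetical wave-detection heuristic over skeleton keypoints.
# keypoints: {"right_wrist": (x, y), "right_shoulder": (x, y), ...}
# history: a list of recent wrist x-positions, kept by the caller.

def is_waving(keypoints, history):
    """Detect a wave: hand raised above the shoulder, with side-to-side
    motion (direction changes) over the last few frames."""
    wrist_x, wrist_y = keypoints["right_wrist"]
    _, shoulder_y = keypoints["right_shoulder"]
    if wrist_y >= shoulder_y:   # image y grows downward: hand is not raised
        return False
    history.append(wrist_x)
    if len(history) < 5:        # not enough motion data yet
        return False
    # Horizontal movement between the last five frames.
    recent = history[-5:]
    deltas = [b - a for a, b in zip(recent, recent[1:])]
    # A wave reverses direction repeatedly: count sign flips.
    flips = sum(1 for d1, d2 in zip(deltas, deltas[1:]) if d1 * d2 < 0)
    return flips >= 2
```

A production system would instead feed keypoint sequences to a trained gesture classifier, but the hand-rolled version shows the shape of the problem: recognizing a gesture means recognizing a pattern over time, not in a single frame.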
It is a dream of the production side to have Saya become a symbol of diversity that can be useful to everyone’s communication, whether linguistic or non-linguistic. There is still a great deal of growth potential for use of AI in the non-linguistic communication sphere, and we look forward to taking on new challenges. We’re still working on it, so I dare not go into details [laughs], but if all goes well, what we have could become a new support for communication. I feel that we will be able to make social contributions using the power of AI in the future.
In terms of technology, the Saya Project was almost completely produced in-house, and I think it was an unusual and totally new challenge for an advertising agency. It was thanks to the talented members of the team, and it wouldn’t be an exaggeration to say that one of the main reasons it is doing so well now is the successful recruiting of the tech members. AI also holds massive potential for advertising creative. Today’s AI mechanisms are extremely close to the human brain. Take learning mechanisms, for instance. Just as a baby learns “Mama” by seeing its mother’s joyful reaction when it happens to say “Mama” and realizing that “Mama” is right, AI learns language with data sets that are tagged as correct. It’s the same with distinguishing. Humans and AI both make judgements about something they see for the first time by extracting feature values from large volumes of data. Using such features of AI could be useful for new advertising creative, I think, by appropriately triggering next actions according to user input. The work I do is probably still unusual for an advertising agency, but like paving a gravel road that no one was interested in, I hope that I can increase the options for those who follow in some small way. I would be happy if through the Saya project I can help everyone understand the new potential for advertising.