Speech Recognition

Where is voice tech going?

Mark Persaud
Contributor

Mark Persaud is digital product manager and practice lead at Moonshot by Pactera, a digital innovation company that leads global clients through the next era of digital products with a heavy emphasis on artificial intelligence, data and continuous software delivery.

2020 has been anything but normal. For businesses and brands. For innovation. For people.

The trajectories of business growth strategies, travel plans and lives have been drastically altered by the COVID-19 pandemic, a global economic downturn with supply chain and market issues, and the fight for equality driven by the Black Lives Matter movement — on top of everything that already complicated lives and businesses.

One of the biggest stories in emerging technology is the growth of different types of voice assistants:

  • Niche assistants such as Aider that provide back-office support.
  • Branded in-house assistants such as those offered by BBC and Snapchat.
  • White-label solutions such as Houndify that provide lots of capabilities and configurable tool sets.

With so many assistants proliferating globally, voice will become a commodity like a website or an app. And that’s not a bad thing — at least in the name of progress. It will soon (read: over the next couple years) become table stakes for a business to have voice as an interaction channel for a lovable experience that users expect. Consider that feeling you get when you realize a business doesn’t have a website: It makes you question its validity and reputation for quality. Voice isn’t quite there yet, but it’s moving in that direction.

Voice assistant adoption and usage are still on the rise

Adoption of any new technology is key, and distribution is often the inhibitor — but that has not been the case with voice. Apple, Google and Baidu have reported hundreds of millions of devices using voice, and Amazon has 200 million users. Amazon has a slightly more difficult job since it isn’t in the smartphone market, which gives Apple and Google greater distribution for their voice assistants.

But are people using the devices? Google said recently there are 500 million monthly active users of Google Assistant. Not far behind is Apple, with 375 million active users. Large numbers of people are using voice assistants, not just owning them. That’s a sign of technology gaining momentum — the technology is at a price point and within digital and personal ecosystems that make it right for user adoption. The pandemic has only accelerated that use: Edison reported increased usage between March and April — a peak time for sheltering in place across the U.S.

Week-in-Review: Alexa’s indefinite memory and NASA’s otherworldly plans for GPS

Hello, weekenders. This is Week-in-Review, where I give a heavy amount of analysis and/or rambling thoughts on one story while scouring the rest of the hundreds of stories that emerged on TechCrunch this week to surface my favorites for your reading pleasure.

Last week, I talked about the cult of Ive and the degradation of Apple design. On Sunday night, The Wall Street Journal published a report on how Ive had been moving away from the company, to the dismay of many on the design team. Tim Cook didn’t like the report very much. Our EIC gave a little breakdown on the whole saga in a nice piece.

Apple sans Ive


The big story

This week was a tad restrained in its eventfulness; seems like the newsmakers went on 4th of July vacations a little early. Amazon made a bit of news this week when the company confirmed that Alexa request logs are kept indefinitely.

Last week, an Amazon public policy exec answered some questions about Alexa in a letter sent to U.S. Senator Coons. His office published the letter on its site a few days ago and most of the details aren’t all that surprising, but the first answer really sets the tone for how Amazon sees Alexa activity:

Q: How long does Amazon store the transcripts of user voice recordings?

A: We retain customers’ voice recordings and transcripts until the customer chooses to delete them.

What’s interesting about this isn’t that we’re only now getting this level of straightforward dialogue from Amazon on how long data is kept if not specifically deleted, but it makes one wonder why it is useful or feasible for them to keep it indefinitely. (This assumes that they actually are keeping it indefinitely; it seems likely that most of it isn’t, and that by saying this they’re protecting themselves legally, but I’m just going off the letter.)

After several years of “Hey Alexa,” the company doesn’t seem all that close to figuring out what it is.

Alexa seems to be a shit solution for commerce, so why does Amazon have 10,000 people working on it, according to a report this week in The Information? All signs point to the voice assistant experiment falling short of its short-term ambitions, though advances in AI will keep pushing its utility.

Training data is a big deal for AI teams looking to train models on data sets of relevant information. The company seems to say as much: “Our speech recognition and natural language understanding systems use machine learning to adapt to customers’ speech patterns and vocabulary, informed by the way customers use Alexa in the real world. To work well, machine learning systems need to be trained using real world data.”

The company says it doesn’t anonymize any of this data because it has to stay associated with a user’s account in order for them to delete it. I’d feel a lot better if Amazon just effectively anonymized the data in the first place and used on-device processing to build a profile on my voice. What I’m more afraid of is Amazon having such a detailed voiceprint of everyone who has ever used an Alexa device.

If effortless voice-based e-commerce isn’t really the product anymore, what is? The answer is always us, but I don’t like the idea of indefinitely leaving Amazon with my data until they figure out the answer.

Send me feedback on Twitter @lucasmtny or email lucas@techcrunch.com

On to the rest of the week’s news.

Trends of the week

Here are a few big news items from big companies, with green links to all the sweet, sweet added context:

  • NASA’s GPS moonshot
    The U.S. government really did us a solid inventing GPS, but NASA has some bigger ideas on the table for the positioning platform, namely, taking it to the Moon. It might be a little complicated, but, unsurprisingly, scientists have some ideas here. Read more.
  • Apple has your eyes
    Most of the iOS beta updates are bug fixes, but the latest change to iOS 13 brought a very strange surprise: changing the way the eyes of users on iPhone XS or XS Max look to people on the other end of the call. Instead of appearing to look below the camera, some software wizardry will now make it look like you’re staring directly at the camera. Apple hasn’t detailed how this works, but here’s what we do know.
  • Trump is having a Twitter party
    Donald Trump’s administration declared a couple of months ago that it was launching an exploratory survey to try to gain a sense of conservative voices that had been silenced on social media. Now @realdonaldtrump is having a get-together and inviting his friends to chat about the issue. It’s a real who’s who; check out some of the people attending here.

GAFA Gaffes

How did the top tech companies screw up this week? This clearly needs its own section, in order of badness:

  1. Amazon is responsible for what it sells:
    [Appeals court rules Amazon can be held liable for third-party products]
  2. Android co-creator gets additional allegations filed:
    [Newly unsealed court documents reveal additional allegations against Andy Rubin]

Extra Crunch

Our premium subscription service had another week of interesting deep dives. TechCrunch reporter Kate Clark did a great interview with the ex-Facebook, ex-Venmo founding team behind Fin about how they’re thinking about the consumerization of the enterprise.

Sam Lessin and Andrew Kortina on their voice assistant’s workplace pivot

“…The thing is, developing an AI assistant capable of booking flights, arranging trips, teaching users how to play poker, identifying places to purchase specific items for a birthday party and answering wide-ranging zany questions like “can you look up a place where I can milk a goat?” requires a whole lot more human power than one might think. Capital-intensive and hard-to-scale, an app for “instantly offloading” chores wasn’t the best business. Neither Lessin nor Kortina will admit to failure, but Fin’s excursion into B2B enterprise software eight months ago suggests the assistant technology wasn’t a billion-dollar idea.…”

Here are some of our other top reads this week for premium subscribers. This week, we talked a bit about asking for money and the future of China’s favorite tech platform.

Want more TechCrunch newsletters? Sign up here.

Apple’s Voice Control improves accessibility OS-wide on all its devices

Apple is known for fluid, intuitive user interfaces, but none of that matters if you can’t click, tap, or drag because you don’t have a finger to do so with. For users with disabilities, the company is doubling down on voice-based accessibility with the powerful new Voice Control feature on Macs and iOS (and iPadOS) devices.

Many devices already support rich dictation, and of course Apple’s phones and computers have used voice-based commands for years (I remember talking to my Quadra). But this is a big step forward that makes voice controls close to universal — and it all works offline.

The basic idea of Voice Control is that the user has both set commands and context-specific ones. Set commands are things like “Open Garage Band” or “File menu” or “Tap send.” And of course some intelligence has gone into making sure you’re actually saying the command and not writing it, like in that last sentence.

But that doesn’t work when you have an interface that pops up with lots of different buttons, fields, and labels. And even if every button or menu item could be called by name, it might be difficult or time-consuming to speak everything out loud.

To fix this, Apple simply attaches a number to every UI item in the foreground, which a user can reveal by saying “show numbers.” Then they can simply speak the number or modify it with another command, like “tap 22.”
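
To make the mechanics concrete, here’s a minimal sketch of how a numbered-overlay dispatcher could work. This is purely illustrative: Apple hasn’t published Voice Control’s internals, and every name in it is hypothetical.

```python
import re

# Hypothetical sketch of a "show numbers"-style dispatcher: number the
# interactive elements in the foreground and route spoken commands
# like "tap 2" to the matching element. Anything that doesn't match
# falls through to named commands or dictation.

class NumberedOverlay:
    def __init__(self):
        self.numbered = {}  # overlay number -> UI element label

    def show_numbers(self, foreground_elements):
        self.numbered = {i + 1: el for i, el in enumerate(foreground_elements)}
        for n, el in self.numbered.items():
            print(f"[{n}] {el}")  # rendered as on-screen badges

    def handle(self, utterance):
        m = re.match(r"(?:tap|click|press)\s+(\d+)$", utterance.lower().strip())
        if m and int(m.group(1)) in self.numbered:
            print(f"-> tapping {self.numbered[int(m.group(1))]!r}")
        else:
            print(f"-> named command or dictation: {utterance!r}")

overlay = NumberedOverlay()
overlay.show_numbers(["Send", "Attach", "File menu"])
overlay.handle("tap 2")            # -> tapping 'Attach'
overlay.handle("Open GarageBand")  # -> named command or dictation
```

The grid selection for maps could work the same way, with grid cells standing in for UI elements.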

Remember that these numbers may be more easily referenced by someone with little or no vocal ability, and could in fact be selected using a simpler input like a dial or blow tube. Gaze tracking is good, but it has its limitations, and this is a good alternative.

For something like maps, where you could click anywhere, there’s a grid system for selecting where to zoom in or click. Just like Blade Runner! Other gestures like scrolling and dragging are likewise supported.

Dictation has been around for a bit but it’s been improved as well. You can select and replace entire phrases, like “Replace ‘be right back’ with ‘on my way.’ ” Other little improvements will be noted and appreciated by those who use the tool often.

All the voice processing is done offline, which makes it both quick and robust to things like signal problems or use in foreign countries where data might be hard to come by. And the intelligence built into Siri lets it recognize names and context-specific words that may not be part of the base vocabulary. Improved dictation means selecting emoji and adding dictionary items is a breeze.

Right now Voice Control is supported by all native apps, and third-party apps that use Apple’s accessibility API should be able to take advantage of it easily. And even if they don’t do so specifically, numbers and grids should still work just fine, since all the OS needs to know are the locations of the UI items. These improvements should appear in accessibility options as soon as a device is updated to iOS 13 or Catalina.

Xprize names two grand prize winners in $15 million Global Learning Challenge

Xprize, the nonprofit organization developing and managing competitions to find solutions to social challenges, has named two grand prize winners in the Elon Musk-backed Global Learning Xprize.

The companies, Kitkit School out of South Korea and the U.S., and onebillion, operating in Kenya and the U.K., were announced at an awards ceremony hosted at the Google Spruce Goose Hangar in Playa Vista, Calif.

Xprize set each of the competing teams the task of developing scalable services that could enable children to teach themselves basic reading, writing and arithmetic skills within 15 months.

Musk himself was on hand to award $5 million checks to each of the winning teams.

Five finalists each received $1 million to continue developing their projects:

  • New York-based CCI, which developed lesson plans and a development language so non-coders could create lessons.
  • Chimple, a Bangalore-based learning platform enabling children to learn reading, writing and math on a tablet.
  • RobotTutor, a Pittsburgh-based company that used Carnegie Mellon research to develop an app for Android tablets teaching reading and writing through speech recognition, machine learning and human-computer interaction.
  • The two grand prize winners, Kitkit School and onebillion.

The tests required each product to be field-tested in Swahili, reaching nearly 3,000 children in 170 villages across Tanzania.

All of the final solutions from each of the five teams that made it to the final round of competition have been open-sourced so anyone can improve on and develop local solutions using the toolkits developed by each team in competition.

Kitkit School, with a team from Berkeley, Calif. and Seoul, developed a program with a game-based core and flexible learning architecture to help kids learn independently, while onebillion merged numeracy content with literacy material to provide directed learning and activities alongside monitoring to personalize responses to children’s needs.

Both teams are going home with $5 million to continue their work.

Access to basic education is a global problem: more than 250 million children around the world can’t read or write, and one in five children aren’t in school, according to data from UNESCO.

The problem of access is compounded by a shortage of teachers at the primary and secondary school levels. Some research, cited by Xprize, indicates that the world needs to recruit another 68.8 million teachers to provide every child with a primary and secondary education by 2040.

Before the Global Learning Xprize field test, 74% of the children who participated were reported as never having attended school; 80% were never read to at home; and 90% couldn’t read a single word of Swahili.

After the 15-month program, in which children worked on donated Google Pixel C tablets pre-loaded with the software, those numbers were cut in half.

“Education is a fundamental human right, and we are so proud of all the teams and their dedication and hard work to ensure every single child has the opportunity to take learning into their own hands,” said Anousheh Ansari, CEO of Xprize, in a statement. “Learning how to read, write and demonstrate basic math are essential building blocks for those who want to live free from poverty and its limitations, and we believe that this competition clearly demonstrated the accelerated learning made possible through the educational applications developed by our teams, and ultimately hope that this movement spurs a revolution in education, worldwide.”

After the grand prize announcement, Xprize said it will work to secure and load the software onto tablets; localize the software; and deliver preloaded hardware and charging stations to remote locations so all finalist teams can scale their learning software across the world.

Alexa, does the Echo Dot Kids protect children’s privacy?

A coalition of child protection and privacy groups has filed a complaint with the Federal Trade Commission (FTC) urging it to investigate a kid-focused edition of Amazon’s Echo smart speaker.

The complaint against Amazon Echo Dot Kids, which has been lodged with the FTC by groups including the Campaign for a Commercial-Free Childhood, the Center for Digital Democracy and the Consumer Federation of America, argues that the e-commerce giant is violating the Children’s Online Privacy Protection Act (COPPA) — including by failing to obtain proper consents for the use of kids’ data.

As with its other smart speaker Echo devices, the Echo Dot Kids continually listens for a wake word and then responds to voice commands by recording and processing users’ speech. The difference with this Echo is it’s intended for children to use — which makes it subject to U.S. privacy regulation intended to protect kids from commercial exploitation online.

The complaint, which can be read in full via the group’s complaint website, argues that Amazon fails to provide adequate information to parents about what personal data will be collected from their children when they use the Echo Dot Kids; how their information will be used; and which third parties it will be shared with — meaning parents do not have enough information to make an informed decision about whether to give consent for their child’s data to be processed.

They also accuse Amazon of providing at best “unclear and confusing” information per its obligation under COPPA to also provide notice to parents to obtain consent for children’s information to be collected by third parties via the online service — such as those providing Alexa “skills” (aka apps the AI can interact with to expand its utility).

A number of other concerns about Amazon’s device are also being raised with the FTC.

Amazon released the Echo Dot Kids a year ago — and, as we noted at the time, it’s essentially a brightly bumpered iteration of the company’s standard Echo Dot hardware.

There are differences in the software, though. In parallel, Amazon updated its Alexa smart assistant — adding parental controls, aka its FreeTime software, to the child-focused smart speaker.

Amazon said the free version of FreeTime that comes bundled with the Echo Dot Kids provides parents with controls to manage their kids’ use of the product, including device time limits; parental controls over skills and services; and the ability to view kids’ activity via a parental dashboard in the app. The software also removes the ability for Alexa to be used to make phone calls outside the home (while keeping an intercom functionality).

A paid premium tier of FreeTime (called FreeTime Unlimited) also bundles additional kid-friendly content, including Audible books, ad-free radio stations from iHeartRadio Family and premium skills and stories from the likes of Disney, National Geographic and Nickelodeon.

At the time it announced the Echo Dot Kids, Amazon said it had tweaked its voice assistant to support kid-focused interactions — saying it had trained the AI to understand children’s questions and speech patterns, and incorporated new answers targeted specifically at kids (such as jokes).

But while the company was ploughing resource into adding a parental control layer to Echo and making Alexa’s speech recognition kid-friendly, the COPPA complaint argues it failed to pay enough attention to the data protection and privacy obligations that apply to products targeted at children — as the Echo Dot Kids clearly is.

Or, to put it another way, Amazon offers parents some controls over how their children can interact with the product — but not enough controls over how Amazon (and others) can interact with their children’s data via the same always-on microphone.

More specifically, the group argues that Amazon is failing to meet its obligation as the operator of a child-directed service to provide notice and obtain consent for third parties operating on the Alexa platform to use children’s data — noting that its Children’s Privacy Disclosure policy states it does not apply to third-party services and skills.

Instead, the complaint says Amazon tells parents they should review the skill’s policies concerning data collection and use. “Our investigation found that only about 15% of kid skills provide a link to a privacy policy. Thus, Amazon’s notice to parents regarding data collection by third parties appears designed to discourage parental engagement and avoid Amazon’s responsibilities under Coppa,” the group writes in a summary of their complaint.

They are also objecting to how Amazon is obtaining parental consent — arguing its system for doing so is inadequate because it merely asks that a credit or debit/gift card number be entered.

“It does not verify that the person ‘consenting’ is the child’s parent as required by Coppa,” they argue. “Nor does Amazon verify that the person consenting is even an adult because it allows the use of debit gift cards and does not require a financial transaction for verification.”

Another objection is that Amazon is retaining audio recordings of children’s voices far longer than necessary — keeping them indefinitely unless a parent actively goes in and deletes the recordings, despite COPPA requiring that children’s data be held for no longer than is reasonably necessary.

They found that additional data (such as transcripts of audio recordings) was also still retained even after audio recordings had been deleted. A parent must contact Amazon customer service to explicitly request deletion of their child’s entire profile to remove that data residue — which also nixes their access to parental controls and their kids’ access to content provided via FreeTime. The complaint therefore argues that Amazon’s process for parents to delete children’s information is “unduly burdensome” too.

Their investigation also found the company’s process for letting parents review children’s information to be similarly arduous, with no ability for parents to search the collected data — meaning they have to listen/read every recording of their child to understand what has been stored.

They further highlight that children’s audio recordings on the Echo Dot Kids can of course include sensitive personal details — such as if a child uses Alexa’s “remember” feature to ask the AI to remember personal data such as their address and contact details, or personal health information like a food allergy.

The group’s complaint also flags the risk of other children having their data collected and processed by Amazon without their parents’ consent — such as when a child has a friend or family member visiting on a play date and they end up playing with the Echo together.

Responding to the complaint, Amazon has denied it is in breach of COPPA. In a statement, a company spokesperson said: “FreeTime on Alexa and Echo Dot Kids Edition are compliant with the Children’s Online Privacy Protection Act (COPPA). Customers can find more information on Alexa and overall privacy practices here: https://www.amazon.com/alexa/voice.”

An Amazon spokesperson also told us it only allows kid skills to collect personal information from children outside of FreeTime Unlimited (i.e. the paid tier) — and then only if the skill has a privacy policy and the developer separately obtains verified consent from the parent, adding that most kid skills do not have a privacy policy because they do not collect any personal information.

At the time of writing, the FTC had not responded to a request for comment on the complaint.

In Europe, there has been growing concern over the use of children’s data by online services. A report by England’s children’s commissioner late last year warned kids are being “datafied,” and suggested profiling at such an early age could lead to a data-disadvantaged generation.

Responding to rising concerns the U.K. privacy regulator launched a consultation on a draft Code of Practice for age appropriate design last month, asking for feedback on 16 proposed standards online services must meet to protect children’s privacy — including requiring that product makers put the best interests of the child at the fore, deliver transparent T&Cs, minimize data use and set high privacy defaults.

The U.K. government has also recently published a whitepaper setting out a policy plan to regulate internet content that has a heavy focus on child safety.

Live transcription and captioning in Android are a boon to the hearing-impaired

A set of new features for Android could alleviate some of the difficulties of living with hearing impairment and other conditions. Live transcription, captioning and relay use speech recognition and synthesis to make content on your phone more accessible — in real time.

Announced today at Google’s I/O event in a surprisingly long segment on accessibility, the features all rely on improved speech-to-text and text-to-speech algorithms, some of which now run on-device rather than sending audio to a data center to be decoded.

The first feature to be highlighted, live transcription, was already mentioned by Google. It’s a simple but very useful tool: open the app and the device will listen to its surroundings and simply display as text on the screen any speech it recognizes.

We’ve seen this in translator apps and devices, like the One Mini, and the meeting transcription highlighted yesterday at Microsoft Build. One would think that such a straightforward tool is long overdue, but, in fact, everyday circumstances like talking to a couple of friends at a cafe can be remarkably difficult for natural language systems trained on perfectly recorded single-speaker audio. Improving the system to the point where it can track multiple speakers and display accurate transcripts quickly has no doubt been a challenge.

Another feature enabled by this improved speech recognition ability is live captioning, which essentially does the same thing as above, but for video. Now when you watch a YouTube video, listen to a voice message or even take a video call, you’ll be able to see what the person in it is saying, in real time.

That should prove incredibly useful not just for the millions of people who can’t hear what’s being said, but also those who don’t speak the language well and could use text support, or anyone watching a show on mute when they’re supposed to be going to sleep, or any number of other circumstances where hearing and understanding speech just isn’t the best option.

[Gif: a phone conversation being captioned live.]

Captioning phone calls is something CEO Sundar Pichai said is still under development, but the “live relay” feature they demoed onstage showed how it might work. A person who is hearing-impaired or can’t speak will certainly find an ordinary phone call to be pretty worthless. But live relay turns the call immediately into text, and immediately turns text responses into speech the person on the line can hear.
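
As a thought experiment, the relay loop might look something like this sketch, where transcribe() and synthesize() stand in for on-device speech recognition and synthesis engines. Google hasn’t detailed the feature’s internals, so everything below is an assumption.

```python
# Hypothetical sketch of a live relay loop: incoming call audio is
# captioned as text for the user, and typed replies are synthesized
# to speech for the caller. transcribe() and synthesize() are
# placeholders for on-device ASR/TTS, not real APIs.

def transcribe(audio_chunk):
    return audio_chunk.get("speech", "")  # stand-in for streaming ASR

def synthesize(text):
    return {"speech": text}               # stand-in for TTS

def live_relay(incoming_audio, typed_replies):
    for chunk in incoming_audio:
        caption = transcribe(chunk)
        if caption:
            print(f"caller: {caption}")   # captioned in real time
        if typed_replies:
            spoken = synthesize(typed_replies.pop(0))
            print(f"you (played into the call): {spoken['speech']}")

live_relay(
    incoming_audio=[{"speech": "Hi, is now a good time to talk?"}],
    typed_replies=["Yes, go ahead. I'm reading your words as text."],
)
```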

Live captioning should be available on Android Q when it releases, with some device restrictions. Live Transcribe is available now, but a warning states that it is still in development. Live relay is yet to come, but showing it onstage in such a complete form suggests it won’t be long before it appears.

Google’s new voice recognition system works instantly and offline (if you have a Pixel)

Voice recognition is a standard part of the smartphone package these days, and a corresponding part is the delay while you wait for Siri, Alexa or Google to return your query, either correctly interpreted or horribly mangled. Google’s latest speech recognition works entirely offline, eliminating that delay altogether — though of course mangling is still an option.

The delay occurs because your voice, or some data derived from it anyway, has to travel from your phone to the servers of whoever operates the service, where it is analyzed and sent back a short time later. This can take anywhere from a handful of milliseconds to multiple entire seconds (what a nightmare!), or longer if your packets get lost in the ether.

Why not just do the voice recognition on the device? There’s nothing these companies would like more, but turning voice into text on the order of milliseconds takes quite a bit of computing power. It’s not just about hearing a sound and writing a word — understanding what someone is saying word by word involves a whole lot of context about language and intention.

Your phone could do it, for sure, but it wouldn’t be much faster than sending it off to the cloud, and it would eat up your battery. But steady advancements in the field have made it plausible to do so, and Google’s latest product makes it available to anyone with a Pixel.

Google’s work on the topic, documented in a paper here, built on previous advances to create a model small and efficient enough to fit on a phone (it’s 80 megabytes, if you’re curious), but capable of hearing and transcribing speech as you say it. No need to wait until you’ve finished a sentence to think whether you meant “their” or “there” — it figures it out on the fly.
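
The interesting property is that earlier words can be revised as more audio arrives. Here’s a toy illustration of that revise-as-you-go behavior; the frames, lexicon and rescoring rule are all invented, and Google’s actual model is a compact end-to-end network rather than anything like this.

```python
# Toy illustration of streaming recognition that revises earlier words
# as context arrives ("there" becomes "their" once "car" is heard).
# Real systems re-score hypotheses with learned language context; the
# lexicon and rule below are made up for demonstration.

LEXICON = {"th-eh-r": "there", "k-ah-r": "car", "ih-z": "is", "r-eh-d": "red"}

def decode_stream(frames):
    hypothesis = []
    for frame in frames:
        hypothesis.append(LEXICON[frame])
        if hypothesis[:2] == ["there", "car"]:
            hypothesis[0] = "their"    # revised in light of new context
        yield " ".join(hypothesis)     # partial result, shown immediately

for partial in decode_stream(["th-eh-r", "k-ah-r", "ih-z", "r-eh-d"]):
    print(partial)  # there / their car / their car is / their car is red
```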

So what’s the catch? Well, it only works in Gboard, Google’s keyboard app, and it only works on Pixels, and it only works in American English. So in a way this is just kind of a stress test for the real thing.

“Given the trends in the industry, with the convergence of specialized hardware and algorithmic improvements, we are hopeful that the techniques presented here can soon be adopted in more languages and across broader domains of application,” writes Google, as if it is the trends that need to do the hard work of localization.

Making speech recognition more responsive and having it work offline is a nice development. But it’s sort of funny considering hardly any of Google’s other products work offline. Are you going to dictate into a shared document while you’re offline? Write an email? Ask for a conversion between liters and cups? You’re going to need a connection for that! Of course this will also be better on slow and spotty connections, but you have to admit it’s a little ironic.

Google introduces educational app Bolo to improve children’s literacy in India

Google is expanding its suite of apps designed for the Indian market with today’s launch of a new language-learning app aimed at children, called Bolo. The app, which is aimed at elementary school-aged students, leverages technology like Google’s speech recognition and text-to-speech to help kids learn to read in both Hindi and English.

To do so, Bolo offers a catalog of 50 stories in Hindi and 40 in English, sourced from Storyweaver.org.in. The company says it plans to partner with other organizations in the future to expand the story selection.

Included in the app is a reading buddy, “Diya,” who encourages and corrects the child when they read aloud. As kids read, Diya can listen and respond with feedback. (Google notes all personal information remains on-device to protect kids’ privacy.) Diya can also read the text to the child and explain the meaning of English words. As children progress in the app, they’ll be presented with word games that win them in-app rewards and badges to motivate them.
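
One way to picture Diya’s feedback loop is to align the words the recognizer heard against the story text and respond accordingly. Google hasn’t published Bolo’s internals; this sketch is a guess at the general shape, and the messages are invented.

```python
# Hypothetical sketch of a read-aloud feedback loop: compare what the
# speech recognizer heard against the story text and pick a response.
# None marks a word where the child paused or got stuck.

def reading_feedback(story_words, heard_words):
    for expected, heard in zip(story_words, heard_words):
        if heard is None:
            yield f"Let me help. This word is '{expected}'."
        elif heard.lower() != expected.lower():
            yield f"Almost! That word is '{expected}', not '{heard}'."
        else:
            yield "Great reading!"

story = ["The", "cat", "sat", "on", "the", "mat"]
heard = ["The", "cat", "sit", "on", None, "mat"]
for message in reading_feedback(story, heard):
    print(message)
```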

The app works offline — a necessity in large parts of India, where internet access is not always available. Bolo can be used by multiple children as well, and will adjust itself to each child’s reading level.

Google says it had been trialing Bolo across 200 villages in Uttar Pradesh, India, with the help of nonprofit ASER Centre. During testing, it found that 64 percent of children who used the app showed an improvement in reading proficiency in three months’ time.

To run the pilot, 920 children were given the app and 600 were in a control group without the app, Google says.

In addition to improving their proficiency, more students in the group with the app (39 percent) reached the highest level of ASER’s reading assessment than those without it (28 percent), and parents also reported improvements in their children’s reading abilities.

Illiteracy remains a problem in India. The country has one of the largest illiterate populations in the world, where only 74 percent are able to read, according to a study by ASER Centre a few years back. It found then that more than half of students in fifth grade in rural state schools could not read second-grade textbooks in 2014. By 2018, that figure hadn’t changed much — still, only about half can read at a second-grade level, ASER now reports.

While Google today highlights its philanthropic efforts in education, it’s worth noting that Google’s interest in helping improve India’s literacy metrics benefits its bottom line, too. As the country continues to come online to become one of the largest internet markets in the world, literate users capable of using Google’s products like Search, Ads, Gmail and others are of increased importance to Google’s business.

Already, Google has shipped a number of applications designed specifically for Indian internet users, like data-friendly versions of YouTube, Search and other popular services, like payments app Tez (now rebranded Google Pay), a food delivery service, a neighborhood and communities networking app, a blogging app and more.

Today, Bolo is launching across India as an open beta, while Google will continue to work with its nonprofit partners — including Pratham Education Foundation, Room to Read, Saajha and Kaivalya Education Foundation — a Piramal Initiative — to bring the app to more children.

Bolo is available now on the Google Play Store in India, and works on Android smartphones running Android 4.4 (Kit Kat) and higher. The app is currently optimized for native Hindi speakers.

Say ‘Aloha’: A closer look at Facebook’s voice ambitions

Facebook has been a bit slow to adopt the voice computing revolution. It has no voice assistant, its smart speaker is still in development, and some apps like Instagram aren’t fully equipped for audio communication. But much of that is set to change judging by experiments discovered in Facebook’s code, plus new patent filings.

Developing voice functionality could give people more ways to use Facebook in their home or on the go. Its forthcoming Portal smart speaker is reportedly designed for easy video chatting with distant family, including seniors and kids that might have trouble with phones. Improved transcription, speech-to-text and text-to-speech features could connect Messenger users across input mediums and keep them on the chat app rather than straying back to SMS.

But Facebook’s voice could be drowned out by the din of the crowd if it doesn’t get moving soon. All the major mobile hardware and operating system makers now have their own voice assistants like Siri, Alexa, Google Assistant and Samsung Bixby, as well as their own smart speakers. In Q2 2018, Canalys estimates that Google shipped 5.4 million Homes, and Amazon shipped 4.1 million Echoes. Apple’s HomePod is off to a slow start with less than 6 percent of the market, behind Alibaba’s smart speaker, according to Strategy Analytics. Facebook’s spotty record around privacy might deflect potential customers to its competitors.

Given Facebook is late to the game, it will need to arrive with powerful utility that solves real problems. Here’s a look at Facebook’s newest developments in the voice space, and how its past experiments lay the groundwork for its next big push.

Aloha voice

Facebook is developing its own speech recognition feature under the name Aloha for both the Facebook and Messenger apps, as well as external hardware — likely the video chat smart speaker it’s developing. Code inside the Facebook and Messenger Android apps dug up by frequent TechCrunch tipster and mobile researcher Jane Manchun Wong gives the first look at a prototype for the Aloha user interface.

Labeled “Aloha Voice Testing,” the prototype shows that as a user speaks in a message thread, a horizontal blue bar expands and contracts to visualize the volume of their speech while it is recognized and transcribed into text. The code describes the feature as having connections with external Wi-Fi or Bluetooth devices. It’s possible that the software will run on both Facebook’s hardware and software, similar to Google Assistant, which runs on both phones and Google Home speakers. [Update: As seen below, the Aloha feature contains a “Your mobile device is now connected Portal” screen, confirming that name for the Facebook video chat smart speaker device.]
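
The expanding-and-contracting bar is a standard audio-metering trick: map each incoming frame’s energy to a width. A minimal sketch of the idea, with nothing taken from Facebook’s code:

```python
import math

# Minimal sketch of volume visualization: map the RMS energy of each
# 16-bit audio frame to the width of a bar that grows and shrinks as
# the user speaks. Illustrative only; the Aloha prototype's actual
# rendering code is not public.

def rms(frame):
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def bar(frame, max_width=40, full_scale=32768.0):
    level = min(rms(frame) / full_scale, 1.0)
    return "#" * int(level * max_width)

quiet = [2000] * 160   # one 10 ms frame at 16 kHz, low amplitude
loud = [20000] * 160   # same frame length, high amplitude
print(bar(quiet))      # short bar
print(bar(loud))       # long bar
```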

Facebook declined to comment on the video, with its spokesperson Ha Thai telling me, “We test stuff all the time — nothing to share today but my team will be in touch in a few weeks about hardware news coming from the AR/VR org.” It’s unclear if that hardware news will focus on voice and Aloha or Portal, or if it’s merely related to Facebook’s Oculus Connect 5 conference on September 25th.

A source previously told me that years ago, Facebook was interested in developing its own speech recognition software designed specifically to accurately transcribe how friends talk to each other. These speech patterns are often more casual, colloquial, rapid and full of slang than the way we formally address computerized assistants like Amazon Alexa or Google Home.

Wong also found the Aloha logo buried in Facebook’s code, which features volcano imagery. I can confirm that I’ve seen a Facebook Aloha Setup chatbot with a similar logo on the phones of Facebook employees.

If Facebook can figure this out, it could offer its own transcription features in Messenger and elsewhere on the site so users could communicate across mediums. It could potentially let you dictate comments or messages to friends while you have your hands full or can’t look at your screen. The recipient could then read the text instead of having to listen to it like a voice message. The feature also could be used to power voice navigation of Facebook’s apps for better hands-free usage.

Speaker and camera patents

Facebook awarded patent for speaker

Facebook’s video chat smart speaker was reportedly codenamed Aloha originally but later renamed Portal, Alex Heath of Business Insider and now Cheddar first reported in August 2017. The $499 competitor to the Amazon Echo Show was initially set to launch at Facebook’s F8 in May, but Bloomberg reported it was pushed back amid concerns that it would exacerbate the privacy scandal ignited by Cambridge Analytica.

A new patent filing reveals Facebook was considering building a smart speaker as early as December 26th, 2016 when it filed a patent for a cube-shaped device. The patent diagrams an “ornamental design for a speaker device” invented by Baback Elmieh, Alexandre Jais and John Proksch-Whaley. Facebook had acquired Elmieh’s startup Nascent Objects in September of that year and he’s now a technical project lead at Facebook’s secretive Building 8 hardware lab.

The startup had been building modular hardware, and earlier this year he was awarded patents for work at Facebook on several modular cameras. The speaker and camera technology Facebook has been developing could potentially evolve into what’s in its video chat speaker.

The fact that Facebook has been exploring speaker technology for so long and that the lead on these patents is still running a secret project in Building 8 strengthens the case that Facebook has big plans for the voice space.

Patents awarded to Facebook show designs for a camera (left) and video camera (right)

Instagram voice messaging

And finally, Instagram is getting deeper into the voice game, too. A screenshot generated from the code of Instagram’s Android app by Wong reveals the development of a voice clip messaging feature heading to Instagram Direct. This would allow you to speak into Instagram and send the audio clips similar to a walkie-talkie, or the voice messaging feature Facebook Messenger added back in 2013.

You can see the voice button in the message composer at the bottom of the screen, and the code explains that to “Voice message, press and hold to record.” The prototype follows the recent launch of video chat in Instagram Direct, another feature on which TechCrunch broke the news thanks to Wong’s research. An Instagram spokesperson declined to comment, as is typical when features are spotted in its code but aren’t publicly testing yet, saying, “Unfortunately nothing more to share on this right now.”

The long road to Voicebook

Facebook has long tinkered in the voice space. In 2015, it acquired natural language processing startup Wit.ai that ran a developer platform for building speech interfaces, though it later rolled Wit.ai into Messenger’s platform team to focus on chatbots. Facebook also began testing automatically transcribing Messenger voice clips into text in 2015 in what was likely the groundwork for the Aloha feature seen above. The company also revealed its M personal assistant that could accomplish tasks for users, but it was only rolled out to a very limited user base and later turned off.

The next year, Facebook’s head of Messenger David Marcus claimed at TechCrunch Disrupt that voice “is not something we’re actively working on right now,” but added that “at some point it’s pretty obvious that as we develop more and more capabilities and interactions inside of Messenger, we’ll start working on voice exchanges and interfaces.” However, a source had told me Facebook’s secretive Language Technology Group was already exploring voice opportunities. Facebook also began testing its Live Audio feature for users who want to just broadcast sound and not video.

By 2017, Facebook was offering automatic captioning for Pages’ videos, and was developing a voice search feature. And this year, Facebook began trying voice clips as status updates and Stories for users around the world who might have trouble typing in their native tongue. But executives haven’t spoken much about the voice initiatives.

The most detailed comments we have come from Facebook’s head of design Luke Woods at TechCrunch Disrupt 2017, where he described voice search as “very promising. There are lots of exciting things happening…. I love to be able to talk to the car to navigate to a particular place. That’s one of many potential use cases.” It’s also one that voice transcription could aid.

It’s still unclear exactly what Facebook’s Aloha will become. It could be a de facto operating system or voice interface and transcription feature for Facebook’s smart speaker and apps. It could become a more full-fledged voice assistant like M, but with audio. Or perhaps it could become Facebook’s bridge to other voice ecosystems, serving as Facebook’s Alexa Skill or Google Assistant Action.

When I asked Woods “How would Facebook on Alexa work?,” he said with a smile “That’s a very interesting question! No comment.”

Speech recognition triggers fun AR stickers in Panda’s video app

Panda has built the next silly social feature Snapchat and Instagram will want to steal. Today the startup launches its video messaging app that fills the screen with augmented reality effects based on the words you speak. Say “Want to get pizza?” and a 3D pizza slice hovers by your mouth. Say “I wear my sunglasses at night” and suddenly you’re wearing AR shades with a moon hung above your head. Instead of being distracted by having to pick effects out of a menu, they appear in real-time as you chat.
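
Mechanically, that amounts to keyword spotting over a live transcript. A hypothetical sketch of the trigger logic (the sticker catalog and anchor names are invented; Panda’s real implementation isn’t public):

```python
# Hypothetical sketch of speech-triggered AR stickers: scan the live
# transcript for trigger words and return the matching effect plus
# where to anchor it on the face. Catalog entries are made up.

STICKERS = {
    "pizza": {"asset": "pizza_slice_3d", "anchor": "mouth"},
    "sunglasses": {"asset": "ar_shades", "anchor": "eyes"},
    "gym": {"asset": "branded_headband", "anchor": "forehead"},
}

def stickers_for(transcript):
    hits = []
    for word in transcript.lower().split():
        token = word.strip("?!.,")
        if token in STICKERS:
            hits.append(STICKERS[token])  # rendered as the user speaks
    return hits

print(stickers_for("Want to get pizza?"))
# [{'asset': 'pizza_slice_3d', 'anchor': 'mouth'}]
```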

Panda is surprising and delightful. It’s also a bit janky, created by a five-person team with under $1 million in funding. Building a video chat app user base from scratch amidst all the competition will be a struggle. But even if Panda isn’t the app to popularize the idea, it’s invented a smart way to enhance visual communication that blends into our natural behavior.

It all started with a trippy vision. Panda’s 18-year-old founder Daniel Singer had built a few failed apps and was working as a product manager at peer-to-peer therapy startup Sensay in LA. When Alaska Airlines bought Virgin, Singer scored a free flight and came to see his buddy Arjun Sethi, an investor at Social Capital in SF. That’s when it hit him: “I’m hallucinating that as I’m talking, the things I’m saying should appear,” he tells me. Sethi dug the idea and agreed to fund a project to build it.

Panda founder Daniel Singer

Meanwhile, Singer had spent the last six years FaceTiming almost every day. He loved telling stories with his closest friends, yet Apple’s video chat protocol had fallen behind Snapchat and Instagram when it came to creative tools. So a year ago he raised $850,000 from Social Capital and Shrug Capital, plus angels like Cyan Banister and Secret’s David Byttow. Singer set out to build Panda to combine FaceTime’s live chat with Snapchat’s visual flair triggered by voice.

But it turns out, “video chat is hard,” he admits. So his small team settled for letting users send 10-second-max asynchronous video messages. Panda’s iOS app launched today with about 200 different voice-activated stickers, from footballs to sleepy Zzzzzs to a “&’%!#” censorship bar that covers your mouth when you swear. Tap them and they disappear, and soon you’ll be able to reposition them. As you trigger the effects for the first time, they go into a trophy case that gamifies voice experimentation.

Panda is fun to play around with yourself even if you aren’t actively messaging friends, which is reminiscent of how teens play with Snapchat face filters without always posting the results. The speech recognition effects will make a lot more sense if Panda can eventually succeed at solving the live video chat tech challenge. One day Singer imagines Panda making money by selling cosmetic effects that make you more attractive or fashionable, or offering sponsored effects so when you say “gym”, the headband that appears on you is Nike branded.

Unfortunately, the app can be a bit buggy and effects don’t always trigger, fooling you that you aren’t saying the right words. And it could be tough convincing buddies to download another messaging app, let alone turn it into a regular habit. Apple is also adding a slew of Memoji personalized avatars and other effects to FaceTime in its upcoming iOS 12.

Panda does advance one of technology’s fundamental pursuits: taking the fuzzy ideas in your head and translating them into meaning for others in clearer ways than just words can offer. It’s the next wave of visual communication that doesn’t require you to break from the conversation.

When I ask why other apps couldn’t just copy the speech stickers, Singer insisted “This has to be voice native.” I firmly disagree, and can easily imagine his whole app becoming just a single filter in Snapchat and Instagram Stories. He eventually acquiesced that “It’s a new reality that bits and pieces of consumer technology get traded around. I wouldn’t be surprised if others think it’s a good idea.”

It’s an uphill battle trying to disrupt today’s social giants, who are quick to seize on any idea that gives them an edge. Facebook rationalizes stealing other apps’ features by prioritizing whatever will engage its billions of users over the pride of its designers. Startups like Panda are effectively becoming outsourced R&D departments.

Still, Panda pledges to forge on (though it might be wise to take a buyout offer). Singer gets that his app won’t cure cancer or “make the world a better place,” as HBO’s Silicon Valley has lampooned. “We’re going to make really fun stuff and make them laugh and smile and experience human emotion,” he concludes. “At the end of the day, I don’t think there’s anything wrong with building entertainment and delight.”
