IoT voice interaction was once the stuff of sci-fi films, but now most of us no longer bat an eye. From computers to telephones to digital assistants, speaking to a device has gone from a futuristic dream to an in-our-homes reality. And this area appears to have only begun to scratch the surface of its potential. According to a recent OC&C Strategy Consultants study, voice shopping could surpass $40 billion across the US and UK by 2022 (up from $2 billion today).
For Elastic Path’s recent Hackdays, our team looked at voice-enabled commerce powered by Cortex. Specifically, we focused on letting users interact as they naturally would when placing complex orders, such as coffee orders. We wanted them to speak to the system rather than work through conventional digital interactions.
While Cortex ran the commerce side of things, we used Google’s Dialogflow to handle voice recognition and built a small NodeJS server to glue it all together. The conceptual secret sauce, however, was a context-driven approach complemented by catalog-driven language processing.
Behind the words, buying things in real life is quite complex. When a customer says, “I would like a triple shot espresso, please,” the underlying concepts at play, interpreted for a commerce system, include the desire for the product, the product variant itself, the intent to order the product, and a desire (or willingness) to pay.
“I would like a triple espresso, please” <==> “I want the espresso product, but I want it in the triple shot variety. Also, I want to order this configured item and I am prepared to pay for it.”
This is one of the important challenges for eCommerce voice interactions: context sensitivity. The ability to recognize these key points from one command makes transactions smoother, encouraging adoption, reducing friction, and enabling voice interactions to mimic real-world experiences. For a commerce system, “context” roughly translates to “what else” or the “next actions”. This just so happens to be Cortex’s specialty.
From Context to Commerce: Cortex Zooms to Next Actions
When you request an espresso, the set of inherent requirements includes identifying the item, ordering, and paying. Cortex, with its flexibility in presenting the customer with next actions (adhering to the best practices of a mature REST Level 3 API), provides zoom parameters to link between desired activities (Cortex documentation available here).
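As a rough sketch of what this looks like in practice, a single request can carry zoom parameters naming the linked resources to expand, so one call returns an item along with its next actions. The base URL and zoom names below are illustrative stand-ins, not endpoints from an actual deployment:

```javascript
// Build a Cortex request URL with zoom parameters so one call returns
// the resource plus its linked "next action" resources.
function cortexUrl(base, resource, zooms) {
  return `${base}/${resource}?zoom=${encodeURIComponent(zooms.join(','))}`;
}

// e.g. fetch an item together with its definition, price, and the
// form needed to add it to the cart
const url = cortexUrl(
  'https://api.example.com/cortex',
  'items/coffeeshop/espresso-id',
  ['definition', 'price', 'addtocartform']
);
```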
Search for the ‘espresso’ product ==> add the ‘triple shot’ option ==> add it to my cart ==> purchase the order
This series of actions fulfills a happy-path model for ordering an espresso, and the overall actions (“locate a product and add it to the cart”) are natively supported by Cortex. But a crucial piece in providing flexible interactions is the ability to configure product add-ons on-the-fly, producing bundled products that reflect changing configurations and prices. Furthermore, we needed to supply this functionality in a manner that is predictable and consistent enough to establish a programmatic pattern (i.e. a determined series of resource calls/zooms that we could use for any queried product), but flexible enough to support various sorts of configurations (additional shots, drink sizes, etc.). To achieve this, we used a customized implementation of Dynamic Bundles in our APIs, which provided support for choosing from a list of bundle constituent options and dynamically adjusting the accompanying pricing. Applying this accelerator in concert with out-of-the-box Cortex endpoints provided dynamic product configuration within a predictable pattern for adding all desired options and obtaining “next actions”.
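The happy path above can be sketched as a fixed sequence of resource calls. The resource paths and response fields here are hypothetical placeholders for the links a real Cortex response would contain; `api` stands in for any function that performs an authenticated request and returns parsed JSON:

```javascript
// Sketch of the happy-path order flow driven by "next action" links.
// `api` is any async (uri, body) => JSON function; the URIs are
// illustrative, not actual Cortex resource paths.
async function orderEspresso(api, options) {
  // 1. Search for the 'espresso' product.
  const item = await api('/searches/keyword-search/espresso');
  // 2. Configure the dynamic bundle with the chosen options (e.g. triple shot).
  const configured = await api(`${item.self}/configuration`, options);
  // 3. Add the configured item to the cart.
  const cart = await api(`${configured.self}/addtocartform`);
  // 4. Buy the order.
  return api(`${cart.order}/purchaseform`);
}
```

The value of the pattern is that the same four-step sequence works for any queried product, because each response links to the next action rather than requiring hard-coded knowledge of the catalog.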
Given this translation from context to commerce, the next challenge is recognizing context in voice commands.
The Gift of Gab: Natural Language Processing
Many large technology companies provide NLP solutions to extract intents and details. We picked Google’s Dialogflow over Facebook’s Wit.ai and IBM’s Watson because of its ease of testing, development, and extensibility. While both Wit.ai and Watson offer strong language-processing features, Dialogflow’s comprehensive feedback, deep community support, and streamlined connectivity with Android devices supported our rapid development and eventual demos with minimal additional configuration.
Dialogflow uses “intents” and “entities” to decode and label input text. At a high level, intents describe the objective of the input, which aligns very closely with the notion of “context”. By creating an “I need” intent and training the NLP model with sentences that indicated this desire, we connected the context with different input possibilities. This provided programmatic contextualization of voice input. Training the model can be accomplished through the Dialogflow GUI by providing sample inputs and assigning them to an intent. As the model receives additional input, it becomes smarter and more precise at recognizing past, as well as novel, similar inputs. Behind the scenes, these intents and their trained inputs are represented as JSON (and may even be uploaded in a similar fashion). Below is a JSON sample extracted from the “I need” intent’s list of trained inputs. This input associates the sentence “I need a triple espresso” with the desired intent, tagging the different pieces of the sentence.
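The shape of such a trained input is roughly the following. The field names follow the legacy Dialogflow/API.AI export format, so treat this as an illustrative reconstruction rather than a verbatim dump from our agent:

```javascript
// Approximate shape of one trained input in the exported intent JSON:
// plain-text fragments interleaved with fragments tagged by entity.
const trainedInput = {
  data: [
    { text: 'I need a ' },
    { text: 'triple', alias: 'size', meta: '@size' },
    { text: ' ' },
    { text: 'espresso', alias: 'product', meta: '@product' }
  ],
  isTemplate: false,
  count: 0
};

// Reassemble the sentence and collect the tagged entity parts.
const sentence = trainedInput.data.map(part => part.text).join('');
const entities = trainedInput.data
  .filter(part => part.alias)
  .map(part => ({ [part.alias]: part.text }));
```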
Further, orders are seldom straightforward, and recognizing variations within an order requires not only context comprehension but also detail recognition and relevancy knowledge. Product variations, like additional shots, different sizes, milk varieties, etc., need the NLP system to know which details to flag. These dynamic parts of the input commands constitute “entities”, which are also defined through the Dialogflow GUI and associated with the appropriate intents. This is where the vital details of a catalog come into play. With manually-imported catalog data and defined, corresponding configuration options, Dialogflow learned to parse specific products and variants from inputs, providing this information in the JSON output as well. In the example above, we see the system tagging items like “size” and “product”; these are predefined entities associated with the “I need” intent. Hence, once we provide training input that resolves to this intent, the system picks up the related, anticipated entities and validates them for greater specificity and precision going forward.
Dialogflow also provides tools for testing new inputs and examining the output in JSON. After training the model, providing the input “I’d like a triple espresso, please” yields the following (sample) output.
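The parsed output looks roughly like this. It is modeled on the legacy Dialogflow response format; the field names and values are an assumption for illustration, not captured output:

```javascript
// Illustrative shape of the parsed response for
// "I'd like a triple espresso, please".
const response = {
  result: {
    resolvedQuery: "I'd like a triple espresso, please",
    metadata: { intentName: 'I need' },
    parameters: { size: 'triple', product: 'espresso' }
  }
};

// The recognized intent carries the context; the parameters carry the
// entity details our server needs to drive the commerce flow.
const { size, product } = response.result.parameters;
```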
This provides the necessary structural predictability, allowing us to consume, tag, and decode vocal input data. We used Dialogflow’s Fulfillment module to pass these details on to our NodeJS server, which parsed this information and kicked off the appropriate Cortex flow to fulfill the request.
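A minimal sketch of that server-side handoff: pull the recognized intent and parameters out of the fulfillment payload and decide which Cortex flow to kick off. The payload shape and flow names are assumptions for illustration, not our production handler:

```javascript
// Minimal fulfillment handler sketch: map a recognized intent plus its
// entity parameters onto a Cortex flow to execute.
function handleFulfillment(payload) {
  const intent = payload.result.metadata.intentName;
  const { product, size } = payload.result.parameters;
  if (intent === 'I need' && product) {
    // Kick off the flow: search, configure, add to cart, prepare purchase.
    return { flow: 'search-configure-add-purchase', product, options: { size } };
  }
  // Anything unrecognized gets routed to a clarification prompt.
  return { flow: 'clarify-request' };
}
```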
Taking this a step further, Dialogflow enables users to import detail-recognition knowledge (i.e. entity definitions) as JSON data. For instance, the following provides a snippet of the JSON definition for a “size” entity.
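The structure follows the Dialogflow entity export format, with each entry carrying a canonical value and its synonyms; the specific values shown are illustrative:

```javascript
// Approximate JSON definition for a "size" entity: each entry maps a
// canonical value to the spoken synonyms that should resolve to it.
const sizeEntity = {
  name: 'size',
  entries: [
    { value: 'single', synonyms: ['single', 'single shot', 'one shot'] },
    { value: 'double', synonyms: ['double', 'double shot', 'two shots'] },
    { value: 'triple', synonyms: ['triple', 'triple shot', 'three shots'] }
  ]
};
```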
Given this capability, one can group catalog-based add-ons under specific entities to provide dynamic and automated catalog-driven language processing. For instance, if espresso products are associated with SKU options for size, we can write a script that parses this source data (in our case, an XML file) and outputs a JSON entity definition for “size”, assigning the parsed SKU options as entity “values” and “synonyms”.
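Such a script can be sketched as follows. The XML layout is a simplified stand-in for an actual catalog export, and the regex-based parse is only suitable for a demo; a real script would use a proper XML parser:

```javascript
// Sketch of a catalog-to-entity script: extract SKU option values for
// "size" from a catalog XML fragment and emit a Dialogflow-style entity
// definition. XML shape and parsing approach are demo-level assumptions.
const catalogXml = `
  <product code="espresso">
    <skuOption name="size">
      <optionValue>single</optionValue>
      <optionValue>double</optionValue>
      <optionValue>triple</optionValue>
    </skuOption>
  </product>`;

function entityFromCatalog(xml, optionName) {
  const values = [...xml.matchAll(/<optionValue>([^<]+)<\/optionValue>/g)]
    .map(match => match[1]);
  return {
    name: optionName,
    // Each SKU option becomes an entity value; synonyms start with the
    // value itself and could be enriched by hand afterwards.
    entries: values.map(value => ({ value, synonyms: [value] }))
  };
}

const generatedEntity = entityFromCatalog(catalogXml, 'size');
```

Regenerating the entity definitions whenever the catalog changes keeps the language model in step with the products actually available for sale.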
What It All Means
Context-aware voice commerce fills a very clear role in an increasingly voice-enabled digital world. By combining natural language processing with the flexible commerce functionality of Cortex, we easily gleaned contextual information and turned natural voice commands into frictionless purchasing experiences for customers. Looking forward, context-awareness and catalog-driven language processing can enable complex purchases across industries, while the organic interactions driving these activities further bridge the gap between real-world interactions and streamlined digital commerce.