INTRODUCTION
CHAT-BOT
According to the Oxford dictionary [1], a chat-bot is a computer program designed to simulate conversations with human users, especially over the Internet.
To put it simply, it is a bot that can automatically understand and reply to users' queries.
Eliza, the first conversational bot created by Joseph Weizenbaum at MIT, revolutionized human-computer interaction by providing remarkably authentic communication, often misleading users into believing they were conversing with a real person.
In 1950, the Turing Test was proposed to assess machine intelligence by having a person interact with both a human and a computer, hidden from view, to determine which is which. This evaluation method remains relevant for assessing conversational bots today, and some bots have arguably passed it. Remarkably, there have been instances where users developed romantic feelings for chat-bots, only to later realize they were interacting with sophisticated programs, not real people.
Until now, chat-bot technology has gone through three generations [3]:
1st Generation: rule-based. The developer sets the rules of speech: if the user says a given phrase, the chat-bot replies with the corresponding answer.
2nd Generation: based on supervised learning in Machine Learning. A model is trained on labeled data so that it learns how to respond in a conversation.
3rd Generation: based on both supervised and unsupervised learning, so the model can handle more complex sentences.
The rapid advancement of AI technology has enabled chat-bots to effectively mimic human conversation, facilitating the completion of various tasks. Messaging applications such as WhatsApp, Slack, and Skype are seeing significant user growth, with Facebook Messenger alone boasting over 1.2 billion monthly users.
Chat-bots are increasingly being used as customer service assistants on websites, allowing users to communicate with companies more efficiently. When customers need assistance, they can engage with chat-bots that provide automatic responses to simple inquiries. For more complex issues, chat-bots categorize messages so that human representatives can address user concerns more effectively.
Chat-bots are also widely used as personal assistants on smartphones, capable of handling a range of tasks based on user inquiries. For instance, Google Assistant not only manages smart home functions but also engages users in friendly conversation.
A study presented at the 4th International Conference on Internet Science in November 2017 examined the benefits users experience when utilizing chat-bots. The research identified the following key motivators for chat-bot usage:
Productivity: chat-bots provide assistance or access to information quickly and efficiently.
Entertainment: chat-bots amuse people with funny tips; they also help kill time when users have nothing to do.
Social and relational factors: chat-bots bring more social experiences. Chatting with bots also helps avoid loneliness, gives a chance to talk without being judged, and improves conversational skills.
Curiosity: the novelty of chat-bots sparks curiosity. People want to explore their abilities and to try something new.
A chat-bot system comprises three key components: the Responder, the Classifier, and the Graph-master. The Responder serves as the interface between users and the chat-bot, managing input and output and transferring user data to the Classifier. The Classifier processes data from both the Responder and the Graph-master, breaking user input down into logical components and delivering the processed information accordingly. Lastly, the Graph-master functions as the chat-bot's brain, housing the pattern-matching algorithms essential for generating appropriate responses.
Figure 1.1.1 Early Chat-bot components [6]
The rapid advancement of technology, particularly in artificial intelligence, has led to significant improvements in chat-bot architecture, resulting in a variety of sophisticated models with enhanced performance. These models can be categorized into two main types: generative models and retrieval-based models.
Figure 1.1.2.b Retrieval-based models architecture [7]
The generative model, rooted in Deep Learning, enhances chat-bot intelligence by training on a vast array of examples, enabling it to generate appropriate responses after numerous interactions. However, this architecture is challenging to build because of the extensive training data it requires. In contrast, the retrieval-based model is simpler to implement; it relies on predefined responses selected by rules based on the user's expression and context, ensuring grammatical accuracy and contextual relevance, even if it cannot guarantee a correct response in every case. Consequently, most chat-bot platforms currently use retrieval-based models as their foundational technology.
MACHINE LEARNING
Machine Learning (ML) is a subfield of AI. This field focuses on researching algorithms that give computers the ability to learn from data and information autonomously, without being explicitly programmed.
To reach today's amazing achievements, Machine Learning has been through a great history with contributions from great scientists. Everything started in the 1950s:
1950 – The "Turing test" was created by Alan Turing to determine the learning ability of a computer, mostly through conversation between computers and humans.
1952 – The first computer learning program, a game of checkers, was designed by Arthur Samuel. The computer's performance improved each time it played.
In 1957, Frank Rosenblatt introduced the Perceptron, the first neural network, which had no hidden layers. It was initially thought to be capable of solving a wide range of problems, but this belief was challenged when Marvin Minsky and Seymour Papert demonstrated that the Perceptron could not solve even the simple EXOR function.
In 1967, the development of the "nearest neighbor" algorithm marked a significant advancement in computer recognition tasks. This algorithm operates by memorizing all available data and predicting, for a new input, the output of the stored sample most similar to it.
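To make the idea concrete, here is a minimal sketch of a 1-nearest-neighbor classifier in Python; the toy data and the choice of Euclidean distance are ours, purely for illustration:

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, x):
    """Predict the label of x as the label of the closest memorized sample."""
    distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every stored sample
    return y_train[np.argmin(distances)]             # label of the most similar sample

# Toy usage: two clusters of 2-D points labeled "A" and "B"
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(["A", "A", "B", "B"])
print(nearest_neighbor_predict(X_train, y_train, np.array([4.8, 5.1])))  # -> B
```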
1979 – Students at Stanford University invented the "Stanford Cart", which could navigate obstacles in a room on its own.
1981 – Gerald Dejong introduced the concept of Explanation-Based Learning (EBL), in which a computer analyses training data to find a general rule by discarding unimportant information.
1985 – NetTalk was invented by Terry Sejnowski. It allows a computer to learn to pronounce words the way a baby does.
1986 – Geoffrey Hinton, known as the Godfather of Deep Learning, together with two fellow researchers, published a paper on the backpropagation algorithm, enabling neural networks with hidden layers to autonomously adapt and learn from data. This advancement allows such networks to tackle problems, such as the EXOR function, that the Perceptron could not solve.
1997 – IBM's Deep Blue beat the world champion at chess.
2006 – Hinton, once again, introduced unsupervised pre-training through Deep Belief Nets (DBN). From there, the term "deep learning", referring to algorithms using neural networks, was born.
Since then, Machine Learning and its subfields, especially Deep Learning, have grown tremendously and achieved great results.
Imagine you need to implement a spam filter. You would have two approaches: one is the traditional way, the other is ML-based.
Consider the first one: you would have a flow chart like Figure 1.2.1.
Creating an effective spam filter this way requires a comprehensive set of rules to address every potential scenario, as the problem is complex and nuanced. Despite your best efforts, some rules may still be overlooked, hurting the filter's overall performance.
A machine-learning-based spam filter, by contrast, can autonomously learn which words and phrases effectively predict whether an email is spam, improving the accuracy of spam detection, as illustrated in Figure 1.2.2.
Machine learning algorithms also significantly shorten programs: developers can focus on implementing the algorithm and feeding in data, streamlining the development process.
Figure 1.2.2 The ML-based approach [10]
1.2.4 Types of Machine Learning tasks:
ML systems are very diverse; however, they are mainly categorized into two types: Supervised Learning and Unsupervised Learning.
Supervised Learning involves data samples represented as (x, y), where x is the input value and y is the corresponding ground-truth output. This approach can be likened to a teaching method in which a mentor corrects your answers, helping you learn to solve similar problems independently.
In Unsupervised Learning, samples contain only (x) instead of (x, y). This means the system tries to learn without a teacher to correct its answers.
Diving into these categories, we find different tasks within each:
In Supervised Learning, Regression is a key task focused on predicting numeric values, such as estimating house prices. By analyzing various features, including location and the number of bedrooms, a machine learning model can forecast the cost of a house.
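As a brief illustration of regression (the features and prices below are invented), a linear model can be fit in a few lines with scikit-learn:

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [area in m^2, number of bedrooms] -> price (thousand USD)
X = [[50, 1], [70, 2], [90, 2], [120, 3], [150, 4]]
y = [100, 150, 180, 250, 320]

model = LinearRegression().fit(X, y)   # learn a linear mapping from features to price
print(model.predict([[100, 3]]))       # estimated price for an unseen house
```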
Classification is the most prevalent task in Supervised Learning, where the goal is to assign discrete categories to input data. A prime example is a spam mail filter, which uses a labeled dataset of emails classified as spam or non-spam to train a model. The model is then expected to predict the correct class both on the training dataset and on new incoming emails.
Figure 1.2.4 Classification example: Spam mail filter [10]
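A minimal sketch of such a classifier, assuming a tiny invented dataset and a Naive Bayes model (a common choice; the figure's filter may use a different algorithm):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["Win a free prize now", "Meeting at 10am tomorrow",
          "Cheap pills, limited offer", "Project report attached"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()              # represent each email by its word counts
X = vectorizer.fit_transform(emails)
clf = MultinomialNB().fit(X, labels)        # learn which words predict spam

print(clf.predict(vectorizer.transform(["Free offer: win now"])))  # -> ['spam']
```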
In Unsupervised Learning, Clustering is the task of automatically grouping samples into clusters of similar items. For instance, consider a dataset of 1,000 essays on the US economy; the goal is to automatically group these essays by similarity across variables such as word frequency, sentence length, and page count.
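A short sketch of how such grouping could be done with k-means on word-frequency features (the four mini "essays" are invented placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

essays = ["inflation and interest rates", "unemployment and the labor market",
          "interest rates rise again", "the labor market tightens"]

X = TfidfVectorizer().fit_transform(essays)   # word-frequency features per essay
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                         # cluster index assigned to each essay
```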
The special thing about the non-clustering problem is that the dataset is messy, and the duty of our system is to find structure in it. The "Cocktail party algorithm" is an example: at a cocktail party, the system tries to identify individual voices and music from a mesh of sounds.
Machine Learning has evolved significantly thanks to the groundbreaking research of great scientists, enabling it to perform tasks more efficiently than ever before. Today, we encounter applications of Machine Learning throughout our daily lives.
Virtual personal assistants like Siri, Alexa, and Google Assistant are designed to help users find information easily, both online and offline. Interacting with these assistants is straightforward: users can simply ask questions through voice or text and receive quick responses. For instance, by asking, "What is the score of the Germany vs Brazil match?", users instantly get an update such as "The score of the Germany vs Brazil match is 7-1." Additionally, these assistants can perform specific tasks based on user instructions:
“Set an alarm for 6 AM next morning”, “Remind me to visit Visa Office day after tomorrow”
Machine Learning is crucial in personal assistants, enabling them to gather and refine information based on your past interactions. This data is then used to provide results customized to your preferences.
Traffic Predictions: GPS navigation services record our locations and velocities moment by moment and save the information on a central server, which is used to manage traffic and build a map of current traffic. Machine learning on this data helps analyze traffic and prevent congestion, particularly during peak times, for example by focusing on areas with frequent congestion. However, the accuracy of these analyses may suffer if a significant number of vehicles do not use GPS navigation services.
Machine learning is also essential in online transportation networks, for accurately estimating ride prices during cab bookings and minimizing detours. As highlighted by Uber ATC's chief engineer, predicting rider demand allows price surges to be applied strategically during peak hours.
NATURAL LANGUAGE PROCESSING
Natural languages, including Chinese, English, German, and Vietnamese, are the languages spoken by humans. Natural Language Processing (NLP) is a branch of artificial intelligence dedicated to enabling computers to analyze and comprehend human language in both text and speech forms.
Natural language poses significant challenges for computers due to its inherent ambiguity and the lack of strict rules governing human interpretation, including grammar. Consequently, Natural Language Processing must not only recognize words but also understand the whole context of a piece of writing or a conversation. As you go through the next parts, you will see how ambiguous natural language can be [12].
Natural Language Processing addresses various challenges, one of which is extracting information from unstructured data. Information extraction involves automatically obtaining structured information from unstructured or semi-structured machine-readable documents, primarily by processing human-language texts. Recent advancements in multimedia document processing, such as automatic annotation and content extraction from images, audio, and video, also fall under information extraction.
Figure 1.3.1 The simple pipeline of information extraction [14]
Figure 1.3.1 shows a common approach to information extraction that goes through five processing steps. We will dive into what each step means:
1 Sentence segmentation:
In this task, we aim to identify sentence boundaries within text documents in order to split them into coherent sentences. While the period (".") is typically a reliable indicator of a sentence boundary, it is not always sufficient: there are cases where a period appears in the text but does not end a sentence. For example, periods also appear inside tokens such as the IP address "192.108.168.08" or the decimal "622." A basic algorithm using the regular expression [.?!][ ()"]+[A-Z] (sentence-final punctuation, followed by spaces or quotes, followed by an uppercase letter) has been shown to produce a 6.5% error rate on the Brown corpus and the Wall Street Journal corpus, two essential datasets for natural language processing research.
Nowadays, advanced Machine Learning techniques are applied to this step as well: unsupervised models can be trained across various languages and genres, while supervised learning requires an annotated training set and can integrate specific rules to improve accuracy.
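A small sketch of the naive rule described above, adapted for splitting (the sample text is ours); note how it correctly ignores the periods inside the IP address but wrongly splits after "a.m.", which illustrates where the error rate comes from:

```python
import re

TEXT = 'The server is at 192.108.168.08. It rebooted at 6 a.m. Then the logs resumed.'

# Naive rule: sentence-final punctuation, then spaces/quotes/brackets, then a capital letter.
boundary = re.compile(r'(?<=[.?!])[\s()"]+(?=[A-Z])')
for sentence in boundary.split(TEXT):
    print(sentence)
# Splits correctly after "...168.08." but also, wrongly, after "6 a.m."
```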
2 Tokenization:
Tokenization is the task of dividing a sentence into linguistic units called tokens, such as words, punctuation, or numbers [16]. For example:
"Dogs eat bones." => "Dogs | eat | bones | ."
The example above contains 4 tokens in total: "Dogs", "eat", "bones", and ".".
Tokenization is more complex than it appears: simply splitting text by spaces leads to significant errors with special tokens like "C++", email addresses such as abc@gmail.com, and date formats like "23/05/96" or "23-May-1996". To handle these challenges, we typically implement a rule-based system that incorporates dictionaries and machine learning techniques.
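A rule-based tokenizer can be sketched with a prioritized regular expression; the patterns below (e-mail, date, "C++") are illustrative, not exhaustive:

```python
import re

# Match special tokens first, then ordinary words, numbers and punctuation.
TOKEN = re.compile(r"[\w.+-]+@[\w.-]+"                # e-mail addresses: abc@gmail.com
                   r"|\d{1,2}[/-]\w{1,4}[/-]\d{2,4}"  # dates: 23/05/96, 23-May-1996
                   r"|C\+\+"                          # special names like C++
                   r"|\w+"                            # ordinary words and numbers
                   r"|[^\w\s]")                       # single punctuation marks

print(TOKEN.findall("Dogs eat bones."))
# -> ['Dogs', 'eat', 'bones', '.']
print(TOKEN.findall("Email abc@gmail.com before 23/05/96, C++ fans!"))
# -> ['Email', 'abc@gmail.com', 'before', '23/05/96', ',', 'C++', 'fans', '!']
```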
3 Part of speech tagging (POS tagging):
Part-of-speech tagging involves assigning a label to each word in a sentence based on its grammatical role. The process generates, for each sentence, a list of tuples of the form (word, tag), indicating the part of speech associated with every word. For instance:
“Dogs eat bones.” => {{‘Dogs’, Noun}, {‘eat’, Verb}, {‘bones’, Noun}}
POS tagging is a crucial step in natural language processing that helps in recognizing entities, extracting themes, and analyzing sentiment. The process faces two main challenges: ambiguity, where a single word can play multiple roles, like "duck", which can refer to a bird or a downward motion, and words that do not appear in the training corpus, which are hard to tag accurately. To address these challenges, we commonly employ Supervised Learning techniques.
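For instance, the NLTK library ships with a pre-trained supervised tagger; typical output is shown as a comment (tag names follow the Penn Treebank convention):

```python
import nltk
nltk.download("punkt", quiet=True)                       # tokenizer model
nltk.download("averaged_perceptron_tagger", quiet=True)  # pre-trained POS tagger

tokens = nltk.word_tokenize("Dogs eat bones.")
print(nltk.pos_tag(tokens))
# Typical output: [('Dogs', 'NNS'), ('eat', 'VBP'), ('bones', 'NNS'), ('.', '.')]
```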
4 Entity recognition:
Entity recognition involves identifying and extracting short phrases, typically noun phrases, from sentences that have already been POS-tagged, and determining which type of entity each one denotes.
Named entities (NE) are noun phrases that identify specific items such as people, organizations, locations, times, dates, percentages, and scores. Challenges arise from the many forms a name can take and from deciding how many entity types are adequate for effective information extraction; moreover, the range of potential NEs far exceeds what can feasibly be listed in dictionaries. Consequently, a machine-learning-based approach, particularly classification, is highly effective for Named Entity Recognition (NER). For example:
“Duc is at Sky Tower” => {{“Duc”, Noun, Person}, {“Sky Tower”, Proper Noun, Location}}
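NLTK also offers a pre-trained NE chunker that builds directly on the POS tags; as a quick illustration (the exact labels it assigns to this toy sentence may differ):

```python
import nltk
for pkg in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
    nltk.download(pkg, quiet=True)

tagged = nltk.pos_tag(nltk.word_tokenize("Duc is at Sky Tower"))
print(nltk.ne_chunk(tagged))   # POS tags feed the named-entity chunker
# Typical shape: (S (PERSON Duc/NNP) is/VBZ at/IN (FACILITY Sky/NNP Tower/NNP))
```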
5 Relation extraction:
In this step, we extract relationships between named entities, generating a list of tuples of the form {X, a, Y}, where X and Y are named entities of specified types and a is a string describing the relationship between them. For instance:
“Windy married to Damian.” => {[Person: Windy] ‘married to’ [Person: Damian]}
In general, there are four types of relationships that we often want to extract from text [17] (a small pattern-based sketch follows the list):
ROLE: relates a person to an organization or a geopolitical entity. Subtypes: member, owner, affiliate, client, citizen.
PART: generalized containment. Subtypes: subsidiary, physical part-of, set membership.
AT: permanent and transient locations. Subtypes: located, based-in, residence.
SOCIAL: social relations among persons. Subtypes: parent, sibling, spouse, grandparent, associate.
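As promised above, a toy pattern-based extractor for one SOCIAL subtype (the pattern is ours and far too simple for real text):

```python
import re

# Matches "<Person> married to <Person>" and emits an {X, a, Y} style tuple.
pattern = re.compile(r"(?P<X>[A-Z]\w+) married to (?P<Y>[A-Z]\w+)")

m = pattern.search("Windy married to Damian.")
if m:
    print({"X": m.group("X"), "a": "married to", "Y": m.group("Y")})
# -> {'X': 'Windy', 'a': 'married to', 'Y': 'Damian'}
```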
Information extraction is crucial for chat-bots, enabling them to comprehend user requests by identifying key terms. In our project, for instance, we focus on extracting time-related information from customer inquiries. In a straightforward request such as "Make an alarm at 6 p.m. tomorrow", the system can recognize "6 p.m." as 18:00:00 and interpret "tomorrow" as the following day.
Google Calendar
Google Calendar is a time-management and scheduling service developed by Google. It was first released as a beta version on April 13th, 2006, and exited the beta stage in July 2009. Initially available only on the web and on the Android operating system, it became available on iOS on March 10, 2015, through a phone application.
Google Calendar has received many compliments from critics around the world. After 2015, the app went through big changes via multiple updates from Google. Nowadays, for some people, Google Calendar has become one of the must-have apps for daily life.
Google Calendar enables users to effortlessly create and modify events, each defined by a specific start and end time. Users can make events recur by selecting various parameters, and they can use different colors to distinguish each event from the others.
Users can choose from various event types to add to their digital calendar. The "Reminder" feature lets users set notifications for selected events on their mobile phones, with customizable options based on event type and timing.
Google Calendar can also show up in other Google services like the Gmail inbox, Google Now, and Google Keep.
Google has enhanced the service with machine learning: the "event from email" feature automatically extracts event details from users' Gmail accounts, and the "Smart suggestion" tool recommends titles, contacts, and locations when organizing events, making it easier to arrange meetings with groups.
APPLICATION PROGRAMMING INTERFACE (API)
Connectivity has revolutionized our lives, putting the world at our fingertips through personal devices that let us purchase, post, and share from anywhere. This unprecedented level of connection is made possible by APIs, or application programming interfaces, which facilitate the transfer of data between different devices and applications, allowing us to organize and manage information with just a few clicks.
On February 2nd, 2000, the first web API was published to the public.
An API is a tool that connects devices and applications. In other words, an API is a link over which information is transferred.
APIs serve as building blocks for developers, simplifying programming by linking functions together; the developer only needs to understand the request and response formats of the connected application. APIs enable a device or application to receive data, process it, and return relevant information to the connected device or application. When a company develops an API, it is essentially creating a receive-response tool composed only of data (e.g., JSON) so that the API's users can process that data in their own device or website.
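As a small illustration of this receive-response pattern (the endpoint and fields are hypothetical), a client fetches pure JSON and decides itself how to present it:

```python
import requests  # a widely used HTTP client library

response = requests.get("https://api.example.com/v1/users/42")  # hypothetical endpoint
user = response.json()                      # the API answers with data only (JSON)
print(user.get("name"), user.get("email"))  # presentation is up to the caller
```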
APIs let developers concentrate on the distinctive features of their products while essential functions are handled for them. Designed by professional companies, APIs offer high value and quality, making them versatile solutions applicable in many scenarios.
In today's digital landscape, the majority of websites leverage third-party APIs to enhance efficiency and streamline operations. These APIs offer essential functionality that enables programmers to develop applications and websites more quickly and with greater ease.
Providers handle billions of API requests every day: Facebook and Google each handle around 5 billion API requests per day, while Twitter handles around 15 billion.
The number of APIs has grown dramatically in recent years: ProgrammableWeb reports a rise from 105 APIs in 2005 to 15,000 APIs by 2016, and the number continues to grow daily.
MATERIALS AND METHODS
SYSTEM STRUCTURE
Our system consists of three main components: the Dialogflow API, the Google Calendar API, and the back-end system. For a clearer picture of this structure, see Figure 2.1.1 below.
The system operates by letting customers type their requests or questions into a text box in the user interface. This input is forwarded by the back-end system to the Dialogflow API, where the natural language processing system handles the request.
Dialogflow is used to extract the essential information from customer requests, including content, title, date, time, and action. It then sends a JSON response to the back-end system, which transforms the information into the appropriate data structure. The back-end system selects suitable actions to communicate with the Google Calendar API, such as uploading, listing, or deleting events. For actions that require data from Google Calendar, like listing events, the back-end system retrieves the response from the API. Finally, users see all necessary information and announcements through the user interface (UI). A minimal sketch of this flow follows.
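In this sketch, the calendar helpers follow the names defined later in section 2.4, while dialogflow_query() and the exact result keys are assumptions for illustration only:

```python
def handle_user_message(text):
    result = dialogflow_query(text)        # forward the input to the Dialogflow API (assumed wrapper)
    action = result["action"]              # e.g. "set_plan", "list_plan", "remove_plan"

    if action == "set_plan":
        upload_plan(result["title"], result["timeS"], result["timeE"])  # Calendar: insert
        return "Your plan has been saved."
    if action == "list_plan":
        return show_event(result["timeS"], result["timeE"], "")         # Calendar: list
    if action == "remove_plan":
        delete_plan(get_id(result["timeS"], result["timeE"]))           # Calendar: delete
    return result["reply"]                 # otherwise fall back to Dialogflow's own answer
```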
DIALOGFLOW
Dialogflow, Google's AI-driven natural language understanding tool, enables the creation of conversational experiences such as voice applications and chat-bots. It connects with users across various platforms, including websites and mobile apps.
Dialogflow is a powerful tool for creating NLP-based applications such as virtual assistants and chat-bots. There are many reasons to use Dialogflow as an NLP development tool:
1 Powered by Google’s machine learning:
Nowadays, Natural Language Processing tasks are primarily driven by Machine Learning, but developing effective ML models demands significant resources, including time and data. To address this, many leading technology companies offer pre-trained models that developers can fine-tune for specific applications. Among these options, Google's machine learning stands out thanks to its pioneering advances in the field.
Dialogflow is built on Google's robust AI framework, TensorFlow, and benefits from Google's extensive data resources. By using Dialogflow, you tap into Google's pre-trained NLP models, bridging the gap between your own expertise and advanced AI technology.
2 Built on Google infrastructure:
Google's infrastructure is tremendously large, and Dialogflow is backed by Google. Specifically, Dialogflow runs on Google Cloud Platform, letting you run large-scale applications.
3 Optimized for the Google Assistant:
Google Assistant is the leading virtual assistant globally, available by default on all Android devices. By integrating Dialogflow with Google Assistant, you can extend your application's reach and perform a variety of pre-existing tasks seamlessly.
Dialogflow also offers a user-friendly interface that simplifies development, with comprehensive instructions available on its website. By handling most technical aspects, Dialogflow saves users time and spares them from building applications from the ground up.
Many brands today use Dialogflow for diverse business applications, and Domino's Pizza is an excellent example of how the technology can enhance operations.
Domino's was founded in 1960 and has grown into a worldwide pizza brand with 14,000+ restaurants in 85+ countries. Determined to keep innovating and to keep pace with ever-changing consumer behavior, the company turned to building rich conversational experiences powered by natural language understanding (NLU) and ML.
In August 2016, Domino's selected NLU solutions for their flexibility and scalability in handling various ordering intents. According to Galluch, the program leader for digital experience, the conversational design allows for numerous ordering paths, accommodating the extensive menu options. The team values Dialogflow's user interface for its user-friendly, intuitive design, making it an enjoyable tool for enhancing customer interactions.
Drawing on over 50 years of customer-service experience, Domino's built a chat-bot on Dialogflow's NLU capabilities to handle both basic customer interactions and more complex ordering scenarios. A significant benefit of Dialogflow is its compatibility with devices featuring Google Assistant, allowing customers to simply say, "Hey Google, talk to Domino's," to place orders seamlessly.
Figure 2.2.1 Domino’s chat-bot example [18]
Source: https://dialogflow.com/case-studies/dominos/
Domino's pizza bot has exceeded initial performance expectations, and the company continues to refine the conversational experience to better serve customer needs.
2.2.4 How does a Dialogflow chat-bot work?
The way a Dialogflow chat-bot works can be described by the diagram in Figure 2.2.2:
Figure 2.2.2 Dialogflow chat-bot model architecture
The model first processes user input and routes the query to the matching intent. It then uses entity recognition to identify and tag important keywords and entities within the query. Next, the response generator assesses the context of the user's request and gathers pre-defined responses to form candidate replies. Finally, the response selector evaluates these candidates using specific rules or algorithms and selects the most appropriate response to deliver to the user.
Dialogflow contains many features that let you build your own chat-bot easily. Walking through the most important of these features helps you build a simple natural language understanding model.
An agent is the fundamental unit of a Dialogflow-based project: it processes natural language requests coming from diverse devices and platforms, including apps and services. It activates when an input request matches one of its defined intents, enabling actionable data to be retrieved. The workflow of agents is illustrated in Figure 2.2.3.
2.2.5.1.2 Create and configure the agent
To manage your Dialogflow console, sign in and click "GO TO CONSOLE" in the upper-right menu. Then select "Create new agent" from the left menu, which displays the Dialogflow UI as illustrated in Figure 2.2.4.
Figure 2.2.3 Handling of a user request. An agent encompasses the Dialogflow components (Note: DB: database). Source: https://dialogflow.com/docs/agents
Figure 2.2.4 Dialogflow’s first agent UI
In Figure 2.2.4, you can customize your agent by setting its name, language, and time zone. To integrate your chat-bot with Google Cloud Platform (GCP), create a Google Project and enter its name, ensuring that your agent and project are linked to the same Google account. Complete the setup by clicking the blue CREATE button at the top right of the user interface.
To configure additional features for an agent, click the gear icon next to the agent's name; the settings screen appears as shown in Figure 2.2.5.
Figure 2.2.5 The setting screen for agent [19]
Because there are so many features, we focus on explaining only the key features that matter for our project.
Dialogflow currently offers two API versions, V1 and V2; V2 is the newer one, with enhanced capabilities, and its request and response formats differ from V1's. Because our back-end system is built to consume data in the V2 response format, we select API V2.
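An abridged sketch of the shape of a V2 response, shown as a Python dict (the field values here are illustrative):

```python
# Abridged V2 response shape; the back-end reads the intent, parameters and reply from it.
response = {
    "responseId": "...",
    "queryResult": {
        "queryText": "Make a plan at 6 pm tomorrow",
        "parameters": {"date": "2018-07-14", "time": "18:00:00"},
        "fulfillmentText": "OK, your plan is set.",
        "intent": {"displayName": "set_plan"},
        "intentDetectionConfidence": 0.93,
    },
}
print(response["queryResult"]["parameters"]["time"])  # -> 18:00:00
```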
Tokens are used for security and authorization in API access.
Client access token: this token only allows calling the API to use the agent by matching built-in intents. You can refresh this key to reset the authorization.
Developer access token: this token allows users to do everything with the agent, including using and configuring it.
ML only:
This mode applies pure ML to the model. It can achieve the best accuracy in many cases; however, it requires a lot of training data.
Hybrid (Rule-based and ML):
This mode combines the ML model with rule-based grammar matching, which tends to work better when training data is limited.
Google Calendar API
The Google Calendar API (GCAL API) is a tool from Google Developers that enables third-party developers to integrate with and manage tasks in the Google Calendar application. With the Google Calendar API, users can find and view public calendar events and, if authorized, access and modify private calendars and their events.
To access any Google API service or app, we must go through an authorization process (as shown in Figure 2.2.16) [22].
Figure 2.2.16 OAuth 2.0 protocol for authentication and authorization [22]
The client credential is a unique code that enables users to grant an application access without sharing their username and password; see [22] for a deeper explanation of how it works. Google offers the oauth2client library to ease implementing OAuth 2.0 in a back-end system. Below is a sample snippet using the oauth2client library with the Python SDKs:

```python
from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/calendar'
store = file.Storage('storage.json')        # cached credentials from a previous run
creds = store.get()
if not creds or creds.invalid:              # first run: open the browser consent flow
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
GCAL = discovery.build('calendar', 'v3', http=creds.authorize(Http()))
```
2.3.3 Methods for events in Google Calendar
An event is a single unit in a calendar plan, marked by a date or time range. When you create a plan or set an alarm, you are essentially creating an event. The Calendar API offers various methods to manage events; in our project, we mainly use three of them.
To create an event in the calendar, the insert method requires the event's starting and ending times. Users can also customize their events by adding reminders or inviting guests via email. The following snippet shows the call:
GCAL.events().insert(calendarId='primary', sendNotifications=True, body=EVENT).execute()
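For context, a minimal EVENT body for the insert call above might look like this (the values are illustrative; the 'dateTime' strings must carry a time-zone offset):

```python
EVENT = {
    'summary': 'Go to school',                            # the event title
    'start': {'dateTime': '2018-07-14T07:00:00+07:00'},  # RFC 3339 with time zone
    'end':   {'dateTime': '2018-07-14T16:00:00+07:00'},
}
```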
The delete method removes an event given its eventID, which can be obtained via the list method. For instance:
GCAL.events().delete(calendarId='primary', eventId=eventID).execute()
The list method retrieves a collection of events and their details. Parameters such as timeMin and timeMax customize which events are collected (the desired time range), and the events can be requested in chronological order. Our code processes the retrieved events, extracting the start times, end times, and event summaries into separate arrays for further use.
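A sketch of such a call under the setup from section 2.3.2 (the time strings are illustrative):

```python
events = GCAL.events().list(
    calendarId='primary',
    timeMin='2018-07-14T00:00:00+07:00',   # start of the requested period
    timeMax='2018-07-14T23:59:59+07:00',   # end of the requested period
    singleEvents=True,                     # expand recurring events into single instances
    orderBy='startTime',                   # chronological order (requires singleEvents)
).execute().get('items', [])

starts = [e['start'].get('dateTime') for e in events]
ends   = [e['end'].get('dateTime') for e in events]
titles = [e.get('summary', '') for e in events]
```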
BACK-END SYSTEM
Dialogflow takes care of information extraction and response generation, while Google Calendar stores and manages plans. The back-end system sits between these two platforms, processing data from both and providing a user interface for direct interaction with the user.
The Main part includes the functions needed for basic application operation.
The code in this part is essential for the app: it holds the Dialogflow token, manages the client's login process, and sets up the components the Google Calendar API needs to function (a sketch follows; the OAuth details are as in section 2.3.2).
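A sketch of that initialization (the token and query strings are placeholders):

```python
import apiai                                   # Dialogflow's Python SDK

CLIENT_ACCESS_TOKEN = 'YOUR_DIALOGFLOW_TOKEN'  # client access token from the agent settings
ai = apiai.ApiAI(CLIENT_ACCESS_TOKEN)          # authenticated Dialogflow client

request = ai.text_request()                    # one request object per user message
request.query = "Make a plan at 6 pm tomorrow"
raw_json = request.getresponse().read()        # raw JSON reply from the Dialogflow server
```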
The library part contains various built-in functions that shorten the code programmers have to write. Some basic functions are provided by Google and continue to be updated.
Packages used in the library part are:
1) apiai: the Python SDK of Dialogflow. It contains functions that send data to and receive data from the Dialogflow server: it takes messages from the user, sends them to Dialogflow, and then receives the reply.
2) json: to process JSON files, i.e., the responses from the API containing the processed data.
3) apiclient: necessary for using the Google Calendar API and Google Calendar related functions.
4) oauth2client: the Google library that manages user authorization.
5) datetime: to process datetime-type data, later used to compare times, add a period of time, or get the time in the needed format.
6) numpy: to create numpy arrays, which help store sequences of values such as a[0], a[1], …
The functions here are mostly hand-written, to shorten the actual code and, most importantly, to shorten the time needed to run the whole application.
The back-end code is composed of various functions that perform the essential tasks of the application. Once the supporting library is imported, the code in these functions is available for use.
Overall, the functions are divided into two main groups:
1) Functions that support analysis, working with Dialogflow
2) Functions that support working with the Google Calendar API
2.4.3.1 Functions for Dialogflow
get_usermsg():
Extracts the user's message: the text from the "User:" prefix up to the trailing double space (" ").
get_information(user_message):
Processes the messages received from the user, extracting the essential information for use in Dialogflow. This function is used to get Dialogflow's response for each message sent to the Dialogflow server.
get_date(user_message):
Takes the user message obtained through get_usermsg() and sends it to the Dialogflow server for processing. It extracts the date from the message, defaulting to "today" if no date is provided. The timeS and timeE produced here are later used in get_plan() to display the plans in the timeS-to-timeE period.
Input data: string-type user_message, which is what the user typed in the chat box.
Output data: string-type date and time start, date and time end, with the time zone added.
get_datetimeset(user_message):
When a user wants to schedule an event in Google Calendar, this function is activated to capture the desired time for the plan.
The process of extracting the time for setting a plan follows these steps (as shown in Figure 2.4.1):
1. Take out the date part
2. Take out the time start part
3. Take out the time end part
4. Reconstruct the format of timeS and timeE (adding the date, adding the time zone, or converting to string-type data)
Figure 2.4.1 Function get_datetimeset() flowchart
get_datetimerev(user_message):
When a user wants to remove plans from Google Calendar within a specific time frame, this function is activated to extract the selected time period from the message.
The process of extracting the time for removing a plan follows these steps (as shown in Figure 2.4.2):
1. Take out the date part
2. Take out the time start part
3. Take out the time end part
4. Reconstruct the format of timeS and timeE (adding the date, adding the time zone, or converting to string-type data)
Figure 2.4.2 Function get_datetimerev() flowchart
get_title(user_message):
Gets the title needed for Google Calendar.
Shows the response taken from Dialogflow, after the user message is sent, on the Python screen.
2.4.3.2 Functions for Google Calendar
upload_plan(title, timeS, timeE):
Input: timeS and timeE taken from get_datetimeset(), and the title taken from get_title(). TimeS and timeE must be in the correct format, taken from the right function.
Output: the plan is uploaded onto Google Calendar. For each message the user sends, only one plan can be uploaded.
get_plan(timeS, timeE):
Input: timeS and timeE in the same data type and format as in the previously explained function.
Output: every plan whose start time or end time falls in the timeS-to-timeE period, with its start time, end time, and title.
Uses: lists all the plans in the period of time, starting from the plan that will occur first.
get_id(timeS, timeE):
Retrieves all the event IDs of plans whose start time and end time fall within the specified timeS-to-timeE period. These event IDs can then be used to delete a specific plan via delete_plan(eventID).
delete_plan(eventID):
The plan whose eventID (taken from get_id) is passed to delete_plan is deleted from Google Calendar.
show_event(timeS, timeE, summary):
Input: string date/time in the correct format, and a string summary.
Uses/output: uses the output of get_plan(); when data is passed in, the plans for the day are displayed.
compare():
Input: the start time and end time of the event the user wants to put on Google Calendar.
Uses: when a request is made, the function compares the plan the user is creating against every existing plan of the day. If any of them overlap, the app displays all of the day's plans, letting the user delete the old plan or reschedule the new one to a different time. If the plan does not clash with anything, it is uploaded to Google Calendar. A sketch of the underlying overlap test follows.
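The clash test at the heart of compare() can be sketched as a standard interval-overlap check (datetime parsing is assumed to have happened elsewhere):

```python
def overlaps(new_start, new_end, old_start, old_end):
    """Two events clash if each one starts before the other ends."""
    return new_start < old_end and old_start < new_end
```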
Some further functions were initially developed to test available features; they may become unnecessary over time or could later be used to enhance the application, such as send_event().
send_event(): activates an event in the agent.
str2time(): the time values obtained from get_date, get_datetimeset, and get_datetimerev are strings. Initially this function was used to convert those strings into datetime format for comparison; the conversion is no longer necessary.
Table 2.1 List of back-end built-in functions in the library

| Function | Input | Output | Use |
| --- | --- | --- | --- |
| get_datetime | user_message (string) | date and time (string) | Get the date and time the user typed in the chat box |
| get_plan | time start, time end (*datetimetz) | time start, time end, event title (string) | Get all the plans between time start and time end |
| get_id | time start, time end (datetimetz) | event ID (string) | Get the IDs of all events between time start and time end |
| delete_plan | eventID (string) | (none) | Delete the plan with the declared event ID |
| show_event | time start (*datetime), time end (datetime), summary (string) | time start, time end, summary (string) | Show the date/time and event summary in a convenient format |
| str2time | time string | time (datetime) | Convert time from string to datetime |
| get_title | user_message (string) | title (string) | Get the title of the plan from the user's request |

*datetime: the data type of the datetime library, which indicates date and time
*datetimetz: datetime with time zone information
RESULT
DEMO
Our intents are categorized into two main groups: context intents and non-context intents. Both types share similar characteristics but differ in how responses are generated and how parameter values are extracted: context intents inherit their extraction from previous intents, while non-context intents extract parameter values directly from the user's request. Figure 3.1.1 illustrates the flow chart for setting a plan and its follow-up intents; further intents can be built along the same lines.
Dialogflow extracts the essential information from user messages, and each feature requires specific must-have data. If any required information is missing, the chat-bot prompts the user to provide the missing details. Developers select which information is must-have within the Dialogflow platform.
The pieces of information that can be extracted from the user are all string-type data:
Time Start: hh:mm:ss
Time End: hh:mm:ss
Title: a string that states the summary of the event
A.M. or P.M.
Example: some of the agent's work follows.
When the user provides all the information:
Today is Friday, July 13th.
Figure 3.1.2.b Information from the agent
Dialogflow can also reply in some other cases, like a greeting or a request for explanation from the user, through built-in intents.
These friendly talks are not this chat-bot's main focus, so it sometimes fails to understand such phrases. This can be improved by collecting more data.
The required piece of information for creating an event is timeS. If the user does not provide a specific date, the back-end sets the default date, "today". After considering all the possible times the user could mean by the shortest possible phrases, like "wake me up at 5" or "set a plan 4am", and comparing them to the current time, the timeS extracted by the agent is processed in the back-end.
The final timeS and timeE, after being processed in the back-end, are used to upload the event onto Google Calendar.
For example, suppose the time is 3 am on July 14th (as shown in Figure 3.1.4).
Information when the chat-bot is given everything:
Figure 3.1.4.a Information extracted for example 1
Figure 3.1.4.b Chatbot interaction for example 1
Figure 3.1.4.c Google Calendar for example 1
Another example (a.m./p.m. not given; the stated time has not yet passed) (as shown in Figure 3.1.5):
Figure 3.1.5.a Information extracted for example 2
Here, the date and TimeE are not given by the user. The defaults are timeS plus 59 minutes for TimeE, and "today" for the date.
Figure 3.1.5.b Chatbot interaction for example 2
Figure 3.1.5.c Google Calendar for example 2
When a user inputs a duplicate event, the chat-bot displays the day's schedule, letting the user either delete the previous event or reschedule the new one for a different time.
Google Calendar and the extracted information remain the same (as shown in Figure 3.1.6).
Figure 3.1.6 Interaction for duplicated event
3.1.1.3.2 Remove events:
To remove events, the user must specify the exact date and start time (timeS) of the search period. If the end time (timeE) is not provided, it defaults to 23:59:59 on the same day, so all of that day's events from timeS onward are deleted. If timeS is not given, the chat-bot prompts the user for it. The user must confirm the request before plans are deleted. For instance, suppose a user has created multiple plans for July 14th and requests "Delete plan today".
Figure 3.1.7.a Google Calendar before removing
Figure 3.1.7.b Google Calendar after removing
Google Calendar: all plans on July 14th from 9:30 am onward are removed.
You can also display what you have planned for a day by giving the chat-bot the day you want to see (as shown in Figure 3.1.8).
Figure 3.1.8 Comparing chatbot and Google Calendar
TEST RESULTS
In this part, we use a method based on the test method in [23] to evaluate our NLU system.
Table 3.1: Test results (Hybrid training)

| User message | Results | Intent confidence score |
| --- | --- | --- |
| Make a plan at | | |
| Go to school from 7am to 4pm | T T T T | 0.93 |
| Notify me about the meeting at | | |
| I wanna go shopping at 7pm | T T T T | 0.83 |
| I will get a gift from my friend next Monday | | |
| I need you to make a plan for me, to write 4000 words in 3 days | | |
| July 20th, 7am go to school | F N/A N/A N/A | 1 |

Note: the date/time values extracted from the user messages change with the current time, which has been accounted for in the test result chart.
Table 3.2: Test results (ML only)

| User message | Results | Intent confidence score |
| --- | --- | --- |
| Go to school from 7am to 4pm | T T T T | 0.97 |
| Notify me about the meeting at | | |
| I wanna go shopping at 7pm | T T T T | 0.93 |
| I will get a gift from my friend next Monday | | |
| I need you to make a plan for me, to write 4000 words in 3 days | | |
| July 20th, 7am go to school | F N/A N/A N/A | 1 |

Note: the date/time values extracted from the user messages change with the current time, which has been accounted for in the test result chart.
Evaluating the two training methods shows that while the ML-only approach yields slightly higher intent confidence scores, it struggles to recognize unseen expressions. The Hybrid method generally yields better results, thanks to its built-in rules that enhance the app's ability to understand natural language. There are occasional errors in date and time recognition, but these can be addressed by collecting more data from users.
Google Assistant is a service designed to help Android users with simple tasks on their devices, including setting alarms. The following chart compares the outcomes of giving the same phrases to our chat-bot and to Google Assistant.
Table 3.3: Comparison with Google Assistant

| User message | App | Google Assistant |
| --- | --- | --- |
| Go to school from 7am to 4pm | T | F |
| Notify me about the meeting at | | |
| I wanna go shopping at 7pm | T | F |
| July 20th, 7am go to school | F | F |
As Table 3.3 shows, Google Assistant does not recognize these expressions as well as our agent: it needs specific keywords like "alarm" and a designated time to function properly, and natural language queries are typically routed to Google's search service instead. It is important to note that this evaluation does not imply our model surpasses Google's; our agent's better performance here stems from its focus on one specific domain. Google Assistant remains far more advanced because it operates across many domains.
DISCUSSION
NLU SYSTEM’S TRAINING LIMITATION
Although Dialogflow's pre-built NLU model is very powerful, there turn out to be issues it does not handle well, especially when dealing with a specific task.
Table 3.1 illustrates that the NLU model is vulnerable to expressions lacking the word "at". For instance, although the model was trained with the phrase "Create a plan tomorrow 4am", it misread the similar phrase "Create a plan tomorrow 10pm" and assigned the wrong intent. In contrast, when we changed the phrase to "Create a plan tomorrow 10am", replacing "pm" with "am", the agent performed accurately, as demonstrated in Figure 4.1.1.
Figure 4.1.1 Wrong recognition with phrases without “at”
The results were evaluated using the Hybrid training method with a confidence threshold of 0.5. Despite grammatical errors and typos, humans can still comprehend such phrases, and in some instances the agent can as well. The exact cause of these failures remains unclear; however, we can mitigate the issue by adding more relevant data to our training.
FUTURE WORKS
There are many potential features that we want to implement in our agent to improve its performance on planning tasks. Below is the list of those features:
Our project uses Google Calendar, which is available across multiple platforms and environments, but users can currently access our chat-bot's planning feature only on a laptop. Making our agent available on mobile devices and the web would give users more options and let us gather additional data to enhance the program.
User behavior varies, making it hard to fix a single estimated duration for plans. By leveraging additional data, such as GPS location, our agent could estimate the time needed for specific tasks more accurately. For instance, if User A plans to attend two conferences at different locations, the current agent defaults to a one-hour gap between events unless told otherwise; with an improved time estimator, the agent could recommend an optimal departure time so that User A stays on schedule for all engagements.
Building a model in Dialogflow has certain limitations, including a lack of transparency about the model's architecture and the inability to fine-tune hyperparameters. Creating a model from scratch offers more control but requires significant resources, including time and energy. Data limitation is another challenge that should not be overlooked.