IDENTIFY PROBLEM
OVERVIEW
Our team leverages social media data analysis to enhance sales by examining user activity on platforms such as Facebook, Twitter, Instagram, and LinkedIn. We identify peak engagement times and popular platforms to provide businesses with effective strategies for reaching their target audience. This approach enables organizations to improve their sales and marketing efforts efficiently and successfully.
PROBLEM STATEMENT
Vietnam's economy has experienced significant and steady growth in recent years, driven by government support for small and medium-sized enterprises. This increase in businesses fosters innovation and encourages entrepreneurs to continuously improve their operations. A key advancement in modern business is the integration of database-driven information technology into corporate systems, which helps administrators manage and analyze data for informed decision-making. Decision support systems have proven advantageous for companies, especially as rapid industry growth necessitates quick identification and forecasting of market trends. To assist firms in their development, our team actively monitors social media traffic and provides tailored business advice.
Businesses can gain several benefits from analyzing social media data to maximize sales, including improved customer understanding, higher engagement, performance measurement, and lower marketing expenses.
OBJECTIVES OF THE STUDY METHODOLOGY
• Identify trends and patterns: Analyze user-generated content to determine popular themes, hashtags, and interaction patterns across multiple social media sites
• Understand user behavior: Examine how people interact with content, including their preferences, feelings, and engagement metrics such as likes, shares, and comments
• Measure impact and influence: Determine the effect of social media initiatives, influencers, or viral content on brand perception, customer behavior, and audience attitude
• Explore audience segmentation: Segment the social media audience by demographics, interests, and habits to better understand its requirements and preferences
• Identify opportunities and challenges: Identify new opportunities or risks in the social media ecosystem, such as algorithm updates, regulatory difficulties, or changes in customer behavior
• Inform decision making: Provide stakeholders with information and suggestions so they can make educated decisions about content strategy, marketing campaigns, customer service, and product development
• Enhance customer engagement: Create methods for increasing consumer engagement and loyalty via tailored content, timely communication, and community management on social media platforms
• Support research and innovation: Contribute to scholarly research and innovation in social media analysis methodology, tools, and best practices, helping to develop the discipline and handle emerging difficulties
STRUCTURE OF THE REPORT
• Provide an overview of the study's objectives and rationale
• Introduce the importance of social media analysis in understanding user behavior, brand perception, and market trends
• Outline the structure of the report
• Provide an overview of the social media platforms and datasets used in the study
• Explain the criteria for selecting data sources and the timeframe of data collection
• Present the results of the social media analysis based on the research objectives
• Analyze trends, patterns, and insights derived from the data
• Use visualizations such as charts, graphs, and heatmaps to illustrate key findings
• Interpret the findings in the context of existing literature and theoretical frameworks
1.4.4 RECOMMENDATIONS
• Provide actionable recommendations based on the study findings
• Suggest strategies for optimizing social media engagement, improving brand reputation, or addressing identified challenges
• Prioritize recommendations based on their potential impact and feasibility of implementation
1.4.5 CONCLUSION
• Summarize the key findings of the study and their implications
• Reflect on the contribution of the study to the field of social media analysis
• Highlight avenues for future research and areas for further exploration
• Provide a list of references cited throughout the report, following the appropriate citation style.
DATA DESCRIPTION
IMPORT PYTHON LIBRARIES
The first step in a machine learning (ML) workflow in Python is to load the data and inspect it using the appropriate libraries.
We import all the libraries needed for the analysis, covering data loading, statistical analysis, visualization, and data transformation such as merging and joining.
READING DATASET
The Pandas library provides many functions for loading data into a DataFrame from sources such as CSV, JSON, xlsx, SQL, pickle, HTML, and plain-text files.
CSV files are a popular and accessible format for data storage, often presented in table format. The read_csv() function converts this data directly into a pandas DataFrame.
This report explores social media user data to analyze communication network behaviors and identify the factors influencing business performance. The analysis is conducted using exploratory data analysis (EDA) techniques, with the dataset stored in a DataFrame for effective evaluation.
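As a minimal sketch of the loading step, the snippet below reads CSV content into a DataFrame with read_csv(). The inline CSV text and its column names (Text, Platform, Country, Likes, Retweets) are assumptions made for illustration and stand in for the real dataset file, whose path is not given in this report.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; columns and values are assumptions.
csv_text = """Text,Platform,Country,Likes,Retweets
Enjoying a sunny day,Twitter,USA,30,15
New recipe tonight,Instagram,Canada,40,20
Morning workout done,Facebook,UK,25,12
"""

# read_csv accepts a file path or any file-like object such as StringIO.
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)  # (number of rows, number of columns)
```

With a real file, the StringIO object would simply be replaced by the file path.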
The data.head() method in pandas is used to show the first rows of a DataFrame or Series, with a default display of five rows. Users can customize the number of rows shown by passing the desired number as a parameter.
• Preview data: data.head() helps preview a part of the data without displaying the entire DataFrame or Series, saving time and increasing productivity
• Check the data structure: By displaying the first few rows, you can check the structure of the data, including the columns and their data types
• Check input data: You can use data.head() to check data after reading it from some source, such as from a CSV file or database
• Inspect data after processing: If you have performed data transformations, you can use data.head() to examine the results of those transformations
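The uses listed above can be tried on a small table. This is only a sketch: the DataFrame and its Likes and Retweets columns are hypothetical and are not taken from the study's dataset.

```python
import pandas as pd

# Hypothetical ten-row table used only to illustrate head().
data = pd.DataFrame({"Likes": range(10, 20), "Retweets": range(0, 10)})

first_five = data.head()    # default: first 5 rows
first_three = data.head(3)  # custom number of rows

print(first_three)
```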
The data.tail() method in pandas is used to show the last few rows of a DataFrame or Series. By default, it displays 5 rows, but users can customize this by specifying the desired number of rows as a parameter.
• Preview data: data.tail() helps preview the last part of the data without displaying the entire DataFrame or Series, saving time and increasing productivity
• Check the data structure: By displaying the last few rows, you can check the structure of the data, including the columns and their data types
• Check input data: You can use data.tail() to check data after reading it from some source, such as from a CSV file or database
• Inspect data after processing: If you have performed data transformations, you can use data.tail() to examine the results of those transformations
Figure 3: Description of the last 5 lines of the data table
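A matching sketch for tail(), again on a hypothetical table rather than the study's data, shows that the original row labels are preserved in the result.

```python
import pandas as pd

# Hypothetical ten-row table used only to illustrate tail().
data = pd.DataFrame({"Likes": range(10, 20)})

last_five = data.tail()    # default: last 5 rows
last_three = data.tail(3)  # custom number of rows

print(last_three)
```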
The data.info() method in the Python pandas library provides essential descriptive information about a DataFrame, such as the number of rows and columns, the data type of each column, and the number of non-null values in each column. This functionality is crucial for understanding the structure and completeness of your dataset.
• Total number of rows (entries) and columns
• The name of each column and the data type of each column
• Total number of non-null values in each column
Figure 4: Description of data columns
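The following sketch calls info() on a small hypothetical table. info() prints its report rather than returning it, so the example passes a text buffer through the buf parameter to capture the output as a string; the column names and values are assumptions.

```python
import io

import pandas as pd

# Hypothetical table with one missing value in the Platform column.
data = pd.DataFrame({
    "Platform": ["Twitter", "Instagram", None],
    "Likes": [30, 40, 25],
})

# Capture the printed report in a string buffer.
buffer = io.StringIO()
data.info(buf=buffer)
summary = buffer.getvalue()

print(summary)
```

In the printed report, the non-null count for Platform is lower than the number of rows, which is how missing values show up in info().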
DATA CLEANING
WHAT IS DATA CLEANING?
Data cleaning is the step of the data analysis process in which raw data is processed and normalized to prepare it for analysis. The primary objective is to eliminate or correct inaccurate and unreliable values, ensuring that the data used for analysis is both accurate and trustworthy.
Below are some of the methods and operations commonly performed when cleaning data for analysis:
• Remove duplicates: Check for and remove records that have identical values in rows or columns of data
• Handle missing data: Fill in missing values with statistical measures such as the mean, median, or mode, or use predictive modeling to estimate and replace them based on existing data patterns
• Normalize data: Ensure that all values in a column use consistent units or the same scale, for example by converting Fahrenheit to Celsius, or by standardizing formats such as converting strings to lower or upper case
• Remove noise: Identify and remove noisy or imprecise values, such as extreme values or outliers that may result from recording errors
• Check and correct errors: Check the data for invalid or unprocessable values and correct any errors found
• Reformat data: Ensure that column data is properly formatted as numbers, strings, dates, or other data types appropriate to their content
• Check consistency: Check consistency between columns or between records to ensure that data was recorded or collected correctly and without conflicts
Data cleaning is an important part of the analytics pipeline, ensuring that analysis results are accurate and meaningful.
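Several of the operations listed above can be sketched in a few lines of pandas. The table, its column names, and the choice of the median and the 1.5 x IQR rule are assumptions made for illustration, not the procedure used in this study.

```python
import pandas as pd

# Hypothetical raw data: inconsistent text, a duplicate, a missing value, an outlier.
raw = pd.DataFrame({
    "Platform": [" Twitter", "twitter", "Instagram", "Facebook", "Facebook"],
    "Likes": [30.0, 30.0, 40.0, None, 500.0],
})

clean = raw.copy()

# Normalize text: trim whitespace and use a single letter case.
clean["Platform"] = clean["Platform"].str.strip().str.lower()

# Remove rows that are exact duplicates after normalization.
clean = clean.drop_duplicates()

# Fill missing numeric values with the column median.
clean["Likes"] = clean["Likes"].fillna(clean["Likes"].median())

# Remove outliers with the 1.5 * IQR rule (one common convention).
q1, q3 = clean["Likes"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = clean[clean["Likes"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

print(clean)
```

The order matters here: text is normalized before duplicates are removed, otherwise " Twitter" and "twitter" would not be recognized as the same value.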
CHECK FOR DUPLICATION
The data.nunique() method is essential in data analysis as it counts the unique values in each column of a DataFrame. This method returns a Series that displays the number of unique values per column, indexed by the column names. An analysis of the resulting social media data highlights the distinct values present in the dataset.
Figure 5: Unique data description for each data column
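A short sketch of nunique() on a hypothetical table (the Platform and Country columns and their values are assumptions):

```python
import pandas as pd

# Hypothetical table used only to illustrate nunique().
data = pd.DataFrame({
    "Platform": ["Twitter", "Instagram", "Twitter", "Facebook"],
    "Country": ["USA", "USA", "UK", "USA"],
})

# One count of distinct values per column, indexed by column name.
unique_counts = data.nunique()

print(unique_counts)
```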
Missing-value calculation involves assessing the extent of missing data within a dataset, a common occurrence in real-world data analysis. Factors contributing to missing values include errors during data collection, issues in data conversion, or unavailability of data for certain observations. To identify these missing values, one can use the command data.isnull().sum().
Figure 6: Depicting no missing data
After checking the data above, we find that there are no missing values.
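The check can be sketched as follows. Unlike the study's dataset, this hypothetical table deliberately contains missing values so that the counts are non-zero.

```python
import pandas as pd

# Hypothetical table with two missing values.
data = pd.DataFrame({
    "Likes": [30, None, 25],
    "Country": ["USA", "UK", None],
    "Hour": [8, 12, 20],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing_per_column = data.isnull().sum()

print(missing_per_column)
```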
DATA REDUCTION
Some columns or variables can be dropped if they do not add value to our analysis.
The data table indicates that the columns "Unnamed: 0.1" and "Unnamed: 0" contain identical information. To streamline our analysis of social media data, we will remove one of these duplicate columns without compromising the integrity of our findings.
Figure 7: Description of columns 0 and 1 having duplicate data
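The code line data.drop('Unnamed: 0.1', axis=1, inplace=True) removes the column named 'Unnamed: 0.1' from the DataFrame data. The argument axis=1 tells pandas to drop a column rather than a row, and inplace=True modifies data directly instead of returning a new DataFrame.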
Figure 8: Description of the column of redundant data that
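The sketch below applies the same drop() call to a small hypothetical table. The equality check before dropping is an added safeguard for illustration, not a step described in this report.

```python
import pandas as pd

# Hypothetical table reproducing the two redundant index columns.
data = pd.DataFrame({
    "Unnamed: 0.1": [0, 1, 2],
    "Unnamed: 0": [0, 1, 2],
    "Likes": [30, 40, 25],
})

# Confirm the two columns really hold the same values before dropping one.
columns_match = data["Unnamed: 0.1"].equals(data["Unnamed: 0"])

if columns_match:
    # axis=1 drops a column; inplace=True modifies data directly.
    data.drop("Unnamed: 0.1", axis=1, inplace=True)

print(data.columns.tolist())
```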
FEATURE ENGINEERING
Feature engineering involves leveraging domain expertise to identify and modify the most pertinent variables from raw data, essential for developing predictive models through machine learning or statistical analysis. The primary objective of feature engineering is to generate valuable insights from unprocessed data.
CREATING FEATURES
"Creating features" involves generating new variables from original data to enhance model performance and gain deeper insights. In the realms of data analytics and machine learning, the selection and generation of relevant features play a crucial and innovative role in the modeling process.
Figure 10: Description of datetime data table
Develop a "datetime" feature to facilitate the retrieval and analysis of social media user data, enhancing data-driven decision-making for sales optimization.
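One way to build such datetime features is sketched below. The Timestamp column name and its values are assumptions; Year, Month, Day, and Hour match the time variables discussed in the statistics summary.

```python
import pandas as pd

# Hypothetical raw timestamps stored as text.
data = pd.DataFrame({
    "Timestamp": ["2023-01-15 12:30:00", "2023-08-05 19:45:00"],
})

# Convert text to datetime values, then derive calendar features from them.
data["Timestamp"] = pd.to_datetime(data["Timestamp"])
data["Year"] = data["Timestamp"].dt.year
data["Month"] = data["Timestamp"].dt.month
data["Day"] = data["Timestamp"].dt.day
data["Hour"] = data["Timestamp"].dt.hour

print(data)
```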
EXPLORATORY DATA ANALYSIS (EDA)
STATISTICS SUMMARY
Summary statistics in social media analysis are crucial for synthesizing and presenting key numerical data from the analysis process. This section aims to provide an overview of data collected from social media platforms, including metrics such as the number of posts, views, interactions, and participants. It offers detailed insights into user behavior, including time spent on platforms, geographical location, and popular social networks. By analyzing this statistical data, important trends and patterns can be identified, such as user engagement over time, posting frequency, share rates, and community interests.
A statistical summary offers insights into the distribution of data, indicating whether it is normally distributed, skewed left or right, or contains outliers. In Python, this can be achieved using the describe() function, which provides a comprehensive statistical overview of numeric data types, including floats and integers.
Figure 11: Describe data without non-numeric columns
User engagement on social media platforms demonstrated significant variability in counts, likes, and retweets, indicated by a high standard deviation compared to the mean. Additionally, time variables including Year, Month, Day, and Hour were recorded, revealing patterns in user activity over time.
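A small sketch of the numeric summary, using hypothetical values rather than the study's data, shows that describe() skips non-numeric columns by default:

```python
import pandas as pd

# Hypothetical table mixing numeric and text columns.
data = pd.DataFrame({
    "Likes": [10, 20, 30, 40],
    "Retweets": [5, 10, 15, 20],
    "Platform": ["Instagram", "Twitter", "Instagram", "Facebook"],
})

# By default, describe() summarizes numeric columns only.
numeric_summary = data.describe()

print(numeric_summary)
```

Comparing the std row with the mean row of this summary is how the variability mentioned above can be read off.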
Figure 12: Describe data with all type of columns
The statistical analysis reveals a diverse user base across four social media platforms, with a total of 685 users. Instagram emerged as the leading platform, garnering 258 mentions, while the hashtag '#Compassionate #TearsOfEmpathy' was the most utilized, appearing three times. Engagement levels varied, with an average of 42 likes and a maximum of 80. Geographically, the majority of mentions originated from the United States, totaling 59. Notably, the emotion 'Positive' was the most frequently recorded, appearing 44 times, highlighting the uplifting nature of social media interactions.
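Figures such as the most frequent platform and how often it occurs come from the unique, top, and freq rows that describe() adds for non-numeric columns when include="all" is passed. The sketch below uses a hypothetical table, so its numbers differ from those reported above.

```python
import pandas as pd

# Hypothetical table mixing numeric and text columns.
data = pd.DataFrame({
    "Likes": [10, 20, 30, 40],
    "Platform": ["Instagram", "Twitter", "Instagram", "Facebook"],
})

# include="all" adds unique / top / freq rows for non-numeric columns.
full_summary = data.describe(include="all")

print(full_summary)
```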
EDA UNIVARIATE ANALYSIS
The bar chart compares the number of posts across three social media platforms, revealing that Instagram has the highest volume, followed by Facebook and Twitter. This difference may indicate the varying popularity and strategic focus of brands in engaging their audiences on these platforms. The data highlights the effectiveness of content communication on social media and identifies which platforms serve as the most influential marketing channels for customer outreach strategies.
The graph illustrates the distribution of "Likes" on social media posts, revealing multiple peaks at various intervals, indicating distinct groups of posts with concentrated "Likes" ranging from low to high. Notably, there are significant peaks around 20 to 30 "Likes" and another at approximately 40 "Likes." Beyond this point, the number of "Likes" gradually declines, with few posts exceeding 70 "Likes." This chart offers valuable insights into user engagement on social media, aiding in the analysis of content strategies and social media marketing techniques.
The chart illustrates the total "Likes" accumulated by posts from various countries, highlighting the data categorized by nation. Notably, the United States leads with the highest count of "Likes," showcasing its dominance in social media engagement.
The chart indicates that the United States leads in social media engagement, followed by the United Kingdom, Canada, Australia, and India. Although there seems to be a duplication of country names, the data highlights the popularity of social media content across these nations. This insight is crucial for businesses aiming to refine their marketing strategies, allowing them to target regions with higher interaction potential to enhance the effectiveness of their advertising and content campaigns.
Figure 16: Chart of 2 variables “Top country likes”
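The totals behind a chart like this can be computed with a group-by aggregation. This is a sketch on hypothetical data; the Country and Likes column names and values are assumptions.

```python
import pandas as pd

# Hypothetical posts with their country and number of likes.
data = pd.DataFrame({
    "Country": ["USA", "UK", "USA", "Canada"],
    "Likes": [30, 20, 50, 10],
})

# Sum likes per country and rank countries from highest to lowest.
likes_by_country = (
    data.groupby("Country")["Likes"].sum().sort_values(ascending=False)
)

print(likes_by_country)
```

If the same country appears under slightly different spellings, each spelling forms its own group, which is one possible cause of the duplicated country names noted above.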
The bar graph illustrates the frequency of posts linked to the top 10 hashtags, revealing an uneven distribution in their usage. This data highlights current social media trends, enabling marketers to identify popular topics and public interests. Hashtags like '#Compassionate' and '#TearsOfEmpathy' are notably prevalent, indicating a strong user interest in emotionally resonant and sympathetic content.
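One possible way to count individual hashtags is sketched below, assuming each post stores its hashtags as a single space-separated string in a column named Hashtags. Both the column name and the splitting step are assumptions; the study's chart may instead have counted each full string as one value.

```python
import pandas as pd

# Hypothetical posts, each with a space-separated hashtag string.
data = pd.DataFrame({
    "Hashtags": [
        "#Compassionate #TearsOfEmpathy",
        "#Joy",
        "#Compassionate #Joy",
    ],
})

# Split each string into a list, give each hashtag its own row, then count.
top_hashtags = (
    data["Hashtags"].str.split().explode().value_counts().head(10)
)

print(top_hashtags)
```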
DATA TRANSFORMATION
Following the Univariate Analysis during the Exploratory Data Analysis (EDA), we observed inconsistencies in the data visualizations, with numerous overlapping variables. Consequently, we will implement Data Transformation to align the data characteristics with our analytical and modeling objectives. This process includes the separation of numerical variables from categorical classifications.
To clean the data in the 'Text' column of the DataFrame, apply the strip() method, which effectively removes unnecessary spaces from both ends of each string. This process eliminates unwanted characters such as extra spaces, tabs, and carriage returns, resulting in cleaner and more accurate data.
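The effect of strip() can be sketched on a small hypothetical table. Applying it to a Platform column as well is an assumption made here to show how stray whitespace inflates the number of distinct categories.

```python
import pandas as pd

# Hypothetical values with stray spaces, tabs, and line endings.
data = pd.DataFrame({
    "Text": ["  Enjoying the day  ", "\tNew post\r\n", "Clean text"],
    "Platform": [" Twitter ", "Twitter", " Instagram "],
})

platforms_before = data["Platform"].nunique()

# str.strip() removes leading and trailing whitespace from every string.
data["Text"] = data["Text"].str.strip()
data["Platform"] = data["Platform"].str.strip()

platforms_after = data["Platform"].nunique()

print(platforms_before, platforms_after)
```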
EDA BIVARIATE ANALYSIS
Figure 19: Bivariate graph between Hour and Like
The scatter graph illustrates the correlation between the number of "Likes" on social media posts and the time of day they are published. While the data shows considerable variation in "Likes" at different hours, the trendline indicates a slight upward trend, suggesting that posts made during specific times may garner more engagement. However, the significant dispersion of data points highlights that various other factors can influence the number of "Likes," making it challenging to draw definitive conclusions from this chart alone.
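A trendline and its strength can be quantified without plotting. The sketch below fits a straight line with NumPy and computes the Pearson correlation; the Hour and Likes values are hypothetical, and a least-squares line is assumed as the trendline type.

```python
import numpy as np
import pandas as pd

# Hypothetical posting hours and like counts.
data = pd.DataFrame({
    "Hour": [6, 9, 12, 15, 18, 21],
    "Likes": [20, 35, 25, 45, 30, 50],
})

# Least-squares line: Likes is approximated by slope * Hour + intercept.
slope, intercept = np.polyfit(data["Hour"], data["Likes"], deg=1)

# Pearson correlation: values near 0 mean the trend explains little.
correlation = data["Hour"].corr(data["Likes"])

print(slope, intercept, correlation)
```

A positive slope combined with a correlation well below 1 corresponds to the situation described above: an upward trend with wide dispersion around it.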
Figure 20: Chart of 2 variables "like" and "platform"
The chart illustrates the distribution of "Likes" across social media platforms, featuring a blue histogram complemented by a red kernel density estimate (KDE) line. The histogram's blue columns represent the number of posts that garner a specific number of "Likes," while the KDE line offers a smooth representation of the overall distribution trend. Notably, the graph reveals a concentration of posts with a low number of "Likes," contrasted by a few posts that achieve a high count, as indicated by the histogram's peak and the KDE line. This pattern highlights the disparities in popularity and user engagement among various posts.
EDA MULTIVARIATE ANALYSIS
Figure 21: Multivariate chart "platform" (Instagram, Facebook, Twitter)
Figure 21 highlights the complex relationships between engagement metrics like 'Likes' and 'Retweets' across Twitter, Instagram, and Facebook, revealing right-skewed distributions that suggest most posts receive limited engagement, while a small number achieve significantly higher interaction, especially on Instagram. The strong positive correlation between 'Likes' and 'Retweets' across all platforms indicates that captivating posts tend to be shared more, showcasing the interdependent nature of these metrics. Instagram stands out as a leading platform for 'Likes', reflecting deeper user engagement compared to Twitter and Facebook. However, the analysis of posting times does not reveal a clear correlation with engagement, suggesting a more intricate landscape of social media interactions that warrants further exploration. These findings are crucial for developing tailored content strategies that leverage Instagram's engagement potential, underscoring the importance of content optimization for enhancing social media presence.
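The two quantities discussed here, engagement per platform and the relationship between 'Likes' and 'Retweets', can be sketched numerically. The values below are hypothetical and chosen only to illustrate the calculation.

```python
import pandas as pd

# Hypothetical posts with platform and engagement metrics.
data = pd.DataFrame({
    "Platform": ["Instagram", "Instagram", "Twitter",
                 "Twitter", "Facebook", "Facebook"],
    "Likes": [40, 60, 20, 30, 25, 35],
    "Retweets": [20, 30, 10, 15, 12, 18],
})

# Average likes per platform.
mean_likes = data.groupby("Platform")["Likes"].mean()

# Pearson correlation between the two engagement metrics.
likes_retweets_corr = data["Likes"].corr(data["Retweets"])

print(mean_likes)
print(likes_retweets_corr)
```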
Figure 22: Multivariate chart “Percentages of Platforms”
A pie chart in Figure 22 illustrates the distribution of social media platform usage among users, revealing that Instagram leads with 35.2%, followed by Twitter at 33.2% and Facebook at 31.6%. This nearly balanced distribution indicates a diverse social media landscape, with user attention spread relatively evenly across these platforms.
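Percentages like these can be derived from the platform column with value_counts(normalize=True). The sketch uses a hypothetical ten-post table, so the shares differ from those in Figure 22.

```python
import pandas as pd

# Hypothetical posts: 4 on Instagram, 3 on Twitter, 3 on Facebook.
data = pd.DataFrame({
    "Platform": ["Instagram"] * 4 + ["Twitter"] * 3 + ["Facebook"] * 3,
})

# normalize=True returns fractions; multiply by 100 for percentages.
shares = data["Platform"].value_counts(normalize=True).mul(100).round(1)

print(shares)
```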
This analysis highlights the multivariate comparison of various platforms, showcasing their market shares through visual representation. Although Instagram holds a slight edge in this dataset, all three platforms demonstrate a substantial presence among users, as indicated by their closely aligned percentages.
The close percentage points highlight the potential for shifts in user behavior or platform features to impact the leading position among social media platforms. This underscores the competitive landscape of social media and emphasizes the necessity for businesses to cultivate a flexible and adaptive online presence.
To enhance their social media impact, businesses should consider the distribution of market share among platforms, which suggests a balanced strategy. Given Instagram's slight advantage, it can be particularly effective for engagement-focused campaigns, thanks to its visual appeal and established engagement trends.
Figure 22 offers a comprehensive overview of platform usage, emphasizing the need for a flexible and well-informed social media marketing strategy due to the balanced user engagement across various platforms.
RECOMMENDATIONS
According to the research results, Instagram is the most popular social media network.
As a result, we have the following advice for businesses:
To enhance engagement on Instagram, it's crucial to optimize your posting schedule by analyzing when users are most active. By sharing compelling content during these peak times, you can significantly increase the likelihood of user interaction and attention.
To enhance engagement, focus on curating content that evokes emotions and fosters empathy. Utilize storytelling, images, and videos that create an emotional bond with your audience, as this can significantly increase user interaction and build a positive community around your brand.
To enhance your marketing strategy, prioritize Instagram as it is recognized as the most engaging platform. Invest in creating original content and improving account management while actively engaging with your community to maximize your marketing efforts on this social media channel.
Optimize your content and marketing strategies by leveraging geographic significance data to enhance engagement in key regions. Focus on creating localized content and targeted advertising to increase brand visibility and connect with audiences effectively in those areas.
Regularly monitor and evaluate the effectiveness of your social media marketing strategies to identify necessary improvements. Leverage community feedback and data analytics to refine your approach continuously. Tailor these recommendations to suit the unique needs of your business and market environment for optimal results.
CONCLUSIONS
SUMMARIZE THE PROJECT
Our analysis reveals that leveraging social media analytics is essential for enhancing sales and marketing strategies. Key insights highlight Instagram as the most engaging platform, surpassing Facebook and Twitter, which underscores its importance in digital marketing. Additionally, the timing of posts is crucial for maximizing user engagement, with specific time frames identified that significantly boost visibility and interaction.
Geographical targeting reveals that the United States is a key area for high engagement, indicating that marketers should concentrate their strategies here for greater effectiveness. Furthermore, content that evokes empathy and compassion significantly boosts user engagement, highlighting the importance of incorporating emotional resonance into content strategies.
LIMITATIONS
The study's findings are accompanied by significant limitations that should be acknowledged. The analysis, which concentrated on a limited number of social media platforms, may not fully represent the entire spectrum of digital engagement within the wider social media landscape. This highlights the necessity for future research to encompass a broader range of platforms to achieve a more comprehensive understanding.
The ever-changing nature of social media trends presents a significant limitation, as insights gained often reflect a fleeting moment and may quickly become outdated due to evolving user behaviors and platform algorithms. Additionally, the focus on quantitative data analysis in the study overlooks essential qualitative factors, such as content quality, sentiment, and the motivations behind user interactions. These qualitative elements are vital for a comprehensive understanding of social media engagement and should be included in future research.
The study's findings may not be universally applicable across various industries and business models due to their distinct characteristics and target audiences. This underscores the necessity for customized analyses to effectively adapt the insights to specific contexts.
Future research should broaden its analysis by incorporating a diverse array of social media platforms and data types. Additionally, utilizing real-time data and qualitative analysis will strengthen the findings, ensuring their relevance in the ever-evolving landscape of social media.