Big Data and Clustering Techniques
Chithambaramani, TJS Engineering College, India
Clustering and Bayesian Networks
Bhanu Chander, Pondicherry University, India
This chapter outlines the clustering and Bayesian schemes utilized in data mining and machine learning, emphasizing the importance of grouping standardized data for better understanding and learning. Clusters are formed by entities that share similarities, while those from different clusters exhibit distinct differences. Although representing data in fewer clusters may result in the loss of some important details, it provides greater simplification. Clustering is often employed when class labels are unknown, as it does not require a training phase. On the other hand, Bayesian networks are highly regarded classification methods, utilizing a graphical probabilistic model composed of interconnected nodes that represent variables and their causal relationships. Belief networks effectively model knowledge by conveying probabilistic information across various assumptions.
Big Data Analytics and Models
Analysis of Gravitation-Based Optimization Algorithms for Clustering and Classification
Sajad Ahmad Rather, Pondicherry University, India
P. Shanthi Bala, Pondicherry University, India
In recent years, heuristic algorithms inspired by natural phenomena and swarm behaviors have emerged as effective solutions for numerous optimization problems, outperforming traditional methods. Among these, the gravitational search algorithm (GSA), grounded in Newton’s law of universal gravitation and motion, has gained significant traction within the research community and has found applications across diverse fields such as electrical science, power systems, computer science, and engineering. This chapter highlights the significance of GSA, its hybridization with other optimization techniques to address challenges like the curse of dimensionality and local optima, and its role in enhancing clustering and classification tasks, particularly in pattern recognition, feature extraction, and improving classification accuracy.
Analytics and Technology for Practical Forecasting
William Fox, College of William and Mary, USA
Forecasting plays a crucial role in business, leading to the development of various automated methods and tools. However, this advancement has resulted in forecasting becoming increasingly opaque, resembling a black box function. This chapter aims to demystify forecasting across different application contexts by providing a systematic approach with step-by-step examples. It presents several methods and compares them to traditional forecasting techniques, such as regression and time series, using various software technologies. Additionally, the chapter includes three case studies: simple supply forecasting, homicide forecasting, and demand forecasting for Walmart sales.
Chapter 7 Modern Statistical Modeling in Machine Learning and Big Data Analytics: Statistical Models for Continuous and Categorical Variables
Niloofar Ramezani from George Mason University emphasizes the growing significance of machine learning, big data, and high-dimensional data, often referred to as the future of technology. To enhance machine learning, it is essential to utilize established statistical models that have proven their effectiveness over time. This chapter explores various algorithms applicable to machine learning, tailored to the nature of the variables involved, and presents suitable statistical techniques for their modeling.
Chapter 8 Enhanced Logistic Regression (ELR) Model for Big Data
Rathipriya R from Periyar University, India, discusses the significance of regression models in data analysis. The proposed model consists of three phases: the first phase focuses on sampling techniques to obtain the optimal sample for regression model construction. The second phase involves predicting the residuals of the logistic regression (LR) model through autoregressive time series analysis. The final phase aims to create an enhanced logistic regression (ELR) model by integrating the LR model with the residual prediction (RP) model. An empirical study conducted on a large diabetic dataset demonstrates that the ELR model outperforms traditional logistic regression in terms of accuracy.
Chapter 9 On Foundations of Estimation for Nonparametric Regression With Continuous Optimization
Pakize Taylan, Dicle University, Turkey
Parametric regression models, such as linear and nonlinear regression, aim to establish a reasonable relationship between response and independent variables based on linearity and predetermined nonlinearity assumptions. In contrast, nonparametric regression techniques relax these assumptions, allowing for greater flexibility without needing a predefined functional form, making them suitable for handling higher-dimensional problems and larger sample sizes. This chapter discusses the construction of PRSS problems for MARS, ADMs, and CR, and the application of CQP, a well-known convex optimization method, to obtain solutions, which will be referred to as CMARS, CADMs, and CKR, respectively.
Technologies for Handling Big Data
An Overview of Methodologies and Challenges in Sentiment Analysis on Social Networks
Aditya Suresh Salunkhe, Ramrao Adik Institute of Technology, India
Pallavi Vijay Chavan, Ramrao Adik Institute of Technology, India
The rapid growth of social media over the past decade has made it essential to analyze public attitudes and opinions toward various entities. With the advent of Web 2.0, platforms such as blogs, forums, and microblogs have emerged, allowing users to share their thoughts through messages, comments, and status updates. Sentiment analysis streamlines this process by assessing the emotional tone of text, eliminating the need for manual review of every user comment. This analysis has numerous applications, including targeted advertising, product recommendations, quality assessments, and real-time data insights, as well as evaluating public sentiment during election campaigns.
Evaluation of Optimum and Coherent Economic-Capital Portfolios Under Complex Market Prospects
Mazin A. M. Al Janabi, EGADE Business School, Tecnologico de Monterrey, Mexico
This chapter explores the effectiveness of liquidity-adjusted risk modeling in achieving optimal economic-capital structures while adhering to significant operational and financial constraints set by portfolio managers. It introduces a robust method for economic-capital allocation within a liquidity-adjusted value at risk (L-VaR) framework, enhancing previous models by explicitly considering the liquidation of trading portfolios over time. Utilizing a scaled multiple-assets L-VaR matrix and the GARCH-M technique for forecasting conditional volatility and expected returns, the authors present a dynamic nonlinear portfolio selection model. This model, combined with an optimization algorithm, aims to minimize the L-VaR objective function while allocating economic capital and trading assets. Empirical findings underscore the critical role of incorporating financially and operationally meaningful nonlinear and dynamic constraints into the L-VaR optimization process.
Data-Driven Stochastic Optimization for Transportation Road Network Design Under Uncertainty
Suh-Wen Chiou, National Dong Hwa University, Taiwan
This chapter presents a data-driven stochastic program for bi-level network design focused on hazardous material (hazmat) transportation, aiming to regulate associated risks and minimize total travel costs under uncertainty. A multi-objective stochastic optimization model is introduced to calculate generalized travel costs for hazmat carriers. To address the non-convex nature of the bi-level program, a data-driven bundle method is employed to stabilize solutions and reduce iteration gaps. Numerical comparisons with existing risk-averse models reveal that the proposed model is more resilient in minimizing total travel costs and mitigating risk exposure. Additionally, this chapter empirically examines the trade-offs between maximum risk exposure, generalized travel costs, and equitable risk distribution across network links.
Examining Visitors’ Characteristics and Behaviors in Tourist Destinations Through Mobile Phone Users’ Location Data
Masahide Yamamoto, Nagoya Gakuin University, Japan
This chapter explores the use of Mobile Kukan Toukei™ (mobile spatial statistics) to analyze location data from mobile phone users, enabling the counting of visitors at various tourist destinations and the examination of their demographics. By leveraging operational data from mobile networks, this statistical service provides insights into regional population structures based on gender, age, and residence. The data collection process ensures anonymity through non-identification, aggregation, and concealment, making it impossible to pinpoint individual users. The chapter aims to quantify visitor numbers over different time periods and characterize them using the collected mobile data, while also presenting an alternative method for more accurately estimating visitor counts in specific locations.
Machine Learning for Smart Tourism and Retail
Carlos Rodríguez-Pardo, Grupo de Inteligencia Artificial Aplicada, Universidad Carlos III de Madrid, Spain
Miguel A. Patricio, Universidad Carlos III de Madrid, Spain
Antonio Berlanga, Grupo de Inteligencia Artificial Aplicada, Universidad Carlos III de Madrid, Spain
José M. Molina, Grupo de Inteligencia Artificial Aplicada, Universidad Carlos III de Madrid, Spain
The rapid expansion of data storage capabilities alongside advancements in machine learning algorithms has revolutionized our understanding of customer behavior and preferences. This chapter examines the potential of various cutting-edge machine learning techniques in retail and smart tourism, two sectors that share similarities in contextual dependence and data utilization for customer insights. The authors delve into the application of supervised, unsupervised, and recommender systems to effectively profile, segment, and enhance customer value.
Predictive Analysis of Robotic Manipulators Through Inertial Sensors and Pattern Recognition
Jorge Alonso Moro, Universidad Europea de Madrid, Spain
Carlos Quiterio Gómez Muñoz, Universidad Europea de Madrid, Spain
Fausto Pedro García Márquez, University of Castilla-La Mancha, Spain
Industrial robotics is rapidly advancing, with an estimated installation of 2 million new robots in 2020. While predictive maintenance for these robots is gaining traction, its full potential remains untapped. This study explores the role of inertial sensors in industrial robotics, particularly in accurately measuring the "real" rotation angle of robotic arms and assessing the backlash in their gearboxes. By analyzing data from these sensors using wavelet transform techniques, the mechanical state of the robotic system can be effectively evaluated. This non-destructive, dynamic sensing method enables remote measurements under operational conditions, enhancing predictive maintenance capabilities across various applications.
Call Masking: A Worrisome Trend in Nigeria’s Telecommunications Industry
Benjamin Enahoro Assay, Delta State Polytechnic, Ogwashi-Uku, Nigeria
Call masking and related infractions have become a significant issue in Nigeria, posing serious security threats and resulting in substantial revenue losses for the government and telecom companies. In response, the Nigerian Communications Commission (NCC) suspended six interconnect exchange licenses and barred 750,000 lines from 13 operators suspected of involvement in these illegal activities in February 2018. Despite these efforts, the problem persists, highlighting the urgent need for effective solutions to address these infractions and protect the telecom industry from potential collapse.
An Optimized Three-Dimensional Clustering for Microarray Data
This chapter introduces a discrete firefly optimization algorithm (FA) designed for microarray data analysis, leveraging the bio-inspired flashing behavior of fireflies. The algorithm enhances global communication among fireflies, making it particularly effective for solving triclustering problems. This research presents a novel FA-based triclustering model aimed at identifying highly correlated triclusters within microarray datasets. The proposed model is both reliable and robust, benefiting from the efficient global communication among the swarming particles, or fireflies, which enhances its optimization capabilities.
Identifying Patterns in Fresh Produce Purchases: The Application of Machine Learning
Timofei Bogomolov, University of South Australia, Australia
Malgorzata W. Korolkiewicz, University of South Australia, Australia
In this chapter, Svetlana Bogomolova from the Ehrenberg-Bass Institute at the University of South Australia explores consumer food choices through machine learning techniques, focusing on purchasing patterns of fresh fruit and vegetables, which are vital for both health and economic reasons due to their high profit margins for supermarkets. Utilizing the Nielsen Consumer Panel Dataset, the study employs various machine learning methods, including hierarchical clustering and artificial neural networks. A key finding reveals that households with children tend to purchase less fresh produce, highlighting significant public health implications. Interestingly, the predictive modeling indicates that neural networks did not outperform linear regression models in forecasting spending on fresh produce, challenging initial expectations.
Chapter 19 Urban Spatial Data Computing: Integration of GIS and GPS Towards Location-Based Recommendations
Jayanthi Ganapathy from Anna University, India, highlights the significance of urban spatial data in assessing risks from natural disasters, aiding evacuation planning, and conducting risk mapping. Global Positioning System (GPS) technology, which relies on satellite navigation, works in tandem with Geographical Information System (GIS) software to provide valuable insights across various fields, including agriculture, ecology, and urban studies. By analyzing spatial and temporal relationships of real-time activities, it becomes possible to predict future trends and recommend places of interest. This integration of GPS mapping and data analytics through GIS not only enhances personalized experiences but also fosters substantial commercial and business growth.
The book "Big Data and Management Science" integrates analytical principles with business practices, focusing on the intersection of engineering and technology with management skills in organization, administration, and planning. It serves as a valuable complement to various sub-disciplines, including economics, finance, marketing, and decision and risk analysis.
Advances in Analytics in Big Data integrates analytic principles with Big Data, creating a bridge between engineering, economics, and the management skills essential for organizational and administrative planning. This book also complements various fields, including finance, marketing, and decision-making, with each chapter exploring distinct topics related to advancements in business analytics.
This book aims to enhance professionals' understanding of the strategic role of trust within the information and knowledge society. It will explore trust at various levels, including the global economy, networks and organizations, teams and work groups, information systems, and individual actors in networked environments. By integrating relevant theoretical frameworks and the latest empirical research findings, this resource will serve as a comprehensive guide for those seeking to deepen their insights into trust dynamics.
This book is designed for engineers, economists, and researchers eager to enhance their management skills or integrate management principles into their work. The authors present their innovative research and offer case studies that effectively demonstrate the application of management strategies in real-world scenarios utilizing Big Data.
The concept of big data has emerged as a pivotal force for innovation in academia, government, and corporate sectors, encompassing vast amounts of sensor data, raw and semi-structured IT log data, and the ever-growing volume of social media data. The sheer scale of big data necessitates substantial storage solutions, complicating analytical, processing, and retrieval operations. To tackle these challenges, clustering big data into a more compact format is essential. Chapter 1 delves into the background of big data and clustering, exploring its diverse applications, related research efforts, challenges, and future directions.
Uncertainty refers to a situation where multiple outcomes are possible in decision-making, yet their probabilities remain unknown. When viewed as a surprise, it allows individuals to reshape their future perspectives. In today's world, risks and uncertainties span across various domains, including market, financial, operational, social, environmental, institutional, and humanitarian aspects. Life is characterized by randomness and volatility, influencing both historical events and personal experiences.
The preface highlights that uncertainty often leads to financial instability for those affected by various risks. Chapter 2 delves into big data analytics as a powerful method for analyzing vast datasets to extract valuable insights. It begins by discussing traditional techniques such as financial econometrics and optimization models that were previously employed. Subsequently, it examines how big data analytics has transformed analytical approaches, concluding with examples that advocate for the adoption of big data analytics.
Big Data is essential for modern companies, serving as a foundation for strategic intelligence and competitive growth across startups and enterprises. Its significance lies not in the volume of data, but in its effective utilization. The impact of big data extends to scientific research, exemplified by the Sloan Digital Sky Survey's pivotal role for astronomers. Furthermore, big data has the potential to transform research and education. Chapter 3 will explore the technologies that are vital for harnessing the power of big data.
Chapter 4 explores clustering and Bayesian schemes in data mining and machine learning, emphasizing the importance of grouping standardized data to enhance understanding and learning. Clusters consist of similar entities, while differences between clusters highlight their uniqueness. Although representing data with fewer clusters can lead to the loss of crucial information, it simplifies analysis. Clustering typically does not require a training phase and is useful when classes are unknown. The Bayesian network, a prominent classification method, employs a graphical probabilistic model with interconnected nodes representing variables and their causal relationships. Belief networks effectively convey probabilistic information to model complex scenarios. As social media's popularity surges, the resulting vast databases prompt enterprises to transition from costly proprietary machines to cloud-based analytical solutions. This shift highlights the relevance of MapReduce, which offers enhanced scalability, fault tolerance, and flexibility for managing unstructured analytical data.
Chapter 5 highlights the significance of the Gravitational Search Algorithm (GSA) and its hybridization in addressing clustering and classification challenges. By integrating GSA with other optimization techniques, it effectively mitigates issues like the curse of dimensionality, local optima entrapment, and restricted search spaces that often hinder traditional clustering methods. Additionally, GSA proves valuable in classification tasks, enhancing pattern recognition, feature extraction, and overall classification accuracy.
Forecasting plays a crucial role in business, leading to the development of various automated methods and tools. However, this advancement has resulted in forecasting becoming increasingly opaque, resembling a black box function. Chapter 6 aims to demystify forecasting by providing a systematic approach and practical examples across different contexts. It explores several methods, comparing them to traditional techniques like regression and time series analysis using various software technologies. The chapter includes three case studies: simple supply forecasting, homicide forecasting, and Walmart's sales demand forecasting. As machine learning, big data, and high-dimensional data gain prominence, it is essential to employ established statistical models that have proven their efficiency over the years.
The preface highlights the significance of contributing to the advancement of machine learning, a rapidly evolving field. It discusses various algorithms applicable to machine learning based on variable characteristics and presents suitable statistical techniques for their modeling, as detailed in Chapter 7.
Big data has emerged as a pivotal force for innovation in academia, government, and corporate sectors, encompassing vast amounts of sensor data, IT industry log data, and the exponential growth of social media information. The immense volume of big data necessitates significant storage solutions, complicating analytical, processing, and retrieval operations. A potential solution to these challenges is clustering big data into a more compact format. This chapter explores the fundamentals of big data and clustering, detailing various applications, related research efforts, challenges, and future directions in the field.
Big data is a dynamic research field that engages academia, industry, and government, driving the need for innovative learning methods to tackle current challenges. Characterized by its complexity, scalability, and diversity, big data requires new approaches, designs, analytics, and algorithms to uncover hidden insights and derive value. Big data analytics involves the processes of collecting, organizing, and analyzing extensive datasets with various data types to identify concealed patterns, customer preferences, market trends, and other valuable commercial insights.
Clustering is a crucial method for identifying structure within unlabeled data by organizing it into groups, or clusters, where members exhibit similar characteristics. This process helps in uncovering patterns and relationships within the data, making it essential for data analysis and interpretation.
Big Data and clustering techniques are increasingly important across various fields, as they enable the identification of dissimilarities among data in different clusters (Jacob & Vijayakumar, 2018). Numerous clustering algorithms have been developed, each varying in complexity based on factors such as data dimensionality, clustering set volume, and application scope. As an unsupervised learning technique, clustering proves valuable in diverse applications, including marketing studies, DNA analysis, text mining, and web document classification (Goel, 2014).
This chapter is structured to cover several key areas: Section 2 offers an overview of big data and clustering techniques, while Section 3 reviews relevant literature. In Section 4, we address the challenges associated with big data and clustering. Section 5 outlines future research directions, and finally, Section 6 wraps up the chapter with a conclusion.
Big data encompasses a range of techniques and technologies designed to integrate and analyze vast, complex datasets, revealing significant hidden insights. These methods are essential for managing the diverse and massive scale of information, enabling organizations to unlock valuable data-driven opportunities.
1. Volume signifies the vast amount of diverse data produced from various sources, which continues to grow. Collecting extensive data offers the advantage of uncovering hidden insights and patterns through effective data analysis.
2. Variety in data encompasses the diverse types of information gathered from sensors, smartphones, and social networks. This includes various formats such as video, images, text, audio, and data logs, which can be organized in either structured or unstructured forms.
3. Velocity pertains to the rate at which data is transferred, constantly evolving due to the integration of complementary data sets, the inclusion of archived or legacy data, and the influx of streamed data from various sources (Berman, 2013).
4. Value refers to the process of discovering huge hidden values from large datasets with various types and rapid generation (Chen, 2014).
Importance of Big Data (Sasi Kiran, 2015)
The government highlights the significance of big data in generating value across various disciplines and domains. This value is derived from the capacity to analyze data effectively, transforming it into actionable insights.
A 2011 study suggests five generic ways that big data can support value creation for organizations:
1. Creating transparency by making big data openly available for business and functional analysis.
2. Supporting experimental analysis in individual locations that can test decisions or approaches, such as specific market programs.
3. Assisting, based on customer information, in defining market segmentation at more narrow levels.
4. Supporting real-time analysis and decisions based on sophisticated analytics applied to data sets from customers and embedded sensors.
5. Facilitating computer-assisted innovation in products based on embedded product sensors indicating customer responses.
Machine learning includes various topics such as regression analysis, feature selection, and classification, which can be categorized into three main approaches: supervised, semi-supervised, and unsupervised classification. In supervised classification, known class labels define the training set, allowing an algorithm to derive classification criteria. Semi-supervised classification utilizes both labeled and unlabeled data, making it a cost-effective solution when manual labeling is expensive. Unsupervised classification, or clustering, identifies groups of similar objects without prior knowledge of class labels, effectively partitioning data into meaningful sub-classes. Clustering is particularly advantageous for processing Big Data, as it helps to reveal inherent structures within large datasets.
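As a minimal illustration of the supervised/unsupervised distinction, the sketch below trains a classifier on labeled data and then clusters the same points without using the labels. It assumes scikit-learn is available; the synthetic data and parameter values are purely illustrative.

```python
# Contrast supervised classification (labels used) with clustering (no labels).
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic data: 300 points drawn from 3 groups.
X, y = make_blobs(n_samples=300, centers=3, random_state=42)

# Supervised: class labels y are known and used for training.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("supervised accuracy:", clf.score(X, y))

# Unsupervised: no labels; KMeans partitions the data into 3 clusters.
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("cluster assignments (first 10 points):", km.labels_[:10])
```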
The Various Hierarchical Clustering Algorithms (Sajana et al., 2016)
• BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies): An agglomerative hierarchical algorithm which uses a Clustering Feature tree (CF-Tree) and incrementally adjusts the quality of subclusters; a brief usage sketch follows this list.
• CURE (Clustering Using REpresentatives): A divisive hierarchical approach to clustering that selects well-scattered points from each cluster and gradually shrinks these points toward the cluster's center using a specified function. This process continues as adjacent clusters are successively merged until the total number of clusters reaches the desired amount.
• ROCK (Robust Clustering algorithm for Categorical attributes): A hierarchical clustering algorithm designed specifically for categorical data. It employs a linking strategy that merges links from the bottom up to create clusters, effectively grouping similar data points.
• Chameleon: An agglomerative hierarchical clustering algorithm based on dynamic modeling that uses a two-phase approach to clustering.
• ECHIDNA: An agglomerative hierarchical approach for clustering network traffic data.
• SNN (Shared Nearest Neighbors): A top-to-bottom hierarchical approach used for grouping the objects.
• GRIDCLUST (GRID-based hierarchical Clustering): A clustering algorithm of the hierarchical method based on a grid structure.
• STING (Statistical Information Grid): A multi-resolution clustering technique that organizes the spatial area of the input objects into rectangular grid cells. Each cell captures essential statistical information, including mean, maximum, and minimum values of the attributes, which are stored as parameters to facilitate effective data analysis and clustering (Deshmukh & Ramteke, 2017).
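The following is a minimal sketch of BIRCH as implemented in scikit-learn (assumed available). The threshold and branching_factor values below are illustrative CF-tree settings, not recommendations for any particular dataset.

```python
# BIRCH builds a CF-tree incrementally; n_clusters sets the final grouping.
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=4, random_state=0)

birch = Birch(threshold=0.5, branching_factor=50, n_clusters=4)
labels = birch.fit_predict(X)
print("points per cluster:", [int((labels == k).sum()) for k in range(4)])
```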
The Various Density-Based Clustering Algorithms (Lakshmi et al., 2018)
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise): A density-based clustering algorithm that relies on two key parameters: ɛ (Eps) and MinPts. The ɛ parameter defines the neighborhood around a point, while MinPts indicates the minimum number of points required within that neighborhood to form a cluster. Users must specify both parameters to effectively utilize the algorithm; a minimal usage sketch follows this list.
• OPTICS (Ordering Points to Identify the Clustering Structure): OPTICS is an extension of DBSCAN.
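A minimal DBSCAN sketch with scikit-learn (assumed available) follows; eps and min_samples play the roles of ɛ and MinPts described above, and the two-moons dataset is purely illustrative.

```python
# DBSCAN labels low-density points as noise (label -1) rather than forcing
# them into a cluster; eps and min_samples correspond to ɛ and MinPts.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.3, min_samples=5).fit(X)
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print("clusters found:", n_clusters,
      "| noise points:", int(np.sum(db.labels_ == -1)))
```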
Big Data Analytics and Models
Ferdi Sönmez https://orcid.org/0000-0002-5761-3866
Mimar Sinan Fine Arts University, Turkey
This chapter delves into big data analytics as a powerful method for processing vast datasets to reveal valuable insights, addressing the challenges faced by victims of various threats. It begins by discussing traditional techniques such as financial econometrics and optimization models that preceded big data analytics. Subsequently, it highlights how big data analytics has transformed analytical approaches, enhancing the ability to derive meaningful conclusions from complex data. Finally, the chapter presents case studies that advocate for the adoption of big data analytics in diverse applications.
This section presents a detailed exploration of financial econometrics techniques and their practical applications, addressing essential aspects of finance, forecasting, and sampling, including continuous-time-period sampling and an introduction to inference. Key topics covered in financial econometric models include Market Efficiency, Return Predictability, ARCH, GARCH, value at risk, volatility clustering, asset returns, Single- and Multiple-Variable Linear Models, Cointegration, Conditional Heteroskedasticity, Market Microstructure, Event Analysis, Case Study Analysis, Predictability, Capital Asset Pricing Models, Multi-Factor Pricing Models, Present-Price Relations, Intertemporal Equilibrium Models, and Maturity Structure Models.
Research on conditional return predictability based on the significance of information signals can be categorized into two main groups. The first group focuses on price patterns following significant one-day price changes, revealing mixed results; while many studies indicate overreactions to both large positive and negative price changes, only a few suggest underreactions to negative changes. Despite these findings, the economic relevance of such predictable patterns diminishes when transaction costs are considered. The second group investigates the profitability of contrarian versus momentum investment strategies. Contrarian strategies involve selling past winners and buying past losers, anticipating price reversals, while momentum strategies entail selling past losers and buying winners, expecting price continuations. Although many studies observe both price continuations and reversals across various return intervals, the profitability of these transaction-heavy strategies is questionable when accounting for transaction costs. Additionally, contrarian strategies may simply reflect the bid-ask bounce effect associated with short-term price movements.
Market Efficiency (ME), introduced by Fama in 1965, is a fundamental concept in financial economics that asserts securities in active markets are optimally allocated by participants. The Efficient Market Hypothesis posits that achieving returns above market averages using only historical price data is challenging. Financial econometric models and machine learning techniques have been developed to test ME experimentally. This concept suggests that all available information and expectations are quickly reflected in asset prices, making it impossible for investors to consistently devise strategies that yield abnormal profits. Bachelier (2011) emphasized that market prices incorporate past, present, and anticipated future events, often without a clear correlation to price changes. ME is underpinned by the "random walk theory," which states that price fluctuations are independent and influenced solely by new market information.
Big Data Analytics plays a crucial role in understanding efficiency within capital markets, which can be categorized into three types: operational efficiency, allocational efficiency, and informational efficiency. Operational efficiency focuses on how resources are utilized to enhance market operations. Allocational efficiency refers to the market's ability to achieve a Pareto optimal allocation of resources, ensuring that funds are directed towards the most productive investments. This mechanism is essential for stock markets to effectively channel scarce resources into real investments (Saraoglu, 2017).
Financial time series analysis focuses on understanding the mechanisms governing time series data, which are generated frequently and hold significant economic value (Lkhagva et al., 2006; Ican, 2013). Despite the appearance of independence in weekly or higher frequency return series, these observations often exhibit a high degree of dependence (Teräsvirta, 2009). To model this dependence, autoregressive conditional heteroskedasticity (ARCH) and generalized autoregressive conditional heteroskedasticity (GARCH) models are commonly employed, particularly when time series display volatility (Teräsvirta, 2009). The distinction between conditional and unconditional variance is crucial, as asset holders are more concerned with conditional variance over specific holding periods. Predicting volatility differs from forecasting the conditional average, as volatility itself is unobservable, making measurement a key challenge. The GARCH model, independently developed by Bollerslev and Taylor in 1986, serves as a robust tool for modeling financial time series, where the conditional variance is a linear function of its past values.
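To make this concrete, the conditional variance of a GARCH(1,1) process, the simplest and most widely used member of the family just described, can be written in its standard textbook form (the notation here is illustrative) as

$$\sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2, \qquad \omega > 0,\; \alpha, \beta \ge 0,$$

where ε_{t-1} is the previous period's shock and σ²_{t-1} its conditional variance; the condition α + β < 1 keeps the unconditional variance finite at ω/(1 − α − β).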
The GARCH process exhibits a constant average and is characterized as a weak stationary process when variance is present. While it can also be a precise stationary process, this does not necessitate the weak stationarity condition, which demands that mean, variance, and autocovariance remain constant over time. Precise stationarity entails that the distribution function of any set of ε_t remains unchanged during time transitions, and it is important to note that finite moments are not a prerequisite for this type of stationarity (Yang, 2001).
The evolution of database technology and information infrastructure has enabled the effective storage and transmission of vast amounts of financial data. This progress has been significantly enhanced by the advent of big data analytics. Concurrently, professionals in the field have increasingly adopted machine learning techniques to tackle more complex challenges, driven by the need for innovative solutions.
Big Data Analytics and Models are essential for interpreting the characteristics of real-life data and forecasting future values in time series analysis. The features derived from financial time series observations highlight the unique attributes of the dataset and are categorized under three main headings:
• Kurtosis: This criterion is a parameter that describes the shape of the probability distribution of real-valued random variables. Kurtosis is calculated as kurt(X) = E[(X − μ)^4] / σ^4, where μ and σ are the mean and standard deviation of X.
Kurtosis (kurt) measures the distribution of financial assets, which tends to be more uniform and exhibit greater averages than a normal distribution. This leads to a higher standard deviation due to thicker tails, reflecting increased fluctuations in financial markets. As a result, time series data often deviates from stationary behavior. While normal distribution is a common method for analyzing financial time series, understanding the overall risk level requires auxiliary functions, with kurtosis and skewness being key indicators of this distribution.
• Volatility Clustering: As described by Mandelbrot in 1963, volatility clustering refers to the phenomenon where significant price changes in financial markets are often succeeded by further large changes, while minor price fluctuations are typically followed by small changes. This suggests that price movements are interdependent rather than random. The presence or absence of volatility clustering can be illustrated through two distinct time series graphs, with one exhibiting no clustering and the other demonstrating clear volatility clustering; a simulation sketch follows this list.
• Volatility Asymmetry: The impact of volatility on financial assets is asymmetric, as price decreases lead to greater fluctuations than price increases. The volatility associated with falling asset prices surpasses that of rising prices, highlighting the asymmetric nature of market reactions to price changes.
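A minimal sketch tying the features above together: returns simulated from the GARCH(1,1) recursion shown earlier exhibit volatility clustering, and their sample kurtosis exceeds the normal value of 3 (fat tails). NumPy and SciPy are assumed available, and the parameter values are illustrative only.

```python
# Simulate GARCH(1,1) returns, then check fat tails and clustering.
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(1)
omega, alpha, beta = 0.05, 0.10, 0.85   # alpha + beta < 1 (stationary variance)
n = 5000
sigma2 = np.empty(n)
r = np.empty(n)
sigma2[0] = omega / (1 - alpha - beta)  # unconditional variance as start value
r[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
for t in range(1, n):
    sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    r[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

# Fat tails: plain kurtosis (fisher=False) above 3 despite normal shocks.
print("kurtosis:", kurtosis(r, fisher=False), "| skewness:", skew(r))

# Clustering: large shocks follow large shocks of either sign, so absolute
# returns are positively autocorrelated at lag 1.
abs_r = np.abs(r)
print("autocorr of |r| at lag 1:", np.corrcoef(abs_r[:-1], abs_r[1:])[0, 1])
```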
The Black-Scholes model, introduced in 1973 as an innovative approach to pricing European-based options on non-dividend stocks, earned Myron Scholes the 1997 Nobel Prize in Economics (Fischer Black died before the award). The model has since evolved to assess the fairness of the prices paid or received in option transactions.
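As a brief numerical illustration, the sketch below computes the standard Black-Scholes price of a European call on a non-dividend stock. SciPy is assumed available, and the input values in the example are illustrative.

```python
# Standard Black-Scholes formula for a European call on a non-dividend stock.
from math import exp, log, sqrt
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    """Price of a European call: spot S, strike K, time to expiry T (years),
    risk-free rate r, and volatility sigma (both annualized)."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)

# Example: at-the-money call, one year to expiry, 2% rate, 20% volatility.
print(round(bs_call(S=100, K=100, T=1.0, r=0.02, sigma=0.20), 4))
```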