The business of predicting the future by observing the past.
I seem to have gravitated towards time and this niche of machine learning problems. Having initially struggled with the statistical jargon of the field, I wanted to build one place for all the terms involved in this thing called Time Series Analysis. Essentially, we watch some variable (a metric, a measurable quantity, a number) evolve over time, notice patterns and anomalies in it, and use that insight to predict its future trajectory.
People starting off in machine learning generally begin with linear regression, learn about classification next, neural networks third, the fancy CNNs and RNNs after that, with some natural language processing and preliminary LLMs maybe sprinkled in for the zeitgeist, and perhaps a hint of network analytics and visualization. Or at least that has been my journey, for better or for worse; we shall see. Here in this space I want to talk about a very close cousin of linear regression: time series analysis. Primarily, what all those statistical terms mean, and why they could be useful for the right problem.
The Glossary
The Data
- Time Indexed: The ID of each row in tabular time series data is the date or timestamp of the observation. Data points are associated with specific points or intervals in time, allowing for temporal ordering and analysis.
- Frequency: The rate at which observations or events occur within a given time period, often expressed as the number of occurrences per unit time.
- Trend: A long-term pattern or tendency observed in time series data, indicating overall upward or downward movement over time.
- Seasonality: Regular and predictable fluctuations or patterns that occur within a time series at fixed intervals, typically corresponding to seasons, months, or other calendar periods.
- Stationarity: The statistical property of a time series where key characteristics such as mean, variance, and autocorrelation structure remain constant over time, indicating a stable behavior.
- Unit Root: A characteristic of certain stochastic processes where a single random shock has a lasting impact, making the series non-stationary. Its presence calls for careful transformation (typically differencing) to achieve stationarity, which is crucial for meaningful econometric analysis and inference in time series models.
- Volatility: The degree of variation or dispersion in the values of a time series, often associated with the magnitude of fluctuations or changes observed over time.
- Cyclicality: Patterns of fluctuation or movement observed in a time series that occur at irregular intervals and are not necessarily tied to calendar periods, representing periodic but non-seasonal behavior.
- Noise: Random or irregular fluctuations in a time series that are not attributable to underlying trends, patterns, or structural components, often representing measurement errors or unexplained variability.
- Ergodicity: A property of a stochastic process where the statistical properties observed over time converge to their theoretical or ensemble averages, enabling meaningful analysis of long-term behavior based on finite observations.
- Causality: The relationship between cause and effect in a time series context, where changes in one variable are directly responsible for changes in another. In practice this is often assessed via Granger causality, which tests whether past values of one series improve forecasts of another.
- Anomaly: Unusual or unexpected observations or events within a time series that deviate significantly from the expected or normal behavior, often indicating underlying abnormalities, errors, or noteworthy occurrences.
- Heteroskedasticity: The presence of non-constant variability or dispersion in the residuals of a regression model, indicating that the spread of residuals changes as a function of predictor variables.
- Correlation: A measure of the strength and direction of the linear relationship between two variables in a time series, indicating how changes in one variable correspond to changes in another.
- Variance Inflation Factor: A statistical measure used to detect multicollinearity in regression analysis, quantifying the degree to which the variance of an estimated regression coefficient is inflated due to correlation with other predictor variables.
- Co-integration: A statistical property of multiple time series that indicates a long-term equilibrium relationship among them, even though the individual series may be non-stationary, enabling the analysis of long-term dependencies and relationships.
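Several of these properties are easy to see in simulation. As a minimal sketch in plain NumPy (no time series library assumed, variable names are my own): a random walk has a unit root and is non-stationary, so its lag-1 autocorrelation sits near 1, while the autocorrelation of its first differences sits near 0.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample correlation between x[t] and x[t-1]."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

rng = np.random.default_rng(0)
shocks = rng.normal(size=1000)
random_walk = np.cumsum(shocks)   # unit-root (non-stationary) process
diffs = np.diff(random_walk)      # differencing recovers the stationary shocks

acf_levels = lag1_autocorr(random_walk)
acf_diffs = lag1_autocorr(diffs)
print(acf_levels, acf_diffs)      # levels near 1, differences near 0
```

In practice a formal test such as the augmented Dickey-Fuller test (available in statsmodels) is used instead of eyeballing autocorrelations, but the intuition is the same.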
Pre-Processing
- Autocorrelation: Autocorrelation measures the degree of linear relationship between a time series and a lagged version of itself. It assesses the extent to which past values of the series influence its current values. A high autocorrelation indicates a strong correlation between adjacent observations, suggesting the presence of underlying patterns or trends in the data.
- Covariance: Covariance quantifies the extent to which two random variables vary together. In the context of time series analysis, covariance measures the degree of linear association between two time series. A positive covariance indicates that the variables move in the same direction, while a negative covariance suggests they move in opposite directions. Covariance is a key metric for understanding the relationship between different variables and assessing their joint variability over time.
- Smoothing: Smoothing techniques are used to reduce the noise or variability in a time series by averaging out short-term fluctuations. These methods help reveal underlying trends or patterns in the data by removing random fluctuations that may obscure the signal of interest. Common smoothing techniques include moving averages and exponential smoothing, which apply weighted averages to consecutive observations to create a smoother representation of the data.
- Imputation methods: Imputation methods are used to estimate or fill in missing values in a time series dataset. Mean imputation replaces missing values with the mean of the observed data, median imputation uses the median, and interpolation estimates missing values based on the values of neighboring data points. These methods ensure that the dataset remains complete and suitable for analysis, minimizing the impact of missing data on statistical results.
- Transformation: Transformation techniques modify the structure or distribution of a time series to make it more suitable for analysis. Detrending removes long-term trends from the data, differencing calculates the differences between consecutive observations to stabilize the mean and remove trends, and normalization scales the data to a common range. These transformations help stabilize the statistical properties of the data and make it easier to interpret and model.
- Differencing: Differencing is a transformation technique that calculates the differences between consecutive observations in a time series. It is commonly used to stabilize the mean and remove trends or seasonality from the data, helping make it stationary and suitable for analysis.
- Detrending: Detrending is a preprocessing step that removes long-term trends or systematic patterns from a time series. It aims to isolate the short-term fluctuations or irregularities in the data, making it easier to analyze and interpret. Detrending techniques include polynomial fitting, moving averages, and regression analysis, which identify and remove the trend component from the series.
- Stationarizing: Stationarizing a time series involves transforming it into a stationary process, where the statistical properties such as mean, variance, and autocorrelation remain constant over time. Stationarity is desirable for many time series models as it simplifies the modeling process and ensures reliable forecasts. Techniques for stationarizing a time series include detrending, differencing, and transformation.
- Anomaly Detection: Anomaly detection involves identifying unusual or unexpected patterns in a time series that deviate from normal behavior. These anomalies may indicate errors, outliers, or significant events that require further investigation. Anomaly detection techniques include statistical methods, machine learning algorithms, and rule-based approaches, which flag observations or patterns that fall outside the expected range of values.
- Regime Shifts: Regime shifts refer to sudden, significant changes in the behavior or dynamics of a time series. These shifts mark transitions between different states or regimes, often resulting from external factors or underlying structural changes in the system.
- Normalization: Normalization is a preprocessing step that scales the values of a time series to a standard range, typically between 0 and 1, or to zero mean and unit variance. It ensures that all variables contribute equally to the analysis and prevents biases due to differences in scale or magnitude. Normalization techniques include min-max scaling, z-score normalization, and robust scaling, which transform the data to a common scale while preserving its relative relationships.
- Resampling: In time series work, resampling most often means changing the frequency of a series, either aggregating to a coarser interval (downsampling) or interpolating to a finer one (upsampling). The term is also used for generating new samples or subsets from an existing dataset to validate statistical estimates, assess the stability of models, and test hypotheses about the underlying data distribution.
- Windowing: Windowing involves dividing a time series into overlapping or non-overlapping windows or segments for analysis. It allows for localized analysis of temporal patterns or trends within specific intervals of the data. Windowing techniques include fixed-size windows, sliding windows, and exponential windows, which facilitate targeted analysis and visualization of different temporal behaviors or phenomena.
Modeling
- Autoregression: A linear predictive model that uses past values of a variable to forecast its future values.
- Moving average: A time series model that uses the weighted average of past errors to predict future values.
- Exponential smoothing: A forecasting method that assigns exponentially decreasing weights to past observations.
- ARIMA (Autoregressive Integrated Moving Average): A class of models that combines autoregression, differencing, and moving average components to capture complex time series patterns.
- State-space models: A flexible class of time series models that represent the observed data as the output of a dynamic system driven by unobserved state variables.
- Error correction models: Time series models that correct for deviations from long-run equilibrium relationships between variables.
- Neural Networks (RNN, LSTM): Flexible nonlinear models that can capture complex patterns in time series data, particularly for tasks like sequence prediction and time series classification.
- Vector autoregression (VAR): A multivariate time series model that generalizes autoregression to capture the interdependencies between multiple variables.
- Transfer function models: Time series models that relate the output of a system to its input through a linear filter.
- Wavelet analysis: A time-frequency analysis technique that decomposes a time series into different frequency components, allowing the identification of localized patterns.
- Time Series Clustering: Unsupervised method to categorize data points based on similarity, aiding in identifying archetypes or trends in sequential data.
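The simplest of these, an autoregression, is short enough to fit by hand. A sketch using least squares on one lag in NumPy (in practice a library such as statsmodels would handle estimation, lag selection, and diagnostics):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an AR(1) process: x[t] = phi * x[t-1] + noise
phi_true = 0.7
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = phi_true * x[t - 1] + rng.normal()

# Estimate phi by regressing x[t] on x[t-1] (least squares, no intercept)
phi_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])

# One-step-ahead forecast from the last observed value
forecast = phi_hat * x[-1]
print(phi_hat, forecast)
```

With enough data the estimate lands close to the true coefficient; ARIMA and VAR models are, at heart, elaborations of this same regression-on-lags idea.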
Post-Modeling
- Forecasting: The process of making predictions about the future based on analyzing past and present data.
- Prediction intervals: Statistical estimates that provide a range of likely values for a future observation, accounting for the uncertainty in the prediction.
- Backtesting: The practice of evaluating a model's performance by applying it to historical data to see how well it would have predicted the past.
- Monte Carlo simulations: A technique that uses random sampling to obtain numerical results, often used to model the probability of different outcomes in a process that cannot be easily predicted.
- Rolling window: A method of analyzing time series data by using a fixed-size window that is moved sequentially through the data, allowing for dynamic model updates.
- Bootstrapping: A resampling technique that creates multiple samples from the original data to estimate the sampling distribution of a statistic, providing a way to quantify uncertainty.
- Conformal prediction: A framework for constructing prediction sets that are guaranteed to contain the true value with a pre-specified probability, without making strong assumptions about the data-generating process.
- Confidence intervals: Statistical estimates that provide a range of likely values for an unknown parameter, reflecting the precision of the estimate.
- Model drift: The phenomenon where a model's performance degrades over time due to changes in the underlying data-generating process, requiring the model to be updated or retrained.
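A walk-forward backtest is simple to sketch. Here two deliberately trivial forecasters (stand-ins for a real model) predict one step ahead over held-out points of a trending series, and their mean absolute errors are compared:

```python
import numpy as np

rng = np.random.default_rng(2)
# A trending series: a naive last-value forecast should beat the historical mean
series = 0.5 * np.arange(200) + rng.normal(scale=1.0, size=200)

errors_naive, errors_mean = [], []
for t in range(100, len(series)):            # walk forward through held-out points
    history, actual = series[:t], series[t]
    errors_naive.append(abs(history[-1] - actual))    # forecast: last observed value
    errors_mean.append(abs(history.mean() - actual))  # forecast: in-sample mean

mae_naive = np.mean(errors_naive)
mae_mean = np.mean(errors_mean)
print(mae_naive, mae_mean)
```

Swapping the toy forecasters for a fitted model, and the expanding history for a fixed-size window, gives the rolling-window evaluation described above.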
The hope is that this helps someone starting out on time series data learn the terms experts use, and also gives me an easy link to share when explaining an idea.
Reference