Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Santtu Tikka

Clustering and Pruning in Causal Data Fusion

May 21, 2025

Otto Tabell, Santtu Tikka, Juha Karvanen

Abstract:Data fusion, the process of combining observational and experimental data, can enable the identification of causal effects that would otherwise remain non-identifiable. Although identification algorithms have been developed for specific scenarios, do-calculus remains the only general-purpose tool for causal data fusion, particularly when variables are present in some data sources but not others. However, approaches based on do-calculus may encounter computational challenges as the number of variables increases and the causal graph grows in complexity. Consequently, there exists a need to reduce the size of such models while preserving the essential features. For this purpose, we propose pruning (removing unnecessary variables) and clustering (combining variables) as preprocessing operations for causal data fusion. We generalize earlier results on a single data source and derive conditions for applying pruning and clustering in the case of multiple data sources. We give sufficient conditions for inferring the identifiability or non-identifiability of a causal effect in a larger graph based on a smaller graph and show how to obtain the corresponding identifying functional for identifiable causal effects. Examples from epidemiology and social science demonstrate the use of the results.

Via

Access Paper or Ask Questions

Transition Network Analysis: A Novel Framework for Modeling, Visualizing, and Identifying the Temporal Patterns of Learners and Learning Processes

Nov 23, 2024

Mohammed Saqr, Sonsoles López-Pernas, Tiina Törmänen, Rogers Kaliisa, Kamila Misiejuk, Santtu Tikka

Figure 1 for Transition Network Analysis: A Novel Framework for Modeling, Visualizing, and Identifying the Temporal Patterns of Learners and Learning Processes

Figure 2 for Transition Network Analysis: A Novel Framework for Modeling, Visualizing, and Identifying the Temporal Patterns of Learners and Learning Processes

Figure 3 for Transition Network Analysis: A Novel Framework for Modeling, Visualizing, and Identifying the Temporal Patterns of Learners and Learning Processes

Figure 4 for Transition Network Analysis: A Novel Framework for Modeling, Visualizing, and Identifying the Temporal Patterns of Learners and Learning Processes

Abstract:This paper proposes a novel analytical framework: Transition Network Analysis (TNA), an approach that integrates Stochastic Process Mining and probabilistic graph representation to model, visualize, and identify transition patterns in the learning process data. Combining the relational and temporal aspects into a single lens offers capabilities beyond either framework, including centralities to capture important learning events, community finding to identify patterns of behavior, and clustering to reveal temporal patterns. This paper introduces the theoretical and mathematical foundations of TNA. To demonstrate the functionalities of TNA, we present a case study with students (n=191) engaged in small-group collaboration to map patterns of group dynamics using the theories of co-regulation and socially-shared regulated learning. The analysis revealed that TNA could reveal the regulatory processes and identify important events, temporal patterns and clusters. Bootstrap validation established the significant transitions and eliminated spurious transitions. In doing so, we showcase TNA's utility to capture learning dynamics and provide a robust framework for investigating the temporal evolution of learning processes. Future directions include advancing estimation methods, expanding reliability assessment, exploring longitudinal TNA, and comparing TNA networks using permutation tests.

* Accepted at Learning Analytics & Knowledge (LAK '25)

Via

Access Paper or Ask Questions

Simulating counterfactuals

Jun 27, 2023

Juha Karvanen, Santtu Tikka, Matti Vihola

Abstract:Counterfactual inference considers a hypothetical intervention in a parallel world that shares some evidence with the factual world. If the evidence specifies a conditional distribution on a manifold, counterfactuals may be analytically intractable. We present an algorithm for simulating values from a counterfactual distribution where conditions can be set on both discrete and continuous variables. We show that the proposed algorithm can be presented as a particle filter leading to asymptotically valid inference. The algorithm is applied to fairness analysis in credit scoring.

Via

Access Paper or Ask Questions

Clustering and Structural Robustness in Causal Diagrams

Nov 08, 2021

Santtu Tikka, Jouni Helske, Juha Karvanen

Figure 1 for Clustering and Structural Robustness in Causal Diagrams

Figure 2 for Clustering and Structural Robustness in Causal Diagrams

Figure 3 for Clustering and Structural Robustness in Causal Diagrams

Figure 4 for Clustering and Structural Robustness in Causal Diagrams

Abstract:Graphs are commonly used to represent and visualize causal relations. For a small number of variables, this approach provides a succinct and clear view of the scenario at hand. As the number of variables under study increases, the graphical approach may become impractical, and the clarity of the representation is lost. Clustering of variables is a natural way to reduce the size of the causal diagram but it may erroneously change the essential properties of the causal relations if implemented arbitrarily. We define a specific type of cluster, called transit cluster, that is guaranteed to preserve the identifiability properties of causal effects under certain conditions. We provide a sound and complete algorithm for finding all transit clusters in a given graph and demonstrate how clustering can simplify the identification of causal effects. We also study the inverse problem, where one starts with a clustered graph and looks for extended graphs where the identifiability properties of causal effects remain unchanged. We show that this kind of structural robustness is closely related to transit clusters.

Via

Access Paper or Ask Questions

Identifying Causal Effects via Context-specific Independence Relations

Sep 21, 2020

Santtu Tikka, Antti Hyttinen, Juha Karvanen

Figure 1 for Identifying Causal Effects via Context-specific Independence Relations

Figure 2 for Identifying Causal Effects via Context-specific Independence Relations

Figure 3 for Identifying Causal Effects via Context-specific Independence Relations

Figure 4 for Identifying Causal Effects via Context-specific Independence Relations

Abstract:Causal effect identification considers whether an interventional probability distribution can be uniquely determined from a passively observed distribution in a given causal structure. If the generating system induces context-specific independence (CSI) relations, the existing identification procedures and criteria based on do-calculus are inherently incomplete. We show that deciding causal effect non-identifiability is NP-hard in the presence of CSIs. Motivated by this, we design a calculus and an automated search procedure for identifying causal effects in the presence of CSIs. The approach is provably sound and it includes standard do-calculus as a special case. With the approach we can obtain identifying formulas that were unobtainable previously, and demonstrate that a small number of CSI-relations may be sufficient to turn a previously non-identifiable instance to identifiable.

* Appeared at 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

Via

Access Paper or Ask Questions

Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-based Approach

Feb 28, 2019

Santtu Tikka, Antti Hyttinen, Juha Karvanen

Figure 1 for Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-based Approach

Figure 2 for Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-based Approach

Figure 3 for Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-based Approach

Figure 4 for Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-based Approach

Abstract:Causal effect identification considers whether an interventional probability distribution can be uniquely determined without parametric assumptions from measured source distributions and structural knowledge on the generating system. While complete graphical criteria and procedures exist for many identification problems, there are still challenging but important extensions that have not been considered in the literature. To tackle these new settings, we present a search algorithm directly over the rules of do-calculus. Due to generality of do-calculus, the search is capable of taking more advanced data-generating mechanisms into account along with an arbitrary type of both observational and experimental source distributions. The search is enhanced via a heuristic and search space reduction techniques. The approach, called do-search, is provably sound, and it is complete with respect to identifiability problems that have been shown to be completely characterized by do-calculus. When extended with additional rules, the search is capable of handling missing data problems as well. With the versatile search, we are able to approach new problems such as combined transportability and selection bias, or multiple sources of selection bias. We also perform a systematic analysis of bivariate missing data problems and study causal inference under case-control design.

* 32 pages, 11 figures

Via

Access Paper or Ask Questions

Surrogate Outcomes and Transportability

Jun 21, 2018

Santtu Tikka, Juha Karvanen

Figure 1 for Surrogate Outcomes and Transportability

Figure 2 for Surrogate Outcomes and Transportability

Figure 3 for Surrogate Outcomes and Transportability

Figure 4 for Surrogate Outcomes and Transportability

Abstract:Identification of causal effects is one of the most fundamental tasks of causal inference. We consider a variant of the identifiability problem where a causal effect of interest is not identifiable from observational data alone but some experimental data is available for the identification task. This corresponds to a real-world setting where experiments were conducted on a set of variables, which we call surrogate outcomes, but the variables of interest were not measured. This problem is a generalization of identifiability using surrogate experiments and we label it as surrogate outcome identifiability and show that the concept of transportability provides a sufficient criteria for determining surrogate outcome identifiability for a large class of queries.

* Submitted to International Journal of Approximate Reasoning

Via

Access Paper or Ask Questions

Enhancing Identification of Causal Effects by Pruning

Jun 19, 2018

Santtu Tikka, Juha Karvanen

Figure 1 for Enhancing Identification of Causal Effects by Pruning

Figure 2 for Enhancing Identification of Causal Effects by Pruning

Figure 3 for Enhancing Identification of Causal Effects by Pruning

Figure 4 for Enhancing Identification of Causal Effects by Pruning

Abstract:Causal models communicate our assumptions about causes and effects in real-world phe- nomena. Often the interest lies in the identification of the effect of an action which means deriving an expression from the observed probability distribution for the interventional distribution resulting from the action. In many cases an identifiability algorithm may return a complicated expression that contains variables that are in fact unnecessary. In practice this can lead to additional computational burden and increased bias or inefficiency of estimates when dealing with measurement error or missing data. We present graphical criteria to detect variables which are redundant in identifying causal effects. We also provide an improved version of a well-known identifiability algorithm that implements these criteria.

* Journal of Machine Learning Research (JMLR), 18(194):1-23, 2018
* This is the version published in JMLR

Via

Access Paper or Ask Questions

Simplifying Probabilistic Expressions in Causal Inference

Jun 19, 2018

Santtu Tikka, Juha Karvanen

Figure 1 for Simplifying Probabilistic Expressions in Causal Inference

Figure 2 for Simplifying Probabilistic Expressions in Causal Inference

Figure 3 for Simplifying Probabilistic Expressions in Causal Inference

Figure 4 for Simplifying Probabilistic Expressions in Causal Inference

Abstract:Obtaining a non-parametric expression for an interventional distribution is one of the most fundamental tasks in causal inference. Such an expression can be obtained for an identifiable causal effect by an algorithm or by manual application of do-calculus. Often we are left with a complicated expression which can lead to biased or inefficient estimates when missing data or measurement errors are involved. We present an automatic simplification algorithm that seeks to eliminate symbolically unnecessary variables from these expressions by taking advantage of the structure of the underlying graphical model. Our method is applicable to all causal effect formulas and is readily available in the R package causaleffect.

* Journal of Machine Learning Research (JMLR), 18(36):1-30, 2017
* This is the version published in JMLR

Via

Access Paper or Ask Questions