Identifying the Possibility of Indirect Leaks in AI systems
Konyavskiy V*, Agapitov D
Moscow Institute of Physics and Technology, Moscow, Russia
Submission: December 04, 2025; Published: December 10, 2025
*Corresponding author: Konyavskiy V, Moscow Institute of Physics and Technology (National Research University), Moscow, Russia
How to cite this article: Konyavskiy V, Agapitov D. Identifying the Possibility of Indirect Leaks in AI systems. Robot Autom Eng J. 2025; 7(1): 555702. DOI: 10.19080/RAEJ.2025.07.555702
Annotation
The trustworthiness of artificial intelligence systems is determined, in particular, by the implementation of a set of technical information security measures. These measures must be reinforced with protection against new types of attacks aimed at extracting protected data from collections of reports (outputs). Leaks associated with this type of attack are referred to as “indirect leaks.” Determining whether indirect data leakage is possible in AI systems is a relevant scientific problem. To detect the possibility of indirect data leaks, a new method for analyzing sets of queries based on Jacobian matrices is proposed.
Problem statement
We consider the use of external (acquired) data. To improve model quality, we use data collected by different operators, such as banks, retailers, telecom operators, and insurance companies. The data are first enriched (in our case, by merging datasets from different operators) and then used in the model development process.
Each of these stages has its own specifics when it comes to working with datasets. An Enriched Dataset (ED) is formed by combining the existing datasets (D) of various operators. During model development, the developer’s access to the data may be restricted by classical information protection mechanisms. However, this does not exclude the possibility of computing the protected data (and/or gaining access to them) by other means. It is therefore necessary to determine whether an indirect data leak from the ED is possible during model development.
Application of Jacobian matrices and their extension to detect possible indirect data leaks
We assume the use of external data accumulated by $k$ Data Operators (DOs). Let us denote them as $DO_1, DO_2, \ldots, DO_k$. Each $DO_i$ accumulates a dataset $D_i$, $i = 1, \ldots, k$. Clearly, the dataset of each $DO_i$ grows constantly over time as relationships with new data subjects appear; thus, its size does not decrease. If $D_i$ contains $p_i$ features for each of $m_i$ subjects, then $D_i$ contains $d_i = m_i \cdot p_i$ data items.
First, an ED must be formed. For definiteness, let $D_1$ be the dataset being enriched. Merging $D_1$ with $D_i$, $i = 2, \ldots, k$, yields an ED containing
$$d = m \cdot p$$
data items, where $m$ is the number of rows (subjects) in the ED and $p$ is the number of columns (features) in the ED. Clearly, $m \ge m_1$, $p \ge p_1$, and $m \ll p$.
Note that enrichment is called horizontal when $p > p_1$ and vertical when $m > m_1$. In both cases, the ED contains $d = m \cdot p$ data items. Denote the entire collection of these data by $X$.
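The size bookkeeping above can be illustrated in a few lines of Python. This is a minimal sketch, not part of the authors' method: the helper name `describe_enrichment` and the example sizes are hypothetical, and it assumes merging never removes rows or columns of $D_1$. It only evaluates the counts and the horizontal/vertical definitions, not how the datasets are actually merged.

```python
def describe_enrichment(m1: int, p1: int, m: int, p: int) -> dict:
    """Summarize enriching D_1 (m1 subjects x p1 features) into an ED (m x p).

    Hypothetical helper; assumes merging never removes rows or columns of D_1.
    """
    if m < m1 or p < p1:
        raise ValueError("the ED cannot be smaller than D_1")
    return {
        "d1": m1 * p1,          # data items in D_1: d_1 = m_1 * p_1
        "d": m * p,             # data items in the ED: d = m * p
        "horizontal": p > p1,   # enrichment added features (columns)
        "vertical": m > m1,     # enrichment added subjects (rows)
    }


# Example: D_1 has 1000 subjects with 5 features; merging adds 7 features.
print(describe_enrichment(m1=1000, p1=5, m=1000, p=12))
# {'d1': 5000, 'd': 12000, 'horizontal': True, 'vertical': False}
```

Both flags can be true at once when the merge adds subjects and features simultaneously.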
Next comes model selection and training. Using the ED, the model developer chooses the best model from a family of standard models. Machine-learning technologies are well described in the literature; we only note that model quality is evaluated by computing $M$ values of a finite set of performance metrics. Denote the vector of metric values by $Y = (y_1, y_2, \ldots, y_M)$.
To compute these values, $M$ so-called IT-pipelines are used, each understood as a fixed sequence of operations on data (in accordance with [1]). In essence, this overall transformation can be described by a set of functions, which we denote by $F = \{f_1, f_2, \ldots, f_M\}$.
Using this notation, the goal of the model developer can be stated as follows: evaluate the quality of the model based on the estimates
$$Y = F(X), \quad \text{i.e.,} \quad y_j = f_j(X), \quad j = 1, \ldots, M. \qquad (*)$$
This is a formalization of standard practice. From the viewpoint of information security in terms of indirect leaks, we pose the following problem:
Is it possible to determine the value of some $x_i \in X$, given $F$ and $Y$?
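A toy illustration of this question, under purely hypothetical assumptions: a subject has three parameters, and two reports are released, the sum of all three and the sum of the last two. Neither report is $x_1$ itself, yet anyone who knows $F$ and $Y$ can compute $x_1$. The pipelines, values, and the helper `run_pipelines` are invented for this sketch.

```python
from typing import Callable, Sequence

# Hypothetical IT-pipelines: each maps a subject's parameters (x_1, x_2, x_3)
# to a single reported metric value.
Pipeline = Callable[[Sequence[float]], float]

F: list[Pipeline] = [
    lambda x: x[0] + x[1] + x[2],  # y_1 = x_1 + x_2 + x_3
    lambda x: x[1] + x[2],         # y_2 = x_2 + x_3
]


def run_pipelines(pipelines: list[Pipeline], x: Sequence[float]) -> list[float]:
    """Compute the report vector Y = F(x)."""
    return [f(x) for f in pipelines]


x = [42.0, 3.5, 7.25]        # protected values; the developer never sees them
Y = run_pipelines(F, x)      # the reports the developer does see
recovered_x1 = Y[0] - Y[1]   # someone who knows F and Y can still compute x_1
print(Y, recovered_x1)       # [52.75, 10.75] 42.0
```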
Written in the form (*), this problem can be reduced [1,2] to computing the rank of a Jacobian matrix.
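For the parameters of a subject of interest, (*) takes the form
$$
\begin{cases}
y_1 = f_1(x_1, x_2, \ldots, x_n),\\
y_2 = f_2(x_1, x_2, \ldots, x_n),\\
\quad\vdots\\
y_M = f_M(x_1, x_2, \ldots, x_n),
\end{cases}
\qquad (1)
$$
where $x_1, x_2, \ldots, x_n$ are the parameters of a subject of interest $x \in X$ from the dataset.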
Suppose the value of one of the functions in (1), say $y_j$, is uniquely determined by the values of the remaining functions:
$$y_j = \Phi(y_1, \ldots, y_{j-1}, y_{j+1}, \ldots, y_M). \qquad (2)$$
In this case, the function $y_j$ is said to depend on the others, and the functions in (1) are called dependent. Otherwise, the functions in (1) are called independent on the domain under consideration.
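As an illustration (a hypothetical system chosen for this example, not taken from the article), take $n = 2$ parameters and $M = 3$ functions:
$$y_1 = x_1 + x_2, \qquad y_2 = x_1 - x_2, \qquad y_3 = x_1^2 - x_2^2 .$$
Since $x_1^2 - x_2^2 = (x_1 + x_2)(x_1 - x_2)$, we have $y_3 = \Phi(y_1, y_2)$ with $\Phi(y_1, y_2) = y_1 y_2$ for every $(x_1, x_2)$, so these three functions are dependent in the sense just defined.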
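Now consider the Jacobian matrix composed of the partial derivatives of these functions with respect to all independent variables:
$$
J =
\begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n}\\
\vdots & \vdots & \ddots & \vdots\\
\dfrac{\partial f_M}{\partial x_1} & \dfrac{\partial f_M}{\partial x_2} & \cdots & \dfrac{\partial f_M}{\partial x_n}
\end{pmatrix}
\qquad (3)
$$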
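To make (3) concrete, consider two hypothetical pipelines over a subject with $n = 3$ parameters (an illustrative assumption, not from the article): $y_1 = x_1 + x_2 + x_3$ and $y_2 = x_2 + x_3$. Both are linear, so their Jacobian is constant:
$$J = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}.$$
Its two rows are not proportional, so $\operatorname{rk}(J) = 2 = M$.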
The rank $\operatorname{rk}(J)$ of the Jacobian matrix $J$ characterizes the independence of the functions; specifically, if $\operatorname{rk}(J) = M$ (which requires $M \le n$), then the functions in (1) are independent [2].
We can now determine whether an indirect leak of a parameter value is possible when the IT-pipelines described by (1) are applied. To do this, we add to (1) a function corresponding to a trivial pipeline, denoted $y_{M+1} = x_i$. We then construct matrix (3) for the extended set of IT-pipelines; its last row is the $i$-th unit vector, since $\partial x_i / \partial x_k$ equals 1 for $k = i$ and 0 otherwise. Next, assuming the original pipelines are independent, i.e. $\operatorname{rk}(J) = M$, we compute the rank of the extended Jacobian matrix $J^{+}$. If $\operatorname{rk}(J^{+}) = M + 1$, then all functions are independent, and an indirect leak is excluded. If, however, $\operatorname{rk}(J^{+}) = M$, then the added trivial IT-pipeline $y_{M+1} = x_i$ belongs to the set of dependent IT-pipelines, and we must conclude that an indirect leak of $x_i$ is possible. In other words, there exists a function $\Phi$ such that
$$x_i = \Phi(y_1, y_2, \ldots, y_M),$$
and thus the value of $x_i$ is computable.
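Continuing with the hypothetical pipelines $y_1 = x_1 + x_2 + x_3$ and $y_2 = x_2 + x_3$ (an illustrative assumption, for which $\operatorname{rk}(J) = 2 = M$), appending the trivial pipeline for $x_1$ and for $x_2$ gives
$$J^{+}_{(1)} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \qquad J^{+}_{(2)} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$$
In $J^{+}_{(1)}$ the last row equals the first row minus the second, so $\operatorname{rk}(J^{+}_{(1)}) = 2 = M$ and $x_1$ leaks; indeed, $x_1 = y_1 - y_2$. In contrast, $\det J^{+}_{(2)} = -1 \neq 0$, so $\operatorname{rk}(J^{+}_{(2)}) = 3 = M + 1$, and these two reports do not reveal $x_2$.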
We perform these steps for each parameter $x_i$ of the subject, thereby checking every $x_i$ for a possible indirect leak.
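The whole procedure can be sketched in Python. This is a minimal, non-authoritative sketch under assumptions of our own: pipelines are given as Python callables of the parameter vector, the Jacobian is approximated by central finite differences at a single point (so the rank condition is checked only locally, at that point), and the step `h` and rank tolerance `tol` are arbitrary choices. The helper names are hypothetical, and indices are 0-based, so index 0 corresponds to $x_1$.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Pipeline = Callable[[Vector], float]


def jacobian(funcs: list[Pipeline], x: Vector, h: float = 1e-6) -> list[list[float]]:
    """Approximate the Jacobian of `funcs` at point `x` by central differences."""
    rows = []
    for f in funcs:
        row = []
        for k in range(len(x)):
            xp, xm = list(x), list(x)
            xp[k] += h
            xm[k] -= h
            row.append((f(xp) - f(xm)) / (2 * h))
        rows.append(row)
    return rows


def numerical_rank(matrix: list[list[float]], tol: float = 1e-6) -> int:
    """Rank via Gaussian elimination with partial pivoting; |pivot| <= tol counts as zero."""
    a = [list(row) for row in matrix]
    n_rows = len(a)
    n_cols = len(a[0]) if a else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        if r == n_rows:
            break
        p = max(range(r, n_rows), key=lambda i: abs(a[i][c]))
        if abs(a[p][c]) <= tol:
            continue  # no usable pivot in this column
        a[r], a[p] = a[p], a[r]
        for i in range(r + 1, n_rows):
            factor = a[i][c] / a[r][c]
            a[i] = [v - factor * w for v, w in zip(a[i], a[r])]
        r += 1
    return r


def leakable_parameters(funcs: list[Pipeline], x: Vector) -> list[int]:
    """Return the indices i for which adding y_{M+1} = x_i leaves the rank at M."""
    M = len(funcs)
    if numerical_rank(jacobian(funcs, x)) < M:
        raise ValueError("rk(J) < M: the test below assumes independent pipelines")
    leaks = []
    for i in range(len(x)):
        def trivial(v: Vector, i: int = i) -> float:
            return v[i]  # the trivial IT-pipeline y_{M+1} = x_i
        if numerical_rank(jacobian(funcs + [trivial], x)) == M:
            leaks.append(i)
    return leaks


# Hypothetical pipelines: y_1 = x_1 + x_2 + x_3, y_2 = x_2 + x_3
F = [lambda v: v[0] + v[1] + v[2], lambda v: v[1] + v[2]]
print(leakable_parameters(F, [42.0, 3.5, 7.25]))  # [0], i.e. only x_1 is computable
```

A symbolic Jacobian would avoid the tolerance choice, but finite differences keep the sketch dependency-free and work for any pipeline that can be evaluated.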
Thus, for any set of IT-pipelines whose corresponding functions are continuously differentiable, this method determines whether an indirect leak is possible, thereby solving the problem of detecting indirect leaks for such IT-pipelines.
After a potential indirect leak is detected, one can either apply a noise-injection mechanism to the outputs of the IT-pipelines that permit the detected leak or simply block execution of that set of IT-pipelines, thus providing protection against the exploitation of indirect leaks.
If, for some reason, it is impossible to eliminate the dependence between the functions constituting the IT-pipelines, mechanisms of differential privacy can be applied to the corresponding report data.
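The article does not prescribe a particular noise mechanism. As one standard option from differential privacy, here is a minimal sketch of the Laplace mechanism: each released metric receives noise with scale $\Delta/\varepsilon$, where the sensitivity $\Delta$ and the privacy budget $\varepsilon$ are assumptions the operator must choose. The function names and the values below are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two Exp(1) draws, times scale."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def release_with_laplace(value: float, sensitivity: float, epsilon: float,
                         rng: random.Random) -> float:
    """Laplace mechanism: add Laplace noise with scale sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return value + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed only to make the demo reproducible
Y = [52.75, 10.75]      # reports from hypothetical pipelines y_1, y_2
# sensitivity=1.0 is an assumed bound, not derived from any real data
noisy_Y = [release_with_laplace(y, sensitivity=1.0, epsilon=0.5, rng=rng) for y in Y]
# The difference of the noisy reports will almost surely differ from x_1 = 42.0
print(noisy_Y, noisy_Y[0] - noisy_Y[1])
```

Under sequential composition, releasing both values with $\varepsilon = 0.5$ each consumes a total budget of $\varepsilon = 1$.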
Conclusion
The introduced notion of an “indirect leak” reveals specific properties of AI systems that must be taken into account when building trusted systems. Based on the general theory of functional dependence, a new analysis method using extended Jacobian matrices is proposed to study the possibility of indirect leaks. The proposed method can be extended to metrics (IT-pipelines) whose associated functions are, in the general case, not continuously differentiable over their entire domain. This solves the posed problem of detecting the possibility of indirect data leaks in AI systems.
References
- Konyavsky VA, Konyavskaya-Schastnaya SV, Ross GV, Raigorodskiy AM, Trenin SA, et al. (2024) “Blind” processing technology for external data in machine learning systems. Voprosy zashchity informatsii [Information Security Issues]. Moscow 2: 17-32.
- Fikhtengolts GM (1997) Course of Differential and Integral Calculus. Vol. 1. Saint Petersburg: Lan.