Data Selves & Doubles in Digital Identity

Updated 22 September 2025

Data selves and data doubles are conceptual frameworks defining personal digital footprints and externally assembled profiles that reshape digital identity.
They are formed through self-tracking, algorithmic aggregation, and comparative analytics that translate raw data into actionable insights.
This topic explores the ethical, governance, and privacy challenges arising from the dual roles of self-curated and externally constructed digital identities.

Data selves and data doubles are conceptual frameworks used to describe how personal identity is represented, constructed, and operationalized in the digital era. These constructs span self-cultivated digital footprints, externally assembled data profiles, ethically charged governance architectures, and speculative futures involving highly realistic digital copies. The following sections detail their definitions, formation processes, philosophical tensions, technical mechanisms, governance issues, and ethical ramifications, drawing on multidisciplinary research spanning philosophy of data, participatory design, policy studies, computational analytics, and AI security.

1. Conceptual Distinctions and Historical Evolution

Data selves refer to digital traces intentionally generated or curated by individuals—such as self-tracking logs, social media posts, and quantified self artifacts—through which users enact, monitor, and reflect on their own identities (Gorichanaz, 15 Sep 2025). These are aggregations of person-relevant data, shaped by agency and self-reflection, and are historically rooted in practices like journaling and diary keeping but technologically scaled in the age of wearables and pervasive computing.

Data doubles signify externally constructed, opaque digital profiles—assembled by algorithms, corporations, or third parties—used to infer, predict, and manipulate user behaviors. This externalization includes behavioral dossiers built from shopping history, demographic attributes, and browsing patterns, generally beyond individual control and often serving commercial or bureaucratic interests (Gorichanaz, 15 Sep 2025, Gutierrez, 2017).

Gutierrez (2017) traces the genealogy of data from its Latin origin ("datum") and identifies the transformation of data into both an abstract, context-dependent substance and an objective material entity. The formation of data selves and doubles is a modern extension of the separation between the act of data production and its subsequent interpretative use.

2. Formation, Aggregation, and Representation Processes

The construction of data selves and data doubles operates across several mechanisms:

Self-Tracking and Data-Object Design: Artifacts such as data-objects embed self-tracking logs in tactile or meaningful physical objects (e.g., ski boots triggering activity tracking or mugs encoding sleep data), offering narrative-rich, contextually grounded venues for self-reflection and bodily interplay (Karyda et al., 2020). These artifacts translate numerical metrics into lived experiences, mediating between embodied existence and externalized representation.
Algorithmic Profiling and Inference: Aggregation functions synthesize discrete data fragments ($\{D_1, D_2, \ldots, D_n\}$) into comprehensive profiles, written as $F = f(D_1, D_2, \ldots, D_n)$ or, when value and privacy components are separated, as $F = g(V, P)$, thereby operationalizing data doubles as predictive and analytic resources (Kleek, 2020).
Comparative Analytics: Systems such as CALTREND enable the juxtaposition of personal logs with anonymized data from others (using t-SNE for dimensionality reduction), expanding understanding of selfhood via comparative visualizations and differential analysis (Shin et al., 1 May 2025).
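
The aggregation step $F = f(D_1, D_2, \ldots, D_n)$ described above can be illustrated with a minimal Python sketch. It is not the formulation used in the cited work: the function names, the example fragments, and the `SENSITIVE` attribute set are all hypothetical, and the split into value and privacy components is only one possible reading of $F = g(V, P)$.

```python
from collections import defaultdict

# Hypothetical choice of which attributes count as privacy-sensitive.
SENSITIVE = {"postcode"}


def aggregate(fragments):
    """F = f(D_1, ..., D_n): merge per-source fragments into one profile.

    Each fragment is a dict of attribute -> value. Values seen for the same
    attribute are collected in order of first appearance, without duplicates.
    """
    profile = defaultdict(list)
    for fragment in fragments:
        for attribute, value in fragment.items():
            if value not in profile[attribute]:
                profile[attribute].append(value)
    return dict(profile)


def split_value_privacy(profile):
    """Partition a profile into (V, P): non-sensitive and sensitive parts."""
    value = {k: v for k, v in profile.items() if k not in SENSITIVE}
    privacy = {k: v for k, v in profile.items() if k in SENSITIVE}
    return value, privacy


# Three fragments, as might come from a shop, a search log, and a second shop.
fragments = [
    {"postcode": "90210", "purchase": "running shoes"},
    {"postcode": "90210", "search": "marathon training plan"},
    {"purchase": "energy gels"},
]

profile = aggregate(fragments)
value_part, privacy_part = split_value_privacy(profile)
print(profile)
print(value_part, privacy_part)
```

Even in this toy case, no single fragment says much, while the merged profile suggests a runner in a specific postcode, which is the kind of inference that makes a data double useful to its assembler.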

Personal data may accrue through explicit interactions (manual logging, social media sharing) or implicit mechanisms (sensor data, behavioral tracking via connected devices) (Banerjee et al., 2020, Tran et al., 2024). Aggregation at scale leads to data doubles capable of reflecting, challenging, or even imposing new behavioral norms, as illustrated in lifelogging paradigms and algorithmic recommendations.

3. Philosophical and Sociological Implications

The material-abstract duality of data (Gutierrez, 2017) anchors several philosophical tensions:

Identity and Authenticity: Machine-assembled data doubles threaten to reduce multidimensional human identity to algorithmically convenient patterns. The decontextualization and fragmentation inherent in large-scale databases risk misrepresenting lived experience (Gutierrez, 2017, Gorichanaz, 15 Sep 2025).
Agency and Control: Recurrent separation between the production and usage of personal data means individuals rarely direct the creation or activation of their data doubles (Gutierrez, 2017, Verhulst, 2022, Banerjee et al., 2020). The resulting asymmetries (data, information, agency) produce structural vulnerabilities, particularly for marginalized groups (Verhulst, 2022).
Collectivization and the Commons: The management of data as a shared resource (data co-ops, citizen data commons) reconfigures the locus of control from privatized corporate silos to collective stewardship, raising questions of equitable access, responsibility, and negotiation (Banerjee et al., 2020, Verhulst, 2022).

These tensions play out in cultural contexts where practices such as self-tracking serve both as tools for self-improvement and as mechanisms of workplace self-surveillance, as observed in the culture of overwork in China (Zheng, 2024).

4. Technical Architectures and Analytical Models

The representation and analysis of data selves/doubles employ a variety of computational and visualization techniques:

Dimensionality Reduction and Visualization: t-SNE and UMAP project high-dimensional behavioral data into interpretable clusters, enabling comparative analytics and critical algorithmic literacy (Kondo et al., 23 Apr 2025, Shin et al., 1 May 2025). The t-SNE cost function, for example,

$C = \sum_{i \neq j} p_{ij} \log \left( \frac{p_{ij}}{q_{ij}} \right)$

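operationalizes similarity modeling for clustering behaviors and schedules: it is the Kullback-Leibler divergence between the pairwise similarities $p_{ij}$ in the original high-dimensional space and $q_{ij}$ in the low-dimensional embedding.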
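
To make the cost function concrete, the following Python sketch evaluates $C$ for small, hand-written similarity matrices. It is only an illustration: in t-SNE the $p_{ij}$ and $q_{ij}$ are derived from the data and the embedding, whereas here the matrices `P`, `Q_close`, and `Q_uniform` are made-up values chosen so that their off-diagonal entries sum to 1.

```python
import math


def tsne_cost(P, Q):
    """C = sum over i != j of p_ij * log(p_ij / q_ij).

    Terms with p_ij == 0 are skipped, following the usual convention
    that 0 * log(0) contributes nothing to the sum.
    """
    n = len(P)
    cost = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and P[i][j] > 0.0:
                cost += P[i][j] * math.log(P[i][j] / Q[i][j])
    return cost


# Made-up high-dimensional similarities: items 0 and 1 are close neighbours.
P = [
    [0.0, 0.4, 0.1],
    [0.4, 0.0, 0.0],
    [0.1, 0.0, 0.0],
]

# An embedding that roughly preserves the neighbourhood structure of P.
Q_close = [
    [0.0, 0.35, 0.1],
    [0.35, 0.0, 0.05],
    [0.1, 0.05, 0.0],
]

# An embedding that places all items at equal similarity.
Q_uniform = [
    [0.0, 1 / 6, 1 / 6],
    [1 / 6, 0.0, 1 / 6],
    [1 / 6, 1 / 6, 0.0],
]

print(tsne_cost(P, P))
print(tsne_cost(P, Q_close))
print(tsne_cost(P, Q_uniform))
```

The cost is zero when the embedding reproduces the original similarities exactly, and it is lower for `Q_close` than for `Q_uniform`, which is why minimizing $C$ pulls similar behaviors into the same cluster.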

Embedding-based Search and Retrieval: Deep neural architectures (AlexNet, VGG, ResNet) and multimodal transformers (CLIP) underpin lifelog search systems, permitting cosine similarity-driven cross-modal retrievals (Tran et al., 2024).
Explanatory Interfaces: LLM-based ‘hypothetical inference’ generates interpretable summaries or simulated algorithmic recommendations, illuminating the translation of digital footprints into inferred platform profiles (Kondo et al., 23 Apr 2025).
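
The cosine-similarity retrieval mentioned above can be sketched in a few lines of Python. This is a simplified illustration, not the cited lifelog system: the three-dimensional vectors and the item names are invented stand-ins for embeddings that a model such as CLIP would produce, and `retrieve` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def retrieve(query_vector, index, top_k=2):
    """Return the ids of the top_k items most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), item_id)
        for item_id, vector in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:top_k]]


# Invented embeddings for three lifelog images (assumption, not real data).
index = {
    "photo_breakfast": [0.9, 0.1, 0.0],
    "photo_commute": [0.1, 0.9, 0.2],
    "photo_gym": [0.0, 0.2, 0.9],
}

# Invented embedding of a text query; in a cross-modal system the text and
# image encoders map into the same vector space, which makes this comparable.
query = [0.8, 0.2, 0.1]

results = retrieve(query, index)
print(results)
```

Because ranking depends only on the angle between vectors, the same routine serves text-to-image and image-to-image search once all items share one embedding space.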

These tools facilitate self-reflection, critical comparison, and temporal tracking of self-representation, but also reinforce the power of digital doubles to influence perceptions and decisions.

5. Governance, Policy, and Data Control

Effective management of data selves and doubles requires coordinated action across individual, technical, organizational, and policy spheres:

Individual Control Frameworks: Proposed mechanisms include data co-ops (collective bargaining and shared gains), federated personal data stores (granular control over access/sharing), and trusted data spaces (Banerjee et al., 2020, Verhulst, 2022, Tran et al., 2024).
Digital Self Determination (DSD): DSD is a principle and practice encompassing both individual and collective rights to self-govern digital identity, using operational frameworks involving processes, people/organizations, policies, and products/technologies (Verhulst, 2022). The operational formula

$\mathrm{DSD} = f(\text{Processes}, \text{People \& Organizations}, \text{Policies}, \text{Products \& Technologies})$

encapsulates the four-pronged approach toward agency and accountability.

Policy Recommendations: Questions of who profits from data, which rights are enforceable, and how fairness and transparency are guaranteed are central. Regulatory solutions span charters, codes of conduct, and enforceable social licenses, particularly for vulnerable and marginalized populations (Verhulst, 2022, Banerjee et al., 2020).

The passage from private self-tracking to societal-scale “data commons” reconfigures the balance between autonomy and societal benefit, urging re-examination of ownership, consent, and stewardship.

6. Ethical Risks and Future Challenges

Ethical considerations encompass privacy, fairness, manipulation, and surveillance:

Privacy and Agency Risks: Re-identification, bias, and manipulation are prominent threats when data doubles are deployed for decisions, surveillance, or behavioral engineering. Even deidentified datasets are vulnerable to inference attacks (Banerjee et al., 2020, Helbing et al., 2022).
Identity Theft and AI Security: AI amplifies the scale and sophistication of identity theft via neural fuzzing, deepfake vishing, and synthetic identity fraud. The process of identification can be formalized as

$R(I) = f(O, A, T)$

where $O$ is the partial identity object, $A$ is agency over the data, and $T$ denotes the techniques used for verification (Gorichanaz, 15 Sep 2025). The resulting arms race between criminals and law enforcement increasingly leverages AI on both sides, creating dynamic risks for end users.

Digital Twins and Bi-directional Modeling: The emergence of digital twins—dynamic, real-time digital replicas—expands the scope of data doubles into predictive and prescriptive domains, with applications from healthcare to urban policy, but also with threats of exacerbated agency loss and discrimination (Helbing et al., 2022).
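
The re-identification risk noted above for deidentified datasets can be illustrated with a simple uniqueness check over quasi-identifiers, in the spirit of k-anonymity. The sketch below is an assumption-laden toy: the records, the field names, and the helper `smallest_group_size` are hypothetical and are not drawn from the cited studies.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same
    quasi-identifier values. A result of 1 means at least one
    record is unique and therefore easy to single out."""
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    return min(Counter(keys).values())


# Toy "deidentified" records: names removed, quasi-identifiers kept.
records = [
    {"zip": "02139", "birth_year": 1985, "sex": "F"},
    {"zip": "02139", "birth_year": 1985, "sex": "F"},
    {"zip": "02139", "birth_year": 1990, "sex": "M"},
    {"zip": "02140", "birth_year": 1990, "sex": "M"},
]

k_full = smallest_group_size(records, ["zip", "birth_year", "sex"])
k_coarse = smallest_group_size(records, ["birth_year", "sex"])
print(k_full, k_coarse)
```

With all three fields, some records are unique (`k_full` is 1), so anyone holding an outside list with the same fields could link them to a person. Dropping the postcode raises the smallest group to 2, which shows the trade-off between data detail and exposure.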

Future architectures may employ blockchain for tamper-resistant identity, AI for anomaly detection, and enhanced participatory approaches for governance. Nevertheless, the need for systematic, regulatory solutions and privacy-preserving practices remains critical amidst rapid technological change (Gorichanaz, 15 Sep 2025, Verhulst, 2022).

7. Comparative and Contextual Analytics

Comparative analytics systems operationalize the duality of data selves and doubles by contextualizing individual behaviors within broader reference groups, employing visual juxtaposition and domain-adaptable interfaces (Shin et al., 1 May 2025). These approaches enable domain experts (e.g., in healthcare, marketing) to extract mental-model-specific insights, facilitate the discovery of hidden patterns, and support enriched self-understanding. The inherent challenges include privacy (risk of re-identification), interpretative complexity (adaptation to various professional frameworks), and ethical requirements for user-centered customization and fairness.

Data selves and data doubles are dynamic, interdependent constructs whose technical, ethical, and philosophical complexities underpin modern debates on digital identity, agency, governance, and social justice. Technological developments, ranging from lifelogging and object-theater artifacts to comparative analytics and digital twins, continuously reshape their contours and implications. The management and protection of digital selfhood require robust frameworks for individual and collective agency, participatory governance, and privacy, along with ongoing adaptation to emergent threats and possibilities in the datafied society.