Model Details

  • Research Project: GDS (Geometrodynamic Semantics)
  • Version: 1.1 (as of October 2025)
  • Researcher: Mihai A. Mateescu
  • Initiative: Independent Research & Development Genesis
  • Contact: mihai.mateescu@web.de
  • Core Languages: Rust, Julia
  • Technology Stack:
    • Core Logic & Orchestration: Rust
    • Numerical Backend (HDC): Julia (via FFI)
    • Storage Format: Apache Parquet (with ZSTD compression)
    • Data Pipeline / In-Memory: Apache Arrow
    • Checkpointing & Dynamic Overlay: LMDB (via heed)
    • k-NN Indexing (planned): FAISS (Binary)

About This Research

This document presents a research prototype exploring physics-inspired artificial intelligence. This work is conducted independently without institutional affiliation. For background and collaboration opportunities, see About the Researcher.

Executive Summary & Vision

GDS (Geometrodynamic Semantics) is a research prototype exploring an alternative to traditional, statistics-based Transformer architectures. Rather than predicting the next token, GDS models semantic reasoning as a physical phenomenon.

Inspired by Einstein’s theory of General Relativity, GDS treats concepts as “semantic particles” possessing intrinsic properties: mass (semantic importance), charge (the hyperdimensional vector), and spin (affective value). These particles are generated by the CSI-HDC (Conceptual State Injector using Hyperdimensional Computing)—a semantic tokenizer that replaces traditional token sequences with 20,000-dimensional binary hypervectors.
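
To make this concrete, a semantic particle can be sketched in Rust roughly as follows. The field names and bit-packing layout are illustrative assumptions made for this document, not the project's actual type definitions.

```rust
/// Illustrative sketch of a GDS "semantic particle"; field names and the
/// bit-packing layout are assumptions, not the project's actual definitions.
pub struct SemanticParticle {
    pub id: String,       // e.g. "en/king"
    pub mass: f32,        // m0: semantic importance
    pub spin: [f32; 3],   // affective value as (Valence, Arousal, Dominance)
    pub charge: Vec<u64>, // 20,000-bit binary hypervector, packed into 64-bit words
}

impl SemanticParticle {
    /// Hamming distance between two charges, the basic similarity measure in binary HDC.
    pub fn hamming(&self, other: &Self) -> u32 {
        self.charge
            .iter()
            .zip(&other.charge)
            .map(|(a, b)| (a ^ b).count_ones())
            .sum()
    }
}
```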

The CSI-HDC’s output is not a flat sequence of tokens, but a dynamic field of interacting particles. When processed by the GDS engine, this field warps a high-dimensional “conceptual space”. Reasoning is then modeled as finding the path of least resistance—a geodesic—through this curved semantic manifold.

Learning occurs not through backpropagation, but through a Hebbian-style mechanism that modifies the geometry of the space itself. A dynamic Overlay layer adds contextual adjustments to edge costs in the graph. Successful reasoning paths are reinforced, making them “cheaper” and more likely in future queries. This process is governed by internal evaluation and a ValidationGate, enabling autonomous learning based on coherence principles rather than direct supervision.

The result is a research prototype demonstrating efficient, scalable, and—most importantly—explainable semantic reasoning, where every path can be audited and understood step-by-step.

Detailed Architecture (How the System Reasons)

GDS Knowledge Graph Overview

The GDS cognitive architecture is a multi-layered research prototype where each layer has a distinct responsibility, from static data storage to dynamic, adaptive learning. The reasoning process emerges from the interaction of these layers.

The 5 Layers of GDS

  1. Semantic Base (The Static Universe)
    • Component: A large-scale, compressed Parquet file containing the lexicon of all “semantic particles”.
    • Role: This is the foundational, long-term memory of the system. It contains millions of concepts, each with its pre-calculated mass (m0), affective spin (VAD), and a unique 20,000-bit HDC vector (q).
    • Includes: A static graph of structural edges derived from curated knowledge bases (e.g., ConceptNet’s IsA relation).
  2. Proximity Graph (The Implicit Network)
    • Component: A graph layer constructed on top of the Semantic Base. Its crucial feature is the inclusion of proximity edges.
    • Role: These edges are not explicit in the source data. They are discovered by performing a k-Nearest Neighbors (k-NN) search (using FAISS) on the HDC vectors. This allows the model to create novel connections between semantically similar concepts, even if they were not explicitly linked in any knowledge base. This graph represents the fabric of the “conceptual space”.
  3. Context Overlay (The Ephemeral Mind)
    • Component: A dynamic key-value store (LMDB) that maps graph edges to a delta value.
    • Role: This is the model’s short-term, contextual memory. It holds temporary adjustments to the “cost” of traversing an edge. When the model learns, it doesn’t modify the static graph; it simply adds a small positive (penalty) or negative (reinforcement) delta to this overlay. It is volatile and session-specific by default.
  4. Geodesic Runtime (The “Thinker”)
    • Component: The Reasoner module, which implements a graph-traversal algorithm (A*).
    • Role: This is the active part of the model. When given a start and a goal concept, the Reasoner does not just find the shortest path; it finds the path of least cost. The cost function is a sophisticated, weighted sum that makes the process “geodesic”:
      • \(Cost(edge) = \alpha \cdot (1/m_0) + \beta \cdot (\Delta VAD) + \gamma \cdot (1/rel_{strength}) + \lambda \cdot (Overlay_{\Delta})\)
    • This means the Reasoner naturally prefers paths that go through important concepts (high m0), avoid sharp emotional shifts (low ΔVAD), follow strong structural relations, and are influenced by recent learning (the Overlay delta). A code sketch of this cost function follows the list below.
  5. Learning Loop (The “Neuroplasticity”)
    • Component: The learn and gating modules.
    • Role: This layer implements the model’s ability to adapt. After a reasoning task, an internal or external evaluation can trigger a learning event. The learn_edges function applies Hebbian-style updates to the Context Overlay. A ValidationGate then determines if these temporary changes have improved the model’s overall performance on a set of evaluation tasks before they are consolidated into a more permanent, versioned overlay.
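
For illustration, the layer-4 cost function above could be sketched in Rust as follows. The struct fields, coefficient values, and names are assumptions chosen for readability, not the actual Reasoner code.

```rust
// Illustrative sketch of the geodesic edge-cost function from layer 4:
//   Cost(edge) = alpha*(1/m0) + beta*dVAD + gamma*(1/rel_strength) + lambda*overlay_delta
// All names and coefficient values are assumptions, not the actual GDS code.
struct CostWeights {
    alpha: f64,  // weight of semantic importance (1/m0)
    beta: f64,   // weight of the affective shift along the edge
    gamma: f64,  // weight of structural relation strength
    lambda: f64, // weight of the contextual Overlay delta
}

struct EdgeView {
    target_mass: f64,   // m0 of the concept the edge leads to
    delta_vad: f64,     // affective (VAD) distance between source and target
    rel_strength: f64,  // strength of the structural relation, in (0, 1]
    overlay_delta: f64, // adjustment read from the Context Overlay (may be negative)
}

fn edge_cost(w: &CostWeights, e: &EdgeView) -> f64 {
    w.alpha * (1.0 / e.target_mass)
        + w.beta * e.delta_vad
        + w.gamma * (1.0 / e.rel_strength)
        + w.lambda * e.overlay_delta
}

fn main() {
    let w = CostWeights { alpha: 1.0, beta: 0.5, gamma: 1.0, lambda: 1.0 };
    let e = EdgeView { target_mass: 8.0, delta_vad: 0.1, rel_strength: 0.9, overlay_delta: -0.2 };
    // A lower cost makes the edge more attractive to the A* search.
    println!("cost = {:.3}", edge_cost(&w, &e));
}
```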

Summary of a “Thought”

A GDS “thought” process can be summarized as:

  1. A query initiates a search for a low-cost path between two concepts in the Proximity Graph.
  2. The Reasoner explores the graph, calculating the cost of each potential step using the multi-faceted cost function, which reads from both the static Semantic Base and the dynamic Context Overlay.
  3. The resulting lowest-cost path is returned as the “thought” or solution.
  4. Based on the outcome, the Learning Loop can be triggered to update the Context Overlay, reinforcing or penalizing edges, thus altering the geometry of the space for the next, similar thought.

Learning Paradigm

The GDS learning paradigm is fundamentally different from the backpropagation and gradient descent methods that power traditional Large Language Models. It is a form of autonomous, Hebbian-style learning that modifies the geometry of the conceptual space in response to experience.

Core Principles

  1. No Backpropagation: The model does not compute gradients across a massive neural network. Learning is a local, lightweight process.
  2. Learning by Modifying Costs: Instead of adjusting neuron weights, GDS learns by adjusting the “cost” of traversing specific edges in the semantic graph. This is done by writing small delta values to the dynamic Context Overlay.
  3. Reinforcement and Penalization: Paths that lead to successful or “coherent” outcomes are reinforced (their edges receive a negative delta, making them cheaper and more attractive to the Reasoner). Paths that are evaluated as poor alternatives are penalized (their edges receive a positive delta, making them more expensive). A minimal code sketch of this update follows the list.
  4. Internal Evaluation: The model does not strictly require external, supervised labels to learn. As demonstrated in our simulation, it can employ internal heuristics (such as a “coherence score” based on concept mass) to decide which paths are “better” and thus worthy of reinforcement.
  5. Stability and Explainability: Because learning only affects the overlay, the foundational knowledge graph remains stable. The changes are auditable (one can inspect the deltas in the overlay) and their effect is directly observable in the Reasoner’s behavior and cost calculations.
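
A minimal sketch of such an overlay update, with an in-memory map standing in for the LMDB-backed store (all names here are assumptions, not the project's actual API):

```rust
use std::collections::HashMap;

/// An edge identified by (source, target) concept ids.
type Edge = (String, String);

/// Illustrative stand-in for the LMDB-backed Context Overlay: it maps an edge to a
/// cost delta (negative = reinforcement, positive = penalty). Names are assumptions.
#[derive(Default)]
struct ContextOverlay {
    deltas: HashMap<Edge, f64>,
}

impl ContextOverlay {
    /// Hebbian-style local update: reinforce the preferred edges, penalize rejected ones.
    fn learn_edges(&mut self, reinforced: &[Edge], penalized: &[Edge], strength: f64) {
        for e in reinforced {
            *self.deltas.entry(e.clone()).or_insert(0.0) -= strength; // cheaper next time
        }
        for e in penalized {
            *self.deltas.entry(e.clone()).or_insert(0.0) += strength; // more expensive next time
        }
    }

    /// Delta applied on top of the static edge cost (0.0 if the edge was never touched).
    fn delta(&self, e: &Edge) -> f64 {
        self.deltas.get(e).copied().unwrap_or(0.0)
    }
}

fn main() {
    let mut overlay = ContextOverlay::default();
    let reinforced = [
        ("king".to_string(), "crown".to_string()),
        ("crown".to_string(), "power".to_string()),
    ];
    let penalized = [("king".to_string(), "power".to_string())];
    overlay.learn_edges(&reinforced, &penalized, 0.5);
    println!("king -> power delta: {}", overlay.delta(&penalized[0]));
}
```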

Case Study: The Simulation

Our simulation provided a concrete example of this paradigm in action:

  1. Initial State: The Reasoner initially chose the cheapest, most obvious path: king -> power.
  2. Internal Evaluation: An internal metric, the “coherence score” (sum of concept masses), evaluated the alternative path king -> crown -> power as being semantically richer, despite its higher initial cost. A sketch of this heuristic follows the list.
  3. Autonomous Learning: This internal evaluation triggered a learning event. The learn_edges function was called to apply a strong negative delta (reinforcement) to the king -> crown and crown -> power edges, and a positive delta (penalty) to the king -> power edge.
  4. Behavioral Change: When the query was run again, the Reasoner, factoring in the new deltas from the Overlay, found that the path through crown was now the new cheapest path.
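
The internal heuristic from step 2 can be illustrated with a small helper that sums concept masses along a path. The helper and the mass values below are hypothetical and exist only for this example.

```rust
/// Coherence score used as the internal evaluation heuristic in this case study:
/// the sum of the masses (m0) of the concepts along a path; higher = semantically richer.
fn coherence_score<F: Fn(&str) -> f64>(path: &[&str], mass_of: F) -> f64 {
    path.iter().map(|&c| mass_of(c)).sum()
}

/// Hypothetical mass lookup with made-up values, standing in for the Semantic Base.
fn mass_of(concept: &str) -> f64 {
    match concept {
        "king" => 9.0,
        "crown" => 7.5,
        "power" => 8.0,
        _ => 1.0,
    }
}

fn main() {
    let direct = ["king", "power"];
    let via_crown = ["king", "crown", "power"];
    // The richer path wins the internal evaluation and triggers reinforcement.
    println!("direct:    {}", coherence_score(&direct, mass_of));
    println!("via crown: {}", coherence_score(&via_crown, mass_of));
}
```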

This demonstrates a complete, autonomous cycle: Reason → Evaluate → Self-Reinforce → Reason Differently. The system adapts its reasoning based on internal evaluation principles, a process resembling neuroplasticity more than traditional supervised learning.

Data & Lexicon Construction (CSI-HDC)

The foundation of the GDS system is the Semantic Base, a large-scale lexicon of “semantic particles” generated by the CSI-HDC (Conceptual State Injector using Hyperdimensional Computing) pipeline. This pipeline processes and synthesizes information from multiple data sources to generate concept representations with physics-inspired properties.

Primary Data Sources

Each source is listed with its location in the project and its role in the pipeline:

  • ConceptNet (data/raw/assertions.csv): Provides the primary structural graph of common-sense relationships (e.g., UsedFor, CapableOf, PartOf). It forms the backbone of explicit knowledge.
  • Numberbatch (data/raw/numberbatch.txt): A set of pre-trained 300-dimensional word embeddings. It is the primary source for generating the 20,000-dimensional HDC vectors and serves as a fallback for calculating affective scores.
  • NRC-VAD Lexicon (data/raw/nrc_vad/): Provides affective scores for English words across three dimensions: Valence (pleasure/displeasure), Arousal (intensity), and Dominance (control). This is the source for the spin property of English particles.
  • German Norms (data/raw/german_norms/): The German equivalent of the NRC-VAD lexicon, providing affective scores for German words.
  • OEWM Lexicons (data/oewm_lexicons/): Open English, German, and Romanian WordNet data. This is a crucial source for normalization, synonymy (aliases), and word-frequency priors. It significantly boosts the quality of mass calculation and the coverage of other lookups.
  • BabelNet Cache (data/enrichment/babelnet_cache.db): A local SQLite database that caches results from the BabelNet API. It is used in a daily enrichment loop to add new, high-quality multilingual relations to the graph, expanding the knowledge base over time.

The Generation Pipeline (LexiconBuilder)

The process is orchestrated by the LexiconBuilder in the Rust codebase and follows several key stages:

  1. Aggregation: Raw assertions from ConceptNet are streamed and aggregated into a per-concept map, building a preliminary list of relations.
  2. Normalization & Enrichment: Lemmas are normalized using OEWM. This step also discovers aliases (synonyms) that will be used in later stages.
  3. Quality Scoring: Each potential concept is scored based on a set of heuristics: its connectivity in the graph, whether it has a Numberbatch embedding, and its coverage in affective lexicons.
  4. Filtering: Concepts that do not meet a minimum quality threshold (e.g., min_relations) are discarded.
  5. Property Calculation: For each high-quality concept:
    • Mass (m0) is calculated based on its graph connectivity, boosted by its frequency from OEWM.
    • Spin (s) is calculated from the affective lexicons (NRC-VAD, German Norms).
    • Charge (q) is generated by passing its 300D Numberbatch embedding to the Julia HDC server, which expands it into a 20,000-bit binary hypervector (one possible expansion technique is sketched after this list).
  6. Export: The final collection of SemanticParticle objects is written to a compressed Parquet file, which becomes the Semantic Base for the GDS runtime.
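
The exact expansion performed by the Julia backend is not described in this document. One common technique for this kind of dimensionality expansion is random-projection sign binarization, sketched below in Rust purely as an illustrative assumption.

```rust
const HDC_BITS: usize = 20_000;
const EMBED_DIM: usize = 300;

/// Deterministic pseudo-random weight for (bit, dim), standing in for a fixed
/// random projection matrix so the sketch needs no external crates.
fn proj_weight(bit: usize, dim: usize) -> f64 {
    let mut x = (bit as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15)
        ^ (dim as u64).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x ^= x >> 31;
    x = x.wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^= x >> 29;
    (x as f64 / u64::MAX as f64) * 2.0 - 1.0
}

/// Expand a 300-D real embedding into a 20,000-bit binary hypervector, packed into
/// 64-bit words. The method (random-projection sign binarization) is an assumption;
/// in GDS this step is delegated to the Julia HDC backend and may work differently.
fn expand_to_hdc(embedding: &[f64]) -> Vec<u64> {
    assert_eq!(embedding.len(), EMBED_DIM);
    let mut words = vec![0u64; (HDC_BITS + 63) / 64];
    for bit in 0..HDC_BITS {
        // Project the embedding onto one pseudo-random hyperplane and keep only the sign.
        let dot: f64 = (0..EMBED_DIM).map(|d| proj_weight(bit, d) * embedding[d]).sum();
        if dot >= 0.0 {
            words[bit / 64] |= 1u64 << (bit % 64);
        }
    }
    words
}
```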

Evaluation, Ethics, and API

Evaluation

Evaluation of the GDS model is two-fold, targeting both the quantitative performance of the system and the qualitative relevance of its reasoning.

  1. System Performance: As detailed in the OPTIMIZATION_ROADMAP.md, the lexicon construction pipeline is evaluated on metrics such as storage efficiency (bytes/particle), compression ratio, and throughput (particles/sec). The live system is evaluated on query latency and memory footprint.
  2. Semantic Quality: The quality of the model’s reasoning is evaluated through controlled tests. The simulation we performed is a prime example of a qualitative evaluation, designed to verify that the model’s behavior aligns with its core theoretical principles. Formal evaluation suites are planned to measure performance on tasks like:
    • Cross-lingual Retrieval: Testing if dog (en) is correctly identified as being close to Hund (de); a sketch of such a check follows the list.
    • Guided Analogy: Testing the quality of typed compositions (e.g., king - man + woman = queen).
    • Path Coherence: Measuring the semantic consistency of paths found by the Reasoner.
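
As an illustration of how such a suite might phrase the cross-lingual check, the sketch below compares normalized Hamming similarities between packed hypervectors; hdc_vector_of is a hypothetical lookup into the Semantic Base, not part of the current API.

```rust
/// Normalized Hamming similarity between two packed binary hypervectors (1.0 = identical).
fn similarity(a: &[u64], b: &[u64], bits: usize) -> f64 {
    let dist: u32 = a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum();
    1.0 - dist as f64 / bits as f64
}

/// Hypothetical cross-lingual retrieval check: "dog" (en) should be nearer to
/// "Hund" (de) than to an unrelated concept such as "Tisch" (de).
fn passes_cross_lingual_check(hdc_vector_of: impl Fn(&str) -> Vec<u64>) -> bool {
    let dog = hdc_vector_of("en/dog");
    let hund = hdc_vector_of("de/hund");
    let unrelated = hdc_vector_of("de/tisch");
    similarity(&dog, &hund, 20_000) > similarity(&dog, &unrelated, 20_000)
}
```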

Ethical Considerations & Social Impact

  • Transparency & Explainability: A core design goal of GDS is to be explainable. Unlike the opaque nature of large transformer models, every step of a GDS “thought” process can be audited. The PathExplain object provides a full trace of the chosen path, the costs of each edge, and the contribution of each component (mass, VAD, overlay), making the model’s decisions transparent.
  • Bias Mitigation: The model’s knowledge is derived from its source data. While this data can contain biases, the GDS architecture offers several points of intervention. Telemetry tracks distributions per language and domain, allowing for monitoring of imbalances. The Overlay can also be used to apply targeted, corrective penalties to biased or undesirable associations in the graph.
  • Control: The learning mechanism includes a ValidationGate, ensuring that autonomous changes to the Overlay are only consolidated after verifying that they do not degrade overall performance on a set of control tasks. This provides a crucial layer of human oversight and control over the model’s evolution.

Model API

The GDS runtime exposes a function-oriented API for interaction. The primary methods are listed below; a hypothetical usage sketch follows the list:

  • search_semantic(query, k): Performs a k-NN search in the HDC space to find the k concepts most similar to a query.
  • compose(vector_a, vector_b, operation): Creates a new concept by composing two existing vectors using typed HDC operations.
  • reason(start_concept, goal_concept, constraints): The core function. It invokes the Reasoner to find the lowest-cost path between two concepts, returning the full PathExplain object.
  • learn(path, negatives): Triggers a learning event. It takes a path to be reinforced and a set of negative edges to be penalized, which then updates the Context Overlay.
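
The sketch below shows how these methods might be exercised in a single reason-then-learn session. The trait, types, and signatures are assumptions inferred from the method names above, not a published interface.

```rust
/// Hypothetical trait capturing the four API methods listed above; every signature
/// here is an assumption inferred from the method names, not the project's actual interface.
pub trait GdsApi {
    fn search_semantic(&self, query: &str, k: usize) -> Vec<String>;
    fn compose(&self, a: &[u64], b: &[u64], operation: &str) -> Vec<u64>;
    fn reason(&self, start: &str, goal: &str, constraints: &Constraints) -> PathExplain;
    fn learn(&mut self, path: &[(String, String)], negatives: &[(String, String)]);
}

/// Hypothetical reasoning constraints.
#[derive(Default)]
pub struct Constraints {
    pub max_depth: Option<usize>,
}

/// Hypothetical shape of the PathExplain trace returned by `reason`.
pub struct PathExplain {
    pub path: Vec<(String, String)>, // edges of the chosen path
    pub per_edge_costs: Vec<f64>,
    pub total_cost: f64,
}

/// Example session: search, reason, then reinforce the chosen path in the Overlay.
pub fn session(gds: &mut impl GdsApi) {
    let neighbors = gds.search_semantic("monarchy", 5);
    println!("nearest concepts: {:?}", neighbors);

    let explain = gds.reason("en/king", "en/power", &Constraints::default());
    println!("chosen path costs {:.3} over {} edges", explain.total_cost, explain.path.len());

    // Reinforce the chosen path; penalize the rejected direct edge.
    let rejected = vec![("en/king".to_string(), "en/power".to_string())];
    gds.learn(&explain.path, &rejected);
}
```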
"