GDS Technical Architecture

Overview

This document provides a technical overview of the Geometrodynamic Semantics (GDS) research prototype architecture as implemented.

System Design Philosophy

GDS models semantic reasoning as a physics-inspired process operating over a hyperdimensional concept space. Rather than treating language as statistical patterns, the system represents meaning through:

  • Semantic particles: Concepts encoded as 20,000-dimensional binary HDC vectors (see the packed-vector sketch after this list)
  • Geometric relationships: Knowledge graph edges with physics-inspired properties (mass, valence, affective dimensions)
  • Geodesic reasoning: Path-finding through semantic space using composite cost functions
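
To make the particle representation concrete, here is a minimal sketch of a 20,000-bit binary hypervector packed into 64-bit words, compared by Hamming distance. The packing layout and names are assumptions for illustration, not the prototype's actual types.

```rust
// Minimal sketch: a 20,000-bit binary hypervector packed into u64 words.
// Layout and names are illustrative, not the prototype's actual types.
const DIM_BITS: usize = 20_000;
const WORDS: usize = (DIM_BITS + 63) / 64; // 313 words; unused tail bits must stay zero

#[derive(Clone)]
struct HdcVector {
    words: [u64; WORDS],
}

impl HdcVector {
    /// Hamming distance: number of bit positions where the vectors differ.
    fn hamming(&self, other: &HdcVector) -> u32 {
        self.words
            .iter()
            .zip(other.words.iter())
            .map(|(a, b)| (a ^ b).count_ones())
            .sum()
    }

    /// Normalized similarity in [0, 1]; 1.0 means identical hypervectors.
    fn similarity(&self, other: &HdcVector) -> f64 {
        1.0 - self.hamming(other) as f64 / DIM_BITS as f64
    }
}
```

At this dimensionality each vector packs into roughly 2.5 KB (20,000 bits / 8), which is what makes the lexicon sizes quoted below feasible.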

Core Architecture Layers

1. Semantic Base Layer (Static)

The foundational lexicon built from multiple knowledge sources:

  • Storage Format: Parquet with ZSTD compression (~7.5GB for 340k concepts)
  • Conceptual State Injector (CSI-HDC): Replaces traditional tokenization with 20,000-dimensional binary hypervectors
  • Vector Expansion: Julia HDC server generates concept hypervectors from 300D embeddings
  • Graph Structure: Curated assertions (ConceptNet) + proximity edges (k-NN via FAISS binary indexing)

Key Properties Per Concept (collected into an illustrative record after the list):

  • particle_id, concept_id, lemma, language
  • m0 (semantic mass/centrality)
  • VAD (Valence-Arousal-Dominance) affective dimensions
  • 20,000-bit binary HDC vector
  • Graph connectivity (edges with relation types)
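
A plausible in-memory record mirroring the properties above might look like the following; the field types and the nested Vad/Edge shapes are assumptions for illustration.

```rust
// Illustrative record for one concept; field names follow the list above,
// types and nested shapes are assumptions.
struct Vad {
    valence: f32,
    arousal: f32,
    dominance: f32,
}

struct Edge {
    target: u64,      // particle_id of the neighbor
    relation: String, // relation type, e.g. a ConceptNet assertion label
    weight: f32,
}

struct SemanticParticle {
    particle_id: u64,
    concept_id: String, // e.g. "en/cat"
    lemma: String,
    language: String,
    m0: f32,            // semantic mass / centrality
    vad: Vad,           // Valence-Arousal-Dominance
    hdc_bits: Vec<u64>, // 20,000-bit vector packed into u64 words
    edges: Vec<Edge>,   // graph connectivity
}
```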

2. Dynamic Context Overlay (Runtime)

LMDB-backed delta store for runtime learning:

  • Edge weight adjustments during reasoning sessions
  • Degree-normalized updates to maintain graph stability (sketched after this list)
  • Checkpointing for session recovery
  • Telemetry monitoring (drift detection, margin analysis)
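
As a rough sketch of the degree-normalized update, the overlay can be thought of as a keyed delta map. An in-memory HashMap stands in here for the LMDB store, and the normalization rule (dividing the raw delta by the source node's degree) is an assumption about how graph stability is maintained.

```rust
use std::collections::HashMap;

// Sketch of a degree-normalized overlay update. A HashMap stands in for the
// LMDB-backed delta store; the normalization rule is an assumption.
struct Overlay {
    deltas: HashMap<(u64, u64), f32>, // (source, target) -> weight delta
}

impl Overlay {
    /// Apply a learning update, scaled down by the source node's degree so
    /// that high-degree hubs do not accumulate disproportionate weight.
    fn apply_update(&mut self, source: u64, target: u64, raw_delta: f32, degree: usize) {
        let normalized = raw_delta / degree.max(1) as f32;
        *self.deltas.entry((source, target)).or_insert(0.0) += normalized;
    }

    /// Read the learned delta for an edge (0.0 if never touched).
    fn delta(&self, source: u64, target: u64) -> f32 {
        self.deltas.get(&(source, target)).copied().unwrap_or(0.0)
    }
}
```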

3. Reasoning Engine

Path-based semantic reasoning through the concept graph:

  • Geodesic solver: Multi-component cost function (sketched after this list) combining:
    • α·inv_m0: Semantic mass preference (favor central concepts)
    • β·vad: Affective dimension matching
    • γ·rel: Relation type appropriateness
    • λ·overlay: Runtime learned preferences
  • Explainability: JSON traces (gds.explain.path@v1) documenting:
    • Traversed nodes and edges
    • Cost breakdown per component
    • Reasoning margin (confidence)
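
Putting the components together, the solver's edge cost has the shape cost = α·inv_m0 + β·vad + γ·rel + λ·overlay. A minimal sketch follows; how each term is derived from node and edge data, and the sign convention for the overlay term, are assumptions for illustration.

```rust
// Sketch of the composite edge cost used by the geodesic solver.
// Term derivations and the overlay sign convention are assumptions.
struct CostWeights {
    alpha: f32,  // semantic mass preference
    beta: f32,   // affective (VAD) matching
    gamma: f32,  // relation-type appropriateness
    lambda: f32, // runtime learned preferences
}

fn edge_cost(
    w: &CostWeights,
    inv_m0: f32,       // 1 / m0 of the target (central concepts are cheaper)
    vad_distance: f32, // mismatch between edge VAD profile and query affect
    rel_penalty: f32,  // penalty for relation types ill-suited to the query
    overlay_term: f32, // learned adjustment from the dynamic overlay
) -> f32 {
    w.alpha * inv_m0 + w.beta * vad_distance + w.gamma * rel_penalty + w.lambda * overlay_term
}
```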

4. Learning Subsystem

Physics-inspired autonomous learning:

  • Local Hebbian updates: Edge strengthening based on usage patterns (see the sketch after this list)
  • Validation gates: Telemetry-based checks before consolidating learned changes
  • Diffusion smoothing: L2 decay prevents overfitting to specific paths
  • Margin-gain optimization: Focus learning on high-uncertainty decisions
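
A compact sketch of the update rule: strengthen an edge in proportion to observed co-usage, while L2 decay pulls every learned weight back toward zero. The gate criterion shown (reasoning margin must not degrade) is an assumption about what the telemetry-based checks verify.

```rust
// Sketch of a local Hebbian update with L2 decay. Learning rate, decay
// constant, and the gate criterion are illustrative assumptions.
fn hebbian_update(weight: f32, co_usage: f32, eta: f32, decay: f32) -> f32 {
    // Strengthen with co-usage; L2 decay shrinks the weight toward zero,
    // which smooths out overfitting to any single path.
    weight + eta * co_usage - decay * weight
}

/// Validation gate: consolidate a learned change only if the reasoning
/// margin reported by telemetry did not degrade.
fn consolidate(old_margin: f32, new_margin: f32) -> bool {
    new_margin >= old_margin
}
```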

Technology Stack

| Component | Technology | Purpose |
|---|---|---|
| Core Logic | Rust | Reasoning engine, graph operations, I/O |
| Vector Expansion | Julia | HDC hypervector generation from embeddings |
| Storage | Parquet + ZSTD | Compressed columnar lexicon storage |
| Overlay | LMDB | Runtime delta store with checkpointing |
| Indexing | FAISS (binary) | k-NN similarity search for graph construction |
| FFI | JSON over stdin/stdout | Rust ↔ Julia communication |

Data Sources

The research prototype integrates multiple knowledge bases:

  • ConceptNet 5: ~2.1M curated semantic assertions (140k concepts after filtering)
  • ConceptNet Numberbatch: 300D multilingual embeddings
  • NRC-VAD Lexicon: Affective dimensions for emotional reasoning
  • OEWN (Open English WordNet): Lexical taxonomy and relationships

Implementation Highlights

Storage Optimization Journey

The project achieved ~800x storage reduction through progressive optimization:

  1. Initial projection: 6TB for full lexicon (300D floats + metadata)
  2. Binary HDC encoding: Reduced to ~68GB (20k-bit vectors)
  3. Parquet columnar storage: Final ~7.5GB (with ZSTD compression)

CSI-HDC Tokenization

Conceptual State Injector using Hyperdimensional Computing (CSI-HDC) replaces traditional neural tokenizers:

  • Input: Concept identifier (e.g., "en/cat", "ro/calculator")
  • Process: Julia HDC server expands 300D embedding → 20,000D binary vector (wire protocol sketched below)
  • Output: Semantic particle ready for graph-based reasoning
  • Advantage: Interpretable, multilingual, compositional
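
The Rust ↔ Julia exchange described above is JSON over stdin/stdout. The sketch below shows one plausible shape for that handshake as line-delimited JSON; the script name hdc_server.jl, the message fields, and the single-request flow are hypothetical.

```rust
use std::io::{BufRead, BufReader, Write};
use std::process::{Child, Command, Stdio};

// Sketch of a line-delimited JSON handshake with the Julia HDC server.
// The script name, message shape, and field names are hypothetical.
fn expand_concept(julia: &mut Child, concept_id: &str) -> std::io::Result<String> {
    let stdin = julia.stdin.as_mut().expect("julia stdin piped");
    // One JSON request per line: ask the server to expand a concept.
    writeln!(stdin, r#"{{"op":"expand","concept":"{}"}}"#, concept_id)?;

    // One JSON response per line: the packed 20,000-bit vector.
    // (Real use would keep a persistent BufReader across calls.)
    let stdout = julia.stdout.as_mut().expect("julia stdout piped");
    let mut line = String::new();
    BufReader::new(stdout).read_line(&mut line)?;
    Ok(line)
}

fn main() -> std::io::Result<()> {
    let mut julia = Command::new("julia")
        .arg("hdc_server.jl") // hypothetical server script
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    let response = expand_concept(&mut julia, "en/cat")?;
    println!("{}", response.trim());
    Ok(())
}
```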

Cross-Language Capabilities

The system demonstrates multilingual semantic reasoning:

  • English: Primary language (140k concepts)
  • Romanian: Secondary language (~200k concepts)
  • Shared HDC space: Cross-lingual concept alignment via Numberbatch embeddings (see the lookup sketch below)
  • Relation preservation: Language-agnostic graph structure
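
Because all languages share one HDC space, cross-lingual lookup reduces to nearest-neighbor search over hypervectors, with no language-specific logic: a query vector for "en/cat" would resolve toward a hypothetical entry like "ro/pisica" if the aligned embeddings place them nearby. A sketch under that assumption:

```rust
// Sketch: cross-lingual nearest neighbor in the shared HDC space, scored by
// Hamming distance over packed bit words. Lexicon entries are hypothetical.
fn nearest_concept<'a>(
    query: &[u64],                     // packed hypervector, e.g. for "en/cat"
    lexicon: &'a [(String, Vec<u64>)], // (concept_id, packed hypervector)
) -> Option<&'a str> {
    lexicon
        .iter()
        .min_by_key(|(_, bits)| {
            query
                .iter()
                .zip(bits.iter())
                .map(|(a, b)| (a ^ b).count_ones())
                .sum::<u32>()
        })
        .map(|(id, _)| id.as_str())
}
```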

Current Research Status

GDS is a proof-of-concept prototype demonstrating:

  • Operational CSI-HDC tokenization (20k-dimensional binary vectors)
  • Functional graph-based reasoning with explainable paths
  • Autonomous learning gates with validation
  • Efficient storage (Parquet + LMDB architecture)
  • Multilingual capabilities (EN/RO)

This architecture represents an exploration of physics-inspired semantic reasoning, not a production system. All design decisions prioritize research interpretability and experimental flexibility.

This architecture serves the Independent Research & Development Genesis initiative exploring novel AI paradigms. For questions or collaboration: mihai.mateescu@web.de

"