GDS Technical Architecture
This document provides a technical overview of the Geometrodynamic Semantics (GDS) research prototype architecture as implemented.
System Design Philosophy
GDS models semantic reasoning as a physics-inspired process operating over a hyperdimensional concept space. Rather than treating language as statistical patterns, the system represents meaning through:
- Semantic particles: Concepts encoded as 20,000-dimensional binary HDC vectors
- Geometric relationships: Knowledge graph edges with physics-inspired properties (mass, valence, affective dimensions)
- Geodesic reasoning: Path-finding through semantic space using composite cost functions
Core Architecture Layers
1. Semantic Base Layer (Static)
The foundational lexicon built from multiple knowledge sources:
- Storage Format: Parquet with ZSTD compression (~7.5GB for 340k concepts)
- Conceptual State Injector (CSI-HDC): Replaces traditional tokenization with 20,000-dimensional binary hypervectors
- Vector Expansion: Julia HDC server generates concept hypervectors from 300D embeddings
- Graph Structure: Curated assertions (ConceptNet) + proximity edges (k-NN via FAISS binary indexing)
Key Properties Per Concept:
- particle_id, concept_id, lemma, language
- m0 (semantic mass / centrality)
- VAD (Valence-Arousal-Dominance) affective dimensions
- 20,000-bit binary HDC vector
- Graph connectivity (edges with relation types)
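As a rough sketch, one lexicon row can be pictured as the Rust struct below. The field names mirror the properties just listed, but the exact types, the `Edge` shape, and the packed-vector layout are illustrative assumptions, not the actual Parquet schema.

```rust
/// Illustrative in-memory view of one lexicon row (field names follow the
/// properties listed above; the real Parquet schema may differ).
pub struct SemanticParticle {
    pub particle_id: u64,
    pub concept_id: String, // e.g. "en/cat"
    pub lemma: String,
    pub language: String,   // "en", "ro", ...
    pub m0: f32,            // semantic mass / centrality
    pub vad: [f32; 3],      // valence, arousal, dominance
    pub hdc: Vec<u64>,      // 20,000-bit vector packed into 313 u64 words
    pub edges: Vec<Edge>,   // outgoing graph connectivity
}

pub struct Edge {
    pub target: u64,        // particle_id of the neighbour
    pub relation: String,   // relation type, e.g. "IsA" (ConceptNet-style)
    pub weight: f32,
}
```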
2. Dynamic Context Overlay (Runtime)
LMDB-backed delta store for runtime learning:
- Edge weight adjustments during reasoning sessions
- Degree-normalized updates to maintain graph stability (see the sketch after this list)
- Checkpointing for session recovery
- Telemetry monitoring (drift detection, margin analysis)
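A minimal sketch of the degree-normalized update, with an in-memory map standing in for the LMDB-backed store; the specific normalization rule and the names `OverlayStore`/`apply_delta` are illustrative assumptions.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the LMDB-backed delta store (illustrative only;
/// the real overlay is persistent and checkpointed for session recovery).
pub struct OverlayStore {
    deltas: HashMap<(u64, u64), f32>, // (source, target) -> learned weight delta
}

impl OverlayStore {
    pub fn new() -> Self {
        Self { deltas: HashMap::new() }
    }

    /// Degree-normalized update: the raw adjustment is damped by the source
    /// node's degree so high-degree hubs drift slowly (an assumed rule).
    pub fn apply_delta(&mut self, src: u64, dst: u64, raw: f32, src_degree: usize) {
        let scaled = raw / (1.0 + (src_degree as f32).sqrt());
        *self.deltas.entry((src, dst)).or_insert(0.0) += scaled;
    }

    /// The λ·overlay term consumed by the geodesic cost function.
    pub fn overlay(&self, src: u64, dst: u64) -> f32 {
        self.deltas.get(&(src, dst)).copied().unwrap_or(0.0)
    }
}
```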
3. Reasoning Engine
Path-based semantic reasoning through the concept graph:
- Geodesic solver: Multi-component cost function (see the sketch after this list) combining:
  - α·inv_m0: Semantic mass preference (favor central concepts)
  - β·vad: Affective dimension matching
  - γ·rel: Relation type appropriateness
  - λ·overlay: Runtime-learned preferences
- Explainability: JSON traces (gds.explain.path@v1) documenting:
  - Traversed nodes and edges
  - Cost breakdown per component
  - Reasoning margin (confidence)
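A hedged sketch of how the composite edge cost might be assembled. The component names match the list above, but the structs and signatures are illustrative rather than the engine's actual API.

```rust
/// Tunable weights for the multi-component cost (names mirror the list above).
pub struct CostWeights {
    pub alpha: f32,  // semantic mass preference
    pub beta: f32,   // affective (VAD) matching
    pub gamma: f32,  // relation-type appropriateness
    pub lambda: f32, // runtime overlay term
}

/// Per-edge cost breakdown; the same numbers would feed the per-component
/// section of a gds.explain.path@v1 trace.
pub struct CostBreakdown {
    pub inv_m0: f32,  // 1 / m0 of the target concept (favors central nodes)
    pub vad: f32,     // distance between query VAD and target VAD
    pub rel: f32,     // penalty for inappropriate relation types
    pub overlay: f32, // learned delta from the runtime overlay
}

impl CostBreakdown {
    /// Total geodesic edge cost: α·inv_m0 + β·vad + γ·rel + λ·overlay.
    pub fn total(&self, w: &CostWeights) -> f32 {
        w.alpha * self.inv_m0 + w.beta * self.vad + w.gamma * self.rel + w.lambda * self.overlay
    }
}
```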
4. Learning Subsystem
Physics-inspired autonomous learning:
- Local Hebbian updates: Edge strengthening based on usage patterns (sketched after this list)
- Validation gates: Telemetry-based checks before consolidating learned changes
- Diffusion smoothing: L2 decay prevents overfitting to specific paths
- Margin-gain optimization: Focus learning on high-uncertainty decisions
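One plausible reading of the update rule combines a Hebbian reinforcement term with L2 decay; the exact formulation and the hyperparameters here are assumptions for illustration.

```rust
/// One illustrative learning step for a single edge weight: Hebbian
/// reinforcement when the edge was used on a successful path, plus L2 decay
/// pulling the weight back toward zero so no single path is overfit.
/// `eta` (learning rate) and `decay` are assumed hyperparameters.
fn hebbian_step(weight: f32, edge_used: bool, eta: f32, decay: f32) -> f32 {
    let reinforcement = if edge_used { eta } else { 0.0 };
    // w <- w + η·activity − κ·w   (κ·w is the L2 / diffusion-smoothing term)
    weight + reinforcement - decay * weight
}
```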
Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Core Logic | Rust | Reasoning engine, graph operations, I/O |
| Vector Expansion | Julia | HDC hypervector generation from embeddings |
| Storage | Parquet + ZSTD | Compressed columnar lexicon storage |
| Overlay | LMDB | Runtime delta store with checkpointing |
| Indexing | FAISS (binary) | k-NN similarity search for graph construction |
| FFI | JSON over stdin/stdout | Rust ↔ Julia communication |
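The FFI row can be pictured as line-delimited JSON over the Julia child process's standard streams. A minimal Rust sketch follows; the script name `hdc_server.jl` and the message shape are assumptions, not the prototype's actual protocol.

```rust
use std::io::{BufRead, BufReader, Write};
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // Spawn the Julia HDC server (script path is illustrative).
    let mut child = Command::new("julia")
        .arg("hdc_server.jl")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // One JSON request per line; the field names are an assumed shape.
    let request = r#"{"op":"expand","concept":"en/cat"}"#;
    writeln!(child.stdin.as_mut().unwrap(), "{}", request)?;

    // Read one line of JSON back (e.g. the packed 20,000-bit vector).
    let mut reader = BufReader::new(child.stdout.as_mut().unwrap());
    let mut response = String::new();
    reader.read_line(&mut response)?;
    println!("julia replied: {}", response.trim());
    Ok(())
}
```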
Data Sources
The research prototype integrates multiple knowledge bases:
- ConceptNet 5: ~2.1M curated semantic assertions (140k concepts after filtering)
- ConceptNet Numberbatch: 300D multilingual embeddings
- NRC-VAD Lexicon: Affective dimensions for emotional reasoning
- OEWN (Open English WordNet): Lexical taxonomy and relationships
Implementation Highlights
Storage Optimization Journey
The project achieved ~800x storage reduction through progressive optimization:
- Initial projection: 6TB for full lexicon (300D floats + metadata)
- Binary HDC encoding: Reduced to ~68GB (20k-bit vectors)
- Parquet columnar storage: Final ~7.5GB (with ZSTD compression)
CSI-HDC Tokenization
Conceptual State Injector using Hyperdimensional Computing (CSI-HDC) replaces traditional neural tokenizers:
- Input: Concept identifier (e.g., "en/cat", "ro/calculator")
- Process: Julia HDC server expands 300D embedding → 20,000D binary vector (see the sketch after this list)
- Output: Semantic particle ready for graph-based reasoning
- Advantage: Interpretable, multilingual, compositional
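A minimal sketch of the expansion step, assuming a sign-thresholded random projection from 300D to 20,000 bits; the Julia server's actual encoding may differ. The Hamming similarity at the end is the natural comparison for such bit vectors.

```rust
const DIM_IN: usize = 300;                 // Numberbatch embedding dimension
const DIM_OUT: usize = 20_000;             // hypervector bits
const WORDS: usize = (DIM_OUT + 63) / 64;  // 313 packed u64 words

/// Expand a 300D embedding into a 20,000-bit hypervector by a fixed random
/// projection followed by a sign threshold (an assumed scheme).
/// `projection` is a fixed DIM_OUT x DIM_IN matrix.
fn expand(embedding: &[f32; DIM_IN], projection: &[[f32; DIM_IN]]) -> Vec<u64> {
    assert_eq!(projection.len(), DIM_OUT);
    let mut bits = vec![0u64; WORDS];
    for (i, row) in projection.iter().enumerate() {
        let dot: f32 = row.iter().zip(embedding.iter()).map(|(a, b)| a * b).sum();
        if dot >= 0.0 {
            bits[i / 64] |= 1u64 << (i % 64);
        }
    }
    bits
}

/// Hamming similarity between packed hypervectors (1.0 = identical).
fn similarity(a: &[u64], b: &[u64]) -> f32 {
    let differing: u32 = a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum();
    1.0 - differing as f32 / DIM_OUT as f32
}
```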
Cross-Language Capabilities
The system demonstrates multilingual semantic reasoning:
- English: Primary language (140k concepts)
- Romanian: Secondary language (~200k concepts)
- Shared HDC space: Cross-lingual concept alignment via Numberbatch embeddings
- Relation preservation: Language-agnostic graph structure
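Because both languages share one 20,000-bit space, cross-lingual comparison is the same bit-level operation as monolingual comparison. The snippet below illustrates this with synthetic stand-in vectors, not real lexicon data.

```rust
/// Illustrative only: cross-lingual lookup is the same Hamming comparison as
/// monolingual lookup. The two vectors are synthetic stand-ins.
fn main() {
    let words = 313; // 20,000 bits packed into 64-bit words
    let cat_en = vec![0xAAAA_AAAA_AAAA_AAAAu64; words]; // stand-in for "en/cat"
    let cat_ro = vec![0xAAAA_AAAA_AAAA_AABAu64; words]; // stand-in for its Romanian counterpart
    let differing: u32 = cat_en.iter().zip(&cat_ro).map(|(a, b)| (a ^ b).count_ones()).sum();
    let similarity = 1.0 - differing as f32 / 20_000.0;
    println!("cross-lingual similarity: {similarity:.4}"); // ~0.98 here; aligned concepts score high
}
```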
Current Research Status
GDS is a proof-of-concept prototype demonstrating:
✅ Operational CSI-HDC tokenization (20k-dimensional binary vectors)
✅ Functional graph-based reasoning with explainable paths
✅ Autonomous learning gates with validation
✅ Efficient storage (Parquet + LMDB architecture)
✅ Multilingual capabilities (EN/RO)
This architecture represents an exploration of physics-inspired semantic reasoning, not a production system. All design decisions prioritize research interpretability and experimental flexibility.
Further Reading
- Model Card: Detailed specifications and capabilities
- Development Chronicle: Engineering journey and optimizations
- Citations: Academic references and data sources
This architecture serves the Independent Research & Development Genesis initiative exploring novel AI paradigms. For questions or collaboration: mihai.mateescu@web.de