System Overview
Rory is a distributed and privacy-preserving data mining system designed to execute clustering and classification tasks over encrypted datasets. The architecture leverages Post-Quantum Cryptography (PQC) and Homomorphic Encryption (HE) to ensure data confidentiality throughout the entire mining lifecycle.
Architecture
The system follows a decentralized orchestration model composed of four primary layers:
- Orchestration Layer (Manager): Acts as the brain of the system, managing worker registration, load balancing, and task distribution.
- Computational Layer (Workers): High-performance nodes that execute the data mining algorithms (SKMeans, DBSKMeans, SKNN) on encrypted data.
- Storage Layer (CSS/Mictlanx): A distributed and asynchronous storage service that handles data fragmentation and persistence.
- Interaction Layer (Client): The entry point for users to upload datasets, configure mining parameters, and retrieve results.
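To make the Manager's role more concrete, here is a minimal in-memory sketch of worker registration and load-balanced assignment. It is not Rory's actual implementation: the class names, the `active_tasks` counter, the example addresses, and the least-loaded policy are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerInfo:
    """Hypothetical record a Manager might keep per registered Worker."""
    worker_id: str
    address: str
    active_tasks: int = 0


@dataclass
class Manager:
    """Toy in-memory Manager: registration plus least-loaded assignment."""
    workers: dict[str, WorkerInfo] = field(default_factory=dict)

    def register(self, worker_id: str, address: str) -> None:
        self.workers[worker_id] = WorkerInfo(worker_id, address)

    def assign(self) -> WorkerInfo:
        if not self.workers:
            raise RuntimeError("no workers registered")
        # Least-loaded policy (an assumption); ties go to the earliest
        # registered worker because dicts preserve insertion order.
        chosen = min(self.workers.values(), key=lambda w: w.active_tasks)
        chosen.active_tasks += 1
        return chosen

    def release(self, worker_id: str) -> None:
        # Called when a task finishes so the worker's load drops again.
        worker = self.workers[worker_id]
        worker.active_tasks = max(0, worker.active_tasks - 1)


manager = Manager()
manager.register("worker-0", "http://worker-0:9000")  # hypothetical addresses
manager.register("worker-1", "http://worker-1:9000")
first = manager.assign()
second = manager.assign()
third = manager.assign()
```

With two idle workers, consecutive assignments alternate between them. A real deployment would presumably also track worker health and persist this state, which the sketch leaves out.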
Quick Start
Follow these steps to deploy the complete ecosystem (Storage + Rory Nodes) using Docker.
- Initialize the Storage Layer (Mictlanx)
  Navigate to the Mictlanx directory and start the CSS routers:
  Troubleshooting (Connection Refused): If you encounter the error "Failed to connect to localhost port 63666", it means the default port is occupied or blocked. Resolve this by starting the service on an alternative port (e.g., 64666):
- Deploy Rory Ecosystem
  Return to the project root and launch the Client, Manager, and Worker nodes:
  This command will build the images and orchestrate the three main nodes, linking them automatically to the Mictlanx storage network.
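When diagnosing the connection error from the storage step, it can help to check whether anything is listening on the port and whether the port is free to bind. The following is a small standard-library sketch, not part of Rory or Mictlanx; the function names are hypothetical, and only the port numbers 63666 and 64666 come from the text above.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def port_is_free(host: str, port: int) -> bool:
    """Return True if this process can bind (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
            return True
        except OSError:
            return False


if __name__ == "__main__":
    # Default and alternative ports mentioned in the troubleshooting note.
    for port in (63666, 64666):
        print(port,
              "listening:", port_is_listening("127.0.0.1", port),
              "free to bind:", port_is_free("127.0.0.1", port))
```

If the default port is neither listening nor free to bind, another process or a firewall rule is probably holding it, and starting the service on the alternative port is the simpler fix.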
Security & Privacy Model
Rory integrates cryptographic schemes to maintain privacy:
- Liu Scheme: A conventional homomorphic encryption scheme used for secure distance approximation.
- CKKS: A lattice-based homomorphic encryption scheme, considered post-quantum secure, that supports approximate arithmetic directly on encrypted real (floating-point) numbers.
- FDHOPE: An order-preserving encryption (OPE) scheme used to maintain data sorting and comparison capabilities without decryption.
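The order-preserving property can be illustrated with a toy scheme. The sketch below is not FDHOPE and is not secure; `ToyOPE` is a hypothetical class that maps integers to ciphertexts through a keyed, strictly increasing table, which is enough to show that sorting and comparison work on ciphertexts alone.

```python
import bisect
import random


class ToyOPE:
    """Toy order-preserving encryption over integers in [0, domain_size).

    NOT FDHOPE and not secure: it only demonstrates the property.
    """

    def __init__(self, key: int, domain_size: int, max_gap: int = 1000) -> None:
        rng = random.Random(key)  # the key seeds a deterministic gap sequence
        self._table = []
        current = 0
        for _ in range(domain_size):
            current += rng.randint(1, max_gap)  # positive gap keeps the order
            self._table.append(current)

    def encrypt(self, value: int) -> int:
        if not 0 <= value < len(self._table):
            raise ValueError("value outside the plaintext domain")
        return self._table[value]

    def decrypt(self, ciphertext: int) -> int:
        # The table is sorted, so binary search recovers the plaintext index.
        index = bisect.bisect_left(self._table, ciphertext)
        if index == len(self._table) or self._table[index] != ciphertext:
            raise ValueError("not a valid ciphertext for this key")
        return index


ope = ToyOPE(key=2024, domain_size=256)
plaintexts = [42, 7, 199, 7, 128]
ciphertexts = [ope.encrypt(p) for p in plaintexts]

# Sorting by ciphertext gives the same ordering as sorting by plaintext.
order_by_cipher = sorted(range(len(plaintexts)), key=lambda i: ciphertexts[i])
order_by_plain = sorted(range(len(plaintexts)), key=lambda i: plaintexts[i])
```

A party holding only `ciphertexts` can rank or compare the values without the key, which is the capability the OPE entry above describes. Production schemes add protections that this table lookup lacks.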
General Workflow
The interaction between components follows a structured protocol to ensure efficiency and privacy. The following diagram illustrates the dynamic communication during a mining task:
Interaction Steps:
- Worker Request: The Client sends a request to the Manager to assign an available Worker for a specific task.
- Worker Assignment: The Manager identifies a suitable Worker based on load-balancing algorithms and returns the Worker ID/Address to the Client.
- Direct Communication: The Client establishes a direct link with the assigned Worker to transmit encrypted task parameters or data references.
- Iterative Processing: For algorithms requiring multiple steps (like SKMeans), the Worker processes data and returns partial results to the Client. This loop repeats for n iterations until convergence or task completion.
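The four interaction steps can be simulated in a single process. This is a rough sketch under stated assumptions, not Rory's protocol code: the `TaskManager` and `StandInWorker` classes, the `run_task` function, the worker addresses, and the convergence tolerance are hypothetical, and a plaintext 1-D k-means step stands in for the encrypted SKMeans computation.

```python
class StandInWorker:
    """Stand-in Worker: runs one plaintext 1-D k-means step per request."""

    def __init__(self, points):
        self.points = points

    def step(self, centroids):
        clusters = [[] for _ in centroids]
        for p in self.points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid when a cluster ends up empty.
        return [sum(c) / len(c) if c else old
                for c, old in zip(clusters, centroids)]


class TaskManager:
    """Stand-in Manager: hands out the address of the least-loaded worker."""

    def __init__(self, workers):
        self.workers = workers  # address -> StandInWorker
        self.load = {address: 0 for address in workers}

    def assign_worker(self):
        address = min(self.load, key=self.load.get)
        self.load[address] += 1
        return address


def run_task(manager, initial_centroids, max_iterations=20, tolerance=1e-9):
    address = manager.assign_worker()    # steps 1-2: request and assignment
    worker = manager.workers[address]    # step 3: lookup stands in for a direct link
    centroids = list(initial_centroids)
    for iteration in range(1, max_iterations + 1):  # step 4: iterative loop
        updated = worker.step(centroids)            # partial result from the Worker
        shift = max(abs(a - b) for a, b in zip(updated, centroids))
        centroids = updated
        if shift <= tolerance:                      # converged
            break
    return address, centroids, iteration


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
manager = TaskManager({"worker-a": StandInWorker(points),
                       "worker-b": StandInWorker(points)})
address, centroids, iterations = run_task(manager, [0.0, 5.0])
```

In this example the loop stops once the centroids no longer move. In the real system each `step` call would be a network request carrying encrypted values, and the Client would presumably do the decryption-dependent part of the convergence check.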
Technology Stack
- Language: Python 3.11+
- Framework: Flask (Microservices)
- Containerization: Docker & Docker Swarm
- WSGI Server: Gunicorn
- Storage: Mictlanx (Distributed CSS)
- Crypto Libraries: Pyfhel (CKKS implementation)