The Sleeper Server: Running PostgreSQL and pgvector on a Smartphone
Why I compiled a production-grade database on my S20 FE to build a local-first, privacy-focused RAG system
The Incident: The “Sleeper Server” in My Pocket
I was investigating the costs of an RDS instance for a personal semantic search project when I looked down at my Samsung S20 FE. As engineers, we’ve become so accustomed to cloud abstraction that we forget a modern smartphone packs an 8-core ARM processor and 6GB of RAM that spend 90% of their time sitting idle.
An architectural question hit me: Is it possible to transform this device into an Edge Computing node capable of handling a relational database with vector capabilities? I’m not talking about a toy or a simple app calling an API; I’m talking about compiling the most robust data engine in the world (PostgreSQL) to live and process vectors locally. The goal of this POC was clear: to prove that Data Sovereignty and Local-First AI can start with the hardware we already own.
The Problem: The “Cloud-First” Fallacy for MVPs
Most current AI tutorials push you toward a standard stack: Pinecone for vectors, LangChain for orchestration, and OpenAI for everything else. This recipe makes sense for production at scale, but for a Proof of Concept (POC), it introduces engineering hurdles we often ignore:
Development Friction: Depending on a stable cloud connection for every single similarity test.
Data Opacity: Sending PII (Personally Identifiable Information) to third parties without a prior sanitization layer.
Unnecessary Abstraction: Using SaaS vector databases when 99% of initial use cases fit perfectly within a relational engine with the right extension.
To break this cycle, I decided the device should be self-sufficient for search and storage, using the cloud only as a “blind” inference engine.
Technical Deep Dive: The Sovereign AI Stack on ARM
Deploying PostgreSQL on Android (via Termux) requires understanding how software interacts with ARM architecture without the comforts of a standard Linux distribution.
1. PostgreSQL + pgvector: Compiling from Source
To make this a serious tool, I needed pgvector. Since I couldn’t find pre-built binaries for mobile environments, the only option was native compilation on the device.
# Environment: Termux (Android)
# Installing build tools
pkg install clang make git postgresql
# Compiling the vector extension directly on the smartphone
git clone https://github.com/pgvector/pgvector.git
cd pgvector
# Optimizing for the S20 FE's ARM architecture
make
make install
# Initializing the cluster and starting the engine
initdb $PREFIX/var/lib/postgresql
pg_ctl -D $PREFIX/var/lib/postgresql start
createdb local_db
psql -d local_db -c "CREATE EXTENSION vector;"

2. Hybrid Orchestration: RAG with Local Sanitization
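With the extension in place, the flow described next needs a docs table with a vector column and a way to hand JavaScript arrays to Postgres. A minimal sketch from Node.js, assuming the pg client library; the docs table, vec column, and 384-dimension embeddings are illustrative names, not part of the original setup:

```typescript
// Sketch: pgvector accepts vectors in the text form '[0.1,0.2,...]',
// so a plain number[] can be formatted into a query parameter.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(',')}]`;
}

// Hypothetical usage against the local cluster (pg client assumed):
// const { Client } = require('pg');
// const db = new Client({ host: '127.0.0.1', database: 'local_db' });
// await db.connect();
// await db.query(
//   'CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, text text, vec vector(384))'
// );
// const { rows } = await db.query(
//   'SELECT text FROM docs ORDER BY vec <=> $1::vector LIMIT 3',
//   [toVectorLiteral(queryEmbedding)]
// );
```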
The POC flow was designed so the phone acts as the Privacy Guardian. To achieve this, I implemented an Anonymization & Re-hydration pattern.
Before any data leaves for the Gemini API, the local server processes the query, extracts sensitive information (PII) using a lightweight NER (Named Entity Recognition) model, and replaces it with temporary tokens. These are stored locally and only re-injected into the final response.
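The masking and re-hydration steps can be sketched with a toy regex-based sanitizer; this stands in for the real NER model, and detects only e-mail addresses for brevity:

```typescript
// Toy sketch of the mask/re-hydrate pattern. A real implementation would
// run a lightweight NER model; here an e-mail regex stands in for PII
// detection, mapping each match to a reversible placeholder token.
type PiiMap = Map<string, string>;

function mask(query: string): { cleanQuery: string; piiMap: PiiMap } {
  const piiMap: PiiMap = new Map();
  let i = 0;
  const cleanQuery = query.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, (match) => {
    const token = `{{PII_${i++}}}`;
    piiMap.set(token, match); // remember the original value locally
    return token;
  });
  return { cleanQuery, piiMap };
}

function rehydrate(response: string, piiMap: PiiMap): string {
  // Re-inject the original values into the model's answer.
  let out = response;
  for (const [token, original] of piiMap) out = out.split(token).join(original);
  return out;
}
```

The PII map never leaves the device; only the tokenized text is sent over the network.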
// Privacy Gateway implementation on mobile (Node.js)
async function queryPrivateRAG(userQuery: string) {
  // 1. Masking: "John Doe" -> "{{USER_0}}" (Local Process)
  const { cleanQuery, piiMap } = await localSanitizer.mask(userQuery);

  // 2. Local RAG: Vector search in PostgreSQL (Local)
  const vector = await localEmbedder.embed(cleanQuery);
  const context = await db.query(
    'SELECT text FROM docs ORDER BY vec <=> $1::vector LIMIT 3',
    [vector]
  );

  // 3. Blind Inference: Gemini only receives anonymized data
  const anonymizedResponse = await gemini.generateResponse(cleanQuery, context);

  // 4. Re-hydration: "{{USER_0}}" -> "John Doe" (Local)
  return localSanitizer.rehydrate(anonymizedResponse, piiMap);
}

Trade-offs
A senior architect knows that “there is no such thing as a free lunch”:
PostgreSQL vs. SQLite: I chose Postgres for its robustness and because the pgvector extension is the current industry standard. I sacrificed some RAM (Postgres is heavier) but gained a production-ready engine.
Sanitization Latency: Adding a local text inspection layer before each API call adds milliseconds. However, it’s a fair price to pay compared to the risk of leaking private data.
Disk I/O: Smartphones use flash storage that doesn’t have the same durability as a server-grade SSD under massive write loads. For a POC, this is acceptable; for production, it is not.
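One more knob on the latency side of these trade-offs: pgvector's sequential scan is exact but grows linearly with the corpus, and an approximate index can cap search time on constrained hardware. A sketch of the DDL, assuming pgvector's ivfflat index type and the illustrative docs/vec names; the lists value trades recall for speed:

```typescript
// Sketch: build the CREATE INDEX statement for an approximate (ivfflat)
// pgvector index. A common starting point for `lists` is roughly rows/1000.
function ivfflatIndexSQL(table: string, column: string, lists: number): string {
  return `CREATE INDEX ON ${table} USING ivfflat (${column} vector_cosine_ops) WITH (lists = ${lists});`;
}

// Hypothetical usage: await db.query(ivfflatIndexSQL('docs', 'vec', 100));
```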
Impact Analysis: The S20 FE as a Compute Node
The results proved that mobile hardware is more than capable of sustaining a modern AI workload if we strip away heavy abstractions.
+-----------------------+------------------------+---------------------------+
| METRIC                | CLOUD APPROACH (SaaS)  | LOCAL-FIRST APPROACH (POC)|
+-----------------------+------------------------+---------------------------+
| Search Latency        | 150ms - 300ms (Network)| < 40ms (Local)            |
+-----------------------+------------------------+---------------------------+
| Data Sovereignty      | Delegated to 3rd party | Full control at source    |
+-----------------------+------------------------+---------------------------+
| Cost per Query        | Input Tokens + DB fees | Generation tokens only    |
+-----------------------+------------------------+---------------------------+
| Maintainability       | Requires DevOps mgmt   | Single binary/file        |
+-----------------------+------------------------+---------------------------+

Closing Thoughts: Simplicity as an Act of Rebellion
This POC reminded me that, sometimes, the best architecture is the one that makes the most of the resources already at hand. Technical leadership isn’t always about knowing how to scale up (more instances, more clusters); it’s about knowing how to scale inward.
Compiling PostgreSQL on a smartphone to create a private AI system isn’t just a curious experiment; it’s a validation that we can build resilient, fast, and sovereign systems without depending on the benevolence of cloud giants. In the end, if the code is efficient, the hardware is always enough.
