How to Build a Scalable SaaS in 2023
When I launched my first SaaS product back in 2018, I made almost every architectural mistake possible. We chose the wrong database, overengineered our infrastructure, and underestimated the complexity of multi-tenancy. The result? Countless late nights debugging production issues and a codebase that became increasingly difficult to evolve.
Fast forward to today, and I've helped build and scale several successful SaaS platforms. The landscape has changed dramatically in just a few years—new tools, platforms, and patterns have emerged that make building scalable SaaS applications more accessible than ever.
In this post, I'll share the hard-won lessons and current best practices for building a SaaS that can scale from your first customer to your first thousand, without requiring a complete rewrite along the way.
Fundamentals: Multi-Tenancy Approaches
The core architectural challenge in any SaaS is multi-tenancy—how your system handles data and resources for multiple customers (tenants). There are three primary approaches, each with specific trade-offs:
1. Silo Model (Tenant Per Database)
Each tenant gets their own database instance. This approach:
- Provides strong tenant isolation
- Simplifies compliance requirements (e.g., data residency)
- Makes tenant data backup/restore straightforward
- Allows for tenant-specific customizations
But comes with significant drawbacks:
- Higher infrastructure costs
- Operational complexity (managing many databases)
- Challenging cross-tenant analytics
- Difficult schema migrations across all tenant databases
This is how we structured our first SaaS, and while it worked initially, it became a maintenance nightmare once we hit about 50 customers.
// Example connection function in a silo model (Node.js)
async function getConnectionForTenant(tenantId) {
const tenantConfig = await getTenantConfig(tenantId);
return createConnection({
host: tenantConfig.dbHost,
database: `tenant_${tenantId}`,
user: tenantConfig.dbUser,
password: tenantConfig.dbPassword,
// other connection parameters
});
}
// Usage in an API route
app.get('/api/data', async (req, res) => {
const tenantId = getTenantIdFromRequest(req);
const db = await getConnectionForTenant(tenantId);
try {
const data = await db.query('SELECT * FROM some_table');
res.json(data);
} finally {
await db.end(); // Important: close the connection (with a pool, release it back instead)
}
});
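Opening a fresh connection on every request gets expensive once there are many tenants. One common mitigation is to keep a bounded cache of per-tenant pools. The sketch below is a minimal illustration of that idea, not what we ran in production: `TenantPoolCache`, `createPool`, and `closePool` are hypothetical names, and the pool type is left generic so it does not assume any particular database driver.

```typescript
// Hypothetical helper: caches one pool per tenant and evicts the least
// recently used pool once `maxPools` is exceeded.
class TenantPoolCache<P> {
  private pools = new Map<string, P>();
  private createPool: (tenantId: string) => P;
  private closePool: (pool: P) => void;
  private maxPools: number;

  constructor(
    createPool: (tenantId: string) => P,
    closePool: (pool: P) => void,
    maxPools: number,
  ) {
    this.createPool = createPool;
    this.closePool = closePool;
    this.maxPools = maxPools;
  }

  get(tenantId: string): P {
    const existing = this.pools.get(tenantId);
    if (existing !== undefined) {
      // Re-insert to mark as most recently used (Map keeps insertion order).
      this.pools.delete(tenantId);
      this.pools.set(tenantId, existing);
      return existing;
    }
    const pool = this.createPool(tenantId);
    this.pools.set(tenantId, pool);
    if (this.pools.size > this.maxPools) {
      // The first key in iteration order is the least recently used.
      const oldest = this.pools.keys().next().value;
      if (oldest !== undefined) {
        const evicted = this.pools.get(oldest);
        this.pools.delete(oldest);
        if (evicted !== undefined) this.closePool(evicted);
      }
    }
    return pool;
  }

  get size(): number {
    return this.pools.size;
  }
}
```

The cap matters because every cached pool holds open connections; without eviction, the silo model's cost grows with the number of tenants that have ever been active rather than those active right now.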
2. Bridge Model (Shared Database, Separate Schemas)
All tenants share a database instance, but each tenant has its own schema. This approach:
- Reduces infrastructure costs compared to the silo model
- Provides good tenant isolation
- Simplifies some operational aspects
- Works well with databases that have strong schema support (PostgreSQL)
However:
- Still has operational overhead for schema management
- May hit database connection limits with many tenants
- Complicates database-level performance optimization
In a recent project, we used this approach with PostgreSQL's schema feature to achieve a good balance between isolation and manageability.
-- Example PostgreSQL setup for the bridge model
-- Create schema for a new tenant
CREATE SCHEMA tenant_123;
-- Create tables in tenant schema
CREATE TABLE tenant_123.users (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- Set search path to use tenant schema
SET search_path TO tenant_123;
-- Now queries will use the tenant's schema
SELECT * FROM users;
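Because the schema name is interpolated into SQL, the tenant ID has to be validated before it gets anywhere near a statement. Here is a minimal sketch of that check, assuming tenant IDs are positive integers as in the `tenant_123` example; `schemaNameForTenant` and `searchPathStatement` are hypothetical helper names.

```typescript
// Hypothetical helper: derive a schema name from a tenant ID, rejecting
// anything that is not a plain positive integer.
function schemaNameForTenant(tenantId: string | number): string {
  const raw = String(tenantId);
  if (!/^[1-9][0-9]{0,17}$/.test(raw)) {
    throw new Error(`Invalid tenant ID: ${raw}`);
  }
  return `tenant_${raw}`;
}

// Build the statement that scopes a session to one tenant's schema.
function searchPathStatement(tenantId: string | number): string {
  // Double quotes make the name a quoted identifier in PostgreSQL.
  return `SET search_path TO "${schemaNameForTenant(tenantId)}"`;
}
```

If connections are shared through a pooler in transaction mode, session-level settings can leak between tenants, so `SET LOCAL search_path` inside a transaction is usually the safer variant.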
3. Pool Model (Shared Database, Shared Schema)
All tenants share both the database instance and the schema, with a tenant identifier column differentiating the data. This approach:
- Offers the lowest infrastructure costs
- Simplifies database operations and schema management
- Enables easy cross-tenant analytics
- Reduces operational complexity
But requires careful design to avoid:
- Data leakage between tenants
- "Noisy neighbor" performance problems
- More complex backup/restore for individual tenants
This is the model we've used most successfully for SaaS applications that need to scale to thousands of tenants while keeping costs manageable.
// Example TypeScript code for a pool model using Prisma ORM
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();
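// Middleware to inject tenant context into all queries
// (newer Prisma versions favor client extensions over $use middleware)
prisma.$use(async (params, next) => {
// Skip the Tenant model itself. findUnique is also passed through because its
// `where` only accepts unique fields, so it is NOT tenant-filtered here:
// use findFirst for tenant-scoped lookups by ID.
if (params.action === 'findUnique' || params.model === 'Tenant') {
return next(params);
}
// Get tenant ID from context (e.g., from request)
const tenantId = getTenantIdFromContext();
// Add tenant filter to reads and bulk writes
// (single-record update/delete take a unique `where` and are not covered)
const filtered = ['findMany', 'findFirst', 'count', 'updateMany', 'deleteMany'];
if (filtered.includes(params.action)) {
params.args = params.args ?? {};
params.args.where = { ...params.args.where, tenantId };
}
// Add tenant ID to create operations
if (params.action === 'create') {
params.args.data = { ...params.args.data, tenantId };
}
// createMany accepts a single row or an array of rows, so stamp each one
if (params.action === 'createMany') {
const rows = Array.isArray(params.args.data) ? params.args.data : [params.args.data];
params.args.data = rows.map((row) => ({ ...row, tenantId }));
}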
return next(params);
});
// Usage example
async function getUserData(userId: string) {
// findFirst (not findUnique) so the middleware adds the tenantId filter
return prisma.user.findFirst({
where: { id: userId },
include: { profile: true }
});
}
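The scoping rules are easier to test when they live in a pure function that has no dependency on the ORM. The following is a sketch of that idea rather than part of Prisma's API: `scopeToTenant` and the `Args` shape are hypothetical, and the action names simply mirror the ones used in the middleware.

```typescript
type Args = { where?: Record<string, unknown>; data?: unknown };

const FILTERED_ACTIONS = new Set([
  "findMany", "findFirst", "count", "updateMany", "deleteMany",
]);

// Returns a copy of `args` with the tenant constraint applied.
function scopeToTenant(action: string, args: Args, tenantId: string): Args {
  if (FILTERED_ACTIONS.has(action)) {
    return { ...args, where: { ...(args.where ?? {}), tenantId } };
  }
  if (action === "create") {
    const row = (args.data ?? {}) as Record<string, unknown>;
    return { ...args, data: { ...row, tenantId } };
  }
  if (action === "createMany") {
    const rows: unknown[] = Array.isArray(args.data) ? args.data : [args.data];
    return {
      ...args,
      data: rows.map((row) => ({ ...(row as Record<string, unknown>), tenantId })),
    };
  }
  // Anything else is rejected so that an unscoped query cannot slip through.
  throw new Error(`Action "${action}" is not tenant-scoped`);
}
```

Failing closed on unknown actions is the important design choice here: a new query type has to be consciously added to the list instead of silently bypassing the filter.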
Hybrid Approaches
In practice, we often implement a hybrid approach. For instance, in our current SaaS platform:
- Core tenant data uses the pool model for cost efficiency
- Large binary data (files, logs) uses a silo approach for isolation
- Certain tenant-specific configurations use the bridge model
The key is to choose the right model for different parts of your system based on their specific requirements.
Data Storage: Beyond Traditional Databases
Your choice of data storage technology has profound implications for scalability. Here's what's working well in 2023:
Primary Database Considerations
For your primary transactional database, these approaches have proven effective:
- PostgreSQL with Proper Indexing
PostgreSQL has become the gold standard for SaaS applications due to its reliability, feature set, and support for advanced multi-tenancy patterns. Key practices:
- Tenant-specific indexes for query optimization
- Partitioning by tenant ID for large tables
- Connection pooling with PgBouncer to manage connection limits
-- Example of tenant partitioning in PostgreSQL
CREATE TABLE events (
id SERIAL,
tenant_id INTEGER NOT NULL,
event_type VARCHAR(50) NOT NULL,
data JSONB,
created_at TIMESTAMP DEFAULT NOW()
) PARTITION BY LIST (tenant_id);
-- Create partition for a specific tenant
CREATE TABLE events_tenant_123 PARTITION OF events
FOR VALUES IN (123);
-- Create partition for another tenant
CREATE TABLE events_tenant_456 PARTITION OF events
FOR VALUES IN (456);
-- Create an index that will be used by all partitions
CREATE INDEX ON events (created_at);
-- Create a tenant-specific index for a tenant with special needs
CREATE INDEX ON events_tenant_456 (event_type, created_at);
- Serverless Databases
The emergence of truly serverless databases (like Amazon Aurora Serverless v2, PlanetScale, and Neon) has been a game-changer for early-stage SaaS, offering:
- Automatic scaling (both up and down)
- Pay-per-use pricing that aligns with SaaS revenue models
- Reduced operational overhead
- Database-as-a-Service with Managed Scaling
For teams who prefer more control, services like Google Cloud SQL, Azure Database, and managed MongoDB provide a middle ground:
- Automated backups and maintenance
- Simplified replication setup
- Vertical and horizontal scaling options
Storage Specialization
Modern SaaS architectures typically employ multiple storage technologies optimized for specific workloads:
- OLTP: PostgreSQL/MySQL for transactional data
- Search: Elasticsearch or Algolia for search functionality
- Analytics: Snowflake, BigQuery, or ClickHouse for reporting and analytics
- Caching: Redis or Memcached for performance enhancement
- Object Storage: S3-compatible storage for files and media
The key insight from our experience: start with a solid relational database for your core business data, then add specialized storage systems only when you have a clear need.
Backend Architecture: Striking the Right Balance
Service Boundaries: Domain-Driven Approach
One of our biggest mistakes in past projects was defining service boundaries too arbitrarily. Now, we follow these principles:
- Start with a "Majestic Monolith":
  - Simpler development experience
  - Lower operational overhead
  - Easier to refactor as domain boundaries become clearer
- Extract Services Along Domain Boundaries:
  - When a subdomain becomes complex enough to warrant its own service
  - When different scaling characteristics emerge
  - When different teams need to own different parts of the system
- Shared Infrastructure, Independent Deployments:
  - Use a unified CI/CD pipeline
  - Deploy services independently
  - Share monitoring and observability tooling
Here's a typical evolution we've seen work well:
Phase 1: Majestic Monolith
- All functionality in a single codebase
- Clear internal module boundaries
- Shared database
Phase 2: Extract Key Services
- Authentication Service (often replaced with a third-party solution)
- File/Media Processing Service
- Notifications Service
- Analytics Service
- Main Application (still a monolith for core business logic)
Phase 3: Domain-Driven Decomposition
- Break down the main application along business domain lines
- Separate databases where necessary
- Shared authentication and cross-cutting concerns
API Design: Backend for Frontend (BFF) Pattern
The Backend for Frontend pattern has proven invaluable for our SaaS applications:
- Dedicated API Layers for different frontend clients:
  - Web application BFF
  - Mobile application BFF
  - Public API for integrations
- Benefits:
  - Optimized responses for each client type
  - Simplified frontend development
  - Better performance through focused queries
Here's a simplified implementation using a Node.js/Express architecture:
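// api-server.js - Main API server
const express = require('express');
// Router modules (file names other than web-bff-router.js are illustrative)
const coreApiRouter = require('./core-api-router');
const webBffRouter = require('./web-bff-router');
const mobileBffRouter = require('./mobile-bff-router');
const publicApiRouter = require('./public-api-router');
const app = express();
// Core API routes used by all BFFs
app.use('/api/core', coreApiRouter);
// Web BFF - Optimized for web client
app.use('/api/web', webBffRouter);
// Mobile BFF - Optimized for mobile clients
app.use('/api/mobile', mobileBffRouter);
// Public API - For third-party integrations
app.use('/api/public', publicApiRouter);
// Example of a BFF router
// web-bff-router.js
const express = require('express');
const webBffRouter = express.Router();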
webBffRouter.get('/dashboard', async (req, res) => {
// Get tenant and user from auth middleware
const { tenantId, userId } = req.auth;
// Aggregate data from multiple services in a way optimized for web UI
const [user, stats, notifications, recentActivity] = await Promise.all([
userService.getUser(userId),
statsService.getDashboardStats(tenantId),
notificationService.getUnreadCount(userId),
activityService.getRecentForUser(userId, { limit: 5 })
]);
// Return a single, optimized response
res.json({
user: {
name: user.name,
avatarUrl: user.avatarUrl,
role: user.role
},
stats,
unreadNotifications: notifications.count,
recentActivity
});
});
module.exports = webBffRouter;
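What makes a BFF worthwhile is that each client gets a payload shaped for it from the same core data. The sketch below illustrates that with two pure mapping functions; the `CoreUser` and `CoreActivity` types and the specific fields each client receives are assumptions for illustration, not our actual contracts.

```typescript
interface CoreUser {
  id: string;
  name: string;
  avatarUrl: string;
  role: string;
  email: string;
}

interface CoreActivity {
  id: string;
  summary: string;
  createdAt: string;
}

// Web gets the richer payload: a full user card and up to five activity rows.
function toWebDashboard(user: CoreUser, activity: CoreActivity[]) {
  return {
    user: { name: user.name, avatarUrl: user.avatarUrl, role: user.role },
    recentActivity: activity.slice(0, 5),
  };
}

// Mobile gets a trimmed payload: no avatar URL, fewer rows, fewer fields.
function toMobileDashboard(user: CoreUser, activity: CoreActivity[]) {
  return {
    userName: user.name,
    recentActivity: activity
      .slice(0, 2)
      .map((a) => ({ id: a.id, summary: a.summary })),
  };
}
```

Keeping the mappers free of I/O means each BFF route reduces to "fetch core data, then map", and the mapping can be unit-tested without a server.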
Authentication and Authorization
Authentication architecture deserves special attention in SaaS applications due to its central role and security implications.
- Authentication Providers:
  - For most SaaS, an identity provider like Auth0, Okta, or Clerk is worth the investment
  - Implement OAuth 2.0 and OpenID Connect standards
  - Support SSO options for enterprise customers early
- Multi-level Authorization:
  - Tenant-level permissions (what tenants can access)
  - Role-based access within tenants (what users can do)
  - Resource-level permissions (what specific data users can see)
Here's a simplified authorization system we've implemented successfully:
// Permission checking middleware using the CASL library
import { AbilityBuilder, Ability, subject } from '@casl/ability';
// Define abilities for a user
function defineAbilitiesFor(user, tenantRole) {
const { can, cannot, build } = new AbilityBuilder(Ability);
// Basic permissions for all authenticated users
can('read', 'Profile', { userId: user.id });
// Tenant-specific permissions based on role
if (tenantRole === 'admin') {
can('manage', 'all', { tenantId: user.tenantId });
} else if (tenantRole === 'editor') {
can('read', 'all', { tenantId: user.tenantId });
can('create', ['Post', 'Comment'], { tenantId: user.tenantId });
can('update', ['Post', 'Comment'], { authorId: user.id, tenantId: user.tenantId });
} else {
// Basic user role
can('read', ['Post', 'Comment'], { tenantId: user.tenantId });
can('create', 'Comment', { tenantId: user.tenantId });
}
// Specific restrictions
cannot('delete', 'Account');
return build();
}
// Express middleware to check permissions
function checkPermission(action, resource) {
return (req, res, next) => {
const ability = defineAbilitiesFor(req.user, req.tenantRole);
// A check against a type name ('Post') asks whether the user can perform
// the action on at least some resources of that type; the rule conditions
// are evaluated later, against the loaded object
if (ability.can(action, resource)) {
req.ability = ability; // Reuse for the object-level check
next();
} else {
res.status(403).json({ error: 'Forbidden' });
}
};
}
// Usage in routes
app.get('/api/posts/:id',
authenticate,
loadTenantRole,
checkPermission('read', 'Post'),
async (req, res) => {
const post = await postsService.getPost(req.params.id);
if (!post) {
return res.status(404).json({ error: 'Not found' });
}
// Object-level check: a plain object carries no type information, so tag
// it with subject() to let CASL match it against the 'Post' rules
if (!req.ability.can('read', subject('Post', post))) {
return res.status(403).json({ error: 'Forbidden' });
}
res.json(post);
}
);
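It can help to see the three authorization levels without any library in the way. The following is a deliberately small sketch, assuming three roles and a resource that records its tenant and owner; `canAccess` and the type names are hypothetical and much simpler than a real policy.

```typescript
type Role = "admin" | "editor" | "member";
type Action = "read" | "update" | "delete";

interface Actor {
  id: string;
  tenantId: string;
  role: Role;
}

interface Resource {
  tenantId: string;
  ownerId: string;
}

function canAccess(actor: Actor, action: Action, resource: Resource): boolean {
  // Level 1: tenant boundary. Nothing crosses it, whatever the role.
  if (actor.tenantId !== resource.tenantId) return false;
  // Level 2: role within the tenant.
  if (actor.role === "admin") return true;
  if (action === "read") return true;
  if (actor.role === "member") return false;
  // Level 3: resource ownership. Editors may only update what they own.
  return action === "update" && resource.ownerId === actor.id;
}
```

The ordering is the point: the tenant check comes first and cannot be overridden by any role, so a bug in the role or ownership logic can never leak data across tenants.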
Frontend Architecture: Optimizing for Development Velocity
The frontend architecture for SaaS applications must balance performance, maintainability, and development velocity.
UI Framework Selection
In 2023, these approaches have proven successful for SaaS applications:
- React with Next.js:
  - Server-side rendering for improved initial load
  - API routes for BFF implementation
  - Built-in performance optimizations
- Component Library Strategy:
  - Start with a comprehensive UI library like MUI, Chakra UI, or Tailwind UI
  - Customize to match your brand
  - Create a shared component library early
- State Management Evolution:
  - Start with React Context + hooks for simple state
  - Add React Query for server state
  - Adopt a more structured solution like Redux only when needed
Performance Optimization Techniques
Our most successful SaaS applications implement these practices:
- Route-based Code Splitting:
  - Load only the code needed for the current view
  - Preload code for likely next actions
- Edge Caching:
  - Use a CDN for static assets
  - Implement stale-while-revalidate caching for API responses
- Progressive Enhancement:
  - Deliver core functionality quickly
  - Load enhanced features progressively
Here's a Next.js component implementing some of these practices:
// Dashboard.tsx
import { Suspense, lazy, useEffect } from 'react';
import { useQuery, useQueryClient } from 'react-query';
import { Layout, LoadingSpinner } from '@/components/ui';
// Lazy-loaded components
const DashboardStats = lazy(() => import('@/components/DashboardStats'));
const ActivityFeed = lazy(() => import('@/components/ActivityFeed'));
const RecommendationPanel = lazy(() => import('@/components/RecommendationPanel'));
export default function Dashboard() {
// Needed for the prefetch call below
const queryClient = useQueryClient();
// Fetch critical data immediately
const { data: dashboardData, isLoading } = useQuery(
'dashboardData',
() => fetch('/api/web/dashboard').then(res => res.json()),
{
staleTime: 5 * 60 * 1000, // 5 minutes
cacheTime: 10 * 60 * 1000, // 10 minutes
}
);
// Prefetch data for likely next actions
useEffect(() => {
const prefetchQueries = async () => {
await queryClient.prefetchQuery(
'userSettings',
() => fetch('/api/web/user/settings').then(res => res.json())
);
};
// Wait until after initial render and critical data is loaded
if (dashboardData && !isLoading) {
prefetchQueries();
}
}, [dashboardData, isLoading, queryClient]);
if (isLoading) {
return <LoadingSpinner />;
}
return (
<Layout>
{/* Critical UI shown immediately */}
<h1>Welcome, {dashboardData.user.name}</h1>
{/* Core dashboard components loaded with Suspense */}
<Suspense fallback={<LoadingSpinner />}>
<DashboardStats stats={dashboardData.stats} />
</Suspense>
<div className="dashboard-grid">
{/* Important but not critical components */}
<Suspense fallback={<LoadingSpinner />}>
<ActivityFeed initialData={dashboardData.recentActivity} />
</Suspense>
{/* Enhancement components loaded last */}
<Suspense fallback={null}>
<RecommendationPanel userId={dashboardData.user.id} />
</Suspense>
</div>
</Layout>
);
}
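The stale-while-revalidate behavior mentioned under Edge Caching comes down to two time windows. Here is a rough sketch of how a cache might classify a response by age, plus the matching `Cache-Control` value; `classify` and `cacheControlHeader` are hypothetical helpers, and real caches apply additional rules.

```typescript
type Freshness = "fresh" | "stale-revalidate" | "expired";

// Classify a cached response by age, using the same two windows that the
// Cache-Control header expresses.
function classify(ageSeconds: number, maxAge: number, swrWindow: number): Freshness {
  // Within max-age: serve from cache without contacting the origin.
  if (ageSeconds < maxAge) return "fresh";
  // Within the extra window: serve the stale copy and refresh in the background.
  if (ageSeconds < maxAge + swrWindow) return "stale-revalidate";
  // Beyond both windows: the caller has to wait for a fresh response.
  return "expired";
}

function cacheControlHeader(maxAge: number, swrWindow: number): string {
  return `max-age=${maxAge}, stale-while-revalidate=${swrWindow}`;
}
```

For tenant-specific API responses, make sure the cache key includes the tenant (or keep the response out of shared caches entirely), otherwise this optimization turns into a data leak.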
Infrastructure: The DevOps Decision
Infrastructure choices can make or break a SaaS startup. I've seen teams both succeed and fail with different approaches.
Kubernetes vs. Managed Services
The "build vs. buy" decision for infrastructure is critical:
Kubernetes Route:
- Pros: Ultimate flexibility, vendor portability, cost-effective at scale
- Cons: Significant operational overhead, steep learning curve
- Best for: Teams with DevOps expertise and complex, stateful workloads
Managed Services Route:
- Pros: Reduced operational burden, faster time to market
- Cons: Higher costs at scale, potential vendor lock-in
- Best for: Small teams focused on product development, early-stage startups
After several projects, my recommendation: Start with managed services until you have clear, validated reasons to move to Kubernetes.
Infrastructure as Code (IaC)
Regardless of your infrastructure choice, infrastructure as code is non-negotiable for a scalable SaaS:
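// Example Pulumi code for a typical SaaS infrastructure
// Assumes `dbPassword` (for example, a secret read from Pulumi config) and
// `dbSecurityGroup` are defined elsewhere in the program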
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
import * as awsx from "@pulumi/awsx";
// Create VPC with public and private subnets
const vpc = new awsx.ec2.Vpc("saas-vpc", {
cidrBlock: "10.0.0.0/16",
numberOfAvailabilityZones: 2,
});
// Create RDS PostgreSQL instance in private subnet
const dbSubnetGroup = new aws.rds.SubnetGroup("db-subnet-group", {
subnetIds: vpc.privateSubnetIds,
});
const db = new aws.rds.Instance("saas-db", {
engine: "postgres",
instanceClass: "db.t3.medium",
allocatedStorage: 20,
dbSubnetGroupName: dbSubnetGroup.name,
vpcSecurityGroupIds: [dbSecurityGroup.id],
dbName: "saasapp", // `name` in older @pulumi/aws versions
username: "postgres",
password: dbPassword,
skipFinalSnapshot: true, // Convenient for demos; set to false in production
});
// Create ECS Fargate cluster for application
const cluster = new aws.ecs.Cluster("saas-cluster");
const loadBalancer = new awsx.lb.ApplicationLoadBalancer("saas-lb", {
vpc,
external: true,
});
const appService = new awsx.ecs.FargateService("app-service", {
cluster: cluster.arn,
taskDefinitionArgs: {
container: {
image: "my-saas-app:latest",
cpu: 512,
memory: 1024,
essential: true,
portMappings: [{
containerPort: 3000,
targetGroup: loadBalancer.defaultTargetGroup,
}],
environment: [
// pulumi.interpolate resolves outputs and secrets; a plain template string would not
{ name: "DATABASE_URL", value: pulumi.interpolate`postgres://postgres:${dbPassword}@${db.endpoint}/saasapp` },
{ name: "NODE_ENV", value: "production" },
],
},
},
desiredCount: 2,
vpc,
});
// Export endpoints
export const dbEndpoint = db.endpoint;
export const appEndpoint = loadBalancer.loadBalancer.dnsName;
Multi-Region Deployment
As your SaaS scales, especially with international customers, multi-region deployment becomes important:
- Start with a Single Region:
  - Focus on robustness within one region first
  - Use multiple availability zones for high availability
- Expand to a Second Region:
  - Start with static assets and read replicas
  - Gradually move to active-passive architecture
- Consider Active-Active for Global Scale:
  - Requires significant investment in data synchronization
  - Only needed for truly global services with strict latency requirements
Pricing and Billing: Technical Considerations
Billing infrastructure is often overlooked in technical architecture discussions, but it's critical for SaaS success.
Usage Metering Architecture
The ability to accurately track usage metrics is essential:
// Example of a usage metering service
import { PrismaClient } from '@prisma/client';
import { Redis } from 'ioredis';
const prisma = new PrismaClient();
const redis = new Redis(process.env.REDIS_URL);
// Record a usage event
async function recordUsage(tenantId, metricName, quantity = 1) {
// First, increment in Redis for real-time access
await redis.hincrby(`usage:${tenantId}:${getCurrentMonth()}`, metricName, quantity);
// Then persist the event to the database for durability (awaited here; a
// queue or background job would make this step truly asynchronous)
await prisma.usageRecord.create({
data: {
tenantId,
metricName,
quantity,
timestamp: new Date(),
},
});
}
// Get current usage for a tenant
async function getCurrentUsage(tenantId, metricName) {
const currentMonth = getCurrentMonth();
// Try to get from Redis first
const cachedUsage = await redis.hget(`usage:${tenantId}:${currentMonth}`, metricName);
if (cachedUsage !== null) {
return parseInt(cachedUsage, 10);
}
// Fall back to database if not in cache
const result = await prisma.usageRecord.aggregate({
where: {
tenantId,
metricName,
timestamp: {
gte: new Date(getCurrentMonth() + '-01'),
lt: new Date(getNextMonth() + '-01'),
},
},
_sum: {
quantity: true,
},
});
const usage = result._sum.quantity || 0;
// Update cache for future requests
await redis.hset(`usage:${tenantId}:${currentMonth}`, metricName, usage);
return usage;
}
// Helpers return months in YYYY-MM format. They use UTC so that the
// `new Date('YYYY-MM-01')` boundaries above, which parse as UTC, line up
function getCurrentMonth() {
const now = new Date();
return `${now.getUTCFullYear()}-${String(now.getUTCMonth() + 1).padStart(2, '0')}`;
}
function getNextMonth() {
const now = new Date();
if (now.getUTCMonth() === 11) {
return `${now.getUTCFullYear() + 1}-01`;
}
return `${now.getUTCFullYear()}-${String(now.getUTCMonth() + 2).padStart(2, '0')}`;
}
export { recordUsage, getCurrentUsage };
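Writing one database row per usage event can become a bottleneck for high-volume metrics. One option is to aggregate events in memory and flush the totals periodically. The sketch below shows only the aggregation step; `UsageBuffer` is a hypothetical class, and the trade-off is that anything still buffered is lost if the process dies before a flush.

```typescript
interface UsageRow {
  tenantId: string;
  metricName: string;
  quantity: number;
}

class UsageBuffer {
  private totals = new Map<string, UsageRow>();

  // Add to the running total for this tenant and metric.
  record(tenantId: string, metricName: string, quantity = 1): void {
    const key = JSON.stringify([tenantId, metricName]);
    const row = this.totals.get(key);
    if (row) {
      row.quantity += quantity;
    } else {
      this.totals.set(key, { tenantId, metricName, quantity });
    }
  }

  // Return the aggregated rows and reset the buffer.
  drain(): UsageRow[] {
    const rows = Array.from(this.totals.values());
    this.totals = new Map();
    return rows;
  }
}
```

A timer or job would call `drain()` every few seconds and write the returned rows in a single batch insert.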
Billing Integration Best Practices
For most SaaS applications, using a specialized billing provider like Stripe, Chargebee, or Paddle is worth the investment:
- Maintain Pricing Information in One Place:
  - Store pricing logic in the billing provider
  - Keep minimal pricing information in your application
- Implement Subscription Synchronization:
  - Webhook-driven updates of subscription status
  - Background jobs for subscription changes
- Plan for Billing-Related Support Issues:
  - Build proper tooling for customer support
  - Implement detailed logging of billing events
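Webhook deliveries can arrive more than once and out of order, so subscription synchronization needs to be idempotent. The following sketch shows one way to handle both cases in memory. The `SubscriptionEvent` shape is an assumption for illustration and does not match any specific provider's payload; a real implementation would persist the seen IDs and state in the database.

```typescript
type SubscriptionStatus = "active" | "past_due" | "canceled";

interface SubscriptionEvent {
  id: string;          // unique event ID from the provider (assumed)
  tenantId: string;
  status: SubscriptionStatus;
  occurredAt: number;  // epoch milliseconds (assumed)
}

class SubscriptionSync {
  private seen = new Set<string>();
  private state = new Map<string, { status: SubscriptionStatus; updatedAt: number }>();

  // Returns true when the event changed local state.
  apply(event: SubscriptionEvent): boolean {
    // Duplicate delivery: already processed, nothing to do.
    if (this.seen.has(event.id)) return false;
    this.seen.add(event.id);
    // Out-of-order delivery: never let an older event overwrite newer state.
    const current = this.state.get(event.tenantId);
    if (current && current.updatedAt >= event.occurredAt) return false;
    this.state.set(event.tenantId, { status: event.status, updatedAt: event.occurredAt });
    return true;
  }

  statusOf(tenantId: string): SubscriptionStatus | undefined {
    return this.state.get(tenantId)?.status;
  }
}
```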
Operational Excellence: Observability and Scalability
Finally, operational excellence is essential for a well-functioning SaaS.
Comprehensive Observability Stack
A proper observability approach includes:
- Structured Logging:
  - Tenant-aware logging
  - Request correlation IDs
  - Context-rich log events
- APM (Application Performance Monitoring):
  - End-to-end transaction tracing
  - Service-level metrics
  - Database query performance
- Custom Business Metrics:
  - Tenant-specific usage patterns
  - Feature adoption rates
  - Conversion and retention metrics
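As a concrete picture of tenant-aware structured logging, here is a minimal sketch that emits one JSON object per log line. `formatLog` and `LogContext` are hypothetical names; in practice a logging library would handle this, but the shape of the output is what matters.

```typescript
interface LogContext {
  tenantId: string;
  correlationId: string;
}

function formatLog(
  level: "info" | "warn" | "error",
  message: string,
  ctx: LogContext,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  // Context keys are written after `fields`, so caller-supplied fields
  // cannot overwrite the tenant or correlation ID.
  return JSON.stringify({
    ...fields,
    timestamp: now.toISOString(),
    level,
    message,
    tenantId: ctx.tenantId,
    correlationId: ctx.correlationId,
  });
}
```

With every line carrying `tenantId` and `correlationId`, a support engineer can filter to a single tenant and then follow one request across services.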
Automated Scaling Policies
Design your system to scale automatically:
# Example AWS CloudFormation for Auto Scaling policies
Resources:
  AppAutoScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      MaxCapacity: 10
      MinCapacity: 2
      ResourceId: !Sub service/${ECSCluster}/${ECSService}
      ScalableDimension: ecs:service:DesiredCount
      ServiceNamespace: ecs
      RoleARN: !GetAtt AutoScalingRole.Arn
  ScaleUpPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: ScaleUpPolicy
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref AppAutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        TargetValue: 70.0
        ScaleInCooldown: 180
        ScaleOutCooldown: 60
  ScaleOnRequestCount:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: ScaleOnRequestCount
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref AppAutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ALBRequestCountPerTarget
          ResourceLabel: !Sub ${LoadBalancer.LoadBalancerFullName}/${TargetGroup.TargetGroupFullName}
        TargetValue: 1000.0
        ScaleInCooldown: 300
        ScaleOutCooldown: 60
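To build intuition for what a target-tracking policy does, the arithmetic can be approximated as "scale capacity in proportion to how far the metric is from its target, within the configured bounds". The sketch below is that approximation only; the managed service also applies cooldowns and its own alarm logic, so treat `desiredTaskCount` as a hypothetical mental model rather than the provider's algorithm.

```typescript
function desiredTaskCount(
  current: number,
  metricValue: number,
  targetValue: number,
  min: number,
  max: number,
): number {
  // Proportional estimate, rounded up so the metric lands at or below target.
  const raw = Math.ceil(current * (metricValue / targetValue));
  // Clamp to the MinCapacity and MaxCapacity configured on the scalable target.
  return Math.min(max, Math.max(min, raw));
}
```

For example, with 4 tasks at 90% CPU and a 70% target, the estimate is 6 tasks; at 35% CPU it drops to the minimum of 2.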
Conclusion: Focus on Fundamentals First
Building a scalable SaaS in 2023 is both easier and harder than ever. We have better tools, platforms, and patterns at our disposal, but also higher customer expectations and more complex requirements.
My final advice:
- Start Simple, Evolve Deliberately:
  - Begin with proven, boring technology
  - Add complexity only when needed
  - Continuously refactor as you learn
- Focus on the Customer Experience:
  - Performance and reliability matter more than clever architecture
  - Build features that drive customer value
  - Design for user workflow first, technical elegance second
- Optimize for Developer Experience:
  - Make local development simple
  - Invest in CI/CD early
  - Prioritize observability and debugging tools
By following these principles, you can build a SaaS platform that not only scales technically but also evolves with your business needs.
What SaaS architecture challenges are you facing? Let me know in the comments, and I'll do my best to help!
Drawing from my experience building multiple SaaS platforms, including a healthcare analytics platform with 50,000+ users and an e-commerce operations platform processing millions in GMV daily.