Introduction to Aerospike
Aerospike is a distributed, high-performance NoSQL database optimized for flash storage and in-memory computing. It features a hybrid memory architecture that manages data across DRAM and SSDs to deliver sub-millisecond latency at scale. Aerospike excels in high-throughput, low-latency use cases such as real-time bidding, fraud detection, payment processing, and user profile stores. Its shared-nothing architecture provides near-linear scalability, and namespaces can be configured for strong consistency, making it suitable for mission-critical applications that demand reliability and performance at petabyte scale.
Core Aerospike Concepts
Data Model
- Namespace: Highest-level container (similar to a database)
  - Contains records
  - Has its own storage configuration
  - Has its own replication factor, memory settings, and persistence rules
- Set: Collection of records (similar to a table)
  - Optional organizational unit
  - Can have specialized policies
- Record: A collection of bins identified by a unique key
  - Each record belongs to a namespace
  - Optionally belongs to a set
  - Has a configurable Time-To-Live (TTL)
- Bin: Name-value pair within a record (similar to a column)
  - Names are limited to 14 characters (older versions) or 15 characters (newer versions)
  - Different records in the same set can have different bins
- Key: Unique identifier for a record
  - Composed of namespace, set (optional), and primary key value
  - The primary key can be a string, integer, or blob
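These concepts map directly onto the client API. A minimal Python sketch (assuming a local single-node server on the default port) that builds a key from namespace, set, and primary key, writes a record with a few bins, and sets a per-record TTL:
import aerospike

config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

# Key = (namespace, set, primary key); pass None for the set to skip it
key = ('test', 'users', 'user42')

# Bins are name-value pairs; records in the same set may have different bins
bins = {'name': 'Jane Doe', 'age': 31}

# meta['ttl'] is the record TTL in seconds (0 = use the namespace default)
client.put(key, bins, meta={'ttl': 3600})
client.close()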
Data Types
Data Type | Description | Size Limit | Example |
---|---|---|---|
Integer | 64-bit signed integer | 8 bytes | 123 |
String | UTF-8 encoded text | 1 MB | "Hello Aerospike" |
Blob/Bytes | Arbitrary binary data | 1 MB | Raw bytes |
Double | IEEE 754 double-precision float | 8 bytes | 3.14159 |
List | Ordered collection of elements | 1 MB | [1, "a", true] |
Map | Collection of key-value pairs | 1 MB | {"name": "John", "age": 30} |
GeoJSON | Location data | 1 MB | {"type": "Point", "coordinates": [103.851959, 1.290270]} |
HyperLogLog | Probabilistic data structure | Implementation specific | Used for cardinality estimation |
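As a quick illustration, the Python sketch below (local server assumed; the bin names are made up for the example) writes one record whose bins cover most of these types; GeoJSON values use the aerospike.GeoJSON wrapper from the Python client:
import aerospike

config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

key = ('test', 'demo', 'types1')
bins = {
    'count': 123,                               # Integer
    'ratio': 3.14159,                           # Double
    'label': 'Hello Aerospike',                 # String
    'payload': bytearray(b'\x01\x02\x03'),      # Blob/Bytes
    'tags': [1, 'a', True],                     # List
    'profile': {'name': 'John', 'age': 30},     # Map
    'location': aerospike.GeoJSON({'type': 'Point',
                                   'coordinates': [103.851959, 1.290270]})  # GeoJSON
}
client.put(key, bins)
client.close()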
Cluster Architecture
- Shared-nothing architecture: Each node is independent and self-sufficient
- Automatic data distribution: Consistent hashing using a partition map
- Replication: Data replication for high availability (configurable factor)
- Smart Client: Client is cluster-aware and connects directly to the nodes that own the data (see the sketch after this list)
- Cross-datacenter replication (XDR): Asynchronous multi-datacenter replication
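The smart client only needs a list of seed hosts; it discovers the remaining nodes and the partition map on its own. A minimal Python sketch (the host addresses are placeholders):
import aerospike

# Any subset of the cluster works as seeds; the client learns the full
# cluster layout and routes each request to the node that owns the key.
config = {
    'hosts': [
        ('10.0.0.1', 3000),
        ('10.0.0.2', 3000),
        ('10.0.0.3', 3000),
    ]
}
client = aerospike.client(config).connect()
client.close()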
Basic Aerospike Commands & Operations
Installation & Setup
Docker Quick Start
docker run -d --name aerospike -p 3000-3002:3000-3002 aerospike/aerospike-server
Basic Configuration (aerospike.conf)
service {
    user root
    group root
    paxos-single-replica-limit 1
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 15000
}

logging {
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode multicast
        address 239.1.99.222
        port 9918
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3002
    }
}

namespace test {
    replication-factor 2
    memory-size 4G
    default-ttl 30d
    storage-engine memory
}

namespace data {
    replication-factor 2
    memory-size 4G
    default-ttl 30d
    storage-engine device {
        device /dev/sdb
        write-block-size 128K
        data-in-memory true
    }
}
Basic Client Operations (Using Multiple Language APIs)
Record Operations
Writing Records
Python
import aerospike
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()
key = ('test', 'users', 'user1')
bins = {
    'name': 'John Doe',
    'email': 'john@example.com',
    'age': 30,
    'scores': [85, 90, 92]
}
client.put(key, bins)
client.close()
Java
AerospikeClient client = new AerospikeClient("localhost", 3000);
Key key = new Key("test", "users", "user1");
Bin nameBin = new Bin("name", "John Doe");
Bin emailBin = new Bin("email", "john@example.com");
Bin ageBin = new Bin("age", 30);
Bin scoresBin = new Bin("scores", Arrays.asList(85, 90, 92)); // stored as a List bin (requires java.util.Arrays)
client.put(null, key, nameBin, emailBin, ageBin, scoresBin);
client.close();
Node.js
const Aerospike = require('aerospike')
const config = {
hosts: '127.0.0.1:3000'
}
Aerospike.connect(config).then(client => {
const key = new Aerospike.Key('test', 'users', 'user1')
const bins = {
name: 'John Doe',
email: 'john@example.com',
age: 30,
scores: [85, 90, 92]
}
return client.put(key, bins)
.then(() => {
client.close()
})
}).catch(error => {
console.error('Error:', error)
})
Reading Records
Python
import aerospike
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()
key = ('test', 'users', 'user1')
(key, metadata, bins) = client.get(key)
print(bins)
client.close()
Java
AerospikeClient client = new AerospikeClient("localhost", 3000);
Key key = new Key("test", "users", "user1");
Record record = client.get(null, key);
System.out.println(record.toString());
client.close();
Node.js
const Aerospike = require('aerospike')
const config = {
hosts: '127.0.0.1:3000'
}
Aerospike.connect(config).then(client => {
const key = new Aerospike.Key('test', 'users', 'user1')
return client.get(key)
.then(record => {
console.log(record.bins)
client.close()
})
}).catch(error => {
console.error('Error:', error)
})
Deleting Records
Python
import aerospike
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()
key = ('test', 'users', 'user1')
client.remove(key)
client.close()
Java
AerospikeClient client = new AerospikeClient("localhost", 3000);
Key key = new Key("test", "users", "user1");
client.delete(null, key);
client.close();
Node.js
const Aerospike = require('aerospike')
const config = {
hosts: '127.0.0.1:3000'
}
Aerospike.connect(config).then(client => {
const key = new Aerospike.Key('test', 'users', 'user1')
return client.remove(key)
.then(() => {
client.close()
})
}).catch(error => {
console.error('Error:', error)
})
Batch Operations
Python
import aerospike
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()
keys = [
('test', 'users', 'user1'),
('test', 'users', 'user2'),
('test', 'users', 'user3')
]
records = client.get_many(keys)
for key, metadata, bins in records:
    print(key, bins)
client.close()
Java
AerospikeClient client = new AerospikeClient("localhost", 3000);
Key[] keys = new Key[] {
new Key("test", "users", "user1"),
new Key("test", "users", "user2"),
new Key("test", "users", "user3")
};
Record[] records = client.get(null, keys);
for (Record record : records) {
if (record != null) {
System.out.println(record.toString());
}
}
client.close();
Queries and Scans
Secondary Index Creation
Python
import aerospike
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()
# Create secondary index on 'age' bin
client.index_integer_create('test', 'users', 'age', 'users_age_idx')
client.close()
Java
AerospikeClient client = new AerospikeClient("localhost", 3000);
// Create secondary index on 'age' bin
IndexTask task = client.createIndex(null, "test", "users", "users_age_idx", "age", IndexType.NUMERIC);
task.waitTillComplete();
client.close();
Query With Secondary Index
Python
import aerospike
from aerospike import predicates as p
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()
query = client.query('test', 'users')
query.select('name', 'age')
query.where(p.between('age', 25, 35))
results = []
def process_record(record):
    results.append(record)
query.foreach(process_record)
for key, metadata, bins in results:
    print(bins)
client.close()
Java
AerospikeClient client = new AerospikeClient("localhost", 3000);
Statement stmt = new Statement();
stmt.setNamespace("test");
stmt.setSetName("users");
stmt.setBinNames("name", "age");
stmt.setFilter(Filter.range("age", 25, 35));
RecordSet rs = client.query(null, stmt);
try {
while (rs.next()) {
Record record = rs.getRecord();
System.out.println(record.toString());
}
} finally {
rs.close();
}
client.close();
Scan All Records in a Set
Python
import aerospike
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()
scan = client.scan('test', 'users')
scan.select('name', 'email')
results = []
def process_record(record):
    results.append(record)
scan.foreach(process_record)
for key, metadata, bins in results:
    print(bins)
client.close()
Java
AerospikeClient client = new AerospikeClient("localhost", 3000);
ScanPolicy policy = new ScanPolicy();
client.scanAll(policy, "test", "users", (key, record) -> {
System.out.println(key.userKey + ": " + record.toString());
});
client.close();
Advanced Aerospike Features
Bin Operations (Atomic Updates)
Python
import aerospike
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()
key = ('test', 'users', 'user1')
# Increment age by 1
client.increment(key, 'age', 1)
# Append to string
client.append(key, 'name', ' Jr.')
# Prepend to string
client.prepend(key, 'title', 'Dr. ')
# Get the updated record
(key, metadata, bins) = client.get(key)
print(bins)
client.close()
Java
AerospikeClient client = new AerospikeClient("localhost", 3000);
Key key = new Key("test", "users", "user1");
// Increment age by 1
client.add(null, key, new Bin("age", 1));
// Append to string
client.append(null, key, new Bin("name", " Jr."));
// Prepend to string
client.prepend(null, key, new Bin("title", "Dr. "));
// Get the updated record
Record record = client.get(null, key);
System.out.println(record.toString());
client.close();
List Operations
Python
import aerospike
from aerospike_helpers.operations import list_operations as lop
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()
key = ('test', 'users', 'user1')
# Add elements to the list
ops = [
lop.list_append('scores', 95),
lop.list_insert('scores', 1, 88)
]
client.operate(key, ops)
# Get the 2nd element (index 1)
ops = [lop.list_get('scores', 1)]
_, _, bins = client.operate(key, ops)
print("Element at index 1:", bins['scores'])
# Get list size
ops = [lop.list_size('scores')]
_, _, bins = client.operate(key, ops)
print("List size:", bins['scores'])
client.close()
Java
AerospikeClient client = new AerospikeClient("localhost", 3000);
Key key = new Key("test", "users", "user1");
// Add elements to the list
Operation[] ops = {
ListOperation.append("scores", Value.get(95)),
ListOperation.insert("scores", 1, Value.get(88))
};
client.operate(null, key, ops);
// Get the 2nd element (index 1)
Record record = client.operate(null, key,
ListOperation.getByIndex("scores", 1, ListReturnType.VALUE)
);
System.out.println("Element at index 1: " + record.bins.get("scores"));
// Get list size
record = client.operate(null, key,
ListOperation.size("scores")
);
System.out.println("List size: " + record.bins.get("scores"));
client.close();
Map Operations
Python
import aerospike
from aerospike_helpers.operations import map_operations as mop
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()
key = ('test', 'users', 'user1')
# Initialize a map or update existing
ops = [
mop.map_put('profile', 'address', '123 Main St')
]
client.operate(key, ops)
# Add/update multiple items
ops = [
mop.map_put_items('profile', {'city': 'New York', 'zip': '10001'})
]
client.operate(key, ops)
# Get a value by key
ops = [mop.map_get_by_key('profile', 'city', aerospike.MAP_RETURN_VALUE)]
_, _, bins = client.operate(key, ops)
print("City:", bins['profile'])
# Get map size
ops = [mop.map_size('profile')]
_, _, bins = client.operate(key, ops)
print("Map size:", bins['profile'])
client.close()
Java
AerospikeClient client = new AerospikeClient("localhost", 3000);
Key key = new Key("test", "users", "user1");
// Initialize a map or update existing
Operation[] ops = {
MapOperation.put(MapPolicy.Default, "profile", Value.get("address"), Value.get("123 Main St"))
};
client.operate(null, key, ops);
// Add/update multiple items
Map<Value, Value> items = new HashMap<>();
items.put(Value.get("city"), Value.get("New York"));
items.put(Value.get("zip"), Value.get("10001"));
client.operate(null, key,
MapOperation.putItems(MapPolicy.Default, "profile", items)
);
// Get a value by key
Record record = client.operate(null, key,
MapOperation.getByKey("profile", Value.get("city"), MapReturnType.VALUE)
);
System.out.println("City: " + record.bins.get("profile"));
// Get map size
record = client.operate(null, key,
MapOperation.size("profile")
);
System.out.println("Map size: " + record.bins.get("profile"));
client.close();
UDFs (User-Defined Functions)
Register UDF (Lua)
Register UDF
-- user_profile.lua
function update_activity(rec, last_activity)
rec['last_activity'] = last_activity
rec['activity_count'] = (rec['activity_count'] or 0) + 1  -- guard against a missing bin
aerospike:update(rec)
return rec
end
function get_activity_summary(rec)
local result = map()
result['user_id'] = rec['user_id']
result['activity_count'] = rec['activity_count']
result['last_activity'] = rec['last_activity']
return result
end
Python
import aerospike
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()
# Register UDF module
client.udf_put('user_profile.lua')
# Apply UDF on a record
key = ('test', 'users', 'user1')
current_time = "2023-04-30T12:34:56Z"
result = client.apply(key, 'user_profile', 'update_activity', [current_time])
print(result)
client.close()
Java
AerospikeClient client = new AerospikeClient("localhost", 3000);
// Register UDF module
RegisterTask task = client.register(null, "udf/user_profile.lua", "user_profile.lua", Language.LUA);
task.waitTillComplete();
// Apply UDF on a record
Key key = new Key("test", "users", "user1");
String currentTime = "2023-04-30T12:34:56Z";
Object result = client.execute(null, key, "user_profile", "update_activity", Value.get(currentTime));
System.out.println(result);
client.close();
Transactions (Single-Record, Multi-Operation)
The operate() call applies several operations to one record atomically: all of them succeed or fail together.
Python
import aerospike
from aerospike_helpers.operations import operations as op
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()
# Multi-operation, single-record transaction
key = ('test', 'accounts', 'user1')
ops = [
    # Read balance
    op.read('balance'),
    # Subtract amount
    op.increment('balance', -100),
    # Add to transaction history
    op.append('transactions', ',withdrawal:100')
]
try:
    _, _, bins = client.operate(key, ops)
    print("New balance:", bins["balance"])
except Exception as e:
    print("Transaction failed:", str(e))
client.close()
Java
AerospikeClient client = new AerospikeClient("localhost", 3000);
// Multi-operation transaction
Key key = new Key("test", "accounts", "user1");
Operation[] ops = {
// Read balance
Operation.get("balance"),
// Subtract amount
Operation.add(new Bin("balance", -100)),
// Add to transaction history
Operation.append(new Bin("transactions", ",withdrawal:100"))
};
try {
Record record = client.operate(null, key, ops);
System.out.println("New balance: " + record.getInt("balance"));
} catch (AerospikeException e) {
System.out.println("Transaction failed: " + e.getMessage());
}
client.close();
Performance Tuning & Configuration
Client Configuration
Optimal Client Settings
// Java client configuration example
ClientPolicy policy = new ClientPolicy();
policy.timeout = 50; // Initial connection timeout in ms
policy.maxConnsPerNode = 300; // Max connections per node
policy.connPoolsPerNode = 2; // Connection pools per node
policy.tendInterval = 1000; // Cluster tending interval in ms
policy.failIfNotConnected = true; // Fail operations if not connected
Write Policies
Policy Option | Default | Recommended | Description |
---|---|---|---|
commitLevel | COMMIT_ALL | COMMIT_MASTER | When to report write completion |
maxRetries | 2 | Application specific | Maximum number of retries |
socketTimeout | 30s | 50-200ms | Socket idle timeout |
totalTimeout | 0 (no time limit) | 50-200ms | Total transaction timeout |
sleepBetweenRetries | 1ms | 1-5ms | Time to sleep between retries |
key | DIGEST_ONLY | DIGEST_ONLY (use SEND only if the stored key is needed) | Whether the user key or only its digest is stored with the record |
durableDelete | false | Application specific | If a delete is persisted |
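For reference, a hedged sketch of the same options as a Python write-policy dict (key names follow the Python client; the table above uses the Java field names):
import aerospike

config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

write_policy = {
    'commit_level': aerospike.POLICY_COMMIT_LEVEL_MASTER,  # report success once the master has written
    'key': aerospike.POLICY_KEY_DIGEST,                    # store only the key digest
    'max_retries': 1,
    'socket_timeout': 200,        # milliseconds
    'total_timeout': 200,         # milliseconds
    'sleep_between_retries': 1,   # milliseconds
    'durable_delete': False,
}
client.put(('test', 'users', 'user1'), {'age': 30}, policy=write_policy)
client.close()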
Read Policies
Policy Option | Default | Recommended | Description |
---|---|---|---|
readModeAP | ONE | ONE/ALL | Read mode for AP (availability) mode |
readModeSC | SESSION | SESSION/LINEARIZE | Read mode for SC (strong consistency) mode |
replica | SEQUENCE | MASTER/MASTER_PROLES | Replica preference for reads |
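And the corresponding read policy as a Python dict (again, Python key names; other clients expose the same options under slightly different names):
import aerospike

config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

read_policy = {
    'read_mode_ap': aerospike.POLICY_READ_MODE_AP_ONE,
    'read_mode_sc': aerospike.POLICY_READ_MODE_SC_SESSION,
    'replica': aerospike.POLICY_REPLICA_SEQUENCE,
    'total_timeout': 200,   # milliseconds
}
(_, _, bins) = client.get(('test', 'users', 'user1'), policy=read_policy)
print(bins)
client.close()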
Storage Engine Configuration
Memory Storage Engine
namespace memory {
    replication-factor 2
    memory-size 4G
    default-ttl 30d          # Default TTL if not specified
    storage-engine memory    # In-memory storage engine
}
Device Storage Engine
namespace ssd {
    replication-factor 2
    memory-size 4G               # Primary index + metadata size
    default-ttl 0                # Never expire by default
    storage-engine device {
        device /dev/sdb          # Device or file path
        device /dev/sdc
        write-block-size 128K    # Write block size
        data-in-memory true      # Store data in memory as well
    }
}
Common Challenges & Solutions
Connection Issues
Challenge | Solution |
---|---|
Connection timeout | Check firewall settings, increase connection timeout |
Too many connections | Increase maxConnsPerNode, check for connection leaks |
Node not available | Enable cluster logging, check heartbeat config |
Performance Problems
Challenge | Solution |
---|---|
High latency | Tune timeouts, check hardware bottlenecks, optimize query patterns |
Client-side timeouts | Adjust timeout policies, identify slow operations |
Hot keys/partitions | Distribute workload, redesign data model |
Read amplification | Use batch operations, redesign data model, use projections |
Data Model Issues
Challenge | Solution |
---|---|
Too large records | Split records, use references between records |
Secondary index performance | Limit secondary indexes, be specific with bin selection |
Complex queries | Use batch operations, consider UDFs for processing |
Monitoring & Operations
Basic Monitoring Commands
Using asinfo
# Get namespace statistics
asinfo -v 'namespace/test'
# Get node statistics
asinfo -v 'statistics'
# Get XDR statistics
asinfo -v 'xdr-stats'
# Check cluster status
asinfo -v 'status'
Key Metrics to Monitor
Metric Type | Specific Metrics | Warning Signs |
---|---|---|
Latency | read/write/query latency | Increasing over time, spikes |
Memory | Memory usage percentage | Above 70-80% |
Disk | Used disk space, write performance | Above 70%, increasing write block time |
Node health | Cluster integrity, migrations | Node additions/removals, high migration rate |
Client | Timeouts, retries, errors | Increasing error rates, timeouts |
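The same info commands used with asinfo above can also be pulled programmatically. A hedged Python sketch (assuming the info_all() helper available in recent Python clients; the response is the same semicolon-separated name=value string that asinfo prints):
import aerospike

config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

# One response per node; feed the values into your monitoring system
for node, response in client.info_all('namespace/test').items():
    print(node, response)
client.close()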
Backup & Recovery
Using asbackup/asrestore
# Backup a namespace
asbackup --host localhost --namespace test --directory /backup --verbose
# Restore from backup
asrestore --host localhost --directory /backup --verbose
Best Practices & Tips
Data Modeling
- Key Selection: Choose distribution-friendly keys to avoid hotspots (see the sketch after this list)
- Bin Design: Prefer fewer, structured bins over many simple bins
- TTL Strategy: Set appropriate TTLs to manage data lifecycle
- Set Organization: Use sets for logical grouping, not for querying
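A small Python sketch of the key-selection point (the set name and key layout are hypothetical): composing the primary key from user and day spreads daily counters across many records instead of hammering one hot key:
import aerospike

config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

user_id = 'user42'
day = '2023-04-30'
# Composite primary key -> many small records, evenly spread across partitions
key = ('test', 'daily_counters', '{}:{}'.format(user_id, day))
client.increment(key, 'events', 1)
client.close()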
Performance Optimization
- Right-size memory: Ensure primary index fits in memory
- Batch operations: Use batch reads/writes for multiple records
- Predicate filtering: Push filters to the server with filter expressions to reduce data transfer (see the sketch after this list)
- Connection pooling: Configure appropriate connection pools
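A hedged sketch of predicate filtering using the expressions API in recent Python clients (5.x+); the filter runs on the server, so non-matching records are never sent back:
import aerospike
from aerospike import exception as ex
from aerospike_helpers import expressions as exp

config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

# Only return the record if its 'age' bin is >= 25
expr = exp.GE(exp.IntBin('age'), 25).compile()

try:
    (_, _, bins) = client.get(('test', 'users', 'user1'), policy={'expressions': expr})
    print(bins)
except ex.AerospikeError as e:
    # Records filtered out (or missing) surface as an exception on single-record reads
    print('No matching record:', e)
client.close()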
Operational Excellence
- Monitoring: Set up comprehensive monitoring and alerting
- Capacity planning: Plan for growth in advance
- Regular backups: Schedule consistent backups
- Test upgrades: Test upgrades in staging environment first
This cheatsheet provides a practical reference for the Aerospike NoSQL database. For specific version details and advanced configurations, always refer to the official documentation.