Environment Overview
Network Structure
The environment models an Underwater Wireless Sensor Network (UWSN) with:
- N sensor nodes randomly positioned in 3D space
- One sink node (destination) at a fixed position
- Each node has a limited energy budget that is depleted by each transmission
- Nodes can communicate only with neighbors inside their transmission range (see the sketch below)
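Neighbor relations follow directly from pairwise distances. The sketch below is a minimal illustration, assuming node positions are stored as an (N, 3) NumPy array, as in the initialization code later in this section:

import numpy as np

def neighbor_mask(positions, transmission_range):
    # Pairwise Euclidean distances between all nodes, shape (N, N)
    diffs = positions[:, None, :] - positions[None, :, :]
    distances = np.linalg.norm(diffs, axis=-1)
    # A node is a neighbor if it is within range (excluding itself)
    mask = (distances <= transmission_range) & (distances > 0)
    return mask, distances

A node's candidate forwarding targets are then the indices where mask[current_node] is True.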
Reinforcement Learning Setup
The environment follows the standard Gymnasium interface for RL:
- State space: Local node info + neighbor metrics
- Action space: Forward to neighbor or drop packet
- Reward function: Balances success with energy, distance, and link quality
- Termination: When packet reaches sink or is dropped
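A minimal episode loop under this interface might look like the following sketch. UWSNRoutingEnv is a hypothetical name for the environment class (the original snippets do not show one), and the loop assumes the environment implements the standard Gymnasium reset/step pair:

env = UWSNRoutingEnv(num_nodes=10)      # hypothetical class name
observation, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # stand-in for a learned policy
    observation, reward, terminated, truncated, info = env.step(action)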
Network Visualization
[Placeholder: an interactive 3D visualization of the sensor nodes and sink appears here in the original page]
Key Environment Components
State Space
The observation space provides the agent with information about:
Local Node Information
{
    "energy": 50.0,         # Current node's remaining energy
    "position": [x, y, z]   # 3D coordinates of the current node
}
Neighbor Information
{
    "energy": 45.0,     # Neighbor's remaining energy
    "distance": 25.3,   # Distance to the neighbor
    "RSSI": -75.2,      # Received Signal Strength Indicator (dBm)
    "SNR": 15.8,        # Signal-to-Noise Ratio (dB)
    "PDR": 0.92         # Packet Delivery Ratio
}
Action Space
The agent can choose from the following actions:
- Forward to Neighbor N: transmit the packet to the selected neighbor node
- Forward to Sink: transmit the packet directly to the destination node
- Drop Packet: terminate the transmission (penalized)
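Since the action space is gym.spaces.Discrete(num_nodes + 1) (see the initialization code below), these choices must be encoded as integers. One natural encoding, assumed here for illustration, reserves the extra index for the drop action:

def decode_action(action, num_nodes, sink_node):
    # Indices 0 .. num_nodes - 1 select a node to forward to;
    # the extra index num_nodes encodes "drop the packet".
    if action == num_nodes:
        return "drop"
    if action == sink_node:
        return "forward to sink"
    return f"forward to neighbor {action}"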
Reward Function
The reward balances multiple objectives to encourage efficient routing:
reward = R_success                   # Base reward for success
       - β * distance²               # Penalize long distances
       - η * energy_consumed         # Penalize high energy use
       - δ * (1 - PDR)               # Penalize poor link quality
       - θ * (1 / residual_energy)   # Penalize low-energy nodes
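A direct translation of this formula into code might look as follows. The helper name reward_from_metrics and all coefficient values are illustrative assumptions, not taken from the original:

# All coefficient values below are illustrative assumptions
R_SUCCESS = 100.0                    # base reward for delivering the packet
BETA, ETA, DELTA, THETA = 0.01, 0.5, 10.0, 5.0

def reward_from_metrics(distance, energy_consumed, pdr, residual_energy,
                        reached_sink=False):
    reward = R_SUCCESS if reached_sink else 0.0
    reward -= BETA * distance ** 2                # penalize long distances
    reward -= ETA * energy_consumed               # penalize high energy use
    reward -= DELTA * (1.0 - pdr)                 # penalize poor link quality
    reward -= THETA / max(residual_energy, 1e-6)  # penalize low-energy nodes
    return reward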
Reward Components
- R_success: Fixed reward for reaching the sink
- Distance penalty: Discourages long hops
- Energy penalty: Conserves network energy
- Link quality penalty: Encourages reliable paths
- Residual-energy penalty: Steers traffic away from low-energy nodes
Example Scenarios
- Short, efficient path: High reward
- Long but reliable path: Moderate reward
- Dropped packet: Significant penalty
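For a rough sense of scale under the illustrative coefficients sketched above: a single hop of distance 20 that reaches the sink with energy_consumed = 1, PDR = 0.95, and residual_energy = 40 scores 100 - 0.01·20² - 0.5·1 - 10·0.05 - 5/40 ≈ 94.9, whereas dropping the packet earns the flat -10 penalty used in the step function below.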
Technical Implementation
Environment Initialization
import numpy as np
import gymnasium as gym

class UWSNRoutingEnv(gym.Env):  # hypothetical class name; not shown in the original
    def __init__(self, num_nodes=10, max_energy=100,
                 initial_energy=50, transmission_range=100):
        super().__init__()
        self.num_nodes = num_nodes
        self.max_energy = max_energy
        self.transmission_range = transmission_range
        self.sink_node = num_nodes - 1  # assumption: the last node is the sink
        self.R_success = 100.0          # assumed value for the success reward
        # Initialize node positions randomly in 3D space
        self.node_positions = np.random.uniform(0, 100, size=(num_nodes, 3))
        # Set initial energies
        self.node_energies = np.full(num_nodes, float(initial_energy))
        # Define action space (forward to any node, plus one extra index for drop)
        self.action_space = gym.spaces.Discrete(num_nodes + 1)
        # Define observation space (local + neighbor info; elided in the original)
        self.observation_space = gym.spaces.Dict(...)
The environment is initialized with configurable parameters for network size, energy levels, and transmission range.
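For example (using the hypothetical class name from the sketch above):

env = UWSNRoutingEnv(num_nodes=20, initial_energy=80, transmission_range=50)
print(env.action_space)  # Discrete(21): 20 forwarding targets plus drop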
Step Function
def step(self, action):
    terminated = False
    truncated = False
    if action == self.num_nodes:        # the extra Discrete index encodes "drop"
        reward = -10                    # flat penalty for dropping the packet
        terminated = True
    elif action == self.sink_node:      # packet forwarded directly to the sink
        reward = self.R_success
        terminated = True
    else:
        # Calculate multi-component reward for forwarding to this neighbor
        # (compute_reward is the original's helper; one possible formula
        # implementation is sketched under Reward Function above)
        reward = self.compute_reward(self.current_node, action)
        # Update node energy: each transmission costs one unit
        self.node_energies[action] -= 1
        self.current_node = action      # the packet now resides at the next hop
    observation = self._get_obs()       # hypothetical helper; sketched below
    return observation, reward, terminated, truncated, {}
The step function handles packet forwarding, energy consumption, and computes the multi-objective reward.
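The original does not show how the observation is assembled. A sketch consistent with the State Space description above (the helper name _get_obs and the field layout are assumptions) could be:

def _get_obs(self):
    cur = self.current_node
    neighbors = []
    for n in range(self.num_nodes):
        dist = np.linalg.norm(self.node_positions[cur] - self.node_positions[n])
        if n != cur and dist <= self.transmission_range:
            neighbors.append({
                "energy": float(self.node_energies[n]),
                "distance": float(dist),
                # RSSI, SNR, and PDR would come from the channel model (not shown)
            })
    return {
        "energy": float(self.node_energies[cur]),
        "position": self.node_positions[cur].copy(),
        "neighbors": neighbors,
    }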