Environment Overview
Network Structure
The environment models an Underwater Wireless Sensor Network (UWSN) with:
- N sensor nodes randomly positioned in 3D space
- One sink node (destination) at a fixed position
- Each node has a limited energy budget that is depleted by each transmission
- Nodes can communicate only with neighbors inside their transmission range (see the sketch below)
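Neighbor relations follow directly from pairwise distances. The sketch below is a minimal illustration, assuming node positions are stored as an (N, 3) NumPy array, as in the initialization code later in this section:

import numpy as np

def neighbor_mask(positions, transmission_range):
    # Pairwise Euclidean distances between all nodes, shape (N, N)
    diffs = positions[:, None, :] - positions[None, :, :]
    distances = np.linalg.norm(diffs, axis=-1)
    # A node is a neighbor if it is within range (excluding itself)
    mask = (distances <= transmission_range) & (distances > 0)
    return mask, distances

A node's candidate forwarding targets are then the indices where mask[current_node] is True.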
Reinforcement Learning Setup
The environment follows the standard Gymnasium interface for RL:
- State space: Local node info + neighbor metrics
- Action space: Forward to neighbor or drop packet
- Reward function: Balances success with energy, distance, and link quality
- Termination: When packet reaches sink or is dropped
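A minimal episode loop under this interface might look like the following sketch. UWSNRoutingEnv is a hypothetical name for the environment class (the original snippets do not show one), and the loop assumes the environment implements the standard Gymnasium reset/step pair:

env = UWSNRoutingEnv(num_nodes=10)      # hypothetical class name
observation, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # stand-in for a learned policy
    observation, reward, terminated, truncated, info = env.step(action)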
Network Visualization
[Placeholder: an interactive 3D visualization of the sensor nodes and sink appears here in the original page]
Key Environment Components
State Space
The observation space provides the agent with information about:
Local Node Information
{
    "energy": 50.0,         # Current node's remaining energy
    "position": [x, y, z]   # 3D coordinates of the current node
}
Neighbor Information
{
    "energy": 45.0,     # Neighbor's remaining energy
    "distance": 25.3,   # Distance to the neighbor
    "RSSI": -75.2,      # Received Signal Strength Indicator (dBm)
    "SNR": 15.8,        # Signal-to-Noise Ratio (dB)
    "PDR": 0.92         # Packet Delivery Ratio
}
Action Space
The agent can choose from the following actions:
- Forward to Neighbor N: transmit the packet to the selected neighbor node
- Forward to Sink: transmit the packet directly to the destination node
- Drop Packet: terminate the transmission (penalized)
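Since the action space is gym.spaces.Discrete(num_nodes + 1) (see the initialization code below), these choices must be encoded as integers. One natural encoding, assumed here for illustration, reserves the extra index for the drop action:

def decode_action(action, num_nodes, sink_node):
    # Indices 0 .. num_nodes - 1 select a node to forward to;
    # the extra index num_nodes encodes "drop the packet".
    if action == num_nodes:
        return "drop"
    if action == sink_node:
        return "forward to sink"
    return f"forward to neighbor {action}"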
Reward Function
The reward balances multiple objectives to encourage efficient routing:
reward = R_success                   # Base reward for success
       - β * distance²               # Penalize long distances
       - η * energy_consumed         # Penalize high energy use
       - δ * (1 - PDR)               # Penalize poor link quality
       - θ * (1 / residual_energy)   # Penalize low-energy nodes
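A direct translation of this formula into code might look as follows. The helper name reward_from_metrics and all coefficient values are illustrative assumptions, not taken from the original:

# All coefficient values below are illustrative assumptions
R_SUCCESS = 100.0                    # base reward for delivering the packet
BETA, ETA, DELTA, THETA = 0.01, 0.5, 10.0, 5.0

def reward_from_metrics(distance, energy_consumed, pdr, residual_energy,
                        reached_sink=False):
    reward = R_SUCCESS if reached_sink else 0.0
    reward -= BETA * distance ** 2                # penalize long distances
    reward -= ETA * energy_consumed               # penalize high energy use
    reward -= DELTA * (1.0 - pdr)                 # penalize poor link quality
    reward -= THETA / max(residual_energy, 1e-6)  # penalize low-energy nodes
    return reward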
Reward Components
- R_success: Fixed reward for reaching the sink
- Distance penalty: Discourages long hops
- Energy penalty: Conserves network energy
- Link quality penalty: Encourages reliable paths
- Residual-energy penalty: Steers traffic away from low-energy nodes
Example Scenarios
- Short, efficient path: High reward
- Long but reliable path: Moderate reward
- Dropped packet: Significant penalty
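For a rough sense of scale under the illustrative coefficients sketched above: a single hop of distance 20 that reaches the sink with energy_consumed = 1, PDR = 0.95, and residual_energy = 40 scores 100 - 0.01·20² - 0.5·1 - 10·0.05 - 5/40 ≈ 94.9, whereas dropping the packet earns the flat -10 penalty used in the step function below.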
Technical Implementation
Environment Initialization
import numpy as np
import gymnasium as gym

class UWSNRoutingEnv(gym.Env):  # hypothetical class name; not shown in the original
    def __init__(self, num_nodes=10, max_energy=100,
                 initial_energy=50, transmission_range=100):
        super().__init__()
        self.num_nodes = num_nodes
        self.max_energy = max_energy
        self.transmission_range = transmission_range
        self.sink_node = num_nodes - 1  # assumption: the last node is the sink
        self.R_success = 100.0          # assumed value for the success reward
        # Initialize node positions randomly in 3D space
        self.node_positions = np.random.uniform(0, 100, size=(num_nodes, 3))
        # Set initial energies
        self.node_energies = np.full(num_nodes, float(initial_energy))
        # Define action space (forward to any node, plus one extra index for drop)
        self.action_space = gym.spaces.Discrete(num_nodes + 1)
        # Define observation space (local + neighbor info; elided in the original)
        self.observation_space = gym.spaces.Dict(...)
The environment is initialized with configurable parameters for network size, energy levels, and transmission range.
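For example (using the hypothetical class name from the sketch above):

env = UWSNRoutingEnv(num_nodes=20, initial_energy=80, transmission_range=50)
print(env.action_space)  # Discrete(21): 20 forwarding targets plus drop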
Step Function
def step(self, action):
    terminated = False
    truncated = False
    if action == self.num_nodes:        # the extra Discrete index encodes "drop"
        reward = -10                    # flat penalty for dropping the packet
        terminated = True
    elif action == self.sink_node:      # packet forwarded directly to the sink
        reward = self.R_success
        terminated = True
    else:
        # Calculate multi-component reward for forwarding to this neighbor
        # (compute_reward is the original's helper; one possible formula
        # implementation is sketched under Reward Function above)
        reward = self.compute_reward(self.current_node, action)
        # Update node energy: each transmission costs one unit
        self.node_energies[action] -= 1
        self.current_node = action      # the packet now resides at the next hop
    observation = self._get_obs()       # hypothetical helper; sketched below
    return observation, reward, terminated, truncated, {}
The step function handles packet forwarding, energy consumption, and computes the multi-objective reward.
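The original does not show how the observation is assembled. A sketch consistent with the State Space description above (the helper name _get_obs and the field layout are assumptions) could be:

def _get_obs(self):
    cur = self.current_node
    neighbors = []
    for n in range(self.num_nodes):
        dist = np.linalg.norm(self.node_positions[cur] - self.node_positions[n])
        if n != cur and dist <= self.transmission_range:
            neighbors.append({
                "energy": float(self.node_energies[n]),
                "distance": float(dist),
                # RSSI, SNR, and PDR would come from the channel model (not shown)
            })
    return {
        "energy": float(self.node_energies[cur]),
        "position": self.node_positions[cur].copy(),
        "neighbors": neighbors,
    }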