Underwater Wireless Sensor Network Routing

A Reinforcement Learning Environment for Optimal Packet Routing in UWSNs

Environment Overview

Network Structure

The environment models an Underwater Wireless Sensor Network (UWSN) with:

  • N sensor nodes randomly positioned in 3D space
  • One sink node (destination) at a fixed position
  • Each node has limited energy that depletes with transmissions
  • Nodes can only communicate within a transmission range
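
As a concrete illustration (not part of the environment code), connectivity under this model can be derived directly from node positions; the variable names below are assumptions mirroring the environment defaults:

import numpy as np

# Assumed setup mirroring the environment defaults below
num_nodes, transmission_range = 10, 100
node_positions = np.random.uniform(0, 100, size=(num_nodes, 3))

# Pairwise Euclidean distances between all nodes
diffs = node_positions[:, None, :] - node_positions[None, :, :]
distances = np.linalg.norm(diffs, axis=-1)

# Nodes are neighbors iff within transmission range (excluding self-links)
adjacency = (distances <= transmission_range) & ~np.eye(num_nodes, dtype=bool)
neighbors_of_0 = np.flatnonzero(adjacency[0])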

Reinforcement Learning Setup

The environment follows the standard Gymnasium interface for RL:

  • State space: Local node info + neighbor metrics
  • Action space: Forward to neighbor or drop packet
  • Reward function: Balances success with energy, distance, and link quality
  • Termination: When packet reaches sink or is dropped
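
Because the interface is standard Gymnasium, an episode can be driven with the usual reset/step loop. A minimal sketch, assuming the UWSNRoutingEnv class from the implementation section below and a random placeholder policy:

env = UWSNRoutingEnv(num_nodes=10)
observation, info = env.reset(seed=42)

terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()   # random policy as a placeholder
    observation, reward, terminated, truncated, info = env.step(action)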

Network Visualization

[Interactive 3D network visualization: sink node (S), sensor nodes (1-4), active links, and inactive links]

Key Environment Components

State Space

The observation space provides the agent with information about:

Local Node Information

{
    "energy": 50.0,       # Current node's remaining energy
    "position": [x, y, z] # 3D coordinates of current node
}

Neighbor Information

{
    "energy": 45.0,       # Neighbor's remaining energy
    "distance": 25.3,     # Distance to neighbor
    "RSSI": -75.2,        # Received Signal Strength
    "SNR": 15.8,          # Signal-to-Noise Ratio
    "PDR": 0.92           # Packet Delivery Ratio
}
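
A minimal sketch of how these fields might be assembled into one observation inside the environment class. The get_neighbors helper and the link_rssi/link_snr/link_pdr attributes are assumptions, not part of the original code; numpy is assumed imported as np:

def _get_obs(self):
    # Local information about the node currently holding the packet
    node = self.current_node
    obs = {
        "energy": float(self.node_energies[node]),
        "position": self.node_positions[node].tolist(),
        "neighbors": [],
    }
    # Per-neighbor link metrics, assumed to be tracked by the environment
    for n in self.get_neighbors(node):
        obs["neighbors"].append({
            "energy": float(self.node_energies[n]),
            "distance": float(np.linalg.norm(
                self.node_positions[node] - self.node_positions[n])),
            "RSSI": self.link_rssi[node, n],
            "SNR": self.link_snr[node, n],
            "PDR": self.link_pdr[node, n],
        })
    return obs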

Action Space

The agent can choose from the following actions:

  • N: Forward to Neighbor N. Transmits the packet to the selected neighbor node.
  • 0: Forward to Sink. Transmits the packet directly to the destination node.
  • X: Drop Packet. Terminates the transmission (this action is penalized).
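
One plausible mapping of these actions onto the Discrete(num_nodes + 1) space used in the implementation below; the encoding itself is an assumption of this sketch:

def decode_action(self, action):
    # Hypothetical encoding: node indices address forwarding targets,
    # self.sink_node marks the sink, and the last index means "drop"
    if action == self.num_nodes:
        return "drop"
    if action == self.sink_node:
        return "forward_to_sink"
    return f"forward_to_neighbor_{action}"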

Reward Function

The reward balances multiple objectives to encourage efficient routing:

reward = R_success                        # Base reward for success
         - β * distance²                  # Penalize long distances
         - η * energy_consumed            # Penalize high energy use
         - δ * (1 - PDR)                  # Penalize poor link quality
         - θ * (1 / residual_energy)      # Penalize low-energy nodes

Reward Components

  • R_success: Fixed reward for reaching the sink
  • Distance penalty: Discourages long hops
  • Energy penalty: Conserves network energy
  • Link quality penalty: Encourages reliable paths
  • Residual-energy penalty: Steers traffic away from low-energy nodes
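
A minimal sketch of the per-hop reward computation. The coefficient values, the link_pdr attribute, and the eps guard against division by zero are assumptions; the success bonus R_success is added separately when the packet reaches the sink (see the step function below):

def compute_reward(self, current_node, next_node,
                   beta=0.001, eta=0.1, delta=1.0, theta=1.0, eps=1e-6):
    # Geometric length of this hop
    distance = np.linalg.norm(
        self.node_positions[current_node] - self.node_positions[next_node])
    energy_consumed = 1.0   # matches the per-transmission cost in step()
    pdr = self.link_pdr[current_node, next_node]
    residual_energy = self.node_energies[next_node]

    # Weighted sum of the penalty terms listed above
    return (- beta * distance ** 2
            - eta * energy_consumed
            - delta * (1 - pdr)
            - theta * (1 / (residual_energy + eps)))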

Example Scenarios

  • Short, efficient path: High reward
  • Long but reliable path: Moderate reward
  • Dropped packet: Significant penalty
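
For intuition, plugging illustrative numbers into the sketch above (coefficients as assumed there): a hop of length 20 over a link with PDR 0.95 to a neighbor holding 40 energy units scores about -0.001·20² - 0.1 - 1.0·0.05 - 1.0/40 ≈ -0.58 before the success bonus, while a hop of length 60 over a PDR 0.7 link to a neighbor with 5 units left scores -3.6 - 0.1 - 0.3 - 0.2 = -4.2, so the agent learns to prefer short, reliable, energy-rich hops.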

Technical Implementation

Environment Initialization

import gymnasium as gym
import numpy as np

class UWSNRoutingEnv(gym.Env):
    def __init__(self, num_nodes=10, max_energy=100,
                 initial_energy=50, transmission_range=100):
        self.num_nodes = num_nodes
        self.initial_energy = initial_energy
        self.transmission_range = transmission_range

        # Initialize node positions randomly in 3D space
        self.node_positions = np.random.uniform(0, 100, size=(num_nodes, 3))

        # Set initial energies
        self.node_energies = np.full(num_nodes, float(initial_energy))

        # Define action space (forward to any node or drop)
        self.action_space = gym.spaces.Discrete(num_nodes + 1)

        # Define observation space (local + neighbor info, see State Space above)
        self.observation_space = gym.spaces.Dict(...)

The environment is initialized with configurable parameters for network size, energy levels, and transmission range.
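
The Gymnasium API also expects a reset() method, which is not shown above. A minimal sketch, assuming the packet starts at a randomly chosen node:

def reset(self, seed=None, options=None):
    super().reset(seed=seed)   # seeds self.np_random per the Gymnasium API
    # Restore every node's battery and pick a random source for the packet
    self.node_energies = np.full(self.num_nodes, float(self.initial_energy))
    self.current_node = int(self.np_random.integers(self.num_nodes))
    return self._get_obs(), {}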

Step Function

def step(self, action):
    terminated = False
    if action == self.num_nodes:        # last action index means "drop"
        reward = -10
        terminated = True
    elif action == self.sink_node:      # packet delivered to the sink
        reward = self.R_success
        terminated = True
    else:
        # Calculate multi-component reward for this hop
        reward = self.compute_reward(self.current_node, action)
        # Update node energy and move the packet to the chosen neighbor
        self.node_energies[action] -= 1
        self.current_node = action

    return self._get_obs(), reward, terminated, False, {}

The step function handles packet forwarding, energy consumption, and computes the multi-objective reward.
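
As a baseline to compare a learned policy against, a simple greedy heuristic can forward the packet to the neighbor geometrically closest to the sink. The sink_position attribute and get_neighbors helper below are assumptions:

def greedy_policy(env):
    # Choose the reachable neighbor closest to the sink, or drop if isolated
    node = env.current_node
    neighbors = np.asarray(env.get_neighbors(node))
    if neighbors.size == 0:
        return env.num_nodes   # drop action
    dists = np.linalg.norm(
        env.node_positions[neighbors] - env.sink_position, axis=1)
    return int(neighbors[np.argmin(dists)])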
