MSc Computer Science thesis project exploring deep reinforcement learning
approaches for maritime route optimisation. The system learns routing
policies that account for weather conditions, fuel efficiency, and
time constraints.
System Overview
- Custom grid-based maritime environment simulation
- DDQN (Double Deep Q-Network) for discrete action spaces
- DDPG (Deep Deterministic Policy Gradient) for continuous action spaces
- Parallelised training infrastructure for faster convergence
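To make the grid-based environment concrete, here is a minimal sketch of how such a simulation might look. It is not the thesis implementation: the class name `GridSeaEnv`, the four-direction action set, the per-cell `weather_cost` array, and the reward (a unit time cost plus a weather penalty) are all assumptions chosen for illustration.

```python
import numpy as np


class GridSeaEnv:
    """Hypothetical grid environment: the vessel moves one cell per step."""

    # Assumed action set: north, south, west, east as (row, col) offsets.
    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    def __init__(self, weather_cost, start, goal):
        # Assumed per-cell weather penalty, shape (rows, cols).
        self.weather_cost = np.asarray(weather_cost, dtype=float)
        self.start = tuple(start)
        self.goal = tuple(goal)
        self.pos = self.start

    def reset(self):
        """Return the vessel to the start cell and return that position."""
        self.pos = self.start
        return self.pos

    def step(self, action):
        """Apply one move and return (position, reward, done)."""
        rows, cols = self.weather_cost.shape
        dr, dc = self.MOVES[action]
        # Clip to the grid so the vessel cannot leave the map.
        r = min(max(self.pos[0] + dr, 0), rows - 1)
        c = min(max(self.pos[1] + dc, 0), cols - 1)
        self.pos = (r, c)
        done = self.pos == self.goal
        # Assumed reward: unit time cost plus the weather penalty of the entered cell.
        reward = -1.0 - float(self.weather_cost[r, c])
        return self.pos, reward, done
```

A DDQN agent would treat the four moves as its discrete action set. A DDPG agent would need a continuous variant, for example a heading angle, which this sketch does not cover.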
Key Technical Decisions
- Designed discretised grid environment balancing realism and computational efficiency
- Implemented experience replay and target networks for stable learning
- Used parallel environment instances to accelerate training
- Compared value-based (DDQN) and actor-critic (DDPG) approaches
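The experience replay and target network decisions above can be sketched as follows. This is an illustrative outline under stated assumptions, not the project's code: `ReplayBuffer`, `ddqn_targets`, and the default `gamma` are hypothetical, and the Q-values are passed in as plain arrays instead of coming from real networks.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Hypothetical fixed-size buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def ddqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Compute Double DQN targets for a batch.

    q_online_next and q_target_next have shape (batch, n_actions) and hold
    next-state Q-values from the online and target networks respectively.
    """
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    # The online network selects the greedy next action...
    best = np.argmax(q_online_next, axis=1)
    # ...and the target network evaluates that action.
    q_eval = q_target_next[np.arange(len(best)), best]
    # Terminal transitions (done = 1) do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * q_eval
```

Separating action selection (online network) from action evaluation (target network) is what distinguishes Double DQN from standard DQN, where the target network does both.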
Results
The trained agents learned to navigate complex maritime scenarios,
reducing route time by 15-20% relative to baseline heuristics while
satisfying safety constraints.