Opinion Networks | Generalized Planning | Intervention Planning | Dialog Management

Opinion Dynamics & Information Spread Control

Bharath Muppasani, Protik Nag, Vignesh Narayanan, Biplav Srivastava, Michael N. Huhns

NeurIPS 2024 | AAAI 2024 Demo | GenPlan @ AAAI 2025
★ Best Demo Award
86% Infection Reduction (Cora)
6 Reward Structures Studied
3 Published Project Milestones
Numeric PDDL: Executable Formulation

Abstract

This project studies how to control misinformation propagation in social graphs through sequential intervention planning. We model each person as an agent with continuous beliefs, capture trust-weighted interactions between agents, and compute intervention actions under a limited per-step budget.

The work progresses from an interactive planning-based demo system (AAAI 2024) to generalized policy learning on graph families (NeurIPS 2024), then to human-AI dialog state control over connected beliefs (GenPlan @ AAAI 2025). All three stages share the same objective: reduce network infection while preserving transferability across unseen topologies.

AAAI 2024 Demo
InfoSpread System

Interactive simulation, planning-backed intervention loops, and visual diagnostics for misinformation control. Received the Best Demo Award.

NeurIPS 2024
Generalized Planning Strategies

Learned intervention policies that transfer from small training graphs to larger unseen networks using structural graph features.

GenPlan @ AAAI 2025
Dialog-State Belief Planning

Extended opinion-network control to dialog management where utterances influence interconnected user beliefs across topics.


Modeling Dynamic Opinion Networks

We represent the social system as a directed graph $G = (V, E)$ in which each node $i$ holds a continuous opinion value $x_i(t) \in [-1, 1]$ and each edge carries a trust weight $\mu_{ik} \in [0,1]$, the weight node $i$ gives to the opinion of its neighbor $k$. A node is considered infected when its belief falls below a misinformation threshold (for example, $x_i < -0.95$).

Opinion updates follow a linear adjustment process driven by trust-weighted interactions between connected agents.

$$x_i(t+1)=x_i(t)+\mu_{ik}\bigl(x_k(t)-x_i(t)\bigr)$$
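The update rule above can be sketched in a few lines of Python. This is a minimal illustration of the formula only; the function name `update_opinion` is hypothetical and not taken from the project code.

```python
def update_opinion(x_i: float, x_k: float, mu_ik: float) -> float:
    """Apply x_i(t+1) = x_i(t) + mu_ik * (x_k(t) - x_i(t)) once."""
    if not 0.0 <= mu_ik <= 1.0:
        raise ValueError("trust weight must lie in [0, 1]")
    return x_i + mu_ik * (x_k - x_i)


# A neutral node that fully trusts an infected neighbor adopts its opinion.
print(update_opinion(0.0, -1.0, 1.0))  # -1.0
# With half the trust, the node moves only halfway.
print(update_opinion(0.0, -1.0, 0.5))  # -0.5
```

With $\mu_{ik} = 0$ the opinion is unchanged, and with $\mu_{ik} = 1$ node $i$ copies node $k$ exactly, so the trust weight acts as an interpolation factor between the two opinions.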

At each step, the planner selects a budget-constrained subset of target nodes for corrective intervention. This turns opinion control into a long-horizon sequential decision problem rather than a one-shot classification task.

Environment and Intervention Model:

At each timestep, infected nodes propagate misinformation to their immediate susceptible neighbors (candidate nodes). An intervention policy chooses a budget-constrained subset of nodes to receive authentic information from a trusted source.

The trusted source has opinion value +1, and the trust placed in the source is set to one of the values used in the experiments (1.0, 0.8, or 0.75). Intervened nodes move toward positive belief, and once a node crosses the positive threshold it is treated as blocked from further misinformation spread. The episode ends when no candidate nodes remain.
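One way to picture a single environment step is the sketch below. It is illustrative rather than the project's implementation, and it rests on several assumptions: the positive "blocked" threshold is taken to be +0.95 (the text gives no value), intervention is applied before propagation within a step, and trust is keyed as `(receiver, sender)`. All function and variable names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

INFECTED_BELOW = -0.95  # misinformation threshold quoted in the text
BLOCKED_ABOVE = 0.95    # ASSUMED positive threshold; the text gives no value

Graph = Dict[int, List[int]]          # sender -> nodes it can influence
Trust = Dict[Tuple[int, int], float]  # (receiver, sender) -> trust weight


def infected_nodes(opinions: Dict[int, float]) -> Set[int]:
    return {v for v, x in opinions.items() if x < INFECTED_BELOW}


def candidate_nodes(graph: Graph, opinions: Dict[int, float]) -> Set[int]:
    """Neighbors of infected nodes that are neither infected nor blocked."""
    return {
        k
        for i in infected_nodes(opinions)
        for k in graph.get(i, [])
        if INFECTED_BELOW <= opinions[k] <= BLOCKED_ABOVE
    }


def step(graph: Graph, trust: Trust, opinions: Dict[int, float],
         chosen: Iterable[int], source_trust: float = 1.0) -> Dict[int, float]:
    """Intervene on `chosen` (assumed to happen first), then propagate."""
    new = dict(opinions)
    for v in chosen:  # trusted source holds opinion +1
        new[v] += source_trust * (1.0 - new[v])
    for i in infected_nodes(opinions):
        for k in graph.get(i, []):
            if INFECTED_BELOW <= new[k] <= BLOCKED_ABOVE:
                new[k] += trust[(k, i)] * (opinions[i] - new[k])
    return new


# Path graph 0 -> 1 -> 2 with node 0 infected and full trust on every edge.
graph = {0: [1], 1: [2], 2: []}
trust = {(1, 0): 1.0, (2, 1): 1.0}
opinions = {0: -1.0, 1: 0.0, 2: 0.0}

protected = step(graph, trust, opinions, chosen=[1])
print(candidate_nodes(graph, protected))   # set(): episode ends
unprotected = step(graph, trust, opinions, chosen=[])
print(candidate_nodes(graph, unprotected))  # {2}: infection keeps spreading
```

In this toy case a single intervention on node 1 blocks the only path out of the infected node, which is why the choice of targets matters more than the number of interventions.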

[Figure: Propagation and intervention dynamics in the simulation environment.]

Infection Rate Metric: The planning objective is evaluated using the fraction of infected nodes at each timestep.
$$\text{InfectionRate}(t)=\frac{|\{v\in V:x_v(t)<-0.95\}|}{|V|}$$
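As a small illustration, the metric can be computed directly from a mapping of node identifiers to opinion values. The function name and the dictionary layout are assumptions made for this sketch.

```python
from typing import Dict, Hashable


def infection_rate(opinions: Dict[Hashable, float],
                   threshold: float = -0.95) -> float:
    """Fraction of nodes whose opinion is strictly below the threshold."""
    if not opinions:
        raise ValueError("the graph must contain at least one node")
    infected = sum(1 for x in opinions.values() if x < threshold)
    return infected / len(opinions)


# Node "b" sits exactly on the threshold, so it does not count as infected.
print(infection_rate({"a": -1.0, "b": -0.95, "c": 0.2, "d": -0.99}))  # 0.5
```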

Intervention Planning Strategies

Combinatorial node selection quickly makes exact planning intractable. To preserve scale and transfer, we focus on generalized approaches that learn reusable strategies from graph structure.

Supervised Learning (SL) with GCNs

We use Graph Convolutional Networks (GCNs) to rank intervention candidates based on structural signals such as connectivity, local polarity, and influence flow. Ground-truth supervision is produced from search-derived solutions on small training graphs.

This framing addresses the generalized planning question directly: learn a transferable intervention policy that is not tied to specific node identifiers.
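To make the ranking idea concrete, here is a dependency-free sketch of a single graph convolution layer in the standard form $H' = \mathrm{ReLU}(\hat{A} H W)$ with $\hat{A} = D^{-1/2}(A + I)D^{-1/2}$, followed by a top-$k$ selection under a budget. The project's actual architecture, features, and trained weights are not described here, so the single degree feature, the hand-picked weight, and all function names are assumptions for illustration only.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_norm: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One propagation step followed by ReLU."""
    out = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


def rank_candidates(scores: Matrix, candidates: List[int],
                    budget: int) -> List[int]:
    """Return the `budget` candidates with the highest first-column score."""
    ranked = sorted(candidates, key=lambda v: scores[v][0], reverse=True)
    return ranked[:budget]


# Undirected path 0 - 1 - 2; the only node feature is its degree.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0], [2.0], [1.0]]
weights = [[1.0]]  # untrained, hand-picked weight

scores = gcn_layer(normalized_adjacency(adj), features, weights)
print(rank_candidates(scores, [0, 1, 2], budget=1))  # [1]
```

Because the layer only consumes adjacency structure and per-node features, the same weights can be applied to a graph of any size, which is the property that makes transfer to larger unseen networks possible.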

Reinforcement Learning (RL) & Reward Engineering

To remove expensive supervised labeling, we introduced a reinforcement learning pipeline with a Deep Value Network (DVN) that learns directly from simulation outcomes.

RL Advantage: Learning from feedback avoids NP-hard label generation loops while retaining strong generalization across graph families and scales.

[Figure: Inference pipeline for the GCN-based generalized planning strategy.]

A key NeurIPS 2024 contribution is the study of six reward structures (Section 3.2.1 of the paper) that balance infection control, susceptibility, and intervention speed:

Delta Infection: $R_0 = -\Delta I_t$
Candidate Suppression: $R_1 = -|C_t|$
Hybrid Local+Global: $R_2 = -|C_t|-\Delta I_t$
Episode Speed: $R_3 = 1-\frac{T_{\text{used}}}{T_{\max}}$
Current Infection: $R_4 = -I_t$
Mixed Episodic: $R_5 = -|C_t|-\frac{T_{\text{used}}}{T_{\max}}$

Notation:

$I_t$ and $\Delta I_t$: $I_t$ is the infection rate at time $t$. $\Delta I_t = I_{t+1} - I_t$ captures how infection changes after action $a_t$.
$C_t$ (candidate nodes): Immediate neighbors of infected nodes that are susceptible to infection in the next timestep.
$T_{\text{used}}$: Number of steps consumed before misinformation is contained in an episode.
$T_{\max}$: Maximum episode length (time-step budget) used to normalize speed-focused rewards.
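The six formulas can be evaluated side by side with a short helper, sketched below. The function name and argument names are hypothetical, and the sketch does not settle when each reward is issued during training (per step or at episode end), which the formulas alone do not specify.

```python
from typing import Dict


def reward_structures(i_t: float, i_next: float, num_candidates: int,
                      t_used: int, t_max: int) -> Dict[str, float]:
    """Evaluate R0..R5 from the quantities defined in the notation above."""
    if t_max <= 0:
        raise ValueError("t_max must be positive")
    delta_i = i_next - i_t  # Delta I_t = I_{t+1} - I_t
    speed = t_used / t_max
    return {
        "R0": -delta_i,                   # Delta Infection
        "R1": -float(num_candidates),     # Candidate Suppression
        "R2": -num_candidates - delta_i,  # Hybrid Local+Global
        "R3": 1 - speed,                  # Episode Speed
        "R4": -i_t,                       # Current Infection
        "R5": -num_candidates - speed,    # Mixed Episodic
    }


# Infection grows from 25% to 50% with 3 candidates, 5 of 20 steps used.
for name, value in reward_structures(0.25, 0.5, 3, 5, 20).items():
    print(name, value)
# R0 -0.25, R1 -3.0, R2 -3.25, R3 0.75, R4 -0.25, R5 -3.25
```

In this example the candidate count dominates $R_2$ and $R_5$ because $|C_t|$ is an unnormalized integer while the other terms lie in $[0, 1]$ or $[-1, 1]$.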

InfoSpread System Implementation

We developed InfoSpread to connect theoretical planning models to practical experimentation and operator-facing analysis.

Numeric PDDL Formulation: The system includes a numeric PDDL representation with explicit typed entities (agent, source, topic) and fluents such as have-opinion and have-trust, enabling direct execution on planners like Metric-FF.

The platform compiles network configurations into executable planning problems, launches planning/search backends, and visualizes intervention sequences for human-in-the-loop analysis and override.
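As a rough picture of what compiling a network configuration into a planning problem could look like, the sketch below emits the `:objects` and `:init` parts of a numeric PDDL problem from Python dictionaries. Only the type names (agent, source, topic) and fluent names (have-opinion, have-trust) come from the description above. The fluent argument order, the object names, and the function `to_pddl_fragment` are assumptions for illustration, not the InfoSpread encoding.

```python
from typing import Dict, List, Tuple


def to_pddl_fragment(agents: List[str], sources: List[str], topics: List[str],
                     opinions: Dict[Tuple[str, str], float],
                     trust: Dict[Tuple[str, str], float]) -> str:
    """Build :objects and :init sections using numeric fluent assignments."""
    lines = ["(:objects"]
    lines.append("  " + " ".join(agents) + " - agent")
    lines.append("  " + " ".join(sources) + " - source")
    lines.append("  " + " ".join(topics) + " - topic")
    lines.append(")")
    lines.append("(:init")
    # ASSUMED signature: (have-opinion ?who ?topic)
    for (who, topic), value in sorted(opinions.items()):
        lines.append(f"  (= (have-opinion {who} {topic}) {value})")
    # ASSUMED signature: (have-trust ?truster ?trusted)
    for (truster, trusted), value in sorted(trust.items()):
        lines.append(f"  (= (have-trust {truster} {trusted}) {value})")
    lines.append(")")
    return "\n".join(lines)


fragment = to_pddl_fragment(
    agents=["a1", "a2"],
    sources=["s1"],
    topics=["t1"],
    opinions={("a1", "t1"): -0.5, ("a2", "t1"): -1.0},
    trust={("a1", "a2"): 0.8, ("a1", "s1"): 1.0},
)
print(fragment)
```

A complete problem file would also need a domain reference and a goal, which are left out here because the text does not describe them.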

InfoSpread system walkthrough (Google Drive).

[Figure: InfoSpread UI, an interactive dashboard for inspection and intervention analysis.]


Extension to Dialog Management

Recent work extends this formulation to conversational dialog management: a user state is represented as connected topic-beliefs, and utterances become interventions over that latent belief graph.

This links opinion dynamics with safer human-AI interaction design, where policies are optimized not just for one topic outcome but for coherent, multi-topic behavior change.


Representative Publications