PHP's cURL extension sends no User-Agent header unless one is configured, so spider requests can be blocked, misrouted, or hard to separate from anonymous traffic in server logs. Set CURLOPT_USERAGENT on the cURL handle before the transfer runs so that each crawler request identifies the application's name, version, and contact URL.

PHP exposes libcurl options through curl_setopt(), which sets one option per call, and curl_setopt_array(), which sets several at once. CURLOPT_USERAGENT sets the User-Agent request header for transfers made with that handle. CURLOPT_RETURNTRANSFER makes curl_exec() return the response body as a string instead of printing it directly.
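The sketch below shows both calls, assuming the spider wants the same options on every handle it creates. The helper name spiderCurlOptions() is hypothetical. curl_setopt(), curl_setopt_array(), and the CURLOPT_* constants are PHP's own.

```php
<?php
// Hypothetical helper: keeps the spider's cURL options in one array
// so every handle is configured with the same User-Agent.
function spiderCurlOptions(string $userAgent, int $timeoutSeconds = 10): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,            // curl_exec() returns the body as a string
        CURLOPT_USERAGENT      => $userAgent,      // sent as the User-Agent request header
        CURLOPT_TIMEOUT        => $timeoutSeconds, // limit for the whole transfer, in seconds
    ];
}

$userAgent = 'ExampleSpider/1.0 (+https://www.example.com/bot)';

// One option per call with curl_setopt() ...
$single = curl_init();
curl_setopt($single, CURLOPT_RETURNTRANSFER, true);
curl_setopt($single, CURLOPT_USERAGENT, $userAgent);

// ... or all at once with curl_setopt_array(), which returns false
// as soon as one option cannot be set and skips the rest.
$batch = curl_init();
$ok = curl_setopt_array($batch, spiderCurlOptions($userAgent));
```

Keeping the options in a single array makes it harder for a second handle, or a later edit, to drop the User-Agent by accident.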

Use a truthful identifier for the spider instead of copying a browser string to get past filters. Many sites apply robots.txt rules, rate limits, and log analysis by user agent. Include a stable product name and a contact or information URL whenever the crawler requests pages outside the application's own systems.
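One way to keep the identifier consistent is to build it from its parts. This sketch assumes the Name/Version (+URL) shape used in the steps below. buildSpiderUserAgent() is a hypothetical helper, and its checks are illustrative rather than a complete validator.

```php
<?php
// Hypothetical helper: formats a crawler identifier as "Name/Version (+InfoURL)".
function buildSpiderUserAgent(string $name, string $version, string $infoUrl): string
{
    // Name and version must be HTTP tokens (RFC 9110): no spaces, "/", or parentheses.
    $token = '/^[!#$%&\'*+.^_`|~0-9A-Za-z-]+$/';
    if (!preg_match($token, $name) || !preg_match($token, $version)) {
        throw new InvalidArgumentException('Name and version must be HTTP tokens.');
    }
    // The URL sits inside a parenthesized comment; rejecting parentheses
    // keeps it from closing the comment early.
    if (filter_var($infoUrl, FILTER_VALIDATE_URL) === false || strpbrk($infoUrl, '()') !== false) {
        throw new InvalidArgumentException('Information URL is not usable in a User-Agent comment.');
    }
    return sprintf('%s/%s (+%s)', $name, $version, $infoUrl);
}

echo buildSpiderUserAgent('ExampleSpider', '1.0', 'https://www.example.com/bot'), PHP_EOL;
// ExampleSpider/1.0 (+https://www.example.com/bot)
```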

Steps to change PHP cURL spider User-Agent:

  1. Create a local endpoint that prints the received User-Agent header.
    echo-user-agent.php
    <?php
    header('Content-Type: text/plain');
    echo 'Received User-Agent: ' . ($_SERVER['HTTP_USER_AGENT'] ?? 'none') . PHP_EOL;

    The local endpoint gives the spider a private test target, so you can check the header before the spider requests any production pages.

  2. Start PHP's built-in server in a second terminal, from the directory that contains echo-user-agent.php.
    $ php -S 127.0.0.1:8080 -t .

    Keep this terminal open until the verification request finishes, then stop it with Ctrl+C.

  3. Create the spider request script with CURLOPT_USERAGENT set before curl_exec().
    spider.php
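    <?php
    // Target URL from the first command-line argument, with a fallback.
    $url = $argv[1] ?? 'https://www.example.com/';
    // Product name/version plus an information URL that site operators can visit.
    $userAgent = 'ExampleSpider/1.0 (+https://www.example.com/bot)';

    $curl = curl_init($url);
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,   // return the body from curl_exec()
        CURLOPT_USERAGENT => $userAgent,  // sent as the User-Agent request header
        CURLOPT_TIMEOUT => 10,            // limit the whole transfer to 10 seconds
    ]);

    $response = curl_exec($curl);
    if ($response === false) {
        // Transport failure (DNS, connection, timeout): report it and stop.
        fwrite(STDERR, curl_error($curl) . PHP_EOL);
        curl_close($curl);
        exit(1);
    }

    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);

    echo "HTTP status: {$status}" . PHP_EOL;
    echo $response;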

    Replace ExampleSpider/1.0 and the contact URL with the spider's real name, version, and information page.

  4. Run the spider script against the local endpoint.
    $ php spider.php http://127.0.0.1:8080/echo-user-agent.php
    HTTP status: 200
    Received User-Agent: ExampleSpider/1.0 (+https://www.example.com/bot)

  5. Use the same script against the real target after the header check passes.
    $ php spider.php https://www.example.com/

    Do not use a browser User-Agent string to hide automated traffic. Some sites treat misleading crawler identifiers as abusive behavior and may block the client.
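When the spider moves from one URL to many, you can set the User-Agent once on a reused handle. This sketch is a hypothetical multi-URL variant of spider.php. crawl() and its result array shape are assumptions, not part of the steps above. Only CURLOPT_URL changes between requests, so every request carries the same identifier, and libcurl can reuse open connections to the same host.

```php
<?php
// Hypothetical multi-URL variant of spider.php: one handle, one User-Agent.
function crawl(array $urls, string $userAgent): array
{
    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERAGENT => $userAgent,
        CURLOPT_TIMEOUT => 10,
    ]);

    $results = [];
    foreach ($urls as $url) {
        // Only the URL changes; the User-Agent option stays set on the handle.
        curl_setopt($curl, CURLOPT_URL, $url);
        $body = curl_exec($curl);
        $results[$url] = [
            'status' => curl_getinfo($curl, CURLINFO_HTTP_CODE),
            'body'   => $body === false ? null : $body,
            'error'  => $body === false ? curl_error($curl) : null,
        ];
    }
    // The handle is released when $curl goes out of scope.
    return $results;
}

$userAgent = 'ExampleSpider/1.0 (+https://www.example.com/bot)';
foreach (crawl(array_slice($argv ?? [], 1), $userAgent) as $url => $result) {
    echo "{$url}: HTTP {$result['status']}" . PHP_EOL;
}
```

If you save it under a hypothetical name such as crawl.php, it takes the URLs as arguments, for example `php crawl.php https://www.example.com/ https://www.example.com/about`.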