How to change the User-Agent for PHP cURL spiders

Spider requests that leave PHP without a clear User-Agent header can be blocked, misrouted, or hard to distinguish from anonymous traffic in server logs. Set CURLOPT_USERAGENT on the cURL handle before the transfer runs so that each crawler request identifies the application name, version, and contact URL.

PHP exposes libcurl options through curl_setopt() and curl_setopt_array(). CURLOPT_USERAGENT sets the HTTP User-Agent request header for that cURL transfer, while CURLOPT_RETURNTRANSFER makes curl_exec() return the response as a string instead of printing it directly.
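As a minimal sketch of the single-option form, the snippet below sets the same two options with curl_setopt() and checks the boolean each call returns. The identifier string and target URL are placeholders for illustration, and no request is sent because curl_exec() is never called.

```php
<?php
// Placeholder identifier and URL, used only for illustration.
$userAgent = 'ExampleSpider/1.0 (+https://www.example.com/bot)';
$curl = curl_init('https://www.example.com/');

// curl_setopt() returns true on success and false on failure.
$okAgent = curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);
$okReturn = curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

if (!$okAgent || !$okReturn) {
    fwrite(STDERR, 'Could not set cURL options' . PHP_EOL);
    exit(1);
}

// Nothing has been transferred yet; the options only take effect in curl_exec().
echo 'Options set; no request has been sent yet.' . PHP_EOL;
```

curl_setopt_array() applies the same options in one call, which tends to read better once a handle needs more than two or three settings.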

Use a truthful identifier for the spider instead of copying a browser string to bypass filters. Many sites build robots policies, rate limits, and log analysis around the User-Agent, so include a stable product name and a contact or information URL whenever the crawler requests pages outside the application's own systems.
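One way to keep the identifier consistent is to assemble it from its parts. The sketch below assumes a hypothetical helper named buildUserAgent() and a "Product/Version (+URL)" layout; the character check is a deliberately conservative assumption, not a full implementation of the HTTP token grammar.

```php
<?php
// Hypothetical helper: formats the identifier as "Product/Version (+info URL)".
function buildUserAgent(string $product, string $version, string $infoUrl): string
{
    // Conservative assumption: allow only letters, digits, dot, underscore, hyphen.
    foreach ([$product, $version] as $token) {
        if (!preg_match('/^[A-Za-z0-9._-]+$/', $token)) {
            throw new InvalidArgumentException("Invalid token: {$token}");
        }
    }

    // Reject values that PHP does not recognize as a URL.
    if (filter_var($infoUrl, FILTER_VALIDATE_URL) === false) {
        throw new InvalidArgumentException("Invalid URL: {$infoUrl}");
    }

    return sprintf('%s/%s (+%s)', $product, $version, $infoUrl);
}

$userAgent = buildUserAgent('ExampleSpider', '1.0', 'https://www.example.com/bot');
echo $userAgent . PHP_EOL;
// prints: ExampleSpider/1.0 (+https://www.example.com/bot)
```

The resulting string can be passed directly as the CURLOPT_USERAGENT value.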

Steps to change the PHP cURL spider User-Agent:

Create a local endpoint that prints the received User-Agent header.
echo-user-agent.php
```
<?php
header('Content-Type: text/plain');
echo 'Received User-Agent: ' . ($_SERVER['HTTP_USER_AGENT'] ?? 'none') . PHP_EOL;
```
The local endpoint lets you verify the header privately before the spider requests production pages.
Start PHP's built-in server for the test endpoint in a second terminal.
```
$ php -S 127.0.0.1:8080 -t .
```
Keep this terminal open until the verification request finishes, then stop the server with Ctrl+C.

Create the spider request script with CURLOPT_USERAGENT set before curl_exec().

spider.php

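```
<?php
// Target URL comes from the first CLI argument, with a default for quick runs.
$url = $argv[1] ?? 'https://www.example.com/';
$userAgent = 'ExampleSpider/1.0 (+https://www.example.com/bot)';

$curl = curl_init($url);
curl_setopt_array($curl, [
    CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
    CURLOPT_USERAGENT => $userAgent,  // sent as the User-Agent request header
    CURLOPT_TIMEOUT => 10,            // give up after 10 seconds
]);

$response = curl_exec($curl);
if ($response === false) {
    // Transfer failed: report the cURL error and exit non-zero.
    fwrite(STDERR, curl_error($curl) . PHP_EOL);
    curl_close($curl);
    exit(1);
}

$status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);

echo "HTTP status: {$status}" . PHP_EOL;
echo $response;
```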

Replace ExampleSpider/1.0 and the contact URL with the spider's real name, version, and information page.

Run the spider script against the local endpoint.

```
$ php spider.php http://127.0.0.1:8080/echo-user-agent.php
HTTP status: 200
Received User-Agent: ExampleSpider/1.0 (+https://www.example.com/bot)
```

Use the same script against the real target after the header check passes.
```
$ php spider.php https://www.example.com/
```
Do not use a browser User-Agent string to hide automated traffic. Some sites treat misleading crawler identifiers as abusive behavior and may block the client.
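When the spider fetches more than one URL, a small factory can keep the identifier the same on every handle. This is a sketch under assumptions: createSpiderHandle() is a hypothetical helper, the second URL is a placeholder, and the loop only prepares handles without sending any request.

```php
<?php
// Hypothetical helper: returns a cURL handle preconfigured with the spider's identifier.
function createSpiderHandle(string $url, string $userAgent)
{
    $curl = curl_init($url);
    if ($curl === false) {
        throw new RuntimeException('Could not initialize cURL');
    }

    // Same options as spider.php, applied identically to every handle.
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERAGENT => $userAgent,
        CURLOPT_TIMEOUT => 10,
    ]);

    return $curl;
}

$userAgent = 'ExampleSpider/1.0 (+https://www.example.com/bot)';

// Placeholder URLs for illustration only.
$urls = ['https://www.example.com/', 'https://www.example.com/about'];

$handles = [];
foreach ($urls as $url) {
    $handles[$url] = createSpiderHandle($url, $userAgent);
}

echo count($handles) . ' handles prepared' . PHP_EOL;
```

Each prepared handle can then be passed to curl_exec() as in spider.php, so the User-Agent is defined in exactly one place.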

Author: Mohd Shakir Zakaria