Beating the Hack
Have you heard of the Pharma Hack? It’s a fairly widespread problem among PHP-driven websites, particularly those powered by WordPress or Joomla. I’m a WordPress user myself, and Nostalgia For Infinity, the Wrecktheplacefantastic bandsite and Arcadian Rhythms are all powered by the platform.
About a fortnight ago AJ, one of the writers for Arcadian Rhythms, noticed that some of our Google search results were a bit odd. The links themselves were fine but using the preview function (where you get a screenshot of the page to the right of the search results) revealed a ton of spam content. Another AR writer, Spann, who works in SEO, had a look and quickly realised that 1.) this was not good, and 2.) we had been pharma hacked.
After a bit of research Spann discovered that there was a vulnerability in the WordPress theme we use, Arras, that allowed code to be injected into the site via a file called timthumb.php. It’s not a malicious file itself – it’s supposed to resize images into thumbnails, including images hosted on sites like Flickr, I believe – but the outdated version used in Arras was deeply insecure.
The onus was now on me to identify the exact nature of the problem and fix it. I decided to write up what I went about doing in the hopes that it might be of use to some other poor bastard who is frantically googling the symptoms they’re seeing, as the nature of the hack I fixed was not identical to the other guides I read (though I will of course link to these, especially as they contain all of the technical instructions on how to go about doing what I did).
I’m not really in the mood to write this all up as a nice friendly blog post, so what I will do is provide a bulleted list of what I did and the order in which I did so. At the end of the post I’ll link to everything I found useful so you can find out how to do what I mention, or learn more about why. Where I can remember the relevant links, I’ll also include them in the list.
Sorry this is a bit lazy. I want to be helpful but I have other things to write that are a lot more interesting than this!
- First up it was important to update timthumb.php to the latest version, since that was the original vulnerability.
- I checked NFI and WTPF and it seemed that they were not affected by the hack, which was a relief. I included them in all of my subsequent investigations anyway just in case.
- A blog post described obvious fake files (typically within a plugin directory, typically with a filename that began “.class-” or similar) but I couldn’t find any of these. Readers, don’t go killing anything that looks a bit like that; find the blog post in the list below and read up on what to look for. If you delete an actual vital file you could do something to your site that is harder to fix than a pharma hack.
- I had a look at the .htaccess files in the root of my installations in case there were obvious rewrite rules in there. There were some lines in there I wasn’t sure about, but all three sites had near-identical lines and I suspected they were valid, so I left the files alone – after temporarily renaming them to hide them and confirming that doing so hadn’t adversely affected the three sites.
- I set up a bunch of Google Alerts as per this site to warn me of similar problems in future (pharma hacks are easier to fix the earlier you catch them).
- The suggestion of setting up Google Alerts came via this post which seemed the most useful one I found for the problem I was seeing. Based on its instructions I ran through a series of investigations.
- I ran search queries for Arcadian Rhythms and attempted to locate more affected pages, and also loaded those affected pages and viewed the source code to search for pharma terms. The results were actually encouraging, as the majority of Google page previews showed the correct site content. Only a scattered few previews appeared to be compromised, though I did encounter a few more by churning through later pages of site results. Searching the source code of affected pages did not reveal any issues.
- A pattern was emerging: the affected pages were tag/category/author pages. The affected parts of the page were always the main content windows, where a listing of other pages was replaced by pharma spam content. Unfortunately I then found an actual post which was affected, and a tag page which was not. So it was a trend and not a pattern, but at least I knew it was a ‘global’ rather than page type-specific issue.
- I used Google Webmaster Tools to retrieve some afflicted pages. This meant that I was able to see the code of the pages as the Googlebot User Agent (which fetches pages for Google search results) sees them.
- By pretending to be the Googlebot and retrieving a known afflicted page I saw that all of the dodgy code was definitely still there, full of pharma spam content. Well, hoping that the issue had fixed itself was always plain dumb optimism.
- Next I tried using Redleg File Viewer to do the same thing. This works similarly to the Googlebot but as the request doesn’t come from a known Google IP, certain types of Pharma Hack react differently (by serving or not serving the spam content). This would help me narrow down the type of hack we were suffering from if nothing else. In the end no matter which settings I used the Redleg tool returned the correct page headers, so it seemed that the hack was cleverer than the tool and was only reacting to the real Googlebot, presumably by checking where the request came from rather than just the User Agent.
- Now I was on to the really fun bit. I started looking at all of the Arcadian Rhythms PHP files in Notepad++ to try and identify whatever nasty bit of codeshit was returning the pharma junk to Google. I started with all of the base, root WordPress files, then planned to move on to the Arras theme files. After that I would check every other PHP file, including all of those for our various plugins.
- (At this point I also made a note to change all related passwords – every WordPress user’s password for all three sites, the passwords for the MySQL databases themselves – which also required updating wp-config.php, remember – and the password to access the domain itself via SFTP or SSH. I had no evidence that these had been discovered but better safe than sorry.)
- (I also learned around now that you could add “Authentication Unique Keys and Salts” to wp-config.php to help boost future security, and you can also define the WordPress database table prefix here, which makes guessing the right table names tougher for hackers.)
- This process helped identify theme.php as a Backdoor/PHP file – it showed up as a block of Base64 code when viewed via the WP theme editor in the site’s backend and, when downloaded locally, AVG flagged it as Backdoor/PHP. I renamed the file on the server to lock it down and, upon research confirming it was not a valid WP file, I deleted it entirely.
- Around this time I realised that manually checking PHP files for stuff that looked wrong was really, really inefficient and slow, and would actually take me days even to only do AR. So I moved on to using a tool called PuTTY to search for evidence of hacks. PuTTY connects to your server using a protocol called SSH, and once connected you can run a command called “grep” on the server to scan files for certain fragments of code, in this case anything that was encoded in Base64 within a PHP, plain text or .js file.
- This process found a huge amount of Base64 code in the wp-includes directory. This was presumably where the code was hiding that was serving up spam content to the Googlebot User Agent. Unfortunately I didn’t know PuTTY well enough to isolate the problem file, so I downloaded the whole directory (AVG registered no positives this time) and skimmed through it manually, looking for obviously encoded code.
- By doing this I identified https.php as containing nothing but a long line of Base64-encoded code. I erased this from the server as it was already downloaded – may as well remove any harm it is trying to do. Once that was removed, I re-grep’d the entire wp-includes directory, which identified two more files containing Base64 code. Class-ixr.php looked legitimate (it contained references to Base64 embedded in valid-looking PHP code) whereas class-sftp.php did not. As the latter was already downloaded I deleted it too. I could put it back again if that seemed to be a problem.
- Whoops, that broke the site! At this point I thought that these two files might actually refer to secure HTTP and secure FTP connections, which would explain why they contained encrypted code. I put them back for now but made a note. (This is why, when it comes to security problems and code, a little knowledge can be a dangerous thing. In retrospect I am glad I noticed these files and made a note of them.)
- Further research revealed that https.php was suspiciously large for a PHP file. Plus, and this was interesting, that file did not appear at all in a vanilla WordPress installation (I had downloaded one for comparative purposes). Nor did class-sftp.php. I deleted both files again, then took a second look at the error message which appeared on the site.
- The error claimed that line 65 in plugins.php was calling https.php and that’s why the site couldn’t load. Very interesting. Why would plugins.php be calling something to do with secure HTTP, anyway? I replaced the plugins.php file on the server with a clean new file from the aforementioned vanilla WP installation.
- Fixed! The site works again! I guess those two files really weren’t needed.
- Wondering if I had fixed the issue, I tried Googlebot again… and the results came back clean! VICTORY!!!!!! I was still not sure what class-sftp.php was doing but the site didn’t object to it not being there, and I could still log in via SFTP.
- Despite having apparently fixed the problem I continued with a number of additional checks. Doing so revealed a few more apparently inactive/unexploited PHP/Backdoor infected files which served no valid purpose, so I erased those. Grepping the entirety of each of the three sites in turn revealed no base64 code which should not have been there. It looked like I was clear.
- Still, I wasn’t done yet. Two things remained: beefing up the security, and performing full backups of the site files and databases.
- I had already changed a number of passwords before starting the fixes – in case the original hacker was still tooling around or something – but now that the vulnerabilities were closed and the hack was removed I again changed all WordPress admin passwords.
- I renamed one of the three WP admin users away from the default username (note: this involves editing database tables, so be careful) – fortunately the other two were not defaults anyway.
- I changed my password with my web host – this is the password that applies to SFTP and SSH access.
- I changed the password for the users with access to the MySQL databases underpinning the three WP blogs. REMINDER: this will also require you to edit the wp-config.php file in the root of your WordPress installation.
- I instructed all users on the WP blogs to update their passwords within twenty-four hours. If any didn’t, I manually changed their passwords for them.
- Next I had a look at the instructions on this post. I had already covered #1 and decided to not use #2 (I have multiple WordPress installations and didn’t want them to get confused about which wp-config.php file to use). I also left #3 alone because I didn’t want to risk breaking my databases (at this point I had been at this for a good ten hours or so). I may try it in future however. I also skipped #5 because various contributors to the blogs were connecting via domestic ISPs with dynamically assigned IP addresses. I did however implement #4 as it was very easy to do so.
- Finally, I backed up all files and the WordPress databases as per the WordPress Codex, and then I took those backups and stored them in multiple locations (good backup practice anyway).
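If you would rather script the Base64 hunt from the list above than skim files by eye, something like the following Python sketch might do on a downloaded copy of the site. It is not the grep command I actually ran, just the same idea as I understand it. The function name, the file extensions and the 200-character threshold are all assumptions, and it will throw up false positives (class-ixr.php, for one, legitimately mentions Base64).

```python
import os
import re

# Assumption: a run of 200+ Base64-alphabet characters counts as suspicious.
LONG_BASE64 = re.compile(r"[A-Za-z0-9+/]{200,}={0,2}")
# Assumption: injected PHP often unpacks its payload with base64_decode().
DECODE_CALL = re.compile(r"base64_decode\s*\(", re.IGNORECASE)
# The same file types mentioned in the list: PHP, JavaScript and plain text.
EXTENSIONS = (".php", ".js", ".txt")


def find_suspect_files(root):
    """Return (path, reason) pairs for files under root worth a closer look."""
    hits = []
    for folder, _subdirs, names in os.walk(root):
        for name in sorted(names):
            if not name.lower().endswith(EXTENSIONS):
                continue
            path = os.path.join(folder, name)
            # errors="replace" keeps odd encodings from aborting the scan.
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                text = handle.read()
            if LONG_BASE64.search(text):
                hits.append((path, "long Base64-looking run"))
            elif DECODE_CALL.search(text):
                hits.append((path, "calls base64_decode()"))
    return hits
```

Calling find_suspect_files("wp-includes") on a local copy returns a list to investigate, not a list to delete. As I found out the hard way, removing a flagged file without checking what refers to it can take the site down.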
And that was everything! The problem was fixed, the security was upgraded, and the backups were finished. All in all it took me about thirteen hours. I had to take the day off work and it had been a long day, but it was done.
Hopefully the above will prove useful to some, and will also offer confidence to those who – like me – have only rudimentary experience with modern web code and tools/protocols like SSH/Telnet. A basic techy mindset (identify and remember patterns, try and contextualise what you are seeing, read and think about what you are reading) along with a bit of google-fu will take you a long way – this is pretty much true of 95% of what is regarded as “technical” by those who are entirely not.
For the more technical, perhaps the above may be slightly useful as it describes a minor variation on the pharma hack which I have not seen described elsewhere.
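For the more technical, the comparison against a vanilla WordPress download (which is how I worked out that https.php and class-sftp.php did not belong) could be automated along these lines. This is only a sketch: the function names are made up, and it assumes the vanilla copy is the same WordPress version as the live site, otherwise nearly every file will show up as changed. Themes, plugins, uploads and wp-config.php will quite legitimately appear as extras.

```python
import filecmp
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            found.add(os.path.relpath(os.path.join(folder, name), root))
    return found


def compare_to_vanilla(site_dir, vanilla_dir):
    """Return (extra, changed) relative to a clean WordPress download.

    extra   -- files on the site that the vanilla copy does not have
    changed -- files in both whose contents differ byte for byte
    """
    site = relative_files(site_dir)
    vanilla = relative_files(vanilla_dir)
    extra = sorted(site - vanilla)
    changed = sorted(
        rel for rel in site & vanilla
        if not filecmp.cmp(os.path.join(site_dir, rel),
                           os.path.join(vanilla_dir, rel),
                           shallow=False)  # compare contents, not just size and time
    )
    return extra, changed
```

Run against local copies of both directories, the extras list is where unexpected files like https.php would show up, and the changed list is where a tampered core file like plugins.php would.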
Finally, here are the links I mentioned at the start of the post:
- http://www.pearsonified.com/2010/04/wordpress-pharma-hack.php – I should note that the type of hack described in here was not what I experienced, but you may suffer from it…
- http://chrispederick.com/work/user-agent-switcher/ – this may help you view your site as Google sees it, revealing if you are suffering from the pharma hack.
- http://www.arrastheme.com/forums/topic7299-zero-day-vulnerability-in-timthumb-script.html – this is the vulnerability through which we were attacked.
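The user-agent trick from the second link can also be scripted if you prefer. Below is a rough Python sketch that fetches a page twice, once as an ordinary browser and once claiming to be Googlebot, and reports any spam terms that only appear in the Googlebot version. The term list and function names are assumptions for illustration. Bear in mind that the hack I had ignored a faked Googlebot request from a non-Google address, so a clean result here does not prove you are clean.

```python
import urllib.request

# Googlebot's published user agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Assumption: a small, hypothetical watch list; extend it to suit.
SPAM_TERMS = ("viagra", "cialis", "pharmacy")


def fetch_as(url, user_agent, timeout=15):
    """Download url while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def spam_terms_in(html, terms=SPAM_TERMS):
    """Return the watch-list terms that occur in html, ignoring case."""
    lowered = html.lower()
    return [term for term in terms if term in lowered]


def check_page(url):
    """Return terms served to a Googlebot user agent but not to a browser."""
    as_browser = set(spam_terms_in(fetch_as(url, "Mozilla/5.0")))
    as_googlebot = spam_terms_in(fetch_as(url, GOOGLEBOT_UA))
    return [term for term in as_googlebot if term not in as_browser]
```

An empty list from check_page() for one of your own URLs only means this particular test found nothing. Checking through Google Webmaster Tools is still the more reliable test, since that request really does come from Google.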