I won’t claim it was the most popular post I’ve ever written, but a while ago I kinda vented a few frustrations around my attempts to realise an idea I had, whereby I could post content to an Instagram feed, and scrape that to push the content into WordPress to display on a timeline. The idea I suppose being that Instagram provides a ready interface for publishing decent-quality images and simple text.
It provided a chance to do a little bit of hobby-development (seldom do I have the excuse, and therefore make the time/get the chance), as well as a reason to play with some of the AI tooling which is now available and which the world is constantly telling me can improve one’s productivity by orders of magnitude by doing the gruntwork for you.
It wasn’t going smoothly.
We’re in mid-August now and I’m fairly pleased to say that things have moved on a bit since then. Not necessarily forward.
- I sourced a Timeline plugin which would display Custom Post Types, meaning that I could also use the custom post to store taxonomy data, AND put together a REST API for that post type so that pushing in all the data I wanted wasn’t going to be a ballache.
- The only way I could think of to read the Insta feed but not require any kind of authentication was to write a Python script to run a couple of times a day & call the Insta API to get posts, and then parsing the text & calling my WordPress REST API would be fairly trivial.
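A rough sketch of the sort of thing that second bullet describes, for the curious. Everything named here is invented for illustration and not necessarily what the real script does: the `timeline_item` post type slug, the `example.com` URL, the `meta` field names and the "hashtags become tags" rule are all assumptions. It only builds the request; nothing gets sent.

```python
import json
import re
import urllib.request

# Hypothetical endpoint: REST-enabled custom post types live under
# /wp-json/wp/v2/<rest_base>; "timeline_item" is an invented slug.
WP_ENDPOINT = "https://example.com/wp-json/wp/v2/timeline_item"

HASHTAG = re.compile(r"#(\w+)")


def parse_caption(caption):
    """Split a caption into body text and a list of hashtags (assumed convention)."""
    tags = HASHTAG.findall(caption)
    body = HASHTAG.sub("", caption)
    body = " ".join(body.split())  # collapse the gaps left by removed hashtags
    return body, tags


def build_post_request(caption, image_url):
    """Build (but do not send) a POST request creating one timeline entry."""
    body, tags = parse_caption(caption)
    payload = {
        "title": body[:60],
        "content": body,
        "status": "publish",
        # Invented field names; a real custom post type registers its own.
        "meta": {"source_image": image_url, "source_tags": tags},
    }
    return urllib.request.Request(
        WP_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`, plus whatever authentication the site wants.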
I got all that working on my laptop, talking to my dev version of my blog (running on my Unraid server on my home network). Should be a simple case then of getting the WordPress code I wrote into what we’ll laughably call my Production environment, setting the Python script up somewhere else on my VPS, adding a cron job, and letting the whole thing get moving.
So, getting everything onto the server… I figured there must be a way to automate this, because not only do I vehemently hate copy/paste deployment (we live in the future, dammit!), but also it seemed a reasonable opportunity to mess about with GitHub Actions.
I managed to convince GitHub to use pipelines to copy my WordPress file updates and my Python stuff up to my VPS. In the case of the former, however, there was the not-insignificant matter of what to do with it once it was there – Plesk sites run under an internally-maintained username and it became readily apparent that it wasn’t possible to upload using that user’s creds (owing to the fact they weren’t real users), and so any files I DID get anywhere near the webserver couldn’t be read by it.
This hurdle was vaulted by my discovery that Plesk has a git integration whereby you can fit it up with a repo and then click a button in the Plesk console to drag the files across, all with the right perms, etc.
Python was another story, however.
Pushing/running was no trouble – but I found that the logs were full of 401s when the Python script tried to talk to Instagram. ChatGPT and Claude must’ve gotten really tired of my stupid questions on this, cos I genuinely couldn’t fathom why it was working perfectly from my laptop but the same scripts/creds were not working from the VPS.
I stumbled across someone’s lament on Reddit that Meta’s APIs are really picky about where they accept requests from – they don’t want people running bots, so anything they recognise as a non-residential IP address gets bounced. No problem – there’s loads of Residential Proxy services set up to get around this sort of thing (I mean, people will swoop in and make a crust wherever, no?). This, however, gave me no joy – nor did authenticating to the Instagram API before attempting to do the profile read.
So I’m scratching my head – how in the hell can I run this Python script on my home internet connection, but not from a cronjob on my laptop/desktop? Well, I guess I *do* have that Unraid server running 24/7 with my media box on it. Only in order to do that I’d need to learn how to put the Python stuff in Docker and then write the template to allow Unraid to run it.
No small amount of swearing later, trendsetters, and that’s exactly what I’d done. Only instead of 401s from the Insta APIs, I was now getting 403s (Forbidden, rather than Unauthorized). So now it was happy to look at who I was, but once that’d been established it was STILL telling me to piss off.
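For anyone who mixes the two up as often as I do, here’s a tiny sketch of the difference using Python’s standard `http.HTTPStatus`. The `explain` helper and its hint text are made up for illustration.

```python
from http import HTTPStatus


def explain(status_code):
    """Turn a status code into a one-line hint about what to check next."""
    status = HTTPStatus(status_code)
    if status == HTTPStatus.UNAUTHORIZED:
        hint = "credentials missing or not accepted"
    elif status == HTTPStatus.FORBIDDEN:
        hint = "credentials understood, but access refused anyway"
    else:
        hint = "not an auth problem as such"
    return f"{status.value} {status.phrase}: {hint}"
```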
So I go back to running it on my laptop to see where I might be going wrong. And now THAT’s giving me 403s. What the hell? Had I gotten myself blocked, somehow?
Turned out this was all a case of poor timing on my behalf – around the same time I started mucking about with all this, Meta published this:
https://developers.facebook.com/blog/post/2024/09/04/update-on-instagram-basic-display-api
In essence, if you don’t have a professional (Business or Creator) Instagram account, you can sling your hook. So they must’ve been rolling these changes out and it’s only now that I’ve gotten within sight of the finish line that my user’s been hoovered up into the update.
Ultimately this isn’t a huge problem – there’s other similar phone apps out there for social media channels, so there must be something less world-dominatey I can slot right in? Step forward, Tumblr.
Now that I had most of the code and a good idea of the approach I was taking it oughtta be simple right? RIGHT?
Pulling stuff out of Tumblr wasn’t too bad – Pytumblr library instead of Instapy, and away you go. All running locally on laptop and pushing to dev WordPress site (on home network). Now to swap in the credentials for “Live”. Oh – no permission to create posts.
Much investigation ensued and eventually I concluded that for whatever reason, my setup of WordPress running on Plesk meant that trying to make WP REST API calls using Application Passwords would be a bit of a nonstarter. I *think* it was something to do with headers being stripped, but it was a pretty crap way of doing Auth in the first place. Only trouble was finding a better one that didn’t involve heroic amounts of further fuckery.
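For context, Application Passwords travel as plain HTTP Basic auth, so the whole scheme hinges on the `Authorization` header reaching WordPress intact. A minimal sketch of what gets sent, with a made-up username and password:

```python
import base64


def basic_auth_header(username, app_password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{app_password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Made-up credentials, purely for illustration.
headers = {"Authorization": basic_auth_header("bot", "abcd efgh ijkl mnop")}
```

If something in front of PHP drops that header, WordPress never sees any credentials at all, which would at least be consistent with the “no permission” response. I never pinned that down for certain.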
A plugin install later and I was off on the way using JWT instead. Well. Theoretically.
Some back & forth on that one (including the revelations that I wasn’t setting the Authorization header correctly, and that you need to set a User-Agent header for your calls from requests-py in order for WordPress/Plesk not to block you), and in MID-FEBRUARY I actually got the goddamn thing working.
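The two header lessons boil down to something like the sketch below. It assumes the JWT plugin wants the usual `Bearer <token>` scheme (check your plugin’s docs), and the `timeline-sync/0.1` User-Agent string and the URL are invented. It uses the standard library so it stands alone, and it only builds the request rather than sending it.

```python
import json
import urllib.request


def build_headers(token, user_agent="timeline-sync/0.1"):
    """Headers for an authenticated JSON call; the User-Agent value is arbitrary."""
    return {
        "Authorization": f"Bearer {token}",  # assumed scheme for the JWT plugin
        "User-Agent": user_agent,            # explicit, so the default isn't sent
        "Content-Type": "application/json",
    }


def build_request(url, token, payload):
    """Build (but do not send) an authenticated POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=build_headers(token),
        method="POST",
    )
```

With requests, the same dictionary would go in as `requests.post(url, json=payload, headers=build_headers(token))`.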
…
I stumbled across this unfinished draft and it’s no longer clear why I didn’t post it at the time. The thing works ok.
Quasi-amusing coda: I described this whole setup to Chelsea at work, and she said “You do know that Tumblr got bought by WordPress, and they’ve moved the whole backend across so that it’s powered by it now?”.
So I’ve written a bunch of Python code running on a server in Docker to pull data out of Tumblr (wordpress), parse it, then stuff it back into a different WordPress.
Sounds about right.
Update: Oh no, they’ve changed their minds.