There are a few things I’ve learned from hosting websites. Some of them should be pretty obvious, while others only become obvious after you’ve made a mistake.
#1: Use RAID on your system
I must admit that the first server I ever had didn’t use RAID. It had two disks, but no RAID. Back then I always tried to get as much storage as possible: with two 500 GB disks, I couldn’t see why I should mirror the content between them (RAID 1). That changed after it caused around 8 hours of downtime. At least I had backups.
#2: Be somewhat paranoid
It’s good to be somewhat paranoid, because it means you usually do your best not to make mistakes, and you keep proper backups, in more than one copy.
#3: Be relaxed, and don’t stress
I’ve found that I tend to get stressed when problems occur. That makes sense, since you’re afraid things will go wrong. I’ve also learned that if you’re stressed and don’t relax, things will go wrong. So chill: it’s better to spend 5 extra minutes writing the code than an extra hour fixing your own mistakes.
#4: Make a copy before making changes (including upgrades)
Sometimes you need to make config changes or system updates, which means files will change.
Updating the kernel? Then make sure to copy everything in /boot first. It might save you a lot of time afterwards if things don’t work.
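A minimal Python sketch of that step, assuming you run it as root before installing the new kernel. The function name `backup_boot` and the destination `/root/boot-backups` are hypothetical choices for illustration; any location outside /boot should work.

```python
import shutil
import time
from pathlib import Path


def backup_boot(src="/boot", dest_root="/root/boot-backups"):
    """Copy src into a new timestamped directory under dest_root.

    dest_root is a hypothetical default; pick any path outside /boot.
    Returns the path of the new copy.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_root) / f"boot-{stamp}"
    # copytree creates missing parent directories, copies file metadata
    # (copy2), and with symlinks=True keeps symlinks as links instead of
    # copying the files they point to.
    shutil.copytree(src, dest, symlinks=True)
    return dest


if __name__ == "__main__":
    print("Saved /boot to", backup_boot())
```

If the new kernel then refuses to boot, the saved files could be copied back into /boot, for example from a rescue system.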
#5: Be patient
Sometimes things just need to finish. Today (28-12-2013) I learned that rebooting a system for the first time in many days can trigger an fsck check on startup. Be patient and let it run. Don’t freak out and assume your update went wrong; that only causes more downtime than needed. (Yes, I failed big time.)
#6: Ask for help if you need it
No one knows everything. I’ve learned that there are things I don’t have deep knowledge of, and things I know other people do way better than me. Ask people for help; I’m sure they’ll be happy to help if they can. It can save you a lot of headaches as well.
But at the same time, look at what they’re doing and understand WHY they’re doing it; things are usually done because they’re needed.
#7: Know the basics
I think this happens to many people: they know a lot of things, but they don’t really know the basics of specific things. It happens to me as well 😀
#8: Have a test environment
Having a ‘staging’ or test environment is always a good idea, because mistakes made there don’t affect real people. Some people have a test environment but forget to use it from time to time.
#9: Check your settings multiple times
When changing something in production, check your settings multiple times; a single mistake can break things.
#10: Do backups (see #2)
Do backups, and be paranoid. Backups are important, so store them in multiple places. Backups can be expensive, but they’re a good investment, because the cost of not having them might be way higher.
Remember, a backup with slow restore (e.g. Glacier) is still better than no backup.
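To illustrate storing backups in multiple places, here is a small Python sketch that copies one backup archive into several destination directories and verifies each copy with a SHA-256 checksum. The function names are hypothetical, and the destinations are assumed to be plain directories (a second disk, a mounted network share); uploading to an archive service like Glacier would need that service’s own tools and is not shown.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large archives don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def replicate(backup_file, destinations):
    """Copy backup_file into each destination directory and verify it.

    Returns {destination: True/False}, where True means the copy exists
    and its checksum matches the original.
    """
    src = Path(backup_file)
    expected = sha256_of(src)
    results = {}
    for dest_dir in destinations:
        target = Path(dest_dir) / src.name
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)
            results[str(dest_dir)] = sha256_of(target) == expected
        except OSError:
            # One unreachable destination should not stop the others.
            results[str(dest_dir)] = False
    return results
```

Returning a per-destination result instead of raising on the first failure means one broken location is reported without skipping the remaining copies.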
#11: Things will go wrong
At some point, things will go wrong, or won’t behave as you expected.
#12: fsck is great, but not during boot
fsck is nice, but not when it runs unexpectedly at system boot, especially because it’s not the first thing that comes to mind.
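ext2/3/4 filesystems force a check at boot when the mount count reaches the configured maximum or when the check interval since the last check has passed, and `tune2fs -l` prints both values. As a rough sketch, assuming an ext filesystem and English, ctime-style dates in the tune2fs output, this Python snippet parses that output and guesses whether the next boot will run fsck. The helper names are hypothetical, and the logic is a simplification of what e2fsck actually decides (it also checks, for example, filesystems that were not cleanly unmounted).

```python
import time


def parse_tune2fs(text):
    """Turn 'Key:   value' lines from `tune2fs -l` output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so times like 10:00:00 survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def fsck_due_on_next_boot(fields, now=None):
    """Rough guess whether a boot-time check will be forced."""
    now = time.time() if now is None else now
    mounts = int(fields.get("Mount count", "0"))
    max_mounts = int(fields.get("Maximum mount count", "-1"))
    # A maximum of -1 (or 0) disables the mount-count based check.
    if max_mounts > 0 and mounts >= max_mounts:
        return True
    # "Check interval" looks like "15552000 (6 months)"; 0 disables it.
    interval = int(fields.get("Check interval", "0").split()[0])
    last = fields.get("Last checked")
    if interval > 0 and last:
        # Assumes an English date such as "Sat Dec 28 10:00:00 2013",
        # interpreted in local time.
        parsed = time.strptime(" ".join(last.split()), "%a %b %d %H:%M:%S %Y")
        return now - time.mktime(parsed) >= interval
    return False
```

You could feed it the output of something like `subprocess.run(["tune2fs", "-l", "/dev/sda1"], capture_output=True, text=True).stdout`, where `/dev/sda1` stands in for your own root device, before a planned reboot.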
#13: Spending extra money can save you time and money
Sometimes just spend the extra 10-20% on what you’re buying; it can save you a lot of time and money.
#14: Be open and transparent
Not something I’ve really learned, just a tip: be open and transparent with your customers, and tell them what the actual issue is.
#15: Don’t do everything for free
I’ve done a lot of work for free where I should have charged for it. Still, help people from time to time; they might be able to help you some day.
#16: Admit failures, and get back up
I’ve had failures, and I will probably have more. Admit you failed, and get back up. You often learn more from failures than from anything else. We all fail; the question is who gets back up.
#17: Monitor using multiple tools
Relying on a single monitoring tool is not good; use several to make sure you spot problems when they happen.
#18: People don’t appreciate it when things work, but they complain when they don’t
This may be mean to say, but it’s true. If you deliver a good service, no one really tells you so, but as soon as you have 1 minute of downtime, people will tell the public how bad you are.
#19: Let people do what they’re good at (see #6)
There are a lot of people, all with different knowledge. Let people do what they’re good at, and do the things you’re good at. Don’t do things you suck at, EXCEPT if you enjoy doing them or want to learn them (see #8).
#20: Help your competitors, or at least don’t dislike them
Having competitors is good. You shouldn’t dislike them; rather, be open. There’s no problem in talking to your competitors from time to time, or even helping them if you can. In the end, they have knowledge as well. And they don’t bite 🙂
—
Oh, and by the way, you should really check out Paolo Iannelli’s website! He’s a good friend, a good co-worker, and a super awesome guy!