C'mon, it's just cruel to end a story that way and not link to the article that the Certified DBA wrote.
That would be hard because it in all likelihood doesn't exist.
like 80% of TDWTF, but never mind, it serves a good lesson, and it's fun to read!
And it's a shame, too. What a world it would be, if the internet was a little more faithful to real life. I'd love to meet some of those hot 19-year-old bisexual chicks.
Yup. I've had a submitted story published and it was heavily editorialized. The resulting story differed wildly from reality. Oh well, it's an entertainment site. What else would you expect?
A better story would end with Paul retrieving his testicles from the "Certified DBA"s purse and kicking some ass.
Agreed. The real WTF here is not telling the guy he was absolutely wrong and showing him the better way to do it, Certified DBA or not.
I'm a lowly database weenie in an organization where there is a "DBA Guru" at the central office. Years ago, I got into a tiff with this guy over licensing regarding dual instances of SQL Server 2k on a server that I was to be responsible for. He contended that we had to buy multiple licenses. I had just that day been researching the subject and told him that it was no longer the case that one had to buy a license for each instance on a server. You would have thought that I had disagreed with the pope.
Of course, I was right, not because I am a db god, but because I had just seen it on the Microsoft site. Despite my being correct and saving thousands of dollars, I continue to get sneers from this asshat for having corrected him. Unfortunately it has limited my ability to move up in this organization because any promotion would put me under him.
If you are going to call someone out like that, make sure you aren't shooting yourself in the foot by doing it.
I had this with a "senior architect" for a major bank I used to contract for.
You know what? Contract -- key word. It's not the way I want to work forever, but for the last ten years or so it's sure given me the best three words in response to some senior moron insisting on doing something in a certain, stupid way -- "it's your money".
Don't forget to smile on the way home.
Hourly billing really makes these sorts of things less rage inducing. First you get paid to implement their stupid design, and then you get paid again to fix everything when they realize it's stupid two months later.
OK about money at the end, but still, it's your life being poorly spent. Can't get time back, you see. (Just sayin', I did it, too...)
Sure, but you can spend your free time doing something cool with the money you earn doing stupid shit... it's the American way.
If you have any free time, that is.
I prefer the idea of billing high and doing things right the first time, then enjoy your life with all that money you just earned.
If there is such a thing as "doing things right". It is often a question of context: doing it technically right may not win political support within the organization, which will lead to even more problems than having tried the technically wrong solution that is politically right.
You're looking too much into the wording. "Doing it right" means getting the job done without mistakes so you don't have to come back and fix your mistakes.
I agree. I contract but it's still very frustrating to do stupid stuff.
No, but you can just walk away and do other stuff in your job that you actually do enjoy.
If you're getting annoyed about work, you're doing something wrong.
i'm inclined to think you've never had a job before.
I have had several jobs that got me annoyed. Now I don't. I was doing something wrong before.
yes, but isn't it nicer to get a paycheck?
That made me smile.
I've most likely had a lot more experience than you in realizing how to deal with work. :)
smileyface, yourself. making absurd claims, and then alluding to your authority as justification for your "wisdom" does not make you appear to be a credible source.
I met a very capable engineer with your attitude. He was actually the kind of guy that you would want on your team, yet you would need a thick skin to deal with him.
Last I heard he was doing great.
I agree, but perhaps annoyed isn't the right word.
You can get annoyed at work. You can get annoyed with customers, bosses, coworkers, vendors, etc. Annoyance is a small emotion that you can feel in any job, just as easily as you can be pleased.
However, you shouldn't get enraged at work. If you can be pushed to anger over your job, you're doing something wrong. You're definitely too wrapped up in your work.
I don't think the opposite is true. People can have jobs that are their passion, such as the arts or helping other people.
If you are in your job because it's your passion, chances are you'd be doing it regardless of employment status. So if you find you can't fulfill your passion at work, you'll generally end up pursuing it outside of work (volunteering, side business, etc).
Meh, the moment you start letting it affect your life or health (insofar as you're not getting fired or otherwise materially impacted, in which case anything goes), it's too much.
I don't think the opposite is true.
I would never ever in a million billion years dispute this, because people who can be enthusiastic about their work are awesome and enviable.
meh, work time is a sunk cost.
And that's why I think consulting would be fucking AWESOME
If you are going to call someone out like that, do it with their management involved and either get them fired or marginalize their input. Then you don't have to care how they respond.
I could use a couple pointers on how to do this. I usually see escalating things to upper management as a dick move.
I hear you. A lot of it was my being 1) inexperienced and 2) an employee who worked "out there" and therefore unimportant.
The good news is that I am "out there" far enough that I am left alone for the most part. Most junior dba types don't have that luxury.
I've been there. Good luck man.
Over missing a licensing detail? That's a dick move that can backfire really quickly. Just send an email next time and don't walk around the department telling everyone how dumb he is and how much money your suggestion saved the company.
And if that doesn't work....tell his mom...that will really make him pay for disagreeing.
Standard career advice -- always try to make everybody else look good. If you think he's wrong, try to find a way to make it a positive. Something like "hey, your design is really great, and what's more, I think it's even better than you realize because I have heard that Microsoft recently changed their licensing requirements so that your design will save us some money". You still get the same thing accomplished, and you make a friend instead of an enemy.
Of course, some people are just dicks to work with.
Start looking for a new place. Don't bother with the existing place at all. Anyone as petty as your DBA Guru can never be won over (at least not without 5 or more years working on it).
There are ways to correct a situation like that without anyone having to lose face. I used to be terrible at understanding this, but years of working with fragile egos and ending up in situations like yours made me think a lot about ways to get things done without people losing face. Sometimes you have to sacrifice your own ego and even a bit of credit, but everyone wins in the end if you do it carefully. You can hate workplace politics all that you want, but hating them will never make them go away. Regardless of your feelings towards that guy, it sounds like it's important that you at least get on his side. It doesn't have to mean selling out or doing stupid shit. You can play the game a bit while still being honest and true to yourself.
Maybe next time, just send him a link the next day and say "hey, good news, I was just reading and found out that apparently we might not need two licenses (unless I am reading this wrong.) Check it out." Say nothing to anyone else about how stupid he is or about how you figured out how to save the company a bunch of money. Keep it between you two. Let him take the credit. Now instead of making him look dumb over something that really isn't worth a whole lot (you read a link on a website...no one is really going to care about the numbers even if they should) you have shown to him that you can be a very good ally and a good person to have around and work with.
Good point. It's a job. Do what your boss tells you to. If someone bitches about it, point to him.
Not only that, but letting the guy write an article about how he was right so that other people who don't know will read it and learn something completely wrong.
Right. I wasn't sure if the moral here was about arrogant DBAs or how Paul's passive attitude hurt a lot of people in the long run. How many more sysadmins had to experience this because he couldn't chime up and just tell the guy he's nuts?
If he'd been 100% passive that would have been okay. He should never have fixed the problem. If you're stuck following directions in a situation like that, FOLLOW. Period. Eventually idiots will out.
Edit: I don't know who is giving the downvotes for this, but the SA should not have fixed the problem. The way he did it is the kind of thing that can get you fired - check some of the other comments in this thread. He did it on the "production" machine (assuming they had a test environment) and then he hid his work. Very very bad. He would have done himself and everyone else much better to just follow the contractor DBA's instructions to the letter, and ask his boss if he wanted him to take a crack at fixing it. If he wanted to say to his boss "I KNOW I can make this work", that would be cool, but unless it's above-board and Important People have bought in, you can't win. Let the jerks hoist themselves on their own petards.
Hey, it's not all bad. He got a photo of one of his scripts in a magazine!
Unfortunately reality is often sad and depressing. Actually trying to be helpful is often seen as something entirely different. Often the first reaction is that the guy above you thinks you're trying to undermine his position or authority. Or they simply look at you as something less than a team player.
In this case it looks like the guy involved here as a DBA is in a position to do whatever he wants with the full backing of management. Those sorts of people are a problem when they don't know what they are doing.
In this case I'd have to say that he didn't know what he was doing or was basing his setup on outdated info. There are certainly advantages to distributing a system across multiple spindles when done correctly, but in this case there doesn't appear to be a rational plan.
The higher ups know less than the janitor about the systems. They will take the certified DBA's word over you every day. You may even be fired for being too dumb.
It's not worth the fight. His solution was best. Fix the problem and hide the fix. After a week or two with everything fixed, even if the DBA found out, the fact that it runs like shit after his changes would let you override the DBA, and the higher-ups would trust that the DBA is an idiot.
I always used to correct people when they were wrong, because I thought it was better for the world (oh no, not for my ego!); but people don't appreciate it. I now think that a person can only learn when they are ready. Otherwise, it just annoys them (and frustrates you).
You know the saying "When the student is ready, the teacher will appear"? It works because we are surrounded by people and experiences that can teach us; we just can't see them til we are ready. That's when they 'appear' to us.
You can correct people without damaging their ego. It just takes a degree of verbal finesse and a bit of fairly obvious psychology (golden rule, etc.)
That's an interesting point. I hadn't thought about it in terms of the other person's ego being hurt, and this would prevent them from learning.
Can you give an example of where you did this, and the other person actually learned? (maybe from your reddit history).
Haha...I can give you examples, but probably not from my reddit history. It's a lot more difficult in this venue as all that you have are typed words. Also, my investment here is a lot smaller than my investment at work.
I'll write some examples up later. Right now, gotta get to work.
OK, thanks. BTW I've had good results in teaching environments (tutoring comp-sci and teaching guitar privately) and sometimes online (when people are interested in learning.)
I think one reason teaching can fail in a non-teaching environment is because the "teacher" doesn't understand what the "student" is doing. For example, I've been swimming for years, and one day I was deliberately slapping my hands on the water as I swam (to work off some aggression and also to be free of technique for a while), and this guy stopped me to explain that that wasn't the most efficient way to swim. I tried to dissuade him, but he angrily snapped, "Why don't you listen, you might learn something". On second thoughts, maybe it was just that that guy was a wee bit psycho.
Seriously, the guy has rock solid proof that the DBA is incompetent. Go over his head and get him fired.
honestly, the sysadmin kinda screwed the pooch in the first place by even letting a dba dictate drive layout.. but that's another story.
There's a lot of potentially important context left out of the story, but IME when a sysadmin lets someone else walk all over their domain, it's because a boss somewhere told them they had to do it. I personally don't know any real sysadmins who don't have NO loosened in their holster and ready for use.
I hear ya... I rarely get to break out the "NO" hammer.
Unfortunately, it's been a long time since we sysadmins were actually allowed to administer systems; now we're glorified operators. It's really sad--no-one in a position of power has a good overall view of the systems our businesses rely upon.
In many cases, the lower guy on the totem pole will be given a pink slip because of the office politics involved. Part of it is because he's easier to replace and part of it is because the bosses lose face for not having noticed the problems earlier. Might as well try to get positive action by oiling up and wrestling the CEO during the next board meeting.
It would have been even better if he sent an e-mail to the DBA and BCCed it to the trade magazine.
The DBA is mixing up some concepts he got from a previous job where they must have had an advanced SAN, and someone who knew how to tune it.
There is such a thing as short stroking drives, and breaking your DB up into lots of little partitions on discrete spindles is an industry practice. The problem is he doesn't understand how you actually do those things correctly on the back end.
EDIT: By "discrete spindles" , I don't mean 1 partition = 1 spindle. I mean this set of 20 spindles = partition 1, 2nd set = partition 2, etc.
So, it's Cargo Cult DB Tuning!
That would be 99% of all tuning wouldn't it?
I don't see anything wrong with that.
Can't upvote hard enough!
Quite. One place I joined had an interesting setup where the database was strewn across a half dozen partitions to enhance performance. Which on the surface sounded OK, until you realized the database server was on a VM and the entire VM resided on a single drive. This was a case of the DBA being taught about using partitions across multiple disks to enhance performance and never really grasping the underlying machine's environment.
Or maybe it was consolidated into a VM after it was designed and built? Was it a new setup?
Newish setup. I got there about 10 months after it went online. Was built in place, not migrated.
even better when the SA does a p2v in production and doesn't tell anyone.
little partitions on discrete spindles is an industry practice.
Sure.... Discrete spindles in a SAN... Storage folk will be ever so happy to see you (leave their office)
What are you talking about? Except for wide striping systems, high performance DBs are very commonly separated out into specific spindles per application within the SAN.
If I need SAN space for a heavy-hitting DB that does 10000 IOPs, I'm going to need 55 15k spindles to support that. Yes, with 72GB drives, I'd have to leave 980GB on the floor @ RAID 10 if my DB only needs 1TB. That's a fact of life in the SAN world.
I could stripe additional LUNs onto the unused space on those 55 drives, but then I lose IOPS for both applications due to head thrashing (about 10-20% penalty per additional striped LUN).
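The arithmetic in this comment can be checked with a few lines of Python. This is only a rough sketch using the figures quoted above (55 spindles, 72GB drives, RAID 10, a 1TB database, 10000 IOPS); the function names are made up for illustration, and the striping penalty is read as a simple linear 10-20% hit for one extra LUN, which is an assumption rather than anything the comment spells out.

```python
def raid10_usable_gb(drive_count: int, drive_gb: float) -> float:
    """RAID 10 mirrors everything, so usable space is half of raw."""
    return drive_count * drive_gb / 2


# Figures taken from the comment above.
drives = 55
drive_gb = 72
db_size_gb = 1000
target_iops = 10_000

usable_gb = raid10_usable_gb(drives, drive_gb)   # 1980.0
stranded_gb = usable_gb - db_size_gb             # 980.0 "left on the floor"
iops_per_spindle = target_iops / drives          # roughly 182 per 15k drive

# Assumption: one extra LUN striped onto the same drives costs 10-20%.
iops_with_extra_lun = [target_iops * (1 - p) for p in (0.10, 0.20)]

print(f"usable: {usable_gb:.0f} GB, stranded: {stranded_gb:.0f} GB")
print(f"implied IOPS per spindle: {iops_per_spindle:.0f}")
print(f"IOPS left with one extra LUN: {iops_with_extra_lun}")
```

The spindle count is driven by IOPS, not capacity, which is why nearly half the usable space ends up unused.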
Nowadays you could use SSDs, right?
Yes, that's true. In fact, SSDs support so many IOPS, vendors are running into problems where the front end controllers can't keep up. We did a POC with an HP EVA 8400, and were able to bring the controllers down with only 12 SSDs behind them.
With traditional drives, an 8400 can sail along with 240 15k FC drives behind it with no troubles.
SSDs won't really dominate in SANs for a while since all the vendors are moving from FC to SAS drives and they're so much cheaper per GB/IOP/whatever metric you like.
Can you elaborate? Because I don't see how on earth rotating drives beat SSD in $/IOps.
I phrased that badly. Yes, you can get a Fusion IO card that does 40,000 IOPs for $5k. The problem is, the capacity on it is like 100GB when formatted for heavy writes. That's the top end performing SSD, and it doesn't scale well into arrays because it's PCI-e.
The SSDs you actually see SAN vendors OEMing are Samsungs/STECs, which do more like 5,000 IOPs (in a real workload, not IOMeter scamming BS), but cost like $10k once they've had an FC interface added. (I'm sure this will come way down as SAN vendors move to SAS backplanes) There's all kinds of costs associated with SAN drives that you wouldn't see in an Intel drive you buy for direct attachment at Frys for $700.
In the SAN world, it seems to work out that the DBs that need really high IOPs also are multi-terabyte in size. So if I want to do a 2TB DB with 10,000 IOPs in 100GB SSDs, I need to spend like $200k, and I'm leaving 90k IOPs on the floor so I can cover the space needs. I can get 55 15k 300GB FC drives for more like $50k.
There's surely an argument in there about space/power/OPEX savings, but the CAPEX delta is just too high to get over.
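For anyone who wants to replay the numbers, here is a minimal sketch of the comparison, using only the rough figures quoted above (100GB SAN SSDs at about 5,000 IOPS and about $10k each, versus 55 FC drives for about $50k). `drives_needed` is a hypothetical helper, and RAID overhead is ignored on the SSD side just as it is in the comment.

```python
import math


def drives_needed(capacity_gb: float, iops: float,
                  drive_gb: float, drive_iops: float) -> int:
    """Drive count is set by whichever requirement needs more drives."""
    by_capacity = math.ceil(capacity_gb / drive_gb)
    by_iops = math.ceil(iops / drive_iops)
    return max(by_capacity, by_iops)


# Workload from the comment: a 2TB database that needs 10,000 IOPS.
need_gb, need_iops = 2000, 10_000

# SAN SSD figures from the comment (approximate).
ssd_gb, ssd_iops, ssd_price = 100, 5_000, 10_000

ssd_count = drives_needed(need_gb, need_iops, ssd_gb, ssd_iops)  # 20
ssd_cost = ssd_count * ssd_price                                  # 200000
ssd_surplus_iops = ssd_count * ssd_iops - need_iops               # 90000

fc_cost = 50_000  # 55 x 15k 300GB FC drives, total quoted in the comment
capex_delta = ssd_cost - fc_cost                                  # 150000

print(f"SSDs: {ssd_count} drives, ${ssd_cost:,}, "
      f"{ssd_surplus_iops:,} IOPS unused")
print(f"Extra CAPEX over FC: ${capex_delta:,}")
```

With SSDs the drive count is set by capacity, with spinning disks it is set by IOPS, and that mismatch is where the cost gap comes from.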
I have buddies that work for EMC / Netapp / HP, and they'll tell you very few people are buying SAN based SSDs right now. The ones that are tend to surgically target the SSDs at things like TempDB, which are small, but can really use the speed.
To comment on that article, it's very interesting at 3TB for $25k or whatever, but it's not an enterprise grade rig that any shop would put a critical database on. Maybe someone will come along and make a product out of it, but that takes serious time and effort to gain respect around reliability that is the bread and butter of the SAN world. SANs developed from mainframe DASDs, and the uptime expectations are still there.
Sure, in a 1k+ server, 500 TB+ env....
I'm not sure what your point is. A SAN consists of disks that can be sliced and diced various ways according to rules that don't change.
I guess really small shops that have an Equallogic or similar piece of crap will just have one disk pool with all the apps living inside it. Though I question whether they'd have their own Storage Guy with an office that he can throw me out of.
Different 'types' of DB files have different storage needs: anywhere between onlinelogs (redo) and infrequently accessed ('old') data.
Onlinelogs are some 100Gs, and you want them FAST. SSD fast (They'll kill your EVA6K as shown beneath)
Data can wait a bit, 10K FC will do, correctly configured RAID'n'stuff.
You'll end up with such an unmanageable mixture, that it will ABSOLUTELY not make up for the 10% gain you are about to achieve.
The storage blokes that like to see you (leave their office) know that (There's no S to know.. notice? There are MORE than One).
They'll simply tell you that your exorbitant demands can not be fulfilled, for the sake of manageability.
Why? Because they know, just like you, 10% gain is NOT worth that PITA, especially knowing that 80% gain is possible, by having a correct application. That's why.
I am a storage guy myself. I run a 600TB, 200 FC node SAN. It's not an exorbitant demand to lay out a transactional DB the correct way, with transaction logs broken apart from tempdb, then the data laid out yet a third way. It's industry standard. And you can absolutely get more than a 10% performance gain by doing this.
I've taken DBs that were on their knees and cut their job time into a fourth of what it used to be by laying out partitions correctly, doing correct block alignment, etc. Usually the DBAs have little concept of IOPS/latency/queuing.
I won't argue that the apps are usually written poorly, and there are far more gains to be found there. For whatever reason, that seems to be organizationally more difficult to get done than to throw hardware at the problem (in the large orgs I've been in at least).
Throwing hardware seems the industry standard, but throwing software (out of the window, that is) is what should be done.
Fine we agree at last :-)
Usually the DBAs have little concept of IOPS/latency/queuing.
Do not be offended, but generally they do seem to know a wee bit of that. More than the app guys at least (then again: do not be offended!) Thing is: Once the matter is in their (-->DBA's) hands, it's too late to be saved, if you know what I mean.
tempdb.... Either sybase (ASE) or sql-server... I thought we were discussing DATABASES here?
Do not be offended, but generally they do seem to know a wee bit of that
The good ones do. The Oracle guys usually. Like you point out, about 60% of the DBAs I have to deal with are MSSQL specialists. I'm just really tired of handing them a list of questions I need answered to build their shiny new setup, and I get back "What's a LUN? What's an IOP?".
Look at the original article we are discussing. I've encountered a lot of DBAs just like that who sort of have a half-baked knowledge of the storage side. It's based on what they've seen at previous gigs, but they don't really know the underlying reason it was done. So you wind up with bastardized implementations.
Rule #1: No pooftas
Rule #2: Do not talk like 'you know shit' about stuff you don't know shit about.
Rule #3: No pooftas
Rule #4: hardware can do good, software can do bad.
Rule #5: No pooftas
Rule #6: ...
or you could do:
alter system set fast=true;
You're the guy from the story, right?
Please feel free to point out any inaccuracy in what I've stated.
It was a bit sarcastic but it was supposed to be humorous. I'm sorry, I apparently forgot that fiddling with your DB just drained all your humor. Maybe you're just grown up. I shouldn't have tried to be funny. I see that now. No one makes fun of your LUNs and just walks away. This is serious stuff. Yeah ... whatever ... have a happy life
If I need SAN space for a heavy-hitting DB that does 10000 IOPs, I'm going to need 55 15k spindles to support that. Yes, with 72GB drives, I'd have to leave 980GB on the floor @ RAID 10 if my DB only needs 1TB. That's a fact of life in the SAN world.
For a database that size, wouldn't the entire thing be in system memory anyway?
There aren't a lot of x86/x64 servers out there that hold a TB of RAM. I can't think of any offhand.
The most RAM I've ever personally seen in a non ia64 server was 512GB.
If it needs to be backed on permanent storage and the database actually does a fsync for every COMMIT then the disk performance is still a problem for writes.
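To make the fsync point concrete, here is a minimal sketch of what "fsync on every COMMIT" means at the file level. `durable_append` is a hypothetical helper written for this illustration; a real database uses a write-ahead log and usually batches commits, so treat this only as the general shape of the problem.

```python
import os
import tempfile


def durable_append(path: str, record: bytes) -> None:
    """Append a record and wait until the OS reports it has been flushed."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # push Python's buffer down to the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache to the device


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "commit.log")
    for i in range(3):
        # Each "commit" pays for a full trip to the disk, however much
        # RAM the server has for caching reads.
        durable_append(log_path, f"COMMIT {i}\n".encode())
    with open(log_path, "rb") as f:
        lines = f.read().splitlines()

print(lines)
```

Reads can be served from memory, but every durable write still waits on the storage device, so write latency stays tied to disk performance.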
Servers with >1 Terabyte of RAM?
Not quite yet...
I worked with such people before. Certifiable for sure.
I don't understand, I thought partitions allowed load distribution and thus reduce latency. The first scenario of 10 partitions on each disk (10x3 disks and 10x3 for mirror) should have worked well if the distribution of data (hash) is properly set. The create table statements at my workplace use this. A single partition per disk would help faster reads. Why didn't this work? On the contrary single disk with all partitions would be a bottleneck once inserts and updates keep happening.
P.S: I am not a DBA, just a lowly developer
There is one immutable fact about HDD's on a spindle, there is a finite amount of IO, seeking is the devil. So if you put stuff all over the hard drive and do NOT allow the OS to optimize the disk layout then seeking will be through the ROOF.
The problem here is the design was almost maximized for increasing seek time for the simplest of DB operations. The first setup apparently had 60 partitions (an exaggeration, I'm sure) on one RAID set. The second time they used faster HDD's and still had multiple partitions per disk set.
Except that OS doesn't optimize disk layout.
I think he meant file layout, ie. defragmenting.
What particular OS do you mean?
As far as I know, standard defragmenter on Windows NT 5 (2k, XP) doesn't really optimize anything.
And on Linux people rarely use defrag at all.
A defrag tool is unnecessary on most modern filesystems. NTFS isn't really a modern filesystem.
Not defragmenting. Fragmentation avoidance. Do not become fragmented in the first place. A large part of modern filesystems is doing this.
And how is this related to optimization?
The OS already does a good job (or attempts to) in keeping things defragmented to avoid seeks. By manually partitioning and striping back and forth you are actively working against the fragmentation avoidance built into the OS, thus the actual layout of the data on disks gets more fragmented.
More fragmented means more disk seeks, as anyone optimizing disk performance will tell you.
Now that doesn't mean that you should always just use one big volume, but it does mean that you must know the access pattern of your data being accessed, as well as know exactly how an application read goes through the filesystem and RAID layers, getting its sector translated along the way, in order to minimize disk seek (and other factors).
The OS already does a good job (or attempts to) in keeping things defragmented to avoid seeks.
By manually partitioning and striping back and forth you are actively working against the fragmentation avoidance built into the OS, thus the actual layout of the data on disks gets more fragmented.
Only if you stripe partitions from the same drive, but that is an absolutely idiotic thing to do.
If you just make partitions and stripe partitions on different drives, result will be AT LEAST as good as on single drive for sequential read of one file.
And if by optimization you mean not fucking things up, that's pretty low standards. You can do more with specialized defrag software -- e.g. keep files from same dir together, keep frequently used files together etc. But I haven't yet seen OS which does it automatically, this requires installing 3rd party defrag tool and manually choosing mode of operation.
Now that doesn't mean that you should always just use one big volume, but it does mean that you must know the access pattern of your data being accessed,
Yes, but OS does not know these patterns, does not measure these patterns and does not optimize according to them. That's what I'm trying to say.
No. Really. There was nothing to debate about that sentence.
Only if you stripe partitions from the same drive
That is completely irrelevant however. (besides the fact that it assumes that they aren't on the same bus, etc...)
More relevant is a read of a file: 10ms seek plus some negligible time reading. If you cross a stripe boundary you'll cause a seek on a second disk. You just caused two seeks instead of just one. You have X drives that can do 100 seeks/sec each.
If you stripe you can cause one read to consume two or more seeks in this system.
In other words instead of "locking" one disk for 10ms plus 2ms you locked two disks for 11ms each. Sure, that one read was probably faster (well, if it was a large file. But then "2ms" is off), but you have a thousand others and you just used twice the resources to speed up one request by 1ms. Your system can now handle less.
But yes, other factors come in that you mentioned. But just going raid (raid0 in this example) will NOT magically make your system able to handle more concurrent requests.
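The seek-budget argument above can be put into numbers. This is a rough sketch under the comment's own figures (100 seeks/sec per drive, 10ms seek, 2ms transfer); the drive count of 4 is an assumption, since the comment leaves it as X, and the function name is made up for illustration.

```python
def max_reads_per_sec(drives: int, seeks_per_drive: float,
                      seeks_per_read: int) -> float:
    """Upper bound on concurrent small reads, given a fixed seek budget."""
    return drives * seeks_per_drive / seeks_per_read


drives = 4              # assumption: the comment only says "X drives"
seeks_per_drive = 100   # from the comment

# A read that stays on one disk costs one seek. A read that crosses a
# stripe boundary costs one seek on each of two disks.
unstriped_rate = max_reads_per_sec(drives, seeks_per_drive, 1)  # 400.0
striped_rate = max_reads_per_sec(drives, seeks_per_drive, 2)    # 200.0

# Per-read view, in milliseconds (10ms seek, 2ms transfer in total).
single_latency = 10 + 2            # 12 ms on one disk
striped_latency = 10 + 1           # 11 ms, transfer split across two disks
single_disk_time = single_latency          # 12 ms of disk time consumed
striped_disk_time = 2 * striped_latency    # 22 ms of disk time consumed

print(f"reads/sec: {unstriped_rate:.0f} unstriped vs {striped_rate:.0f} striped")
print(f"latency: {single_latency} ms vs {striped_latency} ms")
print(f"disk time per read: {single_disk_time} ms vs {striped_disk_time} ms")
```

The one striped read finishes about 1ms sooner, but it burns nearly twice the disk time, so the array as a whole serves fewer concurrent requests.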
Yes, but OS does not know these patterns,
It does optimize for some patterns ("keep files from same dir together" comes to mind). But yes, if you have other patterns then you may need to change it. But you need to know what the OS does too.
I'm really not sure what you mean by that. Many file systems do in fact optimize the disk layout. On grown up systems anyways.
Mechanical HDD's consist of platters and possibly multiple read heads, so I would imagine if one were to try to partition for load distribution, you'd need to consider the actual physical architecture of the drive? And then considering this thing would be in RAID - where the partition could be fanned across multiple disks (and physical platters), ... gives me a headache just thinking about this.
Usually it looks like this, so you don't need a lot of details -- just put frequently used stuff to the beginning. I.e. advice to use only first 25% of the disk was quite sane.
Except for the fact that the dropoff is almost completely negligible before 50%. He's throwing out half the baby with the bathwater.
I once met a DBA that had never heard of stored procedures. True story.
I worked with a certified DBA who couldn't even construct a simple SQL Select statement. True story.
Even truer story. He was later fired for incompetence. The guy couldn't even punch down ethernet cables.
After seeing some patch jobs on Wacom products I don't know if that particular analogy holds as much water as you would expect.
Why would anyone ask a DBA to punch down ethernet cables?
We're a small IT consultancy and he wasn't hired to explicitly be a DBA, but he was certified for it. He was hired to do contractual networking and such (don't ask me for the details too much, I work with the contractual software development side of things).
I guess that was the mistake. A good DBA pretty much only does DBA work. He probably can program well enough to make trouble, and probably knows a good deal of sysadmin stuff, but I have never met a DBA/Network engineer.
Also "Certification" is highly overrated.
He also had certifications in other areas and had previously worked doing networking for another consultancy.
He didn't program at all, and knew almost no sysadmin stuff.
Like I said, the guy was pretty much an incompetent who had an impressive sounding resume.
The guy couldn't even punch down ethernet cables.
Hey, I only did that for the first time a couple months back, been in IT for 15+ years!
Yeah, but I'm sure you were able to grasp the concept and perform the task with a minimum of instruction.
We had shown this guy several times before he went out to the client's site and then bitched about never having done it or having been shown how to do it once he got there.
One thing to be incompetent, even worse to act that way in front of a client
Silly question I'm sure: what does it mean to "punch down" an ethernet cable? Something to do with crimping the wires into a jack?
Who certified him? You wouldn't even be able to get Oracle Master Bronze without that kind of knowledge.
It was a SQL Server 2000 DBA certification (IIRC).
I once met a certified DBA that thought cucumbers taste better pickled.
That goes against all the certification stands for. Appalling.
The sysadmin was stupid for not showing the DBA his configuration worked better. Now the DBA moves on to the next job, increasingly confident that he is right. Even if the next sysadmin seemingly proves him wrong, what is the DBA to think? That the last sysadmin created an elaborate hoax to make him think he was right? What are the odds?
This sysadmin should be fired for making the world worse, not better. At least the DBA has ignorance as an excuse.
You've apparently never worked IT in a structured corporate environment like the one described in that article. I can tell you exactly how that would go:
a) Paul demonstrates new environment to the DBA. DBA goes nuts, runs to Paul's boss and tells Paul's boss (who undoubtedly thinks the DBA walks on water) that Paul has been unilaterally making unauthorized modifications to production systems and Paul gets fired for it.
b) Paul demonstrates new environment to his boss. 50/50 chance he gets fired for unilaterally making modifications to production systems. If he doesn't get fired, still has a black mark on his record for messing around with a production system without permission, and the DBA will hate him and actively campaign to get rid of him.
What Paul SHOULD have done was build a parallel system and then run benchmarks on both, and then go to his boss and show his boss the results. This way he can't be in trouble for messing with a production system, and by going to his boss first, he's enlisting an ally ahead of time instead of trying to get one after the shit hits the fan.
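The "run benchmarks on both" step could look something like the sketch below. It is only an illustration, not what anyone in the story ran: the function names, the 16 MB test size, and the idea of timing a sequential write plus fsync are all assumptions, and a real comparison would replay the actual database workload rather than a synthetic write.

```python
import os
import tempfile
import time


def time_sequential_write(directory, total_mb=16, block_kb=64):
    """Return the seconds taken to write and fsync total_mb of data in directory."""
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the data to disk, not just the OS cache
        return time.perf_counter() - start
    finally:
        os.remove(path)


def compare(layouts, runs=3):
    """Map each layout name to its best (lowest) time over several runs."""
    return {
        name: min(time_sequential_write(directory) for _ in range(runs))
        for name, directory in layouts.items()
    }
```

Point `compare` at a directory on each candidate layout (the mount points are whatever your two test systems use) and take the numbers to the boss instead of an opinion.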
Paul should have a boss that is technical as well and not just a suit. This person can listen to Paul and tell the DBA how retarded the idea is. This is how it would go at my work. I agree with the parallel build though.
Also, people should be nice to each other, and not make mistakes.
I want a pony!
In an ideal world perhaps. And scientists and others who advance civilization should also get paid more than someone who can throw a ball through a hoop from 30' better than other people who can throw a ball through a hoop at 30', but that's not our reality.
In most corporate environments the guy ultimately in charge of IT seldom has an IT background. To an extent, this is changing, but slowly.
But that's my point: that business is set up to fail at IT. That business and many others need to change.
Many businesses see IT as just one piece, like shipping/receiving or sales or marketing. In some places they even see it as a utility like the phone system or the power/lights. Places like this don't care about changing until they are forced to, and even then the only "change" they make is to outsource it all to someone they can sue if there's a screwup.
I get the feeling we actually agree but are looking at two different sides. I'm saying how I think things should be while you are saying how most things are. I can't actually see anything in your post that's wrong or disagreeable, but the thought of working for such a company as you describe makes me sad.
Then you're very lucky to be working where you are, so definitely enjoy it.
I think what surprises a lot of people until they actually think about it is that there are tons of IT jobs in non-IT companies. I've worked in environments that run the whole gamut from a full tech shop where the president used to be a sysadmin and had an intricate understanding of any issue that came up, to taking care of a data-retrieval system for a bunch of rednecks in oilfield trucks who don't like computers and couldn't give a crap about WHY it doesn't work, they just want it fixed NOW! Unfortunately a considerable number of the IT jobs fall into the latter category. It's just something you have to learn to deal with unless you're fortunate enough to spend your whole career at tech companies.
The technical has to stop somewhere along the chain of command.
Not at a tech company.
Yes, but Paul should have a manager or supervisor who is technical who can lobby for Paul's clearly better idea to the person who sits above both the sysadmins and the dbas. I'm just relating how things are at my company and I can't imagine a thing like this story happening. Too many other admins, engineers, and architects would weigh in and talk to management.
Count yourself very lucky.
Your ideas intrigue me, do you have a newsletter that I may subscribe to?
Yes, I have worked in such an environment, so I know what I'm talking about.
From my reading of the article, Paul did set up a parallel system, at least as far as the disks go. He didn't duplicate the database, no, but he can easily migrate the system back to the other disks.
Paul has no fear of being fired for doing this. The Boss doesn't care about the details; he only cares about results. The system runs better; Paul will be fine, though the DBA will get credit. He could go to the Boss ahead of time; that can be both good and bad, depending on the Boss.
The Boss doesn't care about the details; he only cares about results.
Google does not help me with what 'YBMV' means. Please help?
You Broke My Volkswagon
Sounds like YMMV, with Boss in there..
That makes sense. :)
Your Boss May Vary.
Bosses in the wild can and often do diverge from what is expected or even what makes sense. To claim Paul has no reason to be concerned about being fired is a theoretical position that may not hold up in the real world. Appeals based on having read otherwise on Reddit may not help.
Say it isn't so!
Your bileage may vary.
My guess would be your boss may vary.
Regarding point 2, apparently you didn't read the article. If Paul had set up a parallel system, why did he modify 'df' to report false data in a panic to avoid discovery on the production system? Also, from the article:
"When the system was brought back online – using a single drive and a single partition"
Sounds like the production system to me. I can sympathize with Paul's perspective, but if anyone who worked for me did that to prove a point, no matter how right they were, I'd can their ass like tuna on the spot for using a production box for a test.
Again, I said "as far as the disks go". Yes, he brought back the production system under the new disks. But did so with "a pair of the 10,000 RPM drive". The point is the old hardware is still available for reversion to the DBA system should the need arise.
The DBA who set up the system was already "using a production box for a test", as evidenced from the article. So this is not a company where such things are tested in advance.
You made me chuckle. Yes, technically everything can be put back. However... many a slip between cup and lip. If something goes wrong with the "put it back together" implementation then the result for you could be much worse. Not only did you futz with production unilaterally, your backup/recovery plan failed... might as well kiss your prospects at that company goodbye (political shit-storms stick to you for a long long time).
Two further points:
Yes the old disks exist, and they have OLD data on them. At the bare minimum, a backup/restore operation will be needed after hours, and I can tell you from experience that moving a backup from a monolithic partition into this ten partition hellhole won't be straightforward. This won't happen transparently. As well, just as a "what if", what happens if something bad happened to those drives that were supposed to be safe in a server? I've seen people in a server room take a drive out and immediately drop it on the floor from 6 feet up. It can happen. Yet another reason not to screw with a production box.
Second: "The DBA who set up the system was already "using a production box for a test", as evidenced from the article."
No. They set up a new system that was put into production. You'll notice key phrases like "Per his request, the server Paul was to set up" and "The application launched a couple months later".
And that's why all companies do well.
Then the Software Engineering Lead goes to the DBA...
"We're going with Hadoop."
As satire, using stereotypes, it got a chuckle.
As anything applicable in reality, no way. Work with databases for fifteen years, and you end up acting very, very much like The Certified DBA.
It gets real old having an SA or a vendor consultant redo your physical layout design because they just read something in WindowsITPro magazine or MySQLMonthly. After a while, you just sigh and point at the framed certification.
My fear is that this article will lead a lot of SA's to jump right in and start moving partitions around. After all, they read it on reddit...
On the other hand- most of the deeply rewarding successes I've had have been team efforts, with development and infrastructure working together.
The whole idea of the 'Do this, I said so,' model of IT management is pretty passe.
Pre-production proof-of-concepts and metrics don't show political favoritism, and best practices float right up. Performance problems should be identified with robust load testing weeks or months before release- and a successful IT group are partners at that level.
As anything applicable in reality, no way. Work with databases for fifteen years, and you end up acting very, very much like The Certified DBA.
Excuse my interjection, but that's precisely the problem with a lot of DBAs. I've never met a DBA that wasn't 10-20 years my senior. Granted, while that leads to a lot of experience, it also leads to a lot of ancient knowledge in IT terms. Things that used to matter don't matter so much anymore. Things that matter now they can't seem to grasp as readily.
I had one guy at a previous job send me a very detailed storage diagram (much like the guy in the linked WTF). However, the assumptions were based on infrastructure and SAN architecture from 10 years ago. When I started to explain to him that everything was VMs now and you really don't buy a whole lot when you place partitions on certain areas of the disk (since the VMs share LUNs and the LUNs were distributed over several areas of the SAN and they all shared the same controllers) he went out of his way to demand a special "old school" hardware setup just for his project. When I pressed him, he couldn't quantify how much dedicated hardware would improve performance, or even if it would improve performance at all. Meanwhile, the projected costs ballooned 3x for an undetermined "performance increase". He got shot down by management because he couldn't make a coherent business case for his design.
On the other hand- most of the deeply rewarding successes I've had have been team efforts, with development and infrastructure working together. The whole idea of the 'Do this, I said so,' model of IT management is pretty passe.
It's good that you feel that way. A lot of DBAs and devs I've worked with can't seem to grasp the idea. A fair number come to the SAs with a seemingly universal "do this, I've been doing it like this for 15 years" mentality. The problem is that a lot changes in 15 years.
As a DBA I have to agree: you need to know exactly what the crap is going on at all levels before you can tune, and that includes knowing what the application is doing. The amount of hardware tuning is a bit silly unless you are the Google search engine app.
I'm a dba too, and I just tell the 'sa' that I need XX gig on raid 10 and XX gig on raid 5, and it's going to be high volume or backup or archive and I get disk that fits what's needed.
Experience... that's the only thing that counts. And curiosity, and the ability to say "Sorry, my fault, didn't know that, I don't know" and such.
After a while, you just sigh and point at the framed certification.
That's the problem. You should be pointing at benchmarks.
Read down a bit in OP. It was a humorous exaggeration. In reality, I base my practice on metrics.
The real WTF is that TCDBA gets to publish an article perpetuating the bullshit.
He out DBAed the DBA
I wouldn't blame the DBA for thinking he's right. I'm pretty sure we've all experienced something similar. I'll freely admit that I work mostly with databases and I have been wrong. I also wouldn't fault poor Paul. I know for a fact that even though databases come naturally to me they seem to really confuse a whole lot of people and if you're going up against a specialist who's an angry retard (many of us are) you want to be 110% sure. Especially if you're new to the industry.
The fault lies with their manager who should have realized what I just wrote up and stepped in and MANAGED the situation. Apparently the manager was not up to specifications.
Managers who manage are fairly rare. And managers who manage properly are even more rare.
"Millionaire models are rare enough; but, by Jove, model millionaires are rarer still!"
— Oscar Wilde
Sorry, your post just jumped out at me as very Wildeish
Do you wish the postings on Daily WTF started with a short summary?
like, the punch line?
Isn't that why Oracle invented ASM?
The head of the database department for a big company once told me that there are two things they never use. ASM and RAC. They are almost never worth it.
Considering that I've yet to see a stable setup of them, I would think there's some truth in that.
That must be an old story since both ASM and RAC are now Oracle best practices.
I am a DBA, it's not like I am some guy who knew a DBA once. In practice ASM and RAC are something I don't like working without. Also I am not sure what kind of clustering would be preferable to RAC.
2 years ago.
But to be fair, most of the systems crashing and burning that I've seen were Ericsson systems (as in sold by, not running at). And everything they do sucks.
On that subject, interesting reading for DB in RAID,
"Disk Partition Alignment for SQL Server "
I agree with the sentiment about DBAs. However, the solution trades poor disk management for a single point of failure.
I agree with the sentiment about DBAs. However, the solution trades poor disk management for a single point of failure.
Nope. The original design was the first set (disks 1, 3, 5) mirrored with the second set (2,4,6). This is Raid 0+1 - and it doesn't qualify for the "Raid 0" bit since the drives aren't striped.
A better design is pairing disks 1 and 2, 3 and 4, and 5 and 6, and placing these pairs into an array. This is known as Raid 1+0, and has better resilience if a second drive in the set fails. See http://www.aput.net/~jheiss/raid10/ for more information on the math.
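For anyone who wants to check the math without following the link, here is a rough sketch that enumerates every two-disk failure for the two six-disk layouts described above. The survival rules are my reading of the layouts (0+1 survives while one whole set is intact, 1+0 survives while every pair has a working disk), and the function names are made up for the example.

```python
from itertools import combinations

DISKS = [1, 2, 3, 4, 5, 6]


def raid01_survives(failed):
    """Sets (1, 3, 5) and (2, 4, 6) mirrored: at least one set must be fully intact."""
    sets = [{1, 3, 5}, {2, 4, 6}]
    return any(not (s & failed) for s in sets)


def raid10_survives(failed):
    """Pairs (1, 2), (3, 4), (5, 6): every pair must keep at least one working disk."""
    pairs = [{1, 2}, {3, 4}, {5, 6}]
    return all(p - failed for p in pairs)


def survival_rate(survives, n_failures=2):
    """Fraction of all n_failures-disk combinations that the layout survives."""
    cases = [set(c) for c in combinations(DISKS, n_failures)]
    return sum(survives(c) for c in cases) / len(cases)


print(survival_rate(raid01_survives))  # 0.4
print(survival_rate(raid10_survives))  # 0.8
```

Out of the 15 possible two-disk failures, the 0+1 layout survives 6 and the 1+0 layout survives 12, assuming both failures are equally likely to hit any disk.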
Of course, the hard drive layout wasn't the issue. A properly done raid will not impact performance.
The bad design is splitting the database across partitions. Depending on how the database works, it requires a minimum seek time when you jump across disks. The DBA was strictly focusing on throwing more hardware at it, which won't help if there are major bottlenecks (and his major recommendation, 10000 RPM instead of 7200 RPM, is moot since you don't measure performance using that metric.)
Thanks for the info, though I'm not sure which part you're disagreeing with. Perhaps I misunderstood - I read the first arrangement as mirroring the concatenated volumes. If a drive fails, that volume would be lost, but the data would be mirrored in the second volume.
By single point of failure, I was referring to the solution using only one disk and therefore being susceptible to interruption if the disk fails.
The sysadmin's solution used two mirrored disks, which is the minimum required for critical systems.
This reminds me of something I used to work on
This article was TechKnowledge Document ID: 33368 The information in this article applies to: Navision 3.01, Navision 3.10, Navision 3.60A, Navision 3.70
Navision Server Hardware Recommendations
Your hardware has a big effect on the performance you can expect and determining the hardware requirements is not an exact science. Performance depends on many different factors including number of users, activity of users, number of transactions, volume of data, when and how certain tasks are performed, and how the program is written.
Follow the hints below to help you make decisions on the hardware for the Navision Financials server. The recommendations are very different for the native (C/SIDE) Navision server compared to the SQL Server Option. This information supplements the Installation and System Setup manual.
A. The performance of the server depends on the following resources in this order:
Note See TechKnowledge 33352 for an IBM Performance Report detailing the same findings.
B. HDD subsystem recommendations:
Use SCSI/RAID controller with as many SCSI channels as possible.
If the disk controller has memory (caching), make sure that there is a battery on the controller.
Use RAID1 (disk mirroring), if you require extra resilience.
NEVER use RAID5.
NEVER use software RAID; you must use hardware RAID.
Turn off write back cache on the controller. Use all controller memory for read cache.
Use 4GB SCSI disks for building your storage space. See paragraph (C) for details.
C. Disk organization:
Disk0 - System disk 4GB partition, Windows, programs, binaries, utilities, Navision client
Use NTFS file system format. NEVER put the Windows pagefile on the database disk(s).
Disk1 - Database(s) disk 1 2 x 2GB partitions.
Disk2 - Database(s) disk 2 2 x 2GB partitions.
DiskN - Database(s) disk N 2 x 2GB partitions.
On the database disks (Disk1 to DiskN), the first partition is used for a "live" database part. The second partition is not used or can be used for "backup" database part or for test system or any other non-busy usage.
If you want to have a system that can store up to a 6GB Navision database, you will want 4 x 4GB disks (or 8 x 4GB disks if RAID1 is used).
See TechKnowledge 33361 for an explanation on how to use the "unattended backup" database partition.
Database files must be the SAME SIZE on all disks. For example, if a 2.1 GB database is placed over 3 disks, use 3 * 700MB parts. If the same database is expanded to 2.4 GB, expand 100MB per partition, making it 3 * 800MB parts.
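The equal-size rule above is plain division. A minimal sketch, assuming sizes in decimal megabytes as in the example; the function name is invented for illustration and is not part of any Navision tool:

```python
def part_size_mb(database_mb, disks):
    """Size of each database file part when the database is split evenly over disks."""
    if disks <= 0:
        raise ValueError("need at least one database disk")
    if database_mb % disks:
        raise ValueError("database size does not divide evenly across the disks")
    return database_mb // disks


before = part_size_mb(2100, 3)  # 2.1 GB over 3 disks
after = part_size_mb(2400, 3)   # expanded to 2.4 GB
print(before, after, after - before)  # 700 800 100
```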
If you change the number of disks (database parts) you MUST do the following:
Make a backup.
Delete the database.
Create a new database with the same database file parts sizes.
Restore the backup.
D. Allocate all available memory to the Navision Server cache. Use commitcache to speed up insert transactions.
The installation program allocates approximately 2/3 of physical memory to the server cache. You must change the server parameter CACHE.
The installation program does not activate the commitcache. You must change the server parameter COMMITCACHE.
If you activate commitcache, make sure that you use UPS to back up power failures (you may lose transactions from commitcache that have not been flushed to the disks).
Memory is a way to decrease the harddisks' bottleneck.
Use as much RAM as possible. Generally, use at least 4 - 8 MB of memory per user for cache. Plan for approximately 200MB cache for a 30 user system (256MB system RAM at least) or more, because memory is rather inexpensive.
The maximum Navision Server cache is 1GB. Therefore, there is no advantage to purchasing more than 2048MB of RAM, leaving 1GB for Windows and 1GB for Navision.
MAKE SURE that the computer is not swapping, for example, after you increase the cache size.
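The memory guidance in section D can be sketched as a small calculation. The 4 to 8 MB per user, the 1GB cache ceiling, and the 1GB left for Windows come from the text above; the function names and the idea of capping the per-user range at the ceiling are assumptions made for this example.

```python
MAX_CACHE_MB = 1024        # maximum Navision Server cache per the article
WINDOWS_RESERVE_MB = 1024  # RAM the article leaves for Windows at the top end


def cache_range_mb(users):
    """Suggested cache range at 4 to 8 MB per user, capped at the server maximum."""
    low = min(users * 4, MAX_CACHE_MB)
    high = min(users * 8, MAX_CACHE_MB)
    return low, high


def max_useful_ram_mb():
    """RAM beyond this brings no benefit, since the cache cannot grow any further."""
    return MAX_CACHE_MB + WINDOWS_RESERVE_MB


print(cache_range_mb(30))   # (120, 240)
print(max_useful_ram_mb())  # 2048
```

The article's figure of roughly 200MB of cache for a 30 user system sits inside the 120 to 240 MB range this produces.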
E. Use a DEDICATED Navision server that is a stand-alone server (not PDC or BDC). If you have a non-dedicated Navision server computer, make sure that the programs are not competing for resources. NEVER run SQL server or Exchange server on the same computer with Navision server.
F. Use a single processor computer. Allow Windows NT to use processor cache fully.
G. See TechKnowledge 33353 for information on low bandwidth constraints.
Note Allowing low bandwidth connections for some users can impose the risk of very bad performance, but if you must do this, do NOT allow those users to modify/insert records. If a low bandwidth client processes data, tables may be locked for a longer period of time, locking every other user and slowing down the whole system.
you either don't work in the industry, or you're new here. I have met and worked with people this dumb.
I've worked for people this dumb.
I've hired people this dumb.
And dumber ones as well...