Stories from the Software Engineering Suck
Friday, January 28, 2022
Tuesday, December 21, 2021
Ruby and MS Excel Woes: How I Learned Too Much about UTF8 BOMs
Description
What could go wrong? I was trying to export an Excel spreadsheet to a CSV for an anonymous Fortune 500 quasi-government/quasi-private corporation.
| Asset ID | Software Asset Name | Other Column 1 | Other Column 2 |
|---|---|---|---|
| SAN12344 | Some App | | |
| SAN8272 | Some Other App | | |
Code
Can you spot the bug? 😜 No exceptions are thrown, but the resulting `inventory_list` is an empty array. Unit tests with some manually generated CSVs pass.
csv = CSV.open(path, headers: true, skip_blanks: true)
@inventory_list = csv
  .to_a
  .map(&:to_h)
  .select { |row| row.key?('Asset ID') }
  .map { |row| row.transform_keys { |k| k&.strip } }
  .map { |row| row.transform_values { |v| v&.strip } }
  .map { |row| row.delete_if { |k, v| k.nil? || k.empty? || v.nil? || v.empty? } }
Fix
csv = CSV.open(path, 'r:bom|utf-8', headers: true, skip_blanks: true)
Catch the change?
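Here's a minimal sketch of the difference between the two open modes, assuming a made-up two-column file. `BOM_SAMPLE`, `headers_seen`, and `compare_modes` are hypothetical names for illustration, not part of the original code.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical sample data: "\uFEFF" is the UTF-8 BOM (bytes EF BB BF),
# mimicking a file saved with Excel's "CSV UTF8" option.
BOM_SAMPLE = "\uFEFFAsset ID,Software Asset Name\nSAN12344,Some App\n"

# Returns the header names CSV sees when the file is opened with `mode`.
def headers_seen(path, mode)
  CSV.open(path, mode, headers: true) { |csv| csv.first.headers }
end

# Writes the sample to a temp dir and reads it back with both modes.
def compare_modes
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'inventory.csv')
    File.binwrite(path, BOM_SAMPLE)
    {
      plain: headers_seen(path, 'r:utf-8'),
      bom_aware: headers_seen(path, 'r:bom|utf-8')
    }
  end
end

result = compare_modes
puts result[:plain].include?('Asset ID')     # false: the BOM is glued to the first header
puts result[:bom_aware].include?('Asset ID') # true: the BOM is stripped before parsing
```

With the plain mode, `row.key?('Asset ID')` never matches, which is why the `select` in the original code quietly filtered out every row.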
Root Cause
$ xxd software_inventory_list_bom_spec.csv | head -1
00000000: efbb bf41 7373 6574 2049 442c 2053 6f66 ...Asset ID, Sof
Turns out the file has a UTF-8 BOM! I had only heard of these, and what little I knew was that a BOM is supposed to be handled transparently by the language's file-parsing library, if it supports UTF-8.
And, turns out, MS Excel (2016, to be more specific), inserts a BOM if you select the "CSV UTF8" option in "Save As"!
And even vim and Notepad++ auto-detected the BOM, so I didn't even think that my Ruby code was somehow the culprit.
Well, after more hours of debugging than I care to admit (the lag of ssh through Chrome, through Remote Desktop, is pretty high), it turns out that by default (at least on Amazon Linux 2, Ruby 2.8), the BOM is not auto-detected and is just included as part of the "Asset ID" header.
The bytes 0xEF, 0xBB, 0xBF were included in the string! And worse, irb didn't even render them as characters! It didn't render them at all, so you can't tell by printing the strings to the console that they're different. The only way to tell that the string "Asset ID" had those three extra bytes at the beginning was to do a byte count. 🤦
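The byte count looks roughly like this. A small sketch, assuming the header arrived as a UTF-8 string; `HEADER_WITH_BOM` is a made-up constant standing in for what the CSV parser handed back.

```ruby
# Hypothetical stand-in for the first header as read without BOM handling.
HEADER_WITH_BOM = "\uFEFFAsset ID"

puts HEADER_WITH_BOM == 'Asset ID' # false, even if the two print alike
puts HEADER_WITH_BOM.length        # 9 characters: the BOM counts as one
puts HEADER_WITH_BOM.bytesize      # 11 bytes: 3 for the BOM + 8 for "Asset ID"

# Dump the leading bytes as hex to see the culprit directly.
puts HEADER_WITH_BOM.bytes.first(3).map { |b| format('%02X', b) }.join(' ') # EF BB BF

# Stripping the prefix by hand recovers the expected key.
puts HEADER_WITH_BOM.delete_prefix("\uFEFF") == 'Asset ID' # true
```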
It turns out, if you ask Excel to export a CSV by using the "CSV UTF8" option, it'll insert a BOM, BUT it won't insert a BOM if you use the "CSV" option, the one that doesn't mention UTF8. What confusing UI...
If you want to insert a BOM in a file for testing (and, say, make a unit test out of it like I did 😼 ), there's a vim command for it:
vim -e -s +"set bomb|set encoding=utf-8|wq" filename
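If you'd rather not shell out to vim from a test, a Ruby-only sketch could write the fixture directly. `write_with_bom` and the fixture name below are hypothetical; the only assumption is that the BOM is the three bytes shown in the xxd output above.

```ruby
require 'tmpdir'

# The UTF-8 BOM as raw bytes.
UTF8_BOM = "\xEF\xBB\xBF".b

# Writes `content` to `path` with a UTF-8 BOM prepended (hypothetical helper).
def write_with_bom(path, content)
  File.binwrite(path, UTF8_BOM + content.b)
end

Dir.mktmpdir do |dir|
  fixture = File.join(dir, 'inventory_with_bom.csv')
  write_with_bom(fixture, "Asset ID,Software Asset Name\nSAN12344,Some App\n")
  # Show the first three bytes of the fixture as hex.
  puts File.binread(fixture, 3).unpack1('H*') # efbbbf
end
```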
Take Aways
- A CSV seems simple, like a pumpkin: nice and smooth on the outside, but the inside can get pretty gnarly.
- When in MS Excel, make sure you know whether you're exporting a CSV with or without a BOM.
- The `file` util is helpful.
- Ruby doesn't automatically handle a BOM. You have to ask it to check whether it sees a BOM, which seems like a strange default.
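To check a file for a BOM from Ruby rather than from the shell, something like the following should work. It's a sketch, and `utf8_bom?` is a hypothetical helper name, not a standard library method.

```ruby
require 'tmpdir'

# The three bytes that make up a UTF-8 BOM.
UTF8_BOM_BYTES = "\xEF\xBB\xBF".b

# True if the file at `path` starts with the UTF-8 BOM bytes.
def utf8_bom?(path)
  File.binread(path, 3) == UTF8_BOM_BYTES
end

Dir.mktmpdir do |dir|
  with_bom = File.join(dir, 'with_bom.csv')
  without_bom = File.join(dir, 'without_bom.csv')
  File.binwrite(with_bom, UTF8_BOM_BYTES + 'Asset ID'.b)
  File.binwrite(without_bom, 'Asset ID')
  puts utf8_bom?(with_bom)    # true
  puts utf8_bom?(without_bom) # false
end
```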
Extra Reading
- https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
- https://estl.tech/of-ruby-and-hidden-csv-characters-ef482c679b35
- https://stackoverflow.com/q/543225/423943
- https://theonemanitdepartment.wordpress.com/2014/12/15/the-absolute-minimum-everyone-working-with-data-absolutely-positively-must-know-about-file-types-encoding-delimiters-and-data-types-no-excuses/
Tuesday, April 25, 2017
Why I Started Lying in Tech Interviews--And You Should Too!
"Do you have experience with the Microsoft Word?"
I didn't start lying at this point, although I did giggle a little to myself quietly at the poor grammar coming from an HR drone. I politely pointed out that, with a Bachelor's in Computer Science and two years of corporate IT experience, I had written many English papers, many lab reports, and many corporate reports in Microsoft Word.
"Have you ever taught a class in Microsoft Excel?"
I still remember the exact moment that I first lied during an interview. This was it. At this point in my career, I had more than 30 credits of teaching experience in higher ed. I had taught classes in databases, Java, C++, and even algebra. I even had a Master's in Computer Science at this point.
No, technically, I had never taught a class in Excel. What did I say? What do you think I said?
"Yes--twice."
"Imagine an infinite chess board."
Oh, please! You can probably tell how dismayed I was upon hearing this question. This was supposedly from a company that specializes in interviewing (*cough*, Karat, *cough*). It was an algorithms question: you're tasked with figuring out the algebraic formula for the location of a knight, or some such nonsense.
Business value? 0.
Easy to grade? Yes.
I would argue that actually playing chess with someone and asking them to explain their thinking would give you more insight into their understanding of algorithms. But that kind of thing is time-consuming, and it's much easier to ask someone to imagine an infinite chess board.
I don't fill out the stupid tests anymore
For my last round of job searches, I took about 8 hours of online "entrance" exams, ranging from Karat to TekSystems. They ranged from basic Java questions, which are easily googleable, to useless "infinite chess board" puzzles, to obscure details of the Java 8 DateTime API.
If you don't cheat and plug the code examples into your IDE or just google for answers, you'll get a low score and the non-technical recruiter will think you're an idiot.
And how many interviews did I get from those exams? Zero. Although I did get a lot of compliments on how highly I scored.
I would have been better off writing cover letters or making more phone calls. Never again.
Misleading Job Descriptions
One time, I got handed a BlackBerry for off-hours support a few weeks into a job for a certain cable company. I asked, "What is this?" They said, "It's your turn." It wasn't in the job description, and they conveniently forgot to mention it during the interview.
You come to realize that job descriptions are often not thought through very well and/or are poorly worded (sometimes intentionally), and if that's gonna keep you from even interviewing for the job you want, well, two can play that game.
Bad interviewers
Let's be honest: most of us are bad interviewers. We've never gotten any training, so we cargo-cult it and just ask the questions we were asked. At one startup I worked for, we went through (i.e., hired and fired) two employees in the year since I started!
"What would you do if you were 6-inches tall and were thrown in a blender that's gonna turn on in 15 seconds?"
Those questions were pretty popular in the 90s and early 00's, but thankfully people have mostly stopped, although I still get the occasional question about buildings with infinite floors and such nonsense.
Conclusion
I feel like the problem has gotten worse, with many more toolkits, buzzwords, and technologies than even 10 years ago. (Do you know Kafka? You should know Kafka. And Node.js.) You can certainly get a job not knowing these things, but not the good ones.
I think the problem is deep enough, with bad interviewers asking bad questions, that the only solution is for everyone to lie. Eventually, once the amount of lies reaches a critical turning point, interviewers will need to be trained to ask better questions.
Tuesday, November 15, 2016
What's the Difference Between a Computer Programmer and Software Developer? Trick Question
You could argue there's a difference between Web Developers and Software Developers. One is primarily front end, and the other is a combination, but primarily back end. (Even this distinction is slowly going away as JavaScript enters the back end via Node.js.)
But Computer Programmers and Software Developers? I can't think of a difference.
Enter the United States Department of Labor (technically the BLS, Bureau of Labor Statistics):
| OCCUPATION | JOB SUMMARY | ENTRY-LEVEL EDUCATION | 2015 MEDIAN PAY |
|---|---|---|---|
| Computer and Information Research Scientists | Computer and information research scientists invent and design new approaches to computing technology and find innovative uses for existing technology. They study and solve complex problems in computing for business, medicine, science, and other fields. | Doctoral or professional degree | $110,620 |
| Computer Network Architects | Computer network architects design and build data communication networks, including local area networks (LANs), wide area networks (WANs), and intranets. These networks range from small connections between two offices to next-generation networking capabilities such as a cloud infrastructure that serves multiple customers. | Bachelor's degree | $100,240 |
| Computer Programmers | Computer programmers write and test code that allows computer applications and software programs to function properly. They turn the program designs created by software developers and engineers into instructions that a computer can follow. | Bachelor's degree | $79,530 |
| Computer Support Specialists | Computer support specialists provide help and advice to people and organizations using computer software or equipment. Some, called computer network support specialists, support information technology (IT) employees within their organization. Others, called computer user support specialists, assist non-IT users who are having computer problems. | See How to Become One | $51,470 |
| Computer Systems Analysts | Computer systems analysts study an organization’s current computer systems and procedures and design information systems solutions to help the organization operate more efficiently and effectively. They bring business and information technology (IT) together by understanding the needs and limitations of both. | Bachelor's degree | $85,800 |
| Database Administrators | Database administrators (DBAs) use specialized software to store and organize data, such as financial information and customer shipping records. They make sure that data are available to users and are secure from unauthorized access. | Bachelor's degree | $81,710 |
| Information Security Analysts | Information security analysts plan and carry out security measures to protect an organization’s computer networks and systems. Their responsibilities are continually expanding as the number of cyberattacks increases. | Bachelor's degree | $90,120 |
| Network and Computer Systems Administrators | Computer networks are critical parts of almost every organization. Network and computer systems administrators are responsible for the day-to-day operation of these networks. | Bachelor's degree | $77,810 |
| Software Developers | Software developers are the creative minds behind computer programs. Some develop the applications that allow people to do specific tasks on a computer or another device. Others develop the underlying systems that run the devices or that control networks. | Bachelor's degree | $100,690 |
| Web Developers | Web developers design and create websites. They are responsible for the look of the site. They are also responsible for the site’s technical aspects, such as its performance and capacity, which are measures of a website’s speed and how much traffic the site can handle. In addition, web developers may create content for the site. | Associate's degree | $64,970 |
http://www.bls.gov/ooh/computer-and-information-technology/home.htm
Boy, where do I start?
- Not only is this distinction very dubious to begin with, but they managed to come up with different median salaries! Guess who makes more money, software developers or computer programmers? Software developers make $100,690 per year, versus $79,530 per year for computer programmers. WTF?
- They predict job growth of 17% for software developers, and a decline of 8% for computer programmers!
- They don't even include a category for "software engineer", which is the most common job title nowadays.
I hope some poor college student isn't using this data to make a career choice, and no wonder people don't trust unemployment numbers.
Wednesday, July 13, 2016
BufferBloat Be Gone! (or How I Learned to Love OpenWrt)
Friday, July 31, 2015
Snippet :: HOWTO Make JAXB Generate Joda-Time Classes
The key is using a bindings XML file with a javaType element.
pom.xml
<plugin>
  <groupId>org.apache.cxf</groupId>
  <artifactId>cxf-codegen-plugin</artifactId>
  <version>3.1.1</version>
  <executions>
    <execution>
      <id>generate-sources</id>
      <phase>generate-sources</phase>
      <configuration>
        <sourceRoot>${project.build.directory}/generated-zuora</sourceRoot>
        <wsdlOptions>
          <wsdlOption>
            <wsdl>
              ${basedir}/src/main/resources/xml-schemas/zuora/zuora.a.69.0.wsdl
            </wsdl>
            <wsdlLocation>classpath:/xml-schemas/zuora/zuora.a.69.0.wsdl</wsdlLocation>
            <bindingFiles>
              <bindingFile>${basedir}/src/main/resources/xml-schemas/zuora/bindings.xml</bindingFile>
            </bindingFiles>
          </wsdlOption>
        </wsdlOptions>
        <!-- generate toString()'s too -->
        <extraarg>-xjc-Xts</extraarg>
      </configuration>
      <goals>
        <goal>wsdl2java</goal>
      </goals>
    </execution>
  </executions>
</plugin>
bindings.xml
<?xml version="1.0" encoding="UTF-8"?>
<jaxb:bindings version="2.1"
    xmlns:jaxb="http://java.sun.com/xml/ns/jaxb"
    xmlns:xjc="http://java.sun.com/xml/ns/jaxb/xjc"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- so jaxb doesn't create JAXBElement<..>'s everything http://stackoverflow.com/a/4583912/423943 -->
  <jaxb:globalBindings generateElementProperty="false">
    <!-- use JODA-Time LocalDate for parsing xs:date -->
    <jaxb:javaType name="org.joda.time.LocalDate" xmlType="xs:date"
        parseMethod="org.joda.time.LocalDate.parse"/>
    <!-- use JODA-Time DateTime for parsing xs:dateTime -->
    <jaxb:javaType name="org.joda.time.DateTime" xmlType="xs:dateTime"
        parseMethod="org.joda.time.DateTime.parse"/>
  </jaxb:globalBindings>

</jaxb:bindings>
Monday, April 13, 2015
For the love of all that is holy, don't use ERROR and SUCCESS in the same log statement