{"id":670,"date":"2011-03-24T20:56:00","date_gmt":"2011-03-24T18:56:00","guid":{"rendered":"http:\/\/lukav.com\/wordpress\/?p=670"},"modified":"2025-05-24T16:27:20","modified_gmt":"2025-05-24T14:27:20","slug":"a-little-awk-script-to-change-encoding-in-parts-of-file","status":"publish","type":"post","link":"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file","title":{"rendered":"A little awk script to change encoding in parts of file."},"content":{"rendered":"<p>We&#8217;re in the process of migrating from our ancient CVS to the more modern Git. However, I stumbled on the following problem. We write our commit comments in Bulgarian with windows-1251 encoding. Git uses UTF-8, although I&#8217;m not sure whether it does this natively or the client determines the comments&#8217; encoding. So I had to convert all the commit comments from cp1251 to UTF-8. I couldn&#8217;t just convert the whole file, because some of the files had already changed encoding along the way and I wanted to keep their history and current encoding intact.<\/p>\n<p>One way was to use the &#8220;<em>cvs admin -m rev:comment<\/em>&#8221; command, which changes the comment for a given revision in CVS. But that would mean writing a script that goes over each file, gets the whole log, tries to figure out each revision and its comment, and then calls the admin command. Furthermore, it would have to work with multi-line comments. Although this is possible, it seemed like too much trouble, with many places where the comments could break.<\/p>\n<p>So I looked at the idea of modifying the RCS files directly. I needed a tool that finds the parts of the RCS file (that is, the ,v file) between the lines containing only &#8220;log&#8221; and &#8220;text&#8221; and changes the encoding only for those parts. It doesn&#8217;t seem complicated, but when I tried my favorite &#8216;sed&#8217;, it couldn&#8217;t call the external &#8216;iconv&#8217; for just parts of the file. 
So I needed an alternative.<\/p>\n<p>After some googling, it turned out awk was the tool for the job. Its system() function can execute an external program for selected lines.<\/p>\n<p>So here it is: an awk script that looks for \/^log$\/ and then runs iconv on each line until it finds \/^text$\/.<\/p>\n<pre>#!\/usr\/bin\/awk -f\r\n# Re-encode only the commit-log sections of an RCS \",v\" file.\r\n\r\n# A line containing only \"log\" starts a commit comment.\r\n\/^log$\/ {\r\n    flag = 1\r\n    print\r\n    next\r\n}\r\n\r\n# A line containing only \"text\" ends it.\r\n\/^text$\/ {\r\n    flag = 0\r\n    print\r\n    next\r\n}\r\n\r\n# Inside a comment: pass the line through iconv.\r\nflag == 1 {\r\n    line = $0\r\n    gsub(\/'\/, \"'\\\\''\", line)   # escape single quotes for the shell\r\n    fflush()                    # keep awk's buffered output ahead of iconv's\r\n    system(\"printf '%s\\\\n' '\" line \"' | iconv -f cp1251 -t UTF-8\")\r\n    next\r\n}\r\n\r\n# Everything else is copied unchanged.\r\n{ print }<\/pre>\n<p>Of course, the script can easily be adapted for other tasks.<br \/>\nEnjoy it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We&#8217;re in the process of migrating from our ancient CVS to the more modern Git. However, I stumbled on the following problem. We write our commit comments in Bulgarian with windows-1251 encoding. Git uses UTF-8, although I&#8217;m not sure whether it does this natively or the client determines the comments&#8217; encoding. 
So I had to [&hellip;]<\/p>\n","protected":false},"author":10,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[70,40],"tags":[],"class_list":["post-670","post","type-post","status-publish","format-standard","hentry","category-en","category-tech"],"aioseo_notices":[],"aioseo_head":"\n\t\t<!-- All in One SEO 4.9.9 - aioseo.com -->\n\t<meta name=\"description\" content=\"We&#039;re in the process of migrating our ancient CVS with the more modern GIT. However I stumbled in the following problem. We make the commits comment in Bulgarian language with windows-1251 encoding. Git uses utf8 although I&#039;m not sure if it does this natively or the client determines the commends encoding. So I had to\" \/>\n\t<meta name=\"robots\" content=\"max-image-preview:large\" \/>\n\t<meta name=\"author\" content=\"lukav\"\/>\n\t<link rel=\"canonical\" href=\"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file\" \/>\n\t<meta name=\"generator\" content=\"All in One SEO (AIOSEO) 4.9.9\" \/>\n\t\t<meta property=\"og:locale\" content=\"en_US\" \/>\n\t\t<meta property=\"og:site_name\" content=\"Lukav&#039;s Weblog - So that what I know, you know too.\" \/>\n\t\t<meta property=\"og:type\" content=\"article\" \/>\n\t\t<meta property=\"og:title\" content=\"A little awk script to change encoding in parts of file. - Lukav&#039;s Weblog\" \/>\n\t\t<meta property=\"og:description\" content=\"We&#039;re in the process of migrating our ancient CVS with the more modern GIT. However I stumbled in the following problem. We make the commits comment in Bulgarian language with windows-1251 encoding. 
Git uses utf8 although I&#039;m not sure if it does this natively or the client determines the commends encoding. So I had to\" \/>\n\t\t<meta property=\"og:url\" content=\"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file\" \/>\n\t\t<meta property=\"article:published_time\" content=\"2011-03-24T18:56:00+00:00\" \/>\n\t\t<meta property=\"article:modified_time\" content=\"2025-05-24T14:27:20+00:00\" \/>\n\t\t<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n\t\t<meta name=\"twitter:title\" content=\"A little awk script to change encoding in parts of file. - Lukav&#039;s Weblog\" \/>\n\t\t<meta name=\"twitter:description\" content=\"We&#039;re in the process of migrating our ancient CVS with the more modern GIT. However I stumbled in the following problem. We make the commits comment in Bulgarian language with windows-1251 encoding. Git uses utf8 although I&#039;m not sure if it does this natively or the client determines the commends encoding. So I had to\" \/>\n\t\t<script type=\"application\/ld+json\" class=\"aioseo-schema\">\n\t\t\t{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"BlogPosting\",\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/2011\\\/03\\\/24\\\/a-little-awk-script-to-change-encoding-in-parts-of-file#blogposting\",\"name\":\"A little awk script to change encoding in parts of file. 
- Lukav's Weblog\",\"headline\":\"A little awk script to change encoding in parts of file.\",\"author\":{\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/author\\\/lukav#author\"},\"publisher\":{\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/#organization\"},\"datePublished\":\"2011-03-24T20:56:00+02:00\",\"dateModified\":\"2025-05-24T16:27:20+02:00\",\"inLanguage\":\"en-US\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/2011\\\/03\\\/24\\\/a-little-awk-script-to-change-encoding-in-parts-of-file#webpage\"},\"isPartOf\":{\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/2011\\\/03\\\/24\\\/a-little-awk-script-to-change-encoding-in-parts-of-file#webpage\"},\"articleSection\":\"EN, Tech\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/2011\\\/03\\\/24\\\/a-little-awk-script-to-change-encoding-in-parts-of-file#breadcrumblist\",\"itemListElement\":[{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress#listItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/lukav.com\\\/wordpress\",\"nextItem\":{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/category\\\/tech#listItem\",\"name\":\"Tech\"}},{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/category\\\/tech#listItem\",\"position\":2,\"name\":\"Tech\",\"item\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/category\\\/tech\",\"nextItem\":{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/2011\\\/03\\\/24\\\/a-little-awk-script-to-change-encoding-in-parts-of-file#listItem\",\"name\":\"A little awk script to change encoding in parts of file.\"},\"previousItem\":{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress#listItem\",\"name\":\"Home\"}},{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/2011\\\/03\\\/24\\\/a-little-awk-script-to-change-encoding-in-parts-of-file#listItem\",\"position\":3,\"name\":\"A little awk script to 
change encoding in parts of file.\",\"previousItem\":{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/category\\\/tech#listItem\",\"name\":\"Tech\"}}]},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/#organization\",\"name\":\"Lukav's Weblog\",\"description\":\"So that what I know, you know too.\",\"url\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/author\\\/lukav#author\",\"url\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/author\\\/lukav\",\"name\":\"lukav\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/2011\\\/03\\\/24\\\/a-little-awk-script-to-change-encoding-in-parts-of-file#authorImage\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/317f6418582595fc4849ac671d89398ab7579c22578d678d9774727e81490902?s=96&d=mm&r=g\",\"width\":96,\"height\":96,\"caption\":\"lukav\"}},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/2011\\\/03\\\/24\\\/a-little-awk-script-to-change-encoding-in-parts-of-file#webpage\",\"url\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/2011\\\/03\\\/24\\\/a-little-awk-script-to-change-encoding-in-parts-of-file\",\"name\":\"A little awk script to change encoding in parts of file. - Lukav's Weblog\",\"description\":\"We're in the process of migrating our ancient CVS with the more modern GIT. However I stumbled in the following problem. We make the commits comment in Bulgarian language with windows-1251 encoding. Git uses utf8 although I'm not sure if it does this natively or the client determines the commends encoding. 
So I had to\",\"inLanguage\":\"en-US\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/#website\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/2011\\\/03\\\/24\\\/a-little-awk-script-to-change-encoding-in-parts-of-file#breadcrumblist\"},\"author\":{\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/author\\\/lukav#author\"},\"creator\":{\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/author\\\/lukav#author\"},\"datePublished\":\"2011-03-24T20:56:00+02:00\",\"dateModified\":\"2025-05-24T16:27:20+02:00\"},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/#website\",\"url\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/\",\"name\":\"Lukav's Weblog\",\"description\":\"So that what I know, you know too.\",\"inLanguage\":\"en-US\",\"publisher\":{\"@id\":\"https:\\\/\\\/lukav.com\\\/wordpress\\\/#organization\"}}]}\n\t\t<\/script>\n\t\t<!-- All in One SEO -->\n\n","aioseo_head_json":{"title":"A little awk script to change encoding in parts of file. - Lukav's Weblog","description":"We're in the process of migrating our ancient CVS with the more modern GIT. However I stumbled in the following problem. We make the commits comment in Bulgarian language with windows-1251 encoding. Git uses utf8 although I'm not sure if it does this natively or the client determines the commends encoding. So I had to","canonical_url":"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file","robots":"max-image-preview:large","keywords":"","webmasterTools":{"miscellaneous":""},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"BlogPosting","@id":"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file#blogposting","name":"A little awk script to change encoding in parts of file. 
- Lukav's Weblog","headline":"A little awk script to change encoding in parts of file.","author":{"@id":"https:\/\/lukav.com\/wordpress\/author\/lukav#author"},"publisher":{"@id":"https:\/\/lukav.com\/wordpress\/#organization"},"datePublished":"2011-03-24T20:56:00+02:00","dateModified":"2025-05-24T16:27:20+02:00","inLanguage":"en-US","mainEntityOfPage":{"@id":"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file#webpage"},"isPartOf":{"@id":"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file#webpage"},"articleSection":"EN, Tech"},{"@type":"BreadcrumbList","@id":"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file#breadcrumblist","itemListElement":[{"@type":"ListItem","@id":"https:\/\/lukav.com\/wordpress#listItem","position":1,"name":"Home","item":"https:\/\/lukav.com\/wordpress","nextItem":{"@type":"ListItem","@id":"https:\/\/lukav.com\/wordpress\/category\/tech#listItem","name":"Tech"}},{"@type":"ListItem","@id":"https:\/\/lukav.com\/wordpress\/category\/tech#listItem","position":2,"name":"Tech","item":"https:\/\/lukav.com\/wordpress\/category\/tech","nextItem":{"@type":"ListItem","@id":"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file#listItem","name":"A little awk script to change encoding in parts of file."},"previousItem":{"@type":"ListItem","@id":"https:\/\/lukav.com\/wordpress#listItem","name":"Home"}},{"@type":"ListItem","@id":"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file#listItem","position":3,"name":"A little awk script to change encoding in parts of file.","previousItem":{"@type":"ListItem","@id":"https:\/\/lukav.com\/wordpress\/category\/tech#listItem","name":"Tech"}}]},{"@type":"Organization","@id":"https:\/\/lukav.com\/wordpress\/#organization","name":"Lukav's Weblog","description":"So that what 
I know, you know too.","url":"https:\/\/lukav.com\/wordpress\/"},{"@type":"Person","@id":"https:\/\/lukav.com\/wordpress\/author\/lukav#author","url":"https:\/\/lukav.com\/wordpress\/author\/lukav","name":"lukav","image":{"@type":"ImageObject","@id":"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file#authorImage","url":"https:\/\/secure.gravatar.com\/avatar\/317f6418582595fc4849ac671d89398ab7579c22578d678d9774727e81490902?s=96&d=mm&r=g","width":96,"height":96,"caption":"lukav"}},{"@type":"WebPage","@id":"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file#webpage","url":"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file","name":"A little awk script to change encoding in parts of file. - Lukav's Weblog","description":"We're in the process of migrating our ancient CVS with the more modern GIT. However I stumbled in the following problem. We make the commits comment in Bulgarian language with windows-1251 encoding. Git uses utf8 although I'm not sure if it does this natively or the client determines the commends encoding. 
So I had to","inLanguage":"en-US","isPartOf":{"@id":"https:\/\/lukav.com\/wordpress\/#website"},"breadcrumb":{"@id":"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file#breadcrumblist"},"author":{"@id":"https:\/\/lukav.com\/wordpress\/author\/lukav#author"},"creator":{"@id":"https:\/\/lukav.com\/wordpress\/author\/lukav#author"},"datePublished":"2011-03-24T20:56:00+02:00","dateModified":"2025-05-24T16:27:20+02:00"},{"@type":"WebSite","@id":"https:\/\/lukav.com\/wordpress\/#website","url":"https:\/\/lukav.com\/wordpress\/","name":"Lukav's Weblog","description":"So that what I know, you know too.","inLanguage":"en-US","publisher":{"@id":"https:\/\/lukav.com\/wordpress\/#organization"}}]},"og:locale":"en_US","og:site_name":"Lukav's Weblog - So that what I know, you know too.","og:type":"article","og:title":"A little awk script to change encoding in parts of file. - Lukav's Weblog","og:description":"We're in the process of migrating our ancient CVS with the more modern GIT. However I stumbled in the following problem. We make the commits comment in Bulgarian language with windows-1251 encoding. Git uses utf8 although I'm not sure if it does this natively or the client determines the commends encoding. So I had to","og:url":"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file","article:published_time":"2011-03-24T18:56:00+00:00","article:modified_time":"2025-05-24T14:27:20+00:00","twitter:card":"summary_large_image","twitter:title":"A little awk script to change encoding in parts of file. - Lukav's Weblog","twitter:description":"We're in the process of migrating our ancient CVS with the more modern GIT. However I stumbled in the following problem. We make the commits comment in Bulgarian language with windows-1251 encoding. Git uses utf8 although I'm not sure if it does this natively or the client determines the commends encoding. 
So I had to"},"aioseo_meta_data":{"post_id":"670","title":null,"description":null,"keywords":null,"keyphrases":null,"primary_term":null,"canonical_url":null,"og_title":null,"og_description":null,"og_object_type":"default","og_image_type":"default","og_image_url":null,"og_image_width":null,"og_image_height":null,"og_image_custom_url":null,"og_image_custom_fields":null,"og_video":null,"og_custom_url":null,"og_article_section":null,"og_article_tags":null,"twitter_use_og":false,"twitter_card":"default","twitter_image_type":"default","twitter_image_url":null,"twitter_image_custom_url":null,"twitter_image_custom_fields":null,"twitter_title":null,"twitter_description":null,"schema":{"blockGraphs":[],"customGraphs":[],"default":{"data":{"Article":[],"Course":[],"Dataset":[],"FAQPage":[],"Movie":[],"Person":[],"Product":[],"ProductReview":[],"Car":[],"Recipe":[],"Service":[],"SoftwareApplication":[],"WebPage":[]},"graphName":"","isEnabled":true},"graphs":[]},"schema_type":"default","schema_type_options":null,"pillar_content":false,"robots_default":true,"robots_noindex":false,"robots_noarchive":false,"robots_nosnippet":false,"robots_nofollow":false,"robots_noimageindex":false,"robots_noodp":false,"robots_notranslate":false,"robots_max_snippet":null,"robots_max_videopreview":null,"robots_max_imagepreview":"large","priority":null,"frequency":null,"local_seo":null,"breadcrumb_settings":null,"limit_modified_date":false,"ai":null,"created":"2023-06-02 03:37:43","updated":"2025-06-03 23:59:38","seo_analyzer_scan_date":null},"aioseo_breadcrumb":"<div class=\"aioseo-breadcrumbs\"><span class=\"aioseo-breadcrumb\">\n\t\t\t<a href=\"https:\/\/lukav.com\/wordpress\" title=\"Home\">Home<\/a>\n\t\t<\/span><span class=\"aioseo-breadcrumb-separator\">&raquo;<\/span><span class=\"aioseo-breadcrumb\">\n\t\t\t<a href=\"https:\/\/lukav.com\/wordpress\/category\/tech\" title=\"Tech\">Tech<\/a>\n\t\t<\/span><span class=\"aioseo-breadcrumb-separator\">&raquo;<\/span><span 
class=\"aioseo-breadcrumb\">\n\t\t\tA little awk script to change encoding in parts of file.\n\t\t<\/span><\/div>","aioseo_breadcrumb_json":[{"label":"Home","link":"https:\/\/lukav.com\/wordpress"},{"label":"Tech","link":"https:\/\/lukav.com\/wordpress\/category\/tech"},{"label":"A little awk script to change encoding in parts of file.","link":"https:\/\/lukav.com\/wordpress\/2011\/03\/24\/a-little-awk-script-to-change-encoding-in-parts-of-file"}],"_links":{"self":[{"href":"https:\/\/lukav.com\/wordpress\/wp-json\/wp\/v2\/posts\/670","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lukav.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lukav.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lukav.com\/wordpress\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/lukav.com\/wordpress\/wp-json\/wp\/v2\/comments?post=670"}],"version-history":[{"count":9,"href":"https:\/\/lukav.com\/wordpress\/wp-json\/wp\/v2\/posts\/670\/revisions"}],"predecessor-version":[{"id":778,"href":"https:\/\/lukav.com\/wordpress\/wp-json\/wp\/v2\/posts\/670\/revisions\/778"}],"wp:attachment":[{"href":"https:\/\/lukav.com\/wordpress\/wp-json\/wp\/v2\/media?parent=670"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lukav.com\/wordpress\/wp-json\/wp\/v2\/categories?post=670"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lukav.com\/wordpress\/wp-json\/wp\/v2\/tags?post=670"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}