How to move a static HTML site from Windows to Linux
- 1. Increase disk space.
- 2. Transform database from MsSQL to MySQL.
- 3. Add Mime types.
- 4. Add basic authentication.
- 5. Change file names.
- 6. Change CMS.
- 7. Recursion and memory optimization.
- 8. Enable SSI
- 9. Upload site to Cloudflare CDN
- 10. Get free SSL certificate from Cloudflare
- 11. Enable SSL
- 12. My final NGINX config
Since its creation this site was hosted on Windows, because Windows was completely free for hosting (Microsoft sent a Windows key to anybody with an MSDN subscription). But now Microsoft has changed its policies: it decided to leave the server platform, move ASP.NET to Linux, and stop sending free keys through MSDN. Therefore I decided to follow these rules and leave Windows on the server too.
Over 20+ years my site has become huge: more than 120,000 pages and images.
So, this is the list of steps I performed for this task.
1. Increase disk space.
My site now needs about 100 GB of disk space, so the first step was to add more space.
2. Transform database from MsSQL to MySQL.
To be honest, this is not a purely static HTML site: it has a lot of active ASP.NET extensions, so my first real step was to convert the database from MsSQL to MySQL.
The simplest and fastest way is to do this operation manually.
First I created the DB in MySQL. This requires only two commands: one to create the database and one to create the user and grant permissions.
More detailed instructions are in Setup MariaDB on Ubuntu server (remote access, user privileges, upload database).
Then I created the DB structure I needed: I exported the MsSQL DB structure and converted the SQL with http://www.sqlines.com/online.
Then I copied the data into Notepad++, built the needed SQL INSERT commands, and ran them in MySQL.
The main point is to keep AUTO_INCREMENT turned off while the data is being loaded and to enable it again afterwards:
SET FOREIGN_KEY_CHECKS=0;
ALTER TABLE `Forum` MODIFY COLUMN `i` int(11) NOT NULL AUTO_INCREMENT;
SET FOREIGN_KEY_CHECKS=1;
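I built the INSERT commands by hand in Notepad++, but the same step can be scripted. The following is only a minimal Python sketch, assuming the table data was exported as tab-separated text; the function names and the `Txt` column in the usage note are hypothetical, and only the `Forum` table and its `i` column come from the command above.

```python
def sql_literal(value: str) -> str:
    """Quote one text field as a MySQL string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def rows_to_insert(table: str, columns: list[str], tsv_text: str) -> str:
    """Build one multi-row INSERT statement from tab-separated lines."""
    cols = ", ".join(f"`{c}`" for c in columns)
    rows = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines in the export
        fields = line.split("\t")
        rows.append("(" + ", ".join(sql_literal(f) for f in fields) + ")")
    if not rows:
        raise ValueError("no data rows found")
    return f"INSERT INTO `{table}` ({cols}) VALUES\n" + ",\n".join(rows) + ";"
```

For example, `rows_to_insert("Forum", ["i", "Txt"], "1\tHello\n2\tIt's\n")` produces a two-row INSERT with the apostrophe doubled. Every value is emitted as a quoted string, which MySQL converts for numeric columns; NULL handling is left out of this sketch.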
The next step is converting stored procedures. Unfortunately, this is possible only manually, with a lot of attention to each procedure.
3. Add Mime types.
I analyzed my additions to mime.types and added them to /etc/nginx/mime.types.
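One way to find which extensions still need an entry is to compare the extensions used on the site with those already listed in mime.types. This is a minimal Python sketch under the assumption that the file has the usual `types { mime/type ext1 ext2; }` layout; the function names are hypothetical and this is not the procedure I actually followed.

```python
import os
import re


def parse_nginx_mime_types(text: str) -> set[str]:
    """Collect the file extensions declared in an nginx mime.types file."""
    text = re.sub(r"#[^\n]*", "", text)                  # drop comments
    inner = text[text.find("{") + 1 : text.rfind("}")]   # body of types { ... }
    known = set()
    for entry in inner.split(";"):
        parts = entry.split()
        if len(parts) >= 2:                              # MIME type, then extensions
            known.update(p.lower() for p in parts[1:])
    return known


def missing_extensions(site_root: str, known: set[str]) -> set[str]:
    """Return extensions found under site_root that mime.types does not list."""
    found = set()
    for _, _, filenames in os.walk(site_root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lstrip(".").lower()
            if ext:
                found.add(ext)
    return found - known
```

Calling `missing_extensions("/var/www/vb-net.com/html", parse_nginx_mime_types(open("/etc/nginx/mime.types").read()))` would list the candidates to review by hand.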
4. Add basic authentication.
My site is absolutely free, but it has a couple of folders with books that are interesting only to me. These books are no big secret; they are ordinary computer books that you can download from thousands of sites on the Internet, but I decided to protect them with a password on my site.
So this step adds basic authentication to some folders. It is a simple operation; in my case:
# sudo apt-get install apache2-utils
# sudo htpasswd -c /var/www/vb-net.com/html/AndroidBook/Doc/.htpasswd 1
# cat /var/www/vb-net.com/html/AndroidBook/Doc/.htpasswd
Then I added the restriction to the NGINX rules; in my case:
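location /AndroidBook/Doc/ {
    try_files $uri $uri/ =404;
    auth_basic "Restricted Content";
    # auth_basic_user_file takes a filesystem path, not a URL path,
    # so it must point at the file created by htpasswd above
    auth_basic_user_file /var/www/vb-net.com/html/AndroidBook/Doc/.htpasswd;
}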
And restart NGINX.
# sudo service nginx restart
So this step is done too (this is my resulting config), and we move on to the main step.
5. Change file names.
Of course, Windows file names are case insensitive, so a Windows site cannot work on Linux without tricks. Various recipes, for example Using Apache htaccess file to change URL to lowercase or Convert and redirect URL in uppercase to lowercase using .htaccess, do not work in this case, because such a redirect assumes that you know the file names on disk. For example, a folder on disk is named LowCostAspNet, but pages link to it as LOWCOSTASPNET or lowcostaspnet. To find the existing file on disk by redirects alone you would need up to 2^32 = 4 294 967 296 redirects for a URL of 32 letters and 2^64 = 18 446 744 073 709 551 616 redirects for a name of 64 letters, one per combination of upper and lower case. So we need a workable solution, which I describe below.
Continue reading - Linux console app (.NET Core, EF DB first, CamelCase file and dir rename, calc MD5, RegExpression, change and check link).
6. Change CMS.
Of course, my site has its own CMS, which publishes pages, syncs the local Windows folder with the remote one, automatically creates the list of articles //www.vb-net.com/Articles/index.htm, automatically creates a forum topic for user comments on each page, automatically creates RSS feeds, creates advertising (at the top of each page you can see a ticker with related topics), and has many other features. I don't add a page to each of these lists manually; my CMS performs all the needed operations automatically. So my CMS needed to change too.
7. Recursion and memory optimization.
The most unexpected and interesting step was memory optimization. I ran this program on a huge Linux machine with 50 GB of memory, so I didn't think about memory at all; I thought only about the time I had to spend on programming.
But I found some extra time for optimization, and within a couple of minutes I reduced memory consumption from 1300 MB (with 200-300 MB of objects) to 75 MB (with 0.1 MB of objects), a reduction of roughly 17 times.
All I needed for such a radical memory reduction was to remove recursion. I replaced this code:
Imports System.Text.RegularExpressions

Partial Module Program

    Public Enum LinkType
        Href = 1
        Src = 2

    End Enum

    Sub ParseOneFile(FileName As String, ByRef HTML As String)
        Dim HrefRegex = New Regex("<a\s.*?href=(?:'|"")([^'"">]+)(?:'|"")", RegexOptions.Compiled Or RegexOptions.IgnoreCase)
        RecursiveProcessingOneInternalLink(FileName, HTML, HrefRegex, LinkType.Href, 0)
        Dim LocationRegex = New Regex("location.href=(?:'|"")([^'"">]+)(?:'|"")", RegexOptions.Compiled Or RegexOptions.IgnoreCase)
        RecursiveProcessingOneInternalLink(FileName, HTML, LocationRegex, LinkType.Href, 0)
        Dim SrcRegex = New Regex("<img\s.*?src=(?:'|"")([^'"">]+)(?:'|"")", RegexOptions.Compiled Or RegexOptions.IgnoreCase)
        RecursiveProcessingOneInternalLink(FileName, HTML, SrcRegex, LinkType.Src, 0)
        Dim LinkRegex = New Regex("<link\s.*?href=(?:'|"")([^'"">]+)(?:'|"")", RegexOptions.Compiled Or RegexOptions.IgnoreCase)
        RecursiveProcessingOneInternalLink(FileName, HTML, LinkRegex, LinkType.Href, 0)
        Dim ScriptRegex = New Regex("<script\s.*?src=(?:'|"")([^'"">]+)(?:'|"")", RegexOptions.Compiled Or RegexOptions.IgnoreCase)
        RecursiveProcessingOneInternalLink(FileName, HTML, ScriptRegex, LinkType.Src, 0)
    End Sub

    Sub RecursiveProcessingOneInternalLink(FileName As String, ByRef HTML As String, Regex As Regex, Type As LinkType, StartIndex As Integer)
        Dim Links As MatchCollection = Regex.Matches(HTML)
        If StartIndex <= Links.Count - 1 Then
            If Links(StartIndex).Value.ToLower.Contains("vb-net.com") And Not Links(StartIndex).Value.ToLower.Contains("forum.vb-net.com") And Not Links(StartIndex).Value.ToLower.Contains("products.vb-net.com") And Not Links(StartIndex).Value.ToLower.Contains("bug.vb-net.com") And Not Links(StartIndex).Value.ToLower.Contains("freeware.vb-net.com") Or
               Not Links(StartIndex).Value.ToLower.Contains("http") And Not Links(StartIndex).Value.ToLower.Contains("href=""#""") And Not Links(StartIndex).Value.ToLower.Contains("vb-net") And Not Links(StartIndex).Value.ToLower.Contains("forum.vb-net.com") Then
                'processing internal link
                'Debug.Print($"{StartIndex}: {Links(StartIndex).Index}:{Links(StartIndex).Value}")
                ReplaceOneLink(FileName, HTML, Links(StartIndex).Index, Links(StartIndex).Value, Type)
                RecursiveProcessingOneInternalLink(FileName, HTML, Regex, Type, StartIndex + 1)
            Else
                'look to next link
                RecursiveProcessingOneInternalLink(FileName, HTML, Regex, Type, StartIndex + 1)
            End If
        End If
    End Sub

    Sub ReplaceOneLink(FileName As String, ByRef HTML As String, LinkPosition As Integer, LinkText As String, Type As LinkType)
        Dim Str1 As New Text.StringBuilder()
        ...
        Str1.Append(Mid(HTML, LinkPosition + Len(LinkText))) 'add right HTML part outside of link
        HTML = Str1.ToString
    End Sub
...
I replaced it with this code of mine, https://github.com/Alex-1347/WindowsServiceExample/blob/main/CacheBuilder/Parse.vb:
Imports System.Text.RegularExpressions

Public Module Proxy

    Public Enum LinkType
        Href = 1
        Src = 2
    End Enum

    Sub ProcessingHTML(ByRef HTML As String)
        Dim HrefRegex As Regex = New Regex("<a\s.*?href=(?:'|"")([^'"">]+)(?:'|"")", RegexOptions.Compiled Or RegexOptions.IgnoreCase)
        ProcessingLinks(HTML, HrefRegex, LinkType.Href)
        HrefRegex = Nothing
        Dim LocationRegex As Regex = New Regex("location.href=(?:'|"")([^'"">]+)(?:'|"")", RegexOptions.Compiled Or RegexOptions.IgnoreCase)
        ProcessingLinks(HTML, LocationRegex, LinkType.Href)
        LocationRegex = Nothing
        Dim SrcRegex As Regex = New Regex("<img\s.*?src=(?:'|"")([^'"">]+)(?:'|"")", RegexOptions.Compiled Or RegexOptions.IgnoreCase)
        ProcessingLinks(HTML, SrcRegex, LinkType.Src)
        SrcRegex = Nothing
        Dim LinkRegex As Regex = New Regex("<link\s.*?href=(?:'|"")([^'"">]+)(?:'|"")", RegexOptions.Compiled Or RegexOptions.IgnoreCase)
        ProcessingLinks(HTML, LinkRegex, LinkType.Href)
        LinkRegex = Nothing
        Dim ScriptRegex As Regex = New Regex("<script\s.*?src=(?:'|"")([^'"">]+)(?:'|"")", RegexOptions.Compiled Or RegexOptions.IgnoreCase)
        ProcessingLinks(HTML, ScriptRegex, LinkType.Src)
        ScriptRegex = Nothing
    End Sub

    'Walks the matches in a loop instead of recursing once per link
    Sub ProcessingLinks(ByRef HTML As String, Regex As Regex, Type As LinkType)
        Dim Links As MatchCollection = Regex.Matches(HTML)
        Dim I As Integer = 0
        While Links.Count > 0
            If Not Links(I).Value.ToLower.Contains("//") Then
                ReplaceOneRelativeLink(HTML, Links(I).Index, Links(I).Value, Type)
                Links = Regex.Matches(HTML) 'HTML has changed, so match positions must be refreshed
            End If
            If I < Links.Count - 1 Then
                I += 1
            Else
                Exit While
            End If

        End While
        Links = Nothing
    End Sub

    Sub ReplaceOneRelativeLink(ByRef HTML As String, LinkPosition As Integer, LinkText As String, Type As LinkType)
        Dim Str1 As New Text.StringBuilder()
        Str1.Append(Left(HTML, LinkPosition)) 'add left HTML part outside of link
        Dim Pos1 As Integer
        Select Case Type
            Case LinkType.Href
                Pos1 = InStr(LinkText.ToLower, "href=", CompareMethod.Text)
            Case LinkType.Src
                Pos1 = InStr(LinkText.ToLower, "src=", CompareMethod.Text)
        End Select
        If Pos1 > 0 Then
            Dim Pos2 = InStr(Pos1 + 1, LinkText.ToLower, """", CompareMethod.Text)
            If Pos2 <= 0 Then
                Pos2 = InStr(Pos1 + 1, LinkText.ToLower, "'", CompareMethod.Text)
            End If
            If Pos2 <= 0 Then
                Debug.Print("Link start not found :" & LinkText)
            Else
                Dim Pos3 As Integer = InStr(Pos2 + 1, LinkText.ToLower, """", CompareMethod.Text)
                If Pos3 <= 0 Then
                    Pos3 = InStr(Pos2 + 1, LinkText.ToLower, "'", CompareMethod.Text)
                End If
                If Pos3 <= 0 Then
                    Debug.Print("Link end not found :" & LinkText)
                Else
                    Dim ClearSiteLink As String = Mid(LinkText, Pos2 + 1, Pos3 - Pos2 - 1)
                    Str1.Append(Left(LinkText, Pos2)) 'add left part of link
                    If ClearSiteLink.StartsWith("/") Then
                        Str1.Append(PromocodeCacheCreater.TargetServerRoot)
                        Str1.Append(ClearSiteLink)
                    Else ' link starts with other chars - #, Index.htm, ../
                        Str1.Append(PromocodeCacheCreater.TargetServerPath)
                        Str1.Append("/")
                        Str1.Append(ClearSiteLink)
                    End If
                    If PromocodeCacheCreater.IsUrlCollected Then
                        PromocodeCacheCreater.UrlList.Add(ClearSiteLink)
                    End If
                    Str1.Append(Mid(LinkText, Pos3 + 1)) 'add right part of link
                End If
            End If
        Else
            Debug.Print("Link not found : " & LinkText)
        End If
        Str1.Append(Mid(HTML, LinkPosition + Len(LinkText))) 'add right HTML part outside of link
        HTML = Str1.ToString
        Str1 = Nothing
    End Sub

End Module
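The same recursion-to-loop idea can be shown in miniature. The sketch below is Python rather than VB.NET and is not a port of the module above: it uses one combined pattern instead of five, and the names `LINK_RE`, `absolutize_links`, `server_root` and `server_path` are hypothetical. A plausible reason for the saving (an assumption, not something measured here) is that each recursive call keeps its own match collection alive until the whole chain returns, while a loop holds only one at a time.

```python
import re

# One combined pattern for illustration: attribute name, quote, link text.
LINK_RE = re.compile(r"""(href|src)=(['"])([^'">]+)\2""", re.IGNORECASE)


def absolutize_links(html: str, server_root: str, server_path: str) -> str:
    """Rewrite relative href/src values to absolute URLs with a loop, not recursion."""
    out = []   # pieces of the result, joined once at the end
    pos = 0    # where the next search starts
    while True:
        m = LINK_RE.search(html, pos)
        if m is None:
            break
        attr, quote, link = m.group(1), m.group(2), m.group(3)
        out.append(html[pos:m.start()])          # text before the link
        if "//" in link:
            new_link = link                      # already absolute, keep as is
        elif link.startswith("/"):
            new_link = server_root + link        # site-root relative
        else:
            new_link = server_path + "/" + link  # relative to the current folder
        out.append(f"{attr}={quote}{new_link}{quote}")
        pos = m.end()
    out.append(html[pos:])                       # text after the last link
    return "".join(out)
```

Because the loop keeps only a position and a list of pieces, a page with thousands of links needs no deeper call stack than a page with one.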
8. Enable SSI
9. Upload site to Cloudflare CDN
10. Get free SSL certificate from Cloudflare
11. Enable SSL
12. My final NGINX config
events {
    worker_connections 4096; ## Default: 1024
    use epoll;
}

http {

    include mime.types;
    default_type application/octet-stream;

    server {

        sendfile on;
        keepalive_timeout 65;

        listen 80;
        listen [::]:80;

        # *** Other domain ***

        location / {
            try_files $uri $uri/ =404;
        }

        location ~ \.php$ {
            try_files $uri =404;
        }
    }

    server {

        sendfile on;
        keepalive_timeout 65;

        listen 80;
        listen [::]:80;

        # "ssl on" is deprecated and would also apply to port 80;
        # the ssl parameter of listen enables TLS for port 443 only
        listen 443 ssl;

        ssl_certificate /etc/ssl/vb-net-bundle.txt;
        ssl_certificate_key /etc/ssl/PrivatePemCert.txt;

        root /var/www/vb-net.com/forum;
        index Index.htm;

        server_name forum.vb-net.com;

        location / {
            # try_files '' /Index.htm =404;
            try_files $uri =404;
        }

        location /Forum.aspx {
            try_files '' /Index.htm =404;
        }

        location /reclama.html {
            try_files '' /Index.htm =404;
        }

        location /Reclama.aspx {
            try_files '' /Index.htm =404;
        }

        location /reclama.aspx {
            try_files '' /Index.htm =404;
        }

        location /rss.ashx {
            try_files '' /Index.htm =404;
        }
    }

    server {

        sendfile on;
        keepalive_timeout 65;

        listen 80;
        listen [::]:80;

        listen 443 ssl;

        ssl_certificate /etc/ssl/vb-net-bundle.txt;
        ssl_certificate_key /etc/ssl/PrivatePemCert.txt;

        root /var/www/vb-net.com/html;
        index Index.htm index.htm index.html Index.html;

        server_name vb-net.com www.vb-net.com;

        location / {
            ssi on;
            try_files $uri $uri/ $uri/Index.htm =404;
        }

        # auth_basic_user_file takes a filesystem path, not a URL path;
        # the paths below assume each .htpasswd sits in its folder under root
        location /2015/Doc/ {
            ssi on;
            try_files $uri $uri/ =404;
            auth_basic "Restricted Content";
            auth_basic_user_file /var/www/vb-net.com/html/2015/Doc/.htpasswd;
        }

        location /AndroidBook/Doc/ {
            ssi on;
            try_files $uri $uri/ =404;
            auth_basic "Restricted Content";
            auth_basic_user_file /var/www/vb-net.com/html/AndroidBook/Doc/.htpasswd;
        }

        location /ProgramTheory/Books/ {
            ssi on;
            try_files $uri $uri/ =404;
            auth_basic "Restricted Content";
            auth_basic_user_file /var/www/vb-net.com/html/ProgramTheory/Books/.htpasswd;
        }

        # never serve dot-files such as .htpasswd
        location ~ /\. {
            deny all;
            access_log off;
            log_not_found off;
        }

        location ~ \.php$ {
            try_files $uri =404;
        }

        # redirect lower-case index.htm to the on-disk name Index.htm
        location ~ (.*)/index.htm$ {
            return 301 $1/Index.htm;
        }
    }

    server {
        listen 90;
        server_name localhost;

        location /CS {
            #root /var/www/development/API
            #Microservices
            proxy_pass http://localhost:5000;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection keep-alive;
            proxy_set_header Host $host;
            proxy_cache_bypass $http_upgrade;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        location / {
            #root /var/www/development/Blazor
            #Frontend
            proxy_pass http://localhost:6000;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection keep-alive;
            proxy_set_header Host $host;
            proxy_cache_bypass $http_upgrade;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }
}