Thursday 4 February 2016

Real-time questions for WebLogic

Introduction:

Hi, this is <Name>. I did my <>.
Currently I am working with <Current working company>.
Previous experience: on the payroll of <if you are working as a contractor>.
If you work in administration, give complete details, for example:
installation of WebLogic application server 8.1, 9.2, 10.3 and 11g, and of the Apache 2.0 and 2.2 web servers.
I have good experience in troubleshooting, performance tuning, and monitoring.

Add these as well:

1. I was involved in the creation and configuration of domains.
2. Deployment of applications.
3. Managing SSL certificates.
4. Creation and configuration of clusters to provide load balancing and failover support.
5. Creation and configuration of JDBC data sources and connection pools.


Working environment in your company:
There are 30 applications in total, deployed on 10 domains.
Each domain contains 3 horizontal clusters.
Each cluster contains 10-25 managed servers.
Operating system: Linux
Application server: WebLogic
Web server: Apache
Backend database: Oracle


Day to day activities:

1. Taking the shift handover from the previous shift.
2. Health check of all WebLogic server instances.
3. Checking web services and applications (be ready to explain how you check them).
4. Deployments on non-production and production servers.
5. ITIL processes: incident, change, and problem management.
6. Providing 24x7 on-call support.
7. Giving the shift handover to the next shift.

Wednesday 3 February 2016

Apache Installation & Configuration for WebLogic

Installation Steps:

Step1: Download any version of the httpd tar file (Ex: httpd-2.2.11.tar) and extract it
tar -xvf httpd-2.2.11.tar

Step2: Go to the httpd-2.2.11 directory and run the configure command to set the Apache installation folder (Ex: apache2053)
cd httpd-2.2.11

./configure --prefix=$Home/apache2053

Step3: Run the commands below to build and install
make
make install

Step4: Take a backup of the httpd.conf file
cd $Home/apache2053/conf
cp httpd.conf httpd.conf_original

Step5: Edit the httpd.conf configuration file
vi httpd.conf
ServerRoot "$Home/apache2053"
(ServerRoot is the path to the server's configuration, error and log files. It is possible to change this path, provided all the necessary files are copied to the new location accordingly.)
Listen IPAddress:Port (Ex: Listen 4.192.50.25:9999)
LoadModule weblogic_module modules/mod_wl_22.so
User wlusername
# Main WebLogic application details to add to the httpd.conf file
<IfModule mod_weblogic.c>
WebLogicCluster hostname:9902,hostnamevir1:9902
MatchExpression *.jsp
</IfModule>
# Put the application's root folder name in the Location path
<Location /ProjectName>
SetHandler weblogic-handler
DynamicServerList ON
HungServerRecoverSecs 600
ConnectTimeoutSecs 40
WLCookieName JSESSIONID
DebugConfigInfo OFF
WLLogFile $Home/apache2053/logs/paniweb1.log
ConnectRetrySecs 2
Idempotent ON
FileCaching ON
WLProxySSL OFF
SecureProxy OFF
# Set Debug to ON only while troubleshooting; the output goes to WLLogFile
Debug OFF
</Location>

ServerAdmin sarangapani.matoori@gmail.com
ServerName 4.192.50.25 (we can use the DNS name; if we don't have one, we can use the IP)

Step6: Copy the mod_wl_22.so module file from $Home/wl923/bea/weblogic92/server/plugin/solaris/sparc to $Home/apache2053/modules
cp $Home/wl923/bea/weblogic92/server/plugin/solaris/sparc/mod_wl_22.so $Home/apache2053/modules/

Step7: Check that the Apache configuration is OK
cd $Home/apache2053/bin
./apachectl -t

Step8: Start Apache if the configuration is OK
./apachectl -k start (or ./httpd -k start)

Step9: Access the application through the Apache port
http://hostip:port/ProjectName/jsp/Logon.jsp (port is the one set in the httpd.conf file)

Step10: Stop Apache
./apachectl -k stop (or ./httpd -k stop)

HTTP Error codes

Included in the HTTP server response data for each request is a code number indicating the result of the request. These result codes are three-digit numbers divided into categories as follows:

100-199 : Informational status

200-299 : Success status
300-399 : Redirection status
400-499 : Client errors
500-599 : Server errors
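
The ranges above map directly onto the first digit of the code. A minimal Python sketch of that mapping follows; the function name status_category is only an illustration, not part of any server or library API.

```python
def status_category(code: int) -> str:
    """Return the category name for a three-digit HTTP status code."""
    categories = {
        1: "Informational",
        2: "Success",
        3: "Redirection",
        4: "Client error",
        5: "Server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    # The hundreds digit selects the category (404 // 100 == 4).
    return categories[code // 100]


if __name__ == "__main__":
    for code in (101, 200, 302, 404, 503):
        print(code, status_category(code))
```

This is handy when scanning an access log: anything in the 4xx group usually points at the request, anything in the 5xx group at the server.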

Informational

    100 - Continue
    A status code of 100 indicates that (usually the first) part of a request has been received without any problems, and that the rest of the request should now be sent.

    101 - Switching Protocols

    HTTP 1.1 is just one type of protocol for transferring data on the web, and a status code of 101 indicates that the server is changing to the protocol it defines in the "Upgrade" header it returns to the client. For example, when requesting a page, a browser might receive a status code of 101 together with an "Upgrade" header showing that the server is changing to a different version of HTTP.
________________________________________________________________

Successful


    200 - OK

    The 200 status code is by far the most common returned. It means, simply, that the request was received, understood, and processed successfully.

    201 - Created

    A 201 status code indicates that a request was successful and as a result, a resource has been created (for example a new page).

    202 - Accepted

    The status code 202 indicates that the server has received and understood the request, and that it has been accepted for processing, although it may not be processed immediately.

    203 - Non-Authoritative Information

    A 203 status code means that the request was received and understood, and that information sent back about the response is from a third party, rather than the original server. This is virtually identical in meaning to a 200 status code.

    204 - No Content

    The 204 status code means that the request was received and understood, but that there is no need to send any data back.

    205 - Reset Content

    The 205 status code is a request from the server to the client to reset the document from which the original request was sent. For example, if a user fills out a form, and submits it, a status code of 205 means the server is asking the browser to clear the form.

    206 - Partial Content

    A status code of 206 is a response to a request for part of a document. This is used by advanced caching tools, when a user agent requests only a small part of a page, and just that section is returned.
_________________________________________________________________


Redirection


    300 - Multiple Choices

    The 300 status code indicates that the requested resource is available at more than one location. The response will also include a list of locations from which the user agent can select the most appropriate.

    301 - Moved Permanently

    A status code of 301 tells a client that the resource they asked for has permanently moved to a new location. The response should also include this location. It tells the client to use the new URL the next time it wants to fetch the same resource.

    302 - Found

    A status code of 302 tells a client that the resource they asked for has temporarily moved to a new location. The response should also include this location. It tells the client that it should carry on using the same URL to access this resource.

    303 - See Other

    A 303 status code indicates that the response to the request can be found at the specified URL, and should be retrieved from there. It does not mean that something has moved - it is simply specifying the address at which the response to the request can be found.

    304 - Not Modified

    The 304 status code is sent in response to a request (for a document) that asked for the document only if it was newer than the one the client already had. Normally, when a document is cached, the date it was cached is stored. The next time the document is viewed, the client asks the server if the document has changed. If not, the client just reloads the document from the cache.

    305 - Use Proxy

    A 305 status code tells the client that the requested resource has to be reached through a proxy, which will be specified in the response.

    307 - Temporary Redirect

    307 is the status code that is sent when a document is temporarily available at a different URL, which is also returned. There is very little difference between a 302 status code and a 307 status code. 307 was created as another, less ambiguous, version of the 302 status code.
_________________________________________________________________

Client Error


    400 - Bad Request

    A status code of 400 indicates that the server did not understand the request due to bad syntax.

    401 - Unauthorized

    A 401 status code indicates that before a resource can be accessed, the client must be authorised by the server.

    402 - Payment Required

    The 402 status code is not currently in use, being listed as "reserved for future use".

    403 - Forbidden

    A 403 status code indicates that the client cannot access the requested resource. That might mean that the wrong username and password were sent in the request, or that the permissions on the server do not allow what was being asked.

    404 - Not Found

    The best known of them all, the 404 status code indicates that the requested resource was not found at the URL given, and the server has no idea how long for.

    405 - Method Not Allowed

    A 405 status code is returned when the client has tried to use a request method that the server does not allow. Request methods that are allowed should be sent with the response (common request methods are POST and GET).

    406 - Not Acceptable

    The 406 status code means that, although the server understood and processed the request, the response is of a form the client cannot understand. A client sends, as part of a request, headers indicating what types of data it can use, and a 406 error is returned when the response is of a type not in that list.

    407 - Proxy Authentication Required

    The 407 status code is very similar to the 401 status code, and means that the client must be authorised by the proxy before the request can proceed.

    408 - Request Timeout

    A 408 status code means that the client did not produce a request quickly enough. A server waits only a certain amount of time for a client to send its request, and a 408 status code indicates that this time has passed.

    409 - Conflict

    A 409 status code indicates that the server was unable to complete the request, often because a file would need to be edited, created or deleted, and that file cannot be edited, created or deleted.

    410 - Gone

    A 410 status code is the 404's lesser known cousin. It indicates that a resource has permanently gone (a 404 status code gives no indication of whether a resource has gone permanently or temporarily), and no new address is known for it.

    411 - Length Required

    The 411 status code occurs when a server refuses to process a request because a content length was not specified.

    412 - Precondition Failed

    A 412 status code indicates that one of the conditions the request was made under has failed.

    413 - Request Entity Too Large

    The 413 status code indicates that the request was larger than the server is able to handle, either due to physical constraints or to settings. Usually, this occurs when a file is sent using the POST method from a form, and the file is larger than the maximum size allowed in the server settings.

    414 - Request-URI Too Long

    The 414 status code indicates that the URL requested by the client was longer than the server can process.

    415 - Unsupported Media Type

    A 415 status code is returned by a server to indicate that part of the request was in an unsupported format.

    416 - Requested Range Not Satisfiable

    A 416 status code indicates that the server was unable to fulfill the request. This may be, for example, because the client asked for the 800th-900th bytes of a document, but the document was only 200 bytes long.

    417 - Expectation Failed

    The 417 status code means that the server was unable to properly complete the request. One of the headers sent to the server, the "Expect" header, indicated an expectation the server could not meet.
_________________________________________________________________

Server Error


    500 - Internal Server Error

    A 500 status code (all too often seen by Perl programmers) indicates that the server encountered something it didn't expect and was unable to complete the request.

    501 - Not Implemented

    The 501 status code indicates that the server does not support all that is needed for the request to be completed.

    502 - Bad Gateway

    A 502 status code indicates that a server, while acting as a proxy, received a response from a server further upstream that it judged invalid.

    503 - Service Unavailable

    A 503 status code is most often seen on extremely busy servers, and it indicates that the server was unable to complete the request due to a server overload.

    504 - Gateway Timeout

    A 504 status code is returned when a server acting as a proxy has waited too long for a response from a server further upstream.

    505 - HTTP Version Not Supported

    A 505 status code is returned when the HTTP version indicated in the request is not supported. The response should indicate which HTTP versions are supported.

Issues for WebLogic

1: OOM, native OOM, server crash, high CPU utilization, server down/unknown
2: 404, 403, users unable to access some application or URL, application errors, application responding slowly, application not working, application not opening, not getting authenticated, blank page.
3: Log file not rotating, high disk space usage on servers, stack overflow, thread count, SiteScope alert, error while uploading war file.
4: User creation errors.


1. OOM

Log in to the corresponding server through PuTTY.

Check the status of the server instances.
Check the server logs and out logs for OutOfMemoryError.
Collect the access logs from the time of the OOM; if the server is still in the Running state, take a thread dump as well.
Analyze the thread dump to find the cause of the OutOfMemoryError (application or server).
If the server is no longer in the Running state, restart it.
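
The log check in the steps above can be scripted. The sketch below is an illustration only: it assumes the log is plain text and simply looks for the string java.lang.OutOfMemoryError; the function name and the sample lines in the usage note are hypothetical.

```python
from typing import Iterable, List, Tuple


def find_oom_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that mentions an OutOfMemoryError."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if "java.lang.OutOfMemoryError" in line:
            hits.append((number, line.rstrip("\n")))
    return hits
```

To scan a real file, pass an open file object, for example find_oom_lines(open(path, errors="replace")), where path is the server log or out log of the instance. The line numbers help to line the error up with the access log entries from the same time.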

OutOfMemory during deployment:


If the application is large (more than 100 JSPs), we might hit this problem with the default JVM settings.

The reason is that the permanent generation (PermGen) fills up.
The JVM uses this space to store its internal data structures as well as class definitions. Class definitions generated from JSPs are also stored here.
PermGen is separate from the main Java heap and cannot grow beyond the size set by MaxPermSize.
The fix is to increase it by passing this argument in the startup script of the server: -XX:MaxPermSize=128m (default is 64m)

2. SiteScope alerts:


Log in to the server.

Check the server status, particularly at the time of the SiteScope alert.
Check the logs (server/out) for any errors and exceptions at the time of the SiteScope alert.

3. High CPU utilization:


Log in to the corresponding server through PuTTY.

Check the CPU utilization of the server instances:
ps -ef [or] top [or] prstat
AIX: topas or psstat
Make sure that the instances are running as the weblogic user:
ps -ef | grep java
Check the logs for any findings related to the high utilization.
Check the queue threads.
If CPU utilization is at 100%, kill the process: kill -9 <pid>
Restart the instances to bring the CPU utilization down.
4. High disk space usage on servers:
Log in to the server.
Check the disk space of the mount that is consuming the most space:
df -kh
Zip the log files, or remove the oldest logs, backup war files, and access logs:
gzip <filename> [or] compress <filename> [or] rm -rf <filename>
Backup: mv <sourcepath> <destinationpath>
Example: mv /apps/bea/domains/gwmp_desktop/ads_web.war /apps/back_up/ads_web.war_bak
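
The disk space check can also be done from a small Python script using the standard shutil.disk_usage call. This is a sketch only: the 90% threshold and the function names are assumptions for illustration, and the percentage here is used/total, which can differ slightly from the capacity column that df prints.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def mount_is_full(path: str, threshold: float = 90.0) -> bool:
    """Return True if the filesystem holding `path` is at or above the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return percent_used(usage.total, usage.used) >= threshold
```

For example, mount_is_full("/apps") would report whether the filesystem holding /apps has crossed the threshold, which is the signal to start zipping or moving old logs.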
5. Thread count:
Check the logs for any errors and exceptions.
Check the status of the instances and connection pools.
Check the CPU usage.
Take a thread dump if possible and analyze it.
Check with other subsystems.
Check with the DB team for any database-related issues.
6. Stack overflow:
Check the server logs, the out logs, and the access logs from the time the stack overflow occurred. Restart the instance if required.
If needed, increase the thread stack size with the JVM option -Xss<size>.
7. Log files not rotating:

Check the status of the server, from the start scripts
./startWebLogic.sh
./startManagedWebLogic.sh <managedservername>
or through the console.
Check the disk space (if it is full, delete old logs and then restart the server):
du -kh (folder)
df -kh (filesystem)
In the df output, look at the available and capacity columns (for example 45%, 90%).
If full, move files: mv <source path> <destination path>
or delete them: rm -rf <filename, e.g. adminserver.log>

8. Server errors:


Check the status of the servers, from the start scripts
./startWebLogic.sh
./startManagedWebLogic.sh <managedservername>
or through the console.
Check the server logs:
/apps/bea/domain/gwmp_destop/logs
Adminserver.log
Managedserver.log
If there are any database errors, check the connection pool and data source:
Services->jdbc->connectionpool,datasource
Check the deployment descriptors:
weblogic.xml, web.xml
If the logs show that configuration changes are required, make the changes and then restart the instances one by one if they are in a cluster.

9. Server Down/Unknown:

Log in to the server through PuTTY and open the Admin Console.
Check the process of the instance from PuTTY and the instance status from the Admin Console.
If the process does not exist and the instance status is Unknown, check:
the logs of the admin and managed servers;
the Node Manager status.
Find the root cause from the logs and restart the required instances.

10. URL not working:

Access the URL.
Check the status of the server instances on which the application is deployed.
Check the default queue threads (or the application-specific queue, if any) to see whether the idle thread count is zero. Then check the server logs and application logs (out logs) for errors and exceptions.
If the idle thread count is zero, check which application is consuming all the threads; if it is the application you are accessing, check with the application owner.
(To resolve this, the corresponding instances need a restart; before that, check with the application owner why the threads are being consumed.)
If there are application-related exceptions, check with the application owner or check the server logs for exceptions.
If there are DB exceptions related to the application, check whether the corresponding connection pool and data source are running fine.

11. Application errors:

Access the application URL.
Check the instances and their status for any errors.
Check the server logs as well as the application (out) logs.
Check the connection pool parameters and the data source.

12. Users unable to access some application/URL:

Try accessing the URL yourself.
Check whether the users are using the correct URL.
Check the logs of both WebLogic and the web server.
Check the status of the server instances.
Test the pools.
Check the DB connectivity.
Check whether the deployment was done properly; if not, redeploy the application and watch the logs for errors at the same time.
Check the connection pool user name.
Restart the instance if required.

13. Application error, responding slowly, application not working/not opening, not getting authenticated, blank page:

Check the status of the web server and app server instances.
Check the logs for errors/exceptions, both in the web server and in WebLogic Server.
Check the queue threads, connection pool status, connections, and data source.
Check the disk space.
Check whether log4j logging is enabled in the properties.
Check whether the deployment was done properly.

14. Error while uploading war file:

Check the available disk space.

15.Log locations:


1) Server log

WebLogic server creates server log file by default under:
/<domain-name>/<server name>/<server name>.log
The location is configurable.

2) JDBC log

All SQL statements and DB related exceptions/errors.
This file is created under /<server name>/jdbc.log

3) STDout log (if the process output is redirected to STDout)

4) Domain log
All domain-level information is logged into this file.
This is a subset of the server log files.
<domain name>/<domain name>.log
5) Access log
All HTTP requests are recorded in this log file.
/<server name>/access.log
6) Transaction log
All servers record transactions in the tlog file.
/<server name>/<server name>.tlog

16. Server Crash:

A server crash means that the WebLogic Java process no longer exists.
A crash is normally caused by native code; pure Java code on its own should not bring the process down.
Determine all potential sources of native code used by WebLogic Server:
nativeIO.
Type 2 JDBC drivers (Type 4 drivers are pure Java).
Native libraries accessed with JNI calls.
SSL native libraries.
The JVM itself. Most of the time the crash comes from the JVM.

Sometimes the JVM produces a small log file (hs_err_pid*.log) that may contain useful information about which library the crash originated from.


Server Crash Analysis

When a JVM crashes, a core file (a binary image of the process) is created. Run pmap and pstack against the core file to find the library that caused the crash.

Demo: figure out the offending library using existing pmap and pstack output files.

Check list:

1) hs_err_pid*.log (look for the library that caused the crash)

2) pmap core (the core file is created in the JVM root dir)
   pstack core

3) Use a debugger (gdb, dbx, adb) if the two steps above do not provide any information


17. Server Hang:


A server is said to be hung when:

The process is still alive.
The server does not accept any requests because all the execute threads are busy or stuck for some reason.
No response is sent to clients.
The java weblogic.Admin PING command doesn't return a normal response.

Server Hang Analysis:

The first step is to take multiple thread dumps.
A thread dump is a snapshot of the JVM at a particular instant.
Multiple thread dumps are necessary to conclude that threads are stuck and not progressing.
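
The idea of comparing several dumps can be sketched in Python. This is a rough illustration under stated assumptions, not a full analyzer: it assumes each thread starts with a header line in double quotes followed by "at ..." frames, it treats a thread as a stuck candidate when its stack is identical in every dump, and it skips idle execute threads by looking for the waitForRequest frame. The function names are hypothetical.

```python
from typing import Dict, List

# Frame seen in idle execute threads; such threads are expected to look unchanged.
IDLE_MARKER = "weblogic.kernel.ExecuteThread.waitForRequest"


def parse_dump(text: str) -> Dict[str, List[str]]:
    """Map each thread name to its list of stack frames ("at ..." lines)."""
    threads: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith('"'):
            end = line.find('"', 1)  # closing quote of the thread name
            current = line[1:end] if end > 0 else line[1:]
            threads[current] = []
        elif line.startswith("at ") and current is not None:
            threads[current].append(line)
    return threads


def stuck_candidates(dumps: List[str]) -> List[str]:
    """Return names of non-idle threads whose stack is identical in all dumps."""
    if len(dumps) < 2:
        return []  # one dump cannot show lack of progress
    parsed = [parse_dump(d) for d in dumps]
    names = set(parsed[0])
    for later in parsed[1:]:
        names &= set(later)
    result = []
    for name in sorted(names):
        stack = parsed[0][name]
        if not stack or any(IDLE_MARKER in frame for frame in stack):
            continue
        if all(later[name] == stack for later in parsed[1:]):
            result.append(name)
    return result
```

A thread that shows up here is only a candidate: a long-running but healthy request can also keep the same stack for a few seconds, so the interval between dumps matters.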

Procedure to take thread dumps:

Unix:
Open a shell window and issue the command kill -3 <PID>,
where PID is the Java process ID of WebLogic. Thread dumps are
logged to the STDout file.
Windows:
Press Ctrl-Break in the command window where WebLogic is running.
Thread dumps are written to the same command window.

Windows Service:

Open a command prompt and issue the command (make sure beasvc.exe is in the PATH):
c:\> beasvc -dump -svcname:service-name
Thread dumps are written to the defined log file.
While creating the service, we can provide the log option in the installservice script as:
-log:"d:\bea\domains\mydomain\myserver-stdout.txt"

Before we analyze thread dumps, it is important to know the common thread states:

1) Runnable [marked as R in some VMs]:
This state indicates that the thread is either running currently or is ready to run the next time the OS thread scheduler schedules it.
2) Object.wait() [marked as CW in some VMs]:
Indicates that the thread is waiting for some condition to be fulfilled.
3) Waiting for monitor entry [marked as MW in some VMs]:
Indicates that the thread is waiting to enter a synchronized block.

These threads are worth watching because they indicate lock contention: the thread is waiting for a lock on an object while some other thread holds it.


In WebLogic, the main worker threads belong to the group weblogic.kernel.Default:

"ExecuteThread: '1' for queue: 'weblogic.kernel.Default'" ...
This is the set of threads to look at for hang or slow-performance issues.
Below is a snapshot of an idle thread waiting for work to be assigned.
On an idle system you would see a lot of threads in this state:

"ExecuteThread: '1' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0x031a6308 nid=0x980 in Object.wait() [2dff000..2dffd8c]

at java.lang.Object.wait(Native Method)
- waiting on <0x112cf2c0> (a weblogic.kernel.ExecuteThread)
at java.lang.Object.wait(Object.java:429)
at weblogic.kernel.ExecuteThread.waitForRequest(ExecuteThread.java:153)
- locked <0x112cf2c0> (a weblogic.kernel.ExecuteThread)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:172)
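
Counting how many threads are in each state gives a quick picture of a dump. The sketch below assumes header lines shaped like the sample above (thread name in quotes, then nid=..., then the state, then an address range in square brackets); dumps from other JVM vendors may need a different pattern. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches: "name" ... nid=0x980 <state> [address range]
HEADER = re.compile(
    r'^"(?P<name>[^"]+)".*?nid=(?P<nid>0x[0-9a-fA-F]+)\s+(?P<state>.*?)\s*(?:\[|$)'
)


def count_states(dump: str) -> Counter:
    """Count threads per state, using the state text from each header line."""
    counts: Counter = Counter()
    for line in dump.splitlines():
        match = HEADER.match(line.strip())
        if match:
            counts[match.group("state")] += 1
    return counts
```

Many threads "in Object.wait()" on the execute queue suggest an idle server, while a growing number "waiting for monitor entry" suggests lock contention.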

For thread dump analysis and conclusions, let's look at a sample thread dump and drill into it further.

Demo of RSD thread dump (thread stuck issue on UAT)

Server performing slow

There are a lot of reasons for a server to perform slowly.
The first step is to take thread dumps and see what the threads are doing. If there is nothing wrong with the threads, there are other possible reasons why the server performs slowly:

Process runs OutOfMemory:

If the Java heap is full, the server process appears to be hung and does not accept any requests, because each request needs heap to allocate objects.
So if the heap is full, none of the requests get served; all of them fail with java.lang.OutOfMemoryError.


OutOfMemory Analysis:

OutOfMemory can occur because of a real memory shortage or because of a memory leak that fills the heap with objects that are no longer needed but are still referenced.
The first step is to enable GC logging and run the server again
(-XX:+PrintGCDetails).
The STDout file will then show the garbage collection details.
If the error is caused by a memory leak, we need a profiler such as Introscope or OptimizeIt to find the source of the leak.
Process size = Java heap + native memory + memory occupied by the executables and libraries.
On 32-bit operating systems, the virtual address space of a process can go up to 4 GB. This is an address-width limitation (2^32 bytes).

Out of this 4 GB, the OS kernel reserves some part for itself (typically 1-2 GB).

This limitation does not apply on 64-bit machines such as Solaris (SPARC) or Windows running on Itanium (64-bit).
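
The 32-bit arithmetic can be checked in a few lines of Python. The function name is only an illustration, and the kernel reservation is passed in because it differs between operating systems.

```python
GIB = 1024 ** 3  # bytes in one gigabyte (binary)


def user_address_space_gib(address_bits: int, kernel_reserved_gib: float) -> float:
    """Return the address space left for the process after the kernel's share."""
    total_gib = 2 ** address_bits / GIB  # 2^32 bytes is exactly 4 GB
    return total_gib - kernel_reserved_gib
```

With a 1 GB kernel reservation a 32-bit process has 3 GB left, and with 2 GB reserved only 2 GB, which must hold the Java heap, native memory, and the loaded libraries together.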

OOM can also occur due to fragmentation. In this situation, we can see free memory available but still get OutOfMemory errors.
To understand fragmentation, we need to know the following fact:
An allocation needs a contiguous block of heap. If a request needs 2 MB of memory, the JVM has to provide a contiguous 2 MB chunk.
Over a period of time, memory allocation becomes scattered and there might not be enough contiguous memory available.
A full GC might not be able to reclaim contiguous space.
This is called fragmentation.
For example, the -verbose:gc output might look like the following when the heap is fragmented. There is free memory available, but the JVM still throws an OOM error.
(Most of the fragmentation bugs are resolved in Sun JDK 1.4.2_xx)

[GC 4673K->3017K(32576K), 0.0050632 secs]

[GC 5047K->3186K(32576K), 0.0028928 secs]
[GC 5232K->3296K(32576K), 0.0019779 secs]
[GC 5309K->3210K(32576K), 0.0004447 secs]
java.lang.OutOfMemoryError
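
To see how much heap was actually free in output like this, the lines can be parsed with a small Python sketch. It assumes the classic "[GC beforeK->afterK(totalK), N secs]" format shown above; newer JVMs log in other formats. The names GcEvent, parse_gc_log and free_after_kb are hypothetical.

```python
import re
from typing import List, NamedTuple


class GcEvent(NamedTuple):
    before_kb: int   # heap in use before the collection
    after_kb: int    # heap in use after the collection
    total_kb: int    # total heap size
    seconds: float   # pause time


GC_LINE = re.compile(r"\[(?:Full )?GC (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]")


def parse_gc_log(text: str) -> List[GcEvent]:
    """Extract every GC event found in the text."""
    events = []
    for match in GC_LINE.finditer(text):
        before, after, total, secs = match.groups()
        events.append(GcEvent(int(before), int(after), int(total), float(secs)))
    return events


def free_after_kb(event: GcEvent) -> int:
    """Heap left free once the collection has finished."""
    return event.total_kb - event.after_kb
```

For the last GC line above, 32576K total minus 3210K in use leaves 29366K free, yet the next line is an OutOfMemoryError, which is the pattern described for fragmentation.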

Fragmentation-related issues are caused by bugs in the JVM.
The best approach is to try the latest minor version of the JVM; if that does not work, we need to work with the vendor to get it fixed.

The following commands on Solaris provide good information:
vmstat:
The vmstat command reports statistics about kernel threads, virtual memory, disks, traps and CPU activity.
sar:
An OS utility known as the system activity reporter.

If the application uses SSL, the server performs slower than without SSL.
SSL reduces the capacity of the server by about 33 to 50 percent, depending on the strength of encryption used in the SSL connections.

The process is running out of file descriptors. The server cannot accept further requests because sockets cannot be created (each socket consumes a file descriptor).

The following exception is thrown in such cases:
java.net.SocketException: Too many open files
OR
java.io.IOException: Too many open files
In this case, the lsof utility helps: it lists all open file descriptors. From the list of open files, we (or the application owner) can work out whether it is a bug or expected behavior. If it is expected behavior, the number of FDs needs to be increased (the default is 1024).
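
As a rough companion to lsof, the descriptor limit and the current count can be read from Python on Unix. This is a sketch under assumptions: the resource module exists only on Unix, the /proc/<pid>/fd directory exists only on Linux, and reading another user's process needs the right permissions. The function names are hypothetical.

```python
import os
import resource
from typing import Optional, Tuple


def fd_limits() -> Tuple[int, int]:
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid: str = "self") -> Optional[int]:
    """Count open descriptors of a process via /proc; None if it cannot be read."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None
```

Calling open_fd_count with the WebLogic Java process ID (as a string) and comparing the result with the limit that process was started under shows how close the server is to "Too many open files". Note that fd_limits reports the limit of the Python process itself, which matches the server's only if both were started from the same environment.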

GC taking a long time (more than 20 seconds).

This appears as a hang to end users.
In this case, we need to tune the GC parameters and try the other GC options available. In some cases where GC takes very long, incremental GC has been useful (-Xincgc).


WebLogic Troubleshooting: Communication from Apache to WebLogic


If there is an issue between Apache and WebLogic and the cause is not obvious, enable debug at the Apache layer. In the httpd.conf file add:

Debug ALL
This creates a file called wlproxy.log under /tmp on the Apache machine. The log contains all the request/response headers exchanged between Apache and WebLogic.
Most of the plug-in issues in WLS 8.1 were centered around the attribute "KeepAliveEnabled".
For most socket-related errors, it is worth turning off "KeepAliveEnabled" and redoing the test.

Apache restart and checking the connection counts:


APACHE_HOME\bin\Apache -t         Syntax check
APACHE_HOME\bin\Apache start      Start the server
APACHE_HOME\bin\Apache stop       Stop the server
APACHE_HOME\bin\Apache restart    Restart the server
APACHE_HOME\bin\Apache -l         List the compiled-in modules
_______________________________________________________________________
Error while restarting one of the WebLogic server instances

####<Sep 13, 2007 6:45:44 PM IST> <Error> <EmbeddedLDAP> <bng1web2prod> <itms> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1189689344635> <000000> <Error opening the Transaction Log: ./servers/itms/data/ldap/ldapfiles/EmbeddedLDAP.tran (Permission denied)>
####<Sep 13, 2007 6:45:44 PM IST> <Error> <EmbeddedLDAP> <bng1web2prod> <itms> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1189689344637> <000000> <Error Instantiating 'dc=web2prod-Domain': null>
####<Sep 13, 2007 6:45:44 PM IST> <Critical> <EmbeddedLDAP> <bng1web2prod> <itms> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1189689344653> <BEA-171521> <An error occurred while initializing the Embedded LDAP Server. The exception thown is java.lang.ClassCastException: com.octetstring.vde.backend.BackendRoot. This may indicate a problem with the data files for the Embedded LDAP Server. This managed server has a replica of the data contained on the Master Embedded LDAP Server in the Admin server. This replica has been marked invalid and will be refreshed on the next boot of the managed server. Retry the reboot of this server.>
####<Sep 13, 2007 6:45:44 PM IST> <Critical> <WebLogicServer> <bng1web2prod> <itms> <main> <<WLS Kernel>> <> <> <1189689344667> <BEA-000362> <Server failed. Reason:

While restarting a WebLogic 9.2 instance I got the error above: the server starts but is then forced to shut down again.
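
When a log holds many records like these, pulling out the leading fields makes it easier to filter by severity or subsystem. The Python sketch below assumes each record sits on one line and starts with "####" followed by angle-bracketed fields in the order timestamp, severity, subsystem, machine, server; the function and field names are chosen here for illustration.

```python
import re
from typing import Dict, Optional

# First five angle-bracketed fields of a record.
RECORD_START = re.compile(
    r"^####<([^>]*)> <([^>]*)> <([^>]*)> <([^>]*)> <([^>]*)>"
)
# Message IDs such as BEA-171521.
MESSAGE_ID = re.compile(r"<(BEA-\d+)>")


def parse_record(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the leading fields of one log record, or None if it does not match."""
    match = RECORD_START.match(line)
    if match is None:
        return None
    timestamp, severity, subsystem, machine, server = match.groups()
    id_match = MESSAGE_ID.search(line)
    return {
        "timestamp": timestamp,
        "severity": severity,
        "subsystem": subsystem,
        "machine": machine,
        "server": server,
        "message_id": id_match.group(1) if id_match else None,
    }
```

Filtering on severity "Critical" and subsystem "EmbeddedLDAP" would single out the record that explains the forced shutdown here.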


Solution:

Go to the server instance directory and browse to the path
/local/BEA/weblogic92/domain-name/servers/server-name/data/ldap/ldapfiles

You will find the files listed below in that directory:

-rw-r--r--   1 weblogic weblogic   79649 Sep 13 18:48 EmbeddedLDAP.data
-rw-r--r--   1 weblogic weblogic       0 Sep 13 18:48 EmbeddedLDAP.delete
-rw-r--r--   1 weblogic weblogic     648 Sep 13 18:48 EmbeddedLDAP.index
-rw-r--r--   1 weblogic weblogic       0 Sep 13 18:48 EmbeddedLDAP.lok
-rw-r--r--   1 weblogic weblogic   80126 Sep 13 18:48 EmbeddedLDAP.tran
-rw-r--r--   1 weblogic weblogic       8 Sep 13 18:48 EmbeddedLDAP.trpos

Delete the files listed below from the directory:

-rw-r--r--   1 weblogic weblogic       0 Sep 13 18:48 EmbeddedLDAP.delete
-rw-r--r--   1 weblogic weblogic       0 Sep 13 18:48 EmbeddedLDAP.lok

Now restart the instance from the bin directory; this should bring your server up and running without the issue.



Issue 1: JMS Issue 1


EOP messaging bridges failing frequently with error : "(java.lang.Exception: javax.resource.ResourceException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found). Because of this issue messages are being piled up on MQ and not being picked up by the bridge.


Soln:


Domain: eopdom1 (1 admin + 2 managed servers spread across 2 hosts). Checked the bridge configuration (70 bridges in total). Then checked the pool-params in jms-xa-adp.rar (120 on m1 and 20 on m2). Changed this to 150 on both servers, as each bridge needs at least 2 connections from the adapter pool, then redeployed the adapter and restarted the WebLogic instances. Also applied patch WB1E (CR326720_920.jar) to resolve the known issue behind the error mentioned.
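
The sizing reasoning can be checked with quick arithmetic. A minimal sketch, assuming the worst case in which all 70 bridges draw from a single adapter pool; required_pool_size is a hypothetical helper name.

```shell
# Minimum adapter pool size = number of bridges x connections per bridge.
# The default of 2 connections per bridge comes from the note above.
required_pool_size() {
    bridges="$1"
    per_bridge="${2:-2}"
    echo $(( bridges * per_bridge ))
}

required_pool_size 70   # prints 140
```

With 70 bridges the minimum is 140 connections, so the value of 150 leaves a small margin, while the earlier value of 20 on m2 was far too low.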


Notes:


Live is running on 9.2.0 and test is running on 9.2.3; these should be brought in sync. Also planning a quick round of WLS health checks on EOP.



Issue2: JMS 


The messaging bridge failed to connect to the source and target destinations and reported the following error: "failed to get one of the adapters from JNDI (javax.naming.NameNotFoundException: Unable to resolve 'eis.jms.WLSConnectionFactoryJNDIXA'. Resolved 'eis.jms'; remaining name 'WLSConnectionFactoryJNDIXA')". This suggests that the adapter file jms-xa-adp.rar was either not targeted to the required managed server instance or that the adapter deployment had failed with an error.


Soln:


Found that the adapter was targeted only to the managed2 server, whereas the bridge was configured to run on the managed1 server. Targeted the adapter to the managed1 server as well and restarted the instances.


Issue3: JMS 


A newly configured messaging bridge failed to become Active and the following 2 error messages were seen: "Unable to connect to source destination" and "Configured QoS is not reachable".


Soln:


"Unable to connect to source destination": found that the source URL had a space between the "//" and the IP address. Removed the space and the bridge was then able to connect to the source destination. "Configured QoS is not reachable": found that "QoS degradation allowed" was checked for the earlier bridges but unchecked for this new bridge, while QoS was configured for "Exactly Once" delivery. Enabled the option and the messaging bridge became Active after a bounce of the WebLogic instances.


Notes:  


QoS "Exactly Once" requires the messaging to be XA enabled, i.e. the connection factory should be XA enabled and the destinations should be configured to use the JMS XA adapter.
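
A stray space in the source URL, as in the first error above, is easy to miss by eye. The check can be sketched in shell; check_bridge_url is a hypothetical helper and the URLs in the example are placeholders, not values from the real configuration.

```shell
# Prints "contains whitespace" if the URL has any space or tab, otherwise "ok".
check_bridge_url() {
    case "$1" in
        *[[:space:]]*) echo "contains whitespace" ;;
        *)             echo "ok" ;;
    esac
}

check_bridge_url "t3:// 192.0.2.10:7001"   # prints: contains whitespace
check_bridge_url "t3://192.0.2.10:7001"    # prints: ok
```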


Issue4: Deployer


Unable to deploy the application from the console; the following error appears on the console page: "[Deployer:149150]An IOException occurred while reading input.; nested exception is: java.net.SocketException: Connection reset; nested exception is: java.net.SocketException: Connection reset".


Soln:


The only error message in the logs indicated that the application was attempting to connect to java.sun.com on port 80 over the internet, which was blocked by firewall restrictions. Reported this to the application team. As a workaround, added a manual entry for the application in config.xml and restarted the admin and managed server instances, after which the application was deployed successfully.


Notes:


One argument was that the application deployed properly on another test instance even with the same error. Although we were never able to replicate this, one theory is that when deploying through the console the deployer attempted to connect to java.sun.com again and again and eventually timed out, whereas with the entry added to config.xml the restart attempted the connection only once and moved on to other tasks, which would have higher priority during a restart.



Issue4: Startup


“/wls_domains/wlmrtnept/servers/managed3_wlmrtnept/tmp/managed3_wlmrtnept.lok : java.io.IOException: No locks available”


Soln:


This could be due to an incorrect NFS setup (if an NFS filesystem is used). Check that the hosts have the correct permissions on the NFS server, and check that the NFS libraries below are installed:
yum list | grep nfs


*Note*: Red Hat Network repositories are not listed below. You must run this command as root to access RHN repositories.

nfs-utils.x86_64                           1:1.0.9-40.el5         installed
nfs-utils-lib.x86_64                       1.0.8-7.2.z2           installed
The rpc.statd and rpc.idmapd processes should also be running.
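
The process check can be sketched as a small filter over the process list. missing_daemons is a hypothetical helper name; it reads process names on standard input and prints any required name that is absent.

```shell
# Reads process names (one per line) on stdin and prints each
# required daemon (given as arguments) that is not in the list.
missing_daemons() {
    procs="$(cat)"
    for name in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string.
        printf '%s\n' "$procs" | grep -qxF -- "$name" || echo "$name"
    done
}
```

Example call on the affected host:
ps -e -o comm= | missing_daemons rpc.statd rpc.idmapd

No output means both processes were found.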


Issue5:  Cluster


For quite some time we had been observing multicast packet loss, which triggered various other problems on WebLogic, such as managed servers dropping out of the cluster and JMS messages not being delivered properly to distributed queues.


A recurring message similar to the one below appears in the logs. Although it is only an informational message, it acts as a trigger for various other issues, so messages like this should not be neglected.


<Mar 23, 2010 12:14:04 PM GMT> <Info> <Cluster> <host1> <managed1> <[ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1269346444069> <BEA-000112> <Removing managed2 jvmid:-1616739071273980991S:host2:[61002,61002,-1,-1,-1,-1,-1]:host1:61001,host2:61002,host3:61003,host4:61004:domain-name:managed2 from cluster view due to timeout.>


Soln:


We used the multicast test utility to check whether there was in fact an issue with multicasting:


java utils.MulticastTest -n <name> -a <multicast-address> -p <multicast-port>


The result showed that multicast packets were intermittently being dropped within the VLAN, causing the above issue. We then liaised with the OS experts to narrow down the issue and to see whether the multicast packets were being transmitted correctly amongst the servers. This did not help much, as from the server perspective all the packets were being transmitted correctly.


Next we involved network experts to seek their help. After thorough investigations of the network logs and various switch configurations it was concluded that this was down to the multicast address range being used and the way the local switches acknowledged that multicast range. They also suggested that in future we make use of Link Local Multicast IP Addresses for Weblogic multicasting purposes.


A note on Link Local IP can be found at: http://www.iana.org/assignments/multicast-addresses/


In short, multicast link-local addresses (actually, the link-local MAC addresses) are treated as broadcasts by the local switches, so all WebLogic servers on the same VLAN will see them. Other multicast addresses are dropped by the switches by default unless further action is taken:

Disable IGMP snooping on the VLAN or the whole switch. Otherwise the switch just drops the multicast packet, because WebLogic doesn't use IGMP, so the switch never sees an IGMP join request to the multicast group (and thus never maps the MAC address to the switch port).  OR
Configure static multicast MAC addresses for the relevant switch ports.

Both of the above options add network complexity and are costly to implement, test and maintain. Link-local multicast addresses avoid these issues completely. Some previous implementations using non-link-local multicast addresses may have worked OK if the switch had IGMP snooping disabled globally or per VLAN.
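
Whether a configured cluster address falls in the link-local range can be checked with a simple pattern match. A minimal sketch, assuming "link-local" means the 224.0.0.0/24 Local Network Control Block from the IANA list linked above; multicast_scope is a hypothetical helper and does not validate the individual octets.

```shell
# Classifies an IPv4 address by prefix only (no octet validation):
#   224.0.0.x              -> link-local
#   other 224.x - 239.x    -> other-multicast
#   anything else          -> not-multicast
multicast_scope() {
    case "$1" in
        224.0.0.*)           echo "link-local" ;;
        22[4-9].*|23[0-9].*) echo "other-multicast" ;;
        *)                   echo "not-multicast" ;;
    esac
}

multicast_scope 224.0.0.1     # prints: link-local
multicast_scope 239.192.0.1   # prints: other-multicast
```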


5.Thread count:


 Check the logs for any errors and exceptions.

 Check the status of the instances and connection pools.
 Check the CPU usage.
 Take a thread dump if possible and analyze it.
 Check with the other subsystems.
 Check with the DB team for any issues related to the database.
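
For the thread dump step, a first pass over a saved dump can be sketched in shell. On a JVM a dump is typically triggered with kill -3 <pid> and appears in the server's standard output log. The helper below is an assumption-laden sketch: count_execute_threads is a hypothetical name, and it relies on thread names carrying the [ACTIVE], [STANDBY] or [STUCK] prefix seen in the log line quoted earlier.

```shell
# Counts WebLogic execute threads per state prefix in a saved thread dump.
# $1 is the path to a file containing the dump text.
count_execute_threads() {
    dump_file="$1"
    for state in ACTIVE STANDBY STUCK; do
        # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
        n=$(grep -cF "\"[$state] ExecuteThread" "$dump_file" || true)
        echo "$state $n"
    done
}
```

Example call (the file name is a placeholder):
count_execute_threads managed1_threaddump.txt

A growing STUCK count across successive dumps is usually the first thing to look at.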
6.Stack overflow:

 Check the server logs, the out logs and the access logs for the time at which the stack overflow occurred. Restart the instance if required.

 Increase the thread stack size with the -Xss JVM option if required.